2009-03-10 05:57 AM
Load multiple instruction timing
2011-05-17 04:05 AM
Hello!
I have noticed some differences between the Cortex-M3 theoretical performance and the STM32's actual performance when executing load instructions. The application note about the STM32 DMA, which you can find at this link: http://www.st.com/stonline/products/literature/anp/13529.htm
says that the CPU has to pay one AHB cycle when executing a load instruction because of bus arbitration with the DMA, so it seems that every load instruction takes a minimum of three CPU cycles: 1 AHB cycle (arbitration) + 2 CPU cycles (execution).
The Cortex-M3 can execute a load multiple instruction in (nReg + 1) cycles, where nReg is the number of registers loaded. In the STM32, due to bus arbitration, this value would be (2 x nReg + 1) cycles instead.
Can anyone confirm my idea?
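To make the two candidate timings easy to compare, here is a minimal C sketch of both formulas from the post above. The function names are made up for illustration, and the second formula is only the poster's hypothesis about arbitration overhead, not a documented STM32 figure.

```c
#include <assert.h>

/* Cycles for one load multiple according to the Cortex-M3 figure
 * quoted in the post: nReg + 1. */
unsigned ldm_cycles_core(unsigned n_reg)
{
    assert(n_reg >= 1u); /* an LDM loads at least one register */
    return n_reg + 1u;
}

/* Cycles under the poster's hypothesis (an assumption, not a measured
 * value): one extra AHB arbitration cycle per word, 2 * nReg + 1. */
unsigned ldm_cycles_hypothesis(unsigned n_reg)
{
    assert(n_reg >= 1u);
    return 2u * n_reg + 1u;
}
```

For a six-register load multiple, ldm_cycles_core(6) gives 7 and ldm_cycles_hypothesis(6) gives 13, so the two models are far enough apart to be told apart by a simple timing measurement.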
Hi gioacchino03,
Quote:
.... so it seems that every load instruction takes a minimum of three CPU cycles: 1 AHB cycle (arbitration) + 2 CPU cycles (execution). The Cortex-M3 can execute a load multiple instruction in (nReg + 1) cycles, where nReg is the number of registers loaded. In the STM32, due to bus arbitration, this value is (2 x nReg + 1) cycles instead.

Could you let me know how you did the calculation? To check it on real STM32 silicon, I suggest you run a small piece of assembly code containing some load multiple instructions and then share the cycle counts with us.
Cheers, STOne-32.
Hello,
Thanks for your interest. My previous post was based on a simulation with Keil uVision. Now I have done some tests on real hardware.
My settings are:
* external crystal 4 MHz
* PllMul = 16
* Prefetch buffer enabled, 2 flash wait states
* AHBPRE = 1
This leads to:
* core cycle time = 1/(4 MHz * 16) = 15.6 ns
The STM32 took about 11 us to execute 100 instructions like this:
LDMIA SP,{R2,R3,R4,R5,R6,R7}
so each LDM instruction takes (11 us / 100) = 110 ns, which is about 7 x 15.6 ns!
Conclusion: the minimum (DMA disabled) number of cycles to transfer n words from memory to registers is n + 1! :D
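The arithmetic in this measurement can be written out as a small C sketch. It only reproduces the conversion from elapsed time to cycles per instruction using the numbers given in the post; the function names are hypothetical, and the sketch assumes the core clock equals crystal frequency times PLL multiplier (AHB prescaler of 1, as stated).

```c
#include <assert.h>

/* Core clock in Hz, assuming clock = crystal * PLL multiplier
 * with an AHB prescaler of 1 (the settings listed in the post). */
unsigned long core_clock_hz(unsigned long xtal_hz, unsigned pll_mul)
{
    return xtal_hz * pll_mul;
}

/* Average core cycles per instruction, given the total time in ns
 * taken by `count` back-to-back instructions. */
double cycles_per_instruction(double total_ns, unsigned count,
                              unsigned long clock_hz)
{
    assert(count > 0u);
    assert(clock_hz > 0ul);
    double cycle_ns = 1e9 / (double)clock_hz;   /* 15.625 ns at 64 MHz */
    return (total_ns / (double)count) / cycle_ns;
}
```

With a 4 MHz crystal and a multiplier of 16, core_clock_hz gives 64 MHz, and cycles_per_instruction(11000.0, 100, 64000000UL) comes out just above 7, which matches nReg + 1 for the six registers in the LDMIA above.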
Hi,
Excellent job! So you have confirmed that the STM32 behaves exactly as the ARM Cortex-M3 instruction timing predicts :) even with the flash at 2 wait states and the CPU running at 64 MHz. Let's try now with the DMA enabled.
Cheers, STOne-32.