2025-05-27 10:27 PM
Hello
For a specific use case I am evaluating whether a CM33 running with HCLK at 250 MHz can give me maximum code execution performance.
To do so, I have written a routine in assembler and calculated its cycle count using the ARM reference manual. The code only does some data conversion from one data buffer into another.
I then place this code in SRAM1, enable the I-Cache, configure the RCC, and try to measure the code execution time.
Let's say the cycle count calculated by hand is about 3000.
The rest of the project is based on the code generated by CubeMX, into which I inject my bare-metal code.
1) Using DWT->CYCCNT I see no difference between execution from FLASH and from SRAM1: the result is about 7900 cycles.
2) Using TIM6, started before my function and stopped after it, so that TIM6->CNT holds the time in 4 ns units, i.e. effectively clock cycles. Running from SRAM gives the same result.
Example of the "issue":
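To make the arithmetic behind these two measurements explicit, here is a minimal sketch in plain C. The helper names are hypothetical and not part of CubeMX or CMSIS; the sketch assumes the counters are free-running up-counters (32-bit for DWT->CYCCNT, 16-bit for TIM6->CNT with ARR = 0xFFFF) and that both are clocked at 250 MHz.

```c
#include <stdint.h>

/* Elapsed ticks between two reads of a free-running 32-bit up-counter
 * (for example DWT->CYCCNT). Unsigned subtraction tolerates one wraparound. */
uint32_t elapsed_ticks32(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Same idea for a 16-bit counter such as TIM6->CNT.
 * Assumption: the timer runs with ARR = 0xFFFF so it wraps at 16 bits. */
uint16_t elapsed_ticks16(uint16_t start, uint16_t end)
{
    return (uint16_t)(end - start);
}

/* Convert a tick count to nanoseconds for a counter clocked at clock_hz. */
uint64_t ticks_to_ns(uint32_t ticks, uint32_t clock_hz)
{
    return ((uint64_t)ticks * 1000000000ull) / clock_hz;
}

/* Measured vs. hand-calculated cycles, in percent (263 means about 2.63x). */
uint32_t slowdown_percent(uint32_t measured, uint32_t expected)
{
    return (uint32_t)(((uint64_t)measured * 100u) / expected);
}
```

With the numbers from this post, 7900 ticks at 250 MHz correspond to 31600 ns, and 7900 measured against 3000 expected cycles is roughly a 2.6x slowdown.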
LDR r0, =label
While step debugging, TIM6->CNT is incremented by 6 units for this instruction.
The same happens for
TST r0, #1
TIM6->CNT again changes by 6 units, i.e. 6 cycles: 6x slower than it should execute.
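As a cross-check of the per-instruction figure that does not rely on single-stepping, one option is to time many repetitions of the instruction in one run and divide. The sketch below only shows the arithmetic; the function name is hypothetical, and the overhead value (ticks measured for an empty start/stop pair) is an assumption that would have to be measured on the target.

```c
#include <stdint.h>

/* Average cost of one repetition in hundredths of a tick.
 * total_ticks:    counter difference around `reps` repetitions
 * overhead_ticks: counter difference around an empty measurement (assumed
 *                 to be measured separately on the target)
 * Returns 0 for invalid input (reps == 0 or overhead larger than total). */
uint32_t avg_centiticks(uint32_t total_ticks, uint32_t overhead_ticks,
                        uint32_t reps)
{
    if (reps == 0u || total_ticks < overhead_ticks) {
        return 0u;
    }
    return (uint32_t)(((uint64_t)(total_ticks - overhead_ticks) * 100u) / reps);
}
```

For example, 6 ticks for a single repetition with no overhead gives 600, i.e. 6.00 ticks per instruction, which is the figure seen while stepping.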
My question is:
Is the STM32H5 value line designed so that the peripherals can run with a 250 MHz HCLK but the CPU core cannot?
Or am I not initializing the CPU properly, so that with HCLK at 250 MHz I get code performance similar to a G0 at 48 MHz?
I feel I am missing something…