2025-05-27 10:27 PM
Hello
For a specific use case I am evaluating whether a CM33 with HCLK at 250 MHz can give me maximum code execution performance.
To do so I wrote a procedure in assembler and calculated its cycle count using the ARM reference manual. The code only does some data conversion from one data buffer into another.
Then I place this code in SRAM1, enable the I-Cache, configure RCC, and try to measure the code execution time.
Let's say the cycle count calculated by hand is about 3000.
For the rest of the work I start from the code generated by CubeMX and inject my bare-metal code into it.
1) Using DWT->CYCCNT I see no difference between execution from FLASH and from SRAM1: the result is about 7900 cycles in both cases.
2) Using TIM6, started before my function and stopped after it, so that TIM6->CNT counts in 4 ns steps, i.e. effectively HCLK cycles. Running from SRAM gives the same result.
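For reference, the DWT->CYCCNT measurement described in 1) could look roughly like the sketch below. This is not the original poster's code. The helper names are hypothetical, and the registers are passed as pointers so the snippet also compiles off-target. On the MCU they would be `&CoreDebug->DEMCR`, `&DWT->CTRL` and `&DWT->CYCCNT`, assuming the usual CMSIS headers from the CubeMX project.

```c
#include <stdint.h>

/* Architectural bits: DEMCR.TRCENA (bit 24) and DWT_CTRL.CYCCNTENA (bit 0). */
#define DEMCR_TRCENA        (1u << 24)
#define DWT_CTRL_CYCCNTENA  (1u << 0)

/* Hypothetical helper: switch the cycle counter on.
 * On target (assumed CMSIS names):
 *   cyccnt_enable(&CoreDebug->DEMCR, &DWT->CTRL, &DWT->CYCCNT); */
static void cyccnt_enable(volatile uint32_t *demcr,
                          volatile uint32_t *dwt_ctrl,
                          volatile uint32_t *dwt_cyccnt)
{
    *demcr     |= DEMCR_TRCENA;        /* enable the DWT/trace block */
    *dwt_cyccnt = 0u;                  /* start counting from zero   */
    *dwt_ctrl  |= DWT_CTRL_CYCCNTENA;  /* start the cycle counter    */
}

/* Elapsed cycles between two CYCCNT reads.
 * Unsigned subtraction stays correct across a single counter wrap. */
static uint32_t cyccnt_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Net cost of the code under test: the raw delta minus the delta of an
 * empty measurement (two back-to-back reads with nothing in between). */
static uint32_t cyccnt_net(uint32_t raw, uint32_t overhead)
{
    return (raw > overhead) ? (raw - overhead) : 0u;
}

/* Intended use on target (sketch):
 *   uint32_t t0 = DWT->CYCCNT;
 *   my_asm_function();               // hypothetical name for the routine under test
 *   uint32_t t1 = DWT->CYCCNT;
 *   uint32_t cycles = cyccnt_elapsed(t0, t1);
 */
```

Measuring an empty section first and subtracting it shows how much of the 7900 comes from the measurement itself rather than from the routine.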
Example of the „issue”
LDR r0, =label
When single-stepping this instruction in the debugger, TIM6->CNT increments by 6 units.
Same for
TST r0, #1
CNT changes by 6 units, thus 6 cycles. That is 6x slower than the single cycle this instruction should take.
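To make the arithmetic behind these numbers explicit, here is a small sketch. The helper names are made up for illustration, and it assumes TIM6 really is clocked at the full 250 MHz, as the 4 ns tick implies.

```c
#include <stdint.h>

/* Duration of a tick count in nanoseconds for a counter clocked at clk_hz. */
static uint64_t ticks_to_ns(uint32_t ticks, uint32_t clk_hz)
{
    return ((uint64_t)ticks * 1000000000u) / clk_hz;
}

/* Measured-to-expected ratio in percent, e.g. 263 means 2.63x. */
static uint32_t ratio_percent(uint32_t measured, uint32_t expected)
{
    return (uint32_t)(((uint64_t)measured * 100u) / expected);
}
```

With the figures from this post, `ticks_to_ns(6, 250000000u)` gives 24 ns per stepped instruction, and `ratio_percent(7900, 3000)` gives 263, i.e. the whole routine runs about 2.6x slower than the hand count, not 6x.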
My question is:
Is the STM32H5 value line designed so that the peripherals can run at 250 MHz HCLK but the CPU core cannot?
Or am I not initializing the CPU properly, so that with HCLK at 250 MHz I get code performance similar to a G0 at 48 MHz?
I feel I am missing something…
2025-06-02 5:34 AM
>Also does this matter if all my test code is written in pure assembler?
Do you know how these CPUs work (and what the optimizer does)?
So maybe the fastest code is the one generated by a compiler with the optimizer on, not the one written by hand in asm.
Just try it! And set the optimizer to -O2 or so...
btw
H5 is an M33 core, so read :
-> So just think: the H563 description says 250 MHz, 375 MIPS ...? Obviously it can execute more than 1 instruction per cycle, and that is what the optimizer does: it arranges the code so that this becomes possible.
Only if you have the same "skills" will your asm reach the same speed.
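As a starting point for that comparison, a C version of a buffer conversion could look like the sketch below. The actual conversion is not described in this thread, so the operation here (widening 8-bit samples to 16-bit) and the function name are purely hypothetical stand-ins.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the conversion routine: widen each 8-bit
 * sample to 16 bits, left-aligned. 'restrict' tells the compiler the
 * buffers do not overlap, which gives the optimizer more freedom to
 * reorder loads and stores. */
void convert_u8_to_u16(const uint8_t *restrict src,
                       uint16_t *restrict dst,
                       size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        dst[i] = (uint16_t)((uint16_t)src[i] << 8);
    }
}
```

Built with something like `arm-none-eabi-gcc -O2 -mcpu=cortex-m33 -mthumb` (assuming a GCC-based toolchain) and timed the same way as the asm routine, this gives a direct number to compare the hand-written version against.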