STM32H503 code execution performance issue

ArkadiuszRaj
Associate II

Hello

For a specific application, I am evaluating the Cortex-M33 (CM33) with HCLK at 250 MHz for maximum code execution performance.

To do so, I wrote a procedure in assembler and calculated its cycle count by hand using the ARM reference manual. The code only performs a data conversion from one buffer into another.

I then place this code in SRAM1, enable the I-Cache, configure RCC, and try to measure the execution time.

Let's say the cycle count calculated by hand is about 3000.

The rest of the project is based on code generated by CubeMX, into which I inject my bare-metal code.

1) Using DWT->CYCCNT, I see no difference between execution from FLASH and from SRAM1: the result is about 7900 cycles.
2) Using TIM6, started before my function and stopped after it, so TIM6->CNT holds the time in 4 ns units, i.e. HCLK cycles. Running from SRAM gives the same result.
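
Both measurement methods above come down to the same arithmetic: take a counter snapshot before and after the routine, subtract, and convert to time. A minimal host-side sketch in C, assuming a free-running 32-bit up-counter (as DWT->CYCCNT is) clocked at HCLK; the helper names `elapsed_ticks` and `ticks_to_ns` are hypothetical and not part of CMSIS or the HAL:

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed ticks between two snapshots of a free-running 32-bit up-counter.
 * Unsigned subtraction is modulo 2^32, so a single wrap between the two
 * snapshots is handled automatically. */
static uint32_t elapsed_ticks(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Convert ticks to nanoseconds for a counter clocked at counter_hz.
 * At 250 MHz one tick is 4 ns. */
static uint64_t ticks_to_ns(uint32_t ticks, uint32_t counter_hz)
{
    assert(counter_hz != 0u);
    return ((uint64_t)ticks * 1000000000ull) / counter_hz;
}
```

With the numbers from this post, 7900 ticks at 250 MHz correspond to 31600 ns. A counter narrower than 32 bits would need the difference masked to its width, which this sketch does not do.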

 

Example of the "issue":

LDR r0, =label

During step debugging, TIM6->CNT increments by 6 units for this instruction.

The same happens for

TST r0, #1

TIM6->CNT changes by 6 units, i.e. 6 cycles: 6x slower than the instruction should execute.
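
To put the figures in this post side by side, the slowdown can be expressed as the ratio of measured to expected cycles. A small sketch, using the hypothetical helper `slowdown_x100`, which returns the ratio in hundredths so that no floating point is needed:

```c
#include <assert.h>
#include <stdint.h>

/* Ratio measured/expected in hundredths (263 means 2.63x), using integer
 * arithmetic only. The result is truncated, not rounded. */
static uint32_t slowdown_x100(uint32_t measured, uint32_t expected)
{
    assert(expected != 0u);
    return (uint32_t)(((uint64_t)measured * 100u) / expected);
}
```

With the figures quoted here, `slowdown_x100(7900u, 3000u)` gives 263 (about 2.6x) and `slowdown_x100(6u, 1u)` gives 600 (6x), so the free-running measurement and the single-step reading do not agree with each other.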

My question is: 

Is the STM32H5 value line designed so that the peripherals can run with a 250 MHz HCLK but the CPU core cannot?

Or am I not initializing the CPU properly, so that with HCLK at 250 MHz I get code performance similar to a G0 at 48 MHz?

I feel I am missing something…

10 REPLIES

> Also, does it matter that all my test code is written in pure assembler?

Do you know how these CPUs work (and what the optimizer does)?

So maybe the fastest code is the code generated by an optimizing compiler, not code written by hand in asm.

Just try it! And set the optimizer to -O2 or so...
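
One way to try this suggestion is to write the same routine in C, build it at -O2, and time it with the same counter as the assembler version. The actual conversion is not shown in this thread, so the routine below is a purely hypothetical stand-in (widening each byte to 16 bits and scaling it by 16); the name `convert_buffer` and the operation itself are assumptions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the data conversion: widen each 8-bit sample
 * to 16 bits and scale it by 16. `restrict` tells the compiler that the
 * buffers do not overlap, which gives the optimizer more freedom. */
static void convert_buffer(const uint8_t *restrict src,
                           uint16_t *restrict dst,
                           size_t count)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < count; ++i) {
        dst[i] = (uint16_t)((uint16_t)src[i] << 4);
    }
}
```

Built with something like `arm-none-eabi-gcc -O2 -mcpu=cortex-m33 -mthumb`, the generated code can then be compared against the hand-written assembler, both in the listing and in measured cycles.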

By the way, the H5 has an M33 core, so read:

https://developer.arm.com/documentation#numberOfResults=48&q=cortex%20m33%20technical%20reference%20manual

https://developer.arm.com/documentation/100230/0100/Introduction/About-the-processor-architecture?lang=en

So just think: the H563 is described as 250 MHz, 375 MIPS. Evidently it can execute more than one instruction per cycle, and that is what the optimizer does: it arranges the code so that the core is able to do this.

Your asm will reach the same speed only if you have the same "skills".
