STM32H503 code execution performance issue

ArkadiuszRaj · ‎2025-05-27

Hello

For a specific use case I am evaluating the Cortex-M33 (CM33) with HCLK at 250 MHz for maximum code execution performance.

To do so I have written a procedure in assembler and calculated its cycle count using the ARM reference manual. The code only does some data conversion from one data buffer into another.

Then I place this code in SRAM1, enable the I-Cache, set up RCC and try to measure the execution time.

Let's say the cycle count calculated by hand is about 3000.

The rest of the work is based on code generated by CubeMX, into which I inject my bare-metal code.

1) Using DWT->CYCCNT I see no difference between execution from FLASH and from SRAM1. The result is about 7900.
2) Using TIM6, started before my function and stopped after it, so that TIM6->CNT holds the time in 4 ns units (i.e. cycles). Running from SRAM gives the same result.
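
For reference, the two readings above can be put on the same scale by converting counter ticks to time. A minimal sketch in C, assuming a free-running 32-bit up-counter such as DWT->CYCCNT clocked at HCLK; the helper names elapsed_ticks and ticks_to_ns are hypothetical and not part of CMSIS or the HAL:

```c
#include <stdint.h>

/* Elapsed ticks between two reads of a free-running 32-bit up-counter.
 * Unsigned subtraction stays correct across a single wraparound. */
static inline uint32_t elapsed_ticks(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Convert ticks to nanoseconds for a counter clock given in Hz.
 * The 64-bit intermediate avoids overflow in the multiplication. */
static inline uint64_t ticks_to_ns(uint32_t ticks, uint32_t clock_hz)
{
    return ((uint64_t)ticks * 1000000000ull) / clock_hz;
}
```

At 250 MHz, ticks_to_ns(7900, 250000000) gives 31600 ns, against 12000 ns for the hand-calculated 3000 cycles.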

Example of the "issue":

LDR r0, =label

When single-stepping this instruction in the debugger, TIM6->CNT is incremented by 6 units.

Same for

TST r0, #1

A change of 6 CNT units means 6 cycles, i.e. 6x slower than the instruction should execute.

My question is:

Is the STM32H5 value line designed so that the peripherals can run at 250 MHz HCLK but the CPU core cannot?

Or am I not initializing the CPU properly, so that with HCLK at 250 MHz I get code performance similar to a G0 at 48 MHz?

I feel I am missing something…

kadir1 · ‎2025-05-29

I couldn't even get this STM32H503 chip to work. After 10 minutes of operation the reset pin goes low and it doesn't work anymore. We think this product is completely faulty. Were you able to get it to work? Which product are you using exactly?

AScha.3 · ‎2025-05-29

What optimizer setting do you have? (Without it ... no fun. :) )

+

You cannot step through ONE instruction in the debugger to see its timing.

btw

I tried on an H563 at 250 MHz core clock, and it runs with 4 ns cycles, as it should.

I checked just by writing to a pin in a loop and measuring with a scope.
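
A pin-toggle test of this kind might look like the sketch below. It is not the code used in the test above: pulse_pin is a hypothetical helper, and the register pointer is passed in so the same function can also run against a plain variable off-target. It assumes an STM32-style BSRR register, where writing bits 0..15 sets a pin and bits 16..31 resets it.

```c
#include <stdint.h>

/* Pulse one pin `count` times by alternating set and reset writes to a
 * BSRR-style register. `pin` must be in the range 0..15. */
static void pulse_pin(volatile uint32_t *bsrr, unsigned pin, uint32_t count)
{
    const uint32_t set_mask = 1u << pin;            /* bits 0..15: set    */
    const uint32_t reset_mask = 1u << (pin + 16u);  /* bits 16..31: reset */

    for (uint32_t i = 0; i < count; ++i) {
        *bsrr = set_mask;    /* pin high */
        *bsrr = reset_mask;  /* pin low  */
    }
}
```

On target the call would be something like pulse_pin(&GPIOA->BSRR, 5, 1000), assuming the pin is already configured as an output. The period seen on the scope includes loop and bus-write overhead, so it reflects the loop as a whole rather than a single instruction.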

If you feel a post has answered your question, please click "Accept as Solution".

ArkadiuszRaj · ‎2025-05-29

Typical STM32CubeIDE settings.
I tried to run my code from flash and from RAM. I did not manage to place only that function in RAM using RamFunc: the trampoline code added by the linker did not work, the Thumb bit was not set, so I got a HardFault every time. When I put all my code in RAM, it worked.

Yes, I learned that I cannot time a single instruction in the debugger. When I added GPIO toggling and hooked up a scope, the measurement was close to the theoretical calculation.

I get the same result when I store the measurement (DWT or TIM) in a table that I monitor in a live view and run the code without any breakpoints.

AScha.3 · ‎2025-05-29

>Typical STM32CubeIDE settings.

What are these "typical" settings with regard to the optimizer?

Tesla DeLorean · ‎2025-05-29

>>We think this product is completely faulty.

I'd wager it's a problem on your end

Tips, Buy me a coffee, or three.. PayPal Venmo
Up vote any posts that you find helpful, it shows what's working..

kadir1 · ‎2025-05-29

If you know of a problem that will cause the product to burn and break, I am open to any help.

Tesla DeLorean · ‎2025-05-29

Does it fail using just the connections shown?

Anything connected to the other 30 pins?

AScha.3 · ‎2025-05-29

Double post...

Please don't post here, keep to your threads.

https://community.st.com/t5/stm32-mcus-products/stm32h503cbt6-nrst-and-vcap-behavior-severe-stability-issue/td-p/807820

ArkadiuszRaj · ‎2025-06-02

Typical means I have not changed anything.
Also, does this matter if all my test code is written in pure assembler?

I have already learned that I cannot use the debugger to measure the execution time of a single instruction.
With the I-Cache enabled I do not see any noticeable difference between running from FLASH and from SRAM.

The issue with the trampoline code generated by the linker is something for a new topic, as I could not manage to put a single function in RAM.

To sum up the current status:
DWT CYCCNT or TIM CNT gives a function execution time very similar to the one I calculated by hand, once the time added by the debug-mode compilation of the following is taken into account:

TIM6->CNT=0;
my_func();
auto dt= TIM6->CNT;
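
One way to account for the measurement overhead mentioned above is to time an empty call and subtract it from the gross reading. The sketch below only illustrates the idea under stated assumptions: time_call, time_call_net and the counter callback are hypothetical names, and sim_read and sample_work are host-side stand-ins so the code can be run off-target. On the MCU the callback would return DWT->CYCCNT or TIM6->CNT instead.

```c
#include <stdint.h>

typedef uint32_t (*read_counter_fn)(void);
typedef void (*work_fn)(void);

/* Empty workload used to measure the cost of the measurement itself. */
static void empty_work(void) {}

/* Gross ticks spent around fn(), including call and counter-read cost. */
static uint32_t time_call(read_counter_fn read_counter, work_fn fn)
{
    uint32_t start = read_counter();
    fn();
    return read_counter() - start;  /* unsigned math survives one wrap */
}

/* Net ticks: gross time of fn() minus the time of an empty call. */
static uint32_t time_call_net(read_counter_fn read_counter, work_fn fn)
{
    uint32_t overhead = time_call(read_counter, empty_work);
    uint32_t gross = time_call(read_counter, fn);
    return (gross > overhead) ? (gross - overhead) : 0u;
}

/* Host-side stand-ins (assumptions for illustration only): every counter
 * read advances the count by 2, and sample_work adds 10 more ticks. */
static uint32_t sim_count;
static uint32_t sim_read(void) { sim_count += 2u; return sim_count; }
static void sample_work(void) { sim_count += 10u; }
```

With the stand-ins, time_call_net(sim_read, sample_work) returns the 10 ticks added by sample_work. On real hardware an optimizing compiler may inline or remove the empty call, so the overhead figure should be checked against the generated code.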