xchen.3
Associate II
September 26, 2022
Question

STM32H7: why is D-cache faster than DTCM RAM?


I am benchmarking floating-point operations on an STM32H7 and using CYCCNT to measure the performance difference.

I placed my function in ITCM RAM. It is just a function that performs a double-precision multiplication 10,000 times; the 4 variables it uses are declared volatile.

Variables in RAM (D-cache disabled): CYCCNT = 216849

Variables in RAM (D-cache enabled): CYCCNT = 104300

Variables in DTCM RAM (same result with D-cache disabled or enabled): CYCCNT = 115800

My question is: why is DTCM slower than RAM1 with the cache enabled?
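For anyone trying to reproduce this kind of measurement, here is a minimal sketch of a CYCCNT-based benchmark. It is not the original poster's code: the names `cycles_init`, `cycles_now`, `bench_multiply`, and the operand values are made up for illustration. The register addresses are the architecturally defined Cortex-M debug registers; the lock-access write is an assumption that applies to some Cortex-M7 parts. Off target, the sketch falls back to `clock()` so that it still compiles.

```c
#include <stdint.h>

#if defined(__ARM_ARCH_7EM__)
/* Architecturally defined Cortex-M debug registers. */
#define DEMCR      (*(volatile uint32_t *)0xE000EDFCu)
#define DWT_CTRL   (*(volatile uint32_t *)0xE0001000u)
#define DWT_CYCCNT (*(volatile uint32_t *)0xE0001004u)
#define DWT_LAR    (*(volatile uint32_t *)0xE0001FB0u)

void cycles_init(void)
{
    DEMCR |= (1u << 24);     /* TRCENA: enable the DWT unit */
    DWT_LAR = 0xC5ACCE55u;   /* unlock DWT (assumed necessary on this part) */
    DWT_CYCCNT = 0u;         /* reset the cycle counter */
    DWT_CTRL |= 1u;          /* CYCCNTENA: start counting */
}

uint32_t cycles_now(void) { return DWT_CYCCNT; }
#else
/* Host fallback so the sketch builds off target; units are clock() ticks. */
#include <time.h>
void cycles_init(void) { }
uint32_t cycles_now(void) { return (uint32_t)clock(); }
#endif

/* Hypothetical operands; volatile forces a real load/store on every access,
   so the memory the variables live in dominates the measurement. */
volatile double op_a = 1.5, op_b = 2.0, op_c = 0.25, product;

/* Returns the elapsed counter value for `iterations` multiplications. */
uint32_t bench_multiply(uint32_t iterations)
{
    cycles_init();
    uint32_t start = cycles_now();
    for (uint32_t i = 0; i < iterations; ++i) {
        product = op_a * op_b * op_c;
    }
    return cycles_now() - start;   /* unsigned subtraction handles wrap-around */
}
```

Calling `bench_multiply(10000u)` once per memory configuration (variables in AXI SRAM with and without D-cache, then in DTCM) gives numbers comparable to the ones above, provided the function itself and the stack stay in the same place between runs.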


5 replies

Tesla DeLorean
Guru
September 26, 2022

Caching paths tend to be able to service within the current machine cycle, whereas other memory needs at least a cycle.

Tips, Buy me a coffee, or three.. PayPal Venmo (See Profile) Up vote any posts that you find helpful, it shows what's working..
xchen.3 (Author)
Associate II
September 27, 2022

Oh, then I got this wrong. The way I understood it from reading the manuals, TCM memory runs at the same speed as the processor, without any wait states or latency.

Another thing that seems to contradict what you say is AN4891, which tests the performance of this processor in various memory configurations: it uses ITCM and DTCM as the baseline for the other tests.

Tesla DeLorean
Guru
September 27, 2022

Further, TCM isn't cached, so it doesn't pollute or waste the cache, but it also doesn't benefit from the optimized read on a hit.

waclawek.jan
Super User
September 27, 2022

Where's the stack?

JW

xchen.3 (Author)
Associate II
September 27, 2022

If you are referring to ".user_heap_stack" (in the .ld linker file), it is currently in RAM_D1. I will change it to DTCMRAM and check.

But here is a question (I am very new to this): what should I do in main to make the program use DTCM RAM for the heap and stack? For variables, I use an attribute, and the first call in main is memcopy(&sdata,&sidata,&edata-&sdata). Is there something similar to do for the heap and stack?
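A short sketch may help with the question above. As far as the Cortex-M reset sequence goes, the stack needs no copy at all: the core loads the initial stack pointer from the first word of the vector table, so it is enough that the linker script's end-of-stack symbol (typically something like `_estack`) points into DTCM. The heap is likewise uninitialized memory. Only initialized data needs its load image copied from flash. The section name `.dtcm_data`, the macro `IN_DTCM`, the variable `gain`, and the linker symbols in the usage note are hypothetical and would have to match your own linker script.

```c
#include <stdint.h>

#if defined(__ARM_ARCH_7EM__)
/* Hypothetical section name; the linker script must map it to DTCMRAM
   and give it a load address in flash (e.g. with AT> FLASH). */
#define IN_DTCM __attribute__((section(".dtcm_data")))
#else
#define IN_DTCM   /* no-op off target so the sketch still builds */
#endif

/* An initialized variable that should live in DTCM. */
IN_DTCM volatile double gain = 1.5;

/* Copy a section's initial values, word by word, from its load address
   (src, in flash) to its run address (dst .. dst_end, in RAM). */
void copy_section(uint32_t *dst, const uint32_t *src, const uint32_t *dst_end)
{
    while (dst < dst_end) {
        *dst++ = *src++;
    }
}
```

On target, the call would look something like `copy_section(&_sdtcm, &_sidtcm, &_edtcm);` early in startup, where the three symbols are assumed to be exported by the linker script. Passing the end address rather than a length avoids the bytes-versus-words ambiguity of subtracting two symbol addresses.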

xchen.3 (Author)
Associate II
September 27, 2022

After moving the heap and stack, as well as the .bss and .data sections, the Build Analyzer's memory details show that DTCM RAM contains everything; RAM1, RAM2, and RAM3 show zero kilobytes used and nothing inside. The calculation time is the same as before.

mƎALLEm
ST Technical Moderator
September 27, 2022

Dear @xchen.3,

I invite you to refer to the AN4891 "STM32H72x, STM32H73x, and single-core STM32H74x/75x system architecture and performance". Some benchmarks are provided.

SofLit

xchen.3 (Author)
Associate II
September 27, 2022

Thank you. I started there and am trying to reproduce similar results in my own software, so that I can start using these advanced features. However, the projects in this app note were built with the System Workbench IDE, not CubeIDE, so the startup code and generated functions are different. Also, CubeMX has no options for this, and there is little information on how to apply the required changes correctly (placing critical code in ITCM and relevant data in DTCM), for example how to modify the linker script and what should change in the code, such as moving the vector table.
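On the vector-table part of that question, a rough sketch of what relocation involves is below. It is not taken from AN4891 or from CubeIDE-generated code; `vtor_required_alignment` and `relocate_vectors` are illustrative names. The alignment rule (a power of two no smaller than the table size, with a 128-byte minimum) follows from the ARMv7-M VTOR register layout. The register is passed in as a pointer so the logic can also be exercised off target.

```c
#include <stdint.h>

/* Smallest power of two >= the table size in bytes, with the 128-byte
   minimum implied by VTOR.TBLOFF occupying bits [31:7]. */
uint32_t vtor_required_alignment(uint32_t num_vectors)
{
    uint32_t bytes = num_vectors * 4u;   /* one 32-bit word per vector */
    uint32_t align = 128u;
    while (align < bytes) {
        align <<= 1;
    }
    return align;
}

/* Copy the table from src to dst and write dst's address to *vtor.
   Returns 0 on success, -1 if dst is not suitably aligned. */
int relocate_vectors(volatile uint32_t *vtor, uint32_t *dst,
                     const uint32_t *src, uint32_t num_vectors)
{
    uintptr_t addr = (uintptr_t)dst;
    if ((addr & (vtor_required_alignment(num_vectors) - 1u)) != 0u) {
        return -1;
    }
    for (uint32_t i = 0; i < num_vectors; ++i) {
        dst[i] = src[i];
    }
    *vtor = (uint32_t)addr;   /* on target, follow with DSB and ISB barriers */
    return 0;
}
```

On target, `vtor` would be `(volatile uint32_t *)0xE000ED08u` (SCB->VTOR), `src` the table in flash, and `dst` a buffer that the linker script places in ITCM or DTCM with the required alignment. The vector count depends on the device (16 system exceptions plus the part's interrupt count), so check the reference manual rather than taking any number here for granted.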

waclawek.jan
Super User
September 28, 2022

Try reloading the thread, until you see "More answers" under the 10th post.

JW

Piranha
Principal III
September 28, 2022

Guys, you're talking with a bot... ;)

https://stackoverflow.com/questions/59198934/l1-cache-behaviour-of-stm32h7

https://community.st.com/s/question/0D50X0000BmnJAWSQ2/setup-licache-on-stm32h753

On topic... Just a guess, but, looking at the core architecture images, it could be that ITCM and DTCM share some resources or arbitration logic:

http://www.emcu.it/STM32F7/Slide/Core.png