2022-09-26 03:43 AM
I am benchmarking an STM32H7 processor on floating-point operations, using CYCCNT to measure the performance difference.
I placed my function in ITCM-RAM. It is just a function that does a double-precision multiplication 10,000 times; the 4 variables used are declared volatile.
With my variables in RAM (D-cache disabled): CYCCNT = 216849
With my variables in RAM (D-cache enabled): CYCCNT = 104300
With my variables in DTCM-RAM (same result with D-cache disabled or enabled): CYCCNT = 115800
My question is: why is DTCM slower than RAM1 with cache?
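For reference, a measurement like the one described above could look roughly like the sketch below. The post does not show the actual loop, so the loop body, the operand values and the names `cyc_init`, `cyc_now` and `bench_double_mul` are assumptions; the register addresses are the standard Cortex-M DEMCR/DWT locations. Off-target, a dummy counter stands in for CYCCNT so the code still compiles.

```c
#include <stdint.h>

#if defined(__ARM_ARCH_7EM__)
/* Standard Cortex-M debug registers (core addresses, not STM32-specific). */
#define DEMCR      (*(volatile uint32_t *)0xE000EDFCu)
#define DWT_CTRL   (*(volatile uint32_t *)0xE0001000u)
#define DWT_CYCCNT (*(volatile uint32_t *)0xE0001004u)
#define DWT_LAR    (*(volatile uint32_t *)0xE0001FB0u)

static void cyc_init(void)
{
    DEMCR |= (1u << 24);     /* TRCENA: enable the DWT block */
    DWT_LAR = 0xC5ACCE55u;   /* unlock DWT; needed on some Cortex-M7 parts */
    DWT_CYCCNT = 0u;
    DWT_CTRL |= 1u;          /* CYCCNTENA: start the cycle counter */
}

static uint32_t cyc_now(void) { return DWT_CYCCNT; }
#else
/* Host stand-in (assumption): a counter that only lets the sketch build and run. */
static uint32_t fake_cycles;
static void cyc_init(void) { fake_cycles = 0u; }
static uint32_t cyc_now(void) { return ++fake_cycles; }
#endif

/* Four volatile doubles, as in the post; the values are placeholders. */
volatile double bench_a = 1.5, bench_b = 2.0, bench_c, bench_d;

uint32_t bench_double_mul(uint32_t iterations)
{
    cyc_init();
    uint32_t start = cyc_now();
    for (uint32_t i = 0; i < iterations; i++) {
        bench_c = bench_a * bench_b;  /* volatile: operands loaded, result stored */
        bench_d = bench_c * bench_a;
    }
    return cyc_now() - start;         /* unsigned subtraction of two readings */
}
```

Because the variables are volatile, every iteration performs real loads and stores, so the cycle count depends heavily on where the variables (and the stack) are located.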
2022-09-26 07:14 AM
Caching paths tend to be able to service within the current machine cycle, whereas other memory needs at least a cycle.
2022-09-26 11:45 PM
Oh, then I got this wrong. The way I understood it from reading the manuals, TCM memory runs at the same speed as the processor, without any wait states or latency.
Another thing that seems to contradict what you say is AN4891, which tests the performance of this processor in various memory configurations. It uses ITCM and DTCM as the baseline for the other tests.
2022-09-26 11:56 PM
Where's the stack?
JW
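One quick way to answer that is to take the address of a local variable and see which memory region it falls in. The sketch below assumes an STM32H74x/75x-style memory map with DTCM at 0x20000000 and 128 KB in size; check the reference manual for the actual part. The helper names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed DTCM window; verify against the reference manual of your part. */
#define DTCM_BASE 0x20000000u
#define DTCM_SIZE (128u * 1024u)

/* True if the address lies inside the assumed DTCM window. */
bool addr_in_dtcm(uintptr_t addr)
{
    return addr >= DTCM_BASE && addr < (uintptr_t)DTCM_BASE + DTCM_SIZE;
}

/* A local variable lives on the current stack, so its address shows
   which memory the stack is using at the point of the call. */
bool stack_is_in_dtcm(void)
{
    volatile int probe = 0;
    return addr_in_dtcm((uintptr_t)&probe);
}
```

Calling `stack_is_in_dtcm()` from the benchmarked function (or inspecting the SP register in the debugger) shows whether temporaries spilled to the stack go to DTCM or to another RAM.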
2022-09-27 01:00 AM
Zero wait state implies that the memory is single cycle.
I'm saying cache paths are sub-cycle.
TCM is going to have best predictable speed, and is deeper.
2022-09-27 01:12 AM
Further, TCM isn't cached, so it doesn't pollute or waste the cache, but it also doesn't benefit from the optimized read on a hit.
2022-09-27 01:46 AM
If you are referring to ".user_heap_stack" (in the memory .ld file), it is currently in RAM_D1. I will change it to DTCMRAM and check.
But here is a question (I am very new to this): what should I do in main to make the program use DTCM-RAM for the heap and stack? For variables, I use an attribute, and the first function called in main is memcopy(&sdata,&sidata,&edata-&sdata). Is there something similar to do for the heap and stack?
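On Cortex-M the stack itself needs no copy step: the core loads the initial stack pointer from the first word of the vector table at reset, so the stack location is decided by the linker script symbol placed there (commonly `_estack` in ST-generated scripts, which is an assumption about this project). The heap is typically set up from linker symbols as well and needs no copy. What does need work at startup is copying initialized data from flash and zeroing .bss. A minimal sketch of those two loops follows, written as plain functions so they can be tried off-target. The function names are hypothetical; in a real startup the arguments would be linker symbols such as `_sidata`, `_sdata` and `_edata`.

```c
#include <stdint.h>

/* Copy initialized data from its load address (flash image) to its
   run address (RAM), one 32-bit word at a time. */
void copy_init_data(uint32_t *dst, const uint32_t *dst_end, const uint32_t *src)
{
    while (dst < dst_end) {
        *dst++ = *src++;
    }
}

/* Zero a .bss-style region, one 32-bit word at a time. */
void zero_region(uint32_t *dst, const uint32_t *dst_end)
{
    while (dst < dst_end) {
        *dst++ = 0u;
    }
}
```

One thing to watch in a `memcpy`-style call: if the symbols are declared as `uint32_t`, the difference `&edata - &sdata` counts 32-bit words, not bytes, so a byte-count copy would need it multiplied by `sizeof(uint32_t)`.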
2022-09-27 02:47 AM
Dear @xchen.3
I invite you to refer to the AN4891 "STM32H72x, STM32H73x, and single-core STM32H74x/75x system architecture and performance". Some benchmarks are provided.
SofLit
2022-09-27 04:48 AM
Thank you. I started there and am trying to reproduce similar results in my software, in order to start using these advanced features. However, in this app note the projects are built with the System Workbench IDE, not CubeIDE, so the startup code and generated functions are different. Also, CubeMX has no options for this, and there is little information on how to apply the required changes correctly (placing critical code in ITCM and the relevant data in DTCM memory): for example, how to modify the linker script and what should change in the code, such as moving the vector table, etc.
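As a rough illustration of the placement part with GCC (as used by CubeIDE), code and data are usually tagged with a section attribute, and the linker script then maps those sections to ITCM and DTCM. The section names `.itcm_text` and `.dtcm_data` and all identifiers below are hypothetical; the names must match output sections that you add to the .ld file yourself. A function placed in ITCM also has to be copied there from flash at startup, in the same way as .data. The macros expand to nothing off-target so the sketch compiles anywhere.

```c
#include <stdint.h>

#if defined(__ARM_ARCH_7EM__)
/* Hypothetical section names; they must exist in the linker script. */
#define IN_ITCM __attribute__((section(".itcm_text"), noinline))
#define IN_DTCM __attribute__((section(".dtcm_data")))
#else
#define IN_ITCM
#define IN_DTCM
#endif

/* Data intended for DTCM (placeholder values). */
IN_DTCM volatile double dtcm_gain = 2.0;
IN_DTCM volatile double dtcm_sample = 1.5;

/* Time-critical function intended to run from ITCM. */
IN_ITCM double scale_sample(void)
{
    return dtcm_gain * dtcm_sample;
}
```

After building, the map file or the Build Analyzer shows whether the symbols actually ended up in the intended regions.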
2022-09-27 06:23 AM
After moving the heap and stack, and also the .bss and .data sections, the Build Analyzer memory details now show that DTCM-RAM contains everything. RAM1, RAM2 and RAM3 show zero kilobytes used and nothing inside. The calculation time is still the same.