2024-06-21 04:59 AM - edited 2024-06-21 05:06 AM
Hi all,
When running the same piece of code (math operations with random accesses to RAM), we measure shorter execution times on the CM4 of the STM32G474 than on the CM4 core of the STM32H745.
The STM32G474 is clocked at 168 MHz and uses the SRAM mapped at 0x20000000.
The CM4 of the STM32H745 is clocked at 200 MHz (and its CM7 at 400 MHz) and uses AHB SRAM1 mapped at address 0x30000000.
How is it possible that the slower-clocked CM4 core of the STM32G474 performs better? Is there an explanation?
Thanks,
Marco Accame, Ph.D.
iCub Tech Facility, Istituto Italiano di Tecnologia
CRIS, via S.Quirico 19D, 16163 Genoa Italy
e-mail: marco.accame@iit.it
2024-07-16 06:14 AM
Hi @marcoaccame ,
This is simply because, on the Cortex-M4, the region from 0x00000000 to 0x1FFFFFFF is optimized for code execution and the region from 0x20000000 to 0x3FFFFFFF is optimized for data.
That is why, on STM32F4 products, some memories can be remapped to address 0x00000000 to improve execution performance.
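To make the split concrete, here is a minimal sketch in C that classifies an address by the top-level region of the Cortex-M4 default memory map. The helper name `cortex_m_region` and the enum are hypothetical, introduced only for illustration; they are not part of any ST or CMSIS API.

```c
#include <stdint.h>

/* Top-level regions of the ARMv7-M (Cortex-M4) default memory map. */
typedef enum {
    REGION_CODE,        /* 0x00000000-0x1FFFFFFF: served by the I-Code/D-Code buses */
    REGION_SRAM,        /* 0x20000000-0x3FFFFFFF: served by the System bus */
    REGION_PERIPHERAL,  /* 0x40000000-0x5FFFFFFF */
    REGION_OTHER        /* external RAM/device and system space */
} cortex_m_region_t;

/* Hypothetical helper: each region is a 512 MB block (or a group of such
   blocks), so the top three address bits are enough to select it. */
static cortex_m_region_t cortex_m_region(uint32_t addr)
{
    switch (addr >> 29) {
    case 0u: return REGION_CODE;
    case 1u: return REGION_SRAM;
    case 2u: return REGION_PERIPHERAL;
    default: return REGION_OTHER;
    }
}
```

With this helper, 0x20000000 (the STM32G474 SRAM) and 0x30000000 (the STM32H745 AHB SRAM1) both fall in the SRAM region, while 0x08000000 (flash) and 0x10000000 (the STM32G4 CCM SRAM) fall in the Code region.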
2024-07-16 09:21 AM - edited 2024-07-16 01:40 PM
Just to show you the impact of non-optimized memory locations (for instructions and data) on the STM32G4:
BubbleSort algorithm:
FPU algorithm inspired by your tests:
If SRAM1, which is not optimized for instruction fetch, is used for code, and CCMRAM, which is not optimized for data fetch, is used for data, performance decreases notably.
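For reference, the favourable placement (code in CCMRAM, data in SRAM1) could be requested with a GCC section attribute, roughly as sketched below. This is only an illustration under stated assumptions: the section name `.ccmram` and the macro `IN_CCMRAM` are assumptions, the linker script must actually define that section in CCM SRAM, and the startup code must copy it from flash before `main()`. It is not the code used for the measurements above.

```c
#include <stdint.h>

/* Assumption: the linker script defines an output section named ".ccmram"
   located in CCM SRAM, and the startup code copies it from flash at boot. */
#if defined(__GNUC__) && defined(__arm__)
#define IN_CCMRAM __attribute__((section(".ccmram")))
#else
#define IN_CCMRAM /* no-op when built for a host, for testing only */
#endif

/* Bubble sort whose instructions are placed in CCM SRAM on the target.
   The array it sorts is wherever the caller puts it, typically SRAM1. */
IN_CCMRAM void bubble_sort(uint32_t *v, uint32_t n)
{
    for (uint32_t pass = 0u; pass + 1u < n; pass++) {
        /* After each pass the largest remaining element is at the end. */
        for (uint32_t i = 0u; i + 1u < n - pass; i++) {
            if (v[i] > v[i + 1u]) {
                uint32_t tmp = v[i];
                v[i] = v[i + 1u];
                v[i + 1u] = tmp;
            }
        }
    }
}
```

Swapping the placement (function in the default `.text` copied to SRAM1, array in CCMRAM) would reproduce the unfavourable case described above.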