
Performance of CM4 core of STM32H745 vs the CM4 of STM32G474

marcoaccame
Associate II

Hi all,

when running the same piece of code (math operations with random access to RAM) on the CM4 of the STM32G474 we measure smaller execution times than on the CM4 core of the STM32H745.

The STM32G474 is clocked at 168 MHz and uses the SRAM mapped at 0x20000000.

The CM4 of the STM32H745 is clocked at 200 MHz (and its CM7 at 400 MHz) and uses AHB SRAM1 mapped at address 0x30000000.

 

How is it possible that the slower clocked CM4 core of the STM32G474 is more performant? Any explanations?
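One way to frame the question: execution time is cycles divided by clock frequency, so the 200 MHz core only wins if it does not need proportionally more cycles (for example because of extra wait states on memory accesses). The sketch below is a host-side illustration only, not our benchmark. The function names are made up for this example, and on the target the cycle counts would typically come from the DWT cycle counter.

```c
#include <stdint.h>

/* Convert a measured cycle count into microseconds for a core
 * running at clock_hz. Hypothetical helper, for illustration only. */
static double cycles_to_us(uint64_t cycles, uint32_t clock_hz)
{
    return (double)cycles * 1e6 / (double)clock_hz;
}

/* Ratio of cycles the faster-clocked core may spend, relative to the
 * slower-clocked one, before it stops being faster in wall-clock time. */
static double breakeven_cycle_ratio(uint32_t fast_hz, uint32_t slow_hz)
{
    return (double)fast_hz / (double)slow_hz;
}
```

For example, `breakeven_cycle_ratio(200000000u, 168000000u)` is about 1.19, so the CM4 of the H7 ends up slower as soon as the same code costs it roughly 19% more cycles than on the G4.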

 

Thanks,

 

Marco Accame, Ph.D.

iCub Tech Facility, Istituto Italiano di Tecnologia

CRIS, via S.Quirico 19D, 16163 Genoa Italy
e-mail:
 marco.accame@iit.it  

 

 

1 ACCEPTED SOLUTION

Accepted Solutions

Hi STOne-32,

I re-ran the tests with the load regions as you suggested (see the following scatter file),

marcoaccame_0-1721036406280.png

and things run better. It is the right path to follow.

 

Now one test takes the same time on both G4 and H7. The other two tests still run slower on H7, but the extra time has dropped to 1.5% and 5%, which can be acceptable.

I have updated our repository with your contribution: https://github.com/icub-tech-iit/study-cm4-performances/blob/master/docs/improvements.md

 

Thanks, Marco.

 


21 REPLIES
SofLit
ST Employee

Hello,

I have some questions:

1- are you sure you have similar code optimization for G4 as for H7?

2- Are you sure you are not using math library instead of FPU for CM4/H7?

3- When you said "The CM4 of the STM32H745 is clocked at 200 MHz (and its CM7 at 400 MHz) and uses AHB SRAM1 mapped at address 0x30000000." Do you mean you're executing the code of CM4 from SRAM1? if yes I recommend you to use the address  0x1000 0000 for code execution.

4- Did you enable the ART for CM4/H7?

SofLit_0-1718985983386.png

SofLit_1-1718986053013.png

See stm32h7xx_hal.c:

 

HAL_StatusTypeDef HAL_Init(void)
{
  uint32_t common_system_clock;

#if defined(DUAL_CORE) && defined(CORE_CM4)
  /* Configure Cortex-M4 Instruction cache through ART accelerator */
  __HAL_RCC_ART_CLK_ENABLE();                   /* Enable the Cortex-M4 ART Clock */
  __HAL_ART_CONFIG_BASE_ADDRESS(0x08100000UL);  /* Configure the Cortex-M4 ART Base address to the Flash Bank 2 */
  __HAL_ART_ENABLE();                           /* Enable the Cortex-M4 ART */
#endif /* DUAL_CORE && CORE_CM4 */

  /* ... (rest of HAL_Init() not shown here) */
}

 

 

To give better visibility on the answered topics, please click on "Accept as Solution" on the reply which solved your issue or answered your question.
STOne-32
ST Employee

Dear @marcoaccame ,

 

We assume that on both devices the code is executing out of flash? And how is it mapped?
You can try a test using only RAM areas for code and data, placed on two different buses and SRAMs.

https://www.st.com/resource/en/application_note/an5557-stm32h745755-and-stm32h747757-lines-dualcore-architecture-stmicroelectronics.pdf

Cheers,

STOne-32

Hi SofLit,

thanks for your suggestions.

We have already tried everything you suggest and the performance is still in favor of the G4.

 

Here is a full explanation of what we did in relation to your suggestions.

 


> 1- are you sure you have similar code optimization for G4 as for H7?
> 2- Are you sure you are not using math library instead of FPU for CM4/H7?

We used armclang v6.19 with the same optimization mode on both targets (currently -Obalanced) and we have enabled the FPU, as shown in the following pictures.

 

 

 

marcoaccame_1-1719320752643.png

marcoaccame_2-1719320781492.png

FPU option for the CM4 of the STM32H745 and for the STM32G474 (below)

 

marcoaccame_4-1719322448200.png

 

marcoaccame_6-1719322473622.png

 

Optimization option for the CM4 of the STM32H745 and for the STM32G474 (below)

 

 

> 3- When you said "The CM4 of the STM32H745 is clocked at 200 MHz (and its CM7 at 400 MHz) and uses AHB SRAM1 mapped at address 0x30000000." Do you mean you're executing the code of CM4 from SRAM1? if yes I recommend you to use the address 0x1000 0000 for code execution.

For the CM4 of the STM32H745 we execute code from FLASH and use AHB SRAM1 for data (it is mapped at address 0x3000 0000 and aliased at 0x1000 0000). I believe that using 0x3000 0000 or 0x1000 0000 is the same, as they are aliases of the same bank.
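To make the aliasing concrete, here is a minimal sketch of the address translation between the two views of AHB SRAM1. The two base addresses are the ones quoted above. The 128 KB size and the helper names are assumptions for illustration and should be checked against the reference manual. The sketch only shows that both addresses name the same memory; it says nothing about which bus each access uses or how long it takes.

```c
#include <stdbool.h>
#include <stdint.h>

#define AHB_SRAM1_BASE   0x30000000u  /* native mapping, as stated in the post */
#define AHB_SRAM1_ALIAS  0x10000000u  /* alias, as stated in the post */
#define AHB_SRAM1_SIZE   0x00020000u  /* ASSUMPTION: 128 KB, verify in the reference manual */

/* True if addr lies inside the native 0x3000 0000 window of AHB SRAM1. */
static bool in_sram1_native(uint32_t addr)
{
    return addr >= AHB_SRAM1_BASE && (addr - AHB_SRAM1_BASE) < AHB_SRAM1_SIZE;
}

/* Translate a native AHB SRAM1 address to its 0x1000 0000 alias.
 * The caller is expected to check in_sram1_native() first. */
static uint32_t sram1_to_alias(uint32_t addr)
{
    return (addr - AHB_SRAM1_BASE) + AHB_SRAM1_ALIAS;
}
```

With these definitions, `sram1_to_alias(0x30001234u)` gives `0x10001234`, i.e. the same byte of SRAM seen through the other address window.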

marcoaccame_7-1719322514811.png

 

marcoaccame_8-1719322535069.png

 

Scatter file for the CM4 of the STM32H745 and for the STM32G474 (below)

We have also tested executing the code from SRAM, following this ST example, but there was no improvement: https://github.com/STMicroelectronics/STM32CubeH7/tree/master/Projects/STM32H745I-DISCO/Templates/BootCM7_CM4Gated_RAM


> 4- Did you enable the ART for CM4/H7?

Yes, the code in our `HAL_Init()` is exactly the same as yours.

 

 

 

Hi STOne-32,

thanks for your suggestion.

 

We run the code from FLASH with the scatter files shown in my previous post.

 

But we also tried to run it from SRAM as suggested by ST in this template example (https://github.com/STMicroelectronics/STM32CubeH7/tree/master/Projects/STM32H745I-DISCO/Templates/BootCM7_CM4Gated_RAM) and nothing changes.

 

 

 

 

SofLit
ST Employee

Hello @marcoaccame and thank you for the sharing.

At this stage I don't see any issue with the information you shared. Meanwhile, would you mind sharing minimal projects (H7 and G4) that reproduce the behavior (not your complete project)? Also, could you tell us which boards you are using?


Hi SofLit,

sure we can share a comparative test.

It will however take some time because we now have the code running on custom boards that we developed for motor control on our humanoid robots and that we cannot share.

 

So, we need to adapt the code to run on the two closest ST development boards with the same MCUs:

https://www.st.com/en/evaluation-tools/stm32h745i-disco.html

https://www.st.com/en/evaluation-tools/stm32g474e-eval.html

 

I will come back next week to tell when we can allocate time for this activity and when we can share the comparative test.

 

 

 

Uwe Bonnes
Principal III

Are the CM4 core  revisions the same?

Hi Uwe,

I will ask our HW design team.

I checked stm32h745xx.h and stm32g474xx.h and both have:

#define __CM4_REV 0x0001 /*!< Cortex-M4 revision r0p1 */

So no change in core design.
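The `__CM4_REV` define is only a compile-time constant in the device header. As a cross-check, the revision could also be read from the core at run time through the CPUID register (`SCB->CPUID` in CMSIS). Below is a minimal sketch of how that value decodes. The struct and function names are made up for this example, and I have not run it on either board.

```c
#include <stdint.h>

/* Fields of the Cortex-M CPUID register. */
typedef struct {
    uint8_t  implementer;  /* bits [31:24], 0x41 for Arm */
    uint8_t  variant;      /* bits [23:20], the "r" number */
    uint16_t partno;       /* bits [15:4],  0xC24 for Cortex-M4 */
    uint8_t  revision;     /* bits [3:0],   the "p" number */
} cpuid_fields_t;

/* Split a raw CPUID value into its fields. On target the argument
 * would be SCB->CPUID; here it is passed in so the code runs on a host. */
static cpuid_fields_t decode_cpuid(uint32_t cpuid)
{
    cpuid_fields_t f;
    f.implementer = (uint8_t)((cpuid >> 24) & 0xFFu);
    f.variant     = (uint8_t)((cpuid >> 20) & 0xFu);
    f.partno      = (uint16_t)((cpuid >> 4) & 0xFFFu);
    f.revision    = (uint8_t)(cpuid & 0xFu);
    return f;
}
```

A Cortex-M4 r0p1 reports `0x410FC241`, which decodes to variant 0 and revision 1. If both chips return that value, the cores are the same revision, in line with the headers.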