G4 vs H5 speed

JimJW
Associate II

Hi,
We are running a crypto key generation library on both an STM32G474 (170MHz) and an STM32H563 (240MHz).
Over about 80 runs, the average generation time is around 12 seconds for the H5 and 38 seconds for the G4.

These times are fine for our application (although faster would be better), but I am curious why the G4 is over 3 times slower. I would have expected generation times more like 40% longer.

The arithmetic part of the library uses some assembler functions for long multiplications and, for both processors, these seem to have picked a section for "Armv6-M (or later) with DSP Instruction Set Extensions."
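As a rough way to confirm which multiplication path a build selects, the sketch below reports the path from the ACLE predefined macros `__ARM_ARCH` and `__ARM_FEATURE_DSP`. The path names and the selection condition are assumptions modelled on the comment quoted above, not the library's actual logic, and `mul_path_name` and `require_mul_path` are hypothetical helpers.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: names the code path that a condition like
 * "Armv6-M (or later) with DSP extensions" would select.
 * The macros can be inspected on the toolchain itself with:
 *   arm-none-eabi-gcc -mcpu=cortex-m4 -dM -E - < /dev/null | grep ARM_FEATURE_DSP
 */
const char *mul_path_name(void)
{
#if defined(__ARM_ARCH) && (__ARM_ARCH >= 6) && \
    defined(__ARM_FEATURE_DSP) && (__ARM_FEATURE_DSP == 1)
    return "arm-dsp";      /* expected for both Cortex-M4 and Cortex-M33 with DSP */
#elif defined(__ARM_ARCH)
    return "arm-generic";  /* Arm target without the DSP extension */
#else
    return "portable-c";   /* non-Arm host build */
#endif
}

/* Start-up sanity check: aborts if the build did not select the expected path. */
void require_mul_path(const char *expected)
{
    assert(strcmp(mul_path_name(), expected) == 0);
}
```

If both targets report the same path, as described above, the assembler routines themselves are unlikely to explain the difference, which points towards memory and cache behaviour instead.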

The compiler flags being used are listed below. Some sections are repeated because I still haven't figured out how CMake generates them, but I believe the relevant differences are:

-mcpu=cortex-m4 or -mcpu=cortex-m33
-mfpu=fpv4-sp-d16 or -mfpu=fpv5-sp-d16

compile C with /usr/bin/arm-none-eabi-gcc

__VERSION__ "14.2.1 20241119"

G4 Flags:
C_FLAGS = -mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16 -mthumb -Wextra -Wpedantic -fdata-sections -ffunction-sections -O3 -g0 -mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16 -mthumb -Wextra -Wpedantic -fdata-sections -ffunction-sections -O3 -g0 -O3 -DNDEBUG -std=c99 -Wall -Wextra -Wwrite-strings -Wmissing-prototypes -Wformat=2 -Wno-format-nonliteral -Wvla -Wlogical-op -Wshadow -Wformat-signedness -Wformat-overflow=2 -Wformat-truncation -O2 -Wmissing-declarations


H5 Flags:
C_FLAGS = -mcpu=cortex-m33 -mfloat-abi=hard -mfpu=fpv5-sp-d16 -mthumb -Wextra -Wpedantic -fdata-sections -ffunction-sections -O3 -g0 -mcpu=cortex-m33 -mfloat-abi=hard -mfpu=fpv5-sp-d16 -mthumb -Wextra -Wpedantic -fdata-sections -ffunction-sections -O3 -g0 -O3 -DNDEBUG -std=c99 -Wall -Wextra -Wwrite-strings -Wmissing-prototypes -Wformat=2 -Wno-format-nonliteral -Wvla -Wlogical-op -Wshadow -Wformat-signedness -Wformat-overflow=2 -Wformat-truncation -O2 -Wmissing-declarations

Any thoughts would be welcome.
Thanks

1 ACCEPTED SOLUTION

Accepted Solutions

Try to execute the critical code in CCM RAM:

mALLEm_0-1763459987656.png

 

To give better visibility on the answered topics, please click on "Accept as Solution" on the reply which solved your issue or answered your question.


7 REPLIES 7
mƎALLEm
ST Employee

Hello @JimJW and welcome to the ST community,

Did you enable the cache for H5?

JimJW
Associate II

Hi

Yes, we have enabled the ICACHE in the H5 startup.
Thanks

I noticed that I misunderstood your question. I thought the H5 was 3 times slower than the G4!

So, you are comparing two incomparable situations:

The STM32G474 is running at 170 MHz while the STM32H563 is running at 240 MHz.

The STM32H5 has a cache; the G4 does not.

It would be better to test with comparable settings: the same system frequency, the cache disabled, and so on.

 

JimJW
Associate II

Hi,
Yes, we expect the G4 to be slower than the H5 but the difference is a bit more than I expected. We can use both as they are but I was just curious to explain a factor of 3 difference.
The datasheets quote DMIPS values of 213 and 375 (which I assume factor in the clock speed and any other relevant characteristics), so I was expecting the G4 to be about 1.7 times slower.
As I say, it is not really a problem but if we are missing some setting that could make it faster, we would happily take it!

Thanks
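The expectations discussed above can be checked with a little arithmetic. The sketch below uses only the figures quoted in this thread (170 and 240 MHz, 213 and 375 DMIPS, 38 and 12 seconds); the function names are illustrative.

```c
#include <assert.h>

/* Ratio of two positive quantities (clock rates, DMIPS or run times). */
static double ratio(double numerator, double denominator)
{
    assert(denominator > 0.0);
    return numerator / denominator;
}

/* Slowdown predicted from the clock rates alone: 240 / 170, about 1.41. */
double slowdown_from_clock(void)   { return ratio(240.0, 170.0); }

/* Slowdown predicted from the datasheet DMIPS: 375 / 213, about 1.76. */
double slowdown_from_dmips(void)   { return ratio(375.0, 213.0); }

/* Slowdown actually measured: 38 s / 12 s, about 3.17. */
double slowdown_measured(void)     { return ratio(38.0, 12.0); }

/* Part of the measured slowdown that DMIPS does not account for, about 1.80. */
double slowdown_unexplained(void)
{
    return slowdown_measured() / slowdown_from_dmips();
}
```

On these numbers, a factor of roughly 1.8 remains after the DMIPS difference is taken out, so something beyond raw core throughput is contributing.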

ART Accelerator enabled on G4?


Thanks,
The ART accelerator was off, and while looking at that we found that our G4 is a category 3 device and does have an ICACHE, which should be on by default but which we had managed to turn off!

After enabling those (setting PRFTEN and ICEN in the FLASH_ACR), our average key generation time is now down from 38.5 to 34 seconds - a useful improvement.

The G4 is now 2.8 times slower than the H5. That is perfectly OK for us, but if there are any other improvements possible, we'll certainly take them :)

Thanks
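For anyone wanting to reproduce the PRFTEN and ICEN change, here is a minimal sketch. It assumes the usual STM32G4 layout of FLASH_ACR (PRFTEN at bit 8, ICEN at bit 9), which should be checked against the reference manual or the device header. The macro and function names are made up for this example, and the register is passed as a pointer so the code can also be exercised off-target.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed FLASH_ACR bit positions for STM32G4; verify against the device header. */
#define EX_FLASH_ACR_PRFTEN (UINT32_C(1) << 8)  /* prefetch enable */
#define EX_FLASH_ACR_ICEN   (UINT32_C(1) << 9)  /* instruction cache enable */

/* Set PRFTEN and ICEN while leaving the other bits (e.g. LATENCY) untouched. */
void flash_accel_enable(volatile uint32_t *acr)
{
    assert(acr != 0);
    *acr |= EX_FLASH_ACR_PRFTEN | EX_FLASH_ACR_ICEN;
}

/* Returns 1 if both prefetch and the instruction cache are enabled, else 0. */
int flash_accel_is_enabled(const volatile uint32_t *acr)
{
    const uint32_t mask = EX_FLASH_ACR_PRFTEN | EX_FLASH_ACR_ICEN;
    assert(acr != 0);
    return (*acr & mask) == mask;
}
```

On the target, the call would presumably be `flash_accel_enable(&FLASH->ACR);` with `FLASH` coming from the CMSIS device header. Checking `flash_accel_is_enabled()` just before the key generation starts would catch a later piece of code switching the cache off again.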

Try to execute the critical code in CCM RAM:

mALLEm_0-1763459987656.png
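Since the screenshot is not reproduced here, the following is only a sketch of how a hot C function might be placed in CCM SRAM with GCC. The section name `.ccmram` is an assumption: it has to match a section in the linker script that is located in CCM SRAM, and the startup code has to copy that section from flash before the function is called. `mul_add_limbs` is a hypothetical stand-in for the critical code, which the thread does not identify.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Place code in CCM SRAM on Arm GCC builds; expands to nothing elsewhere.
 * ".ccmram" is an assumed section name that must exist in the linker script. */
#if defined(__arm__) && defined(__GNUC__)
#define CCM_CODE __attribute__((section(".ccmram")))
#else
#define CCM_CODE
#endif

/* Hypothetical hot routine: acc[0..n-1] += a[0..n-1] * b, returns the carry out. */
CCM_CODE uint32_t mul_add_limbs(uint32_t *acc, const uint32_t *a, size_t n, uint32_t b)
{
    uint32_t carry = 0;
    assert(acc != NULL && a != NULL);
    for (size_t i = 0; i < n; i++) {
        /* 32x32 product plus two 32-bit addends always fits in 64 bits. */
        uint64_t t = (uint64_t)a[i] * b + acc[i] + carry;
        acc[i] = (uint32_t)t;
        carry = (uint32_t)(t >> 32);
    }
    return carry;
}
```

If the critical routines are written in assembler, as the long multiplications in the original question are, the equivalent step would be a section directive in the assembler source rather than a C attribute.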

 
