
STM32H745 Core Clock Rate

ACohe.3
Associate II

Hey,
I am developing a project in which the M7 core handles CAN and USART communication, constantly talking to external components, while the M4 core handles Ethernet communication.

I wanted to check the rate of the main loop, so I defined two variables of type double and perform the following operation in every main-loop iteration:

[screenshot: main-loop counter code]

I watch b7 in STM32CubeMonitor, which shows how long 100K main-loop iterations take, i.e., how many iterations run per second.

Clock Configuration:

[screenshot: CubeMX clock configuration]


I did the test for the Core-M7 and the Core-M4, and these are the results:

Core-M7:

About 1.666M loops per second.

[screenshot: Core-M7 measurement]

 

Core-M4:

About 60K loops per second.

[screenshot: Core-M4 measurement]

 

The results are very far apart (I expected roughly a 2x ratio between them).
My first assumption was that the Ethernet communication was causing the slowdown, so I disabled it, but I still got almost identical results.

I then created a new project (no code except the test) and ran the same test on each core; here are the results:

Core-M7:

About 500K loops per second, a 3x slower loop rate!!

[screenshot: Core-M7 measurement, empty project]

Core-M4:

About 75K loops per second.

[screenshot: Core-M4 measurement, empty project]

 

My questions:
1. The results don't make sense: the M7's clock rate is twice the M4's, so why aren't the results roughly 2x apart? Am I doing something wrong?

2. How can code with two active communication peripherals (on the M7) run 3 times faster than an almost empty project (also on the M7)?

3. How can the M4 be so slow (compared to the M7)?

 

Thanks in advance to anyone who can help!!

 

1 ACCEPTED SOLUTION

Accepted Solutions
TDK
Guru

I suspect there are two main reasons:

  1. The M7 core is much more complicated and has a lot more internal optimization/caching happening. It is not a simple "X operation always takes Y cycles" processor like the M4 core (typically) is.
  2. You're dealing with doubles and calling a fairly complex (sin) function. The FPU on the M4 core is single precision and does not support doubles. It's unclear what sort of code it's generating here. Maybe it's using the float version, or maybe it's emulating doubles in software, which slows things down quite a bit.
If you feel a post has answered your question, please click "Accept as Solution".


3 REPLIES

Thank you very much, you helped me!

If my code makes heavy use of sin/cos/tan, is there a way to optimize these operations? Maybe a library that supports this, or some setting?

Not really. Use data types which are natively supported by the FPU. This means using floats on the CM4. There's no magic "make it faster" checkbox to hit here.
