2025-08-18 10:19 AM
Dear all,
I am facing very poor performance with the STM32N657.
I have some benchmarks that manipulate arrays of data in different ways.
I ran these benchmarks on two boards:
Nucleo_H753 @ 480 MHz, with caches ON
Nucleo_N657 @ 600 MHz, with caches ON
For the STM32N657 I measured the CPU and AXI clocks via MCO2:
fCPU = 600 MHz
fAXI = 400 MHz
The compiler used is gcc-15.2.0
Bench for the Nucleo_H753
-------------------------
uKOS-X > bench
System bench.
Bench 00: Fill a small 2D array (50 x 50) elements in
the internal memory. Then, compute the
X-Y projections and the histogram.
Fill the array t = 17 [us]
X projection t = 41 [us]
Y projection t = 18 [us]
Histogram t = 30 [us]
Bench 01: Fill a small 2D array (200 x 200) elements
in the internal memory. Then, compute the
X-Y projections and the histogram.
Fill the array t = 171 [us]
X projection t = 672 [us]
Y projection t = 288 [us]
Histogram t = 451 [us]
Bench 02: Fill a small 1D array (1000) elements in
the internal memory with a random pattern.
Then, compute the min / max values.
Number of tests n = 1000 [-]
Min / Max t = 1110 [us]
Bench 03: Fill a big 1D array (50000) elements in
the internal memory with a random pattern.
Then, compute the min / max values.
Number of tests n = 100 [-]
Min / Max t = 107 [us]
Bench 04: Compute the integer atan2 using the CORDIC
algorithm
Number of tests n = 1000 [-]
1000 x atan2(y, x) t = 1088 [us]
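For readers who want to reproduce Bench 00 / Bench 01 on their own board, here is a minimal sketch of what the bench description suggests. The actual uKOS-X bench source is not shown in this thread, so everything here is an assumption: 8-bit elements, row-major layout, "X projection" meaning column sums, "Y projection" meaning row sums, and the function names `fill_array`, `project_onto_x`, `project_onto_y` and `histogram`, which are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill a rows x cols array (row-major) with a simple deterministic pattern.
 * Assumption: the real bench may use a different pattern and element type. */
void fill_array(uint8_t *img, size_t rows, size_t cols)
{
    for (size_t r = 0; r < rows; r++) {
        for (size_t c = 0; c < cols; c++) {
            img[r * cols + c] = (uint8_t)((r + c) & 0xFFu);
        }
    }
}

/* Projection onto the X axis: proj_x[c] is the sum of column c. */
void project_onto_x(const uint8_t *img, size_t rows, size_t cols,
                    uint32_t *proj_x)
{
    memset(proj_x, 0, cols * sizeof proj_x[0]);
    for (size_t r = 0; r < rows; r++) {
        for (size_t c = 0; c < cols; c++) {
            proj_x[c] += img[r * cols + c];
        }
    }
}

/* Projection onto the Y axis: proj_y[r] is the sum of row r. */
void project_onto_y(const uint8_t *img, size_t rows, size_t cols,
                    uint32_t *proj_y)
{
    for (size_t r = 0; r < rows; r++) {
        uint32_t sum = 0;
        for (size_t c = 0; c < cols; c++) {
            sum += img[r * cols + c];
        }
        proj_y[r] = sum;
    }
}

/* 256-bin histogram over n elements. */
void histogram(const uint8_t *img, size_t n, uint32_t hist[256])
{
    memset(hist, 0, 256 * sizeof hist[0]);
    for (size_t i = 0; i < n; i++) {
        hist[img[i]]++;
    }
}
```

Calling these with `rows = cols = 50` (or 200) on a statically allocated buffer in internal RAM, and timing each call separately, would give four figures comparable in kind to the ones in the log, though not necessarily in value.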
Bench for the Nucleo_N657
-------------------------
uKOS-X > bench
System bench.
Bench 00: Fill a small 2D array (50 x 50) elements in
the internal memory. Then, compute the
X-Y projections and the histogram.
Fill the array t = 29 [us]
X projection t = 173 [us]
Y projection t = 167 [us]
Histogram t = 330 [us]
Bench 01: Fill a small 2D array (200 x 200) elements
in the internal memory. Then, compute the
X-Y projections and the histogram.
Fill the array t = 400 [us]
X projection t = 2766 [us]
Y projection t = 2640 [us]
Histogram t = 5369 [us]
Bench 02: Fill a small 1D array (1000) elements in
the internal memory with a random pattern.
Then, compute the min / max values.
Number of tests n = 1000 [-]
Min / Max t = 3127 [us]
Bench 03: Fill a big 1D array (50000) elements in
the internal memory with a random pattern.
Then, compute the min / max values.
Number of tests n = 100 [-]
Min / Max t = 323 [us]
Bench 04: Compute the integer atan2 using the CORDIC
algorithm
Number of tests n = 1000 [-]
1000 x atan2(y, x) t = 2726 [us]
As shown, the N6 is roughly 2 to 12 times slower than the H753 despite its higher clock, which is not acceptable!
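To compare the two boards independently of their clock, the measured times can be converted into CPU cycles per array element. This is only a rough sketch: it assumes each element is visited once per pass and that the reported time covers nothing but that pass. The helper name `cycles_per_element` is hypothetical.

```c
#include <stdint.h>

/* Convert a measured time into CPU cycles per array element.
 *   t_us       : time from the bench log, in microseconds
 *   f_mhz      : CPU clock in MHz (480 for the H753, 600 for the N657)
 *   n_elements : elements touched in one pass (200 * 200 for Bench 01)
 * Microseconds times MHz gives cycles, because 1 MHz is one cycle per us. */
double cycles_per_element(double t_us, double f_mhz, uint32_t n_elements)
{
    return (t_us * f_mhz) / (double)n_elements;
}
```

With the Bench 01 X projection figures, `cycles_per_element(672, 480, 40000)` gives about 8 cycles per element on the H753 and `cycles_per_element(2766, 600, 40000)` gives about 41 on the N657.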
The clocks measured on MCO2 match the PLL settings, but some other factor
(which I have not identified) may be affecting code execution.
Any idea how to get better results on the N6?
Kind regards,
Edo
2025-08-18 11:03 AM
Hi,
What optimizer setting did you use?
Try -O2, recompile, then check again.
2025-08-18 11:21 AM
Hi AScha.3,
Thank you for the suggestion.
Both targets use the same gcc settings, and the optimisation level is -Os. Below are the N6 results for -O2 and -O3. Even with these optimisation levels we are still very far from the -Os results of the H753. Most likely something is wrong with the hardware configuration, but I can only measure the PLL clocks!
Here are the new results:
-O2
---
uKOS-X > bench
System bench.
Bench 00: Fill a small 2D array (50 x 50) elements in
the internal memory. Then, compute the
X-Y projections and the histogram.
Fill the array t = 24 [us]
X projection t = 169 [us]
Y projection t = 172 [us]
Histogram t = 331 [us]
Bench 01: Fill a small 2D array (200 x 200) elements
in the internal memory. Then, compute the
X-Y projections and the histogram.
Fill the array t = 334 [us]
X projection t = 2752 [us]
Y projection t = 2632 [us]
Histogram t = 5349 [us]
Bench 02: Fill a small 1D array (1000) elements in
the internal memory with a random pattern.
Then, compute the min / max values.
Number of tests n = 1000 [-]
Min / Max t = 2963 [us]
Bench 03: Fill a big 1D array (50000) elements in
the internal memory with a random pattern.
Then, compute the min / max values.
Number of tests n = 100 [-]
Min / Max t = 307 [us]
Bench 04: Compute the integer atan2 using the CORDIC
algorithm
Number of tests n = 1000 [-]
1000 x atan2(y, x) t = 2747 [us]
-O3
---
uKOS-X > bench
System bench.
Bench 00: Fill a small 2D array (50 x 50) elements in
the internal memory. Then, compute the
X-Y projections and the histogram.
Fill the array t = 24 [us]
X projection t = 169 [us]
Y projection t = 171 [us]
Histogram t = 330 [us]
Bench 01: Fill a small 2D array (200 x 200) elements
in the internal memory. Then, compute the
X-Y projections and the histogram.
Fill the array t = 334 [us]
X projection t = 2752 [us]
Y projection t = 2633 [us]
Histogram t = 5348 [us]
Bench 02: Fill a small 1D array (1000) elements in
the internal memory with a random pattern.
Then, compute the min / max values.
Number of tests n = 1000 [-]
Min / Max t = 2962 [us]
Bench 03: Fill a big 1D array (50000) elements in
the internal memory with a random pattern.
Then, compute the min / max values.
Number of tests n = 100 [-]
Min / Max t = 292 [us]
Bench 04: Compute the integer atan2 using the CORDIC
algorithm
Number of tests n = 1000 [-]
1000 x atan2(y, x) t = 2862 [us]
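To see how little the optimisation level matters here, the N6 times can be expressed as a relative change against the -Os run. A small sketch, with the hypothetical helper name `percent_change`:

```c
/* Relative change of t_new_us against t_ref_us, in percent.
 * A negative result means the new run is faster than the reference. */
double percent_change(double t_ref_us, double t_new_us)
{
    return 100.0 * (t_new_us - t_ref_us) / t_ref_us;
}
```

For Bench 01 on the N6, going from -Os to -O2 changes the X projection from 2766 us to 2752 us, about half a percent, while the array fill goes from 400 us to 334 us, about 16 percent. The projections and the histogram barely react to the compiler setting.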