Very bad performance on the STM32N657

Franzi.Edo
Senior

Dear all,
I am facing very poor performance with the STM32N657.
I have some benchmarks that manipulate arrays of data in different ways.
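For readers who want to reproduce this kind of measurement, here is a minimal sketch of the kernels named in the Bench 00 / Bench 01 output (fill, X-Y projections, histogram). It is not the uKOS-X benchmark source, which is not shown in this thread. The function names, the `uint8_t` element type, the fill pattern, and the reading of "X projection" as one sum per column are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill the array with a deterministic pattern (hypothetical pattern;
 * the one used by the original benchmark is not known). */
void fill_array(uint8_t *img, size_t rows, size_t cols)
{
    for (size_t y = 0; y < rows; y++) {
        for (size_t x = 0; x < cols; x++) {
            img[y * cols + x] = (uint8_t)((x + y) & 0xFFu);
        }
    }
}

/* Assumed meaning of "X projection": one sum per column.
 * The inner loop walks memory with a stride of `cols` bytes. */
void projection_x(const uint8_t *img, size_t rows, size_t cols, uint32_t *proj)
{
    for (size_t x = 0; x < cols; x++) {
        uint32_t sum = 0;
        for (size_t y = 0; y < rows; y++) {
            sum += img[y * cols + x];
        }
        proj[x] = sum;
    }
}

/* Assumed meaning of "Y projection": one sum per row.
 * The inner loop walks memory sequentially. */
void projection_y(const uint8_t *img, size_t rows, size_t cols, uint32_t *proj)
{
    for (size_t y = 0; y < rows; y++) {
        uint32_t sum = 0;
        for (size_t x = 0; x < cols; x++) {
            sum += img[y * cols + x];
        }
        proj[y] = sum;
    }
}

/* 256-bin histogram of the element values: a data-dependent
 * read-modify-write on hist[] for every element. */
void histogram(const uint8_t *img, size_t count, uint32_t hist[256])
{
    memset(hist, 0, 256 * sizeof hist[0]);
    for (size_t i = 0; i < count; i++) {
        hist[img[i]]++;
    }
}
```

Kernels of this shape are dominated by loads and stores rather than arithmetic, so their timing depends strongly on where the arrays are placed and on how that memory is cached.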

I ran these benchmarks on:
Nucleo_H753 @ 480 MHz, with caches ON
Nucleo_N657 @ 600 MHz, with caches ON

For the STM32N657 I measured the CPU and AXI clocks via MCO2:
fCPU = 600 MHz
fAXI = 400 MHz

The compiler used is gcc-15.2.0

Bench for the Nucleo_H753
-------------------------

uKOS-X > bench
System bench.
Bench 00: Fill a small 2D array (50 x 50) elements in
          the internal memory. Then, compute the
          X-Y projections and the histogram.
          Fill the array                               t =     17 [us]
          X projection                                 t =     41 [us]
          Y projection                                 t =     18 [us]
          Histogram                                    t =     30 [us]

Bench 01: Fill a small 2D array (200 x 200) elements
          in the internal memory. Then, compute the
          X-Y projections and the histogram.
          Fill the array                               t =    171 [us]
          X projection                                 t =    672 [us]
          Y projection                                 t =    288 [us]
          Histogram                                    t =    451 [us]

Bench 02: Fill a small 1D array (1000) elements in
          the internal memory with a random pattern.
          Then, compute the min / max values.
          Number of tests                              n =   1000 [-]
          Min / Max                                    t =   1110 [us]

Bench 03: Fill a big 1D array (50000) elements in
          the internal memory with a random pattern.
          Then, compute the min / max values.
          Number of tests                              n =    100 [-]
          Min / Max                                    t =    107 [us]

Bench 04: Compute the integer atan2 using the CORDIC
          algorithm
          Number of tests                              n =   1000 [-]
          1000 x atan2(y, x)                           t =   1088 [us]


Bench for the Nucleo_N657
-------------------------

uKOS-X > bench
System bench.
Bench 00: Fill a small 2D array (50 x 50) elements in
          the internal memory. Then, compute the
          X-Y projections and the histogram.
          Fill the array                               t =     29 [us]
          X projection                                 t =    173 [us]
          Y projection                                 t =    167 [us]
          Histogram                                    t =    330 [us]

Bench 01: Fill a small 2D array (200 x 200) elements
          in the internal memory. Then, compute the
          X-Y projections and the histogram.
          Fill the array                               t =    400 [us]
          X projection                                 t =   2766 [us]
          Y projection                                 t =   2640 [us]
          Histogram                                    t =   5369 [us]

Bench 02: Fill a small 1D array (1000) elements in
          the internal memory with a random pattern.
          Then, compute the min / max values.
          Number of tests                              n =   1000 [-]
          Min / Max                                    t =   3127 [us]

Bench 03: Fill a big 1D array (50000) elements in
          the internal memory with a random pattern.
          Then, compute the min / max values.
          Number of tests                              n =    100 [-]
          Min / Max                                    t =    323 [us]

Bench 04: Compute the integer atan2 using the CORDIC
          algorithm
          Number of tests                              n =   1000 [-]
          1000 x atan2(y, x)                           t =   2726 [us]


As shown, the N6 is between roughly 1.7 and 12 times slower than the H753 on these benchmarks, which is not acceptable!
The clocks measured on MCO2 match the PLL settings, so some other factor
(not yet clearly identified) may be slowing down the code execution.
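To put the numbers above on a common scale, the measured times can be converted into average CPU cycles per array element. The helper below is only a sketch: `cycles_per_element` is a hypothetical name, and it assumes that each reported time covers exactly one pass over the array, which the bench output does not state.

```c
#include <stdint.h>

/* Average CPU cycles per element: t [us] * f [MHz] gives cycles,
 * divided by the number of elements processed.
 * Assumes the measured time covers a single pass over the data. */
double cycles_per_element(uint32_t t_us, uint32_t f_mhz, uint32_t n_elements)
{
    return ((double)t_us * (double)f_mhz) / (double)n_elements;
}
```

Under that assumption, the Bench 01 histogram (200 x 200 = 40000 elements) works out to about 5.4 cycles per element on the H753 (451 us at 480 MHz) and about 80.5 cycles per element on the N657 (5369 us at 600 MHz). The gap is in the per-element cost and not in the clock frequency.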

Any clues on how to get better results on the N6?
Kind regards,
Edo

AScha.3
Super User

Hi,

What optimizer setting did you use?

Try -O2, recompile, then check again.


Hi AScha.3,

Thank you for the suggestion.

Both targets use the same gcc settings, and the optimisation level is -Os. Below are the N6 results for -O2 and -O3. Even with these optimisation levels we are still very far from the -Os results of the H753. More probably something is wrong with the hardware, but I can only measure the PLL clocks!

Here are the new results:

-O2
---

uKOS-X > bench
System bench.
Bench 00: Fill a small 2D array (50 x 50) elements in
          the internal memory. Then, compute the
          X-Y projections and the histogram.
          Fill the array                               t =     24 [us]
          X projection                                 t =    169 [us]
          Y projection                                 t =    172 [us]
          Histogram                                    t =    331 [us]

Bench 01: Fill a small 2D array (200 x 200) elements
          in the internal memory. Then, compute the
          X-Y projections and the histogram.
          Fill the array                               t =    334 [us]
          X projection                                 t =   2752 [us]
          Y projection                                 t =   2632 [us]
          Histogram                                    t =   5349 [us]

Bench 02: Fill a small 1D array (1000) elements in
          the internal memory with a random pattern.
          Then, compute the min / max values.
          Number of tests                              n =   1000 [-]
          Min / Max                                    t =   2963 [us]

Bench 03: Fill a big 1D array (50000) elements in
          the internal memory with a random pattern.
          Then, compute the min / max values.
          Number of tests                              n =    100 [-]
          Min / Max                                    t =    307 [us]

Bench 04: Compute the integer atan2 using the CORDIC
          algorithm
          Number of tests                              n =   1000 [-]
          1000 x atan2(y, x)                           t =   2747 [us]

-O3
---

uKOS-X > bench
System bench.
Bench 00: Fill a small 2D array (50 x 50) elements in
          the internal memory. Then, compute the
          X-Y projections and the histogram.
          Fill the array                               t =     24 [us]
          X projection                                 t =    169 [us]
          Y projection                                 t =    171 [us]
          Histogram                                    t =    330 [us]

Bench 01: Fill a small 2D array (200 x 200) elements
          in the internal memory. Then, compute the
          X-Y projections and the histogram.
          Fill the array                               t =    334 [us]
          X projection                                 t =   2752 [us]
          Y projection                                 t =   2633 [us]
          Histogram                                    t =   5348 [us]

Bench 02: Fill a small 1D array (1000) elements in
          the internal memory with a random pattern.
          Then, compute the min / max values.
          Number of tests                              n =   1000 [-]
          Min / Max                                    t =   2962 [us]

Bench 03: Fill a big 1D array (50000) elements in
          the internal memory with a random pattern.
          Then, compute the min / max values.
          Number of tests                              n =    100 [-]
          Min / Max                                    t =    292 [us]

Bench 04: Compute the integer atan2 using the CORDIC
          algorithm
          Number of tests                              n =   1000 [-]
          1000 x atan2(y, x)                           t =   2862 [us]