STM32L552 CMSIS DSP Performance Issue

RW1881
Associate

I am using the NUCLEO-L552ZE-Q board to measure cycle counts for some of the CMSIS DSP library functions, and I'm getting unexpected results. Compared with the benchmarks in ST application note AN4841, I see more than 2.5 times as many cycles as the Cortex-M4 figures there. My understanding is that, cycle for cycle, the Cortex-M33 should perform slightly better than the Cortex-M4. I am using the examples from the CMSIS library and timing only arm_cfft_f32() and arm_fir_f32(), with all interrupts disabled, no peripherals (the NUCLEO-L552ZE-Q CubeIDE template) and no RTOS. I have measured with both SWV and DWT->CYCCNT. I am also using the precompiled CMSIS library with the FPU and DSP extensions enabled, and have tried both the ST-provided library and v5.7.0 from ARM. Am I completely missing something here?
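A minimal sketch of the DWT->CYCCNT measurement described above, assuming the CMSIS device header, the CMSIS-DSP headers, and the STM32L552xx define that a CubeIDE project normally provides. The target-only part is compiled only when STM32L552xx is defined. cycles_elapsed(), net_cycles(), dwt_cycle_counter_init() and time_cfft_1024() are hypothetical helper names, not part of CMSIS. The two plain-C helpers handle counter wraparound and subtract the cost of the counter reads themselves. The 1024-point FFT size is an arbitrary choice. Newer CMSIS releases may expose DEMCR through DCB rather than CoreDebug, so check the core header in use.

```c
#include <stdint.h>

/* Elapsed cycles between two samples of the free-running 32-bit counter.
   Unsigned subtraction is modulo 2^32, so a single wraparound between
   the two samples still gives the right answer. */
uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Cycles attributable to the code under test: the raw start/end delta
   minus the separately measured cost of reading the counter. */
uint32_t net_cycles(uint32_t start, uint32_t end, uint32_t read_overhead)
{
    uint32_t total = cycles_elapsed(start, end);
    return total > read_overhead ? total - read_overhead : 0u;
}

#ifdef STM32L552xx /* target-only part; assumes the usual CubeIDE device define */
#include "stm32l5xx.h"
#include "arm_math.h"
#include "arm_const_structs.h"

/* Enable the DWT cycle counter (trace must be enabled in DEMCR first). */
void dwt_cycle_counter_init(void)
{
    CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;
    DWT->CYCCNT = 0u;
    DWT->CTRL |= DWT_CTRL_CYCCNTENA_Msk;
}

/* Time one 1024-point complex FFT; buf holds 2 * 1024 interleaved floats. */
uint32_t time_cfft_1024(float32_t *buf)
{
    uint32_t t0 = DWT->CYCCNT;
    uint32_t t1 = DWT->CYCCNT;
    uint32_t overhead = cycles_elapsed(t0, t1); /* cost of one counter read */

    uint32_t start = DWT->CYCCNT;
    /* ifftFlag = 0: forward transform; bitReverseFlag = 1: natural-order output */
    arm_cfft_f32(&arm_cfft_sR_f32_len1024, buf, 0u, 1u);
    uint32_t end = DWT->CYCCNT;

    return net_cycles(start, end, overhead);
}
#endif
```

On the board, dwt_cycle_counter_init() would be called once at startup, before any timed calls.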

RW1881
Associate

Okay, so I may have figured this out. It looks like CubeMX / CubeIDE leaves the ICACHE disabled by default. Configuring it as 2-way set associative gives me 10% to 25% fewer cycles, depending on the CMSIS function. I still find it odd: the M4 doesn't have a cache, so I wouldn't expect it to make any difference. My data is in RAM, so perhaps the instruction fetches from flash still benefit from the cache.
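One likely reason the cache matters even though the Cortex-M4 core has none: STM32F4 devices put ST's ART Accelerator (flash prefetch plus instruction and data caches) in front of flash, and the F4 Cube templates enable it by default. As a result, M4 numbers measured with code in flash typically already include a cache in the fetch path. Below is a sketch of enabling the L5 ICACHE in 2-way mode from code rather than through CubeMX, assuming the STM32L5 HAL with HAL_ICACHE_MODULE_ENABLED. It also starts the ICACHE hit/miss monitors so the effect can be checked after a benchmark. icache_enable_2way(), icache_report_hit_percent() and icache_hit_percent() are hypothetical helper names. The HAL calls should be checked against the Cube package in use.

```c
#include <stdint.h>

/* Instruction-cache hit rate in percent from the ICACHE hit and miss
   monitor counts; returns 0 when nothing was counted. 64-bit arithmetic
   avoids overflow in hits * 100. */
uint32_t icache_hit_percent(uint32_t hits, uint32_t misses)
{
    uint64_t total = (uint64_t)hits + misses;
    if (total == 0u) {
        return 0u;
    }
    return (uint32_t)(((uint64_t)hits * 100u) / total);
}

#ifdef STM32L552xx /* target-only part; assumes the usual CubeIDE device define */
#include "stm32l5xx_hal.h"

/* Configure the ICACHE as 2-way set associative and enable it. The
   associativity can only be changed while the cache is disabled, so
   this must run before HAL_ICACHE_Enable(), e.g. early in main(). */
HAL_StatusTypeDef icache_enable_2way(void)
{
    if (HAL_ICACHE_ConfigAssociativityMode(ICACHE_2WAYS) != HAL_OK) {
        return HAL_ERROR;
    }
    /* Count hits and misses so the effect can be checked afterwards. */
    if (HAL_ICACHE_Monitor_Start(ICACHE_MONITOR_HIT_MISS) != HAL_OK) {
        return HAL_ERROR;
    }
    return HAL_ICACHE_Enable();
}

/* Call after the timed code has run. */
uint32_t icache_report_hit_percent(void)
{
    return icache_hit_percent(HAL_ICACHE_Monitor_GetHitValue(),
                              HAL_ICACHE_Monitor_GetMissValue());
}
#endif
```

A hit rate close to 100% after the benchmark loop would suggest the DSP kernels are being served from the cache rather than from flash.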