2022-11-30 02:53 AM
I'm building a computationally demanding application on an STMF2F769DI board in STM32CubeIDE, debugging over a USB connection.
I've wrapped a segment of code (an FIR downsampling filter) with LED on/off calls and measured its execution time on an oscilloscope. I'm seeing 78 ms for 512*32 (16384) single-precision floating-point multiply-accumulates, which comes out to about 5 µs per multiply-accumulate. The core is clocked at 216 MHz (I haven't changed the clock configuration), so this seems incredibly slow. My project includes the CMSIS DSP libraries, and I use their library function for the multiply-accumulate block.
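For reference, the arithmetic behind those figures can be checked with a couple of small helpers (the function names are hypothetical, for illustration only). With 512 × 32 = 16384 MACs, 78 ms is about 4.76 µs per MAC, which at 216 MHz is roughly 1028 core cycles per MAC, far more than a hardware single-precision multiply-accumulate should need.

```c
#include <stdio.h>

/* Hypothetical helpers for sanity-checking a measured MAC throughput. */

/* Microseconds per MAC: convert the measured milliseconds to us, divide by MAC count. */
double us_per_mac(unsigned macs, double measured_ms)
{
    return measured_ms * 1000.0 / (double)macs;
}

/* Core cycles per MAC: total cycles in the measured interval divided by MAC count. */
double cycles_per_mac(unsigned macs, double measured_ms, double core_hz)
{
    return (measured_ms / 1000.0) * core_hz / (double)macs;
}

/* Prints the MAC count and both per-MAC figures for a taps x block workload. */
void print_mac_budget(unsigned taps, unsigned block, double measured_ms, double core_hz)
{
    unsigned macs = taps * block;
    printf("MACs: %u, %.2f us/MAC, %.0f cycles/MAC\n",
           macs, us_per_mac(macs, measured_ms), cycles_per_mac(macs, measured_ms, core_hz));
}
```

Calling `print_mac_budget(512, 32, 78.0, 216e6)` prints `MACs: 16384, 4.76 us/MAC, 1028 cycles/MAC`.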
How can I tell whether the FPU is actually being used? I've checked __FPU_PRESENT and FPU_USED (both are 1). The floating-point unit is set to FPv5-D16, and the floating-point ABI is set to hardware implementation. All three CMSIS libraries are linked: M7lfsp_math, M7lfdp_math and M7l_math. What else should I check, and have I missed anything obvious?
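Besides the CMSIS __FPU_PRESENT / FPU_USED macros, the compiler's own predefined macros show what a given source file is actually compiled for. A minimal sketch, assuming a GCC-compatible toolchain such as the one STM32CubeIDE uses: `__ARM_FP` is the ACLE bitmask of hardware floating-point support (0x2 half, 0x4 single, 0x8 double precision) and is not defined with `-mfloat-abi=soft`, while `__ARM_PCS_VFP` is defined only for the hard-float calling convention. The helper names and the `#error` guard are hypothetical additions, not part of CMSIS.

```c
#include <stdbool.h>

/* Bits of the ACLE __ARM_FP macro. */
#define ARM_FP_HALF   0x2
#define ARM_FP_SINGLE 0x4
#define ARM_FP_DOUBLE 0x8

/* Hypothetical build-time guard: stop a 32-bit Arm build that emits no FPU instructions. */
#if defined(__arm__) && !defined(__ARM_FP)
#error "Compiling for Arm without FPU instructions: check -mfpu and -mfloat-abi"
#endif

/* __ARM_FP for this translation unit, or 0 if the compiler targets no FPU. */
int build_arm_fp(void)
{
#if defined(__ARM_FP)
    return __ARM_FP;
#else
    return 0;
#endif
}

/* True when float arguments are passed in FPU registers (hard-float ABI). */
bool build_uses_hard_float_abi(void)
{
#if defined(__ARM_PCS_VFP)
    return true;
#else
    return false;
#endif
}

/* Decode individual capability bits from an __ARM_FP value. */
bool fp_has_single(int fp) { return (fp & ARM_FP_SINGLE) != 0; }
bool fp_has_double(int fp) { return (fp & ARM_FP_DOUBLE) != 0; }
```

With FPv5-D16 you would expect both the single- and double-precision bits to be set; if `build_arm_fp()` returns 0 in the firmware, that file is being compiled for software floating point.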
I'd greatly appreciate your help; this is proving really difficult to resolve.
2022-11-30 03:12 AM
Just asking: which optimizer setting did you choose?
(This changes a lot!)
-O2 was best in my tests.
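To confirm which optimization level a file was actually built with, GCC's predefined macros can be queried: `__OPTIMIZE__` is defined for any optimizing compilation and `__OPTIMIZE_SIZE__` additionally for -Os. A minimal sketch (the function name is hypothetical); note that it cannot tell -O1, -O2 and -O3 apart.

```c
/* Hypothetical helper: describes the optimization mode of this translation unit,
 * based on GCC's predefined macros. */
const char *build_optimization(void)
{
#if defined(__OPTIMIZE_SIZE__)
    return "optimized for size (-Os or similar)";
#elif defined(__OPTIMIZE__)
    return "optimized (-O1 or higher)";
#else
    return "not optimized (-O0)";
#endif
}
```

Debug build configurations commonly use -O0, so it's worth checking the setting for the configuration you are actually timing. A `#warning` placed under `#ifndef __OPTIMIZE__` in the DSP source would also make an unoptimized build hard to miss.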
read:
https://mcuoneclipse.com/2019/03/29/be-aware-floating-point-operations-on-arm-cortex-m4f/
2022-11-30 05:04 AM
You can use objdump on the .o file and check which (if any) FPU emulation functions it needs at link time (their names start with __aeabi or similar). Ideally there are none.
You can also check the disassembled code for FPU instructions vs. calls to emulation functions.
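As an example of that check, a small single-precision MAC loop can be compiled on its own and inspected. The file name mac_check.c and the function mac_f32 are hypothetical; the commands assume the GNU Arm toolchain that STM32CubeIDE uses and mirror the FPv5-D16 / hard-float settings from the question.

```c
/* mac_check.c (hypothetical): compile on its own and inspect the object file.
 *
 *   arm-none-eabi-gcc -mcpu=cortex-m7 -mfpu=fpv5-d16 -mfloat-abi=hard -O2 -c mac_check.c
 *   arm-none-eabi-objdump -d mac_check.o
 *   arm-none-eabi-nm -u mac_check.o
 *
 * With the FPU in use, the disassembly should contain VFP instructions such as
 * vmul.f32, vadd.f32 or vfma.f32, and `nm -u` should list no __aeabi_f* helpers.
 * Built with -mfloat-abi=soft, the same loop calls helpers such as
 * __aeabi_fmul and __aeabi_fadd instead.
 */

/* Single-precision dot product: one multiply-accumulate per element. */
float mac_f32(const float *a, const float *b, unsigned n)
{
    float acc = 0.0f;
    for (unsigned i = 0; i < n; ++i) {
        acc += a[i] * b[i];
    }
    return acc;
}
```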
hth
KnarfB