How to efficiently save and restore the 32F407 FPU context

Senior III

STM should really post their code to handle FPU context saving and restoring BECAUSE their clients have been asking for over a decade!!!

One user stated "the time for a full FPU state save/restore is quite long. A pair of vpush {s0-31}/vpop {s0-s31} takes around 400ns on my STM32F407 @ 168MHz."

Without this, our code suffers from floats sticking at a nan0x400 status after IRQs.

So, STM staff, come on, show us the best routine you have to save and restore the FPU status, because the Lazy FPU feature is useless!!


Accepted Solutions
ST Employee

Hello @Robmar,

I am coming back to this old discussion to mention that the FreeRTOS context switch routine is generic to any Cortex-M4 with an FPU. It is an example of how to save/restore the FPU registers in addition to lazy mode.

In addition, ST application note AN4044, section 3.5.3 "FPU load/store instructions", explains the load/store instructions.

Also, in the Cortex-M4 Revision r0p0 Technical Reference Manual, you can find the cycle counts of each instruction.

It would be a good enhancement for our application note to include and detail the vpush/vpop instructions as a load/store option for the FPU. These instructions can potentially reduce the time taken by FPU save/restore operations. (Any use case proposal from your side is welcome; you can share your example privately if possible.)
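For readers who want the gist of that FreeRTOS approach, here is a minimal sketch, not the port itself. The idea is that the handler looks at bit 4 of EXC_RETURN: when it is clear, the hardware has stacked an extended frame (S0-S15 and FPSCR), so the task was using the FPU and software only has to add S16-S31. The helper names `task_used_fpu` and `sw_saved_fpu_words` are made up for illustration, and the assembly in the comment is quoted from memory of the GCC ARM_CM4F port, so check it against the FreeRTOS sources before relying on it.

```c
#include <stdbool.h>
#include <stdint.h>

/* EXC_RETURN bit 4: 0 = extended frame (FP state stacked), 1 = basic frame. */
#define EXC_RETURN_FTYPE_MASK (1u << 4)

/* Hypothetical helper: true if the interrupted context had FP state active. */
static bool task_used_fpu(uint32_t exc_return)
{
    return (exc_return & EXC_RETURN_FTYPE_MASK) == 0u;
}

/* Hypothetical helper: number of words software must add to the context.
 * Hardware (lazy) stacking covers S0-S15 and FPSCR; S16-S31 are left to us. */
static uint32_t sw_saved_fpu_words(uint32_t exc_return)
{
    return task_used_fpu(exc_return) ? 16u : 0u;
}

/*
 * The same decision as it appears (from memory, unverified) in a
 * FreeRTOS-style PendSV handler, with r0 holding the task's PSP:
 *
 *     tst      r14, #0x10          ; test EXC_RETURN bit 4
 *     it       eq
 *     vstmdbeq r0!, {s16-s31}      ; save only if the task used the FPU
 *     stmdb    r0!, {r4-r11, r14}  ; core registers plus EXC_RETURN
 *     ...
 *     ldmia    r0!, {r4-r11, r14}
 *     tst      r14, #0x10
 *     it       eq
 *     vldmiaeq r0!, {s16-s31}      ; restore only if the new task used it
 */
```

With the usual thread-mode values, `task_used_fpu(0xFFFFFFED)` is true and `task_used_fpu(0xFFFFFFFD)` is false, so tasks that never touch the FPU pay nothing extra on a context switch.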

Thank you all for your constructive feedback.



Lead II

There is no faster way than vpush/vpop, and these instructions must take > 32 cycles, because they execute 32 memory transfers. And I don't think there is anything wrong with lazy context switch, as most ISRs do not need the FPU. Just write your own FPU context switch handler - that's it.
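As a sanity check on the numbers quoted in this thread, here is a small sketch of the arithmetic. The helper names are hypothetical. The assumption that a single-precision VPUSH or VPOP of N registers costs 1 + N cycles is my recollection of the Cortex-M4 TRM timing table, so treat it as an estimate with zero wait-state RAM.

```c
#include <stdint.h>

/* Convert nanoseconds to core clock cycles, rounded to nearest.
 * cycles = ns * MHz / 1000 */
static uint32_t ns_to_cycles(uint32_t ns, uint32_t clk_mhz)
{
    return (uint32_t)(((uint64_t)ns * clk_mhz + 500u) / 1000u);
}

/* Convert core clock cycles to nanoseconds, rounded to nearest.
 * ns = cycles * 1000 / MHz */
static uint32_t cycles_to_ns(uint32_t cycles, uint32_t clk_mhz)
{
    return (uint32_t)(((uint64_t)cycles * 1000u + clk_mhz / 2u) / clk_mhz);
}

/* Assumed cost of vpush/vpop of n single-precision registers: 1 + n cycles
 * (recalled from the TRM, not measured here). */
static uint32_t vpush_or_vpop_cycles(uint32_t n_regs)
{
    return 1u + n_regs;
}
```

Under those assumptions, `ns_to_cycles(400, 168)` gives 67 cycles for the measured pair, and `2 * vpush_or_vpop_cycles(32)` gives 66 cycles, or `cycles_to_ns(66, 168)` = 393 ns. So the reported 400 ns is roughly one cycle per 32-bit word moved, which suggests there is little left to optimise in the instructions themselves.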

Lead II

Not sure what you mean by base libraries, but yes, I do not use HAL unless I have to. Not much can be done with HAL or an RTOS when you need to handle >500k events/interrupts per second, and that's the kind of task I usually do. As someone from ST said, you have to choose between real time and RTOS. 😉

And no, there is no need to "reinvent" this for each MCU. In the whole STM32 family there are just 2..3 versions of SPI, UART, USB and GPIO and they are quite similar to each other. You do realize that even the basic HAL_GPIO_Toggle routine contains an error making it suboptimal, don't you?

How would this part be any different from any other Cortex-Mx part with an FPU?

Surely this is an ARM thing, not ST's responsibility.

ST historically has no interest in staffing other people's projects.

Lead II

Do your ISRs use the FPU? I try to keep mine as short/quick as possible, so I keep floating-point operations out of them.

For me, the need to save/restore FPU state is for context-switching between threads where more than one thread uses the FPU. And I’ll admit that I use the minimal RTOS provided with a commercial IDE (Rowley Crossworks) and find everything works as intended.

If you have your own “bare metal” tasking library/RTOS, context-switches are where you might find the need to save/restore the FPU. And lazy-stacking won’t help there.
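For a home-grown tasking library, a full save/restore pair could look something like the sketch below. This is my own illustration under stated assumptions, not ST's or ARM's routine: GCC or Clang targeting a Cortex-M4F (built with an FPU option such as -mfpu=fpv4-sp-d16), the FPU already enabled in CPACR, and a 33-word buffer holding S0-S31 followed by FPSCR. The names `fpu_ctx_t`, `fpu_ctx_save` and `fpu_ctx_restore` are hypothetical. The functions are `naked` so the compiler adds no prologue or epilogue that would push and pop S16-S31 around the body and undo the restore.

```c
#include <stdint.h>

/* S0-S31 (32 words) followed by FPSCR (1 word). */
#define FPU_CTX_WORDS 33u

typedef struct {
    uint32_t words[FPU_CTX_WORDS];
} fpu_ctx_t;

/* Only built for a Thumb-2 target with an FPU; elsewhere just the type exists. */
#if defined(__GNUC__) && defined(__thumb2__) && defined(__ARM_FP)

/* AAPCS: the ctx pointer arrives in r0; r1 is a scratch register. */
__attribute__((naked)) void fpu_ctx_save(fpu_ctx_t *ctx)
{
    __asm volatile(
        "vstmia r0!, {s0-s15}   \n" /* store S0-S15, r0 advances by 64 bytes */
        "vstmia r0!, {s16-s31}  \n" /* store S16-S31                         */
        "vmrs   r1, fpscr       \n" /* read the FP status/control register   */
        "str    r1, [r0]        \n" /* store FPSCR as word 32                */
        "bx     lr              \n");
}

__attribute__((naked)) void fpu_ctx_restore(const fpu_ctx_t *ctx)
{
    __asm volatile(
        "vldmia r0!, {s0-s15}   \n" /* reload S0-S15                         */
        "vldmia r0!, {s16-s31}  \n" /* reload S16-S31                        */
        "ldr    r1, [r0]        \n" /* fetch the saved FPSCR                 */
        "vmsr   fpscr, r1       \n" /* write it back                         */
        "bx     lr              \n");
}

#endif
```

Note that `fpu_ctx_restore` deliberately overwrites S16-S31, which the procedure call standard treats as callee-saved, so it only makes sense at a real context-switch point (for example just before returning from PendSV), not in the middle of ordinary C code that has live float values.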

Sorry, what? We're developing DSP applications, of course they use float.


Check your compiler options. If your interrupt code is not exotic and is a fixed piece of code, the compiler should be able to push any core register before reusing it and restore it at the end, which is the basic approach on cores that lack multiple-register push/pop. That would be lazy stacking done purely in code, and it should work if the toolchain is good enough.

Pavel A.
Evangelist III

The best context switch is no context switch.

For exactly this need, ST makes multi-core MCUs. Use one of those and have each core focus on its task without interruptions.

It depends how you’ve organised your code. I tend to use ISRs for I/O and pass the need for floating-point to threads in my code. Your application might need floating-point in the ISRs, in which case lazy-stacking should be sufficient.

But if you, like me, have an RTOS that switches between multiple threads, where more than one thread uses the FPU, the RTOS needs to save/restore FPU states as part of the context-switching process. That’s the point I wanted to make.

Okay, lazy stacking only saves half the FPU registers, like flying with half a parachute maybe, not exactly the sort of software quality we are trying to achieve.
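To make the "half the registers" point concrete, here is a sketch of the two exception stack frames as I understand them from the ARMv7-M architecture: the basic frame is 8 words, and the extended frame adds S0-S15, FPSCR and one reserved word for 26 words in total. The type names are hypothetical. S16-S31 are not in either frame because the calling convention makes them callee-saved, so a compiled ISR that uses them saves them itself; only a context switcher has to add them by hand.

```c
#include <assert.h>
#include <stdint.h>

/* Basic frame stacked by hardware on exception entry (8 words). */
typedef struct {
    uint32_t r0, r1, r2, r3;
    uint32_t r12;
    uint32_t lr;
    uint32_t pc;
    uint32_t xpsr;
} basic_frame_t;

/* Extended frame when FP state is active (26 words). With lazy stacking the
 * space for the FP part is reserved, and the registers are only written if
 * the handler actually executes an FP instruction. */
typedef struct {
    basic_frame_t core;
    uint32_t s[16];     /* S0-S15: the caller-saved half of the register file */
    uint32_t fpscr;
    uint32_t reserved;  /* keeps the frame 8-byte aligned */
} extended_frame_t;

static_assert(sizeof(basic_frame_t) == 8 * 4, "basic frame is 8 words");
static_assert(sizeof(extended_frame_t) == 26 * 4, "extended frame is 26 words");
```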

My point is that there isn't a single example of the two routines needed to save and restore the 32F4XX FPU status, which, considering the chip has been in production for over 10 years, is a surprising support omission. Yes, I know it's in the RTOS, but it would be really handy to a lot of developers under time pressure (not hobbyists) if STM or ARM released these routines. Let's end this thread now.