Robmar
Senior II
December 28, 2022
Solved

How to efficiently save and restore the 32F407 FPU context

STM should really post their code to handle FPU context saving and restoring BECAUSE their clients have been asking for over a decade!!!

One user stated "the time for a full FPU state save/restore is quite long. A pair of vpush {s0-31}/vpop {s0-s31} takes around 400ns on my STM32F407 @ 168MHz."

Without this, our code suffers from floats getting stuck at a nan0x400 status after IRQs.

So, STM staff, come on, show us the best routine you have to save and restore the FPU status, because the Lazy FPU feature is useless!!

Best answer by FBL

Hello @Robmar​ 

I am coming back to this old discussion to mention that the FreeRTOS context switch routine is generic to any Cortex-M4 with an FPU. Here is an example of how to save/restore FPU registers in addition to lazy mode.

In addition, ST provides application note AN4044; section 3.5.3, "FPU load/store instructions", explains the load/store instructions.

Also, in the Cortex-M4 Revision r0p0 Technical Reference Manual, you can find the cycle counts for each instruction.

It would be a good enhancement for our application note to include and detail the vpush/vpop instructions as a load/store option for the FPU. These instructions can potentially reduce the time taken by FPU save/restore operations. (Any use case proposal from your side is welcome; you can share your example privately if possible.)

Thank you all for your constructive feedback.

Firas
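
For context, the lazy mode referred to above is governed by two bits of the Cortex-M4 FPCCR register: ASPEN (bit 31) makes the hardware reserve and stack s0-s15 and FPSCR on exception entry, and LSPEN (bit 30) defers the actual memory writes until the handler first touches the FPU. The following is only a minimal sketch, assuming the architectural address 0xE000EF34 and taking the register as a pointer so the logic can be tried off-target. The function names are hypothetical and are not part of CMSIS or any ST library.

```c
#include <assert.h>
#include <stdint.h>

#define FPCCR_ADDR   0xE000EF34u   /* Floating-Point Context Control Register */
#define FPCCR_ASPEN  (1u << 31)    /* automatic FP state preservation enable  */
#define FPCCR_LSPEN  (1u << 30)    /* lazy FP state preservation enable       */

/* Hypothetical helper: automatic + lazy stacking (the Cortex-M4 reset state). */
static void fpccr_select_lazy_stacking(volatile uint32_t *fpccr)
{
    assert(fpccr != 0);
    *fpccr |= FPCCR_ASPEN | FPCCR_LSPEN;
}

/* Hypothetical helper: automatic stacking without laziness, so every
 * exception entry from FP-using code pays for the full hardware save. */
static void fpccr_select_eager_stacking(volatile uint32_t *fpccr)
{
    assert(fpccr != 0);
    *fpccr = (*fpccr | FPCCR_ASPEN) & ~FPCCR_LSPEN;
}
```

On target this would be called from privileged startup code, for example `fpccr_select_lazy_stacking((volatile uint32_t *)FPCCR_ADDR);`. Either mode only covers s0-s15 and FPSCR; s16-s31 remain the responsibility of the compiler or the context switch routine.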

7 replies

gbm
Lead III
December 28, 2022

There is no faster way than vpush/vpop, and these instructions must take more than 32 cycles because they execute 32 memory transfers. And I don't think there is anything wrong with lazy context switching, as most ISRs do not need the FPU. Just write your own FPU context switch handler - that's it.
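
As a rough sanity check on the numbers quoted in this thread, the sketch below converts the reported 400 ns into core cycles and compares it with a simple cost model. The 1 + N cycle cost for a VPUSH or VPOP of N single-precision registers is an assumption based on the cycle-count table in the Cortex-M4 TRM, and it ignores wait states and bus contention. The function names are made up for illustration.

```c
#include <assert.h>

/* Convert a duration in nanoseconds to core clock cycles. */
static double ns_to_cycles(double ns, double core_mhz)
{
    assert(core_mhz > 0.0);
    return ns * core_mhz / 1000.0;
}

/* Assumed cost model: 1 + N cycles to move N single-precision registers
 * with one VPUSH or VPOP (zero wait states, no bus stalls). */
static unsigned fpu_block_transfer_cycles(unsigned n_regs)
{
    return 1u + n_regs;
}

/* Cycles for a full save plus restore of s0-s31 under the model above. */
static unsigned fpu_full_save_restore_cycles(void)
{
    return 2u * fpu_block_transfer_cycles(32u);
}
```

Under these assumptions, 400 ns at 168 MHz is about 67 cycles, against 66 cycles predicted for the vpush/vpop pair, so the quoted measurement is already close to the floor described above.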

My STM32 stuff on github - compact USB device stack and more: https://github.com/gbm-ii/gbmUSBdevice
gbm
Lead III
December 28, 2022

Not sure what you mean by base libraries but positive, I do not use HAL unless I have to. Not too much may be done with HAL or RTOS when you need to handle >500k events/interrupts per second, and that's the kind of tasks I usually do. As someone from ST said, you have to choose between real time and RTOS. ;)

And no, there is no need to "reinvent" this for each MCU. In the whole STM32 family there are just 2..3 versions of SPI, UART, USB and GPIO and they are quite similar to each other. You do realize that even the basic HAL_GPIO_Toggle routine contains an error making it suboptimal, don't you?

Garnett.Robert
Senior III
September 21, 2023

Hi,

Interested in your last comment about the toggle routine.

What's sub-optimal about it and what would be a more optimal version?

I use it quite a lot for debugging to keep track of timing etc,

I agree with you regarding RTOS and HAL. An RTOS is good for slow process control; I use one a lot and find it very good. The support for FreeRTOS is also excellent. But if you need speed, multitasking isn't an option because the processor is always busy processing I/O. The other problem with FreeRTOS is its footprint: although it is small, I still run out of memory on the really small low-power processors. I am making a much smaller version for the small processors, mainly as a training exercise, but I am hoping to use it on all my small jobs.

I use HAL and then hack it to get rid of unused options and to speed it up. It's a bit rude, but I don't have the time to understand the nuance of every peripheral so HAL is a good starting point and my projects all seem to work.

Regards

Rob

 

Tesla DeLorean
Guru
December 28, 2022

How would this part be any different from any other Cortex-Mx part with an FPU?

Surely this is an ARM thing, not ST's responsibility.

ST historically has no interest in staffing other people's projects.

Danish1
Lead III
December 29, 2022

Do your ISRs use the FPU? I try to keep mine as short/quick as possible, so I keep floating-point operations out of them.

For me, the need to save/restore FPU state is for context-switching between threads where more than one thread uses the FPU. And I’ll admit that I use the minimal RTOS provided with a commercial IDE (Rowley Crossworks) and find everything works as intended.

If you have your own “bare metal” tasking library/RTOS, context-switches are where you might find the need to save/restore the FPU. And lazy-stacking won’t help there.

Robmar (Author)
Senior II
December 30, 2022

Sorry, what? We're developing DSP applications, of course they use float.

Danish1
Lead III
January 1, 2023

It depends how you’ve organised your code. I tend to use ISRs for I/O and pass the need for floating-point to threads in my code. Your application might need floating-point in the ISRs, in which case lazy-stacking should be sufficient.

But if you, like me, have an RTOS that switches between multiple threads, where more than one thread uses the FPU, the RTOS needs to save/restore FPU states as part of the context-switching process. That’s the point I wanted to make.
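
One way an RTOS can tell whether the outgoing thread has FPU state worth saving is bit 4 of EXC_RETURN, which the Cortex-M4 clears when an extended (floating-point) frame was stacked on exception entry. The sketch below only illustrates that test and the resulting context size. The helper names and the choice to software-save r4-r11 plus EXC_RETURN are assumptions, loosely modeled on how FreeRTOS-style ports behave, and this is not a drop-in context switcher.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* EXC_RETURN bit 4: 1 = basic frame stacked, 0 = extended (FP) frame stacked. */
#define EXC_RETURN_BASIC_FRAME   (1u << 4)

/* Words stacked by hardware on exception entry (alignment padding excluded). */
#define HW_BASIC_FRAME_WORDS      8u  /* r0-r3, r12, lr, pc, xPSR */
#define HW_EXTENDED_FRAME_WORDS  26u  /* basic frame + s0-s15 + FPSCR + reserved */

/* Words saved in software by the hypothetical context switcher. */
#define SW_CORE_WORDS             9u  /* r4-r11 plus EXC_RETURN */
#define SW_FPU_WORDS             16u  /* s16-s31, only for FPU-using threads */

/* True if the interrupted thread had an active floating-point context. */
static bool thread_used_fpu(uint32_t exc_return)
{
    return (exc_return & EXC_RETURN_BASIC_FRAME) == 0u;
}

/* Total bytes of saved context for a thread, given its EXC_RETURN value. */
static uint32_t thread_context_bytes(uint32_t exc_return)
{
    /* Sanity check: on this core, bits [31:5] of EXC_RETURN are all ones. */
    assert((exc_return & 0xFFFFFFE0u) == 0xFFFFFFE0u);

    uint32_t words = SW_CORE_WORDS;
    if (thread_used_fpu(exc_return)) {
        words += HW_EXTENDED_FRAME_WORDS + SW_FPU_WORDS;
    } else {
        words += HW_BASIC_FRAME_WORDS;
    }
    return words * 4u;
}
```

With these figures, a thread that never touched the FPU (EXC_RETURN 0xFFFFFFFD) costs 68 bytes per switch, while an FPU-using thread (0xFFFFFFED) costs 204 bytes, which is why saving s16-s31 only when bit 4 is clear is worth the extra test.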

S.Ma
Principal
January 1, 2023

Check out the compiler options. If your interrupt code is not exotic and is a fixed piece of code, the compiler should be able to push any core register before reusing it and restore it at the end, which is the basic approach when cores lack multiple-register push/pop. In effect, lazy stacking in pure code. That should work if the toolchain is good enough.

Pavel A.
Super User
January 1, 2023

The best context switch is no context switch.

Especially for needs like yours, ST makes multi-core MCUs. Use one and have each core focus on its task without interruptions.

FBL (Best answer)
Technical Moderator
February 21, 2023

Robmar (Author)
Senior II
February 22, 2023

Thanks, Firas, for the useful references. It looks like they should be enough to complete the two save and restore functions for the FPU. All the best.