
How to use half-precision floating-point variables

Aurélien f
Senior
Posted on May 22, 2017 at 12:26

Hello,

I would like to use half-precision floating-point variables in my project, but I don't know how. I'm working on a NUCLEO-L073 board, which embeds a Cortex-M0+ STM32L073 MCU with no FPU. I'm using the SW4STM32 Eclipse environment. I saw that GCC offers some options:

https://gcc.gnu.org/onlinedocs/gcc-4.5.1/gcc/Half_002dPrecision.html

 

but I don't know how to set the flag or use the specific libraries to activate this feature.
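From the GCC manual page above, it looks like the flag goes on the compiler command line and the type is called __fp16, but I'm not sure this is right (untested guess based on that page; the -mfp16-format option and the __fp16 type are taken from the GCC ARM documentation):

/* main.c -- compile with something like:
 *   arm-none-eabi-gcc -mcpu=cortex-m0plus -mthumb -mfp16-format=ieee -c main.c
 */
#include <stdio.h>

int main(void)
{
    __fp16 h = 1.5f;      /* stored as a 16-bit IEEE half */
    float  f = h * 2.0f;  /* arithmetic is promoted to 32-bit float (soft-float, no FPU) */
    printf("f = %f\n", (double)f);
    return 0;
}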

If you have any idea, please feel free to share your thoughts on my issue.

Thank you very much.

Best regards,

Aurélien

13 REPLIES

Sorry for the late response. I was stuck on other problems not related to this.

But I am still not able to resolve this problem of using float16. I have the same settings as you suggested; I have attached a screenshot of the window.

[Attachment: float16Setting.png]

I tried to declare a variable as __fp16, but it does not work. I get the following error:

[Attachment: testVariable__fp16.png]

 

Am I missing something: a library, a compiler setting, or something else?

 

I need this setting to enable a custom deep learning model on STM32 MCUs and save memory for the weights. Any help from your side will be appreciated and very helpful.

Thank you 

deepak kumar

>I tried to declare a variable as __fp16, but it does not work.

Right, fp16 is not a standard C float type, and the compiler "knows" it:

https://en.wikipedia.org/wiki/Half-precision_floating-point_format

 

So use standard float, or calculate in 16-bit fixed point with int16_t (see the sketch below).
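For illustration, a minimal Q15 fixed-point multiply (my own sketch, not a library function; CMSIS-DSP also provides ready-made q15_t routines if you prefer those):

#include <stdint.h>

typedef int16_t q15_t;  /* value = raw / 32768, range [-1, 1) */

static inline q15_t q15_mul(q15_t a, q15_t b)
{
    int32_t p = ((int32_t)a * (int32_t)b) >> 15;  /* Q30 product back to Q15 */
    if (p > INT16_MAX) p = INT16_MAX;             /* saturate instead of wrapping */
    if (p < INT16_MIN) p = INT16_MIN;
    return (q15_t)p;
}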

Or don't use the cheapest version of the oldest CPU that ST produces.


@deepak4code wrote:

I need this setting to enable a custom deep learning model on STM32 MCUs and save memory for the weights.


Save ROM or RAM? If you want to save ROM, you can store the weights as 16-bit fixed point or half-precision float and convert them to float32 for processing (you only need all the outputs of one layer in RAM, correct?). If you need help converting float32 to float16 and back, let me know. I recently made a platform-independent library for serializing floats to a byte array that you can look at for inspiration: https://github.com/ChrisIdema/IEEE754_binary_encoder. In your case it doesn't need to be platform independent, the float32 storage format is known, and the weights are probably never NaN or INF, so converting them with a bit of bit-shifting should be relatively easy. You probably want saturating arithmetic, since you don't want overflows or infinities in your neural network. A rough sketch is below.
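Something like this untested sketch (helper names are my own; no NaN/INF handling, out-of-range values saturate to the largest finite half, and subnormals are flushed to zero):

#include <stdint.h>
#include <string.h>

/* Pack a float32 into IEEE-754 binary16 bits (truncating the mantissa). */
static uint16_t f32_to_f16_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);                                 /* raw float32 bits */
    uint16_t sign = (uint16_t)((u >> 16) & 0x8000u);
    int32_t  exp  = (int32_t)((u >> 23) & 0xFFu) - 127 + 15;  /* re-bias exponent */
    uint16_t man  = (uint16_t)((u >> 13) & 0x3FFu);           /* top 10 mantissa bits */

    if (exp >= 31) return sign | 0x7BFFu;  /* saturate to largest finite half */
    if (exp <= 0)  return sign;            /* too small: flush to (signed) zero */
    return sign | (uint16_t)(exp << 10) | man;
}

/* Unpack binary16 bits back to float32 (normal numbers only). */
static float f16_bits_to_f32(uint16_t h)
{
    uint32_t sign = ((uint32_t)h & 0x8000u) << 16;
    int32_t  exp  = (h >> 10) & 0x1F;
    uint32_t man  = ((uint32_t)h & 0x3FFu) << 13;
    uint32_t u    = sign;

    if (exp != 0)  /* half subnormals come back as zero in this sketch */
        u |= ((uint32_t)(exp - 15 + 127) << 23) | man;

    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}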

If you also need to save RAM, you would have to do the processing in a 16-bit format too, which would be more complicated.


I am really thankful for the resources you shared. Basically, I have a model with 4.04 MB of weights, and my target board is an STM32H743 MCU with 2 MB of flash and 1 MB of RAM, so I want to implement a quantized version of it. I will try to implement it as you suggested.
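If my back-of-envelope math is right (assuming the weights are currently float32): 4.04 MB / 4 bytes ≈ 1.01 million parameters. At 16 bits per weight that is still about 2.02 MB, which leaves no room next to the code in 2 MB of flash, so I will probably need 8-bit quantization (≈ 1.01 MB) or keep only part of the weights on-chip.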

Thank you