Faster way to do a 1/x operation?

RMyer.1 · ‎2021-08-06

Hi All,

This is not really a specific STM32 question I suppose, so please excuse what might seem like a simple maths question for some of you.

How can I do a 1/x operation to a FLT number without using the VDIV command?

VDIV takes 14 clock cycles to complete, so best to avoid it if there is a quicker way.

Assume for the code below:

s10 = 5000

s6 = 1

vdiv.f32 s10, s6, s10

That does 1 / 5000 = 0.0002

Can the same be achieved using vmul instead?

Thanks.

waclawek.jan · ‎2021-08-07

> Can the same be achieved using vmul instead?

No, unless x is a compile-time constant.

14 cycles per float div is blazingly fast. If this is not sufficient, reconsider your algorithm, e.g. by using fixed-point (i.e. integer) arithmetic instead of floating-point.

JW

Tesla DeLorean · ‎2021-08-07

The trick with the reciprocals is that you or the compiler precompute them.

The optimizer can typically fold constants, and reorder the math so that scaling ends up as a multiply rather than a divide.

For example I'd use

x = x * 1e-6;

instead of

x = x / 1000000.0;

Similar things can be done with say Speed of Light constants and computing wavelengths, etc.

In Assembler you need to consider the order of the math, both in terms of the efficiency, but also in terms of maintaining precision. The ARM FPU doesn't hold intermediate values at higher levels of precision like the more classical Intel and Motorola designs, so one has to be particularly aware of the issues around 32-bit floats.

You might be able to precompute or load constants you use repeatedly. Use spare registers to hold these so you can move to doing fewer divides and more multiplies.
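
As a rough illustration of that idea in C (the function name and signature are made up for this sketch, not taken from any library), the reciprocal is computed once and the loop body then needs only multiplies:

```c
#include <stddef.h>

/* Hypothetical helper: divide every sample by `divisor` using one divide.
 * The reciprocal is computed once and kept in a local, which the compiler
 * will normally hold in a spare FPU register for the duration of the loop. */
void scale_by_reciprocal(float *samples, size_t count, float divisor)
{
    const float inv = 1.0f / divisor;   /* the only divide */

    for (size_t i = 0; i < count; ++i) {
        samples[i] *= inv;              /* multiply instead of divide */
    }
}
```

Note that x * (1/d) can differ from x / d in the last bit, because the reciprocal is rounded before the multiply.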

There are other algorithm-level optimizations a compiler won't handle. For example, you don't need to take square roots of numbers for the purpose of magnitude comparisons.
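
A minimal sketch of the square-root case, assuming a 2D vector and a non-negative limit (the function name is invented for illustration):

```c
#include <stdbool.h>

/* Hypothetical helper: is the length of (x, y) greater than `limit`?
 * Assumes limit >= 0. The square root is monotonic for non-negative
 * values, so comparing squared magnitudes gives the same answer as
 * comparing sqrt(x*x + y*y) against limit, without the sqrt. */
bool vec_longer_than(float x, float y, float limit)
{
    return (x * x + y * y) > (limit * limit);
}
```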

Andreas Bolsch · ‎2021-08-07

It's possible to replace taking the reciprocal by some iterations of Newton's method. One Newton step requires one subtraction and two multiplications. If you have a limited range for x so that you can guess a good start value and you can live with limited accuracy, maybe one or two iterations might be sufficient, and then you might get some improvement. But don't expect miracles ...

That was the standard way to do division when there was fast hardware support for multiplication but none for division.
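
For reference, a minimal C sketch of the Newton step described above. The function name and the choice to pass the start value in as a parameter are assumptions for illustration. The update is y = y * (2 - x * y), which is the one subtraction and two multiplications mentioned:

```c
/* Hypothetical helper: approximate 1/x by Newton's method.
 * Each step computes y = y * (2 - x * y): two multiplies, one subtract.
 * For x > 0 it converges only if the start value satisfies 0 < y0 < 2/x,
 * and the relative error is squared on every step. */
float reciprocal_newton(float x, float y0, int steps)
{
    float y = y0;

    for (int i = 0; i < steps; ++i) {
        y = y * (2.0f - x * y);
    }
    return y;
}
```

With x = 5000 and a start value of 0.00025 (25% off), the relative error is about 6% after one step and about 0.4% after two. Whether that beats a hardware divide depends on the core and on how good the start value is, so it is worth measuring.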

RMyer.1 · ‎2021-08-07

Thank you everyone for your responses, much appreciated.

I am not running short of CPU cycles, this is a simple algo to turn some varying inputs to timer autoload values......but only for pulses with a maximum output frequency of about 700 Hz, so wasting 14 clock cycles doing a division step to calculate the next timer value is far from the end of the world for me. I was just curious if there was a method to get rid of the divide command.

As Tesla DeLorean suggested, I did change one of the steps, which converts 'per minute' to 'per second', from / 60 to * 0.016666. For the accuracy I need this is just fine.

Piranha · ‎2021-08-08

By the way...

float n1 = 0.016666;   // Actual value: 0.0166660007
float n2 = 0.016667;   // Actual value: 0.0166669991
float n3 = 1.0 / 60.0; // Actual value: 0.0166666675

The conclusion: proper rounding matters, and it's actually better to write "the intention" (1.0 / 60.0) and let the compiler derive the best fitting value.

RMyer.1 · ‎2021-08-08

Thanks Piranha. The actual value I am using for that FLT is 0x3C888889 (0.0166666675359); I was just reducing the decimal places for the purpose of the post. Appreciate the response though. If I needed such accuracy for this, that would have been an issue, as you pointed out.
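
For anyone wanting to check the bit pattern on their own toolchain, here is a small sketch. It assumes float is a 32-bit IEEE 754 single, and the helper name is made up for illustration:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the raw bit pattern of a float.
 * Assumes float is a 32-bit IEEE 754 single. memcpy is used because
 * it is the well-defined way to reinterpret the bytes in C. */
uint32_t float_bits(float value)
{
    uint32_t bits;

    memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

Under that assumption, float_bits((float)(1.0 / 60.0)) gives 0x3C888889, while the truncated literal 0.016666f gives a different pattern.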