Faster way to do a 1/x operation ?

2021-08-06 08:28 PM

Hi All,

This is not really a specific STM32 question I suppose, so please excuse what might seem like a simple maths question for some of you.

How can I compute 1/x for a single-precision float without using the VDIV instruction?

VDIV takes 14 clock cycles to complete, so best to avoid it if there is a quicker way.

Assume for the code below:

s10 = 5000

s6 = 1

vdiv.f32 s10, s6, s10

That does 1 / 5000 = 0.0002

Can the same be achieved using vmul instead?

Thanks.

6 REPLIES

2021-08-07 01:11 AM

> Can the same be achieved using vmul instead?

No, unless x is compile-time constant.

14 cycles per float div is blazingly fast. If this is not sufficient, reconsider your algorithm, e.g. by using fixed-point (i.e. integer) arithmetic instead of floating-point.

JW

2021-08-07 07:09 AM

The trick with the reciprocals is that you or the compiler precompute them.

The optimizer can typically fold constants, and reorder the math so that scaling ends up as a multiply rather than a divide.

For example I'd use

x = x * 1e-6;

instead of

x = x / 1000000.0;

Similar things can be done with say Speed of Light constants and computing wavelengths, etc.
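The divide-versus-multiply substitution above can be sketched in C as follows. The function names are hypothetical and chosen only for illustration. Note the `f` suffix on the constants: without it the literal is a `double`, and the whole expression is promoted to double precision, which a single-precision FPU cannot execute in hardware.

```c
/* Hypothetical helper names, for illustration only. */

/* Scaling with a divide: on a core with a single-precision FPU this
 * would typically become a VDIV.F32. */
static inline float us_to_s_div(float us)
{
    return us / 1000000.0f;
}

/* Scaling with a precomputed reciprocal: a single multiply instead.
 * 1e-6f is not exactly representable in binary, so the result may
 * differ from the divide in the last bit. */
static inline float us_to_s_mul(float us)
{
    return us * 1e-6f;
}
```

The two versions are not guaranteed to be bit-identical, so whether the swap is acceptable depends on the accuracy you need.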

In assembler you need to consider the order of the math, both for efficiency and for maintaining precision. The ARM FPU doesn't hold intermediate values at higher levels of precision like the more classical Intel and Motorola designs, so one has to be particularly aware of the issues around 32-bit floats.

You might be able to precompute or load the constants you use repeatedly. Use spare registers to hold these so you can move to doing fewer divides and more multiplies.

Other algorithm-level optimizations are things a compiler won't handle for you. For example, you don't need to take square roots for the purpose of magnitude comparisons.
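As a minimal sketch of the magnitude-comparison point, the hypothetical helper below compares squared values instead of calling `sqrtf()`. It assumes a 2-D vector and a non-negative limit; both are illustrative choices, not something from the original question.

```c
#include <stdbool.h>

/* Hypothetical helper: is the 2-D vector (x, y) longer than 'limit'?
 * Assumes limit >= 0. Both sides are squared, so no square root is
 * needed; the comparison gives the same answer as comparing lengths. */
static inline bool magnitude_exceeds(float x, float y, float limit)
{
    return (x * x + y * y) > (limit * limit);
}
```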

2021-08-07 11:37 AM

It's possible to replace taking the reciprocal by some iterations of Newton's method. One Newton step requires one subtraction and two multiplications. If you have a limited range for x so that you can guess a good start value and you can live with limited accuracy, maybe one or two iterations might be sufficient, and then you might get some improvement. But don't expect miracles ...

That was the standard way to do division when there was fast hardware support for multiplication but none for division.
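A minimal sketch of the Newton iteration for a reciprocal, assuming the caller supplies the start value. The update is y ← y · (2 − x·y), which is the one subtraction and two multiplications mentioned above. If r = 1 − x·y is the relative error, each step squares it (r becomes r²), so the guess must satisfy 0 < x·y0 < 2 for the iteration to converge. The function name and parameters are hypothetical.

```c
/* Hypothetical helper: approximate 1/x with Newton's method.
 * y0 is the caller's initial guess and must satisfy 0 < x*y0 < 2,
 * otherwise the iteration does not converge. */
static inline float recip_newton(float x, float y0, int steps)
{
    float y = y0;
    for (int i = 0; i < steps; ++i) {
        /* One Newton step: one subtraction, two multiplications. */
        y = y * (2.0f - x * y);
    }
    return y;
}
```

For example, if x is known to lie between 4000 and 6000 and the guess is 0.0002 (that is, 1/5000), the initial relative error is at most 0.2, and three steps bring it down to roughly 0.2^8 ≈ 2.6e-6. Three steps already cost six multiplies and three subtracts in a dependent chain, so this may well not beat a 14-cycle hardware divide.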

2021-08-07 04:24 PM

Thank you everyone for your responses, much appreciated.

I am not running short of CPU cycles. This is a simple algorithm that turns some varying inputs into timer autoload values, but only for pulses with a maximum output frequency of about 700 Hz, so spending 14 clock cycles on a division step to calculate the next timer value is far from the end of the world for me. I was just curious whether there was a method to get rid of the divide instruction.

As Tesla Delorean suggested, I did change one of the steps, the conversion from 'per minute' to 'per second', from / 60 to * 0.016666. For the accuracy I need, this is just fine.

2021-08-08 12:30 PM

By the way...

```
float n1 = 0.016666; // Actual value: 0.0166660007
float n2 = 0.016667; // Actual value: 0.0166669991
float n3 = 1.0 / 60.0; // Actual value: 0.0166666675
```

The conclusion: proper rounding matters, and it's actually better to write "the intention" (1/60) and let the compiler derive the best-fitting value.
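One way to write that intention while still getting a multiply is sketched below. The function name is hypothetical. `(1.0f / 60.0f)` is a constant expression, so an optimizing compiler would normally fold it to the nearest float at build time and leave only a multiply at run time; checking the generated assembly is the only way to be sure for a given toolchain.

```c
/* Hypothetical helper: convert a per-minute rate to a per-second rate.
 * The reciprocal is written as an expression rather than a hand-typed
 * decimal, so the compiler picks the correctly rounded float value. */
static inline float per_minute_to_per_second(float per_minute)
{
    return per_minute * (1.0f / 60.0f);
}
```

The result can still differ from `per_minute / 60.0f` in the last bit for some inputs, but it avoids the much larger error of a truncated literal such as 0.016666.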

2021-08-08 02:57 PM
