Are there available math libraries in assembly for the STM32G031 type processors?

JCase.1 · ‎2022-12-26

I'm using an STM32G031F8 for a job right now. I'm coding in GCC assembly, no OS. This thing doesn't even have a hardware divide (much less an FPU). I pulled out some old routines from NXP ARM7 chips for doing unsigned 32-bit by 32-bit and 32-bit by 16-bit division, but those are in full ARM code, and this chip runs only minimal Thumb code. Translating them looks extremely cumbersome, since the instruction set for this chip won't do an LSR or LSL without setting the condition flags, and all the routines I have need the condition flags to be retained across shifts.

Is there any available library of optimized assembly math routines for these processors? I don't need a whole pile of them; right now I'd be content with a 32-bit by 16-bit divide. Thanks!

Danish1 · ‎2022-12-26

You don’t say why you’re choosing to use assembly rather than (say) C.

Compilers have such subroutines to hand, and will attach them to your code as appropriate. So you could do worse than writing a small program that needs a division and seeing what the compiler produces. (You’ll need to convince the compiler to do the division at run time rather than at compile time.)
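A minimal sketch of such a test program, in C. The function name divide_at_runtime is hypothetical, and the volatile copies are just one way (an assumption, not the only way) to keep the compiler from folding the division into a constant.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the volatile copies force the operands to be
   read at run time, so the division cannot be done at compile time. */
uint32_t divide_at_runtime(uint32_t numerator, uint32_t denominator)
{
    assert(denominator != 0);          /* division by zero is undefined in C */
    volatile uint32_t n = numerator;
    volatile uint32_t d = denominator;
    return n / d;                      /* unsigned 32-bit division */
}
```

Compiling this for the target with something like arm-none-eabi-gcc -mcpu=cortex-m0plus -mthumb -O2 -S should show a call to the run-time helper __aeabi_uidiv in the generated assembly, since the core has no divide instruction. The helper itself comes from the compiler's support library.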

But don’t forget that the greatest optimisation is not to have to divide at all, or not do it more than necessary. If you divide by the same value more than once, it can be faster to calculate the reciprocal once, then multiply by that as and when needed. You’ll need to choose an appropriate fixed-point representation.
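As a rough illustration of the reciprocal idea, here is a C sketch. The function names, the choice of a 32-fractional-bit reciprocal, and the restriction to 16-bit operands are all assumptions made for this example. With those limits the result matches a true integer division, but note that the 64-bit multiply is itself not cheap on a core with only a 32-bit multiplier, so a narrower format may suit real code better.

```c
#include <assert.h>
#include <stdint.h>

/* Computed once per divisor: ceil(2^32 / divisor).
   Held in 64 bits because divisor == 1 gives exactly 2^32. */
uint64_t make_reciprocal_q32(uint16_t divisor)
{
    assert(divisor != 0);
    return (uint64_t)(UINT32_MAX / divisor) + 1u;
}

/* Used as often as needed: n / divisor becomes a multiply and a shift.
   Exact for 16-bit n and 16-bit divisor, because the rounding error
   n * (divisor - 1) stays below 2^32. */
uint16_t divide_by_reciprocal(uint16_t n, uint64_t reciprocal_q32)
{
    return (uint16_t)(((uint64_t)n * reciprocal_q32) >> 32);
}
```

For example, make_reciprocal_q32(7) is computed once, and each later divide_by_reciprocal(x, r) then costs one multiply and one shift instead of a full division.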

JCase.1 · ‎2022-12-26

No, I didn't say. I was not looking for a discussion on my toolset, I was looking for a solution.

And I am quite aware of ways to get around doing a divide, but certain calibrations require at least one divide done at init time to generate a calibration constant.

Never mind, I already wrote it.

JCase.1 · ‎2022-12-26

In case anybody else has a use for this:

(I apologize for the lousy formatting, the tabs got all messed up.)

.global uDIV3215                    // cheesy unsigned divide, R0 is numerator on entry, R1 is denominator
uDIV3215:
    push    {r1,r2,r3,lr}           // R0 is quotient on exit (must fit in 16 bits), or -1 on error

    cmp     r1,#0
    beq     _zonk                   // divide by 0
    ldr     r2,=1<<16
    cmp     r1,r2
    bhs     _zonk                   // denominator too large (unsigned compare, was bge)

    ldr     r2,=0                   // r2 is the answer, built bit by bit starting MSB
    lsls    r1,r1,#15               // start with comparison of numerator to denominator * 2^15
    ldr     r3,=1<<15               // r3 is the quotient bit mask

_loop:
    cmp     r0,r1                   // is remainder (so far) >= denominator * 2^N ?
    blo     _next                   // no, just shift down for next (unsigned compare, was blt)
    subs    r0,r0,r1                // yes, so reduce remainder...
    adds    r2,r2,r3                // ... and add a bit to the quotient
_next:
    lsrs    r1,r1,#1                // divide denominator * 2^N by 2
    lsrs    r3,r3,#1                // shift mask, sets Z when the mask runs out
    bne     _loop                   // back jack do it again

    mov     r0,r2
    pop     {r1,r2,r3,pc}
_zonk:
    ldr     r0,=-1                  // error return: all ones
    pop     {r1,r2,r3,pc}
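
For anyone who wants to check the routine on a PC first, here is a C model of the same shift-and-subtract loop. This is only a sketch: the name udiv32_16_model is made up, and it assumes every comparison is meant to be unsigned, as the routine's description implies. Like the assembly, it only builds 16 quotient bits, so a quotient that doesn't fit comes back as 0xFFFF.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side model of the 32-by-16 shift-and-subtract divide.
   Returns UINT32_MAX (the routine's -1) for a zero or too-large denominator. */
uint32_t udiv32_16_model(uint32_t numerator, uint32_t denominator)
{
    if (denominator == 0 || denominator >= (1u << 16)) {
        return UINT32_MAX;
    }

    uint32_t quotient = 0;
    uint32_t shifted = denominator << 15;   /* denominator * 2^15 */

    /* One pass per quotient bit, MSB first, 16 passes in total. */
    for (uint32_t mask = 1u << 15; mask != 0; mask >>= 1) {
        if (numerator >= shifted) {         /* unsigned comparison */
            numerator -= shifted;           /* reduce the remainder */
            quotient += mask;               /* set this quotient bit */
        }
        shifted >>= 1;
    }
    return quotient;
}
```

Running the model and the assembly over the same inputs, especially numerators of 0x80000000 and above, is a quick way to catch a signed comparison that slipped in.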