2022-12-26 08:26 PM
I'm using an STM32G031F8 for a job right now. I'm coding in GCC assembly, no OS. This thing doesn't even have a hardware divide (much less an FPU). I pulled out some old routines from NXP ARM7 chips for doing unsigned 32-bit by 32-bit and 32-bit by 16-bit division, but those are in full ARM code, and this is minimal Thumb code. They look extremely cumbersome to translate, since the instruction set for this chip won't do an LSR or LSL without setting the condition flags, and all the routines I have need the condition flags to be retained across shifts.
Is there any available library of optimized assembly math routines for these processors? I don't need a whole pile of them; right now I'd be content with a 32-by-16 divide. Thanks!
2022-12-26 10:31 PM
You don’t say why you’re choosing to use assembly rather than (say) C.
Compilers have such subroutines to hand, and will attach them to your code as appropriate. So you could do worse than writing a small program that needs a division and seeing what the compiler produces. (You’ll need to convince the compiler to do the division at run time rather than at compile time.)
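As a rough sketch of that experiment, something like the following should be enough. The function name `div_u32` is made up for illustration, and `noinline` is just one way (assumed here) of stopping GCC from folding the division into a constant when the caller passes literals.

```c
#include <stdint.h>

/* Hypothetical helper: noinline keeps the division from being
   evaluated at compile time when called with constant arguments. */
__attribute__((noinline))
uint32_t div_u32(uint32_t num, uint32_t den)
{
    return num / den;   /* caller must make sure den != 0 */
}
```

Compiling with something like `arm-none-eabi-gcc -mcpu=cortex-m0plus -mthumb -O2 -S div.c` and reading the generated `.s` file should show a call to a run-time helper (typically `__aeabi_uidiv` for an unsigned 32-bit divide) rather than a divide instruction.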
But don’t forget that the greatest optimisation is not to have to divide at all, or not do it more than necessary. If you divide by the same value more than once, it can be faster to calculate the reciprocal once, then multiply by that as and when needed. You’ll need to choose an appropriate fixed-point representation.
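To make the reciprocal idea concrete, here is a minimal sketch, assuming a Q16 fixed-point reciprocal, a divisor in the range 1..65535, and numerators below 65536 so that the product fits in 32 bits. The names `recip_q16` and `div_by_recip` are invented for this example.

```c
#include <stdint.h>

/* Q16 reciprocal of the divisor: floor(2^16 / d). d must be 1..65535.
   This is the one real divide, done once (for example at init). */
uint32_t recip_q16(uint32_t d)
{
    return 65536u / d;
}

/* n / d for n < 65536, using the precomputed reciprocal. */
uint32_t div_by_recip(uint32_t n, uint32_t d, uint32_t recip)
{
    uint32_t q = (n * recip) >> 16;  /* estimate, never above the true quotient */
    uint32_t r = n - q * d;          /* remainder left over by the estimate */
    while (r >= d) {                 /* correct for the rounded-down reciprocal */
        q++;
        r -= d;
    }
    return q;
}
```

Because the reciprocal is rounded down, the estimate can come out slightly low, which is why the sketch ends with a small correction loop. Wider numerators would need a different scaling or a wider multiply.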
2022-12-26 11:17 PM
No, I didn't say. I was not looking for a discussion on my toolset, I was looking for a solution.
And I am quite aware of ways to get around doing a divide, but certain calibrations require at least one divide done at init time to generate a calibration constant.
Never mind, I already wrote it.
2022-12-26 11:33 PM
In case anybody else has a use for this:
(I apologize for the lousy formatting, the tabs got all messed up.)
          .global uDIV3215      // cheesy unsigned divide: R0 is numerator on entry, R1 is denominator (1..0xFFFF)
uDIV3215: push  {r1,r2,r3,lr}   // R0 is quotient on exit (comes out as 0xFFFF if the true quotient needs more than 16 bits)
          cmp   r1,#0
          beq   _zonk           // divide by 0
          ldr   r2,=1<<16
          cmp   r1,r2
          bhs   _zonk           // denominator too large (unsigned compare)
          ldr   r2,=0           // r2 is the answer, built bit by bit starting at the MSB
          lsls  r1,#15          // start with comparison of numerator to denominator * 2^15
          ldr   r3,=1<<15       // r3 is the quotient bit mask
_loop:    cmp   r0,r1           // is remainder (so far) >= denominator * 2^N ?
          blo   _next           // no (unsigned compare), just shift down for next
          subs  r0,r1           // yes, so reduce remainder...
          adds  r2,r3           // ... and add a bit to the quotient
_next:    lsrs  r1,#1           // divide denominator * 2^N by 2
          lsrs  r3,#1           // shift mask
          bne   _loop           // back jack do it again
          mov   r0,r2
          pop   {r1,r2,r3,pc}
_zonk:    ldr   r0,=-1          // error return: R0 = 0xFFFFFFFF
          pop   {r1,r2,r3,pc}
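For anyone wanting to check a routine like this on a PC before flashing it, a C model of the same shift-and-subtract loop can be handy. This is only a sketch: `udiv3215_model` is a made-up name, and it assumes unsigned comparisons throughout, which is what an unsigned divide needs for numerators of 0x80000000 and above.

```c
#include <stdint.h>

/* C model of the 32-by-16 shift-and-subtract divide.
   Returns 0xFFFFFFFF for a zero or too-large denominator. */
uint32_t udiv3215_model(uint32_t num, uint32_t den)
{
    if (den == 0u || den >= (1u << 16))
        return 0xFFFFFFFFu;

    uint32_t quot = 0u;
    uint32_t d    = den << 15;      /* denominator * 2^15, fits in 31 bits */
    uint32_t mask = 1u << 15;       /* quotient bit currently being tested */

    do {
        if (num >= d) {             /* unsigned compare */
            num  -= d;              /* reduce the remainder */
            quot += mask;           /* and set this quotient bit */
        }
        d    >>= 1;
        mask >>= 1;
    } while (mask != 0u);

    return quot;                    /* 0xFFFF if the true quotient needs more than 16 bits */
}
```

Comparing the model against plain `num / den` over a range of inputs where the quotient fits in 16 bits gives a quick sanity check, and the same test vectors can then be fed to the assembly version on the target.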