
This is an STM32CubeIDE question, or a GCC issue, regarding inline asm() on a Cortex-M4: rdhi and rdlo must be different for the UMULL instruction, but not for the SMULL instruction, for some reason.

TModj.1
Associate II

I need to inline some simple functions for a reverb pedal project, but ran into a bug regarding the reuse of registers with the 'umull' and 'smull' instructions.

I am using a Cortex-M4 with a single-cycle UMULL, so I should be able to use the same registers for source and destination, e.g. good: UMULL r1,r0,r0,r1. This should be OK on an M4.

On an M3 WITHOUT a single-cycle multiply this could be an issue, e.g. good: UMULL r2,r3,r0,r1, bad: UMULL r1,r0,r0,r1, because of the source/destination register reuse. But I am using an M4, so...

The code in assembly is easy to understand but impossible to inline ...

uumul_asm: // return ((r0 * r1) >> 32)

UMULL r1, r0, r0, r1

bx lr

ssmul_asm: // return ( (r0 * r1) >> 32)

SMULL r1, r0, r0, r1

bx lr

Now the abomination to inline these is ...

static inline __attribute__((always_inline)) U32 uumul(U32 x,U32 y)

{

asm(" umull %[yy],%[xx],%[xx],%[yy]" :: [xx] "r" (x) , [yy] "r" (y) );

return(x);

}

static inline __attribute__((always_inline)) S32 ssmul(S32 x,S32 y)

{

asm(" smull %[yy],%[xx],%[xx],%[yy]" :: [xx] "r" (x) , [yy] "r" (y) );

return(x);

}

The issue is that S32 ssmul(S32 x,S32 y) compiles without any problems...

but U32 uumul(U32 x,U32 y) will produce an error regarding the reuse of source and destination registers.

/tmp/cccWoXGD.s:667: rdhi and rdlo must be different

/tmp/cccWoXGD.s:730: rdhi and rdlo must be different

/tmp/cccWoXGD.s:1098: rdhi and rdlo must be different

/tmp/cccWoXGD.s:1189: rdhi and rdlo must be different

6 REPLIES

The error message is not about reuse of source and destination registers, but about identical registers in the destination register-pair.

Generate the asm (e.g. run compiler with -save-temps) and post together with the respective C source.

JW

You want to reuse the registers you put into the input constraints as outputs as well; this is certainly incorrect. IMO you should declare them as output constraints with "+r"; but I am not a gcc inline asm expert.

https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html
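For illustration, here is a rough sketch of what separate output operands could look like for the unsigned case. The names `lo` and `hi` and the `U32` typedef are assumptions (the thread never shows the typedef), and the non-ARM branch is only there so the snippet also builds on a host compiler. Because the low and high words are two distinct output operands, the compiler has to give them different registers, so the assembler's RdLo/RdHi check should not trigger; the inputs may still share a register with an output, which is harmless for a single instruction.

```c
#include <stdint.h>

typedef uint32_t U32; /* assumed definition; not shown in the thread */

/* Returns the high 32 bits of the 64-bit unsigned product x * y. */
static inline __attribute__((always_inline)) U32 uumul(U32 x, U32 y)
{
#if defined(__GNUC__) && defined(__arm__) && \
    (!defined(__thumb__) || defined(__thumb2__))
    U32 lo, hi;
    /* lo and hi are separate outputs, so they get distinct registers.
       x and y are inputs only and are never written by the asm. */
    __asm__("umull %[lo], %[hi], %[x], %[y]"
            : [lo] "=r" (lo), [hi] "=r" (hi)
            : [x] "r" (x), [y] "r" (y));
    (void)lo; /* low word is intentionally discarded */
    return hi;
#else
    /* Portable fallback for non-ARM builds (hypothetical addition). */
    return (U32)(((uint64_t)x * (uint64_t)y) >> 32);
#endif
}
```

With this form a call such as `uumul(a, a)` is also fine, since only the two inputs can end up in the same register.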

JW

TModj.1
Associate II

I agree that the register usage definitions are not proper, as there is no destination register. Something like

// asm( " instruction " : destination_regs : source_regs : clobber_regs )

asm(" umull %[yy],%[xx],%[xx],%[yy]" : [xx] "=r" (x) : [xx] (x) , [yy] "r" (y) ); seems more correct, but does not compile.

Actually, not having a destination register did produce incorrect code in 1 out of 25 instances. But I guess without a destination register the compiler thinks it returns void, e.g. void func(x,y).

Consider this example.

ssmul_ls2_asm:

smull r1, r0, r0, r1

lsl r0,r0,#2

add r0,r0,r1,lsr #30

bx lr

or

static inline __attribute__((always_inline)) S32 ssmul_ls2(S32 x,S32 y)

{

S32 z;

asm(" smull %[yy],%[xx],%[xx],%[yy]  \n" \

" lsl  %[xx],%[xx],#2        \n" \

" add  %[zz],%[xx],%[yy], LSR #30 \n" : [zz] "=r" (z) : [xx] "r" (x) , [yy] "r" (y));

return(z);

}

This works fine because the destination register is defined, the 'S32 z' variable will probably be optimized out.
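One caveat with the version above: it writes to `%[xx]` and `%[yy]`, which are declared as inputs only, so the compiler is entitled to assume those registers still hold x and y afterwards. A possible variant that only writes to declared outputs is sketched below. The names `lo` and `hi`, the typedefs, and the non-ARM fallback are assumptions added for illustration. No early-clobber marker is used because both inputs are consumed by the first instruction, which is also the first one to write an output.

```c
#include <stdint.h>

typedef int32_t  S32; /* assumed definitions; not shown in the thread */
typedef uint32_t U32;

/* Returns bits [61:30] of the signed 64-bit product x * y. */
static inline __attribute__((always_inline)) S32 ssmul_ls2(S32 x, S32 y)
{
#if defined(__GNUC__) && defined(__arm__) && \
    (!defined(__thumb__) || defined(__thumb2__))
    U32 lo;
    S32 hi;
    /* Only the declared outputs lo and hi are modified. */
    __asm__("smull %[lo], %[hi], %[x], %[y]\n\t"
            "lsl   %[hi], %[hi], #2\n\t"
            "add   %[hi], %[hi], %[lo], lsr #30"
            : [lo] "=r" (lo), [hi] "=r" (hi)
            : [x] "r" (x), [y] "r" (y));
    return hi;
#else
    /* Portable fallback for non-ARM builds (hypothetical addition).
       The unsigned shift keeps the same 32 bits as the asm; the final
       conversion to S32 wraps on GCC and Clang. */
    int64_t p = (int64_t)x * (int64_t)y;
    return (S32)(U32)((uint64_t)p >> 30);
#endif
}
```

As a quick sanity check, `ssmul_ls2(0x20000000, 0x20000000)` should give `0x10000000`, since the product 2^58 shifted right by 30 is 2^28.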

Now consider this one that does not work

static inline __attribute__((always_inline))

S32 ssmul(S32 x,S32 y)

{

S32 z;

asm(" smull %[yy],%[zz],%[xx],%[yy]" : [zz] "=r" (z) : [xx] "r" (x) , [yy] "r" (y) );

return(z);

}

The only way I have found to make it work is forcing it to use a specific register in the clobber registers section....

static inline __attribute__((always_inline)) S32 ssmul(S32 x,S32 y)

{

S32 z;

asm(" smull r1,%[zz],%[xx],%[yy]" : [zz] "=r" (z) : [xx] "r" (x) , [yy] "r" (y) : "r1" );

return(z);

}
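An alternative to hard-coding r1 in the clobber list might be to declare the unwanted low word as a second, dummy output and let the register allocator pick the register. This is only a sketch: `lo_unused` is a made-up name, the typedefs are assumed, and the non-ARM branch is added so the snippet can be compiled and checked on a host machine.

```c
#include <stdint.h>

typedef int32_t  S32; /* assumed definitions; not shown in the thread */
typedef uint32_t U32;

/* Returns the high 32 bits of the signed 64-bit product x * y. */
static inline __attribute__((always_inline)) S32 ssmul(S32 x, S32 y)
{
#if defined(__GNUC__) && defined(__arm__) && \
    (!defined(__thumb__) || defined(__thumb2__))
    S32 hi;
    U32 lo_unused;
    /* The dummy output replaces the fixed "r1" clobber: the compiler
       chooses a free register and keeps it distinct from hi. */
    __asm__("smull %[lo], %[hi], %[x], %[y]"
            : [lo] "=r" (lo_unused), [hi] "=r" (hi)
            : [x] "r" (x), [y] "r" (y));
    (void)lo_unused;
    return hi;
#else
    /* Portable fallback for non-ARM builds (hypothetical addition).
       Relies on arithmetic right shift of negative values, which GCC
       and Clang provide. */
    return (S32)(((int64_t)x * (int64_t)y) >> 32);
#endif
}
```

Compared with the fixed-register version, this leaves r1 available to the surrounding code when the function is inlined.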

This syntax is insane, both for the inline assembly and for the hoops you have to jump through to inline a simple function.

The gcc inline asm is unique in that it exists at all. It's a shortcut across the whole frontend and some of the backend, right before the register allocator. So you have to have a grasp of what the register allocator expects, both in general and in its particular incarnation for the given backend's target architecture.

As I've said I'm no expert, but one thing I know for sure - it's usually much easier to write true asm than (good and universally working) inline asm. So maybe instead of writing stubs of inline asm, you may want to experiment writing the whole (or substantial part of) routine in true asm, and then just call it from C.

JW

KnarfB
Principal III

Thanks for the link, @KnarfB​ 

"This is not exactly well documented, one has to browse the gcc sources to find it." :D But yes, this sums it well up.

Those examples won't enforce the same register, given separate input and output constraints; but they allow it so the compiler can decide to reuse them.

Also, OP here appears to be OK to throw away the lower 32 bits of result, so that's again a slightly different situation.

JW