
STMCUBEIDE - H7 series force use of SMLAL (Signed Multiply with Accumulate (32 × 32 + 64), 64-bit result) instruction

MikeDB
Lead

Code is :

int16_t 	OscPhase[NumOsc];
int32_t 	OscInc[NumOsc];
int32_t 	OscVol[NumOsc];
int32_t 	Sine[65536];
 
int64_t 	OscTotal;
 
 
and then in main() :
 
		OscTotal = 0;
		for (i = 0; i < NumOsc; i++)
		{
			OscPhase[i] = OscPhase[i] + OscInc[i];
			OscTotal = OscTotal + Sine[OscPhase[i]]  * OscVol[i];
		}

I was expecting the H7 to use the SMLAL instruction for the final multiply and accumulate, but instead it performs a MUL.W, which only gives a 32-bit result, and then uses an ADD.W and ADC.W to add these 32 bits into the final 64-bit result.

Any suggestions on how to force it to use the correct code ?

16 REPLIES

Use inline assembler.
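
For reference, a minimal sketch of what the inline-assembler route might look like with GCC. The helper name `mac_s32_s64` is hypothetical (it is not from this thread), and the architecture guard is an assumption about the predefined macros of an ARMv7-M / ARMv7E-M GCC build; on any other target the sketch falls back to plain C so it still compiles.

```c
#include <stdint.h>

/* Hypothetical helper (not from the thread): returns acc + (int64_t)a * b. */
static inline int64_t mac_s32_s64(int64_t acc, int32_t a, int32_t b)
{
#if defined(__GNUC__) && (defined(__ARM_ARCH_7M__) || defined(__ARM_ARCH_7EM__))
    /* Split the 64-bit accumulator into the two 32-bit halves SMLAL expects. */
    uint32_t lo = (uint32_t)acc;
    uint32_t hi = (uint32_t)((uint64_t)acc >> 32);

    /* SMLAL RdLo, RdHi, Rn, Rm : RdHi:RdLo += Rn * Rm (signed 32 x 32 -> 64). */
    __asm__ ("smlal %0, %1, %2, %3"
             : "+r" (lo), "+r" (hi)
             : "r" (a), "r" (b));

    return (int64_t)(((uint64_t)hi << 32) | lo);
#else
    /* Portable fallback (assumption: used when not building for ARMv7-M/7E-M). */
    return acc + (int64_t)a * (int64_t)b;
#endif
}
```

In the loop from the question this would be called as `OscTotal = mac_s32_s64(OscTotal, Sine[OscPhase[i]], OscVol[i]);`.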

JW

Oh come on, this is the 21st century. Even the Arduino compiler produces better code than this, so I'm sure it's just a compiler directive or something I am missing.

Then use Arduino.

JW

It doesn't support the full M7 instruction set. Only the M0/3.

Does the shiny new STMCubeIDE support the M7 fully ? Including the H7 ?

Else, get a proper toolchain.

I had assumed that as it was ST's newest introduction that it would be worth a try so I decided to install it yesterday and give it a good run-through with existing code. But with debugging not working and the code being not fully optimised I am indeed on the point of reverting to my previous toolchain.

Have any STM people on here actually used it yet ?

It seems to be the refurbished (downscaled ???) Atollic toolchain, now under ST control.

And remarkably, there appear to be developers present here (former Atollic staff, as it seems).

Perhaps you can catch their attention.

From their video

[Image: 0690X000008BBOnQAO.png]

If it's the ~ OFFICIAL ~ STM32 IDE then I'd expect it to work and work well !!

I've tried every optimisation level and the best it can manage is

mul.w   r2, r2, r3
adds    r4, r4, r2
adc.w   r5, r5, r2, asr #31

which of course is both inefficient and fundamentally wrong, as it has thrown away the upper 32 bits of the product.

Try casting Sine with (int64_t) in the computation.
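
A minimal sketch of that cast, assuming the loop is wrapped in a hypothetical function `osc_sum` (the name and parameter list are illustrative, not from the original code). In C, `int32_t * int32_t` is evaluated as a 32-bit multiply, so the compiler is allowed to use MUL; widening one operand to `int64_t` first makes the product 64-bit, which should let GCC pick SMLAL when optimising for the Cortex-M7. The sketch also assumes the phase is stored as `uint16_t` rather than the `int16_t` in the question, since a signed phase goes negative past 32767 and would index before the start of the table.

```c
#include <stdint.h>

/* Hypothetical wrapper around the loop from the question.
 * sine must point to a 65536-entry table, as in the original code. */
int64_t osc_sum(uint16_t *phase, const int32_t *inc, const int32_t *vol,
                const int32_t *sine, int n)
{
    int64_t total = 0;

    for (int i = 0; i < n; i++) {
        /* Advance the phase; unsigned arithmetic wraps cleanly at 65536. */
        phase[i] = (uint16_t)((uint32_t)phase[i] + (uint32_t)inc[i]);

        /* Cast one operand so the multiply is 32 x 32 -> 64, not 32 x 32 -> 32. */
        total += (int64_t)sine[phase[i]] * vol[i];
    }
    return total;
}
```

Whether SMLAL is actually emitted can be checked in the disassembly view or the listing file for the build.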
