
STM32H7 faster using floats than using uint32_t

JLope.11
Associate III

I wrote the following two versions of the same routine to process DMA data, one using floats and the second using uint32_t to try to make it run faster, but to my surprise the first one ran faster:

 

 

//161
void process_data_ADC1()
{
	float samples1=(2.0f/(float)ADC_BUF1);  // =1/samples
	uint32_t media1=0,media2=0;
	for (int i=0;i<ADC_BUF1/16;i++)
		{
			for (int j=0;j<8;j++)
				{
					media1+=buffer_ADC1[i*16+j];
					media2+=buffer_ADC1[i*16+j+8];
				}
		}
	float med1=(float) media1*samples1;
	float med2=(float) media2*samples1;
	avg16[1]=(uint16_t) (16.0f*med1+0.5f);
	avg16[2]=(uint16_t) (16.0f*med2+0.5f);
	float rms1=0.0f,rms2=0.0f,x;
	for (int i=0;i<ADC_BUF1/16;i++)
		{
			for (int j=0;j<8;j++)
				{
					x=(buffer_ADC1[i*16+j]  -med1);rms1+=x*x;//max 2^12 without overflow for 12 bits
					x=(buffer_ADC1[i*16+j+8]-med2);rms2+=x*x;
				}
		}
	rms16[1]=(uint16_t) (16.0f*sqrt(rms1*samples1)+0.5f);
	rms16[2]=(uint16_t) (16.0f*sqrt(rms2*samples1)+0.5f);
}
//161b Using uint32 instead of float: IT IS SLOWER!!!!!!!
void process_data_ADC1_fast()
{
	float samples1=(2.0f/(float)ADC_BUF1);  // =1/samples
	uint32_t media1=0,media2=0,samples00=ADC_BUF1/2;
	for (int i=0;i<ADC_BUF1/16;i++)
		{
			for (int j=0;j<8;j++)
				{
					media1+=buffer_ADC1[i*16+j];
					media2+=buffer_ADC1[i*16+j+8];
				}
		}
	float med1=(float) media1*samples1;
	float med2=(float) media2*samples1;
	avg16[1]=(uint16_t) (16.0f*med1+0.5f);
	avg16[2]=(uint16_t) (16.0f*med2+0.5f);
	media1=media1/samples00;media2=media2/samples00;
	uint32_t rms1=0,rms2=0,x;
	for (int i=0;i<ADC_BUF1/16;i++)
		{
			for (int j=0;j<8;j++)
				{
					x=(buffer_ADC1[i*16+j]  -med1);rms1+=x*x;//max 2^12 without overflow for a 12-bit ADC and 32-bit variables
					x=(buffer_ADC1[i*16+j+8]-med2);rms2+=x*x;
				}
		}
	x=16.0f*sqrt((float) rms1*samples1);rms16[1]= (uint16_t) (x+0.5f);
	x=16.0f*sqrt((float) rms2*samples1);rms16[2]= (uint16_t) (x+0.5f);
}
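In `process_data_ADC1_fast()` the integer means `media1`/`media2` are computed but never used: the second loop still subtracts the float `med1`/`med2`, so each iteration is still a float subtraction followed by a float-to-`uint32_t` conversion (and converting a negative float to an unsigned type is undefined behavior in C). The last two lines also store `16.0f*sqrt(...)` into the `uint32_t` `x`, truncating it before `+0.5f` is applied. This may be why both versions time the same. Below is a minimal all-integer sketch, assuming `buffer_ADC1` is a `uint16_t` array of 12-bit samples interleaved 8 per channel as in the original loops, with `ADC_BUF1` a multiple of 16. It swaps in a different formulation: a single pass accumulating the sum and the sum of squares, then n²·var = n·Σx² - (Σx)², which is exact in integer arithmetic. `ch_stats_t`, `stats_int` and `isqrt64` are hypothetical names, and `ADC_BUF1 = 64` is only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define ADC_BUF1 64u /* hypothetical buffer length; must be a multiple of 16 */

/* Hypothetical result type: both values scaled by 16, like avg16[] / rms16[] */
typedef struct {
    uint16_t avg16;
    uint16_t rms16;
} ch_stats_t;

/* floor(sqrt(v)) using the bit-by-bit (digit-by-digit) method */
static uint32_t isqrt64(uint64_t v)
{
    uint64_t res = 0;
    uint64_t bit = (uint64_t)1 << 62;
    while (bit > v) {
        bit >>= 2;
    }
    while (bit != 0) {
        if (v >= res + bit) {
            v -= res + bit;
            res = (res >> 1) + bit;
        } else {
            res >>= 1;
        }
        bit >>= 2;
    }
    return (uint32_t)res;
}

/* One pass over one channel: offset 0 = first 8 samples of each 16-sample
   block, offset 8 = last 8, matching the loops in process_data_ADC1(). */
static ch_stats_t stats_int(const uint16_t *buf, uint32_t offset)
{
    assert(offset == 0 || offset == 8);
    const uint32_t n = ADC_BUF1 / 2; /* samples per channel */
    uint32_t sum = 0;                /* at most 4095 * n */
    uint64_t sumsq = 0;              /* 64-bit: 4095^2 * n overflows 32 bits once n > 256 */

    for (uint32_t i = 0; i < ADC_BUF1 / 16; i++) {
        for (uint32_t j = 0; j < 8; j++) {
            uint32_t s = buf[i * 16 + j + offset];
            sum += s;
            sumsq += (uint64_t)s * s;
        }
    }

    /* n^2 * variance = n * sum(x^2) - (sum x)^2: exact and never negative */
    uint64_t n2var = (uint64_t)n * sumsq - (uint64_t)sum * sum;

    ch_stats_t r;
    /* round(16 * mean) = (16 * sum + n/2) / n */
    r.avg16 = (uint16_t)((16u * (uint64_t)sum + n / 2) / n);
    /* round(16 * sqrt(variance)) ~= (isqrt(256 * n^2 * var) + n/2) / n;
       256 * n2var fits in 64 bits for 12-bit samples up to n = 65536 */
    r.rms16 = (uint16_t)((isqrt64(256u * n2var) + n / 2) / n);
    return r;
}
```

On the target this would be used as `ch_stats_t r = stats_int(buffer_ADC1, 0); avg16[1] = r.avg16; rms16[1] = r.rms16;`, and likewise with offset 8 for index 2. On a Cortex-M7 with a hardware FPU, float adds and multiplies are cheap, so any gain mainly comes from making one pass instead of two; it is worth measuring with optimization enabled rather than assuming.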

 

 

 

This is the routine used to measure time:

 

 

uint32_t measure_time(void)
{
	static uint32_t start = 0;
	uint32_t time2= SysTick->VAL;
	time2=start-time2;
	//DELAY_US(10);
	start=SysTick->VAL;
	return (time2);
}
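`measure_time()` returns `start - time2`, which is the right count only if SysTick did not reload between the two reads, and on the very first call it compares against `start = 0`. A minimal wrap-aware sketch, assuming at most one reload between reads; `systick_elapsed` is a hypothetical helper, and on the target `load` would be `SysTick->LOAD`. The result is in SysTick clock ticks, which are not necessarily CPU cycles.

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed SysTick counts between two reads of SysTick->VAL.
   SysTick counts DOWN from LOAD to 0 and then reloads LOAD, so one full
   period is LOAD + 1 counts. Assumes at most one reload between reads. */
static uint32_t systick_elapsed(uint32_t start, uint32_t now, uint32_t load)
{
    assert(load <= 0x00FFFFFFu);          /* SysTick is a 24-bit counter */
    assert(start <= load && now <= load);
    if (start >= now) {
        return start - now;               /* no reload in between */
    }
    return start + (load + 1u) - now;     /* counter reloaded once */
}
```

On the target: `uint32_t t = systick_elapsed(start, SysTick->VAL, SysTick->LOAD);`. Discarding the first result, or initializing `start` from `SysTick->VAL` before the first measurement, avoids the bogus first value.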

 

 

(It surprised me that the SysTick timer counts down.)

 

In debug mode it took:

49321 ticks for the float routine

49321 ticks for the uint32_t routine

 

 

4 REPLIES
Andrew Neil
Evangelist III

@JLope.11 wrote:

one using floats and the second using uint32_t to try run faster, but as a surprise the first one run faster:


but then

 


@JLope.11 wrote:

It took in debug mode:

49321 ticks the float routine

49321 tics the uint32_t routine 


So they actually take the same time?

On a CPU with a hardware floating-point unit, I don't think that's necessarily surprising?

AScha.3
Chief II

Your first "time2" is 

time2=start-time2;

time2 = 0 - SysTick, which is negative... OK?

The next time2 = old SysTick - new SysTick, also negative...  ed.
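Since `time2` and `start` are `uint32_t`, these differences never actually go negative in C: unsigned subtraction wraps modulo 2^32, so a "negative" result shows up as a very large number, which is what the first call (with `start = 0`) returns. When SysTick's LOAD is 0xFFFFFF, one period is exactly 2^24 counts, so masking the wrapped difference to 24 bits gives the correct elapsed count even across one reload. A small sketch; `elapsed24` is a hypothetical name, and the masking trick does not hold for other LOAD values.

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed ticks of a 24-bit down-counter whose reload value is 0xFFFFFF.
   uint32_t subtraction wraps modulo 2^32; masking to 24 bits turns that
   into the correct count even if the counter reloaded once. */
static uint32_t elapsed24(uint32_t start, uint32_t now)
{
    assert(start <= 0x00FFFFFFu && now <= 0x00FFFFFFu);
    return (start - now) & 0x00FFFFFFu;
}
```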


Yes, SysTick counts down, is only 24 bits wide, and often has a DIV8 prescaler.

However, DWT CYCCNT is 32 bits wide and counts processor cycles upward.
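A sketch of cycle timing with DWT CYCCNT. The enable sequence appears only as comments, using CMSIS names from the Cortex-M7 core header (`CoreDebug`, `DWT`, `DWT_CTRL_CYCCNTENA_Msk`); check them against your device headers, since newer CMSIS versions may rename some of these. Because the counter counts up and is 32 bits wide, `end - start` in `uint32_t` is correct across a single wrap. `cycles_elapsed` and `cycles_to_ns` are hypothetical helpers, and the core clock you pass in is an assumption taken from your own configuration (for example `SystemCoreClock`).

```c
#include <assert.h>
#include <stdint.h>

/* On the target (CMSIS names; verify against your device header):
 *   CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;  // enable the DWT unit
 *   DWT->LAR = 0xC5ACCE55;                           // unlock; only needed on some Cortex-M7 parts
 *   DWT->CYCCNT = 0;
 *   DWT->CTRL |= DWT_CTRL_CYCCNTENA_Msk;             // start the cycle counter
 *   uint32_t t0 = DWT->CYCCNT;
 *   process_data_ADC1();
 *   uint32_t cycles = cycles_elapsed(t0, DWT->CYCCNT);
 */

/* Up-counter: unsigned subtraction is correct modulo 2^32, even across one wrap. */
static uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Convert a cycle count to nanoseconds for a core clock given in Hz. */
static uint64_t cycles_to_ns(uint32_t cycles, uint32_t core_hz)
{
    assert(core_hz != 0u);
    return ((uint64_t)cycles * 1000000000u) / core_hz;
}
```

Timing an optimized build rather than a debug build gives numbers closer to what the release firmware will do.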

Andrew Neil
Evangelist III