I recently noticed a strange case on STM32F746.

I measure the execution time of this function in clock cycles; it is just a simple while loop:

```c
void LK_ADDr(float *im, float *meanParameter, int Size)
{
    while (Size--)
    {
        *im = *im + *meanParameter;
        im++;
        meanParameter++;
    }
}
```
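For context, the usual way to get per-call cycle counts on this core is the DWT cycle counter; I'm assuming the measurement was done this way or similarly. This is a hardware-specific sketch (register addresses are from the ARMv7-M architecture manual) and only runs on the target, not on a host:

```c
/* Cortex-M7 / STM32F746: cycle counting via the DWT unit.
 * Target-only code; these are memory-mapped core debug registers. */
#include <stdint.h>

#define DWT_CTRL   (*(volatile uint32_t *)0xE0001000u)
#define DWT_CYCCNT (*(volatile uint32_t *)0xE0001004u)
#define DEMCR      (*(volatile uint32_t *)0xE000EDFCu)

static inline void cycle_counter_init(void)
{
    DEMCR |= (1u << 24);   /* TRCENA: enable the DWT unit */
    DWT_CYCCNT = 0;        /* reset the cycle counter */
    DWT_CTRL |= 1u;        /* CYCCNTENA: start counting */
}

/* usage:
 *   cycle_counter_init();
 *   uint32_t t0 = DWT_CYCCNT;
 *   LK_ADDr(im, meanParameter, Size);
 *   uint32_t cycles = DWT_CYCCNT - t0;
 */
```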

Intuitively, the **execution time** should have a linear relationship with **Size**.

But in testing I found that it is quadratic in **Size**: the R² of a second-order trend line is essentially equal to 1.


I ran the function 10K times for every Size value and took the average execution time, so I am confident the measurements are correct.

In the trend function, the constant term of 28 is easy to explain: it is the fixed cost of the call itself (stack push/pop and parameter loading).

The linear coefficient of 5.63 is also explainable: it is the cycle cost of each loop iteration.
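Just to make the model explicit, the linear part of the fit predicts a cycle count of 28 + 5.63 × Size (these are the fitted coefficients quoted above, not values measured here):

```c
#include <assert.h>
#include <math.h>

/* Linear part of the fitted trend: ~28 cycles of fixed call overhead
 * plus ~5.63 cycles per loop iteration. */
static double predicted_cycles(int size)
{
    return 28.0 + 5.63 * (double)size;
}
```

Any per-iteration cost that itself grows with Size shows up as the quadratic term on top of this.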

What I don't understand is the meaning of the quadratic coefficient.

I then increased Size further. The quadratic coefficient approaches zero and the curve becomes essentially linear.

But the linear coefficient increased from 5 to 7. Does that mean the time cost of each loop iteration increases as Size increases?

I also plotted the quadratic coefficient itself: for each point, I fit a trend curve to the first several points of the first picture, and the Y axis shows the quadratic coefficient of that fit.

It looks interesting. It is not due to the FPU, because I repeated the test with random int data and saw the same behaviour, so it is not caused by the data either.

I suspect the branch predictor, but its hit rate should be over 95%. In this curve, the largest per-loop cost is double the smallest; if the branch predictor were responsible, it would not be a very good predictor.

Why does the average loop time change so dramatically?

Have you disabled interrupts before calling the function (SysTick, etc.)?

And what about the instruction and data caches?
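The data cache is a plausible culprit: each iteration touches one float in each of the two arrays, so the working set grows with Size. On the STM32F746 the L1 data cache is 4 KB (per the STM32F7 documentation; worth double-checking for your exact part), so a quick sketch of where the arrays stop fitting:

```c
#include <assert.h>

/* Assumed L1 data cache size for the STM32F746 (4 KB; verify against
 * the datasheet for your part). */
#define DCACHE_BYTES 4096u

/* The loop reads one float from im and one from meanParameter per
 * iteration (and writes im back), so it touches 2 * Size floats. */
static unsigned working_set_bytes(unsigned size)
{
    return 2u * size * (unsigned)sizeof(float);
}

static int fits_in_dcache(unsigned size)
{
    return working_set_bytes(size) <= DCACHE_BYTES;
}
```

Under these assumptions the two arrays stop fitting in the data cache around Size = 512, after which cache misses add a Size-dependent per-iteration cost.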