STM32F429 vs STM32F767, processing speed?

oguzhan demirci · ‎2017-03-20

Posted on March 20, 2017 at 13:31

Hi,

We are testing code generated from Simulink. The code operates on single-precision operands. When we compile it with Keil, selecting the STM32F767 and the SP FPU, the maximum time to complete one cycle of the algorithm is longer than the maximum time on the STM32F429 under the same conditions. In other words, the F429 completes the algorithm faster than the F767.

Based on the technical manual, we expected the FPU of the F767 to be faster than that of the F429. In contrast, the algorithm takes longer on the F767 than on the F429.

The CPU clock of the F767 is 216 MHz and that of the F429 is 168 MHz. We selected the same optimization level in Keil before compiling.

The algorithm time is measured with a timer that is started at the beginning of the algorithm and stopped at the end.
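Raw tick counts from two parts running at different clocks are not directly comparable; they have to be converted to time first. Below is a minimal sketch of that conversion, assuming a free-running 32-bit up-counter whose input clock is known. The function name `ticks_to_ns` and the parameter `timer_hz` are hypothetical, and on a real board the timer clock may differ from the 168 MHz or 216 MHz core clock depending on the bus prescalers.

```c
#include <assert.h>
#include <stdint.h>

/* Convert the distance between two readings of a free-running 32-bit
 * up-counter into nanoseconds. timer_hz is the counter's input clock
 * (an assumption: it must be taken from the actual clock tree). */
uint64_t ticks_to_ns(uint32_t start, uint32_t end, uint32_t timer_hz)
{
    assert(timer_hz != 0u);

    /* Unsigned subtraction stays correct across a single counter wrap. */
    uint32_t elapsed = end - start;

    /* Widen before multiplying so the product cannot overflow 32 bits. */
    return ((uint64_t)elapsed * 1000000000ull) / timer_hz;
}
```

With this conversion, 168 ticks at 168 MHz and 216 ticks at 216 MHz both come out as 1000 ns, so a larger tick count on the faster part does not by itself mean a longer run time.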

The question is: why does the F767 run slower than the F429?

Thank you for your answers,

Oğuzhan Demirci

#stm32f767 #stm32f429 #fp

FLast.17.70 · ‎2018-03-22

Posted on March 22, 2018 at 17:24

Sure. If you need double floating point, the STM32F767 will be a lot faster. The STM32F4 can only do single precision float, so when it comes across double, the compiler will pull in the double float library functions and you will see a dramatic slow-down.

The 20-30% performance improvement I saw was with the exact same code only using single precision float. And I suspect if you don't actually need double precision but still use double where float would be sufficient, you will see a small performance penalty because even though the F7 can do double in hardware, it will still need to do 64-bit as opposed to 32-bit accesses, resulting in some performance drop.
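One common way double arithmetic slips into nominally single-precision C code is an unsuffixed floating-point literal, which has type double. The sketch below uses standard C and hypothetical function names to show the difference. Whether the generated Simulink code contains such literals is not known from this thread. With GCC, the `-Wdouble-promotion` warning can flag the implicit promotion.

```c
#include <assert.h>

/* The unsuffixed literal 1.5 has type double, so x is promoted and the
 * multiply is carried out in double precision before the result is
 * converted back to float on return. */
float scale_promoted(float x)
{
    return x * 1.5;
}

/* The f suffix keeps the whole expression in single precision. */
float scale_single(float x)
{
    return x * 1.5f;
}

/* The type of each product, checked through its size. */
void check_product_types(void)
{
    float x = 2.0f;
    assert(sizeof(x * 1.5) == sizeof(double));
    assert(sizeof(x * 1.5f) == sizeof(float));
    (void)x;
}
```

On a part with only a single-precision FPU, the first form is the one that would pull in the software double routines described above.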

Again, all I can say is that if you want maximum performance out of the Cortex M7, you need to understand where the performance improvements can come from; if you just blindly expect that it will perform better without you having to do anything, you might be disappointed. This is a complex beast.

AVI-crak · ‎2018-03-22

Posted on March 22, 2018 at 18:54

Instructions with single and double precision are not a performance indicator. Users most often flash the LED - literally.

On the Cortex-M7, everything connected to external peripherals is very slow compared to the Cortex-M4. Yes, it works quickly and completely autonomously. But the access speed is set not by the peripheral itself but by the data bus arbiter. Just look at the second page of the project in CubeMX to see the differences.

GCC for Cortex-M0 used the instruction sequence ldrb + uxtb, because ldrb filled the register with extra garbage. Cortex-M3 and higher do not add garbage when reading one byte, but the instruction sequence remains the same. In that case the uxtb that strips the garbage after the byte read is an unnecessary instruction. Among other things, this adds an extra 1-10% to the code size.

GCC schedules the program code automatically based on the settings for the ancient Cortex-M0. Someone forgot to fill in the instruction timing tables for the newer cores. As a result, the pipeline is not used optimally even on the Cortex-M4. This is quite easy to check by comparing hand-written assembler with the compiler's output.

So Keil / IAR / GCC do not know about the latency of the external peripherals. To them, it is just a linear address space.

Although IAR and Keil have mechanisms for predicting the execution of processor instructions, they cannot predict the behavior of the external data bus.

The general proposal is a separate experimental branch in the GCC development environment.

In fact, there is no need to write complex algorithms, but it takes a lot of monotonous manual work, robot-style (literally). And, of course, access to 'secret' information from ST itself.

And of course they want to work for free (without me).

Tesla DeLorean · ‎2018-03-22

Posted on March 22, 2018 at 19:21

>>Instructions with single and double precision are not a performance indicator. Users most often flash the LED - literally.

And roughly half the population is at or below average intelligence. Being in a STEM field moves the needle, but honestly not much.
