STM32F7 vs STM32F4 performances

edouard2
Associate II
Posted on October 15, 2015 at 17:56

I have a question about the performance of the STM32F7 vs. the STM32F4. (Sorry, I didn't find the answer on the web.)

If I run both processors at the same frequency and execute code from internal flash, will the STM32F7 be faster than the F4?

If I understand correctly, one major advantage of the Cortex-M7 series compared to the Cortex-M4 is the introduction of cache. But if I execute code from internal flash, does the cache increase performance compared to the ART accelerator that already exists on ST's Cortex-M4 parts? (In the STM32F4 documentation I can read: "Based on CoreMark benchmark, the performance achieved thanks to the ART accelerator is equivalent to 0 wait state program execution from Flash memory at a CPU frequency up to 180 MHz.")

Thanks for your help

#stm32f7
4 REPLIES
stm322399
Senior
Posted on October 15, 2015 at 18:34

Edouard, the short answer is yes.

The F4 and F7 can both run code from Flash through the ART, which provides near zero-wait-state access to instructions. This is the recommended setup.

The F7 can additionally execute code through the ICache, bypassing the ART. The Flash controller can serve either the ART or the ICache (through AXI).

The Cortex-M7 is a dual-issue superscalar CPU, whereas the Cortex-M4 can only issue a single instruction at a time. In other words, the M7 can execute two instructions in parallel when they have no dependencies on each other. This is the feature from which you can expect the most performance improvement at the same frequency. Of course, do not expect x2 in performance; depending on the workload, it will land somewhere between x1 and x2.
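To make the "no dependencies on each other" point concrete, here is a minimal sketch in C. The function names are made up for illustration, and whether the second version really runs faster on an M7 depends on the compiler, the optimization level and the memory the data sits in, so treat it as something to measure rather than a guarantee.

```c
#include <stddef.h>
#include <stdint.h>

/* One accumulator: every add needs the result of the previous add,
   so the adds form a single serial dependency chain. */
uint32_t sum_single_chain(const uint32_t *data, size_t n)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += data[i];
    }
    return acc;
}

/* Two accumulators: the two adds in one iteration do not depend on each
   other, which gives a dual-issue core the chance to pair them. */
uint32_t sum_two_chains(const uint32_t *data, size_t n)
{
    uint32_t acc0 = 0, acc1 = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        acc0 += data[i];
        acc1 += data[i + 1];
    }
    if (i < n) {              /* odd element left over */
        acc0 += data[i];
    }
    return acc0 + acc1;       /* same result as the single chain */
}
```

Both functions return the same value for unsigned data; only the shape of the dependency chain differs. A single-issue core such as the M4 has little to gain from the second form, while a dual-issue core may.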

stm322399
Senior
Posted on October 15, 2015 at 18:51

I realized I failed to give a clear answer: the ICache will not bring a significant improvement over the ART. One can even argue that ICache accesses to the Flash will increase AXI contention, which might slow down the data path!

Avoid using the ICache to run from Flash (ART and AXI accesses to Flash use separate memory regions). The ICache will bring a benefit when code runs from SRAM or DRAM.
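As an illustration of the "separate memory regions" remark, the sketch below classifies an address by the path it would take to the Flash. The base addresses and the 1 MB size are assumptions taken from the STM32F74x/F75x memory map, and the names `flash_path_for` and `flash_path_t` are made up for this example; check the reference manual of your own part before relying on any of it.

```c
#include <stdint.h>

/* ASSUMPTION: memory map of an STM32F74x/F75x part with 1 MB of Flash. */
#define FLASH_ITCM_BASE 0x00200000u  /* Flash seen through ITCM (ART path)   */
#define FLASH_AXIM_BASE 0x08000000u  /* Flash seen through AXIM (cache path) */
#define FLASH_SIZE      0x00100000u  /* 1 MB, part dependent                 */

typedef enum {
    PATH_NOT_FLASH,
    PATH_ITCM_ART,
    PATH_AXIM_CACHE
} flash_path_t;

/* Report which Flash alias an address falls into. The unsigned subtraction
   wraps for addresses below the base, so one compare covers both bounds. */
flash_path_t flash_path_for(uint32_t addr)
{
    if ((uint32_t)(addr - FLASH_ITCM_BASE) < FLASH_SIZE) {
        return PATH_ITCM_ART;
    }
    if ((uint32_t)(addr - FLASH_AXIM_BASE) < FLASH_SIZE) {
        return PATH_AXIM_CACHE;
    }
    return PATH_NOT_FLASH;
}
```

In practice the path is decided by the address your code is linked at, so a check like this is mostly useful as a sanity test on a function address or on the origin chosen in the linker script.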

Posted on October 15, 2015 at 18:52

If you buy into the benchmarks, an F7 @ 210 MHz is 2x an F4 @ 168 MHz.

Synthetic benchmarks are a salesman's tool. The only thing that matters is your algorithm and how it performs in reality. The only way to really understand that at a system level is to measure it.
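One common way to measure on both the M4 and the M7 is the DWT cycle counter defined by the ARMv7-M architecture. The sketch below is a minimal version of that idea, not a finished benchmark harness: the function names are made up, `__ARM_ARCH_7EM__` is the macro GCC and Clang define for these cores (other toolchains may differ), and the host branch is only a stand-in so the code compiles off-target. Some Cortex-M7 parts may also need the DWT unlocked through its lock access register first; check your reference manual.

```c
#include <stdint.h>

#if defined(__ARM_ARCH_7EM__)
/* Core debug registers defined by the ARMv7-M architecture. */
#define DEMCR      (*(volatile uint32_t *)0xE000EDFCu)
#define DWT_CTRL   (*(volatile uint32_t *)0xE0001000u)
#define DWT_CYCCNT (*(volatile uint32_t *)0xE0001004u)

void cycles_init(void)
{
    DEMCR |= (1u << 24);   /* TRCENA: enable the DWT unit */
    DWT_CYCCNT = 0u;
    DWT_CTRL |= 1u;        /* CYCCNTENA: start the cycle counter */
}

uint32_t cycles_now(void) { return DWT_CYCCNT; }
#else
/* Host build: a stand-in tick source, NOT a real cycle count. */
static uint32_t host_ticks;
void cycles_init(void) { host_ticks = 0u; }
uint32_t cycles_now(void) { return ++host_ticks; }
#endif

/* Unsigned subtraction stays correct across one 32-bit wraparound. */
uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Hypothetical placeholder; replace with the algorithm you care about. */
void empty_workload(void) { }

/* Time one call of fn and return the elapsed counter ticks. */
uint32_t measure(void (*fn)(void))
{
    uint32_t start = cycles_now();
    fn();
    return cycles_elapsed(start, cycles_now());
}
```

Run the same workload on both parts, with the same compiler flags and the code placed where it will live in the real product, and compare the counts. Timing several runs helps separate cold-cache from warm-cache behaviour on the F7.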

The ART in the F4 does a very good job of masking the slowness of the Flash, and provides a prefetch path that outstrips an SRAM cycle (i.e. it already has the data and pushes it, rather than having to go fetch it).

The F7 has caching architected into the design, and it is tightly coupled with other features of the processor, and works across different memories uniformly.

An M7 with a 64-bit FPU would be significantly more dangerous than an M4 in math-heavy applications.

edouard2
Associate II
Posted on October 16, 2015 at 17:39

OK, thanks for your answers. I now have a better understanding of the difference between the F4 and the F7.