
STM32F7 maximum speed

1991red1991
Associate III

Recently I started working with the STM32F7 chip, as the STM32F4 lacked speed. It turned out that if you simply run the same program (calculating digital filters) on the F4 and the F7 at the maximum core frequency, the performance does not increase much. But if you enable the first-level cache on the F7, the calculations are much faster. The question is: how is this cache used? It's only 4 KB for RAM and 4 KB for flash. How does the compiler decide what to load there? I also read that on the F7, TCM memory can be used to increase performance, but I can't figure out how to work with it, either in Keil or in STM32CubeIDE. Can anyone explain how to work with TCM memory?

S.Ma
Principal

Look at your architecture first, and answer these questions:

  • Do you have lots of DMA transfers?
  • Do you have lots of interrupts firing frequently?
  • Do you compute with float or double?
  • Do you use external serial memories?
Piranha
Chief II

The compiler doesn't decide what to load into cache memory; the CPU does it in hardware. Using the instruction cache doesn't require anything special at all. Using the data cache won't require anything special for most of the code, but DMA buffers need special care. For those, the user can configure memory regions with different properties in the MPU and/or call clean/invalidate functions on memory blocks. The latter also requires the DMA buffer's size and position to be aligned to the cache line size, which, by the way, is one of many things that ST's code monkeys are not able to grasp.
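A minimal sketch of the alignment rule, assuming GCC/Clang attribute syntax and the Cortex-M7's 32-byte data cache line. `rx_buf`, `RX_LEN`, `CACHE_ROUND_UP` and `cache_window` are hypothetical names. On the target you would hand the computed window to the CMSIS functions `SCB_CleanDCache_by_Addr()` / `SCB_InvalidateDCache_by_Addr()`; this host-runnable sketch only mentions them in a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cortex-M7 data cache line size in bytes. */
#define DCACHE_LINE 32u

/* Round n up to a whole number of cache lines. */
#define CACHE_ROUND_UP(n) (((n) + DCACHE_LINE - 1u) & ~(size_t)(DCACHE_LINE - 1u))

/* Hypothetical DMA buffer: 100 useful bytes, padded to whole lines and
   aligned to a line boundary so no other variable shares its cache lines. */
#define RX_LEN 100u
static uint8_t rx_buf[CACHE_ROUND_UP(RX_LEN)] __attribute__((aligned(DCACHE_LINE)));

/* Compute the line-aligned window covering [addr, addr + len).
   On the target, start/size would go to SCB_CleanDCache_by_Addr() before a
   memory-to-peripheral DMA, or SCB_InvalidateDCache_by_Addr() after a
   peripheral-to-memory DMA. */
static void cache_window(uintptr_t addr, size_t len, uintptr_t *start, size_t *size)
{
    assert(len > 0u);
    uintptr_t first = addr & ~(uintptr_t)(DCACHE_LINE - 1u);
    uintptr_t end = (addr + len + DCACHE_LINE - 1u) & ~(uintptr_t)(DCACHE_LINE - 1u);
    *start = first;
    *size = (size_t)(end - first);
}
```

The padding matters because cache maintenance works on whole lines: if the 100-byte buffer were not padded to 128 bytes, invalidating its last line would also throw away any cached writes to whatever variable the linker placed in the remaining 28 bytes.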

The compiler doesn't decide. The cache is part of the CPU, and it is designed to hold nearby and recently used memory content. Perhaps take a course on microcontroller architectures, or read a book on the topic. Think of it as taking a handful of candy out of a bag so you can pick the pieces from your hand rather than picking individual pieces from the bag each time.
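To make the "handful of candy" picture concrete, here is a toy model that counts cache hits when an array of 32-bit words is read sequentially. It assumes a direct-mapped 4 KB cache with 32-byte lines; the real Cortex-M7 caches are set-associative, so treat this as an illustration of locality rather than a model of the hardware. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE  32u                      /* bytes per cache line */
#define CACHE_SIZE 4096u                    /* 4 KB cache */
#define NUM_LINES  (CACHE_SIZE / LINE_SIZE) /* 128 lines */

/* Direct-mapped: each memory line can live in exactly one cache slot. */
typedef struct {
    bool     valid[NUM_LINES];
    uint32_t tag[NUM_LINES];
} toy_cache;

/* Returns true on a hit; on a miss, loads the line (evicting the old one). */
static bool toy_cache_access(toy_cache *c, uint32_t addr)
{
    uint32_t line  = addr / LINE_SIZE;
    uint32_t index = line % NUM_LINES;
    uint32_t tag   = line / NUM_LINES;

    if (c->valid[index] && c->tag[index] == tag)
        return true;
    c->valid[index] = true;
    c->tag[index]   = tag;
    return false;
}

/* Read an array of 32-bit words from address 0 sequentially, `passes`
   times, starting with an empty cache; return the number of hits. */
static unsigned count_hits(uint32_t array_bytes, unsigned passes)
{
    assert(array_bytes % 4u == 0u);
    toy_cache c;
    memset(&c, 0, sizeof c);

    unsigned hits = 0u;
    for (unsigned p = 0u; p < passes; p++)
        for (uint32_t addr = 0u; addr < array_bytes; addr += 4u)
            if (toy_cache_access(&c, addr))
                hits++;
    return hits;
}
```

With a 2 KB array, the first pass misses once per 32-byte line (64 misses, 448 hits) and the second pass hits every time because the array fits, giving 960 hits out of 1024 reads. With an 8 KB array, each pass evicts the lines the next pass needs, so both passes get only 7 hits out of every 8 reads (3584 out of 4096).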

The 4KB RAM/FLASH also doesn't make much sense, so I don't know what documentation you're reading.

Some memories run faster and sit closer to the core (fewer bus transactions, less delay interacting with other bus users). The TCM is close to the core, so code and data placed there can be fetched efficiently; it isn't cached, and the cache can focus on serving the slow memories.
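Because the TCM is not cached, a DMA buffer placed there needs no clean/invalidate calls. A minimal sketch of that decision, assuming the STM32F74x/F75x memory map (64 KB DTCM at 0x20000000; the F76x/F77x have 128 KB) and assuming your DMA stream can reach DTCM; check both in the reference manual for your part. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed STM32F74x/F75x DTCM region; verify against your part's RM. */
#define DTCM_BASE 0x20000000u
#define DTCM_SIZE (64u * 1024u)

/* True if the whole range [addr, addr + len) lies inside DTCM.
   Written to avoid unsigned overflow near the top of the address space. */
static bool range_in_dtcm(uint32_t addr, uint32_t len)
{
    assert(len > 0u);
    return addr >= DTCM_BASE &&
           len <= DTCM_SIZE &&
           addr - DTCM_BASE <= DTCM_SIZE - len;
}

/* DTCM is not cached, so only buffers outside it need cache maintenance
   (or an MPU region marked non-cacheable). */
static bool dma_buffer_needs_cache_maintenance(uint32_t addr, uint32_t len)
{
    return !range_in_dtcm(addr, len);
}
```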

The FLASH has a different caching/fetching mechanism, using a very wide access to the array. It is slow, but it can fetch 128 bits at a time: the first word is expensive, subsequent words are free. It also holds a small array of recently fetched lines.
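A toy model of why "first word is expensive, subsequent words are free", assuming 128-bit (16-byte) flash lines and counting only the wait-state stalls of a straight-line sequential fetch. It deliberately ignores the prefetch buffer and the small line cache, both of which hide further stalls; `flash_stall_cycles` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define FLASH_LINE_BYTES 16u  /* 128-bit flash read width */

/* Toy model: stall cycles for a sequential fetch of `len` bytes starting
   at `addr`, with no prefetch and no line cache. Each flash line touched
   costs `wait_states` extra cycles (the expensive first word); the rest of
   that line comes at no extra cost. */
static uint32_t flash_stall_cycles(uint32_t addr, uint32_t len, uint32_t wait_states)
{
    assert(len > 0u);
    uint32_t first_line = addr / FLASH_LINE_BYTES;
    uint32_t last_line  = (addr + len - 1u) / FLASH_LINE_BYTES;
    return (last_line - first_line + 1u) * wait_states;
}
```

For eight aligned 32-bit words at 7 wait states (an illustrative value), the model gives 14 stall cycles, versus 56 if every word paid the full latency. An unaligned start costs an extra line: the same 32 bytes starting at offset 8 touch three lines, or 21 stall cycles.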

The LINKER can be used to place things in different memory areas; you would use a scatter file (Keil) or a linker script (GCC) to direct this. On the compiler side, you can pass information to the linker via #pragma or attribute directives. Check your compiler, linker and toolchain documentation.
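For GCC (STM32CubeIDE), a minimal sketch of the attribute route, applied to a small FIR filter like the ones in the original question. The section names `.itcm_text` and `.dtcm_data` are hypothetical: they only take effect if your linker script defines output sections in the ITCM/DTCM regions that collect them, and the startup code copies them from flash the same way `.data` is copied. Keil's compiler accepts the same `section` attribute, with placement done in the scatter file instead. The `__ELF__` guard lets the file also build on a desktop for testing, where the sections are just ordinary extra sections.

```c
#include <assert.h>

/* Hypothetical section names; they must match your linker script. */
#if defined(__GNUC__) && defined(__ELF__)
#define ITCM_FUNC __attribute__((section(".itcm_text"), noinline))
#define DTCM_DATA __attribute__((section(".dtcm_data")))
#else
#define ITCM_FUNC
#define DTCM_DATA
#endif

#define NTAPS 4u
static_assert(NTAPS > 0u, "filter needs at least one tap");

/* 4-tap moving average: coefficients and delay line live in DTCM. */
static DTCM_DATA float fir_coeffs[NTAPS] = {0.25f, 0.25f, 0.25f, 0.25f};
static DTCM_DATA float fir_state[NTAPS] = {0.0f}; /* newest sample at [0] */

/* One filter step. The code is placed in ITCM; noinline keeps it from
   being copied into a caller that lives in flash. */
ITCM_FUNC float fir_step(float x)
{
    for (unsigned i = NTAPS - 1u; i > 0u; i--)  /* shift the delay line */
        fir_state[i] = fir_state[i - 1u];
    fir_state[0] = x;

    float acc = 0.0f;
    for (unsigned i = 0u; i < NTAPS; i++)
        acc += fir_coeffs[i] * fir_state[i];
    return acc;
}
```

Feeding a step input of 1.0 produces 0.25, 0.5, 0.75, 1.0 as the delay line fills. Only the placement attributes are TCM-specific; the filter itself is unchanged, which is why the same source can be benchmarked with and without them.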

Tips, Buy me a coffee, or three.. PayPal Venmo
Up vote any posts that you find helpful, it shows what's working..

Spooky action at a distance.....

MikeDB
Lead

If you are doing serious DSP where speed is important, forget the F7 and get an H7. It's not much more expensive but is seriously more powerful for DSP work, as it has two 16 KB caches which the core of most DSP programmes will happily sit in, and it's dual issue.

The F7 is also dual issue, because that's a feature of ARM's Cortex-M7 core, and it has I/D caches of up to 16 KB each (F76x/F77x). It should also be noted that the peripherals and internal architecture of the H7 differ significantly from all other series.

In some way, yes, because almost everyone else (not you and a few others) is quiet and submissive to this deplorable attitude to software from ST. Someone has to say it loud: "The king is naked!"

Speaking of those who are not at a distance... If their software division's management were thinking, they would try to hire you and other smart and capable people, even from this forum. Even with an extremely good salary, that strategy would be cheaper and much better than a whole division of their code monkeys. For example, this topic has an ST employee's "advice" at the end:

https://community.st.com/s/question/0D50X0000AnsIJeSQM/how-to-get-ethernet-working-again-after-upgrading-to-firmware-fwh7v140-

But it only serves to demonstrate that ST's software developer is not even able to understand the problem and what others are talking about...

Sorry, yes, the F7 is dual issue, and I accept the CoreMarks are identical, but with only one DTCM RAM it has to do the second issue to some other part of memory, so some speed can be lost, especially if the core DSP code is sitting in the cache and the ITCM is sitting idle. I started my current project with the F730 but soon jumped to the H750 for its superior bang for the buck.