What is the point of high CPU frequency when Flash Latency slows the CPU down?

MMill.1 · ‎2020-02-11

I just got started with STM32 and was trying to make a 1-second delay with a "for()" loop based on the CPU frequency of an STM32F407. I noticed that one loop iteration takes more clock cycles at higher frequencies than at lower ones. For example, "for(i=0;i<1;i++)" takes 4 or 6 clock cycles when the system clock is 16 MHz.

Then I increased the system clock to 168 MHz with the PLL

and set the Flash latency to 5 according to AN3988.PDF, and the same loop takes 36 clock cycles!

Am I right?

If so, what is 168 MHz good for?

Ozone · ‎2020-02-11

> I just got started with STM32 and was trying to make a 1-second delay with a "for()" loop based on the CPU frequency of an STM32F407.

Not a good idea.

As with other vendors' MCUs, the STM32 reads the Flash in words wider than the core's register size. In ST's case, these are 128-bit words.

Think of it as a kind of L1 cache. ST calls this method the ART accelerator. Flash latencies only hit when the pipeline is flushed because of branch/jump instructions.

For proper delays, better use a timer.
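A minimal sketch of what a timer-based delay might look like, assuming a free-running millisecond counter that a periodic timer interrupt (for example SysTick) increments. The names `g_tick_ms`, `tick_isr`, `tick_elapsed` and `delay_ms` are hypothetical and not part of any ST library.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical millisecond counter; on a real target a timer interrupt
 * (e.g. the SysTick handler) would increment it once per millisecond. */
static volatile uint32_t g_tick_ms;

/* Body of the periodic timer interrupt in this sketch. */
void tick_isr(void)
{
    g_tick_ms++;
}

/* True once at least `duration` ms have passed since `start`.
 * Unsigned subtraction stays correct when the counter wraps around. */
bool tick_elapsed(uint32_t start, uint32_t now, uint32_t duration)
{
    return (uint32_t)(now - start) >= duration;
}

/* Busy-wait for `ms` milliseconds. The wait is paced by the timer, so it
 * does not depend on CPU clock, Flash wait states or optimization level. */
void delay_ms(uint32_t ms)
{
    uint32_t start = g_tick_ms;
    while (!tick_elapsed(start, g_tick_ms, ms)) {
        /* wait for the timer interrupt to advance g_tick_ms */
    }
}
```

With this, `delay_ms(1000)` waits one second at 16 MHz and at 168 MHz alike, provided the timer itself is reconfigured for the new clock.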

Tesla DeLorean · ‎2020-02-11

Wait until you use a processor that has a cache or is superscalar; it will blow your mind.

Heavy interrupt loading may also significantly affect the loop time.

S.Ma · ‎2020-02-11

In a non-ideal world, tradeoffs are common.

No delay? Put the code in RAM! Need more RAM?

Use a bigger, more expensive chip.

168 MHz is then achieved with zero wait states.

Andrei Chichak · ‎2020-02-11

The architecture gives you a system tick timer for free; you could just use that. You COULD try to count NOPs on a superscalar processor, but it's not going to go well. Besides, the compiler is probably going to optimize out your for loop, since it doesn't do anything useful. You COULD turn optimization off, but then why would you want 168 MHz at all? You're throwing away cycles all over the place at that point.
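To illustrate the optimization point, here is a small sketch (the helper name `spin` is hypothetical). An empty counting loop has no observable effect, so an optimizing compiler is free to delete it. Declaring the counter `volatile` keeps the loop in place, but its duration still depends on clock, wait states and interrupts.

```c
#include <stdint.h>

/* Crude busy loop. `volatile` obliges the compiler to perform every read
 * and write of `i`, so the loop survives optimization; without it the
 * whole loop may be removed because it has no observable effect. */
uint32_t spin(uint32_t iterations)
{
    volatile uint32_t i;
    for (i = 0; i < iterations; i++) {
        /* intentionally empty */
    }
    return i; /* number of iterations actually executed */
}
```

This only guarantees that the loop runs, not how long it takes, so it is not a substitute for a timer.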

If you want to avoid FLASH latency, don't use FLASH. This isn't a new problem; look at the EEPROM chips of the past. A good one had a 100 ns cycle time, which is 10 MHz. You're going to take wait states with a fast processor. If you don't want wait states, transfer your code to internal static RAM. It's small and expensive, but fast. There are tradeoffs with computers.

Even your desktop box has multiple levels of cache to try to keep the processor fed. DDR4 memory is clocked at 800-1600 MHz; how does that avoid wait states with a 4000 MHz processor? Some combination of magic, sleight of hand, and it doesn't.

But if you feel that your processor is wasting cycles waiting for cache, slow it down or buy a slower processor.

berendi · ‎2020-02-12

Real-world microcontroller applications spend most of their time waiting for some external event to happen, and have only a handful of functions where things have to happen really fast.

According to the STM32F407 datasheet, internal SRAM is always accessed with 0 wait states. If you need it, move critical code to internal SRAM.
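One way this might look with GCC is a section attribute on the critical function. This is only a sketch under assumptions: the section name `.RamFunc` must match whatever your linker script places in RAM, the startup code must copy that section from Flash to RAM, and `RAM_FUNC` and `sum_samples` are hypothetical names.

```c
#include <stdint.h>

/* On an ARM GCC build, ask the linker to put the function in a section
 * named ".RamFunc". ASSUMPTION: the linker script locates that section in
 * RAM and the startup code copies it there from Flash, as it does for
 * initialized data. On other targets the macro expands to nothing. */
#if defined(__GNUC__) && defined(__arm__)
#define RAM_FUNC __attribute__((section(".RamFunc"), noinline))
#else
#define RAM_FUNC
#endif

/* Hypothetical time-critical routine: sum of a sample buffer. */
RAM_FUNC uint32_t sum_samples(const uint16_t *samples, uint32_t count)
{
    uint32_t sum = 0;
    for (uint32_t i = 0; i < count; i++) {
        sum += samples[i];
    }
    return sum;
}
```

Everything the function calls should live in RAM as well, otherwise the call goes back through Flash.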

Danish1 · ‎2020-02-12

Another point worth making:

FLASH memory is generally prefetched. While the CPU is executing one instruction after another (without any branches), the prefetched instructions can be executed without delay.

It is only where the program branches (loops, conditional code) that prefetching doesn't help and you have to wait for the FLASH read to take place.
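A back-of-the-envelope model of this, as a sketch only. It assumes a 128-bit Flash line, a line read that takes (wait states + 1) CPU cycles, one cycle per instruction, and it ignores the instruction cache and multi-cycle instructions. The function names are hypothetical.

```c
#include <stdint.h>

#define FLASH_LINE_BITS 128u /* width of one Flash read on the STM32F4 */

/* Instructions delivered by one Flash read (8 for 16-bit, 4 for 32-bit). */
uint32_t instr_per_line(uint32_t instr_bits)
{
    return FLASH_LINE_BITS / instr_bits;
}

/* Cycles the core would stall per line of straight-line code in this model:
 * the next line is fetched (wait_states + 1 cycles) while the current one
 * executes at one cycle per instruction. */
uint32_t stall_cycles_per_line(uint32_t wait_states, uint32_t instr_bits)
{
    uint32_t fetch = wait_states + 1u;
    uint32_t exec = instr_per_line(instr_bits);
    return (fetch > exec) ? (fetch - exec) : 0u;
}
```

With 5 wait states the model gives no stall for 16-bit instructions and 2 cycles per line for 32-bit ones. A taken branch to a line that is not already buffered pays the full fetch time, which is why a tight empty loop looks so bad.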

In fact, the ARM/Thumb processors have special instructions (conditional execution, e.g. the IT block in Thumb-2) that allow short sequences of conditional instructions without branches, and so without penalty.

Your tight "for" loop has to take one branch each time round the loop. That's the worst possible case.

But put some processing in the loop and the processing will get done without wait-states, so the benefit of clock speed will be seen.

Edit: I now see that Ozone made this point already. But it is the key point. Run real code and you do see real speed benefits, whether running from FLASH or RAM.
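It also helps to convert the cycle counts from the question into time. The sketch below takes the reported figures at face value (they were not re-measured here); `cycles_to_ps` is a hypothetical helper.

```c
#include <stdint.h>

/* Duration of `cycles` CPU cycles at `clock_hz`, in picoseconds,
 * using integer arithmetic only (result is truncated). */
uint64_t cycles_to_ps(uint32_t cycles, uint32_t clock_hz)
{
    return ((uint64_t)cycles * 1000000000000ull) / clock_hz;
}
```

Six cycles at 16 MHz come to 375 ns per iteration (250 ns for four cycles), while 36 cycles at 168 MHz come to about 214 ns. So even this worst-case loop finishes sooner in wall-clock time at the higher clock.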

MMill.1 · ‎2020-02-12

Hi

Thanks for the answers.

The more I go into this stuff, the more confused I get.

I believed that, in general, the program is stored in Flash and RAM is used to hold data and operate on it, and that everything in RAM is lost when power is removed!

Speaking of the C language and the ARM toolchain, where are variables and the program stored, and how can I put my code into RAM?

Pavel A. · ‎2020-02-12

> how can I put my code into ram?

Any decent debugger can load your program into RAM if you link it to the RAM addresses.

If you have a GCC-based IDE, it may provide a variant of the linker script for RAM.

If you want to run from RAM permanently, you'll need some sort of bootloader that copies the program to RAM from Flash, an SD card, and so on.
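The copy step itself can be very small. A minimal sketch, assuming the image is word-aligned and that the load address in Flash and the start and end of its run address in RAM are known (on a real target they would typically come from symbols defined in the linker script); `copy_image` is a hypothetical name.

```c
#include <stdint.h>

/* Copy a word-aligned image from its load address (Flash) to its run
 * address (RAM). Startup code typically does the same for initialized
 * data; the loop is no different for code linked to run from RAM. */
void copy_image(const uint32_t *load_addr, uint32_t *run_start,
                const uint32_t *run_end)
{
    uint32_t *dst = run_start;
    while (dst < run_end) {
        *dst++ = *load_addr++;
    }
}
```

The copy has to happen before anything in the RAM image is called, and the code has to be linked for its RAM address, not for the Flash address it is stored at.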

-- pa