I am trying to measure the performance difference between executing code from flash and executing it from RAM on the STM32F407.
I wrote a test program in which the same function runs either from flash or from RAM.
I found that execution from RAM is about 20% slower than execution from flash when the CPU runs at 168 MHz.
The datasheet states that "the performance achieved thanks to the ART accelerator is equivalent to 0 wait state program execution from Flash memory at a CPU frequency up to 168 MHz".
It also states that "RAM memory is accessed (read/write) at CPU clock speed with 0 wait states".
So both RAM and flash are accessed with 0 wait states.
Why, then, is execution from RAM slower than execution from flash? Is this result reasonable?
The compiler is IAR Embedded Workbench for ARM 188.8.131.5235, with optimization set to High.
My source code:
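/* __ramfunc (IAR keyword) places the function in RAM;
   without it, the same function runs from flash */
__ramfunc void code_to_be_measured(void)
{
    volatile unsigned long int l_count, l_count_max = 1000;
    volatile int a, b, s;
    volatile unsigned int j;

    for (l_count = 0; l_count < l_count_max; l_count++)
    {
        a = 1000;
        b = 2000;
        s = a + b + l_count;
    }
}

/* Initialize Leds mounted on STM32F4-Discovery board */
GPIO_PORT->BSRRL = GPIO_PIN;   /* drive the LED pin high */
code_to_be_measured();
GPIO_PORT->BSRRH = GPIO_PIN;   /* drive the LED pin low */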
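To turn the roughly 20% difference into exact numbers, one option (not used in the original test, which toggles an LED pin) is the Cortex-M4 DWT cycle counter. The sketch below assumes the standard ARMv7-M debug register addresses (DEMCR at 0xE000EDFC, DWT_CTRL at 0xE0001000, DWT_CYCCNT at 0xE0001004). The helper names `cyccnt_enable`, `measure_cycles`, `cycle_delta` and `percent_slower` are hypothetical and chosen for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* ARMv7-M debug registers (assumed standard addresses):
   DEMCR.TRCENA (bit 24) powers the DWT unit,
   DWT_CTRL.CYCCNTENA (bit 0) starts the free-running cycle counter. */
#define DEMCR      (*(volatile uint32_t *)0xE000EDFCu)
#define DWT_CTRL   (*(volatile uint32_t *)0xE0001000u)
#define DWT_CYCCNT (*(volatile uint32_t *)0xE0001004u)

/* Hypothetical helper: enable the CPU cycle counter (target only). */
void cyccnt_enable(void)
{
    DEMCR |= (1u << 24);   /* TRCENA */
    DWT_CYCCNT = 0u;       /* reset the count */
    DWT_CTRL |= 1u;        /* CYCCNTENA */
}

/* Elapsed cycles between two CYCCNT samples. Unsigned subtraction
   stays correct across a single 32-bit wrap-around of the counter. */
uint32_t cycle_delta(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Hypothetical helper: run fn once and return the CPU cycles it took,
   including the call overhead (target only). */
uint32_t measure_cycles(void (*fn)(void))
{
    uint32_t start = DWT_CYCCNT;
    fn();
    return cycle_delta(start, DWT_CYCCNT);
}

/* Percentage by which the RAM run is slower than the flash run. */
double percent_slower(uint32_t flash_cycles, uint32_t ram_cycles)
{
    assert(flash_cycles != 0u);
    return 100.0 * ((double)ram_cycles - (double)flash_cycles)
           / (double)flash_cycles;
}
```

On the board, `cyccnt_enable()` would be called once at startup, then `measure_cycles(code_to_be_measured)` in one build with `__ramfunc` and one without; the two counts go into `percent_slower()`. For example, 1000 cycles from flash against 1200 cycles from RAM gives 20.0.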