2018-11-19 03:15 PM
Hello. I am using an STM32F429ZI MCU with FreeRTOS. I want to compare the efficiency of bare-metal and OS-based implementations of a program. While doing some tests I noticed that the same function is much slower when called from the bare-metal application than when called from FreeRTOS, even though the assembly code is the same (I turned off all compiler optimization).
After more detailed tests I noticed that LDR and STR instructions take 1 CPU cycle longer in the function called from main() than in the same function called from a FreeRTOS task.
I measure execution time in CPU cycles with the DWT->CYCCNT register.
Do you know, or do you have any idea, what the reason for this difference is?
Here is my example code. I deleted all unnecessary code:
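#include <stdint.h>
#include <stdio.h>
#include "stm32f4xx.h"   /* CMSIS device header, provides DWT */
#include "FreeRTOS.h"
#include "task.h"

/* Assumes the DWT cycle counter was enabled elsewhere
   (CoreDebug->DEMCR TRCENA and DWT->CTRL CYCCNTENA). */
void foo(void)
{
    uint32_t j = 0;
    uint32_t i;

    DWT->CYCCNT = 0;            /* restart the cycle counter */
    i = 0;
    for (; i < 10000; i++)
    {
        /* nine NOPs per iteration */
        __asm volatile ("NOP");
        __asm volatile ("NOP");
        __asm volatile ("NOP");
        __asm volatile ("NOP");
        __asm volatile ("NOP");
        __asm volatile ("NOP");
        __asm volatile ("NOP");
        __asm volatile ("NOP");
        __asm volatile ("NOP");
    }
    j = DWT->CYCCNT;            /* cycles spent in the loop */
    printf("%lu\r\n", (unsigned long)j);
}

void task(void *param)
{
    (void)param;
    foo();                      /* runs on this task's stack (PSP) */
    for (;;)
    {
    }
}

int main(void)
{
    foo();                      /* runs on the main stack (MSP) */
    xTaskCreate(task, "task", 100, NULL, 1, NULL);
    vTaskStartScheduler();
    for (;;)                    /* only reached if the scheduler fails to start */
    {
    }
}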
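The following is a minimal, host-testable sketch of turning raw DWT->CYCCNT readings into numbers that can be compared between the two runs. The helper names are hypothetical. It assumes that CYCCNT is read as a free-running 32-bit counter. It also assumes, as a hypothesis rather than a measured fact, that the whole difference between the runs comes from stack accesses. With optimization off, the loop counter i lives on the stack, so every iteration loads and stores it. If each such access costs one extra cycle when the stack is in the slower SRAM, the per-iteration difference should roughly equal the number of stack accesses per iteration.

```c
#include <stdint.h>

/* Elapsed cycles between two CYCCNT samples. Unsigned subtraction is
   modulo 2^32, so the result stays correct across one counter wrap,
   as long as the interval is shorter than 2^32 cycles. */
uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Extra cycles per loop iteration of the slower run (cycles_a) relative
   to the faster run (cycles_b), truncated to a whole number. Under the
   hypothesis above (one extra cycle per stack LDR/STR), this estimates
   the number of stack accesses per iteration. Returns 0 if run A is not
   slower or if no iterations were given. */
uint32_t extra_cycles_per_iteration(uint32_t cycles_a,
                                    uint32_t cycles_b,
                                    uint32_t iterations)
{
    if (iterations == 0u || cycles_a <= cycles_b) {
        return 0u;
    }
    return (cycles_a - cycles_b) / iterations;
}
```

For example, cycles_elapsed(0xFFFFFF00u, 0x100u) returns 0x200 even though the counter wrapped. extra_cycles_per_iteration(bare_metal_cycles, freertos_cycles, 10000u) gives the per-iteration gap for the loop in foo().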
2018-11-21 04:18 AM
> Is there any documentation about it?
I know of none. I attempted some "benchmarking" back then too. It indicated that on the 'F407, SRAM2 has a 1-cycle penalty compared to SRAM1 on the first of a series of consecutive accesses. This might be related to the way its bus arbiter locks or moves arbitration around. That is an intimate detail of the bus matrix implementation/setup, which is very unlikely to be shared by ST. I guess the same may apply to SRAM3 in the 'F42x/43x.
https://community.st.com/s/question/0D50X00009hnJN6SAM/how-many-cookies-to-feed-st
https://community.st.com/s/question/0D50X00009XkedySAB/reproducing-loadstore-timings-claimed-by-arm-on-stm32f4
These chips are very, very complex, and there might be other mechanisms involved that affect timing.
JW
2018-11-21 05:23 AM
OK. I will try to place the global stack (the one main() runs on) in the same memory area where FreeRTOS places its own task stacks. If the results are then equal in both implementations, you are right. If not, I will try to contact ST or ARM.
Is that link correct? I don't see any connection with the topic.
2018-11-22 06:15 AM
@Community member Thank you for your help. The problem is the location of the global stack. When I changed the stack's address from 0x20030000 to 0x20010000 (from SRAM3 to SRAM1), the execution times became equal. The 'Multi-AHB bus matrix' section of the STM32F427xx/STM32F429xx datasheet shows that SRAM1 is additionally connected to the core via the D-bus. SRAM2 and SRAM3 are connected only via the S-bus.
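To check which bank a given stack address falls into, here is a small sketch. It uses the STM32F42x/43x SRAM map: SRAM1 at 0x20000000-0x2001BFFF (112 KB), SRAM2 at 0x2001C000-0x2001FFFF (16 KB), and SRAM3 at 0x20020000-0x2002FFFF (64 KB). The enum and function names are hypothetical. The key point is that Cortex-M stacks are full-descending: the initial SP value is one past the top of the stack, and the first push writes to SP - 4. An initial SP of 0x20030000 therefore puts the stack at the top of SRAM3, while 0x20010000 puts it inside SRAM1.

```c
#include <stdint.h>

/* SRAM banks of STM32F42x/43x (hypothetical names). */
typedef enum {
    BANK_NONE = 0,   /* address is not in SRAM1/2/3 */
    BANK_SRAM1,      /* 0x20000000 - 0x2001BFFF, 112 KB */
    BANK_SRAM2,      /* 0x2001C000 - 0x2001FFFF, 16 KB  */
    BANK_SRAM3       /* 0x20020000 - 0x2002FFFF, 64 KB  */
} sram_bank_t;

/* Map a 32-bit address to the SRAM bank that contains it. */
sram_bank_t sram_bank_of(uint32_t addr)
{
    if (addr >= 0x20000000u && addr <= 0x2001BFFFu) return BANK_SRAM1;
    if (addr >= 0x2001C000u && addr <= 0x2001FFFFu) return BANK_SRAM2;
    if (addr >= 0x20020000u && addr <= 0x2002FFFFu) return BANK_SRAM3;
    return BANK_NONE;
}

/* The stack is full-descending: the first 32-bit push is stored at
   initial_sp - 4, so that address decides which bank holds the stack. */
sram_bank_t stack_bank_of(uint32_t initial_sp)
{
    return sram_bank_of(initial_sp - 4u);
}
```

On the target, the initial main-stack value can be taken from the first word of the vector table (for example *(const uint32_t *)0x08000000 when booting from Flash). The current value is available from the CMSIS function __get_MSP(). A deep stack may still grow across a bank boundary, so the lowest address it reaches matters as well.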
2018-11-22 03:45 PM
Sorry for the bad link.
> SRAM1 is additionally connected to the core via the D-bus. SRAM2 and SRAM3 are connected only via the S-bus.
That does not explain why (and under which circumstances) SRAM2/SRAM3 are *slower*.
SRAM1 is accessed through the D/I ports when it is mapped to the 0x00000000 area.
JW