Weird cycle behavior when running code from SRAM. 4 cycles per instruction instead of 1. What could be the cause?
Board: Nucleo-144
Chip: STM32L4R5ZIT6P
Hey all!
I have a curious problem when executing code from SRAM.
My firmware lives in flash and occasionally jumps to an externally loaded binary in SRAM. Since the cycle count of this binary is crucial, I used the DWT cycle counter to measure it. While stepping through the code, I noticed that a simple instruction (such as nop or add) takes exactly 4 cycles when executed from SRAM, not 1 as you would expect; load/store operations take even more. I find this behavior quite weird and can't explain why execution takes longer in SRAM.
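For reference, a DWT measurement of this kind can be sketched roughly as below. This is a minimal illustration, not the exact code from the project: the register addresses are the standard Cortex-M4 ones (DEMCR at 0xE000EDFC, DWT_CTRL at 0xE0001000, DWT_CYCCNT at 0xE0001004), while the helper names (cyccnt_enable, cycles_elapsed, time_16_nops) are made up for the example. It times a run of 16 nops in one go instead of a single stepped instruction, so the result is a total that still includes the overhead of the two counter reads.

```c
#include <stdint.h>

/* Standard Cortex-M4 debug/DWT register addresses. */
#define DEMCR_REG      (*(volatile uint32_t *)(uintptr_t)0xE000EDFCu)
#define DWT_CTRL_REG   (*(volatile uint32_t *)(uintptr_t)0xE0001000u)
#define DWT_CYCCNT_REG (*(volatile uint32_t *)(uintptr_t)0xE0001004u)

#define DEMCR_TRCENA_BIT        (1u << 24) /* enables the DWT unit */
#define DWT_CTRL_CYCCNTENA_BIT  (1u << 0)  /* starts the cycle counter */

/* Hypothetical helper: turn on the cycle counter and reset it to zero. */
static inline void cyccnt_enable(void)
{
    DEMCR_REG     |= DEMCR_TRCENA_BIT;
    DWT_CYCCNT_REG = 0u;
    DWT_CTRL_REG  |= DWT_CTRL_CYCCNTENA_BIT;
}

/* Elapsed cycles between two CYCCNT snapshots. Unsigned subtraction
 * gives the right answer even if the 32-bit counter wrapped once. */
static inline uint32_t cycles_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Hypothetical helper: time 16 back-to-back nops without stepping.
 * The returned value includes the cost of reading CYCCNT itself. */
static inline uint32_t time_16_nops(void)
{
    uint32_t start = DWT_CYCCNT_REG;
    __asm volatile (".rept 16\n\tnop\n\t.endr" ::: "memory");
    uint32_t end = DWT_CYCCNT_REG;
    return cycles_elapsed(start, end);
}
```

Calling cyccnt_enable() once and then time_16_nops() from both flash and SRAM, at full speed with no breakpoints in between, would give two totals that can be compared directly.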
More details:
I created a section in SRAM1 (0x20000000 to 0x20030000) to hold the dynamic binary. This should be fine, as SRAM1 is connected to both the I-Bus and the D-Bus. The only thing that rang alarm bells was that the reference manual (RM0432, revision 9) states on p. 107 that physical remap should be enabled for maximum performance.
I didn't want to do this, so I moved the section to SRAM2, where (according to the reference manual, same page) execution can be performed with maximum performance. However, this didn't solve my problem.
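To make the setup concrete, the jump into the loaded binary can be sketched roughly like this. It is only an illustration under assumptions: the SRAM1 bounds are the ones given above, and the names (in_sram1, thumb_entry, jump_to_image) are hypothetical, not taken from the actual firmware. It also assumes the image's entry point is its first instruction.

```c
#include <stdint.h>
#include <stdbool.h>

/* SRAM1 range as stated above; the end address is exclusive. */
#define SRAM1_START_ADDR 0x20000000u
#define SRAM1_END_ADDR   0x20030000u

typedef void (*entry_fn_t)(void);

/* Hypothetical helper: true if [addr, addr + len) lies inside SRAM1. */
static inline bool in_sram1(uint32_t addr, uint32_t len)
{
    return addr >= SRAM1_START_ADDR &&
           addr <= SRAM1_END_ADDR &&
           len  <= SRAM1_END_ADDR - addr;
}

/* Cortex-M cores only execute Thumb code, so a branch target used
 * as a function pointer must have bit 0 set. */
static inline uint32_t thumb_entry(uint32_t load_addr)
{
    return load_addr | 1u;
}

/* Hypothetical helper: call the image that was copied to load_addr.
 * Assumes the entry point is the first instruction of the image. */
static inline void jump_to_image(uint32_t load_addr)
{
    entry_fn_t entry = (entry_fn_t)(uintptr_t)thumb_entry(load_addr);
#if defined(__arm__) || defined(__thumb__)
    /* Make sure the copied code is visible before branching to it. */
    __asm volatile ("dsb\n\tisb" ::: "memory");
#endif
    entry();
}
```

Moving the binary to SRAM2 would only change the base address passed to jump_to_image(); the jump itself stays the same.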
Furthermore, I thought it could have something to do with my clock configuration, but I couldn't find any problems there. The MCU is running at 1 MHz using the MSI oscillator, configured by CubeMX.
I am running out of ideas on what to try and what could cause this inconsistency. Cycle behavior in flash is as expected.
Does anybody have any idea on what might cause this sort of cycle behavior?