2022-04-08 01:10 PM
I have an application on an STM32H750 which runs out of QSPI in XIP mode. After init, I enable the caches with SCB_EnableICache() and SCB_EnableDCache(). After this point, the application enters an endless loop which just calls the same small function repeatedly, so I expect this loop and function to execute completely out of cache. However, I noticed that the data lines to the QSPI are still active. If the data in QSPI has been cached, I would expect the system to have no need to access the QSPI interface... does this mean the cache is not actually working? Do I need to enable a DMA to support the cache mechanism?
2022-04-08 01:29 PM
Is your vector table in QSPI as well? Are you getting any interrupts (SysTick...)?
2022-04-08 01:37 PM
Yes, the application's vectors are also in QSPI, and SCB->VTOR has been set accordingly. I have a timer that is serviced by an IRQ, and it is running.
The misunderstanding may be on my part. I was trying to run a test function to measure the difference in performance between the QSPI and internal flash with and without cache enabled. I just tried running the tests with the MPU disabled (commented out the call to MPU_Config()), and now the QSPI execution speed is MUCH slower. This might mean the cache was working all the time even when I wasn't expecting it. It also appears that the QSPI is being accessed much more often than in my previous measurements.
To your question, are you thinking that the NVIC is causing the accesses to the QSPI...and that the interrupts are not cached...or that the MPU is disabled during IRQs?
2022-04-08 01:40 PM
See MPU_Config() section here
STM32Cube_FW_H7_V1.8.0\Projects\STM32H750B-DK\Demonstrations\MenuLauncher\Core\Src\main.c
I seem to recall there being an App Note as well.
2022-04-08 01:52 PM
https://community.st.com/s/question/0D53W000010uOOcSAM/stm32h750-mpu-xip
My takeaways are that you first need to establish the region as 256MB (the full 0x90000000 decode space) and make it strongly ordered, and then add a readable/executable region on top that is the size of the underlying device. Perhaps 16MB, or whatever portion you're using for code.
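For what it's worth, here is a minimal sketch of the arithmetic behind those two region sizes. It assumes the ARMv7-M MPU encoding, where a region covers 2^(SIZE+1) bytes and the base address must be a multiple of the region size. The helper names are made up for illustration and are not part of the HAL, and the 16MB figure is only the example size mentioned above.

```c
#include <stdint.h>

/* Hypothetical helper (not a HAL function): returns the ARMv7-M MPU
 * SIZE field for a region of `bytes` bytes, or -1 if the size is not a
 * power of two or is below the 32-byte minimum.
 * Region size in bytes = 2^(SIZE + 1). */
int mpu_size_field(uint64_t bytes)
{
    if (bytes < 32u || (bytes & (bytes - 1u)) != 0u) {
        return -1;
    }
    int shift = 0;
    while ((bytes >> shift) != 1u) {
        shift++;                 /* shift ends up as log2(bytes) */
    }
    return shift - 1;
}

/* Hypothetical helper: a region base must be a multiple of its size. */
int mpu_base_is_aligned(uint32_t base, uint64_t bytes)
{
    return bytes != 0u && ((uint64_t)base % bytes) == 0u;
}
```

With these, 256MB gives a SIZE field of 27 (0x1B) and 16MB gives 23 (0x17), which I believe matches the MPU_REGION_SIZE_256MB and MPU_REGION_SIZE_16MB values in the HAL headers, and 0x90000000 is correctly aligned for both. A size such as 24KB is rejected because it is not a power of two.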
2022-04-08 02:08 PM
> are you thinking that the NVIC is causing the accesses to the QSPI.
If your vector table is in QSPI and all handlers are in QSPI, then the SysTick interrupt hits it every millisecond, at least.
> I just tried running the tests with the MPU disabled (commented out the call to MPU_Config()), and now the QSPI execution speed is MUCH slower. This might mean the cache was working all the time even when I wasn't expecting it.
You need to check the default attributes of the QSPI area: is it Normal or Device?
ICACHE won't enable itself unless you enabled it explicitly.
2022-04-08 02:12 PM
According to the Reference Manual, the QSPI region is Normal, Write-through cache, Execution enabled.
I would have expected that once the interrupts were hit the first time, they would continue to be executed out of cache afterward. Is this not the case?
2022-04-08 02:17 PM
I think I understand the recommendation, however, I don't think I am in danger of getting speculative reads in the unused region since I am using only the first 23K of the available 4MB. However, I'm not familiar with how the speculation works...maybe it is still getting hits to an otherwise unused or unneeded area.
I will adjust my MPU_Config() to use a similar setup and see if it has any impact on performance when cache is enabled.
2022-04-08 02:30 PM
> ICACHE won't enable itself unless you enabled it explicitly.
In this case, I have an application and a bootloader. The bootloader does not touch the MPU or the cache functions. The application configures the MPU, runs a couple tests, then enables the ICACHE and DCACHE, runs the tests again, then goes to the endless loop.
Since I am getting the test results with the debugger, I suspect that if the application enables the caches before I connect with the debugger, the caches may still be enabled after debugger connection since nothing is explicitly disabling them. I'm not clear if the debugger reset method will cause the MPU and/or caches to be disabled when I reset to run the tests or not...need to check on it. Probably I need the Bootloader to explicitly disable the MPU and caches to ensure the proper sequencing for my tests.
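One way to take the guesswork out of this would be to record the enable bits at the start of each test run. Below is a minimal sketch that decodes raw register values; on the target they would be read from SCB->CCR and MPU->CTRL. It assumes the Cortex-M7 bit positions (CCR.DC is bit 16, CCR.IC is bit 17, MPU_CTRL.ENABLE is bit 0). The struct and function names are invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed Cortex-M7 bit positions. */
#define CCR_DC_BIT          (1u << 16)  /* D-cache enable in SCB->CCR */
#define CCR_IC_BIT          (1u << 17)  /* I-cache enable in SCB->CCR */
#define MPU_CTRL_ENABLE_BIT (1u << 0)   /* MPU enable in MPU->CTRL    */

/* Hypothetical snapshot of the memory-system state. */
typedef struct {
    bool icache_on;
    bool dcache_on;
    bool mpu_on;
} mem_state_t;

/* Decode raw register values into a snapshot.
 * On the target: decode_mem_state(SCB->CCR, MPU->CTRL). */
mem_state_t decode_mem_state(uint32_t ccr, uint32_t mpu_ctrl)
{
    mem_state_t s;
    s.icache_on = (ccr & CCR_IC_BIT) != 0u;
    s.dcache_on = (ccr & CCR_DC_BIT) != 0u;
    s.mpu_on    = (mpu_ctrl & MPU_CTRL_ENABLE_BIT) != 0u;
    return s;
}
```

If the snapshot taken right after a debugger reset already shows the caches or the MPU enabled, then the "cache off" measurements were not really taken with the cache off.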
2022-04-12 06:18 AM
I have made the recommended adjustments to the MPU_Config(). I don't see any performance difference between my original configuration and the new one. I have a simple loop function:
void ExecTest() {
    u32 test_count = 0xFFFFFFFF;
    while(test_count--);
}
I have a copy of this function in QSPI and in internal Flash. I use a timer to measure the execution time (read the time before the call, then after, compute the difference). Optimizations are off, CPU is running at 120 MHz, caches are enabled.
Execution Time from Internal Flash: 53.775 seconds
Execution Time from QSPI Flash: 71.695 seconds
I don't understand why the function takes about 18 seconds longer when it runs from QSPI flash, unless the cache is missing a lot... but there's not much data here, so it shouldn't have to fetch from the QSPI much, or at all. Any ideas why the cache performance with QSPI is low? I've seen other forum posts where I think people have noticed a similar drop in cache performance when using QSPI.
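To put the two timings on a per-iteration basis, here is a small sketch of the arithmetic. It assumes the 120 MHz figure is the core clock and that the loop body runs 0xFFFFFFFF times; the function name is just for illustration.

```c
/* Average core cycles spent per loop iteration, given the measured
 * time in seconds, the core clock in Hz and the iteration count. */
double cycles_per_iteration(double seconds, double clock_hz, double iterations)
{
    return (seconds * clock_hz) / iterations;
}
```

Calling it as cycles_per_iteration(53.775, 120e6, 4294967295.0) gives roughly 1.5 cycles per iteration for internal flash, and the same call with 71.695 gives roughly 2.0 for QSPI. Under those assumptions the gap is about half a cycle per iteration, or about a 33% increase.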