2020-03-06 11:43 AM
In the H745 reference manual, the RAM region of memory is write-back while the code region is write-through. According to an app note, cache coherency issues can be avoided by using a write-through RAM region. In the H7, the SRAM sections are aliased in the code region.
Does this mean that I can simply use the aliased addresses to get write-through D2 RAM access?
2020-03-06 01:03 PM
Generally you need to use fencing instructions to clear the write buffers. It is usually better to be aware/explicit and use Clean DCache by Address on buffers you actually use for DMA Read, or cross-core comms. Watch also for the 32-byte granularity. Use Invalidate DCache with caution: it can break surrounding structures/variables, and break pending stack pushes.
I haven't taken the time to evaluate all modes of behaviour
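To make the 32-byte granularity concrete, here is a minimal host-side sketch of which bytes a clean or invalidate by address would really cover. The names `line_span_t`, `cache_line_span` and `touches_neighbours` are made up for illustration, and the 32-byte line size is the Cortex-M7 L1 D-cache line size mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 32u /* Cortex-M7 L1 D-cache line size in bytes */

static_assert((CACHE_LINE & (CACHE_LINE - 1u)) == 0u,
              "line size must be a power of two for the masking below");

/* Hypothetical helper type: the line-aligned span that covers a buffer. */
typedef struct {
    uintptr_t start; /* buffer start rounded down to a line boundary */
    size_t    len;   /* length in bytes, always a multiple of CACHE_LINE */
} line_span_t;

/* Round [addr, addr + len) outwards to whole cache lines. */
line_span_t cache_line_span(uintptr_t addr, size_t len)
{
    const uintptr_t mask = ~(uintptr_t)(CACHE_LINE - 1u);
    line_span_t s;
    uintptr_t end = (addr + len + CACHE_LINE - 1u) & mask;

    s.start = addr & mask;
    s.len = (size_t)(end - s.start);
    return s;
}

/* Nonzero if a maintenance operation on the buffer also hits bytes
   that do not belong to it (unaligned start or partial last line). */
int touches_neighbours(uintptr_t addr, size_t len)
{
    line_span_t s = cache_line_span(addr, len);
    return s.start != addr || s.len != len;
}
```

If `touches_neighbours()` returns nonzero for a buffer, an invalidate on it can throw away unrelated dirty data that happens to share its first or last cache line, which is the hazard described above.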
2020-03-06 01:23 PM
Thanks Clive, maybe i asked the wrong question.
In my application the M4 is doing all the real-time stuff and I/O, using a semaphore to alert the M7 that there is new data to process. What is the fastest way to get 8 floats from SRAM3 to the M7 TCRAM? It's just 32 bytes, so it seems that the best way would be to make SRAM3 non-cacheable, but I may very well be missing something.
2020-03-06 08:12 PM
The M4 core doesn't cache, so:
1) If you can take the speed hit without issue, you can just disable data cache on the M7 core and read it after the M4 is done (via semaphore notification).
2) If you want data cache enabled, on M7 core, wait for semaphore notification, then invalidate that region and read it. You could invalidate prior to the semaphore notification to slightly speed things up. You should make the structure line up with cache boundaries to make the invalidate "safe" with regard to messing up other data, as clive1 states.
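A minimal sketch of lining the structure up with cache boundaries, assuming a C11 compiler. `shared_msg_t`, `shared_msg` and `is_line_aligned` are hypothetical names. Placing the object in SRAM3 would additionally need a linker section attribute, which is toolchain-specific and not shown here.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 32u /* Cortex-M7 L1 D-cache line size in bytes */

/* Hypothetical shared message: 8 floats = 32 bytes = exactly one line. */
typedef struct {
    alignas(CACHE_LINE) float samples[8];
} shared_msg_t;

/* Fail the build if the struct ever stops matching whole cache lines. */
static_assert(sizeof(shared_msg_t) % CACHE_LINE == 0u,
              "shared_msg_t must occupy whole cache lines");
static_assert(alignof(shared_msg_t) == CACHE_LINE,
              "shared_msg_t must start on a cache line boundary");

/* Assumption: on the target this object would live in SRAM3. */
shared_msg_t shared_msg;

/* Runtime check, useful for buffers whose address is only known late. */
int is_line_aligned(const void *p)
{
    return ((uintptr_t)p % CACHE_LINE) == 0u;
}
```

With the struct padded and aligned like this, an invalidate by address over `sizeof(shared_msg_t)` bytes covers only this object and nothing next to it.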
2020-03-06 08:15 PM
The way I read the reference manual, your understanding is correct. You can choose which region to address to get write-through or write-back. I have not tested this.
2020-03-06 11:03 PM
There is no cache on the M4 side, but there is a write buffer. Issue __DMB() to ensure that the data hits the SRAM.
However, the M7 core would still access SRAM through the L1 cache, but there are a few ways around that.
1. You can set just the buffer as non-cacheable in the M7 MPU.
    volatile float *buffer; // this should be aligned to 32 bytes
    MPU->RBAR = ((uint32_t)buffer) | MPU_RBAR_VALID_Msk; // using region slot 0
    MPU->RASR =
        MPU_RASR_XN_Msk |            // 1: Instruction fetches disabled
        (3u << MPU_RASR_AP_Pos) |    // Full access
        (4u << MPU_RASR_SIZE_Pos) |  // 32 bytes. Table 90 in PM0253 rev5
        MPU_RASR_ENABLE_Msk |
        0;                           // TEX,C,B,S are 0 meaning strongly ordered, this is the safest thing
You have to do it only once, assuming that the buffer address does not change. Increase the region size if you have more buffers.
The full documentation of the MPU registers is in the PM0253 STM32F7 Series and STM32H7 Series Cortex®-M7 processor programming manual.
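For increasing the region size, this is a rough host-side sketch of how the SIZE field relates to the region size in bytes: a field value N selects 2^(N+1) bytes, so 4 gives the 32-byte minimum. The function names and `RASR_SIZE_POS` are local stand-ins for illustration, not CMSIS definitions; on the target you would use `MPU_RASR_SIZE_Pos` from the CMSIS headers.

```c
#include <assert.h>
#include <stdint.h>

#define RASR_SIZE_POS 1u /* local stand-in for CMSIS MPU_RASR_SIZE_Pos */

/* Return the SIZE field value (4..31) for a region of region_bytes,
   or -1 if the size is not a power of two between 32 bytes and 4 GB. */
int mpu_size_field(uint64_t region_bytes)
{
    int log2 = 0;

    if (region_bytes < 32u || region_bytes > (1ull << 32)) {
        return -1;
    }
    if ((region_bytes & (region_bytes - 1u)) != 0u) {
        return -1; /* not a power of two */
    }
    while ((region_bytes >> log2) != 1u) {
        log2++;
    }
    return log2 - 1; /* the region covers 2^(SIZE + 1) bytes */
}

/* The region base address must be aligned to the region size. */
int mpu_base_is_valid(uint32_t base, uint64_t region_bytes)
{
    return mpu_size_field(region_bytes) >= 0 &&
           (base & (uint32_t)(region_bytes - 1u)) == 0u;
}

/* SIZE bits ready to be ORed into RASR; the size must be valid. */
uint32_t mpu_rasr_size_bits(uint64_t region_bytes)
{
    int field = mpu_size_field(region_bytes);

    assert(field >= 0);
    return (uint32_t)field << RASR_SIZE_POS;
}
```

For example, two 32-byte buffers placed back to back on a 64-byte boundary fit in one region with SIZE = 5. Also keep in mind that a region only takes effect once the MPU itself is enabled in MPU->CTRL.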
2. Invalidating cache lines is IMO slow and difficult to get right, so I'd just skip that.
3. Use MDMA to move data from SRAM to TCM.
MDMA can read/write TCM RAM inside the M7 core. Both cores have access to the MDMA registers, so either core can start the transfer. TCM accesses do not go through the L1 cache, so just issuing __DMB() before accessing the buffer ensures that the M7 reads valid data.
2020-03-07 04:50 AM
Thank you for MPU code snippet. I'll try that if the aliased ram turns out not to be write-through.
It would be great if someone from ST would let me know if I'm reading the reference manual properly as this could be a super simple way of avoiding cache coherency problems with the M7 and D2 SRAM
Interesting to know that the M4 can control the MDMA. I guess that means I could replace the semaphore and reading SRAM3 with an MDMA transfer-complete interrupt on the M7, with the transfer initiated by the M4?
Thanks again, it's much clearer now.
2020-03-07 05:51 AM
Note that write-through does not mean that the read cache is deactivated. Even if the M7 reads SRAM through the aliased region, it will still go through the read cache.
Initiating the MDMA transfer from M4 and getting the interrupt on the M7 should work, just be careful there so that they won't attempt to update the same register at the same time.
2020-03-07 06:17 AM
Thanks again.
These dual-core H7s are hella complicated compared to the F7s. I will try the MDMA, seems like a nice tidy solution. In my application, new data arrives via slave SPI from an FPGA at about 26 kHz with the M7 doing the signal processing, so I'm interested in finding a way to get data in and out of TCRAM with the least M7 processor overhead. I think two MDMA channels are the best: one initiated by the M4 at the beginning of the cycle, the other by the M7 when it is finished calculating.
2020-03-07 06:17 AM
Actually cache invalidation must be done before starting the DMA transfer, because otherwise cache eviction can corrupt the data at any time during the transfer.
Disabling the D-cache as a whole roughly halves performance...