2026-03-24 11:36 AM - last edited on 2026-03-25 2:56 AM by mƎALLEm
Ok, this is perhaps esoteric, so maybe there is some research out there.
I am looking at STM32U3 or STM32C5, but the problem applies to pretty much all CM33 based ST MCUs.
The bus matrix implies a fast multiplexer for GPDMA (and SDMMC) to SRAM1. Unless the transfers are bursts to SRAM (SDMMC does that, and the GPDMA channels with the 32-byte FIFO could do so if the transfer length is a multiple of 4 bytes), this fast multiplexer seems to save 1 clock, reducing a beat to 2 clocks (if somebody has details for AHB vs. APB, that would be interesting). So to minimize latency and maximize bandwidth, DMA buffers (.dma section) should go into SRAM1.
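As a minimal sketch of what "DMA buffers in a .dma section" could look like on the C side, assuming a GCC-style toolchain: the section name `.dma` is taken from the post, but mapping that section to SRAM1 has to be done in the linker script, which is not shown. The buffer name, its size, and the helper functions are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: the linker script maps an output section named ".dma" to SRAM1.
 * The attribute only tags the variable; it does not choose the address. */
#if defined(__GNUC__) && defined(__ELF__)
#define DMA_SECTION __attribute__((section(".dma")))
#else
#define DMA_SECTION /* no special placement on other toolchains */
#endif

/* Hypothetical buffer size; kept a multiple of 4 bytes so that
 * word-wide beats remain possible. */
#define UART_RX_DMA_LEN 64u

/* Word-aligned buffer tagged for the DMA section. */
static _Alignas(4) uint8_t uart_rx_dma_buf[UART_RX_DMA_LEN] DMA_SECTION;

/* Returns nonzero if a transfer length is a multiple of 4 bytes. */
int dma_len_is_word_multiple(size_t len)
{
    return (len % 4u) == 0u;
}

/* Accessor so that other modules do not depend on the placement details. */
uint8_t *uart_rx_dma_buffer(void)
{
    return uart_rx_dma_buf;
}
```

Keeping the placement behind one macro means the buffers can be moved between SRAM1 and another bank by changing only the linker script, which makes comparing placements easier.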
So if the DMA buffers go into SRAM1, should all the stacks not go into SRAM2? I would assume burst DMA to/from SRAM1 would introduce some additional latency, especially for ISR stacking/unstacking. Am I overthinking this, or is it measurable?
Now some U3 parts have SRAM3. One thought is to allocate the DMA buffers for SDMMC from SRAM3, as SDMMC deals with 512-byte chunks and therefore uses burst transfers. By letting GPDMA stay in SRAM1, the longer bursts from SDMMC should not affect GPDMA.
In a scheme like this, where should .data/.bss/.noinit go? SRAM1, where there is lower latency but less bandwidth, or SRAM3, where there is potentially higher latency?
In general, is there some analysis out there looking at .stack vs. .data/.bss/.noinit vs. .heap latency/bandwidth?
2026-04-14 8:19 AM
Hello Thomas,
On STM32 CM33 parts with multiple SRAM banks, memory placement mainly matters because of bus contention, not because one SRAM is universally “faster”. If GPDMA/SDMMC have a preferred path to SRAM1, it makes sense to reserve SRAM1 for high-throughput DMA buffers. Putting the ISR stack in SRAM2 is often a good idea because exception stacking/unstacking can otherwise contend with DMA bursts to SRAM1, increasing worst-case interrupt latency. Large .bss, heap, and filesystem/work buffers are good candidates for SRAM3 if present. SDMMC buffers in SRAM3 can help only if the bus matrix still gives SDMMC good access there; otherwise peak throughput may drop. In practice the gain is usually measurable more in latency/jitter than in average throughput, so the best answer is to benchmark ISR latency and DMA throughput with different linker placements.
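A minimal sketch of how such a benchmark could collect latency and jitter numbers, assuming timestamps come from a free-running 32-bit cycle counter (on target this could be the DWT cycle counter, read as `DWT->CYCCNT` through CMSIS, if it is enabled on the part). The type and function names below are hypothetical; the code only does the bookkeeping and does not touch any hardware.

```c
#include <stdint.h>

/* Running statistics over measured latencies, in CPU cycles. */
typedef struct {
    uint32_t min_cycles;
    uint32_t max_cycles;
    uint64_t sum_cycles;
    uint32_t count;
} latency_stats_t;

void latency_stats_reset(latency_stats_t *s)
{
    s->min_cycles = UINT32_MAX;
    s->max_cycles = 0u;
    s->sum_cycles = 0u;
    s->count = 0u;
}

/* start/end are samples of a free-running 32-bit counter.
 * Unsigned subtraction stays correct across a single wrap-around. */
void latency_stats_add(latency_stats_t *s, uint32_t start, uint32_t end)
{
    uint32_t delta = end - start;

    if (delta < s->min_cycles) {
        s->min_cycles = delta;
    }
    if (delta > s->max_cycles) {
        s->max_cycles = delta;
    }
    s->sum_cycles += delta;
    s->count++;
}

/* Jitter as the spread between the worst and best observed latency. */
uint32_t latency_stats_jitter(const latency_stats_t *s)
{
    return (s->count != 0u) ? (s->max_cycles - s->min_cycles) : 0u;
}

/* Average latency, rounded down; 0 if nothing was recorded. */
uint32_t latency_stats_average(const latency_stats_t *s)
{
    return (s->count != 0u) ? (uint32_t)(s->sum_cycles / s->count) : 0u;
}
```

One way to use this would be to take the start sample when the interrupt is triggered (for example from a timer event) and the end sample as the first statement of the ISR, then compare min, max, and jitter across builds that differ only in where the stack and DMA buffers are linked.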
Regards,
Stassen