Performance characteristics of SDRAM on STM32F7508-DISCO board
- August 26, 2020
Below are my benchmark results for multiply-accumulate performance on contiguous memory blocks on the STM32F7508-DK board, for three different types of memory (on-chip SRAM, external SDRAM managed by the FMC, and QSPI-connected NOR flash):

The horizontal axes give the size of the contiguous memory region operated on, and the vertical axes give the throughput in millions of multiply-accumulates per second.
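For readers who want to reproduce this kind of measurement, here is a minimal sketch of the metric, not the exact benchmark code behind the plots. The names `mac_block`, `mmacs_per_second` and `CORE_CLOCK_HZ` are hypothetical, the 32-bit word size is an assumption, and the 216 MHz core clock is an assumption that should be replaced by the actual clock configuration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the core runs at 216 MHz; replace with the real core clock. */
#define CORE_CLOCK_HZ 216000000u

/*
 * Hypothetical helper: multiply-accumulate over one contiguous block.
 * The pointer is volatile so every word is really loaded from memory
 * (or from the cache) instead of being optimized away by the compiler.
 * Unsigned arithmetic is used so that overflow wraps in a defined way.
 */
uint32_t mac_block(const volatile uint32_t *block, size_t n_words,
                   uint32_t coeff)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < n_words; ++i) {
        acc += block[i] * coeff;
    }
    return acc;
}

/*
 * Hypothetical helper: convert a number of multiply-accumulates and the
 * elapsed core cycles into millions of multiply-accumulates per second.
 */
double mmacs_per_second(uint64_t macs, uint32_t cycles)
{
    assert(cycles > 0);
    return ((double)macs * (double)CORE_CLOCK_HZ) / ((double)cycles * 1e6);
}
```

On the target, the elapsed cycles could be taken as the difference of two reads of the Cortex-M7 cycle counter (`DWT->CYCCNT` in CMSIS) around a call to `mac_block`, with `block` pointing into SRAM, SDRAM, or memory-mapped QSPI flash. That part is left out here because it only builds against the device headers.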
One observation that makes sense to me is that performance in all cases drops markedly once the contiguous memory block grows beyond 2^12 B = 4 KiB, which is the cache size.
The primary thing I don't understand is why the external SDRAM performance is so much worse for small block sizes. Can someone elaborate on this?
The board, SDRAM and NOR flash are all initialized by the STM32CubeF7's BSP functions and templates for the STM32F7508-DISCO board.
While the absolute numbers differ, the overall qualitative behavior is the same across optimization levels from -O0 to -O3.