External SDRAM performance vs data width (8, 16, or 32)

Adrian Adamson · 2019-03-06

I'm adding external SDRAM to a project and I'm trying to decide which data bus width to choose: 8, 16, or 32 bits. Do the wider bus width options significantly affect external SDRAM performance? I have been poring over the AN4891 app note, but it does not specify which bus width was used for the test of running code from the external SDRAM. I would compile and run the test myself, but that would require (a) hardware I'm still designing and (b) the Keil compiler, which I don't have.

Any advice would be appreciated.

Tesla DeLorean · 2019-03-06

On the F429I-DISCO, executing from SDRAM (16-bit) was about 6x slower than internal SRAM.

Unless you're using the Cortex-M7 devices, the SDRAM will not be cached.

Several of the DISCO boards use 32-bit SDRAM but only wire half the data bus. The chip is perhaps cheaper, and the pins are at a premium.

If you are using this as a frame buffer, 16-bit will throw away half the available bandwidth, which puts a ceiling on performance and compromises other peripherals; using microSD cards in Polled mode gets to be very fragile.

Do you like large LQFP and BGA devices?

Do you like routing lots of equal length traces across your board?

Could you use an STM32 capable of executing from a QSPI memory?

Tips, Buy me a coffee, or three.. PayPal Venmo
Up vote any posts that you find helpful, it shows what's working..

Tesla DeLorean · 2019-03-06

You can run benchmarks on any of the EVAL and DISCO boards with SDRAM: you can reduce the bus width so as to use only a portion of it, and you can compile and test code with the GNU/GCC compilers. The evaluation version of Keil will build/debug apps up to 32KB in size with no problem. I don't recall it counting the heap against that limit, and data reached through pointers certainly wouldn't count.

Adrian Adamson · 2019-03-06

Thanks for responding, Clive.

I should have mentioned up front that I'm working with the STM32H7 in the largest packages. The software team I'm working with think they will need 32 MB to 64 MB each of NVM and RAM. I'm planning to cover NVM with dual port DDR QSPI and the RAM with external SDRAM. I'd like to use the 16 or 24 additional SDRAM data bus pins for other functions if the L1 cache mitigates the performance hit. This makes me all the more interested in which settings were used in the AN4891 experiments. We should have pretty good CPU margin if the performance is in the ballpark of what they found in AN4891, but if going down to 8 bits wide made performance 4x worse for any data stored outside the internal SRAM, I'd rather spend the pins.

waclawek.jan · 2019-03-06

Roughly speaking, an SDRAM access consists of a relatively lengthy "prologue", in which the address, mode etc. are set and latencies are observed, taking say some 6-8 cycles; after that, data are transferred to/from successive addresses at one cycle each, which is quite fast. So, if you are talking about churning in or out lots of sequential data, as when feeding the LCDC, some DMA-driven peripheral, internal memory or whatever, then the data transfer dominates over the "prologue", and the penalty for 8 vs. 16 bits is almost 2x (similarly for 32 bits vs. the others). But if we are talking about the processor going to SDRAM for some randomly addressed data, here and there, then it doesn't really matter that it takes one extra cycle to grab 16 bits through an 8-bit bus; the performance was already lost in the "prologue". Read: it's simply slow, and you can't do anything about it except go for a fundamentally faster, truly random-access memory, which is more expensive, physically larger and more power-hungry.

> if the L1 cache mitigates the performance hit

That of course depends on the application (contrary to the usual marketing lie, cache is not a magic tool that universally speeds things up). If you are going to access data quickly and randomly across megabytes, then caching may actually make it worse (benchmarking tip: try incrementing every 1000th byte in the SDRAM in a loop, with and then without the cache). If you are going to chew on a few kilobytes of data in a relatively lengthy process, then after some time turn to another few kilobytes, and so on, that's the case where the cache shows its strength. And when that is the case, the cache may help also with the narrower SDRAM data width, as the cache lines are then read and written in bursts.

JW

egoltzman · 2019-06-28

Hello Clive,

I'm interested to know why you said: "using microSD cards in Polled mode gets to be very fragile."

Can you elaborate on that please?

Thanks,

Eyal

Ajay1 · 2020-07-20

Adrian,

Did you make a decision on the data width? We are in the same position today and need to decide on the SDRAM width. We are planning to use an H7 with 16-bit SDRAM. The data transfer throughput will help us decide on the maximum graphics resolution. Is there any data that can help us? Any pointer would be helpful.

waclawek.jan · 2020-07-20

> The data transfer throughput will help us to decide on the maximum graphics resolution.

AN4861? Not updated for the 'H7, but maybe as a starting point?

You may perhaps want to experiment with some of the development boards (Disco presumably) out there, too.

JW

Tesla DeLorean · 2020-07-20

Colour Depth vs Pixel Clock should provide you with a sustained bandwidth number.

The most optimistic SDRAM clock is about 120 MHz, isn't it?

What frame rate and resolution are you trying to get?

Tesla DeLorean · 2020-07-20

SDIO/SDMMC has *hard* timing deadlines. The FIFO provides some margin, but if the memory is heavily contended, it can slow the reads/writes of the memory buffer, both in the polled loop and in other running software. The polled loop does byte packing/unpacking, and can be interrupted by higher priority tasks. I can induce failure with little effort, so I class it as fragile and likely to catch out the unwary.

DMA has to deal with the same bus loadings, but isn't going to be distracted, or cause the transfer to abort or fail.