2019-09-26 05:14 PM
https://community.st.com/s/question/0D50X00009XkgpsSAB/fatfs-sdio-dma-write-speed
I'm trying to write WAV files to an SD card on my STM32F7 Discovery board, and I've finally gotten round to setting up DMA + FATFS + SDMMC. My sampling rate is around 1 Msps on the 12-bit ADC, and I want to know if I'll be able to achieve this with the STM32 and a standard Class 10 SD card.
My issue is that I can't seem to understand how people benchmark the read and write speeds. Is there a code template somewhere? I've searched the forums and came across this, but I can't understand why f_sync is being used and why there are so many while loops at the end. Any help is appreciated.
2019-09-26 05:50 PM
f_sync() is generally pretty brutal and should be avoided. It is the equivalent of an f_close that leaves the file open, as it flushes internal structures, FAT tables, directory entries, etc.
The critical path to speed is to write large aligned blocks.
Small f_write() calls of less than 512 bytes, and writes spanning 512-byte sector boundaries, are the worst performers.
FatFs does not cache. Single-sector IO command overhead predominates. FatFs maximum run length is a cluster.
Own the buffering yourself; the file system and IO subsystem won't do it for you.
Best performance sweet spot: write 32 KB blocks on 32 KB boundaries. Same for reads.
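To make "own the buffering" concrete, here is a minimal sketch of a staging buffer that accepts writes of any size and only hands full 32 KB blocks to a sink. The names (block_writer_t, bw_write, bw_flush, stats_sink) are hypothetical, not part of FatFs; on the target the sink would presumably wrap f_write(), while here a host-side stand-in just counts what it receives.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE (32u * 1024u)  /* the 32 KB sweet spot from the post */

/* Hypothetical sink: on the target this would wrap f_write(). 0 = success. */
typedef int (*block_sink_t)(const uint8_t *block, size_t len, void *ctx);

typedef struct {
    uint8_t      data[BLOCK_SIZE];
    size_t       fill;            /* bytes currently staged */
    block_sink_t sink;
    void        *ctx;
} block_writer_t;

static void bw_init(block_writer_t *bw, block_sink_t sink, void *ctx)
{
    assert(bw != NULL && sink != NULL);
    bw->fill = 0;
    bw->sink = sink;
    bw->ctx  = ctx;
}

/* Stage arbitrary-sized input; the sink is only called with full blocks. */
static int bw_write(block_writer_t *bw, const void *src, size_t len)
{
    const uint8_t *p = (const uint8_t *)src;
    while (len > 0) {
        size_t room = BLOCK_SIZE - bw->fill;
        size_t n = (len < room) ? len : room;
        memcpy(bw->data + bw->fill, p, n);
        bw->fill += n;
        p += n;
        len -= n;
        if (bw->fill == BLOCK_SIZE) {
            int rc = bw->sink(bw->data, BLOCK_SIZE, bw->ctx);
            if (rc != 0) return rc;
            bw->fill = 0;
        }
    }
    return 0;
}

/* Write out the partial tail once, at the end of the recording. */
static int bw_flush(block_writer_t *bw)
{
    int rc = 0;
    if (bw->fill > 0) {
        rc = bw->sink(bw->data, bw->fill, bw->ctx);
        bw->fill = 0;
    }
    return rc;
}

/* Host-side stand-in sink that only records what it was handed. */
typedef struct { size_t calls; size_t bytes; size_t last_len; } sink_stats_t;

static int stats_sink(const uint8_t *block, size_t len, void *ctx)
{
    sink_stats_t *s = (sink_stats_t *)ctx;
    (void)block;
    s->calls++;
    s->bytes += len;
    s->last_len = len;
    return 0;
}
```

If everything written to the file, including the WAV header, goes through the same writer from offset 0, each full-block write should also start on a 32 KB file offset. The extra memcpy is the price of this simple version; filling the block in place would avoid it.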
For coarse timing use SysTick, measure across f_open, f_write(s), f_close.
For fine timing on fast systems use DWT_CYCCNT.
I time read/write of 32 MB as 32 KB blocks. It will give you a performance ceiling for your implementation. Usually constrained by bus clock, bus width, and card.
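As a rough template for that kind of benchmark, the sketch below times a run of equal-sized block writes and converts the result to bytes per second. Everything here is hypothetical scaffolding: now_ms would presumably wrap a millisecond tick such as HAL_GetTick(), write_block would wrap f_write(), and on the target the timed region should also cover f_open and f_close as described above. The fake_* functions are host-side stand-ins so the arithmetic can be checked off the board.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hooks; on the target these might wrap HAL_GetTick() and f_write(). */
typedef uint32_t (*tick_ms_fn)(void);
typedef int (*write_block_fn)(const void *buf, size_t len, void *ctx);

/* Bytes per second from a byte count and elapsed milliseconds. */
static uint32_t bytes_per_second(uint64_t bytes, uint32_t elapsed_ms)
{
    assert(elapsed_ms > 0);
    return (uint32_t)((bytes * 1000u) / elapsed_ms);
}

/* Same conversion for a cycle counter delta (e.g. two DWT cycle count reads). */
static uint32_t bytes_per_second_cycles(uint64_t bytes, uint32_t cycles,
                                        uint32_t core_hz)
{
    assert(cycles > 0);
    return (uint32_t)((bytes * core_hz) / cycles);
}

/* Time `blocks` writes of `block_len` bytes each; result goes to *out_bps. */
static int bench_write(write_block_fn write_block, void *ctx, tick_ms_fn now_ms,
                       const void *buf, size_t block_len, uint32_t blocks,
                       uint32_t *out_bps)
{
    uint32_t start, elapsed, i;
    assert(write_block != NULL && now_ms != NULL && out_bps != NULL);
    start = now_ms();
    for (i = 0; i < blocks; i++) {
        int rc = write_block(buf, block_len, ctx);
        if (rc != 0) return rc;
    }
    elapsed = now_ms() - start;     /* unsigned subtraction survives one tick wrap */
    if (elapsed == 0) elapsed = 1;  /* avoid dividing by zero on very short runs */
    *out_bps = bytes_per_second((uint64_t)block_len * blocks, elapsed);
    return 0;
}

/* Host-side stand-ins: a fake clock that advances 4 ms per block written. */
static uint32_t fake_clock_ms;

static uint32_t fake_now(void) { return fake_clock_ms; }

static int fake_write(const void *buf, size_t len, void *ctx)
{
    (void)buf; (void)len; (void)ctx;
    fake_clock_ms += 4;
    return 0;
}
```

Note that a 32-bit cycle counter wraps after roughly 20 seconds at 216 MHz, so the cycle-based variant suits timing individual blocks rather than a whole 32 MB run.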
Write speeds on cards are often much lower than reads; the cards juggle 128 KB erase blocks internally.
On premium cards I can probably sustain 15-20 MBps writes and 25 MBps reads; more realistically you might hit 6-7 MBps writes. With dumb buffering you'll likely be below 1 MBps.
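For comparison with those figures, the data rate the original question needs can be worked out directly. This is a small sketch with hypothetical helper names; it assumes each 12-bit sample is stored in a 16-bit word, which is an assumption about the file layout rather than something stated in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes per second needed for a sample rate and stored sample width. */
static uint32_t required_bytes_per_second(uint32_t samples_per_s,
                                          uint32_t bits_per_stored_sample)
{
    assert(bits_per_stored_sample > 0);
    return (uint32_t)(((uint64_t)samples_per_s * bits_per_stored_sample) / 8u);
}

/* Microseconds of sampling that one block of `block_bytes` holds at that rate. */
static uint32_t block_period_us(uint32_t block_bytes, uint32_t bytes_per_s)
{
    assert(bytes_per_s > 0);
    return (uint32_t)(((uint64_t)block_bytes * 1000000u) / bytes_per_s);
}
```

With 1 Msps stored as 16-bit words this gives 2,000,000 bytes per second (1,500,000 if the samples were packed to 12 bits), and a 32 KB block fills in about 16.4 ms. That is the budget each block write has to stay within on average.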
2019-09-26 05:58 PM
For the F7 specifically, put your buffers in the 64KB or 128KB DTCM RAM when using DMA. Make sure they are 4-byte aligned; f_write() will pass large buffers straight through.
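A sketch of what such a buffer declaration might look like with GCC or Clang attribute syntax. The section name ".dtcm_buffers" is hypothetical and has to match whatever the project's linker script maps onto DTCM; the helper names are made up for illustration as well.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__arm__)
/* ".dtcm_buffers" is a hypothetical section name; it must exist in the linker script. */
#define DTCM_BUFFER __attribute__((section(".dtcm_buffers"), aligned(4)))
#else
/* Host build: keep the alignment, drop the placement. */
#define DTCM_BUFFER __attribute__((aligned(4)))
#endif

#define WRITE_BLOCK_SIZE (32u * 1024u)

/* One 32 KB write block, word aligned for DMA. */
static uint8_t sd_write_buffer[WRITE_BLOCK_SIZE] DTCM_BUFFER;

/* True when the pointer sits on a 4-byte boundary. */
static int is_word_aligned(const void *p)
{
    return ((uintptr_t)p & 3u) == 0;
}

/* Accessor that checks the alignment assumption in debug builds. */
static uint8_t *get_sd_write_buffer(void)
{
    assert(is_word_aligned(sd_write_buffer));
    return sd_write_buffer;
}
```

The same alignment check is worth applying to any pointer passed to f_write(), for example one offset into a larger array.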
Polling has bus and interrupt loading issues, you can't stall the transfers. The FIFO provides some protection, but you can't take your eye off the ball. Remember HAL callbacks are done under interrupt context, so need to be short and sweet, stalling/blocking will result in data loss.
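One common way to keep the callbacks short is a ping-pong arrangement: the callback only sets a flag, and the main loop does the slow write. The sketch below is one possible shape, not code from the thread; the names are hypothetical, and on the STM32 HAL on_dma_half_done(0) and on_dma_half_done(1) would presumably be called from HAL_ADC_ConvHalfCpltCallback and HAL_ADC_ConvCpltCallback with the DMA in circular mode. The count_sink function is a host-side stand-in for the f_write() wrapper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HALF_SAMPLES 16384u  /* 16384 x 2 bytes = one 32 KB block per half */

static uint16_t adc_ring[2 * HALF_SAMPLES];  /* circular DMA target */
static volatile bool half_ready[2];          /* set in interrupt, cleared in main */
static volatile uint32_t overruns;           /* halves overwritten before draining */

/* Interrupt context: mark the finished half and return immediately. */
static void on_dma_half_done(unsigned half)
{
    half &= 1u;
    if (half_ready[half]) {
        overruns++;  /* main loop fell behind: this half was overwritten */
    }
    half_ready[half] = true;
}

/* Hypothetical sink: on the target this would wrap f_write(). 0 = success. */
typedef int (*block_sink_t)(const void *block, size_t len, void *ctx);

/* Main-loop context: write any completed halves.
   Returns the number of blocks written, or -1 if the sink failed. */
static int drain_ready_halves(block_sink_t sink, void *ctx)
{
    int written = 0;
    unsigned half;
    assert(sink != NULL);
    for (half = 0; half < 2; half++) {
        if (!half_ready[half]) continue;
        if (sink(&adc_ring[half * HALF_SAMPLES],
                 HALF_SAMPLES * sizeof(uint16_t), ctx) != 0) {
            return -1;
        }
        half_ready[half] = false;
        written++;
    }
    return written;
}

/* Host-side stand-in sink that only totals the bytes it receives. */
static int count_sink(const void *block, size_t len, void *ctx)
{
    (void)block;
    *(size_t *)ctx += len;
    return 0;
}
```

A non-zero overruns count means a write took longer than one half takes to fill, so samples were lost. With only two halves there is little slack for slow card writes; a deeper ring of blocks is the usual next step.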
2019-09-30 03:23 AM
Thanks Clive, used this method and I got sustained 14 - 16 MBps on my Sandisk card.