Very slow SDMMC Read performance on STM32F746NGH6

anonymous.8
Senior II
Posted on January 26, 2017 at 20:12

Hi all. I am using the STM32F746NGH6 on the STM32F7 Discovery board to read from a micro-SD card through FATFS.

I am able to get only around 1MByte/sec read performance when reading blocks of 512 bytes. My application requires at least twice this read speed.

I am accessing a file through Chan's FATFS and the block size is fixed at 512 bytes. My SDMMC driver code is basically bare metal, so all the STD PERIPH and HAL crap has gone. It is pared down to the bone, as it were. I am using the 4 bit mode and the high speed clock mode and DMA mode reads. The clock speed, as verified by my logic analyzer, is indeed around 50MHz.

Given the above configuration, I should get blazing read performance, but the elapsed time to perform the 512 Byte read is sluggish, around 430 - 660 us. This is verified both by a hardware timer and an LED turning on, then off, after the block read and measuring that LED on time with my logic analyzer.
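For scale, here is a minimal sketch of the arithmetic behind that expectation. The function names are hypothetical, and the wire-time figure ignores command/response traffic, CRC, start/stop bits and card access latency, so it is only a lower bound.

```c
#include <stdint.h>

/* Ideal time, in microseconds, to clock `bytes` of payload over an SD bus
 * that is `bus_bits` wide and runs at `clock_hz`. Lower bound only: it
 * ignores the command/response phase, CRC and card access latency. */
double sd_wire_time_us(uint32_t bytes, uint32_t bus_bits, uint32_t clock_hz)
{
    double clocks = (double)bytes * 8.0 / (double)bus_bits;
    return clocks * 1e6 / (double)clock_hz;
}

/* Effective throughput, in bytes per second, for a transfer of `bytes`
 * that took `elapsed_us` microseconds from start to finish. */
double throughput_bytes_per_s(uint32_t bytes, double elapsed_us)
{
    return (double)bytes * 1e6 / elapsed_us;
}
```

With 512 bytes, a 4 bit bus and a 50MHz clock, sd_wire_time_us() gives about 20.5 us, while the measured 430 - 660 us corresponds to roughly 0.78 - 1.19 MByte/sec. The data phase is therefore only a small fraction of the elapsed time.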

I am using a Class 4 micro-SD card.

I have attached a JPG showing a screen shot of the SD card clock. That image verifies the clock speed as about 50MHz, but interestingly it also shows that the clock comes in bursts of 7 clocks followed by a gap of 120ns. Thus the clock is not clocking in data continuously and therefore is losing about 1/2 of the available time to read the data in.
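As a rough check on how much the gaps cost, the sketch below works out the fraction of time the clock is active and the resulting ceiling on the data rate. The function names are hypothetical, and it assumes the burst pattern (7 clocks, then a 120ns gap) repeats uniformly for the whole data phase, which the screen shot alone does not prove.

```c
#include <stdint.h>

/* Fraction of time the SD clock is toggling, given bursts of
 * `clocks_per_burst` cycles at `clock_hz` separated by idle gaps of `gap_ns`. */
double burst_duty(uint32_t clocks_per_burst, uint32_t clock_hz, double gap_ns)
{
    double burst_ns = (double)clocks_per_burst * 1e9 / (double)clock_hz;
    return burst_ns / (burst_ns + gap_ns);
}

/* Upper bound on payload rate in bytes per second for a bus `bus_bits`
 * wide whose clock is only active for the fraction `duty` of the time. */
double bursty_bus_bytes_per_s(uint32_t bus_bits, uint32_t clock_hz, double duty)
{
    return (double)clock_hz * (double)bus_bits / 8.0 * duty;
}
```

For 7 clocks at 50MHz (140ns) followed by a 120ns gap, the duty is about 0.54, which still leaves a ceiling of roughly 13.5 MByte/sec on a 4 bit bus. Under these assumptions the gaps alone would not explain a 1 MByte/sec result.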

I have also attached a ZIP file with my SDMMC driver code.

Can anyone shed any light on why my SD MMC clock is so bursty and not continuous, or shed any other light on my poor read performance?

Thanks for any help you may be able to give.
14 REPLIES
Posted on January 30, 2017 at 01:34

Dear David Harrison,

Disable the clock power save mode, and place the read/write buffer (about 16 kByte or above) in the DTCM region.

Best Regards,

Nemui.
Posted on January 30, 2017 at 03:40

Don't read/write single blocks; the command overhead will eat you. I benchmark with 32KB of blocks. Find a sweet spot that balances the speed you want and the memory you have to spare: try 4KB, 8KB, 16KB and 32KB.

Tips, buy me a coffee, or three.. PayPal Venmo Up vote any posts that you find helpful, it shows what's working..
dttworld
Associate II
Posted on January 30, 2017 at 03:49

Nothing wrong with the ST HAL libraries. I am getting 5 MB/s write speeds using a class 10 micro SD with the ST HAL libraries using DMA. That was not a typo, 5 megabytes per second write speeds! This test was done on an STM32F7-DISCO board. The test was also done on an STM32F405 chip. I haven't done read speed tests since my application requires the files to be read by a PC, but clearly you can get better read speeds than 1 MB/s.

Posted on January 30, 2017 at 19:43

Actually my code originated from the STM32F7 HAL SDMMC driver. I simply stripped out all the unnecessary HAL clutter and reduced it to direct register access. But the algorithm and SD card read access process is exactly the same.

After looking at the problem some more, I have discovered that the issue is not bus bandwidth or memory bandwidth. I see very little change in overall read access time for a 512 byte block whether I use a 1 bit bus or a 4 bit bus, DMA or polled, standard speed or high speed. Therefore I can only come to the conclusion that the bottleneck is not the code or the bus bandwidth but the SD card itself.

Here is where Clive's comment comes in - yes, I do remember that there is an overhead for any SD card access but I am surprised at how much that is. I am about to do more measurements to try and quantify it.

I have reasons for wanting a small buffer size and accessing 512 bytes at a time, since a large buffer size creates latency elsewhere in the system. But I may have to rethink my whole SD card access and sample processing strategy to allow larger SD card accesses but retain smaller DMA buffer dumps to my DAC.
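One way such a split could look is a ring buffer that is filled in large SD reads and drained in small pieces. This is only a sketch: ring_t, RING_SIZE and the function names are hypothetical, the 8192 byte size is an arbitrary assumption, and it is written for a single execution context. If the consumer runs in a DMA or timer interrupt, the indices would need volatile or atomic handling.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 8192u   /* assumed size: a power of two, two 4KB SD reads */

typedef struct {
    uint8_t data[RING_SIZE];
    size_t  head;          /* total bytes written so far (free running) */
    size_t  tail;          /* total bytes read so far (free running) */
} ring_t;

size_t ring_used(const ring_t *r) { return r->head - r->tail; }
size_t ring_free(const ring_t *r) { return RING_SIZE - ring_used(r); }

/* Copy `len` bytes into the ring. Returns 1 on success, 0 if there is not
 * enough free space (nothing is written in that case). */
int ring_write(ring_t *r, const uint8_t *src, size_t len)
{
    if (len > ring_free(r)) return 0;
    size_t off = r->head % RING_SIZE;
    size_t first = RING_SIZE - off;          /* bytes before the wrap point */
    if (first > len) first = len;
    memcpy(&r->data[off], src, first);
    memcpy(&r->data[0], src + first, len - first);
    r->head += len;
    return 1;
}

/* Copy `len` bytes out of the ring. Returns 1 on success, 0 if fewer than
 * `len` bytes are buffered (nothing is read in that case). */
int ring_read(ring_t *r, uint8_t *dst, size_t len)
{
    if (len > ring_used(r)) return 0;
    size_t off = r->tail % RING_SIZE;
    size_t first = RING_SIZE - off;          /* bytes before the wrap point */
    if (first > len) first = len;
    memcpy(dst, &r->data[off], first);
    memcpy(dst + first, &r->data[0], len - first);
    r->tail += len;
    return 1;
}
```

The producer would call ring_write() with, say, a 4KB chunk whenever ring_free() allows it, while the DAC side pulls 512 bytes at a time with ring_read(). The extra memcpy costs CPU time; reading straight into the ring storage would avoid it but complicates the wrap handling.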

I will post some results later after some measurements.

Posted on January 30, 2017 at 21:34

I don't think disabling the SD bus clock power save mode will do anything at all. All it means is that the SD bus clock will be running all the time, even when there is no bus activity.

Posted on January 30, 2017 at 22:21

Yes, indeed. When I increase the buffer size to just 4KBytes, I get a raw SDReadBlocks access of 10MBytes/sec and, through FATFS f_read, about 6MBytes/sec. However, I can't directly use this size of buffer without redesigning other parts of the system, since that large buffer size currently causes large latencies elsewhere.

Thanks for reminding me of the huge SD card overheads in reading data.
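The numbers in this thread fit a simple picture: 4KBytes at 10MBytes/sec is roughly 410 us, about the same as a single 512 byte read, so most of the time appears to be a fixed cost per command. A minimal sketch of that model follows. The function name is hypothetical, and the overhead and per-block times passed in are assumptions to be replaced with measured values, not figures from any datasheet.

```c
#include <stdint.h>

/* Modelled read throughput in bytes per second when every read command
 * costs a fixed `overhead_us` plus `per_block_us` for each 512 byte block. */
double modeled_read_bytes_per_s(uint32_t blocks, double overhead_us,
                                double per_block_us)
{
    double total_us = overhead_us + (double)blocks * per_block_us;
    return (double)blocks * 512.0 * 1e6 / total_us;
}
```

Assuming, for illustration only, 400 us of overhead and 20.48 us per block (the ideal 4 bit, 50MHz wire time), the model gives about 1.2 MByte/sec for single blocks and about 7.3 MByte/sec for 8 blocks. It is a rough model rather than a fit to the measurements above, but it shows why the block count per command matters far more than the bus settings.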

Akos Miklos
Associate
Posted on February 07, 2018 at 18:16

Hello David,

I am working with the same STM32F7 Discovery board. I am using all the STD PERIPH and HAL. I also experienced unexpectedly low speed when writing with DMA, although in polling mode I could reach 3-5MBps. So the problem does not lie with the card.

By debugging and benchmarking I found that there is a large overhead somewhere between two writes of clusters (Chan's FATfs splits the data according to the card's cluster size). So increasing the cluster size by formatting the SD card can help instantly, though it is not the ultimate solution. By the way, it is not a solution if you HAVE to read/write 512byte blocks.

Remark: in some cases, when I stopped execution at a breakpoint (when stopping at sending the WRITE_MULT_BLOCK command), the overhead was gone. It seems that whatever causes the delay completes while execution is paused.

If somebody knows what exactly slows down the process, I would be very happy to learn it.

Best regards

Akos

Posted on February 07, 2018 at 19:52

There is a special SD Card formatter that takes into account the underlying erase block size, and the alignment of structures/clusters with that. Avoid spanning these blocks; the card hides this management from you, but it slows things down.

For the L4R9 SDMMC implementation special casing single sector writes using SINGLE vs MULTI command made the performance significantly more robust.

For the F7 one needs to be cognisant of caching, using the 64KB SRAM at 0x20000000 for buffering helps there. I need to pull the current benchmarks for the STM32F746-DISCO implementation, but it is not slow, and doesn't stall or require pauses. One needs to be aware that DMA will complete before the FIFO clears, and some of the error call-backs, oddly, occur in normal operation.
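For buffers that live in cached SRAM rather than the DTCM, the usual concern is that cache maintenance works on whole 32 byte lines (the Cortex-M7 data cache line size). Below is a minimal sketch of rounding a buffer out to line boundaries before such an operation. The names cache_range_t and cache_align_range are hypothetical, and the mention of CMSIS SCB_InvalidateDCache_by_Addr in the comment is an assumption about how it would be used on the target, not taken from the driver discussed here.

```c
#include <stdint.h>

#define DCACHE_LINE 32u   /* Cortex-M7 data cache line size in bytes */

typedef struct {
    uintptr_t addr;       /* start address rounded down to a cache line */
    uint32_t  size;       /* length rounded up to whole cache lines */
} cache_range_t;

/* Expand [addr, addr + size) so that it starts and ends on cache line
 * boundaries. On the target, the result would typically be handed to a
 * CMSIS call such as SCB_InvalidateDCache_by_Addr() after a DMA read
 * (assumption: not shown here, so the sketch builds on a host too). */
cache_range_t cache_align_range(uintptr_t addr, uint32_t size)
{
    uintptr_t mask  = ~(uintptr_t)(DCACHE_LINE - 1u);
    uintptr_t start = addr & mask;
    uintptr_t end   = (addr + size + DCACHE_LINE - 1u) & mask;
    cache_range_t r = { start, (uint32_t)(end - start) };
    return r;
}
```

For example, a 512 byte buffer at 0x200005A8 expands to 544 bytes starting at 0x200005A0. Since invalidating a line also discards anything else sharing it, DMA buffers are generally better declared 32 byte aligned and padded to a multiple of 32 bytes, or placed in the DTCM, which is not cached.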

Posted on February 08, 2018 at 03:29

So running the current F746-DISCO build

 FatFs Testing (68300)

SystemCoreClock: 216000000

Mounting FatFs

FAT32

   31.2 GB total drive space

   31.0 GB available

  109.5 MB used

Display Directory

----         90 /DIR.TXT

----   33554432 /TRILO.BIN

----   32768000 /COUNTER.TXT

D---          0 /RINEX

----        100 /LOG.TXT

CRC32 A534026F Memory Image

(Polled)

32768000 Bytes, 1014536440 Cycles

 6.98 MBps Write (FatFs)

4696 ms run time

 6.98 MBps (Sanity Check)

2048

CRC32 A534026F COUNTER.000

32768000 Bytes, 1155486544 Cycles

 6.13 MBps Write (FatFs)

5349 ms run time

 6.13 MBps (Sanity Check)

2048

CRC32 A534026F COUNTER.000

COUNTER.000

32768000 Bytes, 674931892 Cycles

10.49 MBps Read (FatFs)

32768000 Bytes, 674919666 Cycles

10.49 MBps Read (FatFs)

6260 ms run time

10.47 MBps (Sanity Check)

200005A8 200005C0 16384

CRC32 8192377E PKZIP 7E6DC881 TRILO.BIN

200005A8 200005C0 30720

CRC32 8192377E PKZIP 7E6DC881 TRILO.BIN

200005A8 200005C0 32768

CRC32 8192377E PKZIP 7E6DC881 TRILO.BIN

Done!

(DMA)

32768000 Bytes, 1116692486 Cycles

 6.34 MBps Write (FatFs)

5169 ms run time

 6.34 MBps (Sanity Check)

2048

CRC32 A534026F COUNTER.000

32768000 Bytes, 1203805583 Cycles

 5.88 MBps Write (FatFs)

5574 ms run time

 5.88 MBps (Sanity Check)

2048

CRC32 A534026F COUNTER.000
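The MBps figures in the log follow from the byte and cycle counts together with the 216MHz SystemCoreClock, using decimal megabytes. A small sketch of that conversion is shown here; the function name is hypothetical, and how the benchmark actually counts cycles is not stated in the thread.

```c
#include <stdint.h>

/* Throughput in decimal megabytes per second for `bytes` transferred in
 * `cycles` CPU cycles at a core clock of `core_hz`. */
double mbps_from_cycles(uint64_t bytes, uint64_t cycles, uint32_t core_hz)
{
    double seconds = (double)cycles / (double)core_hz;
    return (double)bytes / 1e6 / seconds;
}
```

For instance, 32768000 bytes in 1014536440 cycles at 216000000 Hz is about 4.70 seconds, or 6.98 MBps, and 674931892 cycles for the same size gives 10.49 MBps, matching the log.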