2017-08-25 1:53 AM
Hi all,
I am also trying to get high-speed writes to an SD card via SDIO. I use a project generated with CubeMX, with 4-bit SDIO and DMA, but I only get a write speed of about 300 kBytes/s. I write a 100-byte block 1000 times, because I need one write about every 100 us. A 100-byte write takes about 25 us, but every fifth write takes over 1 ms. Weird... The interval between the 1 ms writes gets shorter the fewer bytes I write.
When I look at the TrueStudio trace, I see that more than 60% of the time is spent in the function SDMMC_GetCmdResp1, compared to only 0.05% in BSP_SD_WriteBlocks_DMA... (Screenshot attached)
I am running it on an STM32F429IGT at 180 MHz with an 8 MHz external crystal. My main clock settings are 8 MHz HSE, M=8, N=360, P=2, AHB prescaler = 1, APB1 prescaler = 4 and APB2 prescaler = 2 -> 90 MHz. Is this too high?
The SDIO clock setting is 0, as CubeMX advises.
What would the max speed be for a write to SD card via SDIO?
I use a 16Gig SD card with SpeedClass 10 from SanDisk.
I have played around with some FatFs settings, e.g. setting _MIN_SS and _MAX_SS to 4096. With that I get a write speed of about 2MBit/s, but the card is no longer readable in Windows.
I have tried another example from the CubeLib V1.16/Projects/STM324xx9I_EVAL/Applications/FatFs/FatFs_uSD, copied the write loop from the previous project into it, and had a write speed of about 1 MByte/s - yesterday. Today I compiled it with optimization level 2 and now I get the same write speed as in the previous project, about 300 kByte/s. This is weird.
Does anyone have an example?
Thanks a lot,
best regards
Andreas
2017-08-25 2:07 AM
Try writing in bigger chunks; it will greatly increase performance. For example, you can use two 4 KB buffers (smaller or bigger, depending on how much free RAM you have). Fill buffer 1; when it is full, issue a write command, and while it is being written fill buffer 2. When buffer 2 is full, issue a write command, and while it is being written fill buffer 1 again, and so on.
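A minimal sketch of the two-buffer scheme described above, in plain C so it can be tried on a PC first. The names pingpong_t, pp_init, pp_append and stub_write are made up for this illustration; stub_write only counts calls and stands in for whatever SD/FatFs write you actually use. On real hardware you would also have to wait until the previous DMA write has finished before refilling the buffer it is reading from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 4096u  /* assumption: a multiple of the 512-byte sector size */

/* Hypothetical hook that starts writing one full buffer to the card.
 * In a real project this would wrap the SD/FatFs write call. */
typedef int (*block_write_fn)(const uint8_t *data, size_t len);

typedef struct {
    uint8_t        buf[2][BUF_SIZE];
    size_t         fill;    /* bytes used in the active buffer */
    unsigned       active;  /* index of the buffer being filled */
    block_write_fn write;
} pingpong_t;

void pp_init(pingpong_t *pp, block_write_fn write)
{
    assert(pp != NULL && write != NULL);
    pp->fill = 0;
    pp->active = 0;
    pp->write = write;
}

/* Append len bytes. Each time the active buffer becomes full it is handed
 * to the write hook and filling continues in the other buffer.
 * Returns 0 on success, -1 if the write hook reported an error. */
int pp_append(pingpong_t *pp, const void *src, size_t len)
{
    const uint8_t *p = src;

    while (len > 0) {
        size_t room = BUF_SIZE - pp->fill;
        size_t n = (len < room) ? len : room;

        memcpy(&pp->buf[pp->active][pp->fill], p, n);
        pp->fill += n;
        p += n;
        len -= n;

        if (pp->fill == BUF_SIZE) {
            if (pp->write(pp->buf[pp->active], BUF_SIZE) != 0)
                return -1;
            pp->active ^= 1u;  /* switch to the other buffer */
            pp->fill = 0;
        }
    }
    return 0;
}

/* Stand-in for the real card write: only counts calls and bytes. */
static size_t stub_calls, stub_bytes;

int stub_write(const uint8_t *data, size_t len)
{
    (void)data;
    stub_calls++;
    stub_bytes += len;
    return 0;
}
```

With 100-byte records this turns 1000 small writes into 24 writes of 4096 bytes each, with the remainder left in the active buffer until it fills up or you flush it yourself.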
Also see these threads:
2017-08-25 3:16 AM
Hi Jive,
thanks for your reply.
I have tried larger blocks (1024, 2048); the speed increases a little, but not much.
The time for each write increases as well, and so do the peaks of more than 1 ms. I only have about 200 us for one write, maybe less.
When I switch to 2048 I always get a Disk_Error.
Edit: I have now implemented waiting for the DMA and the SD write to finish. It now works with bigger blocks of e.g. 4k, and the write speed has increased by a factor of 5. But each write then takes several milliseconds, which might not be possible.
Is this the only way?
2017-08-25 5:58 AM
Separate the thread that accumulates the data from the one that writes the data. Don't call blocking code in the IRQ or callback routines.
When doing, say, an ADC recorder, the DMA IRQ handler manages the collection, and a foreground task then flushes to disk.
Have a buffer big enough to hold several KB of aligned data. If the data doesn't fit cleanly, have enough of a spill buffer that you can write whole KBs, and then copy the excess down into the front of the next buffer. I'd use up to 32 KB, but 16 or 8 might suffice; it is a trade-off between speed and resources, and you need to determine the sweet spot for your application.
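The spill-buffer idea can be sketched roughly like this. It is only an illustration: accum_t, acc_append, acc_flush and count_write are invented names, BLOCK_BYTES and SPILL_BYTES are example sizes, and the sketch runs in a single context. In a real recorder the append would happen on the collection side and the flush in the foreground task, with proper handover between the two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_BYTES 8192u  /* example size: whole KBs written per flush */
#define SPILL_BYTES 128u   /* assumption: at least the largest record size */

typedef int (*write_fn)(const uint8_t *data, size_t len);  /* hypothetical hook */

typedef struct {
    uint8_t data[BLOCK_BYTES + SPILL_BYTES];
    size_t  fill;
} accum_t;

/* Append one whole record. Returns 1 when at least BLOCK_BYTES are
 * buffered and the caller should flush before appending again. */
int acc_append(accum_t *a, const void *rec, size_t len)
{
    assert(len <= SPILL_BYTES);
    assert(a->fill < BLOCK_BYTES);  /* previous full block must be flushed */
    memcpy(a->data + a->fill, rec, len);
    a->fill += len;
    return a->fill >= BLOCK_BYTES;
}

/* Write exactly BLOCK_BYTES, then copy the excess down to the front.
 * Returns 1 if a block was written, 0 if not enough data, -1 on error. */
int acc_flush(accum_t *a, write_fn write)
{
    size_t excess;

    if (a->fill < BLOCK_BYTES)
        return 0;
    if (write(a->data, BLOCK_BYTES) != 0)
        return -1;
    excess = a->fill - BLOCK_BYTES;
    memmove(a->data, a->data + BLOCK_BYTES, excess);
    a->fill = excess;
    return 1;
}

/* Stand-in for the real disk write: only counts the bytes handed over. */
static size_t written_bytes;

int count_write(const uint8_t *data, size_t len)
{
    (void)data;
    written_bytes += len;
    return 0;
}
```

The card then only ever sees writes of exactly BLOCK_BYTES, while records are never split by the caller; the cost is one small memmove per flush.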
The SDIO and FATFS code is likely not reentrant.
The Q tap of the PLL feeds the SDIO clock.
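To put numbers on the clock question, here is a small sketch of the arithmetic, assuming the usual STM32F4 relations as I understand them from the reference manual: VCO = HSE / M * N, SYSCLK = VCO / P, the Q output is VCO / Q, and SDIO_CK = SDIOCLK / (CLKDIV + 2) when the bypass bit is not set. The Q value was not given in the thread; Q = 8 is purely an assumption here (with a 360 MHz VCO an exact 48 MHz would need Q = 7.5, which is not an integer). All function and type names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t hse_hz;  /* external crystal frequency */
    uint32_t m, n, p; /* PLL dividers/multiplier from the original post */
    uint32_t q;       /* assumption: not stated in the thread */
} pll_cfg_t;

uint32_t pll_vco_hz(const pll_cfg_t *c)
{
    assert(c->m != 0);
    return (uint32_t)((uint64_t)c->hse_hz * c->n / c->m);
}

uint32_t pll_sysclk_hz(const pll_cfg_t *c)
{
    assert(c->p != 0);
    return pll_vco_hz(c) / c->p;
}

/* Q tap: the clock that feeds the SDIO peripheral. */
uint32_t pll_q_hz(const pll_cfg_t *c)
{
    assert(c->q != 0);
    return pll_vco_hz(c) / c->q;
}

/* Card clock for a given CLKDIV setting (bypass not used). */
uint32_t sdio_ck_hz(uint32_t sdioclk_hz, uint32_t clkdiv)
{
    return sdioclk_hz / (clkdiv + 2u);
}

/* Raw bus ceiling in bytes per second; ignores command overhead,
 * card busy time and file system accesses. */
uint32_t sdio_bus_peak_bytes_per_s(uint32_t ck_hz, uint32_t bus_width_bits)
{
    return (uint32_t)((uint64_t)ck_hz * bus_width_bits / 8u);
}
```

With HSE = 8 MHz, M = 8, N = 360, P = 2 and the assumed Q = 8, this gives a 360 MHz VCO, 180 MHz SYSCLK, 45 MHz on the Q tap, a 22.5 MHz card clock at CLKDIV = 0 and a raw 4-bit bus ceiling of 11.25 MByte/s. Real write throughput will be well below that ceiling.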
2017-08-27 11:19 PM
Hi Jive Tihs and Clive One,
thanks for your reply.
So, if I understand correctly, the full speed via SDIO and the ST FatFs is only available when writing large blocks, and not with small blocks of e.g. less than 100 bytes?
What do you think the transfer peaks shown in the attached screenshot are?
I might have to speed up SDIO and FatFs by going my own way, with fewer calls to query the card status, and by handling the FAT accesses myself?
2017-08-28 1:51 AM
External flash memory uses a variety of chips from the 29 and 39 series. In these chips, writing and erasing happen in blocks. The block size depends on the chip and ranges from 1 kbit to 4 kbit. To speed up the data flow, the microcontroller on the card has its own buffer of four times the flash block size. Two buffers are intended for continuous writing / preparing data for writing, and two buffers for reading.
Flash memory can write data and read from another area at the same time. But for correct operation of the memory card, it is necessary to read the custom commands from it. They differ from the commands that are standard for all cards, and are intended only for the card they were read from. All cards have different commands and different ways of using them. This is what happens when you connect a new flash card to a large computer: it searches for a driver (communication algorithm). Reading custom commands in manual debugging mode is quite an entertaining exercise. As a minimal optimization, it makes sense to use a cache size that matches the maximum variant, 4 kbit (512 bytes). More precisely: do not use smaller sizes.
