STM32F4 bad SDIO throughput

Kuikui
Associate III
Posted on August 23, 2017 at 17:55

Hi,

I'm using an STM32F446 @ 168MHz.

APB2 Frequency = 84 MHz

SDIO_CK frequency = 12MHz @ 4-bit lines = 48 Mbit/sec

This should give a read throughput of approx. 6 MBytes/sec.

But using ST's SPL, reading 1 block (of 512 bytes) takes approx. 500 µs, which gives a throughput of approx. 1 MByte/sec.

Swapping from DMA mode to POLLING mode gives the same result.

Is that the best achievable reading throughput on STM32F4 ?

Thanks for help.

Best regards,

Vincent.

5 REPLIES
Kuikui
Associate III
Posted on August 23, 2017 at 19:10

I've found the cause of the bad throughput (but not the reason).

I tried reading 2 blocks with the multi-block function and was surprised to see that it took about 520 µs rather than 1 ms.

So I had a look at signals :

- in blue, a signal which rises just before calling the ReadBlock function, and falls after the ReadBlock function.

- in red, the SDIO_D0 signal

[Image: oscilloscope capture, 0690X00000607xQQAQ.png]

It clearly appears that there's an extra 350µs before any read sequence.

Does anyone know what is causing this, and how to reduce (or eliminate) this delay?

It does not seem to be related to the DMA setup phase. I could measure that 99% of the 500µsec time was spent in :

    while ((DMAEndOfTransfer == 0x00) && (TransferEnd == 0) && (TransferError == SD_OK) && (timeout > 0))
    {
      timeout--;
    }

Thanks.

Best regards,

Vincent.

Posted on August 23, 2017 at 19:13

 

Reading a single sector at a time carries a burdensome overhead; you want to be doing multi-sector transfers. FatFs does this when you feed large requests through f_read/f_write. I benchmark reading/writing aligned 32KB blocks.

You should avoid writes spanning two 128KB erase blocks (or whatever the media geometry is).

Run the SDIO peripheral with a 48 MHz clock, with a clock of 24 MHz to the card. The F446 doesn't have the BYPASS errata, so higher speed might be possible.

Tips, Buy me a coffee, or three.. PayPal Venmo
Up vote any posts that you find helpful, it shows what's working..
Posted on August 23, 2017 at 19:19

Haven't dug into SD architecture in years, but I'd assume there is a 'seek time' related to pulling the 128KB block from the array, i.e. fetching from the slower NAND array into a 128KB RAM holding buffer that then gets streamed out.

Doing a read across multiple blocks will likely initiate a prefetch on subsequent blocks, so within the time frame of transferring block N across the interface block N+1 will be ready. So subsequent blocks have zero or minimal fetch times.

Posted on August 23, 2017 at 20:32

Thanks Clive.

I understand there can be some 'seek time', but doing this :

1. ReadSingleBlock N

2. Wait 1ms

3. ReadSingleBlock N+1

Step 1 takes 500 µs and step 3 also takes 500 µs, whereas ReadMultiBlock (N and N+1) takes approx. 550 µs.

I'm not sure if this delay is only due to the SD card.

Posted on August 23, 2017 at 20:51

You're talking to a microcontroller on the SD card, with the brains of an 8051, that manages the protocol and initializes the transfers. The fetching and transfer are likely all handled by hardware state machines.
