How could I increase speed of SD reading with SPI+USB in STM32L073RZ?

Associate II

Hello, everybody.

I'm working on a circuit design whose main task is to read a µSDHC card and copy its files to a computer. I'm using an STM32L073RZ with SPI (µSD to MCU) and full-speed USB (MCU to PC) as the communication interfaces.

The other features of the design require very low power consumption, which is why I'm using a device from the ultra-low-power STM32L0 series. Unfortunately, as far as I know, the STM32L0 family has no SDIO interface, so I have to make do with the slower SPI mode for SD communication.

After several tests, I achieved a maximum speed of ~110 kB/s by adapting code from an STM32Cube project (...\STM32Cube\Repository\STM32Cube_FW_L0_V1.11.0\Projects\STM32L073Z-EVAL\Applications\USB_Device\MSC_Standalone) to my own circuit configuration. USB is implemented as a Mass Storage Class (MSC) device and the FatFS middleware is NOT used. Apart from the relatively low speed, the only problem is that it works only with standard SD cards (up to 2 GB), not with SDHC.

Although I'm aware that speeds on the order of MB/s are not feasible (USB FS is already limited to 1.5 MB/s), I think there is still room for improvement. I have thought of 3 approaches:

  • Improve SPI communication: Maybe add DMA and/or use multi-block reads from the µSD instead of single-block reads.
  • Implement/emulate an SDIO interface: If I could emulate SDIO using the MCU's current resources, it would be possible to use SD mode (1-bit or 4-bit) instead of SPI mode for the µSD card.
  • Change/add dedicated hardware.

I don't know how to put these into practice, so I'm asking for opinions, advice, or suggestions about these three options or any others.

Kind regards,

Jose Costa


Reading single blocks is very slow because there is significant per-block overhead; you need to let the host system pass requests for multiple blocks. Make MSC_MEDIA_PACKET a larger power of two.

/* MSC Class Config */

#define MSC_MEDIA_PACKET                     512 // << TOO SMALL
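
To illustrate why the packet size matters, here is a rough sketch. It assumes the MSC layer splits each host read into chunks of at most MSC_MEDIA_PACKET bytes and calls the storage read function once per chunk; that is an assumption about the library's behavior and is worth checking in its SCSI read handler. storage_calls and blocks_per_call are hypothetical helper names, and the 64 KB request used in the note below is only an example figure.

```c
#include <stdint.h>

#define SECTOR_SIZE 512u

/* Number of storage-layer read calls needed to serve one host request,
   ASSUMING the request is split into chunks of at most media_packet bytes. */
uint32_t storage_calls(uint32_t request_bytes, uint32_t media_packet)
{
    return (request_bytes + media_packet - 1u) / media_packet;
}

/* Number of 512-byte sectors the storage layer is asked for per call. */
uint32_t blocks_per_call(uint32_t media_packet)
{
    return media_packet / SECTOR_SIZE;
}
```

Under these assumptions, a 64 KB host request needs 128 storage calls of one sector each with a 512-byte packet, and 16 calls of eight sectors each with a 4096-byte packet. The packet buffer lives in RAM, so the practical ceiling depends on how much RAM the application can spare.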

The highest achievable speed for USB FS in these situations is around 600-700 KBps.

You could try pushing the SPI to 50 MHz or higher.

I don't see emulating 4-bit SDIO to be remotely practical.

Look at the L4; perhaps clock it slower? Or have it sleep more, i.e. it finishes the task in a fraction of the time and so sleeps 95% of the time rather than 50%.

SDIO/SDMMC needs at least a 64-pin count STM32, as I recall.

Associate II

First of all, thank you for your response. I will take note of it, but I still have some questions:

  • Could you be more specific about how to change single-block reading for multi-block reading, please? This is the reading function:
/**
  * @brief  Reads block(s) from a specified address in the SD card, in polling mode.
  * @param  pData: Pointer to the buffer that will receive the data read
  * @param  ReadAddr: Address from where data is to be read
  * @param  BlockSize: SD card data block size, that should be 512
  * @param  NumberOfBlocks: Number of SD blocks to read
  * @retval SD status
  */
uint8_t SD_ReadBlocks(uint32_t* pData, uint32_t ReadAddr, uint16_t BlockSize, uint32_t NumberOfBlocks)
{
  uint32_t offset = 0;
  uint8_t retr = BSP_SD_ERROR;
  uint8_t *ptr = NULL;
  SD_CmdAnswer_typedef response;

  /* Send CMD16 (SD_CMD_SET_BLOCKLEN) to set the size of the block and
     check if the SD acknowledged the set block length command: R1 response (0x00: no errors) */
  response = SD_SendCmd(SD_CMD_SET_BLOCKLEN, BlockSize, 0xFF, SD_ANSWER_R1_EXPECTED);
  SD_IO_CSState(1);
  SD_IO_WriteByte(SD_DUMMY_BYTE);
  if (response.r1 != SD_R1_NO_ERROR)
  {
    goto error;
  }

  /* Dummy transmit buffer (all 0xFF), reused for every block */
  ptr = malloc(sizeof(uint8_t)*BlockSize);
  if (ptr == NULL)
  {
    goto error;
  }
  memset(ptr, SD_DUMMY_BYTE, sizeof(uint8_t)*BlockSize);

  /* Data transfer */
  while (NumberOfBlocks--)
  {
    /* Send CMD17 (SD_CMD_READ_SINGLE_BLOCK) to read one block */
    /* Check if the SD acknowledged the read block command: R1 response (0x00: no errors) */
    response = SD_SendCmd(SD_CMD_READ_SINGLE_BLOCK, (ReadAddr + offset)/(flag_SDHC == 1 ? BlockSize : 1), 0xFF, SD_ANSWER_R1_EXPECTED);
    if (response.r1 != SD_R1_NO_ERROR)
    {
      goto error;
    }

    /* Now look for the data token to signify the start of the data */
    if (SD_WaitData(SD_TOKEN_START_DATA_SINGLE_BLOCK_READ) == BSP_SD_OK)
    {
      /* Read the SD block data : read NumByteToRead data */
      SD_IO_WriteReadData(ptr, (uint8_t*)pData + offset, BlockSize);

      /* Set next read address */
      offset += BlockSize;
      /* Get CRC bytes (not really needed by us, but required by SD) */
      SD_IO_WriteByte(SD_DUMMY_BYTE);
      SD_IO_WriteByte(SD_DUMMY_BYTE);
    }
    else
    {
      goto error;
    }

    /* End the command data read cycle */
    SD_IO_CSState(1);
    SD_IO_WriteByte(SD_DUMMY_BYTE);
  }

  retr = BSP_SD_OK;

error:
  /* Send dummy byte: 8 clock pulses of delay */
  SD_IO_CSState(1);
  SD_IO_WriteByte(SD_DUMMY_BYTE);
  if (ptr != NULL) free(ptr);

  /* Return the response */
  return retr;
}

I suppose CMD18 (SD_CMD_READ_MULT_BLOCK) should be sent instead of CMD17 (SD_CMD_READ_SINGLE_BLOCK), and I also have to increase the size of the buffer ptr so that it can store n blocks instead of only one. But once that is done, how can I know when exactly n blocks have been read? When should I send CMD12 (SD_CMD_STOP_TRANSMISSION)? Is there some flag or interrupt that can be used?
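
On the CMD18 question: in SPI mode the card does not signal the end of a multi-block read. It keeps sending blocks, each preceded by a start token (0xFE) and followed by two CRC bytes, until the host sends CMD12, so the host simply counts the blocks it has received. Below is a minimal sketch of that sequence under stated assumptions. spi_xfer_fn, sd_cmd, sd_read_multi and fake_xfer are hypothetical names and not part of the ST BSP; chip-select handling is left out; the address argument is passed through unchanged (a block number for SDHC, a byte address otherwise); and fake_xfer is only a toy simulation of a card so the sequence can be desk-checked on a PC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical SPI hook: clocks out one byte and returns the byte received. */
typedef uint8_t (*spi_xfer_fn)(uint8_t tx);

enum { SD_BLOCK = 512, TOKEN_START = 0xFE, CMD12 = 12, CMD18 = 18 };

/* Send a 6-byte command frame and poll for the R1 response (bit 7 clear). */
static uint8_t sd_cmd(spi_xfer_fn xfer, uint8_t cmd, uint32_t arg)
{
    xfer((uint8_t)(0x40u | cmd));
    xfer((uint8_t)(arg >> 24)); xfer((uint8_t)(arg >> 16));
    xfer((uint8_t)(arg >> 8));  xfer((uint8_t)arg);
    xfer(0xFF);                     /* dummy CRC byte, as SD_SendCmd passes 0xFF */
    if (cmd == CMD12) xfer(0xFF);   /* CMD12 only: discard the stuff byte */
    for (int i = 0; i < 8; i++) {
        uint8_t r1 = xfer(0xFF);
        if ((r1 & 0x80u) == 0) return r1;
    }
    return 0xFF;                    /* no response seen */
}

/* Read nblocks consecutive blocks; chip select is assumed already asserted. */
int sd_read_multi(spi_xfer_fn xfer, uint32_t addr, uint8_t *buf, uint32_t nblocks)
{
    int status = 0;
    assert(xfer != NULL && buf != NULL);
    if (sd_cmd(xfer, CMD18, addr) != 0x00) return -1;

    for (uint32_t b = 0; b < nblocks; b++) {
        uint32_t tries = 100000;
        uint8_t token;
        do { token = xfer(0xFF); } while (token == 0xFF && --tries);
        if (token != TOKEN_START) { status = -1; break; }
        for (uint32_t i = 0; i < SD_BLOCK; i++) buf[b * SD_BLOCK + i] = xfer(0xFF);
        xfer(0xFF); xfer(0xFF);     /* discard the 16-bit CRC */
    }

    /* The host has counted the blocks itself: now tell the card to stop. */
    sd_cmd(xfer, CMD12, 0);
    for (uint32_t tries = 100000; xfer(0xFF) != 0xFF && --tries; ) { }  /* busy wait */
    return status;
}

/* Toy simulated card, only for desk-checking the sequence above. */
static int f_left, f_resp, f_stream, f_stop_seen;
static uint8_t f_cmd;
static uint32_t f_pos;

uint8_t fake_xfer(uint8_t tx)
{
    if (f_left > 0) {               /* remaining bytes of a command frame */
        if (--f_left == 0 && f_cmd == CMD18) { f_stream = 1; f_pos = 0; f_resp = 1; }
        if (f_left == 0 && f_cmd == CMD12) { f_stream = 0; f_stop_seen = 1; f_resp = 2; }
        return 0xFF;
    }
    if ((tx & 0xC0u) == 0x40u) { f_cmd = (uint8_t)(tx & 0x3Fu); f_left = 5; return 0xFF; }
    if (f_resp > 0) return (--f_resp == 0) ? 0x00 : 0xFF;   /* R1 = no error */
    if (f_stream) {                 /* one gap byte, token, 512 data, 2 CRC */
        uint32_t i = f_pos % 516u, blk = f_pos / 516u;
        f_pos++;
        if (i == 0) return 0xFF;
        if (i == 1) return TOKEN_START;
        return (i < 514u) ? (uint8_t)(blk + i - 2u) : 0x00;
    }
    return 0xFF;
}
```

A call such as sd_read_multi(fake_xfer, 0, buf, 3) fills a 1536-byte buffer and ends with CMD12. In the posted function, ptr looks like a dummy transmit buffer filled with SD_DUMMY_BYTE, so it probably does not need to grow: it can be reused for each block while pData receives the n blocks.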


  • I tried to increase MSC_MEDIA_PACKET, but results seem to be similar. (Maybe you want me to increase it after implementing multi-block reading?)

  • Regarding the SPI frequency, the STM32L073RZ datasheet (DS10685, Rev 4) indicates that the maximum is 16 Mbit/s.

Is there any way to 'push' the SPI frequency limit for this MCU?

I think that's all. I would appreciate it if you could help me with these.

Kind regards,

Jose Costa


Have you maxed out the SPI clock frequency?

The hardware (PCB impedances, series resistors) needs to be carefully designed for high clock frequencies. Keep the wiring as short as possible.

Measuring at those lines with a scope is equally difficult. The probes need to be impedance-matched, or you see mainly probe-induced artifacts.

Associate II

Yes, I think so. I run SPI1's peripheral bus (APB2) at its maximum frequency (32 MHz) and use the minimum prescaler (2). This gives an SCLK of 16 MHz, which is the theoretical maximum.
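
As a back-of-the-envelope check of these figures, here is a small sketch. The helper names are hypothetical, 110 kB/s is taken as 110,000 bytes/s, and the raw figure ignores all command, token and CRC overhead.

```c
#include <stdint.h>

/* SCK frequency from the peripheral clock and the SPI prescaler. */
uint32_t spi_sck_hz(uint32_t pclk_hz, uint32_t prescaler)
{
    return pclk_hz / prescaler;
}

/* Raw payload rate: one bit per SCK cycle, eight bits per byte. */
uint32_t raw_bytes_per_s(uint32_t sck_hz)
{
    return sck_hz / 8u;
}

/* Time to move one 512-byte block at a given byte rate, in microseconds. */
uint32_t block_time_us(uint32_t bytes_per_s)
{
    return (uint32_t)((512ull * 1000000ull) / bytes_per_s);
}
```

With a 32 MHz APB2 clock and a prescaler of 2, this gives 16 MHz, or 2,000,000 bytes/s raw, which is about 256 µs per 512-byte block. The measured ~110 kB/s corresponds to roughly 4654 µs per block, which suggests that per-block overhead, rather than the SCK frequency, is currently the main limit.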

I will take care to minimize hardware interference. Thank you for the information.

Kind regards,

Jose Costa