STM32L4 with USB MSD and FatFS, Limitations

Nor Sch
Associate III
Posted on October 11, 2016 at 12:19

First: I do not use USB MSD and FatFS at the same Time, but they work on the same SD-Card. This SD-Card is connected via SDIO with a 4-Bit Databus. I use an STM32L476, but I think the Problems should be identical for all STM32L4-MCUs.

In the last Weeks I worked, among other Things, a lot on USB MSD and FatFS and got a bunch of Trouble with it. I updated FatFS (R0.12b) and FreeRTOS (v9.0.0). To get it running, I also used some Code from the Examples for the STM32L476-Eval-Board. Additionally I refactored a lot of CubeMX-generated Files including HAL-Drivers. The Result was cleaner, shorter and more readable Code without a lot of duplicate Stuff, but I couldn't find any (undocumented) Errors or Reasons for the obscure Limitations.

Here are the Limitations I found:
  1. The sdmmc-ClockDivider is internally incremented by 2. With a Clock of 48 MHz and the theoretically possible Divider 0, 24 MHz should be possible. But in Practice I must use 2 and get a Clock of 12 MHz. Otherwise the Read()-Functions permanently return SD_RX_OVERRUN, which means that I'm too slow reading incoming Data out of the FIFO. The Demo-Code for the STM32L476-Eval-Board also uses 2 as Divider, but why?
  2. With FatFS the DMA-Write works fine, but with DMA-Read I again get SD_RX_OVERRUN in every Case after 84 Bytes have been copied from the FIFO into my Buffer. The FIFO has 128 Bytes and again seems to fill faster than the slow L4 can read it out.
  3. With USB MSD I also have to use the blocking Mode for Write. The DMA-Write hangs somewhere waiting for a Flag (without testing it again I'm not sure which Flag, but it looked like an expected IRQ-Handler is not called). So with the generated Code no DMA at all is usable here.
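
As a rough illustration of items 1 and 2, the following C sketch computes the SDMMC clock for a given divider and estimates how long a 128-byte FIFO takes to fill if nobody reads it. The helper names are made up for this sketch and are not HAL functions. The fill-time estimate assumes one bit per data line per clock cycle and ignores CRC and start/stop bits, so it is only an approximation.

```c
#include <stdint.h>

/* SDMMC_CK = SDMMCCLK / (ClockDiv + 2); the "+ 2" is the internal
 * increment described in item 1. Hypothetical helper, not a HAL call. */
static uint32_t sdmmc_ck_hz(uint32_t sdmmcclk_hz, uint8_t clock_div)
{
    return sdmmcclk_hz / ((uint32_t)clock_div + 2u);
}

/* Rough time in nanoseconds until a FIFO of fifo_bytes is full when data
 * arrives at one bit per data line per clock and nothing is read out.
 * Assumption: CRC and start/stop bits are ignored. Returns 0 on bad input. */
static uint64_t fifo_fill_time_ns(uint32_t fifo_bytes, uint32_t ck_hz,
                                  uint32_t bus_width_bits)
{
    uint64_t bits_per_s = (uint64_t)ck_hz * bus_width_bits;
    if (bits_per_s == 0u) {
        return 0u;
    }
    return ((uint64_t)fifo_bytes * 8u * 1000000000ull) / bits_per_s;
}
```

With the values from the post, ClockDiv 2 at 48 MHz gives 12 MHz, and a 128-byte FIFO on a 4-bit bus would fill in roughly 21 µs at 12 MHz versus roughly 11 µs at 24 MHz. That is the window in which the CPU or DMA has to start draining the FIFO.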

Can anyone give some Hints to these Problems or can anyone confirm these Limitations? Any Suggestion is welcome!

By the Way, here are the Speed-Limits I got with the other Threads deactivated. The internal Tests were done with a 1 MB File. For USB MSD I used Windows 7 and copied Files larger than 40 MB.

Internal FatFS Read: 3908 kB/s
Internal FatFS Write: 678.6 kB/s
USB MSD Read: 702 kB/s
USB MSD Write: 657 kB/s

I think I'm at the Limit with these Values for the slow STM32L4 with 12 MHz SDIO-Clock and only USB FS with 12 Mbit/s. But I'm sure that I will need the DMA later to get a little bit more Power for the other Threads. Also I think that the Implementation of the SD-HAL-Drivers and FatFS is not really cooperative for an RTOS. After the Start of a Read / Write, even with DMA, you poll on some Flags. Does anyone have a Hint for a well-working Solution which has Multithreading better integrated?

#stm32l4-usb-msd-fatfs-dma
27 REPLIES
ChrisH
Associate III
Posted on December 07, 2016 at 01:09

I'm having exactly the same issues as you mentioned, all of them from 1 to 3. I haven't found any workaround. I hope it's not a HW limitation but rather some kind of lib bug. Although I doubt it...

Nor Sch
Associate III
Posted on December 07, 2016 at 16:18

Since then I rewrote the complete SD-Driver. I now have a Driver which implements SD and eMMC. The above Problems are still there. But I have it 100% stable now and can use the 8-Bit-Bus for eMMC. With this 8-Bit-Bus you can only use 9.6 MHz, but in the End it's still a little bit faster than the 4-Bit-SD with 12 MHz.
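
To make the comparison of the two bus configurations concrete, here is a minimal C sketch of the raw upper bound on bus throughput. It assumes single data rate (one bit per data line per clock cycle) and ignores command overhead, CRC and card busy time, so real numbers will be lower. The function name is hypothetical.

```c
#include <stdint.h>

/* Raw upper bound in bytes per second for a bus with bus_width_bits data
 * lines clocked at clk_hz. Assumption: single data rate, no protocol
 * overhead. Hypothetical helper for illustration only. */
static uint32_t raw_bus_bytes_per_s(uint32_t clk_hz, uint32_t bus_width_bits)
{
    return (uint32_t)(((uint64_t)clk_hz * bus_width_bits) / 8u);
}
```

Under these assumptions the 4-bit bus at 12 MHz tops out at 6,000,000 bytes/s and the 8-bit bus at 9.6 MHz at 9,600,000 bytes/s, which is consistent with the 8-bit eMMC setup coming out ahead despite the lower clock.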

If you try higher Frequencies, it often runs for a While. But at some Point the SDMMC-HW in the STM32L4 seems to hang. Other Threads keep working, but there is no more Traffic on the Cmd-/Data-Lines. A Reinit of the SDMMC-HW did not work for me either.

For Point 3 there would be a Solution: You only have to rewrite the USB-Driver and the DMA-Implementation. They are Garbage. But they are also so complicated that normally nobody really wants to rewrite them. If you debug through a Filecopy via USB you will see that many, many Functions and Layers are called out of one Interrupt and there is a lot of Handling in this Interrupt-State ... So if you have enough Manpower for your Project: write new Drivers.

Posted on December 07, 2016 at 16:35

Thank you for quite an extensive reply. For the final product I'm not going to use MSC, so rewriting the drivers is pointless :) I see weird behaviour with the RTOS: interrupt callbacks are not called, and any of the DMA functions time out. The NVIC priorities are correct (SDMMC1 has a higher priority than the DMA request), but the handler is still not called. I probably missed something really simple here. Rewriting the DMA driver might become necessary, since I want long battery life and less frequent writes with larger chunks of data. Going to sleep during the transfer might be a good idea, I guess, which is why I'd prefer to have working DMA. Btw, DMA write in MSC worked for me out of the box. Do you think all those problems might be affected by the card that is used, or doesn't it matter?

Posted on December 07, 2016 at 18:36

Dear Users,

Please have a look at this document:

http://www.st.com/content/ccc/resource/technical/document/errata_sheet/65/dd/2c/78/59/b0/4c/68/DM00111498.pdf/files/DM00111498.pdf/jcr:content/translations/en.DM00111498.pdf

It relates to the STM32L476xx/STM32L486xx devices. Check the section 'SDMMC peripheral limitations' to see whether you have the same conditions as described in the errata sheet.

Best Regards

-Imen-

Posted on December 07, 2016 at 18:46

>> Do you think all those problems might be affected by the card that is used or it doesn't matter?

Cards can change the system dynamics; within ST's code base these have historically caused race issues, i.e. sequence/ordering becomes more critical when the card responds near instantaneously.

Posted on December 08, 2016 at 17:13

These Problems are not Part of the Errata.

Posted on December 08, 2016 at 17:25

That's right. Various Problems with particular Cards in Combination with an STM32-Controller have been reported. But you often cannot see the Reason for the Problem. It could be that it works with other Frequencies; maybe it works with USB-MSD but not with FatFs. I mostly worked with a 4GB-Panasonic-Card, but other Cards from Intenso and SanDisk behaved the same for me.

The eMMC I used is a Toshiba THGBMBG6D1KBAIL. For the Initialisation, the eMMC-Datasheet says the same as the JEDEC-Spec and the STM32-L4-Spec: Init should be done with 400 kHz. But this will not work, never! You have to init with 187.5 kHz. Why this Frequency? I don't know; I found it by Reverse-Engineering a Card Reader, and everything above 200 kHz was not working ... I think that this is a common Hardware-Incompatibility, but I can't tell you which Side is doing something wrong.
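
For picking a divider that stays at or below a target frequency such as the init clock, a small C sketch could look like this. It uses the SDMMC_CK = SDMMCCLK / (ClockDiv + 2) relation mentioned in this thread and assumes an 8-bit divider field (0 to 255) and a 48 MHz SDMMCCLK. The function name is hypothetical and not part of the HAL.

```c
#include <stdint.h>

/* Returns the smallest ClockDiv (0..255) for which
 * SDMMCCLK / (ClockDiv + 2) <= max_hz, or -1 if no such value exists.
 * Assumptions: 8-bit divider field, "+ 2" relation as quoted in the thread. */
static int sdmmc_clock_div_for(uint32_t sdmmcclk_hz, uint32_t max_hz)
{
    if (max_hz == 0u) {
        return -1;
    }
    /* Total division ratio, rounded up so the result never exceeds max_hz. */
    uint64_t ratio = ((uint64_t)sdmmcclk_hz + max_hz - 1u) / max_hz;
    if (ratio < 2u) {
        ratio = 2u;            /* ClockDiv = 0 is the fastest setting */
    }
    if (ratio > 257u) {
        return -1;             /* would need ClockDiv > 255 */
    }
    return (int)(ratio - 2u);
}
```

Assuming a 48 MHz SDMMCCLK, a 400 kHz limit maps to ClockDiv 118 and a 187.5 kHz limit maps to ClockDiv 254, which is almost the slowest clock an 8-bit divider can produce from 48 MHz.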

ChrisH
Associate III
Posted on December 09, 2016 at 20:41

Now, with a bare minimum setup using RTOS + FatFs with the SDIO driver, I can read with DMA but can't write: on DMA read the interrupts fire normally and everything works, while on DMA write they don't fire. With USB MSC it is the opposite: DMA write works, DMA read doesn't.

robertwood9170
Associate II
Posted on December 14, 2016 at 00:06

I have also seen similar (but not exactly the same) issues.

Background: we are developing an STM32L486 LoRa EndPoint for sensors. After many iterations the hardware appears solid, and as for drivers, the SD card is the last one to be developed. It has also been by far the hardest; it took a very long time to get any life. Here's where we are:

1) Our SDMMCCLK is 48MHz from the PLL (our crystal is 12MHz, our system clock is 72MHz). From the HAL library: SDMMC_CK = SDMMCCLK / (ClockDiv + 2). We could not get anything working with ClockDiv equal to 0, 1, or 2. With ClockDiv equal to 3, we got very intermittent success. With ClockDiv equal to 4 and larger, we got some success, but still significant errors. For the smaller ClockDiv values the errors were RCV_OVERRUN and TR_UNDERRUN.

2) We were not successful using BSP_SD_ReadBlocks_DMA() or BSP_SD_WriteBlocks() in sd_diskio.c; we were able to read and write many times using low ClockDiv's and non DMA block reads and writes. We have not yet tracked down exactly what is failing when using DMA block reads and writes; we hope to get to this sometime. It appears others have been more successful than us wrt using DMA.

3) Whenever we tried running f_mkfs((TCHAR const*)SDPath, 0, 0), the SD card got corrupted; we have given up on this for now as we can easily make fat32 initialized cards using fdisk to partition and either mkfs or mkdosfs. We have not tracked down where f_mkfs goes wrong and probably won't, as we don't see the necessity to use an STM32 to format the SD card.

4) After getting the example from STM32Cube_FW_L4_V1.5/Projects/STM32L476G_EVAL/Applications/FatFs/FatFs_uSD/ to work most of the time using a large ClockDiv (so that SDMMC_CK is 8MHz or slower) and with f_mkfs commented out, we put the example code in a loop to open a file, write, close the file, re-open, read, and re-close as is done in the example. It ran often but almost never made 100 loops; the failures seem to always occur at the opens and closes of the write portion. If we retry the open or close directly using gdb after breaking, it often returns FR_OK, so the problem appears sporadic.

5) The Cube-generated code has errors. For example, SystemClock_Config() in main.c of the example contains:

PeriphClkInit.PLLSAI1.PLLSAI1Source = RCC_PLLSOURCE_HSE;

and

PeriphClkInit.PLLSAI1.PLLSAI1M = 1;

Structure PLLSAI1 does not have PLLSAI1Source and PLLSAI1M components; only the main PLL has them. Errors like this do not inspire confidence in the quality of the Cube code. Also, combining the code generated from Cube with the code from the examples is very error prone and convoluted.

6) Summary: we are mostly working, but far from production ready after significant efforts. The STM32L476G_EVAL is the only Cube/HAL example that seems to have a chance of working. Documentation for getting microSD cards working with the STM32L4XX's is not there. It would be nice if there was a cohesive example using HAL and Cube code, but refactored into something sensible that could be used as a starting point. Also, at this time, we think the hardware may be buggy, but it is too complicated to tell. Will update later when we get more insight. Hoping this helps others and it would sure be nice to get some help from ST here.
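
Regarding the sporadic open/close failures in item 4: one way to measure how often a second attempt succeeds, without attaching gdb, is a small retry wrapper around the failing call. The C sketch below is only a diagnostic aid and does not fix the root cause. All names in it are hypothetical; the operation is passed in as a callback that returns 0 on success (FR_OK is 0 in FatFs), and `flaky_op` is a stand-in used purely for demonstration.

```c
#include <stddef.h>

/* Callback type for the operation to retry; returns 0 on success. */
typedef int (*fs_op_fn)(void *ctx);

/* Calls op up to max_attempts times and stops at the first success.
 * Stores the number of attempts made in *attempts_used (if not NULL)
 * and returns the result of the last attempt (-1 if none was made). */
static int retry_op(fs_op_fn op, void *ctx, unsigned max_attempts,
                    unsigned *attempts_used)
{
    int rc = -1;
    unsigned n = 0;
    while (n < max_attempts) {
        n++;
        rc = op(ctx);
        if (rc == 0) {
            break;
        }
    }
    if (attempts_used != NULL) {
        *attempts_used = n;
    }
    return rc;
}

/* Demo stand-in for a sporadically failing call: fails as many times as
 * the counter behind ctx says, then succeeds. */
static int flaky_op(void *ctx)
{
    unsigned *remaining_failures = ctx;
    if (*remaining_failures > 0u) {
        (*remaining_failures)--;
        return 1;
    }
    return 0;
}
```

In a real test loop the callback would wrap the actual open or close call, and logging the attempt count per iteration would show whether the failures cluster or are evenly spread.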