
STM32L4 with USB MSD and FatFS, Limitations

Nor Sch
Associate III
Posted on October 11, 2016 at 12:19

First: I do not use USB MSD and FatFS at the same time, but they work on the same SD card. This SD card is connected via SDIO with a 4-bit data bus. I use an STM32L476, but I think the problems should be identical for all STM32L4 MCUs.

In the last few weeks I worked, among other things, a lot on USB MSD and FatFS and ran into a bunch of trouble. I updated FatFS (R0.12b) and FreeRTOS (v9.0.0). To get it running, I also used some code from the examples for the STM32L476-Eval board. Additionally I refactored a lot of CubeMX-generated files, including HAL drivers. The result was cleaner, shorter and more readable code without a lot of duplicated stuff, but I couldn't find any (undocumented) errors or reasons for the obscure limitations.

Here are the limitations I found:
  1. The SDMMC clock divider is internally incremented by 2. With a clock of 48 MHz and a divider of 0, which is possible in theory, 24 MHz should be achievable. But in practice I must use 2 and get a clock of 12 MHz. Otherwise the Read() functions permanently return SD_RX_OVERRUN, which means that I'm too slow at reading incoming data out of the FIFO. The demo code for the STM32L476-Eval board also uses 2 as the divider, but why?
  2. With FatFS the DMA write works fine, but with DMA read I again get SD_RX_OVERRUN, in every case after 84 bytes have been copied from the FIFO into my buffer. The FIFO holds 128 bytes and again seems to fill up faster than the slow L4 can read it out.
  3. With USB MSD I also have to use blocking mode for writes. The DMA write hangs somewhere waiting for a flag (without testing it again I'm not sure which flag, but it looked like an expected IRQ handler is not called). So with the generated code no DMA at all is usable here.
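The divider arithmetic in item 1 can be written down as a small sketch. It assumes the relationship described above, SDMMC_CK = SDMMCCLK / (CLKDIV + 2) with the divider bypass off; the function names are made up for illustration and are not part of the HAL.

```c
#include <assert.h>
#include <stdint.h>

/* SDMMC_CK for a given kernel clock and CLKDIV, assuming the
 * "divider + 2" relationship described in the post (bypass off). */
static uint32_t sdmmc_ck_hz(uint32_t sdmmcclk_hz, uint8_t clkdiv)
{
    return sdmmcclk_hz / ((uint32_t)clkdiv + 2u);
}

/* Smallest CLKDIV whose resulting SDMMC_CK does not exceed max_ck_hz.
 * Hypothetical helper, saturating at the 8-bit field maximum. */
static uint8_t sdmmc_clkdiv_for(uint32_t sdmmcclk_hz, uint32_t max_ck_hz)
{
    assert(max_ck_hz > 0u);
    /* Total division factor needed, rounded up. */
    uint32_t total = (sdmmcclk_hz + max_ck_hz - 1u) / max_ck_hz;
    if (total < 2u) {
        total = 2u; /* CLKDIV = 0 already divides by 2 */
    }
    uint32_t div = total - 2u;
    return (div > 255u) ? 255u : (uint8_t)div;
}
```

With a 48 MHz kernel clock this gives 24 MHz for divider 0 and 12 MHz for divider 2, matching the numbers in item 1.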

Can anyone give some hints on these problems, or can anyone confirm these limitations? Any suggestion is welcome!

By the way, here are the speed limits I got with the other threads deactivated. The internal tests were done with a 1 MB file. For USB MSD I used Windows 7 and copied files larger than 40 MB.

Internal FatFS Read 3908 kB/s

Internal FatFS Write 678.6 kB/s

USB MSD Read 702 kB/s

USB MSD Write 657 kB/s

I think I'm at the limit with these values for the slow STM32L4 with a 12 MHz SDIO clock and only USB FS at 12 Mbit/s. But I'm sure that I will need the DMA later to get a little bit more power for the other threads. Also, I think that the implementation of the SD HAL drivers and FatFS is not really cooperative with an RTOS. After the start of a read / write, even with DMA, you still poll on some flags. Does anyone have a hint for a well-working solution with better multithreading integration?
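To put the measured rates next to the raw bus ceilings, here is a rough sketch. It assumes 1 kB = 1000 bytes, one bit per data line per SD clock, and ignores all command, CRC and protocol overhead, so the real ceilings are lower. The helper names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Raw ceiling of the SD data bus in bytes per second:
 * one bit per data line per clock, no protocol overhead. */
static uint32_t sd_bus_peak_bytes_per_s(uint32_t sd_clk_hz, uint32_t bus_width_bits)
{
    assert(bus_width_bits == 1u || bus_width_bits == 4u);
    return (uint32_t)(((uint64_t)sd_clk_hz * bus_width_bits) / 8u);
}

/* Measured rate as a percentage of a ceiling, rounded down. */
static uint32_t percent_of_peak(uint32_t measured_bytes_per_s, uint32_t peak_bytes_per_s)
{
    assert(peak_bytes_per_s > 0u);
    return (uint32_t)(((uint64_t)measured_bytes_per_s * 100u) / peak_bytes_per_s);
}
```

Under these assumptions the 12 MHz, 4-bit bus tops out at 6000 kB/s, so the 3908 kB/s FatFS read is about 65 % of the raw bus rate and the 678.6 kB/s write about 11 %. Against the raw 12 Mbit/s (1500 kB/s) of USB FS, the 702 kB/s MSD read is about 46 %.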

#stm32l4-usb-msd-fatfs-dma
27 REPLIES
Posted on December 14, 2016 at 01:21

At the moment I have read and write DMA working fine. Polling is not an option, at least in my case, as I had various overrun/underrun errors when using FreeRTOS (most likely you need to perform the reads and writes between taskENTER_CRITICAL/taskEXIT_CRITICAL). The key is to set up DMA correctly: you can't use separate channels for read and write. Therefore add only the SDMMC1 request, without TX or RX at the end (when using CubeMX), or simply configure just one DMA channel for both RX and TX. Lastly, you need to set up the interrupts so that the SDMMC1 interrupt has a higher priority than the DMA interrupt.

Posted on December 14, 2016 at 02:45

Thanks Chris, I'll give your suggestions a try. I believe the code that comes from the Cube example uses channel 4 only, but it does seem to include RX and TX, for example:

state = ((SD_DMAConfigRx(&uSdHandle) == SD_OK) ? MSD_OK : MSD_ERROR);

and

state = ((SD_DMAConfigTx(&uSdHandle) == SD_OK) ? MSD_OK : MSD_ERROR);

but when you look at the code for the two Config functions, they seem pretty much the same, and both use channel 4.

hdma_tx.Instance = DMA2_Channel4;

In both Config functions, the priority for DMA channel 4 is set at 6,

HAL_NVIC_SetPriority(DMA2_Channel4_IRQn, 6, 0);

HAL_NVIC_EnableIRQ(DMA2_Channel4_IRQn);

whereas the SDMMC1 priority is set at 5

/* NVIC configuration for SDMMC1 interrupts */

HAL_NVIC_SetPriority(SDMMC1_IRQn, 5, 0);

HAL_NVIC_EnableIRQ(SDMMC1_IRQn);

which makes the SDMMC1 higher priority as it has a lower number.

In summary, I think I am already doing what you suggest, but I must be doing something wrong elsewhere. Thanks again.

-Ken

Posted on December 14, 2016 at 03:03

You are correct everywhere, even though the bsp_driver_sd generated by cube is slightly different than the one attached inside examples. However I don't see why this wouldn't work for you, probably it's some kind of small mistake, maybe priorities for DMA? Good luck anyway I hope its gonna work out in the end, getting DMA running should fix any OVERRUN, UNDERRUN errors. And always check registers, helps a lot in debugging:)

Posted on December 14, 2016 at 23:32

Chris, thanks again for the help; it's good to know that I am mostly doing it right; I'm pretty sure I will be able to figure it out. I am checking the registers; I think I have them mostly memorized by now. My current guess is I have something wrong with the interrupts; the process is fairly convoluted and I think my startup_stm32l476xx.s may not have the correct vectors. If I understand correctly, for example, this file specifies the handlers in stm32l4xx_it.c, which then call the HAL interrupt handlers. But bsp_driver.c produced by Cube also contains void BSP_SD_DMA_Tx_IRQHandler(void), which also calls HAL_DMA_IRQHandler(hsd1.hdmatx); I think this one doesn't actually get used unless BSP_SD_DMA_Tx_IRQHandler is specified in the startup_stm32l476xx.s file.

...

Update at end of day (I held off on sending after writing the above at the beginning of the day):

It appears I finally have it working after significantly modifying quite a few files.

Basically, I am now using STM32Cube_FW_L4_V1.5.0/Drivers/BSP/STM32L476G_EVAL/stm32l476g_eval_sd.c with many changes and doing most of my init in stm32l4xx_hal_msp.c in my home directory using ...msp() functions, as the HAL library mostly calls these. The code produced by Cube conflicts with the fatfs code in the eval example in STM32Cube_FW_L4_V1.5.0 in many places.

-Ken

robertwood9170
Associate II
Posted on December 15, 2016 at 00:57

Update: I finally appear to have the eval example of STM32Cube_FW_L4_V1.5.0/Projects/STM32L476G_EVAL/Applications/FatFs/FatFs_uSD/ working with DMA. What I have now is a combination of code from the above example and code generated by Cube. I didn't record everything, but I can mention a few of the things I did (I did lots and lots of trial and error and I am very far from the beginning). Firstly, the example and the code produced by Cube have many conflicts, especially wrt init functions. I have tried to put files that have been customized into my home source directory, similar to Cube, but I have not been perfect here (which may cause problems later).

The main driver from the example that has been heavily modified is STM32Cube_FW_L4_V1.5.0/Drivers/BSP/STM32L476G_EVAL/stm32l476g_eval_sd.c; I copied this to my home src directory and renamed it stm32l476_sd.c. Also, we are actually using the 64-pin 486, not the 476, but the only difference appears to be that a hardware AES is included in the 486. Almost all our code was originally developed with a previous iteration based on the 476. The equivalent file produced by Cube is in the home Src directory, is called bsp_driver_sd.c, and is very different. Since the example fatfs code was written to work with stm32l476g_eval_sd.c, I used it as my basis. This may have been a mistake. However, I do most of my inits in stm32l4xx_hal_msp.c in my home src directory, similar to the Cube code. The Cube code does most of its configs in main.c; I moved these to stm32l4xx_hal_hw.c, again in home/src. I also put my interrupt handler redirection into home/src/stm32l4xx_it.c, again similar to Cube. Now for some details (I will only use my file names).

1) In BSP_SD_ReadBlocks_DMA() in stm32l476_sd.c, I no longer set uSdHandle.hdmatx = NULL or call SD_DMAConfigRx(&uSdHandle). The changing of hdma_rx.Init.Direction, for example, is done in the HAL library file stm32l4xx_hal_sd.c in HAL_SD_ReadBlocks_DMA() at line 896; the DMA callbacks are also set in that function (i.e. hsd->hdmarx->XferCpltCallback = SD_DMA_RxCplt at line 892). The SD_DMAConfigRx() function completely reconfigures the DMA every time BSP_SD_ReadBlocks_DMA() is called. This seems wasteful. Similar comments apply to SD_DMAConfigTx().

The configuration for the interrupts is now being done in my HAL_SD_MspInit() function in my home/src/stm32l4xx_hal_msp.c, similar to how Cube sets things up. The lines look like:

__HAL_LINKDMA(hsd,hdmarx,hdma_sdmmc1);

__HAL_LINKDMA(hsd,hdmatx,hdma_sdmmc1);

SD_on();

HAL_Delay(1);

/* Peripheral interrupt init */

HAL_NVIC_SetPriority(SDMMC1_IRQn, 5, 0);

HAL_NVIC_EnableIRQ(SDMMC1_IRQn);

/* USER CODE BEGIN SDMMC1_MspInit 1 */

/* NVIC configuration for DMA transfer complete interrupt */

HAL_NVIC_SetPriority(DMA2_Channel4_IRQn, 6, 0);

HAL_NVIC_EnableIRQ(DMA2_Channel4_IRQn);

The function SD_on() just turns on the PFET to power up the SD card. Notice that there is now a global for the DMA handle (i.e. hdma_sdmmc1). In the example driver, static locals were used for the DMA handles and this caused all kinds of problems. For example, HAL_SD_CheckReadOperation() was failing because hsd->DmaTransferCplt was not being set in HAL_SD_IRQHandler() because the DMA handle was incorrect. Note: this is all in the HAL library file stm32l4xx_hal_sd.c.

The current setup is very convoluted and really needs to be refactored in a major way. For example, if you can follow this:

f_open() calls find_volume() (in ff.c) which calls disk_initialize() (in diskio.c) which calls SD_initialize() (in sd_diskio.c) which calls BSP_SD_Init() (in stm32l476_sd.c) which calls HAL_SD_Init() (in stm32l4xx_hal_sd.c) which calls HAL_SD_MspInit() (in my stm32l4xx_hal_msp.c file) which configures the SD once again! Sigh! Also, setting the DMA handles in the SD handle is done in a lot of different places and is hard to track down. The above example is the only reasonable example I found for SDIO for the STM32L476, and once again it is not consistent with the code generated by Cube.

In the end, I was able to run the example 1000 times with each time opening the file, writing the file, closing the file, re-opening the file, reading the file, and confirming it was the same as that written. I am not running f_mkfs(). Each time, an LED toggles on my board; I can see there are times it stops for a noticeable fraction of a second, but no errors occur. With my 72MHz system frequency and uSdHandle.Init.ClockDiv set to 4, each complete iteration takes 42ms.

Finally, finally, this took a number of weeks of long days including weekends; I wouldn't suggest taking it on unless you are willing to invest the time; I'm sure there are easier and more efficient ways of accomplishing the same, but I couldn't find them with the sparse documentation available. I would suggest ST consider investing some resources to clean this up. Anyways, I think I am done for now and ready to move on.

Posted on December 15, 2016 at 02:22

I'm glad it worked for you in the end, although you took a kind of roundabout way. Btw, in the Cube generation settings there is an option to generate a separate pair of .h/.c files for each peripheral, which makes your code much cleaner; personally I always use it. Secondly, code generated by Cube is more or less OK, but you have to remember where to put your code, because Cube likes to overwrite code which is in the 'wrong' place. What you wrote regarding the eval project is totally right and makes sense. Btw, since you use LoRa I assume you work heavily on optimizing your power consumption; what is your average current when using the SD card? I heard it can actually drain quite a lot...

robertwood9170
Associate II
Posted on December 15, 2016 at 21:17

Update 2: f_mkfs() is now working as well; indeed, it appears everything in fatfs (that I have tried) is working. This is for ClockDiv = 3 (SD-CLK = 9.6MHz) or larger; for ClockDiv = 2 (SD-CLK = 12MHz) it doesn't work at all. I'm currently mostly using ClockDiv = 4 (SD-CLK = 8MHz). Now here's the interesting thing: for a test bench which is mostly writes, the frequency of SD-CLK makes very little difference. For example, in a test bench that links, mounts, opens a file, writes 100 lines of 47 characters each, closes, opens, and reads the complete contents once (it actually does a bit more, but I don't believe it is significant):

SD-CLK = 8MHz: 88ms/iteration (I do 100 iterations).

SD-CLK = 6MHz: 91ms/iteration

SD-CLK = 1MHz: 144ms/iteration.

The times vary a bit between tests (maybe 5%). I found this very surprising. Changing the test so that in each iteration, during the read part, the file was opened and completely read 100 times, using f_lseek to reset to the beginning of the file before each read, and then repeating this process 100 times, produced a larger effect due to SD-CLK, but still not even close to proportional. For example, for the modified test with 100 reads (of 4700 bytes in each read) in each of the iterations:

SD-CLK = 8MHz: 29ms/iteration

SD-CLK = 1MHz: 118ms/iteration

Currently, I have no idea why these times are not closer to being proportional to the SD-CLK period.
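One way to read these numbers, offered only as a sketch and not as an explanation of the actual cause, is a two-term model: iteration time = fixed cost + (cost proportional to the SD-CLK period). The code below fits that model through two measurements and predicts a third; the type and function names are invented for the example.

```c
#include <assert.h>

/* Hypothetical model: t_ms = fixed_ms + per_us_ms * (clock period in us). */
typedef struct {
    double fixed_ms;   /* part that does not depend on SD-CLK */
    double per_us_ms;  /* milliseconds added per microsecond of SD-CLK period */
} clk_model_t;

/* Fit the model exactly through two (frequency in MHz, time in ms) points. */
static clk_model_t fit_two_points(double f1_mhz, double t1_ms,
                                  double f2_mhz, double t2_ms)
{
    assert(f1_mhz > 0.0 && f2_mhz > 0.0 && f1_mhz != f2_mhz);
    double p1 = 1.0 / f1_mhz; /* period in us */
    double p2 = 1.0 / f2_mhz;
    clk_model_t m;
    m.per_us_ms = (t2_ms - t1_ms) / (p2 - p1);
    m.fixed_ms = t1_ms - m.per_us_ms * p1;
    return m;
}

/* Predicted iteration time at another SD-CLK frequency. */
static double predict_ms(clk_model_t m, double f_mhz)
{
    assert(f_mhz > 0.0);
    return m.fixed_ms + m.per_us_ms / f_mhz;
}
```

Fitting the write-heavy test through the 8 MHz and 1 MHz points gives a fixed part of about 80 ms and predicts about 90.7 ms at 6 MHz, close to the measured 91 ms. For the read-heavy test the same fit gives a fixed part of roughly 16 ms out of 29 ms at 8 MHz. If this model is a fair description, most of each iteration does not scale with SD-CLK at all, which would be consistent with the weak dependence observed.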

Chris, you are correct, with LoRa power is absolutely critical. We have not yet done power measurements. I believe it will be very dependent on whether it is a read or a write (I believe writes require quite a bit more power) and maybe on the type of SD card. We are working on End Points, and almost all current End Points do not have SD cards. Our current plan is to store most samples in RAM and then every so often store the RAM contents on the SD card. In this way, we can mostly leave the SD card off to minimise power consumption. The other reason we need the SD card is that it will allow us to do firmware updates, which we think are absolutely necessary, especially in commercial applications. Using LoRa to download firmware updates is very difficult, if not almost impossible. When I get some power measurements, I will post.

bryenton
Associate II
Posted on December 30, 2016 at 19:49

Hi,

I spent A LOT of time getting these working as well. The SD drivers have many bugs/timing issues in them.

For MSD and FatFS the most important issues to address are:

1) The PCLK2 to SDMMC_CK relationship must be such that PCLK2 > 6/8 * SDMMC_CK. This means that you can't just use the SDMMC_TRANSFER_CLK_DIV define, but need to adjust it to ensure .ClockDiv is set properly. Please note that the RM states a 3/8 relationship, but testing with SYS/PCLK frequencies from 0.5 to 80 MHz suggests that a 6/8 ratio is needed. Whenever a new SD card or eMMC is used, this needs to be retested, since you might need to slow down SDMMC_CK a bit more, or can possibly speed it up if testing proves OK. To be safe, when the ratio is close to 6/8 I add 1 to ClockDiv to make sure the ratio is met; that gives reliable results, albeit slightly lower throughput.

2) ensure these are corrected as follows:

#define SDMMC_DCTRL_DBLOCKSIZE_2             (0x4U << SDMMC_DCTRL_DBLOCKSIZE_Pos) /*!< 0x00000040 */ // ie. 3U -> 4U

#define SDMMC_DCTRL_DBLOCKSIZE_3             (0x8U << SDMMC_DCTRL_DBLOCKSIZE_Pos) /*!< 0x00000080 */ // ie. 4U -> 8U

3) add a 10 ms delay after __HAL_SD_SDMMC_ENABLE(hsd); and before the CMD0/CMD8 sequence in SD_PowerON

4) fix the bug in the uint64_t parameter to allow >4G devices in calls to BSP_SD_ReadBlocks, BSP_SD_ReadBlocks_DMA, BSP_SD_WriteBlocks, BSP_SD_WriteBlocks_DMA, i.e.

    (uint64_t)(sector * BLOCK_SIZE), => should be  (uint64_t)sector * BLOCK_SIZE,
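The clock rule in item 1 can be turned into a divider search. This is only a sketch: it assumes SDMMC_CK = SDMMCCLK / (ClockDiv + 2) as discussed earlier in the thread, takes the ratio as a parameter so that both the 3/8 from the RM and the 6/8 reported here can be tried, and uses invented function names.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest ClockDiv for which PCLK2 > (num/den) * SDMMC_CK holds,
 * with SDMMC_CK = SDMMCCLK / (ClockDiv + 2). The comparison is
 * cross-multiplied so nothing is lost to integer division.
 * Returns 255 if no smaller divider satisfies the rule. */
static uint8_t min_clockdiv_for_ratio(uint32_t sdmmcclk_hz, uint32_t pclk2_hz,
                                      uint32_t num, uint32_t den)
{
    assert(num > 0u && den > 0u);
    for (uint32_t div = 0u; div < 255u; ++div) {
        uint64_t lhs = (uint64_t)pclk2_hz * den * (div + 2u);
        uint64_t rhs = (uint64_t)sdmmcclk_hz * num;
        if (lhs > rhs) {
            return (uint8_t)div;
        }
    }
    return 255u;
}

/* Extra margin described in the post: one more divider step, saturating. */
static uint8_t clockdiv_with_margin(uint8_t div)
{
    return (div < 255u) ? (uint8_t)(div + 1u) : 255u;
}
```

For example, with a 48 MHz SDMMC kernel clock and PCLK2 at 8 MHz, the 6/8 rule needs ClockDiv 3 (SDMMC_CK = 9.6 MHz), while the 3/8 rule would already accept ClockDiv 1 (16 MHz). At PCLK2 = 80 MHz both rules allow ClockDiv 0.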

To get eMMC working there are many additional resolutions and functions that are needed since ST drivers don't fully support it.

Hopefully these will be of use and save you some/a lot of time.

Al

Posted on December 31, 2016 at 15:56

Regarding the first one, I've never run the clock lower than 80MHz so far, so I didn't encounter those issues. Good to know anyway, as I'm planning to lower the frequency when dealing with less demanding tasks.

2 is fixed.

With 3 I didn't actually have any issues, and lastly, indeed there is this bug in sd_diskio.

Posted on January 02, 2017 at 11:52

Dear Bryenton.Al,

2)ensure these are corrected as follows:

#define SDMMC_DCTRL_DBLOCKSIZE_2 (0x4U << SDMMC_DCTRL_DBLOCKSIZE_Pos) /*!< 0x00000040 */ //ie. 3U -> 4U

#define SDMMC_DCTRL_DBLOCKSIZE_3 (0x8U << SDMMC_DCTRL_DBLOCKSIZE_Pos) /*!< 0x00000080 */ //ie. 4U -> 8U

=>Please note that this issue is already fixed in the release 1.6.0 of the Cube firmware package STM32L4
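To see why the two bit definitions matter, here is a small sketch of the DBLOCKSIZE encoding. It assumes the field sits at bits 7:4 of SDMMC_DCTRL (consistent with the 0x40 and 0x80 values quoted above) and holds log2 of the block length; the helper names are made up and are not HAL functions.

```c
#include <assert.h>
#include <stdint.h>

#define DBLOCKSIZE_POS 4u /* assumed field position: bits 7:4 of SDMMC_DCTRL */

/* Encode a power-of-two block length as the DBLOCKSIZE field,
 * which holds log2 of the length (up to 2^14 bytes). */
static uint32_t dblocksize_field(uint32_t block_len)
{
    assert(block_len != 0u && (block_len & (block_len - 1u)) == 0u);
    uint32_t log2len = 0u;
    while ((1u << log2len) < block_len) {
        ++log2len;
    }
    assert(log2len <= 14u);
    return log2len << DBLOCKSIZE_POS;
}

/* Decode the block length in bytes selected by a DCTRL value. */
static uint32_t dblocksize_bytes(uint32_t dctrl)
{
    return 1u << ((dctrl >> DBLOCKSIZE_POS) & 0xFu);
}
```

A 512-byte block needs the field value 9, i.e. 0x90, which is bit 0 OR bit 3 of the field. If a header builds the 512-byte constant from DBLOCKSIZE_0 and DBLOCKSIZE_3 (an assumption about how the constant is composed), then the old value of 0x4U for DBLOCKSIZE_3 yields 0x50, which decodes to a 32-byte block instead.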

4)fix the bug in the uint64_t parameter to allow >4G devices in calls to BSP_SD_ReadBlocks, BSP_SD_ReadBlocks_DMA, BSP_SD_WriteBlocks, BSP_SD_WriteBlocks_DMA, i.e.

(uint64_t)(sector * BLOCK_SIZE), => should be (uint64_t)sector * BLOCK_SIZE,

=> We confirm this bug, it will be fixed in coming releases.
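The cast bug from item 4 can be reproduced off-target. This sketch assumes the sector number is a 32-bit unsigned value (as FatFs's DWORD is on this MCU) and that BLOCK_SIZE is a 32-bit constant of 512; the two wrapper functions are invented to show the expressions side by side.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_SIZE 512u /* assumed 32-bit constant, as in the quoted expression */

/* Buggy form: the multiply happens in 32 bits and wraps,
 * and only the already-wrapped result is widened to 64 bits. */
static uint64_t byte_addr_buggy(uint32_t sector)
{
    return (uint64_t)(sector * BLOCK_SIZE);
}

/* Fixed form: the sector is widened first, so the multiply is 64-bit. */
static uint64_t byte_addr_fixed(uint32_t sector)
{
    return (uint64_t)sector * BLOCK_SIZE;
}
```

Sector 8388608 is the first one at the 4 GiB boundary: the fixed form yields 4294967296, while the buggy form wraps to 0, so accesses beyond 4 GiB would land at the start of the card.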

We will come back to you after investigating cases 1 and 3 further.

Best Regards

-Imen-
