Fastest/best way to recover from an SDMMC TXUNDERR error, and write again successfully?

Chris Rice · ‎2020-05-17

Hello, we are using the SD Card HAL code on an STM32F7. We are pretty much using the default initializers for the peripheral, which leave us at

SDMMC2.POWER = 0x00000003
SDMMC2.CLKCR = 0x00000900
SDMMC2.DCTRL = 0x00000033
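For reference, the two register snapshots above can be decoded with a small host-side sketch like the one below. The bit positions are an assumption taken from the SDMMC_CLKCR and SDMMC_DCTRL register descriptions in RM0410, and the macro and function names are made up for illustration, so check them against the manual for your exact part.

```c
#include <stdint.h>
#include <stdio.h>

/* Bit positions assumed from the SDMMC_CLKCR description in RM0410. */
#define CLKCR_CLKDIV(v)     ((v) & 0xFFu)          /* clock divider          */
#define CLKCR_CLKEN(v)      (((v) >> 8) & 1u)      /* card clock enabled     */
#define CLKCR_WIDBUS(v)     (((v) >> 11) & 3u)     /* 0: 1-bit, 1: 4-bit, 2: 8-bit */
#define CLKCR_HWFC_EN(v)    (((v) >> 14) & 1u)     /* hardware flow control  */

/* Bit positions assumed from the SDMMC_DCTRL description in RM0410. */
#define DCTRL_DTEN(v)       ((v) & 1u)             /* data transfer enabled  */
#define DCTRL_DTDIR(v)      (((v) >> 1) & 1u)      /* 0: to card, 1: from card */
#define DCTRL_DMAEN(v)      (((v) >> 3) & 1u)      /* DMA requests enabled   */
#define DCTRL_DBLOCKSIZE(v) (((v) >> 4) & 0xFu)    /* block length = 2^n bytes */

/* Hypothetical helper: translate the WIDBUS field into a number of data lines. */
static unsigned bus_width_bits(uint32_t clkcr)
{
    static const unsigned widths[4] = { 1u, 4u, 8u, 0u }; /* 3 is reserved */
    return widths[CLKCR_WIDBUS(clkcr)];
}

/* Hypothetical helper: print the decoded fields of both registers. */
void print_sdmmc_regs(uint32_t clkcr, uint32_t dctrl)
{
    printf("CLKDIV=%u CLKEN=%u bus=%u-bit HWFC_EN=%u\n",
           (unsigned)CLKCR_CLKDIV(clkcr), (unsigned)CLKCR_CLKEN(clkcr),
           bus_width_bits(clkcr), (unsigned)CLKCR_HWFC_EN(clkcr));
    printf("DTEN=%u DTDIR=%u DMAEN=%u block=%u bytes\n",
           (unsigned)DCTRL_DTEN(dctrl), (unsigned)DCTRL_DTDIR(dctrl),
           (unsigned)DCTRL_DMAEN(dctrl), 1u << DCTRL_DBLOCKSIZE(dctrl));
}
```

If those bit positions are right, CLKCR = 0x900 means CLKDIV = 0, clock enabled, 4-bit bus and hardware flow control off, and DCTRL = 0x33 means DMAEN = 0 with DTDIR = 1 and an 8-byte block size, so that DCTRL snapshot appears to come from a short read rather than from one of the block writes.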

We have some operations that run many consecutive writes using:

HAL_StatusTypeDef r = HAL_SD_WriteBlocks_IT(&hsd, pData, BlockAdd, NumberOfBlocks);

What we do is we monitor for the callback

void HAL_SD_TxCpltCallback(SD_HandleTypeDef *hsd)

And then we allow our main loop to write another block. It seems that whenever we cycle through enough writes, anywhere from the 5th to the 50th write will encounter a SDMMC_IT_TXUNDERR interrupt. The default HAL handling for this interrupt seems to kind of abort and disable everything.

There is not a *ton* of explanation as to what this is, aside from the name itself (which frankly doesn't paint a picture... I understand an overrun, but what's an underrun?). Ultimately, assuming that this means the write failed, we want to try the write again: same data to the same spot. The reference manual says (sec 39.8.11 of RM0410):

"If DMA is used to fill SDMMC FIFO (DMAEN bit is set in SDMMC_DCTRL register), user software should disable DMA stream, and then write DMAEN with ‘0’ (to disable DMA request generation)."

However, for us DMAEN is already off, so I think we are not using it (although the SD_DMAReceiveCplt callback is still part of the sequence, which is confusing too). Ultimately we want to retry, so what is our quickest, but also correct, way to reset the registers and peripheral so that we can call HAL_SD_WriteBlocks_IT again and expect it to succeed the next time?

Thanks for any help!

Danish1 · ‎2020-05-18

A transmit underrun is where the SDMMC is required to send data by a certain time, but your code hasn't responded quickly enough to the interrupt to load the data into the FIFO.

SDMMC is a synchronous interface, meaning that data transfers are synchronised to a clock generated by the SDMMC module.

You are not using DMA - you are using interrupts to load data because you call HAL_SD_WriteBlocks_IT, whereas the DMA version would be something like HAL_SD_WriteBlocks_DMA.

It is possible that you have another interrupt that is taking too long, so the SD data transfer cannot keep up. Do look at all your interrupts to see if any have "blocking" code, e.g. writing to a UART. Switching to DMA for the SD interface will probably solve this, as might slowing the SDMMC clock. It might also be worth playing with the "Hardware flow-control enable" bit in SDMMC_CLKCR.

Writing to FLASH (including emulating an EEPROM in FLASH) will block the whole system so you shouldn't try that at the same time as writing to SD.

Hope this helps,

Danish

Tesla DeLorean · ‎2020-05-18

During the transfer phase the SD card has zero tolerance for delay/stalling. You definitely don't want another interrupt to preempt or allow RTOS to yield.

Likely going to need to reset peripheral and cycle card through escape sequence, and initialization phases.

CMD Line High, 80 Clocks

Viewing SDMMC/SDIO Peripheral and FIFOs in debugger will cause failures.


Chris Rice · ‎2020-05-18

Thanks both of you! Clive... what I did in the meantime to reset for another write, is what the HAL "abort" path did, which actually just disables all interrupts (which are enabled again on the next write). I actually haven't checked the actual transferred data yet... but if I stop and clear all interrupts, and bubble up a notification after TXUNDERR occurs that my calling code needs to re-send the last data (after a brief cooldown wait), the next write seems to report success.

This is a bit less than "reset peripheral and cycle card through escape sequence, and initialization phases", and it's also a little less than I expected to need to do. Do you suspect that the next write is failing, even if reporting success? (I'll check this when I get a chance, it's harder than it sounds... just wanted to get your feeling on this.)
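To make the scheme concrete, the retry flow described above (stop on TXUNDERR, wait briefly, then re-send the same data) could be tracked with a small state machine along these lines. This is only a sketch: the `wr_*` names, the attempt limit and the cooldown are hypothetical, and it says nothing about whether the card itself is in a good state after the abort.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { WR_IDLE, WR_BUSY, WR_COOLDOWN, WR_DONE, WR_FAILED } wr_state_t;

/* Hypothetical bookkeeping for one logical write and its retries. */
typedef struct {
    volatile wr_state_t state;    /* updated from interrupt context        */
    volatile uint32_t   error_ms; /* tick value when the error was seen    */
    uint8_t             attempts; /* writes issued so far for this block   */
    uint8_t             max_attempts;
    uint32_t            cooldown_ms;
} wr_retry_t;

/* Call right after HAL_SD_WriteBlocks_IT() has returned HAL_OK. */
void wr_start(wr_retry_t *w, uint8_t max_attempts, uint32_t cooldown_ms)
{
    w->attempts     = 1u;
    w->max_attempts = max_attempts;
    w->cooldown_ms  = cooldown_ms;
    w->error_ms     = 0u;
    w->state        = WR_BUSY;
}

/* Call from HAL_SD_TxCpltCallback(). */
void wr_on_complete(wr_retry_t *w)
{
    if (w->state == WR_BUSY) {
        w->state = WR_DONE;
    }
}

/* Call from HAL_SD_ErrorCallback(), passing the current tick in ms. */
void wr_on_error(wr_retry_t *w, uint32_t now_ms)
{
    if (w->state != WR_BUSY) {
        return;
    }
    w->error_ms = now_ms;
    w->state = (w->attempts < w->max_attempts) ? WR_COOLDOWN : WR_FAILED;
}

/* Call from the main loop. Returns true once the cooldown has elapsed,
 * meaning the caller should re-issue the same write now. */
bool wr_should_retry(wr_retry_t *w, uint32_t now_ms)
{
    if (w->state != WR_COOLDOWN) {
        return false;
    }
    /* Unsigned subtraction stays correct across tick wraparound. */
    if ((uint32_t)(now_ms - w->error_ms) < w->cooldown_ms) {
        return false;
    }
    w->attempts++;
    w->state = WR_BUSY;
    return true;
}
```

In the error callback, `HAL_SD_GetError(hsd) & HAL_SD_ERROR_TX_UNDERRUN` should tell an underrun apart from other failures, and it is probably worth waiting for `HAL_SD_GetCardState(hsd)` to return `HAL_SD_CARD_TRANSFER` before re-issuing the write, since the card may still be busy with the aborted one.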

Thanks again!

Chris Rice · ‎2020-05-20

Another follow-up question... is this the sort of thing that could be resolved by elevating the priority of my SD card interrupt? Thank you.

Tesla DeLorean · ‎2020-05-20

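Perhaps. The danger there is if you have a timeout that depends on the HAL tick (HAL_GetTick) advancing, since a higher-priority SD interrupt can keep the tick interrupt from running.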

The FIFO has some depth, but the data moves rapidly.
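To put a rough number on "rapidly", here is a back-of-envelope sketch. It assumes the clock formula from RM0410 (SDMMC_CK = SDMMCCLK / (CLKDIV + 2) with BYPASS = 0), a 32-word (128-byte) FIFO, and a 48 MHz SDMMCCLK, which is typical but not stated in this thread. The function names are made up, and start, CRC and end bits are ignored.

```c
#include <stdint.h>

/* Card clock, assuming SDMMC_CK = SDMMCCLK / (CLKDIV + 2) with BYPASS = 0. */
static uint32_t sdmmc_ck_hz(uint32_t sdmmcclk_hz, uint8_t clkdiv)
{
    return sdmmcclk_hz / ((uint32_t)clkdiv + 2u);
}

/* Nanoseconds needed to shift `bytes` out over `bus_bits` data lines.
 * Protocol overhead (start bit, CRC, end bit) is ignored. */
static uint64_t drain_time_ns(uint32_t ck_hz, unsigned bus_bits, uint32_t bytes)
{
    return ((uint64_t)bytes * 8u * 1000000000ull) / ((uint64_t)ck_hz * bus_bits);
}
```

With CLKDIV = 0 and a 4-bit bus that gives a 24 MHz card clock, so `drain_time_ns(24000000u, 4u, 64u)` comes out at about 5.3 µs for half the FIFO and about 10.7 µs for all of it. Any other interrupt that holds off the SD handler for longer than that during a block is enough to underrun, and raising CLKDIV stretches the window proportionally.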

ST made the implementation worse in polled mode by decomposing the data into bytes/words to handle buffers that are not 4-byte aligned, rather than just special-casing the unaligned one or copying to an aligned buffer. I thought the processors (above CM0) could handle unaligned 32-bit word reads; it was just DMA that couldn't, and optimizations that used LDRD/STRD inappropriately.

Often on the SDIO/SDMMC I see cascading failure, FATFS reporting an error on the second operation, but the card failed on the prior/earlier operation. This being the case if the previous write hasn't completed yet. The library is quixotic at times about if it is responsible for waiting or not, and at which BSP or DISKIO layer..
