FatFs + SDMMC + DATAEND interrupt

MPage.1 · 2021-10-05

I've been struggling to get FatFs working on STM32L496 / L471 for some time now. At present my code uses polling, which, even at a low card clock and with certain interrupts disabled, is not reliable enough and gives me random FR_DISK_ERR. I know I have to get DMA working, and have spent the last two weeks focussed on just that. (Some of that was wasted not realizing CubeMX had not merely initialized but also started the watchdog, giving strange debug behaviour.)

I am not using an OS.

To simplify things I've been looking at single block DMA writes. The FatFs code SD_write() calls HAL_SD_WriteBlocks_DMA() via BSP_SD_WriteBlocks_DMA() then waits for WriteStatus to be set via callbacks. The first callback is SD_DMATransmitCplt. Stock code merely enables the SDMMC DATAEND interrupt, since - unlike the read block case - data does not reach the card until the DPSM has fully flushed the TX FIFO.

The second callback ought to be triggered by DATAEND, but I never get the DATAEND interrupt; I get TXUNDERR instead. If I follow the RM's advice and disable DMA requests via SDMMC_DCTRL_DMAEN in the transfer complete callback, I still get TXUNDERR. And when I run the code to completion and then pause, DCOUNT still contains 388 bytes, the same as when the DMA callback occurred.

If I disable TXUNDERR interrupts at transfer complete (should be safe), I never get DATAEND.

The DATAENDIE bit in MASK is set in every case. The RM tells me DATAEND occurs when DCOUNT counts down to zero. I can demonstrate a DMA single block write-read round-trip even with stock code, but since WriteStatus is never set, FatFs times out.

Right now - if I can prove single and multi-block transfers actually work given enough time - I am going to attempt a workaround by setting a "DMA Transfer Complete" flag in the callback then polling TXFIFOE. This would be in a modified version of SD_write(). But I'd really like to get DATAEND working so I'm making fewer modifications to stock code.
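A minimal sketch of that workaround follows, written so the logic can be run on a host. The status register is modelled by a plain variable (`fake_sta`); on target the read would be of the SDMMC status register instead. The names `dma_tx_done`, `on_dma_tx_complete` and `wait_write_complete`, and the `TXFIFOE_BIT` value, are assumptions made for illustration and are not HAL or FatFs names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit mask (assumption); on target use SDMMC_STA_TXFIFOE
   from the device header instead. */
#define TXFIFOE_BIT (1u << 18)

/* Host-side stand-in for the SDMMC status register. */
static volatile uint32_t fake_sta;

/* Set from the DMA transfer-complete callback (hypothetical flag name). */
static volatile bool dma_tx_done;

/* Would be called from the DMA transmit-complete callback. */
void on_dma_tx_complete(void)
{
    dma_tx_done = true;
}

/* Returns true once DMA has finished AND the TX FIFO reports empty.
   Returns false if max_polls polls elapse first; a real version would
   use a tick-based timeout rather than a poll count. */
bool wait_write_complete(uint32_t max_polls)
{
    for (uint32_t i = 0; i < max_polls; i++) {
        if (dma_tx_done && (fake_sta & TXFIFOE_BIT)) {
            dma_tx_done = false;   /* re-arm for the next transfer */
            return true;
        }
    }
    return false;
}
```

An empty TX FIFO only says the peripheral has taken the data, not that the card has finished programming it, so a card-state check after this wait would still be needed.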

Thanks for reading this far and any suggestions.

MPage.1 · 2021-10-13

UPDATE: It's now working well enough to move on. Soak testing on the actual hardware in the actual application proved invaluable as two of the bugs were marginal. There were times the code behaved flawlessly on my Disco board, but not in my application. There were times I could perform DMA block write-read flawlessly - over a million blocks in groups of 1 to 10, but FatFs didn't.

Bugs:

One: HAL code is incomplete. At the bottom of HAL_SD_MspInit() is this comment:

/* Several peripheral DMA handle pointers point to the same DMA handle.
   Be aware that there is only one channel to perform all the requested DMAs. */
/* Be sure to change transfer direction before calling
   HAL_SD_ReadBlocks_DMA or HAL_SD_WriteBlocks_DMA. */

It's nice of the programmer to do that. But it would have been nicer if ST had finished the job and modified the DMA functions, or failing that put a note in the relevant documentation, e.g. the HAL UM and the chip RM. This is the bug that stopped me getting anywhere with DMA.
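The pattern that comment asks for can be sketched as a pair of wrappers that set the direction of the shared channel before every transfer. This is a host-side model only: `dma_model_t`, `dma_set_direction`, `start_read`, `start_write` and the `sd_*_dma` wrappers are stand-ins invented for illustration. On target, the equivalent step would presumably be to change the direction in the DMA handle's init structure and re-initialise the handle before calling the HAL function, but that is an assumption to check against your HAL version.

```c
#include <stdbool.h>

typedef enum { DIR_PERIPH_TO_MEM, DIR_MEM_TO_PERIPH } dma_dir_t;

/* Stand-in for the single DMA handle shared by read and write. */
typedef struct {
    dma_dir_t direction;
    int reinit_count;      /* how many times the channel was reconfigured */
} dma_model_t;

static dma_model_t shared_dma = { DIR_PERIPH_TO_MEM, 0 };

/* Reconfigure the channel only when the direction actually changes. */
static void dma_set_direction(dma_model_t *d, dma_dir_t dir)
{
    if (d->direction != dir) {
        d->direction = dir;
        d->reinit_count++;
    }
}

/* Stand-ins for the HAL start calls: they succeed only if the channel
   is already set up for the right direction. */
static bool start_read(const dma_model_t *d)
{
    return d->direction == DIR_PERIPH_TO_MEM;
}

static bool start_write(const dma_model_t *d)
{
    return d->direction == DIR_MEM_TO_PERIPH;
}

/* Wrappers that always fix the direction first, then start the transfer. */
bool sd_read_dma(void)
{
    dma_set_direction(&shared_dma, DIR_PERIPH_TO_MEM);
    return start_read(&shared_dma);
}

bool sd_write_dma(void)
{
    dma_set_direction(&shared_dma, DIR_MEM_TO_PERIPH);
    return start_write(&shared_dma);
}
```

Putting the direction change inside the wrapper means no caller can forget it, which is the step the stock code leaves to the user.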

Two: FatFs to HAL interface code contains race conditions. SD_read and SD_write initiate DMA transfer then wait on ReadStatus and WriteStatus respectively. But why are these flags cleared after initiating transfer? With a long enough interrupt, the transfer could already be over and the flag already set. Many people won't see this, but it's there. And with FAT, one error is enough to really mess things up. Though in this case the most likely behaviour is a 30 second block followed by FR_DISK_ERR.
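The ordering problem can be shown with a small host-side model in which the completion interrupt runs before the start call returns. `write_status`, `start_transfer_fast`, `wait_flag`, `write_racy` and `write_safe` are hypothetical names for illustration and are not the stock interface code.

```c
#include <stdbool.h>
#include <stdint.h>

static volatile uint8_t write_status;   /* set to 1 by the completion callback */

/* Models a transfer whose completion interrupt runs before the start
   call returns, e.g. because a long interrupt delayed the caller. */
static void start_transfer_fast(void)
{
    write_status = 1;
}

/* Bounded wait so the model cannot hang; real code would use a tick timeout. */
static bool wait_flag(uint32_t max_polls)
{
    while (max_polls--) {
        if (write_status) {
            return true;
        }
    }
    return false;
}

/* Racy order: the flag is cleared AFTER the transfer is started. */
bool write_racy(void)
{
    start_transfer_fast();
    write_status = 0;          /* wipes a completion that already happened */
    return wait_flag(1000);    /* can only time out */
}

/* Safe order: the flag is cleared BEFORE the transfer is started. */
bool write_safe(void)
{
    write_status = 0;
    start_transfer_fast();
    return wait_flag(1000);
}
```

In this model `write_racy()` always times out and `write_safe()` always succeeds. On real hardware the racy version fails only when the interrupt wins the race, which is why it is so easy to miss.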

Three: This one was my fault. The SDMMC peripheral has two clocks. There is what becomes the card clock after division. This was running at 32MHz. But the peripheral registers are on APB2, which was clocked at 10MHz due to a decision made a long time ago. This works most of the time but just occasionally - probably when TIM15 ticked - the DMA already hobbled by APB2 lost a few too many cycles and caused a FIFO underrun. Increasing PCLK2 solved the problem.
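A start-up check along these lines could catch this kind of misconfiguration early. The ratio used below, PCLK2 > (3 x bus width / 32) x SDMMC_CK, is quoted from memory of the SDMMC chapter of the L4 reference manual and should be treated as an assumption to verify there. The function name `sdmmc_pclk2_ok` is illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed constraint (verify in the reference manual):
     PCLK2 > (3 * bus_width / 32) * SDMMC_CK
   Rearranged into integer arithmetic to avoid rounding:
     32 * PCLK2 > 3 * bus_width * SDMMC_CK */
bool sdmmc_pclk2_ok(uint32_t pclk2_hz, uint32_t sdmmc_ck_hz,
                    uint32_t bus_width_bits)
{
    uint64_t lhs = 32ull * pclk2_hz;
    uint64_t rhs = 3ull * bus_width_bits * sdmmc_ck_hz;
    return lhs > rhs;
}
```

If the 32MHz figure is taken as the card clock and a 4-bit bus is assumed (neither is stated explicitly above), `sdmmc_pclk2_ok(10000000u, 32000000u, 4u)` returns false, while a 16MHz PCLK2 passes.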

General lack of confidence in HAL and lack of knowledge of the design it was trying to implement along with an over-cluttered source clouded the issue and made debugging difficult. Hopefully by posting this update some of you will avoid these problems.

Four: (minor) FatFs R0.12c has a bug that is fixed in R0.14b. When opening write-append, if seek to end fails (could happen with buggy HAL layer), the file lock is not released and the next call returns FR_LOCKED. Doesn't fix the basic behaviour, but prevents errors propagating.