STM32L4XX SD HAL + DMA/ISR race condition causes failed reads/writes

Campbell Young
Associate

Hi,

I'm using STM32Cube_FW_L4_V1.14.0 and have noticed a problem with the SD HAL driver + SDMMC that causes very sporadic failures. I've investigated and think I've found the cause, but I'm hoping to get some feedback. Hopefully this is the right place to post this :)

The issue:

We're using stm32l4xx_hal_sd.c as the backend for a FAT filesystem that is used for recording sensor data. About once every 2 GB of data written, we get a write that fails due to an error in HAL_SD_WriteBlocks_DMA.

I spent a bit of time making this more reproducible and eventually noticed that the issue is sensitive to interrupt timing; if I randomly delay interrupts from within a timer, the issue happens much more frequently. I traced the cause to SDMMC_CmdWriteSingleBlock/SDMMC_CmdWriteMultiBlock, which return SDMMC_ERROR_CMD_CRC_FAIL.
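One possible way to inject that kind of timing jitter is sketched below. It is only an assumption about how such a stress test could be built, not the code used in the post: the names `jitter_next` and `jitter_delay` are hypothetical, and on real hardware `jitter_delay` would be called from a timer interrupt handler so that other interrupts are held off for a random number of loop iterations.

```c
#include <stdint.h>

/* State of a small linear congruential generator (hypothetical helper). */
static uint32_t jitter_state = 1u;

/* Returns the next pseudo-random value; arithmetic wraps modulo 2^32. */
uint32_t jitter_next(void)
{
    jitter_state = jitter_state * 1664525u + 1013904223u;
    return jitter_state;
}

/*
 * Busy-waits for a pseudo-random number of iterations in [0, max_spins).
 * Returns the number of iterations spent so the caller can log it.
 * Intended to be called from a timer ISR to perturb interrupt timing.
 */
uint32_t jitter_delay(uint32_t max_spins)
{
    if (max_spins == 0u) {
        return 0u;
    }
    /* Use the upper bits, which are better distributed in an LCG. */
    uint32_t spins = (jitter_next() >> 16) % max_spins;
    for (volatile uint32_t i = 0u; i < spins; ++i) {
        /* empty: the volatile counter keeps the loop from being optimized away */
    }
    return spins;
}
```

The upper bound would need tuning per system: too small and the race window is rarely hit, too large and unrelated timeouts may start to fire.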

The source of this error is not actually a CRC error at all. Instead, it's caused by SDMMC_GetCmdResp1 checking whether the result of SDMMC_GetCommandResponse(SDMMCx) matches the command that was sent. But it doesn't! Instead, the command I get back corresponds to SDMMC_CMD_STOP_TRANSMISSION.
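The check described above can be modelled in a few lines. This is a simplified sketch, not the HAL source: `resp_check_cmd` and the `RESP_*` names are hypothetical, and only the command indices (CMD12, CMD24, CMD25) come from the SD specification. It shows why a mismatched command index surfaces as a "CRC" error even though no CRC was checked.

```c
#include <stdint.h>

/* SD command indices (CMD12, CMD24, CMD25 in the SD specification). */
enum {
    RESP_CMD_STOP_TRANSMISSION  = 12,
    RESP_CMD_WRITE_SINGLE_BLOCK = 24,
    RESP_CMD_WRITE_MULT_BLOCK   = 25
};

/* Error codes for this model only (hypothetical values). */
enum {
    RESP_ERROR_NONE         = 0,
    RESP_ERROR_CMD_CRC_FAIL = 1
};

/*
 * Models the response check: 'respcmd' is the command index reported by
 * the peripheral, 'expected_cmd' is the command the caller just sent.
 * Any mismatch is reported as a CRC failure, whatever its real cause.
 */
uint32_t resp_check_cmd(uint8_t respcmd, uint8_t expected_cmd)
{
    if (respcmd != expected_cmd) {
        return RESP_ERROR_CMD_CRC_FAIL;
    }
    return RESP_ERROR_NONE;
}
```

With this model, a write command whose response slot has been overwritten by a stop-transmission command fails the comparison and is reported as `RESP_ERROR_CMD_CRC_FAIL`.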

The DMA transaction is handled by the HAL_SD_IRQHandler interrupt handler, which executes SDMMC_CmdStopTransfer while processing the read/write, and this causes the race condition.

The code that started the transaction is waiting for its command to be acknowledged; meanwhile, the interrupt has been triggered and has executed a different command.
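The sequence above can be reproduced deterministically in a small host-side model. Everything here is an assumption made for illustration: `race_sdmmc_t`, `race_irq_handler` and `race_write_cmd` are hypothetical names, the "interrupt" is just a function call placed in the window between sending the command and checking its response, and real hardware timing is not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

enum { RACE_CMD_STOP_TRANSMISSION = 12, RACE_CMD_WRITE_SINGLE_BLOCK = 24 };
enum { RACE_OK = 0, RACE_ERROR_CMD_CRC_FAIL = 1 };

/* Minimal stand-in for the peripheral state (model only). */
typedef struct {
    uint8_t respcmd;      /* command index of the most recent response */
    bool    irq_pending;  /* an SD interrupt is waiting to run */
} race_sdmmc_t;

/* Models the interrupt handler issuing a stop-transmission command. */
void race_irq_handler(race_sdmmc_t *sd)
{
    sd->irq_pending = false;
    sd->respcmd = RACE_CMD_STOP_TRANSMISSION;
}

/*
 * Models the foreground code: send a command, then check the response.
 * If an interrupt is pending, it runs inside the window between the two
 * steps and replaces the response the foreground code is about to read.
 */
uint32_t race_write_cmd(race_sdmmc_t *sd, uint8_t cmd)
{
    sd->respcmd = cmd;               /* command sent, response latched */
    if (sd->irq_pending) {
        race_irq_handler(sd);        /* preemption inside the window */
    }
    return (sd->respcmd == cmd) ? RACE_OK : RACE_ERROR_CMD_CRC_FAIL;
}
```

In this model the same write succeeds when no interrupt is pending and fails with the CRC error code when one is, which matches the timing sensitivity described in the post.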

The logic seems quite broken for interrupt-driven flows. If I disable the SD interrupts until after the SDMMC_CmdWriteSingleBlock/SDMMC_CmdWriteMultiBlock functions have returned, the race condition goes away.
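A host-side sketch of that workaround follows. It is a model under stated assumptions rather than a patch to the HAL: the `guard_*` names are hypothetical, interrupt masking is represented by a plain flag, and the pending interrupt is simply deferred until the flag is set again.

```c
#include <stdbool.h>
#include <stdint.h>

enum { GUARD_CMD_STOP_TRANSMISSION = 12, GUARD_CMD_WRITE_SINGLE_BLOCK = 24 };
enum { GUARD_OK = 0, GUARD_ERROR_CMD_CRC_FAIL = 1 };

/* Minimal stand-in for the peripheral state (model only). */
typedef struct {
    uint8_t respcmd;      /* command index of the most recent response */
    bool    irq_enabled;  /* SD interrupts unmasked */
    bool    irq_pending;  /* an SD interrupt is waiting to run */
} guard_sdmmc_t;

/* Runs the modelled handler only if interrupts are unmasked and pending. */
void guard_poll_irq(guard_sdmmc_t *sd)
{
    if (sd->irq_enabled && sd->irq_pending) {
        sd->irq_pending = false;
        sd->respcmd = GUARD_CMD_STOP_TRANSMISSION;
    }
}

/*
 * Sends a command with SD interrupts masked until the response has been
 * checked, then unmasks them so any deferred interrupt can run.
 */
uint32_t guard_write_cmd(guard_sdmmc_t *sd, uint8_t cmd)
{
    sd->irq_enabled = false;         /* mask before sending the command */
    sd->respcmd = cmd;
    guard_poll_irq(sd);              /* preemption point: masked, no effect */
    uint32_t err = (sd->respcmd == cmd) ? GUARD_OK : GUARD_ERROR_CMD_CRC_FAIL;
    sd->irq_enabled = true;          /* unmask after the response check */
    guard_poll_irq(sd);              /* deferred interrupt runs now */
    return err;
}
```

The trade-off in this approach is added interrupt latency: the handler still runs, only later, so the masked window should stay as short as the command/response exchange itself.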

Hope that makes sense, I'm happy to clarify if needed.

Cheers

Campbell

1 REPLY
Campbell Young
Associate

Seems like this has been fixed in v1.15.0. I didn't notice until I found your GitHub repo, my bad.