2020-06-09 11:06 PM
Hi,
I'm using STMCube_FW_L4_V1.14.0 and have noticed a problem with the SD HAL driver + SDMMC that causes very sporadic failures. I've investigated and think I've found the cause, but am hoping to get some feedback. Hopefully this is the right place to post this :)
The issue:
We're using stm32l4xx_hal_sd.c as the backend for a FAT filesystem that is used for recording sensor data. About once every 2 GB of data written, we get a write that fails due to an error in HAL_SD_WriteBlocks_DMA.
I spent a bit of time making this more reproducible and eventually noticed that the issue is sensitive to interrupt timings; if I randomly delay interrupts within a timer, the issue happens much more frequently. I traced the cause to SDMMC_CmdWriteSingleBlock/SDMMC_CmdWriteMultiBlock, which is returning SDMMC_ERROR_CMD_CRC_FAIL.
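For anyone who wants to try a similar timing perturbation, here is a rough sketch of one way to inject random delays. It is an illustration only, not the exact code I ran: the names jitter_next, jitter_timer_isr_body and JITTER_MAX_SPINS are hypothetical, and it assumes you call the body from a periodic timer interrupt whose priority is high enough to hold off the other handlers.

```c
#include <stdint.h>

/* Hypothetical upper bound on the busy-wait length; tune for your clock. */
#define JITTER_MAX_SPINS 2000u

/* xorshift32 state; any nonzero seed works. */
static uint32_t jitter_state = 0x12345678u;

/* Small xorshift32 pseudo-random generator, cheap enough for an ISR. */
static uint32_t jitter_next(void)
{
    uint32_t x = jitter_state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    jitter_state = x;
    return x;
}

/* Body for a periodic timer ISR (hypothetical): burns a random number of
 * loop iterations so lower-priority interrupts are delayed by a varying
 * amount. Returns the spin count so it can be logged or inspected. */
static uint32_t jitter_timer_isr_body(void)
{
    uint32_t spins = jitter_next() % (JITTER_MAX_SPINS + 1u);
    for (volatile uint32_t i = 0; i < spins; ++i) {
        /* busy-wait */
    }
    return spins;
}
```

The delay is bounded by JITTER_MAX_SPINS, so the system keeps running while the relative timing of the other interrupts shifts from one tick to the next.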
The source of this error is not actually a CRC error at all. Instead, it's caused by SDMMC_GetCmdResp1 checking whether the result of SDMMC_GetCommandResponse(SDMMCx) matches the expected command. But it doesn't! Instead the command I get back corresponds to SDMMC_CMD_STOP_TRANSMISSION.
The DMA transaction is handled by HAL_SD_IRQHandler interrupt, which executes SDMMC_CmdStopTransfer during its processing of the read/write, and this causes the race condition.
The code that started the transaction is waiting for its command to be acknowledged; meanwhile, the interrupt has been triggered and executed a different command.
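To make the sequence concrete, here is a small host-side model of the race as I understand it. This is a simplification, not the HAL source: every model_* name and the MODEL_* constants are made up for illustration, and the only real-world values are the SD command indices (12 for STOP_TRANSMISSION, 24 for WRITE_BLOCK).

```c
#include <stdint.h>

/* Command indices from the SD specification. */
enum { MODEL_CMD_STOP_TRANSMISSION = 12, MODEL_CMD_WRITE_SINGLE_BLOCK = 24 };

/* Hypothetical result codes for this model (not the HAL's values). */
enum { MODEL_OK = 0, MODEL_ERR_CMD_CRC_FAIL = 1 };

/* Stand-in for the peripheral register holding the index of the command
 * that the most recent response belongs to. */
static uint8_t model_respcmd;

/* Models sending a command and the hardware latching its response. */
static void model_send_command(uint8_t cmd)
{
    model_respcmd = cmd;
}

/* Models the response check: a mismatch between the expected and the
 * latched command index is reported as a CRC failure. */
static int model_get_cmd_resp1(uint8_t expected_cmd)
{
    return (model_respcmd == expected_cmd) ? MODEL_OK : MODEL_ERR_CMD_CRC_FAIL;
}

/* Models the write path. irq_fires_early simulates the interrupt handler
 * issuing the stop command before the caller has checked its response. */
static int model_write_block(int irq_fires_early)
{
    model_send_command(MODEL_CMD_WRITE_SINGLE_BLOCK);
    if (irq_fires_early) {
        model_send_command(MODEL_CMD_STOP_TRANSMISSION);
    }
    return model_get_cmd_resp1(MODEL_CMD_WRITE_SINGLE_BLOCK);
}
```

In this model, model_write_block(0) succeeds, while model_write_block(1) reports the CRC failure even though nothing was corrupted on the bus.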
The logic seems quite broken for interrupt driven flows. If I disable the SD interrupts until after the SDMMC_CmdWriteSingleBlock/SDMMC_CmdWriteMultiBlock functions have returned, then the race condition is fixed.
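The workaround can be sketched the same way, again as a host-side simulation rather than driver code. All sim_* names are hypothetical. On real hardware the mask/unmask steps might be something like the CMSIS calls NVIC_DisableIRQ/NVIC_EnableIRQ on the SDMMC interrupt, but that mapping is an assumption and not necessarily what v1.15.0 does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Command indices from the SD specification. */
enum { SIM_CMD_STOP_TRANSMISSION = 12, SIM_CMD_WRITE_SINGLE_BLOCK = 24 };

static uint8_t sim_respcmd;     /* index of the last command responded to */
static bool sim_irq_masked;     /* true while the SD interrupt is disabled */
static bool sim_irq_pending;    /* interrupt arrived while masked */

/* Simulated SD interrupt handler: issues the stop command. */
static void sim_irq_handler(void)
{
    sim_respcmd = SIM_CMD_STOP_TRANSMISSION;
}

/* Simulated interrupt request: runs now, or is deferred if masked. */
static void sim_raise_irq(void)
{
    if (sim_irq_masked) {
        sim_irq_pending = true;
    } else {
        sim_irq_handler();
    }
}

static void sim_mask_irq(void)
{
    sim_irq_masked = true;
}

/* Unmasking runs any interrupt that was deferred while masked. */
static void sim_unmask_irq(void)
{
    sim_irq_masked = false;
    if (sim_irq_pending) {
        sim_irq_pending = false;
        sim_irq_handler();
    }
}

/* Issues the write command and checks its response. With
 * mask_during_command set, the handler cannot run until the check is done. */
static bool sim_write_block(bool mask_during_command)
{
    if (mask_during_command) {
        sim_mask_irq();
    }
    sim_respcmd = SIM_CMD_WRITE_SINGLE_BLOCK; /* send the write command */
    sim_raise_irq();                          /* worst case: IRQ arrives at once */
    bool ok = (sim_respcmd == SIM_CMD_WRITE_SINGLE_BLOCK);
    if (mask_during_command) {
        sim_unmask_irq();
    }
    return ok;
}
```

With the interrupt masked, the response check sees the write command's own response, and the deferred handler still runs once the interrupt is unmasked, so the stop command is delayed rather than lost.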
Hope that makes sense, I'm happy to clarify if needed.
Cheers
Campbell
2020-06-10 12:11 AM
Seems like this has been fixed in v1.15.0. Didn't notice until I found your GitHub repo, my bad.