2019-11-18 03:52 PM
My application works great, until it doesn't. I'm logging to an SD card, and every so often the call to f_write will return FR_DISK_ERR, at which point all subsequent attempts to write will return the same error.
Attempts to call f_open on a second file result in FR_LOCKED after the initial failure on the first file.
After a power cycle, everything is fine. And the mean time to failure is hours/thousands of log file writes (on two log files, not just one).
Documentation describes FR_DISK_ERR as an "unrecoverable hard error", but I'd like to figure out if there's any way at all to recover without power cycling.
Board layout follows guidance for SDIO lines, and the waveforms on the transmission lines look very similar to those on the ST dev boards. I can't totally rule out EMI as the root cause, but I would love to find a software workaround, since the application is tolerant of a few missed log entries.
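Since the application can tolerate a few dropped entries, one possible shape for a software workaround is a bounded "write, recover, retry" wrapper. The sketch below is only an illustration and has not been tried on this hardware: `log_ops_t`, `log_write_with_recovery` and the `demo_*` functions are hypothetical names, and the hooks are host-side stand-ins. In a real build the `write` hook would presumably wrap f_write, and the `recover` hook would close the file, unmount and remount the volume with f_mount, re-initialize the card, and reopen the file. Whether that sequence actually clears FR_DISK_ERR and FR_LOCKED here is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes; a real build would use FatFs's FRESULT. */
typedef enum { LOG_OK = 0, LOG_ERR = 1 } log_status_t;

/* Hypothetical hooks supplied by the application. */
typedef struct {
    log_status_t (*write)(const void *data, size_t len); /* e.g. wraps f_write */
    bool (*recover)(void); /* e.g. close, unmount, re-init card, remount, reopen */
} log_ops_t;

/* Try a write; on failure run the recovery hook and retry.
 * At most max_recoveries recovery attempts are made.
 * Returns LOG_OK if the entry was written, LOG_ERR if it was dropped. */
log_status_t log_write_with_recovery(const log_ops_t *ops, const void *data,
                                     size_t len, unsigned max_recoveries)
{
    for (unsigned attempt = 0; ; ++attempt) {
        if (ops->write(data, len) == LOG_OK) {
            return LOG_OK;
        }
        if (attempt >= max_recoveries) {
            return LOG_ERR;          /* give up and drop this entry */
        }
        if (!ops->recover()) {
            return LOG_ERR;          /* recovery itself failed */
        }
    }
}

/* Host-side stand-ins (assumptions, not real drivers): the fake write
 * fails a configurable number of times, then succeeds. */
static unsigned demo_failures_left;
static unsigned demo_recover_calls;

static log_status_t demo_write(const void *data, size_t len)
{
    (void)data;
    (void)len;
    if (demo_failures_left > 0) {
        demo_failures_left--;
        return LOG_ERR;
    }
    return LOG_OK;
}

static bool demo_recover(void)
{
    demo_recover_calls++;
    return true;
}
```

Capping the number of recovery attempts keeps a dead card from stalling the logger; the entry is simply dropped once the budget is used up.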
2019-11-18 05:23 PM
Polled or DMA? The FIFO provides some elasticity but you can't wander off task for a bunch of time.
Instrument the SDIO layer, understand the specific error on the hardware side.
The SDIO peripheral will not multi-task, you need a single thread to own it, or mutex it.
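One way to give the peripheral a single owner is to have every other thread post its log text to a queue, and let only the owning thread drain that queue and call into the file system. The sketch below is a minimal host-side illustration using a single-producer/single-consumer ring buffer built on C11 atomics; `log_queue_t`, `log_post`, `log_drain` and the `demo_*` names are hypothetical. It assumes exactly one producer thread and one consumer thread. Under an RTOS, the OS's own queue primitive would be the more natural choice, and it would also handle several producers.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

#define LOG_SLOTS   8u    /* must be a power of two */
#define LOG_MSG_MAX 64u   /* bytes per message, including the terminator */

typedef struct {
    char msg[LOG_SLOTS][LOG_MSG_MAX];
    atomic_uint head;     /* next slot to fill; advanced only by the producer */
    atomic_uint tail;     /* next slot to read; advanced only by the consumer */
} log_queue_t;

/* Producer side: called from the one non-owner thread.
 * Returns false (entry dropped) when the queue is full. */
bool log_post(log_queue_t *q, const char *text)
{
    unsigned head = atomic_load_explicit(&q->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head - tail == LOG_SLOTS) {
        return false;
    }
    char *slot = q->msg[head % LOG_SLOTS];
    strncpy(slot, text, LOG_MSG_MAX - 1);
    slot[LOG_MSG_MAX - 1] = '\0';
    /* Publish the slot only after its contents are written. */
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return true;
}

typedef void (*log_sink_t)(const char *text);

/* Consumer side: called only from the thread that owns the SD card.
 * The sink is where the real file write would happen.
 * Returns the number of messages handed to the sink. */
unsigned log_drain(log_queue_t *q, log_sink_t sink)
{
    unsigned drained = 0;
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    while (tail != atomic_load_explicit(&q->head, memory_order_acquire)) {
        sink(q->msg[tail % LOG_SLOTS]);
        ++tail;
        atomic_store_explicit(&q->tail, tail, memory_order_release);
        ++drained;
    }
    return drained;
}

/* Host-side stand-ins (assumptions): a queue instance and a sink that
 * just remembers the last message instead of writing to a card. */
static log_queue_t demo_queue;
static unsigned demo_sink_calls;
static char demo_last[LOG_MSG_MAX];

static void demo_sink(const char *text)
{
    demo_sink_calls++;
    strncpy(demo_last, text, LOG_MSG_MAX - 1);
    demo_last[LOG_MSG_MAX - 1] = '\0';
}
```

A full queue drops the new entry rather than blocking the producer, which suits a logger that can afford to miss a few lines.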
2019-11-18 05:53 PM
DMA, with no modifications to the drivers/HAL code as generated by CUBEMX FW_F4 V1.24.1. Do you mean I could be overloading the FIFO?
The more I instrument the less convinced I am that it's hardware. The problem has only shown itself when the PCB is integrated into the system. Of course the glitch is rare enough that it could just be coincidence.
That the SDIO peripheral will not multitask is interesting, and I hadn't truly considered it, since the application code only calls FatFs-level functions (f_write, f_open, etc.) and FatFs was configured with FF_FS_REENTRANT enabled, which "switches the re-entrancy (thread safe) of the FatFs module itself". However, one thread initializes the system (with a call to MX_FATFS_Init()) and another runs the application once all initialization is finished and is the sole user of the SDIO peripheral thereafter.
2019-11-18 05:59 PM
ADDENDUM: I was actually infrequently writing to the log from another thread that's running at the same time as the main thread. I'll remove that and see what results.
2019-11-18 06:12 PM
DMA should be OK. The FIFO can't be allowed to run empty: the card expects to receive all the data it is due, and the transfer can't stall.
2019-11-19 09:02 AM
The application has been running without fail for about 15 hours now. Previously I would have expected the SD card writes to have failed by now, so I think restricting SDIO peripheral use to a single thread has solved the problem. It's fairly obvious in retrospect...
Thanks for the guidance.
2019-11-20 11:31 AM
Edit: Limiting SDIO peripheral access to a single thread fixed the problem. The issue I continued to have (below) was due to a corrupted SD card, which certainly complicated the debugging process. Problem solved. Thanks again.
Another Update: The problem still persists. I was too quick to claim that a 20 hour run was proof of a fix. So I'm leaving this thread open-ended for now in case there's any other advice/info floating around out there.
Other information that might be relevant: I have three USARTS all running with interrupts enabled. Those interrupts are lower priority than SDIO (pre-emption level 5 for SDIO stuff, and level 6 for USART stuff) and I've never seen the USARTs fail in any way. The Timer for the FreeRTOS timebase is level 0.
Would it be recommended to disable interrupts for SDIO operations?
I did have to disable them for the call to f_mount, because often failures just like the ones described in the thread would occur if I didn't.
// Register the file system object to the FatFS module
__disable_irq();
if (f_mount(&SDFatFS, (TCHAR const*)SDPath, 0) != FR_OK)
{
    /* FatFs Initialization Error */
    Error_Handler();
}
__enable_irq();
2020-05-22 11:03 AM
Hi, I have a similar issue with my board, which is running an STM32H743 (without FreeRTOS, just a simple scheduler), with FatFs and SDIO in 4-bit mode. 98% of the time data is written correctly to the file, but sometimes I get FR_DISK_ERR, and then on the next write to the file FR_INVALID_OBJECT. After that, no further writing to the file is possible.
My question now: could you solve the issue and if so, can you please let us know how?
Thanks for any help!
2020-05-22 11:42 AM
Need to root cause the failure.
Failure here cascades, and it comes from the DISKIO layer; instrument and debug that.
Polled or DMA?
Prior Write completed?
Polling is fragile, can't be distracted with interrupts and callbacks.
Stupid packing/unpacking of FIFO in HAL. Not sure why it would have an alignment issue, nor why that case needs to cripple everything.
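As a rough idea of what instrumenting the DISKIO layer could look like, the sketch below wraps a low-level sector write and records the first-hand details of each failure (sector, count, driver result and a raw hardware error word) in a small trace buffer that can be inspected in a debugger. All names here (`io_trace_t`, `traced_write`, the `demo_*` functions) are hypothetical, and the raw write and error-word hooks are host-side stand-ins. On the STM32 HAL the error word would presumably be read with HAL_SD_GetError(), but that wiring is an assumption and is not shown.

```c
#include <stdint.h>

#define TRACE_DEPTH 16u   /* number of failure records kept */

typedef struct {
    uint32_t sector;      /* first sector of the failed transfer */
    uint32_t count;       /* number of sectors requested */
    int      result;      /* driver return code, 0 means OK */
    uint32_t hw_error;    /* raw peripheral error word at the time of failure */
} io_trace_t;

static io_trace_t io_trace[TRACE_DEPTH];
static uint32_t io_trace_next;   /* total records written; index wraps */
static uint32_t io_fail_total;   /* running count of failed writes */

typedef int (*raw_write_fn)(const uint8_t *buf, uint32_t sector, uint32_t count);
typedef uint32_t (*hw_error_fn)(void);

/* Call the real write, and on failure capture what is known about it
 * before any higher layer turns it into a generic disk error. */
int traced_write(raw_write_fn raw, hw_error_fn get_hw_error,
                 const uint8_t *buf, uint32_t sector, uint32_t count)
{
    int result = raw(buf, sector, count);
    if (result != 0) {
        io_trace_t *t = &io_trace[io_trace_next % TRACE_DEPTH];
        t->sector   = sector;
        t->count    = count;
        t->result   = result;
        t->hw_error = get_hw_error();
        io_trace_next++;
        io_fail_total++;
    }
    return result;
}

/* Host-side stand-ins (assumptions): the fake write fails only for
 * sector 42, and the fake error word is an arbitrary made-up value. */
static int demo_raw_write(const uint8_t *buf, uint32_t sector, uint32_t count)
{
    (void)buf;
    (void)count;
    return (sector == 42u) ? 1 : 0;
}

static uint32_t demo_hw_error(void)
{
    return 0x8u;
}
```

Recording the hardware error word at the point of failure matters because the file-system layer above reports only a generic disk error, which says nothing about whether the cause was a CRC error, a timeout or a FIFO underrun.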
2020-05-24 03:09 AM
Hi Clive1,
thank you for your proposals.
I use DMA.
When the problem happens only a part of the data is written.
Prior writes are completed correctly.
The buffer is aligned to 512 and write packets are modulo 32 Bytes. Tried with modulo 512 also but the problem still exists.
I will dig into the problem in depth now. I really have to fix this issue!