
Data Corruption when Simultaneous Transferring Data from SD Card and via ETH

Question asked by Michael Steinecke on Jun 29, 2015
Latest reply on Jun 29, 2015 by Clive One
Hello

We have a custom board with an STM32F429ZG MCU and libraries generated by CubeMX 4.7 (FW Lib 1.5). The board is a redesign of an older board that used an STM32F103ZG MCU with a Wiznet W5300 Ethernet coprocessor.
The old FW is based on the non-HAL/CubeMx libraries V3.5.0 (SPL?).

Used Libraries (new board):
FreeRTOS 8.1.2
FatFS R0.10b
LwIP: Self-modified to handle data transfers with 64k per tcp_write (exchanged several u16 with u32 for snd_buf and like variables)
Zero Copy ethernetif + stm32f4xx_hal_eth driver in send and receive direction
sdio.c implementation has two independent DMAs for send and receive (as found in this forum)

With the new board we are facing some strange data corruption:

Reading larger binary files (2MB+) in chunks of the SD card page size (64k, but it also occurs with smaller chunks):

  1. Requesting a buffer of 64k from a custom LwIP mempool (memp_malloc())
  2. Reading from the SD card to SDRAM 1 via DMA2_Stream3 Channel 4.
  3. Posting a pbuf to a FreeRTOS queue which is read by LwIP tcp_poll and eventually processed by tcp_write
  4. LwIP splits the large 64k buffer into slices of 1460 bytes and passes a pbuf chain to the driver
  5. The low-level ethernetif sets the ETHIF_DMATxDescriptors to point directly to the payload of the pbuf without a copy: heth->TxDesc->Buffer1Addr = (uint32_t)p->payload;
  6. In LwIP tcp_sent() the pbuf is freed once reception has been acknowledged
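
To make the zero-copy slicing concrete, here is a minimal host-side sketch of how a 64k payload ends up referenced by a chain of descriptors. It is only a model: struct tx_desc_model and slice_payload() are hypothetical names, not part of LwIP or stm32f4xx_hal_eth, and "64k" is assumed to mean 65536 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TCP_MSS_BYTES 1460u  /* slice size mentioned in the post */

/* Hypothetical, simplified stand-in for one Ethernet DMA Tx descriptor. */
struct tx_desc_model {
    const uint8_t *buffer1;  /* points into the payload, nothing is copied */
    uint32_t       length;   /* bytes covered by this descriptor */
};

/* Fill descs[] so that each entry references one slice of payload.
 * Returns the number of descriptors used, or 0 if max_descs is too small. */
static size_t slice_payload(const uint8_t *payload, size_t total,
                            struct tx_desc_model *descs, size_t max_descs)
{
    size_t used = 0;
    size_t offset = 0;

    assert(payload != NULL && descs != NULL);

    while (offset < total) {
        size_t chunk = total - offset;
        if (chunk > TCP_MSS_BYTES) {
            chunk = TCP_MSS_BYTES;
        }
        if (used == max_descs) {
            return 0;  /* not enough descriptors for the whole payload */
        }
        descs[used].buffer1 = payload + offset;
        descs[used].length  = (uint32_t)chunk;
        used++;
        offset += chunk;
    }
    return used;
}
```

Under these assumptions a 65536-byte buffer yields 44 full slices plus a 1296-byte tail, so 45 descriptors point into the same SDRAM buffer until the last slice has been acknowledged and the pbuf is freed.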

The entire memory allocation is handled by the LwIP mempool. Allocation and deallocation are guarded by FreeRTOS portSET_INTERRUPT_MASK_FROM_ISR() / portCLEAR_INTERRUPT_MASK_FROM_ISR(0).
We have also replaced it with the FreeRTOS heap_2 allocator, with the heap located in the SDRAM.

This works fine as long as the files are smaller than ~2MB. With larger files, however, we eventually end up with simultaneous SD card reads and ETH writes, because LwIP starts to process the pbuf queue while the SD card is still sending data.
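
One thing that could be ruled out while both transfers are active is that the SD card DMA destination and an Ethernet Tx buffer ever cover the same bytes. The check below is a small sketch for that; ranges_overlap() is a hypothetical helper, and where it would be called from (for example a debug hook before starting each DMA transfer) is an assumption, not something the current firmware does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if [a, a + a_len) and [b, b + b_len) share at least one byte.
 * 64-bit arithmetic avoids wrap-around near the top of the address space. */
static bool ranges_overlap(uint64_t a, uint32_t a_len,
                           uint64_t b, uint32_t b_len)
{
    assert(a_len > 0 && b_len > 0);
    return a < b + b_len && b < a + a_len;
}
```

If this never fires, the two DMA masters are at least working on disjoint buffers, which would point away from a buffer-reuse bug and towards bus or timing effects.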

On larger files I sometimes get corrupt data at the border of an SD card page (or at the beginning/end of a TCP transmission). Usually 564 bytes of data are corrupt. Within these corrupt bytes a constant value is added to or subtracted from the expected value, but only if Abs(constant value) < Abs(expected value).

Usually a data table like this is transferred (artificial values for testing):
Expected:
CH0        CH1        CH2        CH3        CH4        CH5        CH6        CH7        UINT16
20100    20200    20300    20400    20500    20600    20700    20800    5
20101    20201    20301    20401    20501    20601    20701    20801    5
20102    20202    20302    20402    20502    20602    20702    20802    5
...
30100    30200    30300    30400    30500    30600    30700    30800    5

Corrupt:
CH0        CH1        CH2        CH3        CH4        CH5        CH6        CH7        UINT16
20100    20200    20300    20400    20500    20600    20700    20800    5
13101    13131    13301    13401    13501    13601    13701    13801    5
13102    13132    13302    13402    13502    13602    13702    13802    5
...
30100    30200    30300    30400    30500    30600    30700    30800    5
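
A pattern like the one in the tables above can be located automatically on the PC side by comparing the received samples with the expected test pattern. This is only a sketch: find_corrupt_run() and struct corrupt_run are hypothetical names, and the samples are assumed to have been widened to int32_t before the comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct corrupt_run {
    size_t  first;  /* index of the first differing sample */
    size_t  count;  /* number of consecutive differing samples */
    int32_t delta;  /* received minus expected at index first */
};

/* Finds the first contiguous run of mismatching samples.
 * Returns 1 and fills *out if a mismatch exists, otherwise returns 0. */
static int find_corrupt_run(const int32_t *expected, const int32_t *received,
                            size_t n, struct corrupt_run *out)
{
    size_t i = 0;

    assert(expected != NULL && received != NULL && out != NULL);

    while (i < n && expected[i] == received[i]) {
        i++;
    }
    if (i == n) {
        return 0;  /* buffers are identical */
    }
    out->first = i;
    out->delta = received[i] - expected[i];
    while (i < n && expected[i] != received[i]) {
        i++;
    }
    out->count = i - out->first;
    return 1;
}
```

Logging first, count and delta for every failing transfer would show whether the run always starts at a page or segment boundary and whether the offset is really constant across a run.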


The oddest thing is, we found no invalid CRC reports. Nowhere.
The values actually stored on the SD card are correct. They are correct in the SDRAM as well. On the PC side, however, the data arrives corrupt, without TCP/IP reporting an invalid CRC. I've tried both LwIP and HW CRC generation.
The error doesn't appear if the SD card is not involved at all (writing the same structure directly into the SDRAM from an externally connected ADC via SPI/DMA and following the same procedure afterwards).
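
One general property may be relevant to the missing CRC errors: a checksum only protects data from the point where it is computed. If the bytes are already altered when the checksum is generated, the receiver sees a perfectly valid packet. The sketch below shows the standard Internet checksum (RFC 1071) used by TCP; inet_checksum() is an illustrative name, not the LwIP function, and whether this is what actually happens on the board is not confirmed by anything in this post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over len bytes, 16-bit big-endian words.
 * Suitable for buffers up to 64 KiB; larger inputs would need periodic
 * folding to keep the 32-bit accumulator from overflowing. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    assert(data != NULL || len == 0);

    for (i = 0; i + 1 < len; i += 2) {
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    }
    if (len & 1u) {
        sum += (uint32_t)data[len - 1] << 8;  /* pad odd byte with zero */
    }
    while (sum >> 16) {
        sum = (sum & 0xFFFFu) + (sum >> 16);  /* fold carries back in */
    }
    return (uint16_t)~sum;
}
```

Computing this over a buffer before and after a byte is changed gives two different values, each of which is valid for its own data. A receiver can therefore only detect corruption that happens after the sender has calculated the checksum.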

Does anyone have a hint where to look or how to isolate the root cause? To me, it looks like an electrical issue. It may be something in the board layout, but I wouldn't exclude the software either. I've ordered an STM32429I-EVAL to try to reproduce the problem, but it will take a few days until I receive it.
