DMA (STM32F4xx) aborted transfers

Discussion created by waclawek.jan on Jul 24, 2017
Latest reply on Sep 27, 2017 by David Littell

This is one of the DMA bugs I encountered recently and mentioned here. The description is lengthy - see attached code, or skip to the conclusion below.


The test code uses a DISCOF4 with PC11 and PC12 shorted (SPI3 MOSI looped back to MISO) and monitored by an LA, together with PC10 (SPI3_SCK) and, less importantly, PA1 (TIM2_CH2).


I never call any clock setup etc. from the startup code; everything is explicitly set in main(). The system clock is the 16MHz HSI (the PLL is switched on, but only to provide the 48MHz clock to the RNG - see explanation later); APB1 is divided down by 4 (APB2 should play no role here).


SPI3 is enabled as a conventional master with soft-NSS disabled, using byte-wide transfers at PCLK/2. Both Rx and Tx request their respective DMA Stream (DMA1Stream0 and DMA1Stream7 respectively), storing/loading data circularly to/from a buffer in SRAM1. At initialization, the Tx buffer is filled with a simple pattern (00 01 02 etc.) and the Rx buffer is explicitly cleared to 0; once started, the Rx buffer is continuously filled with the same pattern as the Tx buffer, thanks to the DMA->Tx->loopback->Rx->DMA action (both buffers are the same size, for simplicity). FIFO is switched off on those two DMAs, i.e. Direct Mode is used.
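For readers without the attachment, the stream configuration described above might look roughly like the following sketch. It is not the attached test code: `dma_stream_t`, `dma_stream_setup()`, `fill_pattern()` and `BUF_SIZE` are hypothetical, the register struct is a host-side stand-in so the snippet compiles off-target, and the bit positions are assumed from RM0090 and should be checked against it.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side stand-in for one DMA stream's registers (order as in RM0090).
   On target this would be DMA1_Stream0 / DMA1_Stream7 from the CMSIS header. */
typedef struct {
    volatile uint32_t CR, NDTR, PAR, M0AR, M1AR, FCR;
} dma_stream_t;

/* Bit positions assumed from RM0090; verify before use. */
#define DMA_SxCR_EN       (1u << 0)
#define DMA_SxCR_DIR_M2P  (1u << 6)   /* DIR = 01: memory-to-peripheral */
#define DMA_SxCR_CIRC     (1u << 8)   /* circular mode */
#define DMA_SxCR_MINC     (1u << 10)  /* memory address increment */
#define DMA_SxFCR_DMDIS   (1u << 2)   /* 0 = Direct Mode (FIFO off) */

#define BUF_SIZE 64u                  /* assumed; the post does not give a size */

/* Fill the Tx buffer with the 00 01 02 ... pattern. */
static void fill_pattern(uint8_t *buf, uint32_t len)
{
    for (uint32_t i = 0; i < len; i++)
        buf[i] = (uint8_t)i;
}

/* Configure one stream: circular, byte-wide, memory increment, Direct Mode.
   Channel 0, byte PSIZE/MSIZE and lowest priority are all-zero fields in CR. */
static void dma_stream_setup(dma_stream_t *s, uint32_t periph_addr,
                             uint32_t mem_addr, uint32_t count,
                             int mem_to_periph)
{
    assert((s->CR & DMA_SxCR_EN) == 0);   /* configure only while disabled */
    s->PAR  = periph_addr;
    s->M0AR = mem_addr;
    s->NDTR = count;
    s->FCR &= ~DMA_SxFCR_DMDIS;           /* keep Direct Mode */
    s->CR   = DMA_SxCR_CIRC | DMA_SxCR_MINC |
              (mem_to_periph ? DMA_SxCR_DIR_M2P : 0u);
    s->CR  |= DMA_SxCR_EN;
}
```

On target, the Tx stream would be set up with the address of SPI3->DR and `mem_to_periph = 1`, the Rx stream with the same peripheral address and `mem_to_periph = 0`.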


So far so good.


Now I want to insert data into the Tx buffer. I don't want to stop and restart the SPI nor the DMA (note that this is a significantly simplified example from a real-world application, where there are reasons for this requirement); so, to prevent the DMA from overrunning the position I am writing to, I temporarily disable the Tx DMA request by clearing SPI_CR2.TXDMAEN. As the DMA request from SPI has no latch, but is directly SPI_SR.TXE ANDed ("masked" in ARM/ST parlance) with SPI_CR2.TXDMAEN, this should effectively stop any further Tx. Should the DMA already be triggered, there may be one more byte transferred by DMA from the buffer to SPI_DR shortly after this disable; I simply account for this when calculating the position where to store the new data in the Tx DMA buffer. After having modified the data, I re-enable SPI_CR2.TXDMAEN.
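The suspend/modify/resume sequence just described can be sketched as below. This is an illustration only, not the attached code: the register structs are host-side stand-ins, and `tx_insert()`, `tx_dma_index()` and `TX_BUF_SIZE` are hypothetical names. Note that this is exactly the sequence that the rest of this post reports as losing bytes once another stream is active on the same DMA, so it shows the intent, not a recommended pattern.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side stand-ins; on target these would be SPI3->CR2 and
   DMA1_Stream7->NDTR from the CMSIS header. */
typedef struct { volatile uint32_t CR2;  } spi_regs_t;
typedef struct { volatile uint32_t NDTR; } dma_stream_regs_t;

#define SPI_CR2_TXDMAEN (1u << 1)   /* assumed from RM0090 */
#define TX_BUF_SIZE     64u         /* assumed; not given in the post */

/* Index of the next byte the circular Tx DMA will read.
   NDTR counts down from TX_BUF_SIZE and reloads when it reaches 0. */
static uint32_t tx_dma_index(const dma_stream_regs_t *tx)
{
    return (TX_BUF_SIZE - tx->NDTR) % TX_BUF_SIZE;
}

/* Mask the Tx DMA request, write new data ahead of the DMA read position,
   then unmask the request. Returns the index where the data was stored. */
static uint32_t tx_insert(spi_regs_t *spi, dma_stream_regs_t *tx,
                          uint8_t *buf, const uint8_t *data, uint32_t len)
{
    assert(len < TX_BUF_SIZE);

    spi->CR2 &= ~SPI_CR2_TXDMAEN;            /* no further Tx DMA requests */

    /* One transfer may already have been triggered, so skip one slot. */
    uint32_t pos = (tx_dma_index(tx) + 1u) % TX_BUF_SIZE;
    for (uint32_t i = 0; i < len; i++)
        buf[(pos + i) % TX_BUF_SIZE] = data[i];

    spi->CR2 |= SPI_CR2_TXDMAEN;             /* resume Tx DMA requests */
    return pos;
}
```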


In the test program, I don't modify the Tx buffer data, only simulate this in SpiTryTxBuf(), called from the main loop here and there. This worked like a charm in my feasibility study, where these two streams were the only load on DMA1. I stress-loaded the SPI and the gadgetry attached to it, so at this point I was pretty confident this was a flawless piece of code.


Now comes the gotcha. I started to add other features of the application, and when it grew quite a bit, random data errors started to occur. Under realistic loads this was like once in a couple of minutes, and of course I did not suspect the DMA initially, so it resulted in a week or so of 12+hour debugging sessions and frustration...


In the test program, this extra "disturb" is modeled by a timer (TIM2) firing a DMA (DMA1Stream1, with priority set higher than the SPI-fired DMAs, although I don't know if this priority is of any importance), which performs a dummy transfer from some rather arbitrary SRAM2 position to TIM2's CCR1 (it's on the same APB1 bus as SPI3, and it's the same DMA unit - I don't know which one of these is important). To "visualize" the timer, CH2 is set to 1-clock-wide PWM and output to PA1, so the DMA "happens" at roughly the time when the pulse is output (there may be latencies both in the DMA-triggering chain and in the PWM-to-GPIO output, which I don't/can't/won't discuss). To avoid inadvertent synchronicities, TIM2_ARR is "randomized" by changing it here and there in the main loop, based on the RNG (okay, some simple numerical LCRNG or such would probably suffice, but once we have the real hardware stuff handy... :-) ).


Once this "disturb" is in place, the SPI Tx DMA starts to behave erratically: occasionally, it increments its source pointer and decrements NDTR, but does not store the read byte into SPI_DR. The most visible consequence is that the data in the Rx buffer are "shifted" by one byte. To demonstrate that, we detect an incorrectly received datum at the end of the Rx buffer in the Rx-DMA ISR and subsequently stop the Tx DMA; the program can be stopped at this point by placing a breakpoint at the nop marked "PLACE BREAKPOINT HERE". After reset and run, the program pretty soon stops there, and upon subsequent runs stops again and again, as the problem continues to occur. One example of what can be seen is below:
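The check done in the Rx-DMA ISR could be sketched as follows. This is an assumed reconstruction, not the attached code: `rx_tail_check()` is a hypothetical name, the Tx stream's CR is passed in as a plain pointer so the snippet runs on a host, and the EN bit position is assumed from RM0090.

```c
#include <assert.h>
#include <stdint.h>

#define DMA_SxCR_EN (1u << 0)   /* stream enable bit, assumed from RM0090 */

/* Returns 1 if the last Rx byte matches the 00 01 02 ... pattern.
   Otherwise disables the Tx stream through its CR and returns 0. */
static int rx_tail_check(const volatile uint8_t *rx, uint32_t len,
                         volatile uint32_t *tx_stream_cr)
{
    assert(len > 0);
    if (rx[len - 1] == (uint8_t)(len - 1))
        return 1;                       /* pattern intact, nothing lost */

    *tx_stream_cr &= ~DMA_SxCR_EN;      /* stop Tx DMA so the state can be inspected */
    /* PLACE BREAKPOINT HERE (a nop in the original test program) */
    return 0;
}
```

Because a lost byte shifts everything after it by one position, checking only the last byte of the buffer is enough to catch a single drop within one pass.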


As the LA trace indicates, 0x1F has not been transmitted. The same can be seen in the Rx buffer in memory - 0x1F is supposed to be at 0x20000023 (it took some time until the code in Rx DMA ISR stopped the process so the first two bytes in the Rx buffer were already transmitted). The NDTR of Tx Stream7 is one less than the Rx Stream0 NDTR, indicating that the Tx DMA had actually picked the byte from SRAM, but had not stored it into SPI_DR.




If (DMA is temporarily suspended by clearing the request source enable) & (there is another stream active in the same DMA) then Tx data may get lost.


As I found no mention in the RM that the DMA request must not be disabled temporarily, I consider this to be a genuine bug.


There *may* be a workaround; I don't know and don't have the energy to investigate (I worked around the whole problem using a completely different approach). It might be way easier to find one having access to the IP's sources.


My guess on the mechanism is: the DMA fails to properly latch the request upon starting the transfer process internally. It relies on the request (which is *external* to the DMA itself) staying asserted until the end of the whole process. If the request is removed in the course of the transfer, the transfer may be prematurely aborted, especially if there are factors delaying the second (store) part of the process.
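To make the guess concrete, here is a toy model of it. It is purely illustrative: it encodes the hypothesis above and is in no way a description of the actual DMA IP. All names (`dma_model_t`, `model_transfer()`) and the notion of counting "cycles" are assumptions made for the sketch.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t ndtr;       /* remaining transfer count */
    uint32_t src_index;  /* next source byte to load */
    uint32_t stored;     /* bytes that actually reached the peripheral */
} dma_model_t;

/* One transfer. The load phase happens at cycle 0 and the store phase at
   cycle store_delay (delayed e.g. by another stream being serviced).
   The request is removed at cycle request_drop. With latched == 0, the
   store is skipped if the request is already gone when the store is due. */
static void model_transfer(dma_model_t *d, unsigned store_delay,
                           unsigned request_drop, int latched)
{
    assert(d->ndtr > 0);
    d->src_index++;      /* load phase: pointer and counter always advance */
    d->ndtr--;
    if (latched || request_drop > store_delay)
        d->stored++;     /* store phase completes */
}
```

In this model, an unlatched request that drops before a delayed store reproduces the observed symptom: NDTR and the source pointer move on, but one byte never reaches SPI_DR. With a latched request the byte is always stored.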


The underlying reason is a lack of robustness in the DMA design, which I already talked about in the referenced thread; the same lack of robustness - even if not the same mechanism - is the origin of another DMA bug already published in the 'F4 errata (DMA2 data corruption when managing AHB and APB peripherals in a concurrent way). I'd hereby like to ask ST to perform a thorough review of the 'F4 ('F2, 'F7?) DMA for any other similar potential sources of misbehaviour.




This bug should not pertain to DMA requested by TIM, as the timers have latches on their DMA request sources, which are cleared by their respective enable. However, I encountered a different bug with TIM/DMA, but that's for another discussion...