DMA+USART on STM32F407VG: TC Interrupt sometimes not triggered

niklas2 · ‎2017-07-11

Posted on July 11, 2017 at 17:53

Dear community,

I have run into a DMA-related issue while trying to implement an application for the STM32F407VG which receives data from a sensor via UART. The sensor sends a 162-byte data packet at 921600 baud every 10 ms, with pauses in between. Because the MCU is already quite busy, I want to use DMA. Since I have no way to stop or start the sensor's transmission, the MCU may start up after the sensor, and I want the whole system to be hot-pluggable, I have to find the beginning of each data packet. If I just enable the DMA for USART reception, configure it to 162 bytes and process the result in the transfer complete interrupt, I might end up receiving the end of a packet which is currently being transmitted and the beginning of the next packet as one data block.

To solve this, I use the USART's IDLE interrupt to recognize the pause between two packets and start the new DMA transfer. I use the DMA completion interrupt to determine that a packet has been completely received (and stored in memory). If the TC interrupt has not triggered before the next IDLE interrupt, I assume the data packet to be too short and discard it.
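The scheme described above can be pictured as a small state machine. The following is a host-side sketch only: the enum and function names are hypothetical (the code posted further down uses a bare integer `state` with the values 0, 1 and 2), and the actual re-arming of the DMA stream is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the three states of the receive logic. */
typedef enum {
    RX_WAIT_FIRST_IDLE, /* nothing armed yet, waiting for the first gap */
    RX_DMA_ARMED,       /* DMA was started at the last gap, packet in flight */
    RX_DMA_COMPLETE     /* DMA reported transfer complete */
} rx_state_t;

typedef enum { PACKET_NONE, PACKET_COMPLETE, PACKET_DISCARDED } rx_result_t;

/* To be called from the DMA transfer-complete interrupt. */
static rx_state_t on_dma_complete(rx_state_t s) {
    (void) s; /* the previous state does not matter here */
    return RX_DMA_COMPLETE;
}

/* To be called from the USART IDLE interrupt: judges the packet that just
 * ended and returns the state after the DMA has been re-armed. */
static rx_state_t on_usart_idle(rx_state_t s, rx_result_t *result) {
    assert(result != NULL);
    switch (s) {
    case RX_DMA_COMPLETE: *result = PACKET_COMPLETE;  break; /* TC seen before IDLE */
    case RX_DMA_ARMED:    *result = PACKET_DISCARDED; break; /* IDLE without TC: too short */
    default:              *result = PACKET_NONE;      break; /* first gap after start-up */
    }
    return RX_DMA_ARMED;
}
```

A complete packet would still need its content checked before use; the sketch only covers the ordering of the two interrupts.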

However, under certain conditions, sometimes (every few hundred packets), the DMA TC interrupt simply is not triggered. In the USART IDLE interrupt, I see that the DMA 'NDTR' register is zero (indicating a complete transfer), while 'LISR' is also zero (indicating that the TC interrupt is not pending). After resetting the DMA stream it works fine again for some time.

This behaviour seems to be influenced by the CPU load: I configured a timer to call an empty dummy ISR at a high frequency. The higher the frequency, the more TC interrupts go missing. This happens even though both the USART and the DMA interrupt have a higher priority than the timer interrupt.

Executing the code from RAM instead of flash results in many more missed interrupts too.

My suspicion is that this has something to do with a high (RAM) bus load, which would affect performance but should not lead to missing interrupts.

I have tried many variations but could not find a working configuration. I have uploaded example code demonstrating the problem to https://github.com/Erlkoenig90/INSReceive and the interesting part is in the Src/main.c file (shortened):

static uint8_t rxBuffer [176] __attribute__ ((aligned (16)));
static DMA_Stream_TypeDef* const dmaStream = DMA1_Stream1;
static unsigned int state = 0;
static unsigned int printCounter = 0;

void USART3_IRQHandler (void) {
    if (USART3->SR & USART_SR_IDLE) {
        // Clear Interrupt via dummy read
        (void) USART3->DR;
        switch (state) {
            case 0:
                // First IDLE detected. Do nothing special.
                break;
            case 1:
                // IDLE has been detected without a DMA interrupt. This should not happen.
                printf ("Reception failed: NDTR = %lu, LISR = 0x%lx\n", dmaStream->NDTR, DMA1->LISR);
                printCounter = 0;
                break;
            case 2:
                // DMA Completion and IDLE has happened. A packet has been properly received.
                if (rxBuffer [0] == 0xFA && rxBuffer [160] == 0x27 && rxBuffer [161] == 0x10) {
                    if (printCounter == 99) {
                        puts ("Received 100 packets OK");
                        printCounter = 0;
                    } else {
                        ++printCounter;
                    }
                } else
                    puts ("Packet received, but is invalid");
                break;
        }
        state = 1;
        // Disable DMA stream properly
        dmaStream->CR = 0;
        while ((dmaStream->CR & DMA_SxCR_EN) != 0);
        // Clear Interrupt flags
        DMA1->LIFCR = DMA_LIFCR_CTCIF1 | DMA_LIFCR_CHTIF1 | DMA_LIFCR_CTEIF1 | DMA_LIFCR_CDMEIF1 | DMA_LIFCR_CFEIF1;
        // Make sure buffer is correctly aligned
        uint32_t mptr = (uint32_t) rxBuffer;
        assert_param (mptr % 16 == 0);
        // (Re-)Initialize DMA
        dmaStream->PAR = (uint32_t) (&USART3->DR);
        dmaStream->M0AR = mptr;
        dmaStream->NDTR = 162;
        dmaStream->FCR = DMA_SxFCR_DMDIS;
        dmaStream->CR = DMA_SxCR_CHSEL_2 | DMA_SxCR_PL_0 | DMA_SxCR_MSIZE_1 | DMA_SxCR_MINC | DMA_SxCR_EN | DMA_SxCR_TCIE;
        USART3->CR3 = USART_CR3_DMAR;
    }
}

void DMA1_Stream1_IRQHandler (void) {
    if (DMA1->LISR & DMA_LISR_TCIF1)
        state = 2;
}

// Dummy Timer ISR to simulate high workload
void TIM8_UP_TIM13_IRQHandler () {
    if (TIM13->SR & TIM_SR_UIF) {
        TIM13->SR = ~TIM_SR_UIF;
        __NOP ();
    }
}

int main(void) {
    // ... The usual initialization ...
    puts ("Application startup");
    // Configure interrupts
    HAL_NVIC_SetPriority (TIM8_UP_TIM13_IRQn, 1, 1);
    HAL_NVIC_EnableIRQ (TIM8_UP_TIM13_IRQn);
    HAL_NVIC_SetPriority (USART3_IRQn, 0, 1);
    HAL_NVIC_EnableIRQ (USART3_IRQn);
    HAL_NVIC_SetPriority (DMA1_Stream1_IRQn, 0, 0);
    HAL_NVIC_EnableIRQ (DMA1_Stream1_IRQn);
    // Enable peripheral clocks
    RCC->APB1ENR |= RCC_APB1ENR_TIM13EN;
    RCC->AHB1ENR |= RCC_AHB1ENR_DMA1EN;
    RCC->APB1ENR |= RCC_APB1ENR_USART3EN;
    // Initialize TIM13 to call the interrupt at 50kHz, which simulates some dummy load
    TIM13->PSC = 83;
    TIM13->DIER = TIM_DIER_UIE;
    TIM13->CR1 = 0;
    TIM13->SR = ~TIM_SR_UIF;
    TIM13->ARR = 19;
    TIM13->CR1 = TIM_CR1_URS;
    TIM13->EGR = TIM_EGR_UG;
    TIM13->CR1 = TIM_CR1_CEN;
    DBGMCU->APB1FZ |= DBGMCU_APB1_FZ_DBG_TIM13_STOP;
    // Initialize USART3 for reception
    USART3->BRR = 46; // 921600 Baud.
    USART3->CR1 = USART_CR1_UE | USART_CR1_RE | USART_CR1_IDLEIE; // Only enable IDLE interrupt
    while (1) {
        __WFI ();
    }
}

An example output is:

Application startup
Received 100 packets OK
Received 100 packets OK
Received 100 packets OK
Reception failed: NDTR = 0, LISR = 0x0
Received 100 packets OK
Received 100 packets OK
Received 100 packets OK
Reception failed: NDTR = 0, LISR = 0x0
Reception failed: NDTR = 0, LISR = 0x0
Received 100 packets OK
Received 100 packets OK

The output is different each time the code is run. The problem also occurs when I remove the (slow) printf statements. The whole thing seems to be quite erratic and elusive...

Does anyone have an idea as to what I am doing wrong, or maybe a workaround that still allows robust operation when the sensor and MCU are randomly hot-plugged?

Thank you very much in advance!

#interrupt #issue #stm32f4 #dma #usart

waclawek.jan · ‎2017-07-11

Posted on July 12, 2017 at 00:18

I don't have an explanation for the behaviour you are experiencing, but I wonder how come it doesn't choke forever in DMA1_Stream1_IRQHandler(), as you don't clear the interrupt-triggering flag there and this is the highest-priority interrupt...?

JW

niklas2 · ‎2017-07-11

Posted on July 12, 2017 at 01:29

Thanks for the reply! Good point, I don't know. Changing the ISR to

void DMA1_Stream1_IRQHandler (void) {
    if (DMA1->LISR & DMA_LISR_TCIF1) {
        state = 2;
        // Disable DMA stream properly
        dmaStream->CR = 0;
        while ((dmaStream->CR & DMA_SxCR_EN) != 0);
        // Clear Interrupt flags
        DMA1->LIFCR = DMA_LIFCR_CTCIF1 | DMA_LIFCR_CHTIF1 | DMA_LIFCR_CTEIF1 | DMA_LIFCR_CDMEIF1 | DMA_LIFCR_CFEIF1;
    }
}

doesn't change the behaviour, though.

I also tried removing the DMA ISR (and the DMA_SxCR_TCIE flag) altogether, and just checking for DMA_LISR_TCIF1 in the USART ISR. That doesn't help either - it works most of the time, but sometimes TCIF1 just stays at 0 even though NDTR is 0 as well.

waclawek.jan · ‎2017-07-17

Posted on July 17, 2017 at 14:54

Humm.

It would never occur to me to use the FIFO and then a transfer size that is not a multiple of the memory-side transfer width (you have 162 bytes, i.e. 2 outstanding bytes). Sounds much like a silicon bug, but to prove that it would need a cleaner and self-contained example...

Meantime you might want to ask for support through official channels - distri/FAE, web contact form.

JW

waclawek.jan · ‎2017-07-17

Posted on July 17, 2017 at 13:18

Good point, I don't know.

Then revert and find out. When a substantial question arises, changing code in hope of things getting resolved magically usually won't help - or, worse, results in 'works for me' kind of solutions.

If the code is not stuck in that ISR, then the ISRs don't have the intended priorities, or the ISR in question is not fired for some reason at all; or there is some other code with higher polling/nesting priority which clears/resets the DMA (a fault handler, perhaps, or some zealous debugger), or something else I can't guess.

Cut the code to the bare minimum; avoid libraries; check all relevant registers by reading back; use pin toggles and a logic analyzer to follow the actual code flow.

JW

niklas2 · ‎2017-07-17

Posted on July 17, 2017 at 13:29

As you can see from my example, there is no other code and there are no other ISRs. It happens with and without a debugger. I don't use any libraries besides the HAL, and I don't use that for the DMA. I tried it with a separate project without any libraries at all (only register-level accesses) - same result. The USART ISR - which has a lower priority than the DMA ISR - is triggered, so I don't think there is another ISR blocking the CPU. The fault handlers have an endless loop, and would therefore block the CPU forever, which doesn't happen. The code is already at the bare minimum, just the required initializations. I found out one thing though: disabling the DMA FIFO and using the direct mode seems to solve it - not a single error in a day. This however results in a higher bus load.
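For reference, a sketch of how the two configurations differ at register level. The bit positions follow the DMA_SxCR / DMA_SxFCR layout but are redefined locally with a made-up `SK_` prefix so that the fragment builds on a host; on the target, the CMSIS names from the first post (`DMA_SxCR_EN`, `DMA_SxFCR_DMDIS`, ...) would be used. The direct-mode values are an assumption about what "disabling the FIFO" amounts to, not the exact code that was tested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the CMSIS bit definitions (assumed positions). */
#define SK_DMA_SxCR_EN      (1u << 0)   /* stream enable */
#define SK_DMA_SxCR_TCIE    (1u << 4)   /* transfer-complete interrupt enable */
#define SK_DMA_SxCR_MINC    (1u << 10)  /* memory address increment */
#define SK_DMA_SxCR_MSIZE_1 (1u << 14)  /* MSIZE = 10b: 32-bit memory accesses */
#define SK_DMA_SxCR_PL_0    (1u << 16)  /* PL = 01b: medium priority */
#define SK_DMA_SxCR_CHSEL_2 (1u << 27)  /* CHSEL = 100b: channel 4 */
#define SK_DMA_SxFCR_DMDIS  (1u << 2)   /* direct mode disable (= FIFO on) */

typedef struct {
    uint32_t cr;   /* value for DMA_SxCR */
    uint32_t fcr;  /* value for DMA_SxFCR */
} sk_dma_config_t;

/* Builds the register values for USART reception with or without the FIFO. */
static sk_dma_config_t sk_rx_config(bool use_fifo) {
    sk_dma_config_t c;
    c.cr = SK_DMA_SxCR_CHSEL_2 | SK_DMA_SxCR_PL_0 | SK_DMA_SxCR_MINC
         | SK_DMA_SxCR_TCIE | SK_DMA_SxCR_EN;
    if (use_fifo) {
        c.cr |= SK_DMA_SxCR_MSIZE_1;  /* pack bytes into 32-bit memory writes */
        c.fcr = SK_DMA_SxFCR_DMDIS;   /* FIFO mode, threshold bits left at 0 */
    } else {
        c.fcr = 0;                    /* direct mode, byte-wide memory writes */
    }
    /* Word-wide memory writes are only requested together with the FIFO. */
    assert((c.cr & SK_DMA_SxCR_MSIZE_1) == 0 || (c.fcr & SK_DMA_SxFCR_DMDIS) != 0);
    return c;
}
```

In direct mode every received byte causes its own memory write, which is where the higher bus load comes from.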

niklas2 · ‎2017-07-17

Posted on July 17, 2017 at 21:54

Oh. You're right. According to p. 315 of the reference manual, transfer sizes must be a multiple of MSIZE. I configured my sensor to send 164 bytes instead of 162 and everything works fine even with FIFO enabled. Kind of evil that problems occur only rarely if that condition is not satisfied.
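A check along these lines could catch the mistake before the stream is enabled. This is a minimal sketch with hypothetical helper names; it assumes NDTR counts peripheral-side items, so the total byte count is NDTR times the peripheral item size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if `ndtr` peripheral-side items of `psize_bytes` each add up to a
 * whole number of memory-side items of `msize_bytes`. */
static bool dma_fifo_length_ok(uint32_t ndtr, uint32_t psize_bytes,
                               uint32_t msize_bytes) {
    assert(psize_bytes != 0 && msize_bytes != 0);
    return (ndtr * psize_bytes) % msize_bytes == 0;
}

/* Smallest byte count >= `bytes` that fills whole memory-side items. */
static uint32_t dma_fifo_pad_bytes(uint32_t bytes, uint32_t msize_bytes) {
    assert(msize_bytes != 0);
    return (bytes + msize_bytes - 1) / msize_bytes * msize_bytes;
}
```

With byte-wide peripheral items and word-wide memory items, 162 fails the check and 164 passes it, matching the packet length change described above.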

Anyway, thanks a lot, I wasted way too much time on this...

waclawek.jan · ‎2017-07-17

Posted on July 17, 2017 at 22:15

Please try to set the FIFO threshold to 1/2 (i.e. 2 words = 8 bytes), while still having 164 bytes (i.e. not-an-integer-multiple-of-8) to transfer.

I am willing to bet that the problem reoccurs.

JW

niklas2 · ‎2017-07-17

Posted on July 17, 2017 at 22:34

Not sure whether that is forbidden by the manual if no bursts are used. I can't try it out right now, maybe in a few days.

waclawek.jan · ‎2017-07-21

Posted on July 21, 2017 at 19:45

So I lost my bet... 🙂

To reproduce the original issue, I took an STM32F4DISCOVERY (http://www.st.com/content/st_com/en/products/evaluation-tools/product-evaluation-tools/mcu-eval-tools/stm32-mcu-eval-tools/stm32-mcu-discovery-kits/stm32f4discovery.html); took the bulk of your sources (rewrote the intro of course as I despise Cube 🙂 ); to simulate your transmitter I employed Tx of the same USART3, filled by DMA in the main loop, followed by a trivial loopdelay; shorted PC10 to PC11 on the DISCO by a jumper for loopback; removed printouts, added several pin toggles to follow the execution path.

(I also had to randomize somewhat the timer ISR, as for certain code length it got inadvertently synchronous with the USART/DMA process, hiding the problem - see below for the role of the timer ISR).

I can confirm your initial finding: the Rx DMA with 4N+k, k=1..3 bytes to transfer, switched-on FIFO and word-transfer on memory side, occasionally ends with NDTR = 0 and TCIF *not* set.

Further findings:

Normally, if TCIF *is* set, the DMA1_Stream1_IRQHandler() is fired and (in its original incarnation) continues to fire as it does not clear TCIF. That puzzled me: as it has the same nesting/group priority as the USART IRQHandler but has a lower number, given the interrupt-entry rules, how come the USART IRQHandler fires at all? I wrote a separate test to confirm that if you have two interrupts with the same priority active at the same time, and they never deactivate their trigger, they both fire in alternating fashion. If you add a third, then depending on whether it's higher on the list (lower number) or not, it is never fired or it replaces one of the former ISRs in the alternating pattern. I guess, during the tail-chaining process, the list of pending interrupts is polled *except* the active one. This is not a very practical but still interesting snippet of information which is IMO not that clear from the available Cortex-M documentation.
I confirm that if k=0 (i.e. fulfilling the requirement of the RM to have NDTR a multiple of the memory-side transfer width when the FIFO is used), TCIF is always set and everything behaves as expected.
The FIFO threshold has no influence on anything written here; the same good or bad behaviour is seen (although from sampling the FIFO level during transfer I see that the FIFO itself obeys the set threshold and empties itself when it is reached).
Regardless of whether TCIF is or is not set, all the transferred data are stored into the memory, and - as documented in the RM and discussed above - a whole word is stored into memory, with the outstanding bytes having the "old" value from the FIFO. For example, when transferring 16+2 bytes
01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18
the rx buffer in memory is written by 16+4 bytes
01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 03 04
Regardless of whether TCIF is or is not set, HTIF is *not* set at the transfer end if k=1..3; it is set properly if k=0. I wonder what exactly the mechanism behind this one is.
When TCIF is *not* set, SxCR.EN is *not* cleared!!! (while NDTR *is* zero). Manually clearing SxCR.EN in this case sets TCIF.
TCIF got *not* set only when the TIM13 ISR fell into a relatively narrow range closely after the USART transfer ended (see between A and B cursor in the picture below).

My assumption was that the problem is due to a collision on the APB1 bus, but I was wrong - I changed TIM13 for TIM11, which is on APB2, and the result was the same.
Moving the DMA buffer from SRAM1 to SRAM2 removed the issue. While the TIM ISR did not explicitly access SRAM, the stack was in SRAM1. The conclusion is thus that if (k > 0) && (DMA memory-side write is delayed because of SRAM AHB bus contention) then SxCR.EN does not get cleared and TCIF does not get set.
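The stale-byte observation from the 16+2 example can be reproduced with a toy host-side model. It assumes the FIFO behaves like a 16-byte circular buffer that is only ever drained in whole 32-bit words; that is a guess which happens to match the dump above, not a description of the silicon, and the function name is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FIFO_BYTES 16u /* 4 words */
#define WORD_BYTES 4u  /* memory-side transfer width */

/* Pushes n bytes from src through the toy FIFO into dst, one word at a time.
 * Returns the number of bytes written to dst (always a multiple of 4);
 * dst must have room for n rounded up to the next multiple of 4. */
static size_t toy_fifo_transfer(const uint8_t *src, size_t n, uint8_t *dst) {
    uint8_t fifo[FIFO_BYTES] = {0};
    size_t written = 0;
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; ++i) {
        fifo[i % FIFO_BYTES] = src[i];
        if ((i + 1) % WORD_BYTES == 0) {
            /* A full word is available: write it out to memory. */
            size_t word = (i % FIFO_BYTES) - (WORD_BYTES - 1);
            memcpy(dst + written, fifo + word, WORD_BYTES);
            written += WORD_BYTES;
        }
    }
    if (n % WORD_BYTES != 0) {
        /* Incomplete last word: flushed whole, including leftover bytes
         * from an earlier pass through the FIFO. */
        size_t word = ((n - 1) % FIFO_BYTES) / WORD_BYTES * WORD_BYTES;
        memcpy(dst + written, fifo + word, WORD_BYTES);
        written += WORD_BYTES;
    }
    return written;
}
```

Feeding it the 18 bytes 01..18 yields 20 bytes in the destination, with 03 04 following 17 18, as in the dump above. The model says nothing about why TCIF and SxCR.EN misbehave; it only illustrates where the extra bytes come from.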

As said above, this can't be considered a bug, as the RM requires MSIZE and NDTR to be chosen so that the last transfer is not incomplete. So why am I so obsessed with finding out the exact mechanism of failure here? Because there is little to no material reason for this constraint, and it indicates a lack of robustness in the DMA design.

I've encountered two different problems with the DMA in the recent past, which I believe are both genuine bugs and result from the same lack of robustness. Unfortunately, they appeared in rather complex code and I was unable - within a reasonable timeframe - to strip them down to a simple enough example to be exhibited here or anywhere else. And, I admit freely, I am not in the $$$$$-buyers category to deserve the level of attention bugs of this nature require.

That's why I am trying to poke through here... 😉

JW