STM32F2xx USART DMA read gets stuck (timeout), unable to recover?

TylerD1
Associate II

First time posting here; please bear with me if I make some rookie mistakes.

I am having a weird issue with USART and DMA reception on the STM32F217VG. There are two issues I am struggling with:

1) not able to find the reason why the DMA read times out in the first place (how can I eliminate the issue?)

2) not able to recover from the error once it happens (how can I fix it without having to reset the entire chip?)

Here are more details about my setup. The MCU reads data from another chip using USART and DMA. The length of incoming frames is not fixed, so each frame is read in two parts. First, the fixed-size header is read without DMA. Then, once the frame length is known, the remaining bytes are read using DMA.

I am using the HAL drivers and the code is relatively simple. It basically involves just two functions:

  • HAL_USART_TransmitReceive()
  • HAL_USART_Receive_DMA()

The first one is used to read the start of the frame (fixed size). DMA is then used to read the remaining bytes (about 100).
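Because the DMA length comes straight from the header, it may be worth validating it against the known bounds before arming the transfer. A minimal sketch, assuming a hypothetical 4-byte header whose bytes 2-3 hold the total frame length, little-endian (the real frame format is not given in the post); `remaining_dma_len()`, `HDR_LEN` and `MAX_FRAME_LEN` are made-up names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical frame layout, for illustration only:
 *   byte 0    : sync byte
 *   byte 1    : frame type
 *   bytes 2-3 : total frame length in bytes (header included), little-endian
 */
#define HDR_LEN        4u
#define MAX_FRAME_LEN  200u   /* all frames in the system are < 200 bytes */

/* On success stores how many bytes are still to be read by DMA and returns
 * true. Returns false if the length field is implausible, so the caller can
 * resynchronise instead of arming a DMA transfer of the wrong size.
 * A result of 0 means the header was the whole frame: skip the DMA read. */
bool remaining_dma_len(const uint8_t hdr[HDR_LEN], uint16_t *out_len)
{
    uint16_t total = (uint16_t)(hdr[2] | ((uint16_t)hdr[3] << 8));

    if (total < HDR_LEN || total >= MAX_FRAME_LEN) {
        return false;
    }
    *out_len = (uint16_t)(total - HDR_LEN);
    return true;
}
```

On target, a `true` result with a non-zero count would become the Size argument of HAL_USART_Receive_DMA(); a `false` result is a cue to resynchronise rather than start DMA.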

I have run lots of tests, and the system runs fine for tens of minutes (processing thousands of frames). However, after 30-60 minutes and tens of thousands of frames, the DMA eventually fails with a timeout. I am using a very large timeout (500 ms, while each frame typically takes only 2-3 ms).

The USART peripheral is used here as an SPI master. (All the "real" SPI peripherals are already in use, which is why I need to use the USART in SPI mode.)

This is the first mystery I'm not able to solve: how can the USART read time out in the first place? It is reading a fixed number of bytes from another MCU, and the bytes are clocked by the STM32. Even if the other MCU were totally dead, the read operation should never time out, right? (The read might return garbage, but a timeout does not make sense.)
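Because the STM32 generates the clock, the duration of a transfer is fixed by the baud rate and the byte count. A back-of-the-envelope check, assuming 8N1 framing (10 bit periods per character, since the USART keeps start and stop bits even in synchronous mode) and a hypothetical 1 Mbit/s clock (the actual rate is not stated); `transfer_time_us()` is a made-up helper:

```c
#include <assert.h>
#include <stdint.h>

/* Duration of n_bytes back-to-back USART characters, in microseconds,
 * rounded up. Assumes 1 start + 8 data + 1 stop = 10 bit periods per
 * character and no idle gaps between characters. baud must be non-zero. */
uint32_t transfer_time_us(uint32_t n_bytes, uint32_t baud)
{
    const uint32_t bits_per_char = 10u;
    /* 64-bit intermediate so n_bytes * 10 * 1e6 cannot overflow */
    uint64_t numerator = (uint64_t)n_bytes * bits_per_char * 1000000u;
    return (uint32_t)((numerator + baud - 1u) / baud);
}
```

With those assumptions the largest (200-byte) frame takes about 2 ms, so a 500 ms timeout leaves roughly a 250x margin. A timeout therefore suggests a transfer that never completes rather than one that is merely slow.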

I have verified that the frame length information is not corrupted. All the frames in the system are < 200 bytes.

Any ideas what could cause USART DMA read to time out?

The issue happens maybe once or twice per hour, and it is not critical to lose one or two frames. Since it seems hard to avoid completely, I have tried to recover from the situation instead. This is the second part of the question: how can I safely reset the USART and DMA after the timeout error happens?

This is my current attempt to reset the USART into working state and it is NOT working:

  1. kill the DMA operation that got stuck by calling HAL_USART_DMAStop()
  2. call HAL_USART_DeInit()
  3. call HAL_USART_Init() to re-initialize the USART

The subsequent USART reads still keep failing. The likely reason seems to be that HAL_USART_TransmitReceive() gets stuck waiting for USART_FLAG_TXE to go high. Possibly some TX data was left in the pipe when the previous DMA read failed, and therefore the TXE flag is stuck at '0'?

Not sure what to try next. I could try restoring the TXE to '1' manually somehow, but this all seems to be a big kludge already. I would really like to understand what is the actual root cause of the problem here.

I was hoping that there was some way to do a HW reset of the USART peripheral, but I could not find such an option in the reference manual.

Currently the only way to recover from this issue is to reset the entire chip. That is not an acceptable solution for the end product.

I'm grateful for any pointers you might have.


I'd guess what I wrote in https://community.st.com/s/question/0D53W00000NzkKxSAJ/stm32f303vc-usart-dma-stops-on-receive-error-how-do-i-recover-and-restart-rx applies here, too, even though you use the USART in synchronous mode. Note that the 'F2 is older than the 'F3 and thus does not have the OVRDIS and DDRE bits.
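One mechanism that could fit the symptoms (an assumption, not something established in this thread): in synchronous master mode the clock is produced by transmitting as many dummy bytes as are to be received, so if an overrun (ORE) drops a received byte, the RX DMA stream is left one byte short of its count and never signals completion. On the 'F2, the reference manual clears FE/NE/ORE with a read of USART_SR followed by a read of USART_DR. A minimal sketch of that sequence; `usart_clear_rx_errors()` and `USART_SR_ERR_MASK` are hypothetical names, and the register pointers are parameters so the function can also be exercised off-target:

```c
#include <assert.h>
#include <stdint.h>

/* USART_SR error bits on STM32F2: FE = bit 1, NE = bit 2, ORE = bit 3 */
#define USART_SR_ERR_MASK  0x0Eu

/* Clears FE/NE/ORE with the SR-then-DR read sequence and returns the
 * error bits that were set. The DR read discards whatever byte is
 * currently in the receive data register. */
uint32_t usart_clear_rx_errors(volatile uint32_t *sr, volatile uint32_t *dr)
{
    uint32_t errors = *sr & USART_SR_ERR_MASK;  /* step 1: read SR */
    (void)*dr;                                   /* step 2: read DR completes the clear */
    return errors;
}
```

On target this would be called with `&USARTx->SR` and `&USARTx->DR` (assuming your device header declares them as 32-bit registers), after the DMA streams have been stopped.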

> I was hoping that there was some way to do a HW reset of the USART peripheral but could not find such option in the reference manual.

You can reset any peripheral in the respective reset register in RCC, but IMO it's not needed here. You may be better off dropping Cube/HAL and doing things properly.

JW

Thanks for the fast reply! I already checked the post that you linked before posting the question, but I need to read it again more carefully.

I totally missed the option to reset peripherals via RCC block, this is a valuable tip for now. The project is in such a state that I need to get some patch working as soon as possible. It does not need to be pretty. If a brute force reset of the USART fixes the problem I'm happy with that.

Status update: I was able to get around the problem by performing a HW reset of the USART. Now the MCU keeps receiving frames after the DMA timeout happens, and only one frame is lost, which is not a big deal in the system I'm working on.
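The reset JW mentions is the per-peripheral reset bit in the RCC APB1RSTR/APB2RSTR registers (USART1 and USART6 sit on APB2; USART2, USART3, UART4 and UART5 on APB1). Below is one plausible recovery ordering, sketched as data so it can be checked off-target. The enum, `recovery_plan` and `usart_hw_recover()` are hypothetical; the HAL calls in the comments are the ones you would issue on target (replace `x` with your USART number). This is not the poster's actual code:

```c
#include <assert.h>
#include <stddef.h>

/* Steps of a brute-force USART recovery (hypothetical names). */
typedef enum {
    STEP_STOP_DMA,      /* HAL_USART_DMAStop(&husart): halt the RX and TX DMA streams      */
    STEP_FORCE_RESET,   /* __HAL_RCC_USARTx_FORCE_RESET(): registers back to reset values  */
    STEP_RELEASE_RESET, /* __HAL_RCC_USARTx_RELEASE_RESET(): else USART stays held in reset */
    STEP_DEINIT,        /* HAL_USART_DeInit(&husart): handle state back to RESET           */
    STEP_INIT,          /* HAL_USART_Init(&husart): MSP init, then reprogram the registers */
    STEP_COUNT
} recovery_step;

/* The order in which the steps are applied. */
static const recovery_step recovery_plan[] = {
    STEP_STOP_DMA, STEP_FORCE_RESET, STEP_RELEASE_RESET, STEP_DEINIT, STEP_INIT
};

/* Platform hook (hypothetical): on target, a switch that issues the HAL
 * call noted next to each enum value. */
typedef void (*recovery_hook)(recovery_step step);

/* Runs every step of recovery_plan, in order, through the hook. */
void usart_hw_recover(recovery_hook hook)
{
    for (size_t i = 0; i < sizeof recovery_plan / sizeof recovery_plan[0]; ++i) {
        hook(recovery_plan[i]);
    }
}
```

Running DeInit/Init after the RCC pulse is meant to bring the HAL handle (state, DMA links, GPIO setup) back in line with the freshly reset hardware.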

I will try to dig into the details at some point to find out the root cause of the issue, but this is good enough for now. The error is quite rare, it happens only once every 300k-500k frames (hours of runtime) which makes it a bit harder to systematically debug.

Piranha
Chief II

"Testing shows the presence, not the absence of bugs." /Edsger Dijkstra/

Note that such issues can happen because of this:

https://community.st.com/s/question/0D50X0000C5Tns8SQC/bug-stm32-hal-driver-lock-mechanism-is-not-interrupt-safe

And yes - that means that almost the whole HAL library is flawed and unreliable.
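The linked report is about `__HAL_LOCK()` being a plain check-then-set on the handle's `Lock` field (`if (Lock == HAL_LOCKED) return HAL_BUSY; Lock = HAL_LOCKED;`), so an interrupt landing between the check and the set can let two contexts both proceed. For comparison, a sketch of a try-lock that does the test and the set in one atomic step with C11 `atomic_flag`; `drv_lock` and its functions are hypothetical, and this is a swapped-in technique, not ST's code:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical driver lock: test and set happen in one atomic operation,
 * so an interrupt cannot slip in between them. */
typedef struct {
    atomic_flag busy;
} drv_lock;

#define DRV_LOCK_INIT { ATOMIC_FLAG_INIT }

/* Returns true if the caller acquired the lock, false if it was already held.
 * Never spins, so a false result simply means "busy, try later". */
bool drv_try_lock(drv_lock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->busy, memory_order_acquire);
}

void drv_unlock(drv_lock *l)
{
    atomic_flag_clear_explicit(&l->busy, memory_order_release);
}
```

On a Cortex-M3 such as the F2, GCC typically implements `atomic_flag_test_and_set` with exclusive load/store instructions. Because the try-lock never blocks, it should also be usable from an ISR.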

Hi,

I think I have the same problem as you and have no idea how to fix it. What do you mean by HW reset? De-initializing the USART, or initializing it again?