STM32H7 UART DMA TX/RX issues

GStee.2 · ‎2022-10-19

We are currently seeing issues while trying to get UART communication working: we have two STM32H7 (STM32H753) boards connected via RS-232. Board 1 sends 4 kB of data each second, and board 2 receives it.

Tx on board 1 is done through DMA from a 4 kB buffer. Rx on board 2 is done via a 256-byte DMA buffer, so receiving the full 4 kB sent from board 1 takes 32 interrupts (16 half-transfer and 16 transfer-complete).
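The interrupt count above can be checked with a small host-side sketch. It assumes a circular Rx DMA buffer that raises one half-transfer and one transfer-complete interrupt per pass; the function name is hypothetical and not taken from the original project.

```c
#include <stdint.h>

/* Number of DMA Rx interrupts needed to receive total_bytes when each pass
 * through a circular buffer of buf_bytes raises one half-transfer (HT) and
 * one transfer-complete (TC) interrupt.
 * Assumption: total_bytes is a multiple of buf_bytes. */
static uint32_t rx_dma_irq_count(uint32_t total_bytes, uint32_t buf_bytes)
{
    if (buf_bytes == 0u) {
        return 0u; /* avoid dividing by zero on a misconfigured size */
    }
    uint32_t passes = total_bytes / buf_bytes; /* full passes through the buffer */
    return passes * 2u;                        /* HT + TC per pass */
}
```

With the sizes from this post, a 256-byte buffer gives 32 interrupts per 4 kB block, and a 512-byte buffer would halve that to 16.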

In this configuration we are able to receive all data properly. So far so good (we have run this test at two speeds: 921600 bit/s and 460800 bit/s).

In order to lower the number of interrupts we then increased the DMA Rx buffer from 256 bytes to 512 bytes. From that point on it goes wrong: on the oscilloscope we see that board 1 is sending data, but we don't see the data coming in on board 2.

We did take into account the information in the knowledge base article "DMA is not working on STM32H7 devices" by aligning the Tx buffer on a 4 kB boundary and the Rx buffer on a 512-byte boundary. Furthermore, we disabled the I/D caches for this test.
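For reference, buffer alignment of this kind might be declared as in the sketch below. The buffer names, the GCC/Clang attribute syntax, and the helper `is_aligned` are assumptions, not the poster's actual code. Placing the buffers in a DMA-accessible RAM region would additionally need a linker-script section, which is not shown here.

```c
#include <stdint.h>

#define TX_BUF_SIZE 4096u
#define RX_BUF_SIZE 512u

/* GCC/Clang attribute syntax (assumed toolchain); other compilers use a
 * different spelling. Each buffer is aligned to its own size. */
static uint8_t tx_buf[TX_BUF_SIZE] __attribute__((aligned(4096)));
static uint8_t rx_buf[RX_BUF_SIZE] __attribute__((aligned(512)));

/* Returns 1 if p sits on a multiple of boundary, 0 otherwise. */
static int is_aligned(const void *p, uintptr_t boundary)
{
    return ((uintptr_t)p % boundary) == 0u;
}
```

A startup check such as `is_aligned(rx_buf, RX_BUF_SIZE)` is a cheap way to confirm the linker honoured the request.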

Important remark: when we set a breakpoint in the HT/TC Rx interrupt callback, we do stop in the callback, and when we continue from there we receive data for a very short while, but then it stops again (until we re-apply the breakpoint).

Any idea what could be causing this behavior? Please note that we are not using the FIFO mode of the UARTs.

Abdelhamid GHITH · ‎2022-11-04

@IIvan.22 "The issue (MCU freeze)": did you confirm it is a HW freeze? From my side, I think it can also result from firmware issues (the CPU no longer responding to other interrupts while in a critical section, or while a high-priority interrupt is active). Are the HardFault and NMI handlers instrumented?

>> Having simple code that reproduces the behavior can help to confirm the root cause.

For further investigation:

Do you have debug access? If not, could you add a DMA channel triggered by an external event (EXTI) to your code, to dump critical memory regions that might help identify what is going on?

Regarding devices with revision ID code 0x1003, the bit 20 workaround is not needed.

Is the issue seen only with revision ID code 0x1003?

IIvan.22 · ‎2022-11-04

Abdelhamid,

Yes: NMI, HardFault, MemManage, BusFault and UsageFault are configured to send a message over SWDIO; but it never happens.

Debug info: I got a few registers by reading directly over SWDIO when the MCU freezes:

PC: 0x00000298; it is our BusFaultHandler ISR starting address.

LR: 0xfffffffd; meaning it is in the interrupt handler.

SP: 0x20020000; is equal to the top of MSP, not PSP.

xPSR: 0x81000005; it means BusFault exception.

CPU state: was halted, but not by debugger.

Other device revisions: not known / not available.
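The xPSR and LR values quoted above can be decoded with a short host-side sketch. The function names are hypothetical; the bit positions follow the Armv7-M exception model (exception number in xPSR bits [8:0], EXC_RETURN bit 2 selecting the stack and bit 3 the return mode).

```c
#include <stdint.h>

/* IPSR field: active exception number, bits [8:0] of xPSR. */
static uint32_t xpsr_exception_number(uint32_t xpsr)
{
    return xpsr & 0x1FFu;
}

/* Names for the system exceptions mentioned in this thread. */
static const char *exception_name(uint32_t n)
{
    switch (n) {
    case 0u: return "Thread mode";
    case 2u: return "NMI";
    case 3u: return "HardFault";
    case 4u: return "MemManage";
    case 5u: return "BusFault";
    case 6u: return "UsageFault";
    default: return "other";
    }
}

/* EXC_RETURN bit 2: 1 = exception frame was stacked on PSP, 0 = on MSP. */
static int exc_return_uses_psp(uint32_t lr)
{
    return (int)((lr >> 2) & 1u);
}

/* EXC_RETURN bit 3: 1 = return to Thread mode, 0 = return to Handler mode. */
static int exc_return_to_thread(uint32_t lr)
{
    return (int)((lr >> 3) & 1u);
}
```

For xPSR 0x81000005 this yields exception number 5 (BusFault). For LR 0xFFFFFFFD both bits are set, which suggests the fault interrupted Thread-mode code running on PSP, so the stacked frame would be found at PSP rather than MSP.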

Abdelhamid GHITH · ‎2022-11-06

Interesting!

Is it possible to check whether the BusFault exception is precise?

>> BusFault Status Register is at 0xE000ED29
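As a rough sketch of that check, the BFSR byte can be tested bit by bit. The macro and function names are hypothetical; the bit positions are those of the Armv7-M BusFault Status Register. The sample values in the usage note are made up for illustration and are not readings from the device discussed here.

```c
#include <stdint.h>

/* BusFault Status Register bits (byte at 0xE000ED29 inside CFSR). */
#define BFSR_IBUSERR     (1u << 0) /* instruction bus error */
#define BFSR_PRECISERR   (1u << 1) /* precise data bus error */
#define BFSR_IMPRECISERR (1u << 2) /* imprecise data bus error */
#define BFSR_UNSTKERR    (1u << 3) /* fault on exception return unstacking */
#define BFSR_STKERR      (1u << 4) /* fault on exception entry stacking */
#define BFSR_LSPERR      (1u << 5) /* fault during lazy FP state preservation */
#define BFSR_BFARVALID   (1u << 7) /* BFAR holds the faulting address */

/* On target the byte would be read as:
 *   uint8_t bfsr = *(volatile uint8_t *)0xE000ED29u;
 * Here it is passed in so the decoding can be tried on a host. */
static int busfault_is_precise(uint8_t bfsr)
{
    return (bfsr & BFSR_PRECISERR) != 0u;
}

static int busfault_is_imprecise(uint8_t bfsr)
{
    return (bfsr & BFSR_IMPRECISERR) != 0u;
}

static int busfault_address_valid(uint8_t bfsr)
{
    return (bfsr & BFSR_BFARVALID) != 0u;
}
```

A made-up reading of 0x82 would decode as a precise fault with a valid BFAR, while 0x04 would decode as an imprecise fault, in which case the stacked PC may not point at the instruction that caused it.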

From the stack frame you can try to identify the last PC position before the exception.

>> Arm Keil AN209 might help.
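A minimal sketch of reading the stacked PC, assuming the basic eight-word exception frame (no floating-point context) that Armv7-M pushes on exception entry. The enum and function names are hypothetical, and the frame contents in the usage note are placeholders, not data from the failing board.

```c
#include <stdint.h>

/* Word offsets inside the basic exception stack frame, lowest address first. */
enum {
    FRAME_R0 = 0,
    FRAME_R1,
    FRAME_R2,
    FRAME_R3,
    FRAME_R12,
    FRAME_LR,   /* LR of the interrupted code */
    FRAME_PC,   /* return address: where the interrupted code would resume */
    FRAME_XPSR,
    FRAME_WORDS
};

/* frame points at the stacked R0, i.e. the value of MSP or PSP (whichever
 * EXC_RETURN selects) at exception entry. */
static uint32_t stacked_pc(const uint32_t *frame)
{
    return frame[FRAME_PC];
}

static uint32_t stacked_lr(const uint32_t *frame)
{
    return frame[FRAME_LR];
}
```

Given a placeholder frame `{0, 1, 2, 3, 12, 0x08001235, 0x08001000, 0x01000000}`, `stacked_pc` returns 0x08001000. For an imprecise BusFault this address is only near the faulting access, not necessarily on it.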

IIvan.22 · ‎2022-11-08

GStee.2,

If your D-Cache is disabled (bit 16 in CCR is zero, as it is after reset), make sure you do not call any cache maintenance functions such as SCB_InvalidateDCache_by_Addr or SCB_CleanDCache_by_Addr: they will still affect memory, but now in an unpredictable way.

Or better, initialize and enable all caches as described in PM0253, section 4.8.5 "Initializing and enabling the L1-cache", page 242. At least this resolved our case: all high-load tests now run for days without issues. With the D-Cache disabled and the maintenance calls removed, it also works fine.
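One way to make such calls conditional is sketched below, as a host-testable helper rather than actual target code. It checks the DC bit (bit 16) of a CCR value and rounds a buffer range out to whole 32-byte Cortex-M7 cache lines. The type and function names are hypothetical; on target the CCR value would come from `SCB->CCR`, and the rounded range would be what gets passed to the CMSIS maintenance function.

```c
#include <stdint.h>

#define CCR_DC_BIT  16u /* D-cache enable bit in CCR */
#define DCACHE_LINE 32u /* Cortex-M7 L1 data cache line size in bytes */

/* Returns 1 if the D-cache enable bit is set in the given CCR value. */
static int dcache_enabled(uint32_t ccr)
{
    return (int)((ccr >> CCR_DC_BIT) & 1u);
}

typedef struct {
    uintptr_t start; /* rounded down to a cache line boundary */
    uint32_t  size;  /* multiple of DCACHE_LINE covering the request */
} cache_range_t;

/* Expands [addr, addr + size) to whole cache lines. Maintenance by address
 * works on full lines, so unrelated data sharing a line is also affected. */
static cache_range_t cache_line_range(uintptr_t addr, uint32_t size)
{
    const uintptr_t mask = (uintptr_t)(DCACHE_LINE - 1u);
    cache_range_t r;
    uintptr_t end = (addr + size + mask) & ~mask;
    r.start = addr & ~mask;
    r.size = (uint32_t)(end - r.start);
    return r;
}
```

A buffer that is already 32-byte aligned with a size that is a multiple of 32, such as the 512-byte Rx buffer in this thread, comes back unchanged, which is one reason to align DMA buffers to the cache line size.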

Piranha · ‎2022-11-08

If enabling the cache "solves" some issue, then one can be sure that the issue is still there, but the cache just hides it.

> If your D-Cache is disabled ..., make sure you do not call any cache maintenance functions

At least with the current CMSIS version (5.9.0) and otherwise correct code, all SCB_***() functions work properly even with disabled cache. I even tried disabling/enabling D-cache at runtime in a firmware similar to this demo and everything continued working perfectly without interruption.