
CubeMX / UART transmit DMA question


I am using an STM32L476VG, which does not have UART transmit/receive FIFOs.

Therefore I am using DMA for periodic transmission of some data (3 bytes at 125 kbit/s, about 2000 times per second).

This is all configured in CubeMX and working (DMA in normal mode, by the way).

I am not sure this is set up as efficiently as it could be, because for each transmission the following sequence of events occurs:

- initiate transmission using the HAL_UART_Transmit_DMA function

1) DMA interrupt, calling the HAL_UART_TxHalfCpltCallback

2) DMA interrupt, calling UART_DMATransmitCplt - this enables the UART transmit complete interrupt

3) UART transmit complete (TC) interrupt, calling HAL_UART_TxCpltCallback

So the hardware is generating and handling 3 interrupts for every transmit DMA request.

Simply using the UART transmit interrupt would cost the same or less processor overhead (since only 3 bytes are transmitted).

I think it should be possible to transmit the 3 bytes using only a single interrupt (DMA TX done).

The things I would like to know:

1) How can I disable the half-transfer interrupt (in a CubeMX compatible way)?

2) Is the above the most efficient way of doing things if I want to use DMA?

One solution would be to set DMA to circular mode and perform the UART_EndTransmit_IT stuff from HAL_UART_TxCpltCallback.

This feels more like a hack, but it is probably how it will be implemented if it works.




If you *know* (from the timings in your program) that, after starting a DMA->UART_Tx transfer, there is enough time before the next one starts for all the data to go out safely, you don't need any signalling of transmit completion at all, i.e. no interrupt is needed.

Simply don't enable them.
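To put numbers on that for the figures in the original post, here is a small host-side sketch of the timing check. It assumes 8N1 framing (10 bits on the wire per byte), which the post does not state, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Time one frame occupies the wire, in microseconds, rounded up.
 * bits_per_byte is an assumption: 10 = 1 start + 8 data + 1 stop (8N1). */
static uint32_t frame_time_us(uint32_t n_bytes, uint32_t bits_per_byte, uint32_t baud)
{
    assert(baud > 0);
    uint64_t bits = (uint64_t)n_bytes * bits_per_byte;
    return (uint32_t)((bits * 1000000ULL + baud - 1u) / baud);
}

/* Interval between two transmission starts, in microseconds. */
static uint32_t period_us(uint32_t rate_hz)
{
    assert(rate_hz > 0);
    return 1000000u / rate_hz;
}

/* Non-zero if a frame is completely sent before the next one is started. */
static int frame_fits_in_period(uint32_t n_bytes, uint32_t bits_per_byte,
                                uint32_t baud, uint32_t rate_hz)
{
    return frame_time_us(n_bytes, bits_per_byte, baud) < period_us(rate_hz);
}
```

With those assumptions, 3 bytes at 125 kbit/s take 240 us and the period at 2 kHz is 500 us, so a frame has finished well before the next one is due. The "about" in "about 2 kHz" is the part to check on the real system.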

> in a CubeMx compatible way

I don't know, I don't use Cube/CubeMX. There may be none. Cube/CubeMX is there to give you a point-and-click environment, not to make efficient programs.



Yes, all quite a circus for three bytes.

I would personally have a deeper buffer to allow significantly more data to accumulate, then at each DMA TC for the last transfer I'd check the available data and send that, managing a wrap of the buffer if necessary.

Dispense with the HAL/Cube implementation of this, it is not worth fighting, and just code a simple routine to light off the DMA, and catch the DMA TC IRQ.
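A rough sketch of that buffering scheme, written so it runs on a host: `dma_start` here is only a stand-in that records the request, where real code would program the DMA channel registers. All names and the buffer size are assumptions for illustration, and on the target the shared indices would need protecting against the IRQ (e.g. a short critical section in `tx_write`).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TX_BUF_SIZE 64u

static uint8_t tx_buf[TX_BUF_SIZE];
static size_t  tx_head;     /* next free slot, written by the application  */
static size_t  tx_tail;     /* oldest byte whose transfer has not finished */
static size_t  tx_inflight; /* length of the running transfer, 0 when idle */

/* Stand-in for "program the DMA channel and enable it" (hypothetical). */
static const uint8_t *last_src;
static size_t         last_len;
static void dma_start(const uint8_t *src, size_t len)
{
    last_src = src;
    last_len = len;
}

/* Start a transfer for the next contiguous run of queued bytes, if idle. */
static void tx_kick(void)
{
    if (tx_inflight != 0 || tx_head == tx_tail)
        return;
    /* Stop at the end of the array; the wrapped part goes out next time. */
    size_t run = (tx_head > tx_tail) ? tx_head - tx_tail
                                     : TX_BUF_SIZE - tx_tail;
    tx_inflight = run;
    dma_start(&tx_buf[tx_tail], run);
}

/* Queue bytes for sending; returns how many were accepted. */
static size_t tx_write(const uint8_t *data, size_t len)
{
    size_t n = 0;
    while (n < len) {
        size_t next = (tx_head + 1u) % TX_BUF_SIZE;
        if (next == tx_tail)
            break;              /* full: one slot is always kept empty */
        tx_buf[tx_head] = data[n++];
        tx_head = next;
    }
    tx_kick();
    return n;
}

/* Call from the DMA transfer-complete IRQ: the only interrupt needed. */
static void tx_dma_complete(void)
{
    assert(tx_inflight > 0);
    tx_tail = (tx_tail + tx_inflight) % TX_BUF_SIZE;
    tx_inflight = 0;
    tx_kick();
}
```

The wrap is handled by sending the tail end of the array first and letting the next TC interrupt pick up the part that wrapped to the start, so each transfer is always one contiguous block.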

For RX I use a circular 16-bit wide array of adequate depth, periodically harvesting the data, and marking those words I've extracted already.
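One way to read the RX scheme, as a host-runnable sketch: the DMA stores each received byte as a 16-bit word, so the upper bits are zero for fresh data, and the harvesting code overwrites every word it has consumed with a marker value the UART can never produce. The marker value, depth, names and the `fake_dma_receive` stand-in are all assumptions, not the poster's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RX_DEPTH 32u
#define RX_EMPTY 0xFFFFu /* marker; a UART data register of 9 bits or fewer cannot yield it */

static volatile uint16_t rx_ring[RX_DEPTH]; /* target of the circular DMA */
static size_t rx_rd;                        /* next word to harvest       */

/* Mark every slot as "nothing here yet" before the DMA is started. */
static void rx_init(void)
{
    for (size_t i = 0; i < RX_DEPTH; i++)
        rx_ring[i] = RX_EMPTY;
    rx_rd = 0;
}

/* Copy out whatever has arrived since the last call; returns byte count.
 * Call periodically, often enough that the DMA never laps the reader. */
static size_t rx_harvest(uint8_t *out, size_t max)
{
    assert(out != NULL);
    size_t n = 0;
    while (n < max) {
        uint16_t w = rx_ring[rx_rd];
        if (w == RX_EMPTY)
            break;                   /* caught up with the DMA */
        out[n++] = (uint8_t)w;
        rx_ring[rx_rd] = RX_EMPTY;   /* mark as extracted */
        rx_rd = (rx_rd + 1u) % RX_DEPTH;
    }
    return n;
}

/* Host-side stand-in for the DMA writing one received byte (hypothetical). */
static size_t fake_dma_wr;
static void fake_dma_receive(uint8_t byte)
{
    rx_ring[fake_dma_wr] = byte;     /* upper 8 bits are zero, as from DMA */
    fake_dma_wr = (fake_dma_wr + 1u) % RX_DEPTH;
}
```

Because a 0x00 data byte is stored as 0x0000, it is still distinguishable from the marker, and no RX interrupt or DMA counter read is needed to find out how much data is waiting.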


Hello, thanks for your answers, this is what I feared - limited by the codegen tool.

@Community member, this is how I would configure things, since timing is hard, but unfortunately the tool is a requirement in this project.

@Community member, the 'protocol' is fixed so I cannot change the timing and, as said above, I am stuck with the tool, so it will probably become circular mode as described in the original post.

Thanks again,


If you (or whoever sets the requirements) insist on using an excavator where a shovel would suffice, the trench will inevitably be wide.


Bob S

The primary question is "does it work?" If so, is the interrupt overhead starving other duties that the CPU needs to handle? If not, let it be. Yes, it is inefficient. But if you have CPU cycles to spare, it doesn't really matter.

Premature optimization is the root of all evil - Donald Knuth