
Implementing Interrupt driven Stream UART Rx Handling with STM32CubeMX drivers

Kent Swan
Senior
Posted on January 09, 2018 at 17:38

The original post was too long to process during our migration. Please click on the attachment to read the original post.
22 REPLIES 22
Posted on February 03, 2018 at 03:14

A couple of things. First, byte-oriented ring buffers work wonders for transmit and receive of stream-oriented character data. I've implemented thread- and interrupt-safe ring buffers to handle streams. You really do not want to poll anything, especially if you're developing a low-power device. Additionally, you should be using interrupts and an RTOS. I've been using FreeRTOS with excellent results. Typically I'm doing something like this:

                          |--------msg event------->|

RxCpltCallback |==> RxRingBuffer ==>| Task |===>Send==(notbusy)=>TxHandler ============> | TxISR

                                                                        |===(busy)=====TxRingBuffer------>|TxICpltCallback---->|

Under FreeRTOS this is extremely efficient, as Rx and Tx run unassisted. Any required delays in the Task are implemented with osDelay(n), which puts the task to sleep and gives up any remaining task time to another ready-to-run task. Further, special conditions detected in the RxCplt or TxCplt callback can signal the appropriate tasks via a message queue or semaphore event, waking that task to do something.
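For anyone wanting something concrete, here is a minimal sketch of the kind of byte-oriented ring buffer described above. It is not the actual implementation from this post; the names (ring_buffer_t, rb_put, rb_get, RB_SIZE) are made up for illustration. It assumes exactly one producer (for example the Rx callback) and one consumer (the task) on a single-core MCU, so each index has a single writer. With more than one writer on either side you would need a critical section or a mutex.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RB_SIZE 64u   /* hypothetical capacity; must be a power of two */

typedef struct {
    volatile uint16_t head;     /* free-running, written only by the producer */
    volatile uint16_t tail;     /* free-running, written only by the consumer */
    uint8_t data[RB_SIZE];
} ring_buffer_t;

static void rb_init(ring_buffer_t *rb)
{
    assert(rb != NULL);
    rb->head = 0;
    rb->tail = 0;
}

/* Number of bytes currently stored (0..RB_SIZE). */
static size_t rb_count(const ring_buffer_t *rb)
{
    /* Unsigned 16-bit wraparound keeps this correct as the indices roll over. */
    return (uint16_t)(rb->head - rb->tail);
}

/* Producer side (e.g. called from the Rx callback). Returns false when full. */
static bool rb_put(ring_buffer_t *rb, uint8_t byte)
{
    if (rb_count(rb) == RB_SIZE) {
        return false;
    }
    rb->data[rb->head & (RB_SIZE - 1u)] = byte;
    rb->head = (uint16_t)(rb->head + 1u);   /* publish after the data is stored */
    return true;
}

/* Consumer side (e.g. called from the task). Returns false when empty. */
static bool rb_get(ring_buffer_t *rb, uint8_t *out)
{
    if (rb_count(rb) == 0u) {
        return false;
    }
    *out = rb->data[rb->tail & (RB_SIZE - 1u)];
    rb->tail = (uint16_t)(rb->tail + 1u);
    return true;
}
```

The free-running indices let the buffer use all RB_SIZE slots without a separate "full" flag. The task would typically block on a queue or semaphore signalled from the callback and then call rb_get() until it returns false.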

For transmit, if not busy I just send the entire message, which restarts the TX process. When busy, the characters to transmit go into the ring buffer, allowing the task to continue unimpeded. After that, each time the TxCplt callback occurs we simply transmit the pending characters, and so it goes until nothing's left and the TX channel shuts down.
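The transmit logic just described can be sketched roughly as follows. This is a host-side simulation, not the code from this post: hw_start_tx() is a hypothetical stand-in for starting an interrupt-driven transfer (on target it might wrap HAL_UART_Transmit_IT()), and uart_send(), uart_tx_complete_cb(), TXQ_SIZE and TX_CHUNK are names invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TXQ_SIZE 128u   /* hypothetical queue capacity, power of two */
#define TX_CHUNK 32u    /* hypothetical max bytes per hardware transfer */

/* Host-side stand-in for the UART: records what went out on the wire. */
static uint8_t wire[512];
static size_t  wire_len;

/* Hypothetical transfer start; on target this might wrap HAL_UART_Transmit_IT(). */
static void hw_start_tx(const uint8_t *p, size_t n)
{
    assert(wire_len + n <= sizeof wire);
    memcpy(&wire[wire_len], p, n);
    wire_len += n;
}

static uint8_t       txq[TXQ_SIZE];
static uint16_t      txq_head, txq_tail;   /* free-running indices */
static uint8_t       tx_chunk[TX_CHUNK];   /* staging buffer handed to the hardware */
static volatile bool tx_busy;

static size_t txq_count(void)
{
    return (uint16_t)(txq_head - txq_tail);
}

/* Called from the task. On target, guard this body with a critical section so
 * the completion interrupt cannot slip in between the busy check and the enqueue. */
static bool uart_send(const uint8_t *msg, size_t len)
{
    if (!tx_busy) {
        tx_busy = true;
        hw_start_tx(msg, len);   /* idle: send the whole message directly;
                                    msg must stay valid until the transfer ends */
        return true;
    }
    if (len > TXQ_SIZE - txq_count()) {
        return false;            /* not enough room in the queue */
    }
    for (size_t i = 0; i < len; i++) {
        txq[txq_head & (TXQ_SIZE - 1u)] = msg[i];
        txq_head = (uint16_t)(txq_head + 1u);
    }
    return true;
}

/* Called from the transmit-complete callback. */
static void uart_tx_complete_cb(void)
{
    size_t n = 0;
    while (n < TX_CHUNK && txq_count() > 0u) {
        tx_chunk[n++] = txq[txq_tail & (TXQ_SIZE - 1u)];
        txq_tail = (uint16_t)(txq_tail + 1u);
    }
    if (n == 0u) {
        tx_busy = false;         /* nothing pending: the TX channel goes idle */
    } else {
        hw_start_tx(tx_chunk, n);
    }
}
```

The task never waits on the UART: it either starts the transfer itself or drops the bytes into the queue and carries on, and the completion callback keeps the channel fed until the queue is empty.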

Also the above works quite nicely as a method of routing streams between tasks.  I often do this and signal the other task through a queue message. 

Posted on February 14, 2018 at 07:30

Could this receive-stream-to-a-ring-buffer approach work together with the TransmitDMA_IT call? The sample UART code reinitializes the UART handle before and after each packet send/receive. My suspicion is that, taking this approach, I'd need to handle the TX character by character manually in the main loop as well as the RX, right?

thanks!

Posted on February 14, 2018 at 07:51

TX is one of those things where you can generate data at far higher rates than you can output, bounded primarily by how large a buffer you want to work with.

My approach would be to commit DMA to whatever data you have available each time the last transfer completes. You might have to do a truncated transfer if the buffer wraps or spans, but for the most part you are not going to be sending singular characters. 
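A rough sketch of that idea, with made-up names (tx_fifo_t, tx_next_span, tx_span_done, TXBUF_SIZE): each time a transfer completes, ask the FIFO for the longest contiguous run of queued bytes and hand that to the DMA. If the data wraps past the end of the buffer, the first transfer is truncated at the end and the remainder goes out in the next one. On target, the span would presumably be passed to something like HAL_UART_Transmit_DMA() from the transmit-complete callback; that wiring is an assumption and is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TXBUF_SIZE 16u   /* hypothetical; small so the wrap is easy to see */

typedef struct {
    uint8_t data[TXBUF_SIZE];
    size_t  head;    /* next write index, 0..TXBUF_SIZE-1 */
    size_t  tail;    /* next read index, 0..TXBUF_SIZE-1 */
    size_t  count;   /* bytes queued */
} tx_fifo_t;

/* Queue bytes for transmission; returns how many actually fit. */
static size_t tx_write(tx_fifo_t *f, const uint8_t *src, size_t len)
{
    size_t written = 0;
    while (written < len && f->count < TXBUF_SIZE) {
        f->data[f->head] = src[written++];
        f->head = (f->head + 1u) % TXBUF_SIZE;
        f->count++;
    }
    return written;
}

/* Longest contiguous run of queued bytes: this is what one DMA transfer
 * can cover. It is cut short when the queued data wraps past the end. */
static size_t tx_next_span(const tx_fifo_t *f, const uint8_t **start)
{
    size_t to_end = TXBUF_SIZE - f->tail;
    *start = &f->data[f->tail];
    return (f->count < to_end) ? f->count : to_end;
}

/* Call when the DMA transfer of n bytes has completed. */
static void tx_span_done(tx_fifo_t *f, size_t n)
{
    assert(n <= f->count);
    f->tail = (f->tail + n) % TXBUF_SIZE;
    f->count -= n;
}
```

With 8 bytes queued starting at index 12 of a 16-byte buffer, the first span is 4 bytes (indices 12 to 15) and the second is the remaining 4 bytes starting at index 0, so the message goes out in two DMA transfers instead of eight single-character ones.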

Tips, Buy me a coffee, or three.. PayPal Venmo
Up vote any posts that you find helpful, it shows what's working..
Posted on February 14, 2018 at 18:20

I'm not trying to send huge amounts of data, maybe 50 bytes every 5ms.  But, I guess my real question is mixing the character by character receive with DMA transmits.  Below is code based on the UART DMA sample code UART_TwoBoards_ComDMA.  It calls UART_DeInit and UART_Init multiple times.   Once before the transmit and once before the DMA receive.  If I change to a manual character based RX into my own ring buffer, I'll take the 2nd DeInit/Init and the Receive_DMA call out of the function but I assume the first DeInit/Init calls prior to the Transmit_DMA will hammer my polling for the RX character.  Or do I need to implement 2 UARTs, one for RX and another for TX?

static void UART_Send(void)
{
    /*##-2- Start the transmission process #####################################*/
    /* While the UART in reception process, user can transmit data through 'aTxBuffer' buffer */
    if (HAL_UART_DeInit(&UartHandle) != HAL_OK)
    {
        Error_Handler();
    }
    if (HAL_UART_Init(&UartHandle) != HAL_OK)
    {
        Error_Handler();
    }

    UartReady = RESET;
    if (HAL_UART_Transmit_DMA(&UartHandle, (uint8_t*)&cmd_Response, cmd_Response.msg_len+5) != HAL_OK) // + 5 for 3 byte header and 2 byte CRC
    {
        Error_Handler();
    }

    /*##-3- Wait for the end of the transfer ###################################*/
    while (UartReady != SET) // this is set in the TX complete callback
    {
    }

    /* Reset transmission flag */
    UartReady = RESET;

    /*##-4- Put UART peripheral in reception process ###########################*/
    if (HAL_UART_DeInit(&UartHandle) != HAL_OK)
    {
        Error_Handler();
    }
    if (HAL_UART_Init(&UartHandle) != HAL_OK)
    {
        Error_Handler();
    }

    // leave in receive mode
    if (HAL_UART_Receive_DMA(&UartHandle, (uint8_t *)UART_aRxBuffer, sizeof(api_msg_t)) != HAL_OK)
    {
        Error_Handler();
    }
}
Posted on February 15, 2018 at 03:11

The hardware is capable of concurrent Rx/Tx operation, but sequential blocking code is not the way to achieve that. You'd want to manage dispatch in an interrupt or callback.

Posted on February 15, 2018 at 03:48

<< block transfer IS NOT the expected normal historical use form of a UART/USART async serial port. >>

On the contrary, it is the most common use of a serial port in the PLC and industrial world. Any type of RS-485 loop running Modbus is transferring framed blocks in half duplex, with line turnarounds. Since I don't use the HAL, I can verify there's nothing in the underlying STM32 hardware that prevents efficient, high-speed use of block transfers that are then streamed out of the buffers, other than app limits on being able to drain the buffer before it overflows.

If you look at circular DMA with the half transfer flag, it's ideal for streaming. A circular DMA buffer large enough for several messages (2 in Modbus, where line turns act as flow control), with the FIFO insert pointer updated at HT events, can easily stream the data out after it arrives from the remote host. Even wrapping the FIFO buffer is handled by the DMA hardware, and on cores with a good bus matrix and split SRAM the transfers are nearly free in terms of bus contention. The tradeoff is a larger buffer with greater latency at half transfer points vs. lower latency with smaller buffers but more DMA overhead (still far less than UART IRQs).
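A host-side sketch of that arrangement, under stated assumptions: the DMA writes into rx_dma_buf in circular mode, and at each half-transfer or transfer-complete event the handler reads the DMA's remaining-transfer counter (NDTR on F2/F4 streams, CNDTR on L0/L1/L4 channels) and passes it to rx_drain(). All names here (rx_dma_buf, rx_drain, rx_consume, RX_DMA_SIZE) are hypothetical, and rx_consume() stands in for whatever feeds the software FIFO or protocol stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RX_DMA_SIZE 8u   /* hypothetical; a real buffer would hold several messages */

static uint8_t rx_dma_buf[RX_DMA_SIZE];   /* target of the circular DMA */
static size_t  rx_rd;                     /* software read index into rx_dma_buf */

/* Hypothetical sink standing in for the software FIFO or protocol stack. */
static uint8_t sink[64];
static size_t  sink_len;

static void rx_consume(const uint8_t *p, size_t n)
{
    assert(sink_len + n <= sizeof sink);
    for (size_t i = 0; i < n; i++) {
        sink[sink_len++] = p[i];
    }
}

/* Call at half-transfer and transfer-complete events. dma_remaining is the
 * DMA's remaining-transfer count, which runs from RX_DMA_SIZE down to 1 and
 * then reloads in circular mode. */
static void rx_drain(size_t dma_remaining)
{
    /* Index the DMA will write next, i.e. one past the newest byte. */
    size_t wr = (RX_DMA_SIZE - dma_remaining) % RX_DMA_SIZE;

    if (wr >= rx_rd) {
        rx_consume(&rx_dma_buf[rx_rd], wr - rx_rd);
    } else {
        /* The DMA wrapped: take the tail of the buffer, then the start. */
        rx_consume(&rx_dma_buf[rx_rd], RX_DMA_SIZE - rx_rd);
        rx_consume(&rx_dma_buf[0], wr);
    }
    rx_rd = wr;
}
```

Note that wr == rx_rd is treated as "nothing new", so the consumer must run at least once per half buffer or a full wrap goes unnoticed. That is the latency versus buffer-size tradeoff mentioned above: larger buffers give more slack before an overrun, at the cost of data sitting longer before the next HT/TC event.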

I've implemented this across the F2/F4/L1/L4/L0 series in several designs using SPL calls and in-house device drivers (UART with block DMA, lightweight FIFO updated from DMA, custom Modbus stack, all tightly integrated into the RTOS), but FreeRTOS V10 also has much of this built in with the new interrupt streaming feature. Switching to USB and CDC reuses virtually all the code, replacing the larger UART DMA buffer with smaller CDC class ones.

If a tool doesn't meet your needs, don't blame the tool. Find another one or build it yourself. The HAL is there to entice AVR and PIC users, to make life easier at the lower end of the learning curve. Using it for complex commercial designs without understanding its limitations is a non-technical issue best left for another forum. The HAL does show promise if ST invests significant resources over the next 5 years or so, but for now I stay with the SPL and my own framework.

It's ironic to see so much activity around UART and DMA, since these kinds of communications issues were solved decades ago. Even the ancient Z80 4MHz 8-bit CPU and its support family could handle fairly large message multiplexing loads using DMA and UARTs. Anyone old enough to remember the DHV11 serial interface for PDP-11s, IBM 2780s and 3270s, or Ethernet based terminal servers from the 1970's? Same principle: block to buffer to stream from software FIFO.

  Jack Peacock

roger2
Associate
Posted on June 21, 2018 at 06:06

I've run into this as well. The modifications you talk about in your More Stable Tweak are undesirable to me because they'll affect any other projects I create from those reference files. But I just discovered that the stm32f7xx_it.c file generated in my project by STM32CubeMX contains the IRQHandler functions, and in them are user code sections:

void USART1_IRQHandler(void)
{
  /* USER CODE BEGIN USART1_IRQn 0 */
  /* USER CODE END USART1_IRQn 0 */
  HAL_UART_IRQHandler(&huart1);
  /* USER CODE BEGIN USART1_IRQn 1 */
  /* USER CODE END USART1_IRQn 1 */
}

This opens up the possibility of creating a streaming version of the HAL driver by writing my own ISR functions and completely bypassing the HAL driver's ISR function. If I get any success with that I'll update here.

Posted on June 26, 2018 at 13:01

You're missing a critical point. Block mode rx will not work for continuous streamed async data, because there is no inter-symbol timing gap that allows for the restart. Further, most modern silicon, when programmed for block mode, resets the internal serial rx registers, thus compromising the data being received and resulting in data loss or framing errors.

You bring up knowledge dating back to the 70's. As an electrical engineer with a full minor in software, I have been designing systems and electronics and programming complex communications and real-time systems for over 40 years, starting with the 9008, 8085, Z80, 6502, Digital Equipment PDP8, PDP16, VAX and on and on. I've been a research scientist for digital communications, involved with high speed sync and async data, compression technologies, net communications, and much more.

I've found that the CubeMX/HAL libraries have been reasonably useful in speeding development. Even though their overall efficiency is much lower than I would like, they are generally 'good enough'. Now to your comment, 'If a tool doesn't meet your needs, don't blame the tool. Find another one or build it yourself.' That is, if you noticed, exactly what I have done in this case, using the concept of extending a working environment rather than throwing the whole thing out and coding my own replacement.

Do I ever do what you've espoused? Of course I do, but only when the supplied implementations are defective, don't meet reasonable coding standards, or don't cover the capabilities or performance required. One of the first things I do when bringing up a new subsystem is to evaluate and test the actual performance against the requirement. That's what happened in this case, and I found I could adapt the existing UART drivers, as ST's driver team simply 'forgot' to include this required operational mode.

Thom Cousins
Associate

Hi Kent,

Great guide! Really helped me out.

I think you've forgotten to mention that in HAL_UART_Receive_IT() you need to take out the (Size == 0U) check in the 2nd if statement, as you use RxXferCount to index the buffer, whereas the original function increments the buffer pointer and decrements RxXferCount. It might also be worth mentioning that HAL_UART_Receive_IT must be called with size=0.

Apart from those things catching me out a bit your implementation worked perfectly!

Thanks again

Kent Swan
Senior

Thanks for the feedback. Glad that I could help out, but it still annoys me when a common modality is not selectable with the generative IDEs. Oh well. My major STM32 tasking at the moment is multiple I2S channels running DMA, fully synchronized with in and out audio streams, with some simple DSP processes on the queued packet buffers internally.