stm32f7 usart and data corruption

Evan .1
Associate II

If I understand correctly, the STM32F7 USART/UART doesn't have a hardware FIFO implemented. So how do I prevent data corruption if the interrupts aren't handled in time? Timing then becomes critical.

I see there are a couple of examples which use DMA. But how do I use DMA with a big software FIFO, where a counter needs to be updated after every read/write cycle?

So what is the intended use for this?

Thanks


Interrupt service on UARTs shouldn't be a problem if done quickly. Problems will occur if you faff around needlessly processing data in the interrupt or callbacks, as you have a single byte time to work with. Data should be queued and processed in worker threads or tasks.

DMA can be used to create a larger ring buffer (i.e. circular DMA of enough size to manage the baud rate and maximum processing interval), which is periodically swept by the processing task. For continuous streams this can be triggered by the DMA HT and TC interrupts, where the half-buffer time constrains the processing task's runtime.
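Here's a minimal sketch of that circular-DMA receive scheme, assuming the ST HAL; huart2, the buffer size and queue_block() are made-up names you would adapt to your project:

/* Sketch: circular DMA reception with half/complete callbacks.
   Assumes a UART handle named huart2 and a DMA stream configured
   in circular mode (e.g. in CubeMX) -- adapt names to your project. */

#include "stm32f7xx_hal.h"

#define RX_BUF_SIZE 512
extern UART_HandleTypeDef huart2;        /* hypothetical handle from CubeMX */
static uint8_t rx_buf[RX_BUF_SIZE];

/* Called once at startup: arm the circular transfer. */
void uart_rx_start(void)
{
    HAL_UART_Receive_DMA(&huart2, rx_buf, RX_BUF_SIZE);
}

/* Hand a half-buffer to the worker task; keep this short. */
static void queue_block(const uint8_t *data, uint32_t len)
{
    /* e.g. copy into a message queue consumed by a processing task */
    (void)data; (void)len;
}

/* HAL weak callbacks: first half ready ... */
void HAL_UART_RxHalfCpltCallback(UART_HandleTypeDef *huart)
{
    if (huart == &huart2)
        queue_block(&rx_buf[0], RX_BUF_SIZE / 2);
}

/* ... second half ready, while the DMA keeps filling the first half. */
void HAL_UART_RxCpltCallback(UART_HandleTypeDef *huart)
{
    if (huart == &huart2)
        queue_block(&rx_buf[RX_BUF_SIZE / 2], RX_BUF_SIZE / 2);
}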

Evan .1
Associate II

Thanks for the reply. Please explain how I can guarantee that every interrupt is serviced in time under a very high interrupt load. "Should not happen" programming is not the code quality I'm after.

Can you please explain, or point to documentation, how to make a large DMA ring buffer that the application can add data to and that the DMA controller empties into the UART? I don't see how the interfacing is done between the application and the DMA controller.

Thanks.

The ring buffer would be predominantly for reception.

Transmit via DMA would be better achieved via a scatter-gather list, or chaining, as the data lengths are known ahead of time.

A ring buffer for transmit would have the head pointer managed in software, the span between tail and head dispatched from the DMA TC interrupt or the UART TXE interrupt, and odd bytes written directly to an idle UART.
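A sketch of that transmit ring, assuming the HAL transmit call; the names are made up, and real code would also guard tx_kick() against the race between the task and the interrupt (e.g. by masking the UART/DMA IRQ around it) and check for buffer overflow:

/* Sketch of a transmit ring: the application writes at the head; each
   DMA TC interrupt sends the next contiguous run between tail and head. */

#include "stm32f7xx_hal.h"

#define TX_BUF_SIZE 1024
extern UART_HandleTypeDef huart2;              /* hypothetical handle */
static uint8_t  tx_buf[TX_BUF_SIZE];
static volatile uint32_t tx_head, tx_tail;     /* indices into tx_buf */
static volatile uint32_t tx_busy;              /* a DMA transfer is in flight */
static volatile uint32_t tx_sending;           /* length of chunk in flight */

static void tx_kick(void)
{
    if (tx_busy || tx_head == tx_tail)
        return;
    uint32_t end = (tx_head > tx_tail) ? tx_head : TX_BUF_SIZE; /* stop at wrap */
    tx_sending = end - tx_tail;
    tx_busy = 1;
    HAL_UART_Transmit_DMA(&huart2, &tx_buf[tx_tail], (uint16_t)tx_sending);
}

/* Application side: copy data in at the head, then start DMA if idle.
   No overflow check here -- sketch only. */
void uart_tx_queue(const uint8_t *data, uint32_t len)
{
    for (uint32_t i = 0; i < len; i++) {
        tx_buf[tx_head] = data[i];
        tx_head = (tx_head + 1) % TX_BUF_SIZE;
    }
    tx_kick();
}

/* DMA transfer complete: advance the tail, send the next chunk if any. */
void HAL_UART_TxCpltCallback(UART_HandleTypeDef *huart)
{
    if (huart != &huart2)
        return;
    tx_tail = (tx_tail + tx_sending) % TX_BUF_SIZE;
    tx_busy = 0;
    tx_kick();
}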


The suggested solution does not solve the problem that only one element at a time is read from the hardware. If the interrupt load is high and servicing takes too long, the data is corrupted and overwritten.

berendi
Principal

An STM32F7 can run at 200 MHz. A typical UART interface receives data at 115200 bits/s, i.e. 11520 bytes/s. There are 17361 cycles available to read out a byte.

If the interrupt load is so high that it can delay the UART interrupt that much, then there is something seriously wrong with your system design.

Find out how you can optimize interrupt handling and what else you can delegate to timer channels or DMA, because the UART would not be the bottleneck.

A typical antipattern to look for is calling generalized interrupt handling functions from the instance-specific handlers provided by the hardware, which would then consult some handle structures to decide what to do. In other words, you would gain a lot more by not using HAL.
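For illustration, an instance-specific receive handler without a generic dispatch layer might look like the sketch below; it uses the CMSIS register names from stm32f7xx.h, and USART2, the FIFO size and the consumer function are arbitrary choices. It assumes RXNEIE and the NVIC line were enabled at init.

/* Sketch: the ISR only moves the byte into a software FIFO and clears
   errors; no generalized handler, no handle-structure lookups. */

#include "stm32f7xx.h"

#define FIFO_SIZE 256                            /* power of two */
static volatile uint8_t  fifo[FIFO_SIZE];
static volatile uint32_t fifo_wr, fifo_rd;

void USART2_IRQHandler(void)
{
    uint32_t isr = USART2->ISR;

    if (isr & USART_ISR_RXNE) {                  /* byte received */
        uint8_t b = (uint8_t)USART2->RDR;        /* reading RDR clears RXNE */
        uint32_t next = (fifo_wr + 1) & (FIFO_SIZE - 1);
        if (next != fifo_rd) {                   /* drop the byte on overflow */
            fifo[fifo_wr] = b;
            fifo_wr = next;
        }
    }

    if (isr & USART_ISR_ORE)                     /* overrun: clear and move on */
        USART2->ICR = USART_ICR_ORECF;
}

/* Consumer side (task/main loop): returns -1 when the FIFO is empty. */
int uart_fifo_get(void)
{
    if (fifo_rd == fifo_wr)
        return -1;
    uint8_t b = fifo[fifo_rd];
    fifo_rd = (fifo_rd + 1) & (FIFO_SIZE - 1);
    return b;
}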

I assume you never made something really complicated. Imagine: you have 2 SPI interfaces, 1 I2C, 2 CANopen implementations, Ethernet data transfer, 4 UARTs and the RTOS system clock. And all of them generate an interrupt at the same time. It doesn't happen a lot, but it does happen. I don't think 17361 cycles is enough to handle them all.

Well clearly if you've got your interrupt priorities and preemptions so broken you can't service anything properly, you've got a whole bunch of issues I can't solve for you.

DMA can reduce and decimate a significant amount of interrupt loading, and reduce servicing latency to a few dozen nanoseconds.

A buffer sufficiently deep can handle fluctuations in your ability to service it in a timely fashion.
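To put a rough number on "sufficiently deep" (the baud rate and sweep interval below are example figures, not project values):

/* Sizing rule of thumb: the buffer must absorb the worst-case gap
   between two sweeps of the processing task. */

#define BAUD              115200u
#define BYTES_PER_SEC     (BAUD / 10u)      /* 8N1: 10 bit times per byte */
#define MAX_SWEEP_MS      10u               /* worst-case processing latency */
#define MIN_RX_BUF        (BYTES_PER_SEC * MAX_SWEEP_MS / 1000u)   /* ~115 */
#define RX_BUF_SIZE       256u              /* round up, add margin */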

If buffer management consumes too many resources, and you can't service/consume the data, use flow control.

If you generate more data than you can move, use throttling.


If you know better, then why did you ask?

What you describe isn't all that complicated when coordinated with an RTOS. Sure, a lot happens, but either there's sufficient time to service interrupts or the hardware design is flawed. DMA is the answer to your question, but don't think in terms of a byte at a time. If you have a streaming input from the UARTs then set up circular buffers and process the incoming stream, as blocks not bytes, at half-transfer points generated by the DMA channel.
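One way to do that block-wise sweep, sketched with the HAL's DMA counter macro; huart2, rx_buf and process_block are assumed names for a circular RX setup like the one shown earlier in the thread:

/* Sketch of a sweep over a circular DMA RX buffer: work out how far the
   DMA has written (from NDTR) and consume everything new as one or two
   contiguous blocks. Call from the HT/TC callbacks or a periodic task. */

#include "stm32f7xx_hal.h"

#define RX_BUF_SIZE 512
extern UART_HandleTypeDef huart2;                /* assumed handle */
extern uint8_t rx_buf[RX_BUF_SIZE];              /* assumed circular buffer */

extern void process_block(const uint8_t *data, uint32_t len);

static uint32_t rx_rd;                           /* how far we have consumed */

void uart_rx_sweep(void)
{
    /* DMA counts NDTR down: write position = size - remaining. */
    uint32_t wr = RX_BUF_SIZE - __HAL_DMA_GET_COUNTER(huart2.hdmarx);

    if (wr == rx_rd)
        return;                                  /* nothing new */

    if (wr > rx_rd) {                            /* one contiguous block */
        process_block(&rx_buf[rx_rd], wr - rx_rd);
    } else {                                     /* wrapped: two blocks */
        process_block(&rx_buf[rx_rd], RX_BUF_SIZE - rx_rd);
        process_block(&rx_buf[0], wr);
    }
    rx_rd = wr;
}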

If you are careful in using the bus matrix, DMA occurs in parallel with application processing, right down to the bus cycle level. Transfer time is free since you can run the app tasks in parallel. If you cannot process the incoming stream fast enough to prevent overrun then you need a faster processor or you need to go with multiple cores.

Both SPIs use DMA buffers, two each, four channels in total since SPI transfers are bidirectional. I2C can use DMA if the data blocks are large, otherwise IRQ driven transfers are sufficient. CAN buses inherently buffer incoming messages, separating them into low and high priority FIFOs, so very little IRQ overhead there. Ethernet always uses DMA, and has built-in flow control if the PHY is properly configured. Four UARTs use DMA for output and either DMA or IRQ for input, depending on baud rate and data format (DMA for something like Modbus over RS-485, IRQ for a slow keyboard). If incoming UART data arrives too quickly, use CTS/RTS flow control to synchronize the transfers. And yes, it can all run with a preemptive RTOS base timer in the milliseconds range.

Even if all these sources generate an IRQ at the same time, it doesn't mean they all have to be serviced at once. The IRQ handlers post completion events to RTOS tasks; the tasks rely on priorities to ensure the peripherals are handled in the right order. Nest the IRQ levels, take advantage of tail-chaining to minimize IRQ latency, and plan the IRQ priority levels according to transfer rates.
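A sketch of that ISR-to-task handoff, assuming FreeRTOS since the thread doesn't name the RTOS; the function and handle names are placeholders:

/* The ISR does minimal work and defers everything else to a
   prioritized task via a binary semaphore. */

#include "FreeRTOS.h"
#include "semphr.h"

static SemaphoreHandle_t uart_evt;               /* created at init */

void uart_events_init(void)
{
    uart_evt = xSemaphoreCreateBinary();
}

/* Called from the UART/DMA ISR after queuing the data block. */
void uart_isr_notify(void)
{
    BaseType_t woken = pdFALSE;
    xSemaphoreGiveFromISR(uart_evt, &woken);
    portYIELD_FROM_ISR(woken);                   /* switch now if the task has higher priority */
}

/* The worker task blocks here and drains the buffers when woken. */
void uart_task(void *arg)
{
    (void)arg;
    for (;;) {
        if (xSemaphoreTake(uart_evt, portMAX_DELAY) == pdTRUE) {
            /* drain the ring buffer(s), parse frames, etc. */
        }
    }
}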

"Really complicated" is when you try to optimize 8 cores to best handle the data flow with minimal cache misses while throttling DMA transfers to prevent peripherals from being starved of access time through the bus matrix, and all the while dodging those nasty race conditions that show up.

Jack Peacock