stm32f429 USART DMA FIFO error on transmit upon completion

Peeters.Bram · ‎2024-02-09

Hi,

I am using DMA1 stream 3 for USART3 TX on an STM32F429.

The driver code is based on the CubeMX-generated ST HAL drivers, firmware package version 1.24.

The UART and DMA initialisation parameters are as follows:

huart3.Instance = USART3;
huart3.Init.BaudRate = 921600;
huart3.Init.WordLength = UART_WORDLENGTH_8B;
huart3.Init.StopBits = UART_STOPBITS_1;
huart3.Init.Parity = UART_PARITY_NONE;
huart3.Init.Mode = UART_MODE_TX_RX;
huart3.Init.HwFlowCtl = UART_HWCONTROL_NONE;
huart3.Init.OverSampling = UART_OVERSAMPLING_16;

hdma_usart3_tx.Instance = DMA1_Stream3;
hdma_usart3_tx.Init.Channel = DMA_CHANNEL_4;
hdma_usart3_tx.Init.Direction = DMA_MEMORY_TO_PERIPH;
hdma_usart3_tx.Init.PeriphInc = DMA_PINC_DISABLE;
hdma_usart3_tx.Init.MemInc = DMA_MINC_ENABLE;
hdma_usart3_tx.Init.PeriphDataAlignment = DMA_PDATAALIGN_BYTE;
hdma_usart3_tx.Init.MemDataAlignment = DMA_MDATAALIGN_BYTE;
hdma_usart3_tx.Init.Mode = DMA_NORMAL;
hdma_usart3_tx.Init.Priority = DMA_PRIORITY_LOW;
hdma_usart3_tx.Init.FIFOMode = DMA_FIFOMODE_DISABLE;

I use HAL_UART_Transmit_DMA(...) to transmit buffers (always from the same static buffer, to which I first copy the data to be transmitted).

There is a semaphore before that to make sure there is only 1 action at a time till it is complete.

Now for some reason, I always get a FIFO error (LISR.FEIF3) at the exact same spot, 100% reproducible (and on multiple boards).
It is not the first message, more like the 30th.
And the message itself is transmitted correctly and completely.
This is confirmed by the DMA registers: LISR.TCIF3 is set to 1 and S3NDTR.NDT is 0.
These and other DMA registers, captured at the moment the interrupt occurs, are in the attached IAR screenshot.

If I look at the datasheet I get as possible reasons for the FIFO error:
• FIFO error: the FIFO error interrupt flag (FEIFx) is set if:
– A FIFO underrun condition is detected
– A FIFO overrun condition is detected (no detection in memory-to-memory mode
because requests and transfers are internally managed by the DMA)
– The stream is enabled while the FIFO threshold level is not compatible with the
size of the memory burst (refer to Table 48: FIFO threshold configurations)

+

In direct mode, the FIFO error flag can also be set under the following conditions:
• In the peripheral-to-memory mode, the FIFO can be saturated (overrun) if the memory
bus is not granted for several peripheral requests
• In the memory-to-peripheral mode, an underrun condition may occur if the memory bus
has not been granted before a peripheral request occurs

Since I am in direct mode it cannot be because of the threshold.
So if the fault is valid it has to be an underrun, but then I would expect NDT to be non-zero, indicating at which point the DMA encountered the underrun.

Am I overlooking something, or am I bumping into some DMA controller bug which causes the occasional spurious FIFO fault? It is strange, though, that it is not random but occurs at a fixed point in my flow.
For now I plan to modify the interrupt handler to ignore the error if NDT is 0, and cross my fingers that it only happens for complete transfers.
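
The workaround described above could look roughly like the following. This is only a minimal sketch of the decision itself, kept free of HAL calls so it can be compiled anywhere; the helper name fifo_error_is_benign is hypothetical, and it additionally checks TCIF3 (an assumption on top of the "NDT is 0" rule) since both conditions were observed together. The bit positions are those of FEIF3 and TCIF3 in DMA_LISR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stream 3 flag positions in DMA_LISR on the STM32F4
 * (same bits as DMA_LISR_FEIF3 and DMA_LISR_TCIF3 in the CMSIS device header). */
#define LISR_FEIF3 (1UL << 22)
#define LISR_TCIF3 (1UL << 27)

/* Hypothetical helper: returns true when a pending FIFO error on stream 3
 * can be treated as benign because the transfer has already completed.
 * lisr is a snapshot of DMA_LISR, ndtr a snapshot of the stream's NDTR. */
bool fifo_error_is_benign(uint32_t lisr, uint32_t ndtr)
{
    if ((lisr & LISR_FEIF3) == 0U) {
        return false;   /* no FIFO error pending, nothing to ignore */
    }
    /* NDT occupies the low 16 bits of NDTR. */
    return ((lisr & LISR_TCIF3) != 0U) && ((ndtr & 0xFFFFU) == 0U);
}
```

In a real handler the two arguments would presumably be read from DMA1->LISR and DMA1_Stream3->NDTR before any flags are cleared; any FIFO error for which the helper returns false should still be reported.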

But it would be nice to completely understand the problem, as I don't want to bury a potential real issue.

Does this ring any bells, or are there any suggestions for what else I can check?

Karl Yamashita · ‎2024-02-10

I don't know what kind of data the OP is sending, so I am just sending a string with a counter. The OP is using memcpy and I'm using sprintf to write to a global variable. I know what you mean by race condition, but I'm just making it the worst-case scenario, which doesn't seem to create an issue.

I myself use an Rx/Tx queue buffer to hold multiple messages or binary packets, so I don't have to deal with a race condition.

I have logged the output to a text file with the counter incrementing over 100k times, and though I haven't looked at every line, in what I've looked at I don't see any skipped counter numbers. So that means that currently there is no race condition. I can't see where to attach a text file, so I'm just pasting the last part of the log.

Hello World, Counter= 116529<CR><LF>
Hello World, Counter= 116530<CR><LF>
Hello World, Counter= 116531<CR><LF>
Hello World, Counter= 116532<CR><LF>
Hello World, Counter= 116533<CR><LF>
Hello World, Counter= 116534<CR><LF>
Hello World, Counter= 116535<CR><LF>
Hello World, Counter= 116536<CR><LF>
Hello World, Counter= 116537<CR><LF>
Hello World, Counter= 116538<CR><LF>
Hello World, Counter= 116539<CR><LF>
Hello World, Counter= 116540<CR><LF>
Hello World, Counter= 116541<CR><LF>
Hello World, Counter= 116542<CR><LF>
Hello World, Counter= 116543<CR><LF>
Hello World, Counter= 116544<CR><LF>
Hello World, Counter= 116545<CR><LF>
Hello World, Counter= 116546<CR><LF>
Hello World, Counter= 116547<CR><LF>
Hello World, Counter= 116548<CR><LF>
Hello World, Counter= 116549<CR><LF>
Hello World, Counter= 116550<CR><LF>
Hello World, Counter= 116551<CR><LF>
Hello World, Counter= 116552<CR><LF>
Hello World, Counter= 1165
2/10/2024 17:46:58.830 [RX] - 53<CR><LF>
Hello World, Counter= 116554<CR><LF>
Hello World, Counter= 116555<CR><LF>
Hello World, Counter= 116556<CR><LF>
Hello World, Counter= 116557<CR><LF>
Hello World, Counter= 116558<CR><LF>
Hello World, Counter= 116559<CR><LF>
Hello World, Counter= 116560<CR><LF>
Hello World, Counter= 116561<CR><LF>
Hello World, Counter= 116562<CR><LF>
Hello World, Counter= 116563<CR><LF>
Hello World, Counter= 116564<CR><LF>
Hello World, Counter= 116565<CR><LF>
Hello World, Counter= 116566<CR><LF>
Hello World, Counter= 116567<CR><LF>
Hello World, Counter= 116568<CR><LF>
Hello World, Counter= 116569<CR><LF>
Hello World, Counter= 116570<CR><LF>
Hello World, Counter= 116571<CR><LF>
Hello World, Counter= 116572<CR><LF>
Hello World, Counter= 116573<CR><LF>
Hello World, Counter= 116574<CR><LF>
Hello World, Counter= 116575<CR><LF>
Hello World, Counter= 116576<CR><LF>
Hello World, Counter= 116577<CR><LF>
Hello World, Counter= 116
2/10/2024 17:46:58.930 [RX] - 578<CR><LF>
Hello World, Counter= 116579<CR><LF>
Hello World, Counter= 116580<CR><LF>
Hello World, Counter= 116581<CR><LF>
Hello World, Counter= 116582<CR><LF>
Hello World, Counter= 116583<CR><LF>
Hello World, Counter= 116584<CR><LF>
Hello World, Counter= 116585<CR><LF>
Hello World, Counter= 116586<CR><LF>
Hello World, Counter= 116587<CR><LF>
Hello World, Counter= 116588<CR><LF>
Hello World, Counter= 116589<CR><LF>
Hello World, Counter= 116590<CR><LF>
Hello World, Counter= 116591<CR><LF>
Hello World, Counter= 116592<CR><LF>
Hello World, Counter= 116593<CR><LF>
Hello World, Counter= 116594<CR><LF>
Hello World, Counter= 116595<CR><LF>
Hello World, Counter= 116596<CR><LF>
Hello World, Counter= 116597<CR><LF>
Hello World, Counter= 116598<CR><LF>
Hello World, Counter= 116599<CR><LF>
Hello World, Counter= 116600<CR><LF>
Hello World, Counter= 116601<CR><LF>
Hello World, Counter= 116602<CR><LF>
Hello World, Counter= 116
2/10/2024 17:46:59.030 [RX] - 603<CR><LF>
Hello World, Counter= 116604<CR><LF>
Hello World, Counter= 116605<CR><LF>
Hello World, Counter= 116606<CR><LF>
Hello World, Counter= 116607<CR><LF>
Hello World, Counter= 116608<CR><LF>
Hello World, Counter= 116609<CR><LF>
Hello World, Counter= 116610<CR><LF>
Hello World, Counter= 116611<CR><LF>
Hello World, Counter= 116612<CR><LF>
Hello World, Counter= 116613<CR><LF>
Hello World, Counter= 116614<CR><LF>
Hello World, Counter= 116615<CR><LF>
Hello World, Counter= 116616<CR><LF>
Hello World, Counter= 116617<CR><LF>
Hello World, Counter= 116618<CR><LF>
Hello World, Counter= 116619<CR><LF>
Hello World, Counter= 116620<CR><LF>
Hello World, Counter= 116621<CR><LF>
Hello World, Counter= 116622<CR><LF>
Hello World, Counter= 116623<CR><LF>
Hello World, Counter= 116624<CR><LF>
Hello World, Counter= 116625<CR><LF>
Hello World, Counter= 116626<CR><LF>
Hello World, Counter= 116627<CR><LF>

Don't worry, I won't byte.
TimerCallback tutorial! | UART and DMA Idle tutorial!

If you find my solution useful, please click the Accept as Solution so others see the solution.

Karl Yamashita · ‎2024-02-10

> So it seems then that in your code you will start the DMA action even if you failed to obtain the semaphore, so no wonder you get errors :)

How can I start the DMA if the semaphore is not released? What errors?

> Also, are you calling the pollingroutine from a single thread as in your original example? Because then you should always immediately get the HAL_BUSY return code without the HAL_UART_Transmit_DMA function actually doing anything, since huart->gState = HAL_UART_STATE_BUSY_TX is set and stays set until it is cleared in UART_EndTransmit_IT right before calling the complete callback. The code should not start 2 simultaneous DMA transfers even if your semaphore protection fails? Only if you are calling it from multiple threads on a preemptive OS do you risk that both manage to squeeze past the huart->gState == HAL_UART_STATE_READY check at the beginning of HAL_UART_Transmit_DMA.

I am just following what code you have that I can see. In reality, I use a queue buffer and my transmit routine uses a different mechanism.

> I am guessing it only starts doing something because you did not load an initial value in the semaphore (I don't see the creation of the semaphore and probably the documentation would be wrong anyway :D:D)?

Oh come on, lol. STM32CubeIDE generated all the code including creating the semaphore.

Peeters.Bram · ‎2024-02-10

Regarding the race condition remark:

You take the semaphore (well, you would if the check was correct, but let's ignore that for now), start a DMA action and then overwrite your buffer, so the DMA is potentially still using the data while you are writing into it. It might work if you get lucky and your DMA is fast enough to get the data before you overwrite it, but it also might fail.

In my code I first take the semaphore (so I know no DMA action is ongoing), copy the data into the buffer, then start the DMA action.
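
The ordering described above (claim the buffer, then copy, then start the transfer) can be illustrated with a small host-side model. This is only a sketch under loud assumptions: a plain flag stands in for the binary semaphore, no real DMA or RTOS is involved, and tx_gate_t, tx_begin and tx_complete are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TX_BUF_SIZE 64U

/* Hypothetical model: 'busy' plays the role of the taken semaphore. */
typedef struct {
    bool          busy;              /* true while a transfer owns the buffer */
    size_t        len;               /* number of valid bytes in buf */
    unsigned char buf[TX_BUF_SIZE];  /* the single static transmit buffer */
} tx_gate_t;

/* Claim the buffer first, copy second; starting the DMA would come third.
 * Returns false (and leaves the buffer untouched) if a transfer is ongoing. */
bool tx_begin(tx_gate_t *g, const void *data, size_t len)
{
    if (g->busy || len > TX_BUF_SIZE) {
        return false;
    }
    g->busy = true;
    memcpy(g->buf, data, len);
    g->len = len;
    /* On the target, HAL_UART_Transmit_DMA(...) would be called here. */
    return true;
}

/* Stand-in for the transmit-complete callback: hand the buffer back. */
void tx_complete(tx_gate_t *g)
{
    g->busy = false;
}
```

The point of the model is that a second tx_begin issued before tx_complete is refused and cannot modify bytes the first transfer may still be reading, which is exactly what is lost when the copy happens after the transfer has been started.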

>I am just following what code you have that i can see.

No you are not: you are changing the order of things, which matters, and you are using the CMSIS wrappers in an incorrect way (well, as far as I know, though not immediately your fault as the documentation is wrong, unless you have another implementation than what I have here in my version of FreeRTOS... that is, if you are using FreeRTOS?)

>Oh come on, lol. STM32CubeIDE generated all the code including creating the semaphore.

I never generated semaphores with CubeIDE so I don't know what their initial value will be, and I don't see the generated code in your post. But CubeIDE cannot know how you intend to use the semaphore. Do you want the consumer to be able to run before the producer or not? Maybe there is a setting in Cube for it? It will also depend on what type of semaphore you generated. E.g. a binary semaphore in FreeRTOS is created in the empty state, without a parameter to set it. A counting semaphore has an initial value as a parameter.

In my code I am using binary FreeRTOS semaphores, so I preload them during initialization, after creating them, for correct operation.

Karl Yamashita · ‎2024-02-10

The data doesn't matter. How I'm calling osSemaphoreWait and osSemaphoreRelease is the same as in your code, more or less, as I have no idea what you're doing before you call SerialDbg_TxCpltCallback. But I have to assume HAL_UART_TxCpltCallback is inline somewhere?

ST uses a wrapper around FreeRTOS. It just does some extra validation. In the end, it's using FreeRTOS.

/**
* @brief Wait until a Semaphore token becomes available
* @param  semaphore_id  semaphore object referenced with \ref osSemaphore.
* @param  millisec      timeout value or 0 in case of no time-out.
* @retval  number of available tokens, or -1 in case of incorrect parameters.
* @note   MUST REMAIN UNCHANGED: \b osSemaphoreWait shall be consistent in every CMSIS-RTOS.
*/
int32_t osSemaphoreWait (osSemaphoreId semaphore_id, uint32_t millisec)
{
  TickType_t ticks;
  portBASE_TYPE taskWoken = pdFALSE;  
  
  
  if (semaphore_id == NULL) {
    return osErrorParameter;
  }
  
  ticks = 0;
  if (millisec == osWaitForever) {
    ticks = portMAX_DELAY;
  }
  else if (millisec != 0) {
    ticks = millisec / portTICK_PERIOD_MS;
    if (ticks == 0) {
      ticks = 1;
    }
  }
  
  if (inHandlerMode()) {
    if (xSemaphoreTakeFromISR(semaphore_id, &taskWoken) != pdTRUE) {
      return osErrorOS;
    }
    portEND_SWITCHING_ISR(taskWoken);
  }  
  else if (xSemaphoreTake(semaphore_id, ticks) != pdTRUE) {
    return osErrorOS;
  }
  
  return osOK;
}

/**
* @brief Release a Semaphore token
* @param  semaphore_id  semaphore object referenced with \ref osSemaphore.
* @retval  status code that indicates the execution status of the function.
* @note   MUST REMAIN UNCHANGED: \b osSemaphoreRelease shall be consistent in every CMSIS-RTOS.
*/
osStatus osSemaphoreRelease (osSemaphoreId semaphore_id)
{
  osStatus result = osOK;
  portBASE_TYPE taskWoken = pdFALSE;
  
  
  if (inHandlerMode()) {
    if (xSemaphoreGiveFromISR(semaphore_id, &taskWoken) != pdTRUE) {
      return osErrorOS;
    }
    portEND_SWITCHING_ISR(taskWoken);
  }
  else {
    if (xSemaphoreGive(semaphore_id) != pdTRUE) {
      result = osErrorOS;
    }
  }
  
  return result;
}

And I finally caught on to what you were saying. I forgot to check the status when calling osSemaphoreWait.

I check the status now, but I get the same results.

if(osSemaphoreWait(myBinarySem01Handle[uartID], 10) == osOK)

waclawek.jan · ‎2024-02-11

Hi @Peeters.Bram ,

>> [bootloader should reset peripherals as that's what Cube expects at the beginning]

> True (though to nitpick, you cannot reset everything, eg IWDG)

Fair point; but then we can also argue that the completely opposite approach could be used, too: make the bootloader part of the application, thus peripheral initializations made in bootloader are (mostly) those needed for application and they don't need to be replicated in application. However, this does not fare well with the architecture of Cube/HAL, and - as this thread shows, too - fails in the face of Cube/HAL being updated for whatever reason.

But I digress.

> Now, I understand that there is no data loss in an underrun TX scenario, but in an RX overrun scenario there will be data loss imho unless flow control is enabled?

Not necessarily.

First, UART/SPI/I2C are double-buffered, i.e. at the moment when the holding register gets full and the peripheral indicates this by RXNE or similar flag to DMA, there is still one frame's time for the DMA to succeed storing the first two data.

Second, and here I am speculating as it would need insider access to be sure, IMO "direct mode" still uses the whole FIFO mechanism, except that the trigger levels are at 1. It means, that for Tx direction it attempts to prefetch from memory port only one frame, and for Rx direction it attempts to store to memory port immediately after reading from peripheral port. But, for the latter, IMO, still the full FIFO is available. An indication - yes, not proof - is in this wording from RM:

In direct mode, the FIFO error flag can also be set under the following conditions:
• In the peripheral-to-memory mode, the FIFO can be saturated (overrun) if the memory
bus is not granted for several peripheral requests

( @STOne-32 , this (i.e. what's the risk of data loss in direct mode of the dual-port DMA in peripheral->memory direction, and whether the FIFO error interrupt is indicative of such data loss, and whether FIFO is or is not in fact used in direct mode i.e. whether the number of items stored in DMA is one or up to FIFO size in direct mode) may be an example of topic worthy of "knowledge base"; although, it would be way better to put it properly into AN4031).

I see your point, though - the FIFO error may indicate that something is astray (although not necessarily in a fatal way) and needs attention, at least during development. It happens only in extreme cases, though, and Cube/HAL is written for the masses (it probably transpires that I don't use Cube/HAL and despise it to a certain extent). I too personally wouldn't consider setting up and then investigating the FIFO interrupt in direct mode for slow peripherals unless I see data loss; OTOH for fast peripherals I do use the FIFO as appropriate.

Now to your particular case.

1Mbps UART is unusual as is 64MHz system clock in 'F4, but that still leaves around 640 system clock periods per UART frame. Bus arbitration is round-robin, it means, that all other bus masters would need to hold up the bus for 640 ticks for a problem to occur. Unless...

... unless the symptom stems from UART requesting two frames in quick succession. And it indeed does - if its transmitter has not been disabled previously (so that the idle preamble does not apply), and if its holding register is empty, after writing to holding register it "immediately" transfers that frame to the shift register and indicates holding register empty (TXE) again. I don't know how "immediate" this process is.

... unless the DMA is held up by other, equal or higher priority, DMA streams occupying the memory port for an excessively long time. From the screenshot in your opening post, only one other DMA stream appears to be used, the same UART's Rx, and I don't think that could hold up the memory port for a significant time.

... unless the memory bus in question is held up by some other bus-master. The bus arbitrator is again round-robin, and AFAIK there are no priorities i.e. all busmasters are at the same priority, so the total worst case latency is the sum of worst case locked-bus cycles of all other busmasters. Besides the processor (which IMO won't lock the bus for more than 3 cycles worst case, for the worst case unaligned access), it's the other DMA (which can be set to locked bursts, but rarely anybody does that), ETH (which can and probably is set to some bursts, maybe 16-beat?), and OTG HS (I'm not sure with that one, but again I wouldn't expect anything beyond 16-beat bursts). Are these used in your application?

... unless the accessed memory itself generates waitstates for whatever reason. And, indeed, looking again at the screenshot you've posted, you are using FMC (at 0x68xx'xxxx) for the DMA stream in question. Now the question is, what sort of memory is that, how exactly is it set up, and what is the resulting (worst case) latency of its access (read). That indeed may be one of the main sources of unorthodox behaviour here, as those latencies apply also to all other busmasters' accesses, so latency stemming from the previous item multiplies with this one.

... unless there's something else I didn't think of. You've mentioned Sleep - well, maybe that's a factor too, I don't know, I don't have extensive experience with Sleep or other low-power modes. However, the thread you've mentioned did not come to a definitive conclusion, and it also was about 'F0, which has a different type of DMA.

Now all this said, in your particular case, any latency means only that the outgoing frame will be delayed. It's rarely an issue. And remember that, unless this is a case of the FIFO error being generated because the UART's TX DMA request was enabled before the DMA stream itself, the error comes in between frames. TXE is raised when data are transferred from the holding register to the shift register (this generates a request the DMA can't fulfil immediately, as it has not yet managed to prefetch), so the shift register is full at that moment; even if it takes the DMA the full 640 system clocks to fetch the data and forward them to UART_DR, the net result is still a continuous, uninterrupted UART Tx stream.
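
The "around 640 system clock periods per UART frame" figure can be checked with a little arithmetic. The sketch below assumes the 64 MHz system clock quoted above and a 10-bit frame (start bit, 8 data bits, 1 stop bit, matching the 8N1 settings in the opening post); the function name clocks_per_frame is hypothetical.

```c
#include <stdint.h>

/* System clock cycles that elapse while one UART frame is on the wire.
 * bits_per_frame counts start + data + parity + stop bits (10 for 8N1).
 * The result is truncated towards zero. */
uint32_t clocks_per_frame(uint32_t sysclk_hz, uint32_t baud, uint32_t bits_per_frame)
{
    /* 64-bit intermediate avoids overflow of sysclk_hz * bits_per_frame. */
    return (uint32_t)(((uint64_t)sysclk_hz * bits_per_frame) / baud);
}
```

With a round 1 Mbps this gives 640 clocks per frame; with the 921600 baud actually configured it gives 694, so the budget is slightly larger than the round figure suggests.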

JW