ADC overrun error. Multiple DMA usage.

mur · 2020-05-05

Hi,

I am using a NUCLEO F334 development board.

I am using the ADC with DMA. I developed and tested that feature, and everything looked fine. I have since built more on top of it, and now I see that the values are correct after startup, but after a few ms the readings become incorrect; in the cases I checked, they are above the expected value.

I have noticed that changing completely unrelated parts of the code determines whether this issue happens or not. Changing the compiler optimization level also affects whether it happens.

Browsing around, I found an article discussing the use of the volatile keyword on variables tied to hardware, such as the DMA buffer. Declaring that buffer volatile looked like the solution: adding or removing the qualifier gave repeatable tests where the issue did or did not happen, so I moved on. A few unrelated changes later, the issue is back. Do you have any suggestions on what I should check, or what the cause might be?

In addition, please find attached a picture of the ADC object before and after crashing.

Best regards.

mur · 2020-05-14

ST support helped me troubleshoot the issue: it was due to high DMA usage, and it was solved by increasing the sample-and-hold (S/H) times of the ADC. In my case, going from 1.5 cycles to 19.5 cycles did it.
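To put numbers on that fix, here is a small sketch of the F3 ADC conversion time at 12-bit resolution: the programmed sampling time plus 12.5 ADC clock cycles. The 72 MHz ADC clock is an assumption (the actual clock depends on the clock setup, which isn't given in the thread), and the function name is hypothetical.

```c
#include <assert.h>

/* Time for one 12-bit conversion on the STM32F3 ADC: programmed sampling
 * time plus 12.5 ADC clock cycles for the successive approximation.
 * In a scan, this is roughly the spacing between consecutive
 * end-of-conversion events, i.e. between ADC DMA requests.
 * Hypothetical helper; the ADC clock passed in is an assumption. */
static double conversion_time_s(double sample_cycles, double f_adc_hz)
{
    assert(sample_cycles > 0.0 && f_adc_hz > 0.0);
    return (sample_cycles + 12.5) / f_adc_hz;
}
```

With an assumed 72 MHz ADC clock, one conversion takes about 194 ns at 1.5 cycles and about 444 ns at 19.5 cycles, so within a scan the DMA gets roughly 2.3 times as long to service each ADC request while the UART channels compete for it. That is one plausible reading of why the longer S/H time helped.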

TDK · 2020-05-05

> I have been developing some more over that and I see that after startup the values are correct but after a few ms the read values are not correct and in the checked cases, above the expected value.

What do "correct" and "not correct" mean here? How do you know they are incorrect?

> I have seen that changing totally unrelated parts of code affect this issue happening or not. Changing the compiler optimization also affected whether the issue happens or not.

Program behavior changing as a result of optimization generally means there's an issue somewhere. It's possible you're writing to an array out of bounds, or overwriting the stack, or accessing a local variable out of scope. You'll have to use the symptoms to diagnose where the issue might be.
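One way to hunt for this kind of layout-dependent bug is to place a guard word (a "canary") right after a buffer you suspect is being overrun and check it periodically. A minimal sketch; the buffer size, struct and function names are hypothetical, and an out-of-bounds write hitting the canary is likely but not guaranteed, since writing past an array is undefined behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_SAMPLES  8u           /* hypothetical buffer size */
#define CANARY_VALUE 0xDEADBEEFu  /* arbitrary, easy-to-spot pattern */

/* Parsed-sample array followed by a guard word. */
typedef struct {
    uint16_t samples[NUM_SAMPLES];
    uint32_t canary;
} guarded_buffer_t;

static void guarded_init(guarded_buffer_t *b)
{
    assert(b != NULL);
    for (size_t i = 0; i < NUM_SAMPLES; i++) {
        b->samples[i] = 0;
    }
    b->canary = CANARY_VALUE;
}

/* Bounds-checked store: rejects indices outside the array. */
static bool guarded_store(guarded_buffer_t *b, size_t index, uint16_t value)
{
    if (index >= NUM_SAMPLES) {
        return false;
    }
    b->samples[index] = value;
    return true;
}

/* Call periodically (or set a watchpoint on canary) to catch overruns. */
static bool guarded_intact(const guarded_buffer_t *b)
{
    return b->canary == CANARY_VALUE;
}
```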

I doubt usage of the "volatile" keyword here fixes the underlying issue.

> before and after crashing.

What do you mean by crashing?

If you feel a post has answered your question, please click "Accept as Solution".

mur · 2020-05-06

By "correct" and "not correct" I mean whether or not the value read by the ADC matches the voltage. To know that, I measure the voltage with a scope and calculate the value the ADC should read (V * 4096 / 3.3).
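The same check as a small function: a sketch of the ideal 12-bit transfer function used above, truncated and clamped to 0..4095. The 3.3 V reference comes from the post; on a real board VDDA may differ slightly, which shifts every expected value. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Ideal 12-bit ADC code for a measured voltage: V * 4096 / Vref,
 * truncated, and clamped to the 0..4095 output range. */
static uint16_t expected_adc_code(double volts, double vref)
{
    assert(vref > 0.0);
    if (volts <= 0.0) {
        return 0;
    }
    double code = volts * 4096.0 / vref;
    if (code >= 4095.0) {
        return 4095;  /* full scale saturates at 2^12 - 1 */
    }
    return (uint16_t)code;  /* truncate toward zero */
}
```

For example, 1.65 V with a 3.3 V reference gives 2048; a reading consistently several LSB above that would match the symptom described here.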

By "crashing" I mean the moment when the ADC stops reading the proper value.

I see. So far, the only symptom I have spotted is the ADC not working properly. I have also seen that changing different parts of the code fixes it, but it seems that making any change has this effect, not a particular one. For example, I played around with an if...else statement: if I removed either the if or the else (keeping only the statement from the else), the issue didn't happen, and that was an unrelated part of the code. Could it be related to how the compiler organizes memory, so that by adding or changing these statements I end up with one memory layout or another?

Best regards.

TDK · 2020-05-06

> Could it be something with how the compilation process organizes the memory and by adding or changing this statements I'm causing the memory to be one way or another?

Yes, the memory being rearranged could cause what you're seeing, if there's a bug somewhere.

Are you checking the ADC->DR register to determine if the reading is wrong or are you checking the value stored in memory? Your screenshots show the same DR value in both of them, within tolerance.

Is there a failing malloc call somewhere?

Doubt I'll be able to help much. Just guessing. Good luck.

mur · 2020-05-06

I'm using the ADCs in dual regular simultaneous mode and collecting the values with the DMA. Then, in the ADC EOC interrupt, I parse the values from the DMA buffer and store them in memory. The values I'm checking are the ones in memory.
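If the two ADCs share one DMA channel, the parsing step has to split each DMA word. A sketch, assuming MDMA = 0b10 (12-bit dual-mode data read from the common data register), where each 32-bit word holds the master (ADC1) result in bits [15:0] and the slave (ADC2) result in bits [31:16]. The function name and buffer layout are hypothetical; with MDMA cleared (one DMA channel per ADC) no unpacking is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split packed dual-mode DMA words into separate master/slave arrays.
 * The DMA buffer is volatile because hardware writes it behind the
 * compiler's back; each word is read exactly once. */
static void unpack_dual_samples(const volatile uint32_t *dma_buf, size_t count,
                                uint16_t *master, uint16_t *slave)
{
    assert(dma_buf != NULL && master != NULL && slave != NULL);
    for (size_t i = 0; i < count; i++) {
        uint32_t word = dma_buf[i];
        master[i] = (uint16_t)(word & 0xFFFFu);  /* ADC1 result */
        slave[i]  = (uint16_t)(word >> 16);      /* ADC2 result */
    }
}
```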

The ADCs' DR registers don't match the values either before or after crashing. I think that in scan mode the DR register should keep the value of the last conversion, but that is not happening. :\

It's of little help, but changing the optimization level to -O1 or -O2 makes the issue disappear. However, after further code changes it appears again.

mur · 2020-05-08

I kept testing and saw that whenever the ADC error happens, the hadc1 handle has ErrorCode = 2 and HAL_ADC_ErrorCallback() is called. In the HAL code, I see that the error code is set to 2 after an overrun error:

/* Set ADC error code to ADC IP internal error */
SET_BIT(hadc->ErrorCode, HAL_ADC_ERROR_OVR);
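Since ErrorCode is a bit mask (the HAL sets it with SET_BIT), the value 2 means only the overrun bit is set; a DMA error would show up as a different bit. A small sketch for decoding it: the local bit values are assumed to mirror HAL_ADC_ERROR_INTERNAL, _OVR, _DMA and _JQOVF in the STM32F3 HAL, so check them against stm32f3xx_hal_adc.h for your version. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed to match the STM32F3 HAL ErrorCode bits (verify in your HAL). */
#define ADC_ERR_INTERNAL 0x01u
#define ADC_ERR_OVR      0x02u
#define ADC_ERR_DMA      0x04u
#define ADC_ERR_JQOVF    0x08u

/* Write a comma-separated list of the error causes set in code. */
static void describe_adc_error(uint32_t code, char *out, size_t out_len)
{
    assert(out != NULL && out_len > 0);
    out[0] = '\0';
    if (code == 0u) {
        strncat(out, "none", out_len - 1);
        return;
    }
    static const struct { uint32_t bit; const char *name; } names[] = {
        { ADC_ERR_INTERNAL, "internal" },
        { ADC_ERR_OVR,      "overrun" },
        { ADC_ERR_DMA,      "dma" },
        { ADC_ERR_JQOVF,    "injected-queue-overflow" },
    };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        if (code & names[i].bit) {
            if (out[0] != '\0') {
                strncat(out, ",", out_len - strlen(out) - 1);
            }
            strncat(out, names[i].name, out_len - strlen(out) - 1);
        }
    }
}
```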

I don't know what might be causing this overrun. I found an old post (https://community.st.com/s/question/0D50X00009XkaVr/f2xx-what-can-cause-an-adc-overrun) from a user describing the same ADC configuration as mine (multimode, triggered by a timer, with the data handled by DMA) and the same error, but he did not post a solution. In another post, he linked an article on his website (https://blog.frankvh.com/2012/01/13/stm32f2xx-stm32f4xx-dma-maximum-transactions/) where, after interacting with ST support, he concluded that using more than 2 DMA channels simultaneously was causing trouble... and I'm using 3 (ADC, UART TX, UART RX).

So my question is: 8 years after those posts, is there any new information on the topic, or should I just limit myself to 2 DMA channels?

TDK · 2020-05-08

Or maybe the DMA overrun flag is getting set because the DMA overruns? What is your sample rate? Are you using the FIFO? I have a project that uses 15 DMA streams at once, and it works fine.

mur · 2020-05-08

I'm sampling at 20 kHz with a timer trigger. My device is the STM32F334R8, and I don't think its DMA has a FIFO, so I'd say no. As I understand it, the overrun is caused by the DMA not being able to service the ADC in time.

Then the only reason I can think of is that an interrupt with higher priority than the DMA is running and preventing the DMA request from being serviced. Looking at my NVIC configuration (and assuming that the DMA request has the same priority as the DMA interrupt, which is quite an assumption), I don't really know what the conflicting interrupt could be.

In the ADC EOC interrupt, I set a digital output to '1' on entering the interrupt and back to '0' at its end, and analysed that output with a logic analyser. The CPU time spent in the interrupt is within the threshold (11 µs), and I can also see that the output is at '0' when the error happens (I have a watchpoint on the error variable). What other reasons could there be for the overrun?
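The GPIO measurement above translates directly into an interrupt load (pulse width times trigger rate) and a per-period slack. A trivial sketch with the numbers from the post (11 µs at 20 kHz); the function names are hypothetical.

```c
#include <assert.h>

/* Fraction of CPU time spent in a periodic ISR. */
static double isr_load(double isr_time_s, double trigger_hz)
{
    assert(isr_time_s >= 0.0 && trigger_hz > 0.0);
    return isr_time_s * trigger_hz;
}

/* Time left in each trigger period after the ISR has run. */
static double isr_slack_s(double isr_time_s, double trigger_hz)
{
    assert(trigger_hz > 0.0);
    return 1.0 / trigger_hz - isr_time_s;
}
```

That gives a load of 0.22 (22 % of the CPU) and about 39 µs of slack in each 50 µs trigger period, so the EOC handler comfortably fits between triggers.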

The DMA is configured in circular mode.
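A common pattern for consuming a circular DMA buffer is to process it in halves: the HAL calls HAL_ADC_ConvHalfCpltCallback() when the first half is full and HAL_ADC_ConvCpltCallback() when the second half is full, and each callback reads only the half the DMA just finished, since the DMA is already refilling the other one. The sketch below shows only the copy step, runnable on a host; HALF_LEN, the enum and the function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HALF_LEN 4u  /* hypothetical: samples per half of the DMA buffer */

typedef enum { FIRST_HALF = 0, SECOND_HALF = 1 } buffer_half_t;

/* Copy the half of a 2 * HALF_LEN circular buffer that the DMA has just
 * completed into dst (call from the half-complete / complete callback). */
static void copy_completed_half(const volatile uint16_t *dma_buf,
                                buffer_half_t half, uint16_t *dst)
{
    assert(dma_buf != NULL && dst != NULL);
    size_t offset = (half == FIRST_HALF) ? 0u : HALF_LEN;
    for (size_t i = 0; i < HALF_LEN; i++) {
        dst[i] = dma_buf[offset + i];
    }
}
```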

TDK · 2020-05-08

DMA doesn't use interrupts to do its thing. At least, that wouldn't be the cause of the overrun. 20 kHz is slow, so I doubt the sample rate is the reason. Is it one channel at 20 kHz, or all of them at that rate?

mur · 2020-05-08

All of them at 20 kHz. It works in scan mode, so all channels are converted after each trigger.
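A quick estimate of the average DMA load, under hypothetical numbers since the thread gives neither the channel count nor the baud rate: 4 conversions per trigger, and UART TX and RX at 115200 baud with 8N1 framing (10 bits per byte, one DMA request per byte). The ADC figure assumes one request per conversion; in dual mode with a shared DMA channel it would be one per master/slave pair. Function names are hypothetical.

```c
#include <assert.h>

/* Average ADC DMA requests per second: one per conversion per trigger. */
static double adc_requests_per_s(double trigger_hz, unsigned conversions_per_trigger)
{
    assert(trigger_hz > 0.0);
    return trigger_hz * conversions_per_trigger;
}

/* Average UART DMA requests per second: one per frame (byte). */
static double uart_requests_per_s(double baud, unsigned bits_per_frame)
{
    assert(bits_per_frame > 0);
    return baud / bits_per_frame;
}
```

With those numbers the ADC issues 80,000 requests per second and each UART direction 11,520, about 103,000 in total, which is modest on average and consistent with "20 kHz is slow". The ADC requests, however, arrive back to back within each scan rather than spread evenly over the 50 µs period.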

I was checking the errata sheet (https://www.st.com/resource/en/errata_sheet/dm00115957-stm32f334x4x6x8-rev-z-device-limitations-stmicroelectronics.pdf) and found an erratum that describes my issue, but it applies to interleaved mode and I'm using regular simultaneous mode, so I guess it isn't that, although it does sound very similar.

2.2 ADC limitations

2.2.1 DMA Overrun in dual interleaved mode with single DMA channel

Description: DMA overrun conditions can be encountered when two ADCs are working in dual interleaved mode with a single DMA channel for both (MDMA[1:0] bits equal to 0b10 or 0b11). This limitation applies in Single, Continuous and Discontinuous mode.

Workaround: The MDMA[1:0] bits must be kept cleared and each ADC must have its own DMA channel enabled (dual DMA configuration).
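The erratum's condition can be written out as a small predicate, which makes it easy to see that the dual regular simultaneous configuration described in this thread falls outside it. A sketch only; the enum and function names are hypothetical, and MDMA is modelled as the raw 2-bit field value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the dual-ADC modes relevant to erratum 2.2.1. */
typedef enum {
    DUAL_REGULAR_SIMULTANEOUS,
    DUAL_INTERLEAVED,
} dual_mode_t;

/* The erratum applies to dual interleaved mode when both ADCs share one
 * DMA channel (MDMA[1:0] = 0b10 or 0b11), in any conversion mode.
 * The workaround is MDMA = 0b00 with one DMA channel per ADC. */
static bool erratum_2_2_1_applies(dual_mode_t mode, unsigned mdma)
{
    assert(mdma <= 3u);
    return mode == DUAL_INTERLEAVED && (mdma == 2u || mdma == 3u);
}
```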