Strangely accurate ADC noise

EThom.3
Senior III

I've been struggling with a strange ADC problem for a while, and have now made some interesting observations.

The scenario

I'm using an STM32G474, where HRTIM directly controls a Peltier driver for temperature regulation. Temperatures are measured by ADC4 (12 bit resolution), running in DMA circular mode, triggered by TIM7, at 1 kHz. This works. Temperatures are coming in, and are being used for PID regulation.

The problem

Quite often – perhaps always – some of the temperature readings are being disturbed, but only at certain temperatures. At first, I thought that it was random noise, but in a graph the noise looks strange. Not random at all.

After monitoring the raw ADC values that the DMA has thrown into memory, I discovered something that might be a clue – or it might just be a source of confusion. Every time a noisy sequence occurs, the ADC readout jumps to a lower nearby value where the five least significant bits are all 1, i.e. xxxx xxx1 1111. This is not random noise. It is as if the values are rounded down to the nearest value ending in 0x1F.

A single spurious readout once in a while can be handled. But when this occurs, it happens so much that the PID regulation is disturbed.

The disturbances are visible in this chart. The orange trace shows the raw ADC samples, copied directly from the array that the DMA dumps the ADC values in. ADC values are on the right hand axis (decimal). The blue trace is the converted temperature readout, which is where I first noticed the problem.

Big chart 1.png

The disturbing ADC values in this example are 0x073F, 0x07BF, 0x083F and 0x08BF. All ending with the 11111 bit pattern, and exactly 128 ADC counts apart. I have also run tests where the spacing was 64 counts. This also tells me that this isn't random noise.
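The two observations above (low five bits all 1, spacing of 128 counts) can be checked mechanically. This is only a minimal host-side sketch; the helper names `low_bits_all_ones` and `evenly_spaced` are made up for illustration and are not part of the project code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if the n least significant bits of a raw sample are all 1. */
bool low_bits_all_ones(uint16_t sample, unsigned n)
{
    assert(n >= 1 && n <= 16);
    uint32_t mask = (1u << n) - 1u;   /* n = 5 gives 0x1F */
    return (sample & mask) == mask;
}

/* True if each entry is exactly `step` counts above the previous one. */
bool evenly_spaced(const uint16_t *v, size_t len, uint16_t step)
{
    for (size_t i = 1; i < len; i++) {
        if ((uint16_t)(v[i] - v[i - 1]) != step)
            return false;
    }
    return true;
}
```

Feeding it the four values 0x073F, 0x07BF, 0x083F and 0x08BF with n = 5 and step = 128 should make both checks pass, while a neighbouring value such as 0x0740 fails the bit-pattern check.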

Some zoomed-in examples:

83F - 1.png

73F - 1.png

Apparently, it doesn't matter if the temperature is slowly moving up or down:

7BF - 1.png

83F - 2.png

During other tests, I have observed this behaviour at other values – but always with the xxxx xxx1 1111 pattern in the ADC value. I have watched the sampled values in RAM, by using STM32Programmer through the SWD interface. This is where I first spotted the recurring bit pattern.

I've checked my code (obviously), to see that the ADC value array isn't overwritten before it is handled in the interrupt function. The only thing that overwrites this array is the DMA channel.

What does the errata sheet say about this? The only section I've found that might be relevant, is this one:

EThom3_0-1778226117224.png

I'm using all five ADCs for different purposes, all using the same clock source, with prescalers set to 1. Perhaps ADC4 is being disturbed by one of the other ADCs. If this is the case, I may be able to make a work-around.

But again, these aren't random disturbances. They only happen in certain windows, hitting very specific values.

I am unsure about what to show you, regarding setup and code. There is a lot of code in this project. But here is how ADC4 is setup in CubeMX:

EThom3_1-1778226575722.png

Sampling time and offset are the same for all channels.

The function that initialises DMA, and starts ADC4:

void adcInitADC4(void)
{
  ADC4Handle->Instance->CR = ADC_CR_ADVREGEN; // Regulator on; DEEPPWD and all other CR bits written as 0
  ADC4Handle->Instance->CR |= ADC_CR_ADCAL; // Start calibration

  ADC4DMAHandle->Instance->CCR &=~ DMA_CCR_EN;              // DMA channel off
  ADC4DMAHandle->Instance->CCR |= DMA_CCR_TCIE;             // Turn on DMA interrupt
  ADC4DMAHandle->Instance->CNDTR = 10;                       // Ten transfers to memory
  ADC4DMAHandle->Instance->CPAR = (uint32_t)&ADC4->DR;      // Set peripheral address (ADC data register)
  ADC4DMAHandle->Instance->CMAR = (uint32_t)&ADC4Buffer[0];   // Set memory address
  ADC4DMAHandle->Instance->CCR |= DMA_CCR_EN;               // Enable DMA

  while (ADC4Handle->Instance->CR & ADC_CR_ADCAL) __asm("NOP");

  ADC4Handle->Instance->CFGR |= ADC_CFGR_DMAEN; // DMA transfer enable

  ADC4Handle->Instance->ISR = ADC_ISR_ADRDY; // Clear ADC ready Flag
  ADC4Handle->Instance->CR |= ADC_CR_ADEN;  // Enable ADC
  while (!(ADC4Handle->Instance->ISR & ADC_ISR_ADRDY)); // Wait for ADC ready
  ADC4Handle->Instance->CR |= ADC_CR_ADSTART;  // Start ADC

  TIM7->CR1 |= TIM_CR1_CEN;
}

Can anyone help shed some light on this weirdness?

1 ACCEPTED SOLUTION

EThom.3
Senior III

Indeed, the problem was caused by ADCs disturbing each other. And in a particularly nasty way, as the disturbances manifested as SAR bit-decision errors. That is why the patterns were so distinct.

To overcome this, I have completely changed the way the ADCs are triggered. In my main timer interrupt, I've made a sequence, so that ADCs 1, 2, 4 and 5 are triggered at specific times, where they never collide.
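As a rough illustration of such a sequence (not the actual project code), one way is a slot table indexed by a tick counter in the main timer interrupt. The names `SLOTS_PER_CYCLE`, `slot_to_adc` and `adc_for_tick`, and the choice of eight slots, are assumptions made for this sketch:

```c
#include <stdint.h>

#define SLOTS_PER_CYCLE 8u   /* hypothetical: main-timer ticks per cycle */

/* Hypothetical slot table: which ADC (1, 2, 4 or 5) to start in each slot.
 * 0 means "start nothing in this slot", which leaves a guard gap. */
static const uint8_t slot_to_adc[SLOTS_PER_CYCLE] = { 1, 0, 2, 0, 4, 0, 5, 0 };

/* Returns the number of the ADC to trigger on this tick, or 0 for none. */
uint8_t adc_for_tick(uint32_t tick)
{
    return slot_to_adc[tick % SLOTS_PER_CYCLE];
}
```

In the interrupt, the returned number would select which ADC gets its ADSTART bit set. The scheme only avoids collisions if one slot is longer than the longest conversion sequence of any of the ADCs involved.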

I couldn't do this with ADC3, as it runs at a much higher sampling frequency. But I moved it to be triggered by update events from a different timer, which runs at a slightly odd frequency, not in sync with the main timer interrupt. In addition, I've set that timer to dither, so the exact sampling time wobbles. I believe this should randomise disturbances even further.

The result is this, which is a stark contrast to the first graph I uploaded:

EThom3_0-1778423459916.png

Thanks to all of you for your assistance! Should any of you find yourselves in Jutland some day, beer is on me!


18 REPLIES
Ozone
Principal III

> During other tests, I have observed this behaviour at other values – but always with the xxxx xxx1 1111 pattern in the ADC value.

I tend to agree, this sounds like a systematic issue.
Have you been able to reproduce it on different boards/PCBs, or is it just one ?

The xxx1.1111 pattern is just one bit short of a carry-over to the next MSB, so I would not exclude a problem of the individual MCU.
Although I should add that I don't have any G4xx board.


@Ozone wrote:

Have you been able to reproduce it on different boards/PCBs, or is it just one ?


All the prototype boards of this batch exhibit this behaviour.

Well, that sounds very systematic.
I can't comment on G4xx internals, as mentioned.
Perhaps there is some silicon issue, and ST staff can comment.

Peltier elements are usually quite power-hungry.
Can you exclude power supply / VDDA / VREF issues caused by these external sources?


@Ozone wrote:

 

Peltier elements are usually quite power-hungry.
Can you exclude power supply / VDDA / VREF issues caused by these external sources?


This was my first hunch – that the switching circuits were inducing noise into the temperature signals. Other ADC inputs are unaffected, so I've ruled out VDDA and VREF noise. (How do you make subscript, by the way?)

I have also ruled out induced noise, as the issue seems to be tied to specific ADC values, rather than how much power the TECs are consuming. In my test setup, they don't eat much anyway. Typically a few W in the tests that I've shown. There is only the ambient air to fight them.

My next test will be connecting a potentiometer in place of a thermistor, so I can adjust the "temperature" without relying on the Peltier drivers.

LCE
Principal II

I would also try to rule out any external influence from the drivers and power supplies, ground bounces, whatever. So a poti sounds like a good idea.

Are you sure that the DMA buffer size gives you enough time to handle the data?

 

@Ozone said:

The xxx1.1111 pattern is just one bit short of a carry-over to the next MSB,
> so I would not exclude a problem of the individual MCU. 

 

Nice catch!

If this was an FPGA, I would check the internal flip-flop placement and make the clock constraints stricter.

With the MCU I would play with everything that might affect timing:

- ADC clock prescaler - maybe increase this and reduce:

- sampling time

- DMA buffer size

- interrupt priorities

Mikk Leini
Senior III

What clock are you using for ADC?


@Mikk Leini wrote:

What clock are you using for ADC?


PLLP, delivering 56.66667 MHz. HSE is a 4 MHz crystal, multiplied by 85 to make 340 MHz, and then divided by 2 to make 170 MHz for the SYSCLK, and by 6 for ADC clocks.
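Spelled out as arithmetic, taking the figures exactly as given above (the macro and function names are only for this sketch): the divide-by-6 for the ADC clock applies to the 340 MHz PLL frequency, not to the 170 MHz SYSCLK.

```c
#include <stdint.h>

/* Clock figures as stated in the post. */
#define HSE_HZ      4000000u   /* 4 MHz crystal */
#define PLL_MUL     85u        /* 4 MHz * 85 = 340 MHz */
#define SYSCLK_DIV  2u         /* 340 MHz / 2 = 170 MHz */
#define ADC_DIV     6u         /* 340 MHz / 6 = 56.67 MHz */

uint32_t pll_hz(void)     { return HSE_HZ * PLL_MUL; }
uint32_t sysclk_hz(void)  { return pll_hz() / SYSCLK_DIV; }
uint32_t adc_clk_hz(void) { return pll_hz() / ADC_DIV; }   /* integer division truncates */
```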


@LCE wrote:

I would also try to rule out any external influence from the drivers and power supplies, ground bounces, whatever. So a poti sounds like a good idea.

Are you sure that the DMA buffer size gives you enough time to handle the data?

 

@Ozone said:

The xxx1.1111 pattern is just one bit short of a carry-over to the next MSB,
> so I would not exclude a problem of the individual MCU. 

 

Nice catch!

If this was an FPGA, I would check the internal flip-flop placement and make the clock constraints stricter.

With the MCU I would play with everything that might affect timing:

- ADC clock prescaler - maybe increase this and reduce:

- sampling time

- DMA buffer size

- interrupt priorities


Yes, there is plenty of time. I've used the trick of pulling a pin high at the beginning of the interrupt, and low at the end. Very little time – perhaps 1 % of the clock period or less – is spent in the interrupt.

A colleague fed my original post into some AI thing, which actually returned some interesting hints. While it had crossed my mind that the ADC is of the SAR type, I had not linked the problem to the SAR comparator. But this fits very well. If something running in sync with the ADC (and TIM7) is disturbing the ADC, sometimes making the comparator generate a wrong bit in the middle of the conversion, then the remaining bits should all become either 0 or 1. If it always happens at the same time, in relation to the ADC trigger, then the same number of bits would be at fault every time, and the interval between the faulty values would be 2ⁿ. I have seen the intervals 64 and 128, but I'm pretty sure that the microcontroller had been restarted between these tests.
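To make that reasoning concrete, here is an idealised 12-bit SAR model in which one comparator decision can be forced to "input is lower than the trial value". It is only a sketch of the principle, not a description of the G4's actual ADC internals, and `sar_convert` and `bad_bit` are names invented for the illustration:

```c
#include <assert.h>
#include <stdint.h>

#define ADC_BITS 12

/* Idealised SAR conversion of a true code `vin` (0..4095).
 * If bad_bit >= 0, the comparator decision for that bit is forced to
 * "input is lower than trial", modelling a disturbance at that step.
 * Pass bad_bit = -1 for an undisturbed conversion. */
uint16_t sar_convert(uint16_t vin, int bad_bit)
{
    assert(vin < (1u << ADC_BITS) && bad_bit < ADC_BITS);
    uint16_t code = 0;
    for (int bit = ADC_BITS - 1; bit >= 0; bit--) {
        uint16_t trial = (uint16_t)(code | (1u << bit));
        int keep = (vin >= trial);   /* correct comparator decision */
        if (bit == bad_bit)
            keep = 0;                /* disturbed decision: bit wrongly cleared */
        if (keep)
            code = trial;
    }
    return code;
}
```

With the decision for bit 6 disturbed, a true value of 0x7C5 comes out as 0x7BF and 0x845 comes out as 0x83F: bit 6 is wrongly cleared and every lower bit then resolves to 1, so the faulty results sit 128 counts apart. A true value whose bit 6 is already 0, such as 0x7A5, is unaffected, which would explain why the errors only show up in certain windows. Disturbing bit 5 instead gives the 64-count spacing.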

Now I have some things I wish to try, and I may have a work-around in mind. If I find a work-around (and, if possible, a reason), I will post it here.

170/6 isn't 56.6 MHz, but please check that you fit within the ADC clock limits in datasheet chapter 5.3.19, Analog-to-digital converter characteristics. If it matters, I am personally a bit sceptical about that asynchronous clock mode. Add to that several errata issues about ADC interference. Also, AN5346 has quite a lot of information about setting up multiple ADCs properly, and it further explains the errata issue that you pointed out. By the way, HRTIM has some errata as well.

If you have a way to reproduce the issue, then you could try to isolate the problem by letting only ADC4 run or change the clocks, modes, sequences, etc. to see if anything has effect on the issue.