Use of "volatile" keyword

MSull.1 · ‎2021-09-16

Hi, I'm working on an STM32H7 project with a very fast (800 kHz) periodic interrupt. I've been looking at the generated code as I tweak the C source, and I'm confused about the use of "volatile" for the global variables used by the ISR.

I feel like I need to specify volatile on the variables that the ISR uses so that the foreground code doesn't assume a value won't change between instructions and knows that results must be stored promptly. On the other hand, marking a variable volatile causes the generated code in the ISR to contain a lot of unnecessary loads from memory. For example, the C sequence

if (myvar != 0)
{
  myvar -= 1;
  if (myvar == 0)
  {
    ...
  }
}

generates three loads of myvar to a register when myvar is volatile. Without volatile, it gets loaded into a register one time.

I kind of suspect the answer is not to worry about the foreground and just not specify volatile at all. But I'm wondering if there is a way to get the compiler to ensure these variables get treated as volatile in the foreground but not generate all of this redundant load/store activity in the ISR.

TDK · ‎2021-09-16

uint32_t myvar_shadow = myvar;   /* single load from the volatile variable */
if (myvar_shadow != 0)
{
  myvar_shadow -= 1;
  myvar = myvar_shadow;          /* single store back to memory */
  if (myvar_shadow == 0)
  {
    ...
  }
}

Mark the variable as volatile. When you manipulate it, copy it into a local variable, work on the copy, then store the result back. The local copy will be optimized into a register, leaving a single load and a single store instruction to the memory address.
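
A different technique from the shadow copy, sketched here only for comparison, takes the original question more literally: leave the variable non-volatile so the ISR is optimized freely, and make the foreground go through volatile-qualified pointers. The helper names (read_once_u32, write_once_u32, rate_isr_step, foreground_arm, foreground_is_idle) and the uint32_t type are assumptions for illustration, and the approach relies on the compiler honoring accesses made through volatile lvalues, as GCC and Clang do.

```c
#include <stdint.h>

/* Deliberately NOT volatile: the ISR may keep it in a register. */
uint32_t myvar;

/* Foreground accessors (hypothetical names): the volatile-qualified pointer
 * forces one real load or store at each call site. */
static inline uint32_t read_once_u32(const uint32_t *p)
{
    return *(const volatile uint32_t *)p;
}

static inline void write_once_u32(uint32_t *p, uint32_t value)
{
    *(volatile uint32_t *)p = value;
}

/* Stand-in for the ISR body: plain accesses, so the compiler is free to
 * load myvar once and store it once. Returns 1 when the count hits zero. */
int rate_isr_step(void)
{
    if (myvar != 0u) {
        myvar -= 1u;
        if (myvar == 0u) {
            return 1;
        }
    }
    return 0;
}

/* Foreground: arm the countdown and poll it without stale register copies. */
void foreground_arm(uint32_t count)
{
    write_once_u32(&myvar, count);
}

int foreground_is_idle(void)
{
    return read_once_u32(&myvar) == 0u;
}
```

The trade-off is that every foreground access has to remember to use the helpers, whereas a volatile declaration enforces this everywhere.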


MSull.1 · ‎2021-09-16

Doh! That makes perfect sense. Thanks.

alister · ‎2021-09-17

Yes, you'll need volatile where you need it. You've studied the assembly, so you understand what it's doing and why it's required.

Need to consider atomicity too.
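
As a hedged illustration of the atomicity point: volatile does not make a read-modify-write indivisible, so a foreground update of a variable the ISR also writes needs a short critical section. The sketch below assumes GCC or Clang inline assembly on a Cortex-M target and uses placeholder stubs elsewhere; irq_save, irq_restore and foreground_add are hypothetical names, not from the original posts.

```c
#include <stdint.h>

#if defined(__ARM_ARCH_PROFILE) && (__ARM_ARCH_PROFILE == 'M')
/* Cortex-M: remember PRIMASK, then mask interrupts. */
static inline uint32_t irq_save(void)
{
    uint32_t primask;
    __asm__ volatile ("mrs %0, primask\n\tcpsid i"
                      : "=r" (primask) : : "memory");
    return primask;
}

/* Restore the interrupt mask that was in effect before irq_save(). */
static inline void irq_restore(uint32_t primask)
{
    __asm__ volatile ("msr primask, %0" : : "r" (primask) : "memory");
}
#else
/* Host build: there are no interrupts to mask, so these are placeholders. */
static inline uint32_t irq_save(void) { return 0u; }
static inline void irq_restore(uint32_t primask) { (void)primask; }
#endif

volatile uint32_t myvar;   /* decremented by the ISR */

/* Foreground read-modify-write. Without the critical section, an interrupt
 * landing between the load and the store would have its decrement
 * overwritten by the stale foreground value. */
void foreground_add(uint32_t extra)
{
    uint32_t primask = irq_save();
    myvar = myvar + extra;
    irq_restore(primask);
}
```

The masked window is only a few instructions long, which matters when the interrupt period is short.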

800 kHz seems hot. Couldn't you use a GPIO input capture or some other peripheral instead?

S.Ma · ‎2021-09-18

At the cost of portability, if your global variable can be read in one memory cycle, you can in certain cases remove volatile, for example when it is written by the interrupt and only read by the main loop. Sometimes you can also use flags: the interrupt sets a flag, and the data is preserved until main has read it and cleared the flag.
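
The flag scheme could look roughly like the sketch below. All names (sample_value, sample_ready, sample_isr, sample_poll) are made up for illustration, and the sketch conservatively keeps volatile on both objects rather than dropping it.

```c
#include <stdbool.h>
#include <stdint.h>

static volatile uint32_t sample_value;   /* written by the ISR */
static volatile bool     sample_ready;   /* set by the ISR, cleared by main */

/* Stand-in for the ISR: publish a new value only when the previous one has
 * been consumed, so the data stays stable while main reads it. */
void sample_isr(uint32_t new_value)
{
    if (!sample_ready) {
        sample_value = new_value;
        sample_ready = true;      /* set the flag last */
    }
}

/* Main loop: returns true and fills *out when a fresh value was waiting. */
bool sample_poll(uint32_t *out)
{
    if (!sample_ready) {
        return false;
    }
    *out = sample_value;          /* read the data first */
    sample_ready = false;         /* then hand the slot back to the ISR */
    return true;
}
```

In this variant a value that arrives while the flag is still set is dropped; whether to drop or overwrite depends on the application.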

An 800 kHz interrupt has a high probability of falling apart once all the other interrupt sources and the atomic or interrupt-disabled code sections add up and cause interrupt overrun. Use the available hardware to scale down the interrupt rate.

Another use of volatile is to keep special code in flash that can be activated in debug mode.
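
One possible reading of this, as a sketch with hypothetical names (debug_dump_enable, debug_dump, main_loop_step): a volatile switch that is always 0 in normal operation keeps the guarded code in the image, and a debugger can write 1 to the switch to activate it.

```c
#include <stdint.h>

/* Hypothetical switch: always 0 in normal operation; a debugger can set it.
 * It is static and never written by the program, so without volatile the
 * compiler could treat it as a constant 0 and discard the guarded code. */
static volatile uint32_t debug_dump_enable = 0u;

/* Hypothetical counter so the effect is observable. */
static uint32_t debug_dump_count = 0u;

static void debug_dump(void)
{
    debug_dump_count++;   /* a real build might print state over a UART here */
}

void main_loop_step(void)
{
    /* The volatile read must be performed on every pass, so both the test
     * and the call it guards stay in the generated code. */
    if (debug_dump_enable != 0u) {
        debug_dump();
    }
}
```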

MSull.1 · ‎2021-09-19

I know this is a pretty fast interrupt, but I did get it working well. The ISR is implementing a rate generator, so there's not really a hardware alternative, although I did consider using an FPGA for this project. I chose the STM32H7 because it's fast enough to do this in an ISR yet less expensive than an SoC. The ISR consumes about 10% of the CPU, which is a lot, but the rest of the application doesn't require high performance, so it works out well. This is a 480 MHz CPU, and the rest of the functionality worked fine on a previous implementation in a 70 MHz 16-bit PIC. So I can certainly spare 10%.
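
For a sense of scale, the figures quoted above translate into a cycle budget as in this small sketch. The macro and function names are illustrative, and the 10% is the approximate figure from the post, not a new measurement.

```c
#include <stdint.h>

#define CPU_HZ        480000000u   /* core clock quoted above */
#define ISR_RATE_HZ      800000u   /* interrupt rate quoted above */
#define ISR_LOAD_PCT         10u   /* approximate CPU share quoted above */

/* Core cycles between two consecutive interrupts. */
static inline uint32_t cycles_per_period(void)
{
    return CPU_HZ / ISR_RATE_HZ;
}

/* Cycles spent per interrupt, as implied by the quoted CPU share. */
static inline uint32_t cycles_per_isr(void)
{
    return cycles_per_period() * ISR_LOAD_PCT / 100u;
}
```

With these numbers the period is 600 cycles and the ISR takes roughly 60 of them, which is why each redundant load in the handler is worth looking at.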

There are other interrupts happening for timers, UARTs, and the USB peripheral. By setting the 800 kHz interrupt to a high preemptive priority (0) and the others to the minimum preemptive priority (3), I only have two-deep nesting at any given time and the 800 kHz interrupt never gets deferred.

Some of the variables are only accessed in the ISR, so those aren't marked volatile. For the variables that are accessed in both the foreground and the ISR, I marked them volatile and used an explicit local register copy as TDK suggested. It probably doesn't matter, but I marked those local copies with the "register" storage class. I located all of the variables accessed by the ISR in the DTCM.

I had to use a DMB (data memory barrier) to get this to work right. Otherwise, weird stuff happened, which I interpreted as some kind of deferred write not completing by the time the next interrupt occurs. I really don't understand this. I thought the variables in DTCM would be basically as fast as registers.

So, it's one of those cases where it works well and I need to move on with the rest of the project rather than spending more time trying to get rid of the DMB. I am still curious, though.

Piranha · ‎2021-09-25

The volatile qualifier is a restriction on the compiler and can force a specific order of instructions in the compiled code, but that doesn't mean the CPU will execute those instructions in the same order. Simpler cores did not reorder, but the Cortex-M7 is dual-issue and actually does. To solve this, one has to deal with the memory types. DTCM is always treated as the Normal memory type, but the order of operations is only guaranteed for the Device and Strongly-ordered memory types. And yes, the DMB instruction is the correct solution.
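
A minimal sketch of how the barrier might be combined with the shadow-copy pattern from earlier in the thread follows. Where exactly the original handler placed its DMB is not stated, so the position at the end of the handler is an assumption, as are the names DATA_MEMORY_BARRIER and rate_isr_step. The sketch assumes GCC or Clang; in a CMSIS project the __DMB() intrinsic serves the same purpose.

```c
#include <stdint.h>

#if defined(__ARM_ARCH_PROFILE) && (__ARM_ARCH_PROFILE == 'M')
/* Cortex-M: data memory barrier. The "memory" clobber also stops the
 * compiler from moving memory accesses across it. */
#define DATA_MEMORY_BARRIER()  __asm__ volatile ("dmb sy" : : : "memory")
#else
/* Host build: compiler-only barrier as a placeholder. */
#define DATA_MEMORY_BARRIER()  __asm__ volatile ("" : : : "memory")
#endif

volatile uint32_t myvar;   /* shared between the ISR and the foreground */

/* Stand-in for the ISR body. Returns 1 when the count reaches zero. */
int rate_isr_step(void)
{
    uint32_t shadow = myvar;     /* single load */
    int expired = 0;

    if (shadow != 0u) {
        shadow -= 1u;
        myvar = shadow;          /* single store */
        expired = (shadow == 0u);
    }

    DATA_MEMORY_BARRIER();       /* order the store before what follows */
    return expired;
}
```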

Read more about it in AN4838, AN4839 and here:

https://community.st.com/s/question/0D50X0000C4Nk4GSQS/bug-missing-compiler-and-cpu-memory-barriers