Alan Chambers

Efficient typesafe bitfields using C++: Part 1

Blog Post created by Alan Chambers on Mar 30, 2018

As a long time advocate for C++ in embedded development, I thought it might be fun to explore something that's been bothering me for a while...

Background

The STM32 microcontrollers I've used (mainly F4s) have a lot of memory mapped hardware registers, and many of those registers contain two or more bit fields. There are several ways of dealing with bit fields in software, but they all basically (and necessarily) boil down to masking and shifting one or more bits at compile time, at run time, or some combination of the two. This is often done in conjunction with hardware base addresses cast to pointers to C structs whose layouts more or less map onto the underlying hardware peripherals. Files like stm32f4xx.h are good examples of this.

Some fields can be regarded as N-bit integers for which all possible bit combinations are valid. Others are integers for which only a range of values is valid (e.g. RCC.PLLCFGR.PLLN; this notation denotes Peripheral.Register.Field). Quite a lot of fields have enumerated values with meaningful names, and it may be that only certain specific bit combinations are valid.

In addition to having a distinct type, many fields can also be regarded as elements of an array of fields of the same type, which may reside in a single register or be spread across two or more registers, such as GPIO.MODER.MODE[n] or GPIO.AFRx.AF[n]. These arrays may or may not be contiguous in memory (e.g. SYSCFG.EXTICRx.EXTI[n] is non-contiguous).

It is awfully easy to place invalid values in fields, or to place the intended values at the wrong location by shifting them incorrectly, and the C compiler will be absolutely silent about most such errors. Libraries such as the Standard Peripheral Library (SPL), HAL and LL go some way to reducing the potential for error by defining enumerations for some of the fields, and/or by providing an API of meaningfully named functions which take care of the bit twiddling needed to update one or more fields. The downside is that libraries can also obscure what is going on. I've used the SPL a fair bit, but was only happy with it once I had stepped through enough of it to understand what it actually does. I have not used LL, but this function from the documentation looks easy enough to understand:

// Make PA4 an output. The pin and mode parameters are both plain uint32_t.
LL_GPIO_SetPinMode(GPIOA, LL_GPIO_PIN_4, LL_GPIO_MODE_OUTPUT);

The mission

I would like to be able to treat the individual fields of a register in much the same way as I would treat the individual members of a struct. Each field should have a distinct type, which the compiler should enforce. All the bit twiddling to mask and shift values should be quietly managed in the background. Arrays of fields which spread over multiple registers should also be managed for the caller.

That is to say, I want to be able to write code something like these snippets:

// Make PA4 an output. Pin and Mode are scoped enumerations. 
// Other types won't compile.
GPIOA.MODER.MODE[Pin::P4] = Mode::Output;

// Turn on the HSI and wait for it to be ready.
// Both fields here are type bool.
RCC.CR.HSION = true;
while (!RCC.CR.HSIRDY);

And, naturally, I want this to be a cheap abstraction, with object code comparable to equivalent C.

In principle, C/C++'s native bit fields can go some way towards providing this functionality: the compiler silently inserts the masking and shifting necessary to access field values correctly. Native bit fields are implemented very efficiently, but are not guaranteed to have the memory layout you expect. I understand from the C standard that the layout (for example, the order in which fields are allocated within a word) is implementation dependent rather than hardware dependent, so it seems odd that the CMSIS file core_cm4.h uses native bit fields. Perhaps my understanding is incorrect; one likely explanation is that the ARM procedure call standard (AAPCS) pins down bit-field layout for ARM targets, so CMSIS can rely on conforming ARM compilers agreeing on it.
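For comparison, a native bit-field description of a register might look like the minimal sketch below. The struct name CR_Bits and the chosen bits are my own illustration, not copied from stm32f4xx.h or core_cm4.h, and a local object stands in for the register so that the sketch runs on any host. The syntax is exactly what we want; the catch is that the language itself does not say which physical bit HSION ends up in.

```cpp
#include <cstdint>

// Hypothetical register described with native bit fields. Which physical
// bit each member occupies is implementation-defined.
struct CR_Bits
{
    uint32_t HSION  : 1;  // intended to be bit 0
    uint32_t HSIRDY : 1;  // intended to be bit 1
    uint32_t        : 30; // remaining bits (unnamed padding)
};

// On a target this object would be overlaid on the register, e.g. via a
// volatile CR_Bits* made from the register address. Here a local object
// stands in for it so that the sketch runs on any host.
inline bool set_and_read_hsion()
{
    CR_Bits cr{};   // all fields start at zero
    cr.HSION = 1;   // the compiler generates the mask-and-shift
    return cr.HSION == 1 && cr.HSIRDY == 0;
}
```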

Using a template to represent a field

Most of the information about a bit field is known at compile time:

  • The type of the underlying register - I'll assume this to be uint32_t, but it is easy to generalise
  • The address of the register to which it belongs
  • The number of bits in the bit field
  • The start index of the bit field within the register
  • The type of the value which can be stored in the bit field
  • The default or reset value of the bit field - I'll ignore this because the value is set by hardware

A neat way to capture all this information and let the compiler do its thing as efficiently as possible is to use a C++ template. The bare bones of it look something like this:

template <uint32_t BASEADDR, uint8_t FLDSTART, uint8_t FLDSIZE>
struct Field
{
    ...
};

Field is nothing more than an empty struct. In C++ an empty struct has a size of at least 1 (in practice exactly 1), which may or may not be important later. The key thing is that the struct is parameterised with three compile time constants:

  • BASEADDR is the address of the register to which the field belongs.
  • FLDSTART is the start index of the bit field within that register.
  • FLDSIZE is the number of bits in the bit field.

We could - but won't - use a bit field like this:

void foo()
{   
    // This field is bits [6:4] of the register at 0x40002000.   
    Field<0x40002000, 4, 3> FLD;   
    ...
}

More realistically, we might represent a register as a collection of bit fields like this:

// All the fields have the same register address.
template <uint32_t BASEADDR>
union Register
{
    Field<BASEADDR, 0,  4> FLD1; // Bits [3:0]
    Field<BASEADDR, 4,  3> FLD2; // Bits [6:4]
    Field<BASEADDR, 7, 20> FLD3; // Bits [26:7]
};

// Identical registers at different locations.
Register<0x40002000> REG1;
Register<0x40002004> REG2;

void bar()
{
    REG1.FLD1 =   7;
    REG2.FLD3 = 120;
}

Here we have three fields as notional data members of a union. Using a union means that the register is the same size as a single field; I'm not sure this matters much in practice. I say "notional" because the field objects do not actually hold any data: rather, they are our interface for reading and writing the fields they represent. If each field did hold data (i.e. a uint32_t), a union would be essential to ensure that the three representations occupied the same memory. The distinction is largely irrelevant, though: client code neither knows nor cares whether the field objects actually contain data; they only have to look as if they do.

Now we need to add some features to the struct to actually do some work for us. First up, calculate some additional constants at compile time: the maximum value of the field, which has all FLDSIZE bits set to one, and the mask for the field, which is the maximum value shifted left by the start index.

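    // All ones in the low FLDSIZE bits. Shifting right avoids the undefined
    // behaviour of (1 << 32) when a field spans the whole register.
    constexpr static uint32_t MAX = 0xFFFFFFFFu >> (32 - FLDSIZE);
    constexpr static uint32_t MASK = MAX << FLDSTART;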
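As a worked example, take the 3-bit field at bits [6:4] that appears in the snippets above: the maximum value is 0b111 = 7, and the mask is 7 << 4 = 0x70. The short sketch below checks this at compile time. The helper functions field_max and field_mask are hypothetical, written only to mirror the MAX and MASK expressions; the right-shift form means a full 32-bit field never shifts by 32.

```cpp
#include <cstdint>

// Hypothetical helper mirroring MAX: all ones in the low 'size' bits.
// Valid for size 1..32.
constexpr uint32_t field_max(unsigned size)
{
    return 0xFFFFFFFFu >> (32u - size);
}

// Hypothetical helper mirroring MASK: MAX moved up to the start index.
constexpr uint32_t field_mask(unsigned start, unsigned size)
{
    return field_max(size) << start;
}

// Bits [6:4] of a register: a 3-bit field starting at bit 4.
static_assert(field_max(3) == 7u, "a 3-bit field holds 0..7");
static_assert(field_mask(4, 3) == 0x70u, "mask covers bits 4, 5 and 6");

// A field spanning the whole register is also fine.
static_assert(field_mask(0, 32) == 0xFFFFFFFFu, "full-width field");
```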

To read the value of the bit field, we overload a conversion (cast) operator. This allows field objects to be implicitly converted to uint32_t for assignments and other operations. The function casts BASEADDR to a pointer, which is then dereferenced, and the result is masked and shifted to obtain the value of the field. The pointer is to volatile data, so the compiler must perform every access rather than caching the register's value. reinterpret_cast is preferred to C-style casting, but is still pretty brutal: here it reinterprets the underlying bit pattern to convert an integer to a pointer:

    operator uint32_t() const
    {
        volatile uint32_t* ptr = reinterpret_cast<volatile uint32_t*>(BASEADDR);
        return ((*ptr & MASK) >> FLDSTART);
    }

To write the value of the field, we overload the assignment operator to accept a uint32_t parameter. This allows uint32_t values to be assigned to the object without a cast or an explicit function call. As before, the function casts BASEADDR to a pointer, and then does a read-modify-write: it clears the field's bits in the current register value and ORs in the new value, masked and shifted into position.

    Field& operator=(const uint32_t val)
    {
        volatile uint32_t* ptr = reinterpret_cast<volatile uint32_t*>(BASEADDR);
        *ptr = (*ptr & ~MASK) | ((MAX & val) << FLDSTART);
        return *this;
    }
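To put numbers on the read-modify-write, the sketch below applies the same two expressions to an ordinary variable standing in for the register (write_field, read_field and demo are hypothetical names, not part of the template). Writing 5 into bits [6:4] of a register holding 0xFFFFFFFF clears those three bits and inserts 0b101, leaving 0xFFFFFFDF; reading the field back gives 5.

```cpp
#include <cstdint>

constexpr uint32_t START = 4;            // field occupies bits [6:4]
constexpr uint32_t MAX   = 0x7;          // 3-bit field
constexpr uint32_t MASK  = MAX << START; // 0x70

// Same expression as operator=, applied to a plain variable.
inline void write_field(volatile uint32_t& reg, uint32_t val)
{
    reg = (reg & ~MASK) | ((MAX & val) << START);
}

// Same expression as the conversion operator.
inline uint32_t read_field(const volatile uint32_t& reg)
{
    return (reg & MASK) >> START;
}

inline uint32_t demo()
{
    volatile uint32_t reg = 0xFFFFFFFFu;
    write_field(reg, 5);   // bits [6:4] become 0b101
    return reg;
}
```

Note that masking with MAX silently truncates out-of-range values: writing 9 (0b1001) to this 3-bit field stores 1.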

And that's basically the whole idea in a nutshell. Here is the complete definition of Field:

#include <cstdint>

template <uint32_t BASEADDR, uint8_t FLDSTART, uint8_t FLDSIZE>
struct Field
{
    // All ones in the low FLDSIZE bits, and those bits moved into place.
    constexpr static uint32_t MAX = 0xFFFFFFFFu >> (32 - FLDSIZE);
    constexpr static uint32_t MASK = MAX << FLDSTART;

    // Read: fetch the register, isolate the field, shift it down to bit 0.
    operator uint32_t() const
    {
        volatile uint32_t* ptr = reinterpret_cast<volatile uint32_t*>(BASEADDR);
        return ((*ptr & MASK) >> FLDSTART);
    }

    // Write: read-modify-write that leaves the other fields untouched.
    Field& operator=(const uint32_t val)
    {
        volatile uint32_t* ptr = reinterpret_cast<volatile uint32_t*>(BASEADDR);
        *ptr = (*ptr & ~MASK) | ((MAX & val) << FLDSTART);
        return *this;
    }
};
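Because BASEADDR is a real hardware address, the template can't be exercised as it stands on a desktop machine. One way to try it off-target, sketched below under my own assumptions, is to replace the integer address parameter with a pointer to an ordinary global variable that plays the part of the register. The variable fake_reg and this pointer-parameter variant are hypothetical; the operator bodies are otherwise the same as above, and the Register union has the same shape as the earlier one.

```cpp
#include <cstdint>

// Stand-in for a memory mapped register (hypothetical, for host testing).
volatile uint32_t fake_reg = 0;

// Same as Field, except that the register is identified by a pointer
// template parameter instead of an integer address.
template <volatile uint32_t* REG, uint8_t FLDSTART, uint8_t FLDSIZE>
struct Field
{
    constexpr static uint32_t MAX  = 0xFFFFFFFFu >> (32 - FLDSIZE);
    constexpr static uint32_t MASK = MAX << FLDSTART;

    // Read: isolate the field's bits and shift them down to bit 0.
    operator uint32_t() const
    {
        return (*REG & MASK) >> FLDSTART;
    }

    // Write: clear the field's bits, then OR in the new value.
    Field& operator=(const uint32_t val)
    {
        *REG = (*REG & ~MASK) | ((MAX & val) << FLDSTART);
        return *this;
    }
};

// Same layout as the earlier Register union.
template <volatile uint32_t* REG>
union Register
{
    Field<REG, 0,  4> FLD1; // Bits [3:0]
    Field<REG, 4,  3> FLD2; // Bits [6:4]
    Field<REG, 7, 20> FLD3; // Bits [26:7]
};

Register<&fake_reg> REG1;
```

Since the "register" is just a variable, you can check its contents after each assignment: after REG1.FLD1 = 7 and REG1.FLD3 = 120 it should hold (120 << 7) | 7 = 0x3C07.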

If you want them, other operators such as &= and ++ are very simple to overload.
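For example, &= and prefix ++ could be built on top of the two existing operators along the following lines. This is a hedged sketch rather than tested production code; it replaces BASEADDR with a pointer to a hypothetical global fake_reg so that it can run off-target.

```cpp
#include <cstdint>

volatile uint32_t fake_reg = 0; // stand-in for a hardware register

template <volatile uint32_t* REG, uint8_t FLDSTART, uint8_t FLDSIZE>
struct Field
{
    constexpr static uint32_t MAX  = 0xFFFFFFFFu >> (32 - FLDSIZE);
    constexpr static uint32_t MASK = MAX << FLDSTART;

    operator uint32_t() const { return (*REG & MASK) >> FLDSTART; }

    Field& operator=(const uint32_t val)
    {
        *REG = (*REG & ~MASK) | ((MAX & val) << FLDSTART);
        return *this;
    }

    // Compound assignment built from the read and write operators.
    Field& operator&=(const uint32_t val)
    {
        return *this = static_cast<uint32_t>(*this) & val;
    }

    // Prefix increment. operator= masks with MAX, so the field wraps
    // round to zero when incremented past its maximum value.
    Field& operator++()
    {
        return *this = static_cast<uint32_t>(*this) + 1;
    }
};

Field<&fake_reg, 4, 3> FLD2; // bits [6:4]
```

Each of these performs a read followed by a read-modify-write of the register, so, like the plain assignment, it is not atomic with respect to interrupts.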

Refining the template

There are a couple of things I want to change about this struct.
The first is to generalise the data type of the field. At the moment it is assumed to be uint32_t, but it is better to support a variety of types, including smaller integers, signed integers, boolean values and, importantly, enumerations. This is quite straightforward: we just add another template parameter for the type of the field, and modify the two operators to return and accept values of that type. The type parameter can be defaulted to uint32_t if it is not specified.

The second thing I want to do is add some compile time assertions to help avoid errors. Although I haven't done so, it would be trivial to add a run time assertion in the assignment operator to check that the value is in range. After the changes, the code looks like this:

#include <cstdint>
#include <type_traits>

template <uint32_t BASEADDR, uint8_t FLDSTART, uint8_t FLDSIZE,
    typename FLDTYPE = uint32_t>
struct Field
{
    static_assert(std::is_enum<FLDTYPE>::value || std::is_integral<FLDTYPE>::value,
        "Field type must be integral or enum");
    static_assert(sizeof(FLDTYPE) <= sizeof(uint32_t), "Field type too large");
    static_assert(FLDSIZE > 0, "Field must be at least one bit wide");
    static_assert((FLDSIZE + FLDSTART) <= 32, "Field size plus offset too large");

    // All ones in the low FLDSIZE bits (no shift by 32 for a full-width field).
    constexpr static uint32_t MAX = 0xFFFFFFFFu >> (32 - FLDSIZE);
    constexpr static uint32_t MASK = MAX << FLDSTART;

    // Note: there is no sign extension, so a negative value stored in a
    // signed field reads back as a positive number.
    operator FLDTYPE() const
    {
        volatile uint32_t* ptr = reinterpret_cast<volatile uint32_t*>(BASEADDR);
        return static_cast<FLDTYPE>((*ptr & MASK) >> FLDSTART);
    }

    Field& operator=(const FLDTYPE val)
    {
        volatile uint32_t* ptr = reinterpret_cast<volatile uint32_t*>(BASEADDR);
        *ptr = (*ptr & ~MASK) | ((MAX & static_cast<uint32_t>(val)) << FLDSTART);
        return *this;
    }
};
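To try the refined template off-target, the sketch below swaps BASEADDR for a pointer to a hypothetical global fake_reg, and adds the run time range check mentioned above as a plain assert() in the assignment operator. Both changes are my assumptions, not part of the template as written; the enum class Mode matches the one used in the test drive.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

volatile uint32_t fake_reg = 0; // stand-in for a hardware register

template <volatile uint32_t* REG, uint8_t FLDSTART, uint8_t FLDSIZE,
    typename FLDTYPE = uint32_t>
struct Field
{
    static_assert(std::is_enum<FLDTYPE>::value || std::is_integral<FLDTYPE>::value,
        "Field type must be integral or enum");
    static_assert(sizeof(FLDTYPE) <= sizeof(uint32_t), "Field type too large");
    static_assert(FLDSIZE > 0, "Field must be at least one bit wide");
    static_assert((FLDSIZE + FLDSTART) <= 32, "Field size plus offset too large");

    constexpr static uint32_t MAX  = 0xFFFFFFFFu >> (32 - FLDSIZE);
    constexpr static uint32_t MASK = MAX << FLDSTART;

    operator FLDTYPE() const
    {
        return static_cast<FLDTYPE>((*REG & MASK) >> FLDSTART);
    }

    Field& operator=(const FLDTYPE val)
    {
        // Run time range check (assumes an unsigned or enum field type).
        // Compiled out when NDEBUG is defined.
        assert(static_cast<uint32_t>(val) <= MAX);
        *REG = (*REG & ~MASK) | ((MAX & static_cast<uint32_t>(val)) << FLDSTART);
        return *this;
    }
};

enum class Mode { Input, Output, Alternate, Analog };

Field<&fake_reg, 2, 2, Mode> MODE1; // bits [3:2]
```

With NDEBUG defined the check disappears entirely, so release builds pay nothing for it.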

Test driving the template

OK. So now it's time to take this beast for a spin. The following code contains a partial representation of a GPIO peripheral which makes use of the Field template, and a plain C representation of the same peripheral, which is meant to be similar to the code in stm32f4xx.h. The two functions at the end each set the value of the same field, first C style and then C++ style.

// C++ representation of a peripheral
enum class Mode { Input, Output, Alternate, Analog };
enum class OType { PushPull, OpenDrain };

template <uint32_t BASE>
union GPIO_T
{
    union MODER_T
    {
        constexpr static uint32_t ADDR = BASE + 0x00;
        Field<ADDR, 0, 2, Mode> MODE0;
        Field<ADDR, 2, 2, Mode> MODE1; // ...
    } MODER;
    union OTYPER_T
    {
        constexpr static uint32_t ADDR = BASE + 0x04;
        Field<ADDR, 0, 1, OType> OT0;
        Field<ADDR, 1, 1, OType> OT1;
    } OTYPER;
};

GPIO_T<0x40002000> GPIOA;

// C representation of the same peripheral - kind of like CMSIS
struct GPIO
{
    volatile uint32_t MODER;
    volatile uint32_t OTYPER;
};

#define GPIOA2 ((GPIO*)(0x40002000))
#define GPIO_MODER_MAX     0x03
#define GPIO_MODER_OUTPUT  0x01
#define GPIO_MODER1_START     2
#define GPIO_MODER1_MASK   0x0C // ...

// Don't let the compiler optimise everything!
volatile Mode    mode  = Mode::Output;
volatile uint8_t cmode = GPIO_MODER_OUTPUT;

void foo_c()
{   
    GPIOA2->MODER = (GPIOA2->MODER & ~GPIO_MODER1_MASK) |
        ((cmode & GPIO_MODER_MAX) << GPIO_MODER1_START);
}

void foo_cpp()
{   
    GPIOA.MODER.MODE1 = mode;
}

For my money, the C++ version wins hands down. It is simple, intuitive and much less prone to error. Mission accomplished! I'll admit that an SPL, HAL or LL function could also hide the bit twiddling (they do) and be just as brief and easy to understand but, even if they were type safe, they still wouldn't be as pretty in my view. And the field definition would be implicit in a load of macros, rather than explicit in the field declaration. Ah, but wait... Granted, it looks nice and all, but how efficient is it? I've heard that templates cause code bloat.
I compiled the code online with the unbelievably excellent Compiler Explorer, which quickly shows the object code a snippet produces. I used ARM GCC 7.2.1 with the options -O -std=c++11. The standard is specified just to make sure I stick to features no later than C++11 (for no particular reason). The template works fine with C++98, but you would have to drop the static assertions (or implement your own), replace constexpr with const, and do without scoped enumerations (enum class).
The output speaks for itself: the two function implementations are pretty much identical. The template bit field abstraction has zero cost! In fact, as if to prove the point about safety, I accidentally mixed up the MAX and MASK C macros when preparing this example. I only spotted the typo because the compiler output was significantly different from what I expected for the C code.

foo_c():
        ldr     r1, .L2
        ldr     r2, [r1]
        ldr     r3, .L2+4
        ldrb    r3, [r3]        @ zero_extendqisi2
        lsl     r3, r3, #2
        and     r3, r3, #12
        bic     r2, r2, #12
        orr     r3, r3, r2
        str     r3, [r1]
        bx      lr
.L2:
        .word   1073750016
        .word   .LANCHOR0
foo_cpp():
        ldr     r3, .L5
        ldr     r3, [r3, #4]
        ldr     r1, .L5+4
        ldr     r2, [r1]
        lsl     r3, r3, #2
        and     r3, r3, #12
        bic     r2, r2, #12
        orr     r3, r3, r2
        str     r3, [r1]
        bx      lr
.L5:
        .word   .LANCHOR0
        .word   1073750016
cmode:
        .byte   1
mode:
        .word   1
GPIOA:

Personally, even as an advocate of C++, I was a bit surprised that it turned out quite as well as this. Other compilers may not do such a good job, especially older ones.
To be fair, the picture is not quite so pretty when optimisation is turned off. It's not terrible, but the operator function is not inlined, and the code is less efficient. The C version is also less efficient, of course. And every instantiation of the template with different parameters gets its own copy of this function. On the plus side, the compiler only generates code for the template functions you actually call, but it's not hard to see how this could be accused of being a bit bloated. But then again, you're going to optimise your code for release, aren't you? Or perhaps you don't get involved in the whole premature optimisation thing.
Finally for the example code, we need to demonstrate type safety. The enumerators of a scoped enumeration (enum class) are qualified with the name of the enumeration, and they are much more strongly type-checked than old-school enumerations. The compiler will not implicitly convert integer types to enumerations, or vice versa, and will not implicitly convert an enumeration of one type into another. You can always cast if necessary, which the template does internally:

void foo_cpp()
{   
    GPIOA.MODER.MODE1 = mode;
    GPIOA.MODER.MODE1 = 1;         // Invalid type - doesn't compile

    Mode    x = GPIOA.MODER.MODE0;
    uint8_t y = GPIOA.MODER.MODE0; // Invalid type - doesn't compile
    OType   z = GPIOA.MODER.MODE0; // Invalid type - doesn't compile
          
    GPIOA.OTYPER.OT0  = OType::PushPull;      
    GPIOA.OTYPER.OT0  = 1;         // Invalid type - doesn't compile       
    GPIOA.OTYPER.OT1  = mode;      // Invalid type - doesn't compile
}
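The "doesn't compile" claims can also be checked without breaking the build, by asking the standard type traits std::is_assignable and std::is_convertible inside static_assert. The sketch below uses a cut-down, hypothetical FieldStub that keeps only the conversion and assignment operators of Field (storing its bits in a plain member so that it is self-contained); those two operators are what decide which of the lines above compile.

```cpp
#include <cstdint>
#include <type_traits>

enum class Mode  { Input, Output, Alternate, Analog };
enum class OType { PushPull, OpenDrain };

// Hypothetical stand-in for Field<..., FLDTYPE>: same operator signatures,
// but the bits live in a member instead of a hardware register.
template <typename FLDTYPE>
struct FieldStub
{
    uint32_t bits = 0;

    operator FLDTYPE() const { return static_cast<FLDTYPE>(bits); }

    FieldStub& operator=(const FLDTYPE val)
    {
        bits = static_cast<uint32_t>(val);
        return *this;
    }
};

using ModeField  = FieldStub<Mode>;
using OTypeField = FieldStub<OType>;

// GPIOA.MODER.MODE1 = mode;        compiles
static_assert(std::is_assignable<ModeField&, Mode>::value, "enum accepted");
// GPIOA.MODER.MODE1 = 1;           does not compile
static_assert(!std::is_assignable<ModeField&, int>::value, "int rejected");
// uint8_t y = GPIOA.MODER.MODE0;   does not compile
static_assert(!std::is_convertible<ModeField, uint8_t>::value, "no integer conversion");
// OType z = GPIOA.MODER.MODE0;     does not compile
static_assert(!std::is_convertible<ModeField, OType>::value, "no cross-enum conversion");
// GPIOA.OTYPER.OT1 = mode;         does not compile
static_assert(!std::is_assignable<OTypeField&, Mode>::value, "wrong enum rejected");
```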

Summary and next steps

The template struct Field offers us type safe, portable, intuitive bit fields at zero (or at least low) cost compared with equivalent C code. There are also some useful static assertions, and it is trivial to include run time range checking on field values, if you want that. What's not to like? I have not used this code in a serious project yet, but then I only wrote it last week. I hope you'll agree that it's definitely worth considering.
The title indicates that there is a Part 2 to follow. When I get around to writing it, there will be. I still need to discuss how to represent arrays of bit fields, which are a very useful addition; the exercise is a bit harder, but not too bad. I also want to add "raw" access to the underlying registers, should that become necessary. With these features in place, we can look at creating a low level hardware access layer which is simple to use and maps pretty much directly onto the Peripheral.Register.Field descriptions in the various Programming Manuals and Reference Manuals. We could call it CMSIS++ or some such.
As a final note, I would like to acknowledge that I am far from the first to come up with a template for portable bit fields. I was inspired to create mine after watching some CppCon videos recently. All the other implementations I have seen contain data in the field, which makes it important to compose fields in a union so that the underlying data members occupy the same memory. That approach is the way to go if you use bit fields for things other than memory mapped hardware registers, such as communications packets and the like. I found my more abstract approach better suited to creating arrays of fields which extend across multiple physical registers.
Anyway, I hope all this was at least a teensy bit entertaining, or educational, or something. All comments, criticisms, and whatnot welcome. Thanks for reading.
Al

PS Something weird happened to the formatting - the paragraphs are spaced in edit mode. Oh well...