Alan Chambers

Efficient typesafe bitfields using C++: Part 2

Blog Post created by Alan Chambers on Apr 4, 2018

Following on from the previous article, I would like to extend the C++ bit field implementation to make it easier to represent the full range of hardware registers supported by STM32s. Basically, I want to add support for arrays of bit fields.

 

If you haven't read Part 1 at this point, I recommend you do so before continuing.

 

Arrays of bit fields

As I pointed out in Part 1, a lot of STM32 registers contain what are basically arrays of bit fields. A good example of this is GPIO.MODER. This register contains an array of 16 two-bit elements, one for each pin on a port.

 

Some arrays are spread over two or more registers. What we often see in the reference manual in this situation is a bunch of adjacent registers with very similar names. Sticking with GPIO, an example of this is GPIO.AFRL and GPIO.AFRH. These two registers could be regarded as forming a single 64-bit register containing 16 4-bit elements. Each element is the alternate function index for the associated pin.

 

Some arrays are discontiguous, meaning that the indices which correspond to valid fields do not form a contiguous range of integral values, but have gaps. A good example of this is SYSCFG.EXTICRn. There are four 32-bit registers each containing four 4-bit elements of the array. For reasons unknown, only the low 16 bits in each register are used. We can handle this quite easily by allowing the array index to be an enumeration rather than a plain integer.
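To make the discontiguous case concrete, here is a minimal sketch of what such an index enumeration might look like for SYSCFG.EXTICRn. The names ExtiLine and exti_slot are my own assumptions, not part of the article's code. Each enumerator's value is the position of the field counted in 4-bit slots across all four registers, so the unused upper half of each register simply never appears as a valid index.

```cpp
#include <cstdint>

// Hypothetical index type for SYSCFG.EXTICRn (names and values are assumptions).
// Each value is a 4-bit "slot" number counted across all four registers:
// 8 slots per 32-bit register, of which only the low 4 are used.
enum class ExtiLine : std::uint8_t
{
    Exti0  = 0,  Exti1  = 1,  Exti2  = 2,  Exti3  = 3,   // EXTICR1
    Exti4  = 8,  Exti5  = 9,  Exti6  = 10, Exti7  = 11,  // EXTICR2
    Exti8  = 16, Exti9  = 17, Exti10 = 18, Exti11 = 19,  // EXTICR3
    Exti12 = 24, Exti13 = 25, Exti14 = 26, Exti15 = 27   // EXTICR4
};

// Slot number for EXTI line n, skipping the unused upper half of each register.
constexpr std::uint8_t exti_slot(std::uint8_t line)
{
    return static_cast<std::uint8_t>((line / 4) * 8 + (line % 4));
}

static_assert(exti_slot(3)  == static_cast<std::uint8_t>(ExtiLine::Exti3),  "EXTICR1, last used slot");
static_assert(exti_slot(4)  == static_cast<std::uint8_t>(ExtiLine::Exti4),  "EXTICR2, first slot");
static_assert(exti_slot(15) == static_cast<std::uint8_t>(ExtiLine::Exti15), "EXTICR4, last used slot");
```

With 4-bit fields, a slot number multiplied by 4 gives the bit position within the group of registers, so the gaps in the enumeration line up with the unused bits.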

 

Using a template to represent an array

As before, we'll use a simple template to represent the array of fields. A template captures a lot of information at compile time, and allows the compiler to be more efficient. The key to having an object behave like an array is to overload the array subscript operator. We'll want to do something like this:

template <uint32_t REGCOUNT, typename IDXTYPE, uint8_t FLDSTART, uint8_t FLDSIZE, 
    typename FLDTYPE = uint32_t>
struct FieldArray
{   
    Something operator[](IDXTYPE index)   
    {
        ...   
    }

private:
    volatile uint32_t m_regs[REGCOUNT];  
};

FieldArray is a struct similar to Field. REGCOUNT specifies the number of physical registers the array extends across. For GPIO.MODER it is 1; for GPIO.AFRx it is 2. Note that the data in the struct is an array of registers of just this size. There is also a new type parameter, IDXTYPE, being the type that is used to index the array. IDXTYPE will be an integral type or an enumeration. Enumerations are useful for several reasons: they can give meaningful names to the items; they can be used to specify a very small range of integers; and, as I mentioned, they can be used to specify discontiguous sets of indices.

 

The array subscript operator has to return something, but what? If we were implementing an array of concrete objects, such as uint32_t values, or even Field objects, the obvious thing to return would be a reference to the object corresponding to the index. That reference would then be used to directly read or modify the object. Unfortunately, it is not possible in C++ for us to return a reference to a bit field: you can't take the address of a bit field. So we'll need to think of something else.
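The standard library runs into the same restriction with std::bitset, whose elements are single bits. As a point of comparison, this small sketch (the function name is my own) shows that its operator[] hands back a class-type proxy rather than a bool&, and that reads and writes go through that proxy.

```cpp
#include <bitset>
#include <cassert>
#include <type_traits>

// Returns true if writing through the proxy changed the underlying bitset.
inline bool bitset_proxy_demo()
{
    std::bitset<8> bits;

    // operator[] on a non-const bitset yields std::bitset<8>::reference,
    // a small class type, because a real bool& to a single bit cannot exist.
    auto ref = bits[3];
    static_assert(std::is_same<decltype(ref), std::bitset<8>::reference>::value,
                  "operator[] returns a proxy object");
    static_assert(!std::is_same<decltype(ref), bool>::value, "not a plain bool");

    ref = true;                   // the proxy's operator=(bool) sets bit 3
    const bool read_back = ref;   // the proxy's operator bool() reads bit 3
    assert(bits.to_ulong() == 0x08u);
    return read_back && bits.test(3);
}
```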

 

I have copied the approach used in the C++ Standard Template Library classes std::bitset and std::vector<bool> to solve this problem. The array subscript operator returns a proxy for the bit field corresponding to the index. The proxy is an object in its own right - an instance of struct FieldRef - and its sole function is to represent a single bit field in a register. This sounds a lot like the Field template from Part 1, and it is. The difference is that the particular register and bit offset of the proxy's bit field are not known at compile time. We have to calculate them from the index. Like this:

    FieldRef operator[](IDXTYPE index)
    {
        uint32_t start = static_cast<uint32_t>(index) * FLDSIZE + FLDSTART;
        volatile uint32_t* ptr = m_regs + (start / 32);
        return FieldRef(*ptr, start % 32);
    }  

The operator first calculates the offset to the first bit of the bit field indicated by the index. This is equivalent to the FLDSTART parameter of Field. In FieldArray, FLDSTART is used to represent a fixed offset before the start of the 0th bit field in the array. I'm not sure how much that will be used in practice, but you never know. The bit offset is then divided by 32 to find the register offset. Finally, the proxy object is created to reference the hardware register which contains the start of the indexed bit field, and is passed the offset of that field within that register. [Note: I used *, / and % rather than explicit shifts and masks for clarity - the operands are unsigned and the divisor is a power of two, so the compiler reduces the division and remainder to a shift and a mask, and does the same for the multiplication when FLDSIZE is a power of two.]
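The arithmetic is easy to check by hand. The following sketch pulls it out into three constexpr helpers, which are hypothetical and not part of FieldArray, and works through a couple of pins, assuming 32-bit registers and that pin N has the index value N.

```cpp
#include <cstdint>

// Hypothetical helpers (not part of FieldArray) mirroring its index arithmetic
// for 32-bit registers.
constexpr std::uint32_t field_start(std::uint32_t index, std::uint32_t fldsize,
                                    std::uint32_t fldstart = 0)
{
    return index * fldsize + fldstart;
}
constexpr std::uint32_t reg_offset(std::uint32_t start) { return start / 32; }
constexpr std::uint32_t bit_offset(std::uint32_t start) { return start % 32; }

// GPIO.AFRx: 4-bit fields, pin 9 -> bit 36 overall -> register 1 (AFRH), bit 4.
static_assert(field_start(9, 4) == 36, "overall bit position");
static_assert(reg_offset(36) == 1, "second register");
static_assert(bit_offset(36) == 4, "bit offset within that register");

// GPIO.MODER: 2-bit fields, pin 9 -> bit 18 of the single register.
static_assert(reg_offset(field_start(9, 2)) == 0, "only register");
static_assert(bit_offset(field_start(9, 2)) == 18, "bit offset");

// For unsigned operands, / 32 and % 32 give the same results as >> 5 and & 31.
static_assert(reg_offset(36) == (36u >> 5) && bit_offset(36) == (36u & 31u),
              "division and remainder match shift and mask");
```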

 

This all means we can create and use a bit field array like this:

// Array of 4-bit fields of type AltFn, indexed by type Pin, extending over 2
// 32-bit registers.
FieldArray<2, Pin, 0, 4, AltFn> AFR;

// This resolves to the second register (i.e. AFRH), with a bit offset of 4 bits.
AFR[Pin::Pin9] = AltFn::AltFn7;

This is all very nice but is only part of the story. What does FieldRef actually do?

 

Proxy for a bit field array element

The FieldRef proxy is very similar to the Field struct. It overloads the conversion operator to return values of the field's type, and it overloads the assignment operator to accept values of the field's type. It further overloads the assignment operator to accept other FieldRef objects. This lets an assignment such as AFR[a] = AFR[b] copy the field value with no explicit casting. It is also required in practice: FieldRef holds a reference member, so the compiler-generated copy assignment is deleted and such an assignment would otherwise fail to compile.

struct FieldRef
{
    operator FLDTYPE() const  
    {
        return static_cast<FLDTYPE>((m_reg >> m_start) & MAX);
    }

    FieldRef& operator=(const FLDTYPE val)
    {
        m_reg = (m_reg & ~(MAX << m_start)) | (MAX & static_cast<uint32_t>(val)) << m_start;
        return *this;
    }

    FieldRef& operator=(const FieldRef& ref)
    {
        FLDTYPE val = ref;
        m_reg = (m_reg & ~(MAX << m_start)) | (MAX & static_cast<uint32_t>(val)) << m_start;
        return *this;
    }
};

The reference to the underlying register (m_reg), and the bit field offset (m_start) are calculated by FieldArray when the FieldRef is created, as shown above, and are stored in private member variables. FieldRef will need a constructor to pass in the values for these members.

 

Finally, FieldRef makes sense as a nested structure defined inside FieldArray: it uses the same compile time constants as used to instantiate the array. The constructor is made private so that only friends can create objects, and FieldArray is the only friend. This last bit emphasises that the proxy is really supposed to be an invisible temporary used to facilitate array indexing.

 

The whole definition for FieldArray now looks like this (static and run time assertions omitted). This version of the code generalises the template with a type parameter to represent the type of the underlying register. For STM32 parts this is most likely always uint32_t. I have also not assumed that there are 8 bits in a byte, just for good measure:

#include <climits>   // CHAR_BIT
#include <cstdint>   // fixed-width integer types

template <typename REGTYPE, uint32_t REGCOUNT, typename IDXTYPE, uint8_t FLDSTART, 
    uint8_t FLDSIZE, typename FLDTYPE = REGTYPE>
struct FieldArray
{
    struct FieldRef
    {
        friend struct FieldArray;
        private:  
            FieldRef(volatile REGTYPE& reg, uint8_t start)
            : m_reg(reg)    
            , m_start(start) 
            {
            }
           
        public:   
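            // Mask of FLDSIZE ones. Shift a REGTYPE 1 rather than an int 1 so
            // that wide fields in wide registers do not overflow int.
            constexpr static REGTYPE MAX = (REGTYPE(1) << FLDSIZE) - 1;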

            operator FLDTYPE() const
            {
                return static_cast<FLDTYPE>((m_reg >> m_start) & MAX);
            }

            FieldRef& operator=(const FLDTYPE val)
            {
                m_reg = (m_reg & ~(MAX << m_start)) | (MAX & static_cast<REGTYPE>(val)) << m_start;
                return *this;
            }

            FieldRef& operator=(const FieldRef& ref)
            {
                FLDTYPE val = ref;
                m_reg = (m_reg & ~(MAX << m_start)) | (MAX & static_cast<REGTYPE>(val)) << m_start;
                return *this;
            }
           
        private:
            volatile REGTYPE& m_reg;
            uint8_t m_start;
    };

    constexpr static uint8_t REGBITS  = sizeof(REGTYPE) * CHAR_BIT;
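    // Total number of bits in the array; uint32_t because 8 or more 32-bit
    // registers would overflow a uint8_t.
    constexpr static uint32_t REGLIMIT = REGBITS * REGCOUNT;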
   
    FieldRef operator[](IDXTYPE index)
    {
        uint32_t start = static_cast<uint32_t>(index) * FLDSIZE + FLDSTART;
        volatile REGTYPE* ptr = m_regs + (start / REGBITS);
        return FieldRef(*ptr, start % REGBITS);
    }  

private:
     volatile REGTYPE m_regs[REGCOUNT];
};

Sorry if that was a bit of a whistle stop tour. There is nothing much here that is really new compared to Field, apart from the array subscript operator.
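One nice property of this design is that it can be exercised on a desktop compiler, because the struct owns its register storage. Below is a trimmed sketch of the same template with the kind of static and run time assertions that might fill the gaps mentioned above. The name CheckedFieldArray, the particular checks, and the enumerator values are all my assumptions. To keep it short, the FieldRef-to-FieldRef assignment is left out, and the constructor and storage are public so that a host test can inspect them.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

template <typename REGTYPE, std::uint32_t REGCOUNT, typename IDXTYPE,
          std::uint8_t FLDSTART, std::uint8_t FLDSIZE, typename FLDTYPE = REGTYPE>
struct CheckedFieldArray   // hypothetical name; same idea as FieldArray
{
    constexpr static std::uint32_t REGBITS  = sizeof(REGTYPE) * CHAR_BIT;
    constexpr static std::uint32_t REGLIMIT = REGBITS * REGCOUNT;
    constexpr static REGTYPE MAX = static_cast<REGTYPE>((REGTYPE(1) << FLDSIZE) - 1);

    // Compile-time checks on the template parameters (assumed, not from the article).
    static_assert(FLDSIZE > 0 && FLDSIZE < REGBITS, "field must be narrower than a register");
    static_assert(static_cast<std::uint32_t>(FLDSTART) + FLDSIZE <= REGLIMIT,
                  "first field must fit in the registers");

    struct FieldRef
    {
        FieldRef(volatile REGTYPE& reg, std::uint8_t start) : m_reg(reg), m_start(start) {}
        operator FLDTYPE() const { return static_cast<FLDTYPE>((m_reg >> m_start) & MAX); }
        FieldRef& operator=(const FLDTYPE val)
        {
            m_reg = (m_reg & ~(MAX << m_start)) | ((MAX & static_cast<REGTYPE>(val)) << m_start);
            return *this;
        }
    private:
        volatile REGTYPE& m_reg;
        std::uint8_t m_start;
    };

    FieldRef operator[](IDXTYPE index)
    {
        const std::uint32_t start = static_cast<std::uint32_t>(index) * FLDSIZE + FLDSTART;
        assert(start + FLDSIZE <= REGLIMIT);                         // index is in range
        assert(start / REGBITS == (start + FLDSIZE - 1) / REGBITS);  // field does not straddle
        return FieldRef(m_regs[start / REGBITS], static_cast<std::uint8_t>(start % REGBITS));
    }

    volatile REGTYPE m_regs[REGCOUNT];   // public here so a host test can inspect it
};

// Assumed enumerations: the article uses Pin and AltFn without defining them.
enum class Pin   : std::uint8_t { Pin0 = 0, Pin9 = 9, Pin15 = 15 };
enum class AltFn : std::uint8_t { AltFn0 = 0, AltFn7 = 7, AltFn15 = 15 };

// Writes AF7 to pin 9 in a zeroed pair of registers and returns the second one (AFRH).
inline std::uint32_t host_demo()
{
    CheckedFieldArray<std::uint32_t, 2, Pin, 0, 4, AltFn> afr = {};
    afr[Pin::Pin9] = AltFn::AltFn7;
    assert(static_cast<AltFn>(afr[Pin::Pin9]) == AltFn::AltFn7);   // read back via the proxy
    return afr.m_regs[1];
}
```

Pin 9 with 4-bit fields lands at bit 4 of the second register, so host_demo() should return 0x70 with the first register left at zero.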

 

Using the array of fields

Now that we have FieldArray defined, we are able to write code something like this:

struct GPIO_T
{
    union MODER_T
    {
        FieldArray<uint32_t, 1, Pin,  0, 2, Mode> MODE;
        uint32_t MODER; // Raw access to underlying register
    } MODER;
    // ... registers omitted
    union LCKR_T
    {
        FieldArray<uint32_t, 1, Pin,  0, 1, bool> LCK;
        Field<uint32_t,              16, 1, bool> LCKK;
        uint32_t LCKR;
    } LCKR;
    union AFR_T
    {
        FieldArray<uint32_t, 2, Pin,  0, 4, AltFn> AF;
        struct
        {
            uint32_t AFRL;
            uint32_t AFRH;
        };
    } AFR;
};

GPIO_T& GPIOA = *reinterpret_cast<GPIO_T*>(0x40002000);

constexpr AltFn AF_USART1_TX = AltFn::AltFn7;

void foo(Pin p)
{
    GPIOA.MODER.MODE[p] = Mode::Alternate;
    GPIOA.AFR.AF[p]     = AF_USART1_TX;
}

For me, using indexed fields for GPIO pin attributes feels very natural and convenient. We could also add individual fields for each pin index, with fields named as in the reference manual, but I'm not sure there's much gain in that. Using an enumeration for the pin index prevents any unfortunate silliness. We can give user-friendly names to generic enumerations like AltFn. For this I prefer constants over macros.
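The "silliness" that an enumeration index rules out can be spelled out with a few compile-time checks. This is a sketch under the assumption that Pin and AltFn are scoped enumerations with the obvious values. The article does not show their definitions, so the ones below are mine, as is the helper to_index.

```cpp
#include <cstdint>
#include <type_traits>

// Assumed definitions: the article uses Pin and AltFn without showing them.
enum class Pin : std::uint8_t
{
    Pin0, Pin1, Pin2,  Pin3,  Pin4,  Pin5,  Pin6,  Pin7,
    Pin8, Pin9, Pin10, Pin11, Pin12, Pin13, Pin14, Pin15
};
enum class AltFn : std::uint8_t
{
    AltFn0, AltFn1, AltFn2,  AltFn3,  AltFn4,  AltFn5,  AltFn6,  AltFn7,
    AltFn8, AltFn9, AltFn10, AltFn11, AltFn12, AltFn13, AltFn14, AltFn15
};

// A scoped enumeration does not convert implicitly to or from integers, so an
// index such as 42, or a value of the wrong enumeration, is rejected at compile time.
static_assert(!std::is_convertible<int, Pin>::value,   "an int is not accepted as a Pin");
static_assert(!std::is_convertible<Pin, int>::value,   "a Pin does not decay to an int");
static_assert(!std::is_convertible<AltFn, Pin>::value, "an AltFn cannot be used as a Pin");

// The underlying value is still available with an explicit cast.
constexpr std::uint32_t to_index(Pin p) { return static_cast<std::uint32_t>(p); }
static_assert(to_index(Pin::Pin9) == 9, "Pin9 has the index value 9");

// A named constant in place of a macro, as in the article.
constexpr AltFn AF_USART1_TX = AltFn::AltFn7;
```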

 

Comparison with equivalent C code

Using ARM gcc 7.2.1 with optimisation turned on, I copied the SPL's GPIO_PinAFConfig() function into Compiler Explorer and added a simple GPIO_TypeDef to make it compile. I compared this to FieldArray as follows:

void gpio_af_config(GPIO_T& GPIOx, Pin pin, AltFn af)
{
    GPIOx.AFR.AF[pin] = af;
}

void GPIO_PinAFConfig(GPIO_TypeDef* GPIOx, uint16_t GPIO_PinSource, uint8_t GPIO_AF)
{
    uint32_t temp = 0x00;
    uint32_t temp_2 = 0x00;

    temp = ((uint32_t)(GPIO_AF) << ((uint32_t)((uint32_t)GPIO_PinSource & (uint32_t)0x07) * 4)) ;
    GPIOx->AFR[GPIO_PinSource >> 0x03] &= ~((uint32_t)0xF << ((uint32_t)((uint32_t)GPIO_PinSource & (uint32_t)0x07) * 4)) ;
    temp_2 = GPIOx->AFR[GPIO_PinSource >> 0x03] | temp;
    GPIOx->AFR[GPIO_PinSource >> 0x03] = temp_2;
}

The two functions have about the same length of object code: 13 instructions against 12 in the listing that follows. We should expect this, as they do basically the same work, and the optimiser is pretty good. Both of these functions were then called from main(), with the same parameters. Both calls were inlined in that case, leading to 5 instructions for gpio_af_config() and 7 instructions for GPIO_PinAFConfig(). The extra str/ldr pair in the C version is not really the optimiser's fault: the source clears and sets the field in two separate read-modify-write steps, and when AFR is declared volatile (as it is in the SPL headers) the compiler has to keep both. Given that it is a one-liner, I wonder whether a function like gpio_af_config() is even necessary.

gpio_af_config(gpio::GPIO_T&, gpio::Pin, gpio::AltFn):
  str lr, [sp, #-4]!
  lsl r1, r1, #2
  add r0, r0, #32
  lsr ip, r1, #5
  ldr r3, [r0, ip, lsl #2]
  and r1, r1, #31
  mov lr, #15
  bic r3, r3, lr, lsl r1
  and r2, r2, lr
  orr r3, r3, r2, lsl r1
  str r3, [r0, ip, lsl #2]
  ldr lr, [sp], #4
  bx lr
GPIO_PinAFConfig(GPIO_TypeDef*, unsigned short, unsigned char):
  and r3, r1, #7
  lsl r3, r3, #2
  asr r1, r1, #3
  add r0, r0, r1, lsl #2
  ldr r1, [r0, #128]
  mov ip, #15
  bic r1, r1, ip, lsl r3
  str r1, [r0, #128]
  ldr r1, [r0, #128]
  orr r2, r1, r2, lsl r3
  str r2, [r0, #128]
  bx lr
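The listings compare code size, not behaviour. As a complementary check, this host-side sketch runs an SPL-style update and a FieldArray-style update over a plain two-word array and confirms that they leave the same contents. The function names are my own and the array stands in for AFR[2], so this says nothing about the generated instructions. One small difference worth knowing: the indexed version masks the value to 4 bits and the SPL-style version does not, so the two only agree for values up to 15.

```cpp
#include <cassert>
#include <cstdint>

// SPL-style update on a plain two-word array standing in for AFR[2].
inline void set_af_spl_style(std::uint32_t afr[2], std::uint16_t pin, std::uint8_t af)
{
    const std::uint32_t shift = (pin & 0x07u) * 4u;
    afr[pin >> 3] &= ~(0xFu << shift);
    afr[pin >> 3] |= static_cast<std::uint32_t>(af) << shift;
}

// FieldArray-style update: the arithmetic of operator[] followed by the
// masking assignment of FieldRef, with FLDSIZE = 4 and FLDSTART = 0.
inline void set_af_indexed(std::uint32_t afr[2], std::uint16_t pin, std::uint8_t af)
{
    const std::uint32_t start = pin * 4u;
    std::uint32_t& reg = afr[start / 32u];
    const std::uint32_t off = start % 32u;
    reg = (reg & ~(0xFu << off)) | ((0xFu & af) << off);
}

// Returns true if both versions leave identical contents for all 16 pins.
inline bool af_updates_agree()
{
    for (std::uint16_t pin = 0; pin < 16; ++pin) {
        std::uint32_t a[2] = {0xFFFFFFFFu, 0x12345678u};
        std::uint32_t b[2] = {0xFFFFFFFFu, 0x12345678u};
        set_af_spl_style(a, pin, 7);
        set_af_indexed(b, pin, 7);
        if (a[0] != b[0] || a[1] != b[1]) {
            return false;
        }
    }
    return true;
}
```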

Conclusion and next steps

I think we are fairly close to being able to write a hardware abstraction layer something like LL, but which requires no documentation other than the Programming Manual, Reference Manual and datasheet relevant to the processor used in your project. Maybe not, but it seems like a nice idea... It's been kind of a goal of mine for quite a while: ever since I learned the SPL had been deprecated.  

 

I think my next trick should be to flesh out a number of peripherals, and write an actual program. I don't think this is difficult so much as a bit tedious.

 

Anyway, thanks again for reading, and all comments, corrections, and whatnot appreciated.

 

Al
