
STM32 MCUs Community


As a major Linux fan, I was pleased to find out that ST developed their recently released STM32CubeProgrammer as a cross-platform application. The STM32CubeProgrammer is a convenient tool for flashing firmware onto your STM32 microcontroller, utilizing either an ST-Link debugger interface or the STM32 on-chip bootloader. The goal of this article is to assist you with getting the STM32CubeProgrammer running on your Linux system.



I am the developer and maintainer of the open source OpenBLT bootloader. When working on OpenBLT, I try to work from my Linux machine as much as I can. Fortunately, this has gotten much easier ever since I added support for Atollic TrueSTUDIO to the bootloader demo programs.


When experimenting with and testing out new features in OpenBLT, I often need to quickly flash the bootloader onto the microcontroller. Some of these test programs are built without an IDE, using just GCC for ARM with plain Makefiles. Since the STM32 ST-Link utility is not available under Linux, I resorted to running OpenOCD from the terminal to flash the bootloader onto the microcontroller. Although this solution works, it is far from optimal. I had to interrupt my development work to look up the correct command-line options for invoking OpenOCD every time I switched microcontroller targets.


Thanks to the STM32CubeProgrammer, I now finally have a quality tool with a convenient user interface for quickly flashing firmware files onto STM32 microcontrollers. The only downside was that it took me a little while to figure out how to get the STM32CubeProgrammer running on my Linux system. Here are step-by-step instructions that will hopefully save you some time.


Start by downloading the installer of the STM32CubeProgrammer from the ST website. The download is a zip-archive. After extracting it, you’ll see the installer SetupSTM32CubeProgrammer-1.0.0.linux. You can start the installation either by double-clicking this file from your file manager or by starting it from the terminal. Root privileges are not needed, because STM32CubeProgrammer can install itself into your home directory. From here on, simply follow the instructions from the installation wizard. The default settings work just fine.



After completing the installation, you can start STM32CubeProgrammer by selecting it from the program menu in your desktop environment or by starting it from the terminal. Unfortunately, the chances are high that the program doesn’t start. Here is what I saw after starting STM32CubeProgrammer from the terminal:


voorburg@debian ~/STM32CubeProg/bin $ ./STM32CubeProgrammer
Error: Could not find or load main class


Luckily, there is a simple solution to resolve this error by installing some missing dependencies. The STM32CubeProgrammer is developed in Java and based on the JavaFX graphical user interface toolkit library. This means that both OpenJDK and OpenJFX need to be installed on your Linux system. Most Linux distributions install OpenJDK by default, but not OpenJFX. So the secret sauce for getting STM32CubeProgrammer running is to install OpenJFX. Here is how you install OpenJFX on a Debian/Ubuntu system:


sudo apt install openjfx


The approach is similar for other Linux distributions that have a different package manager. On Fedora the package is also called openjfx, on Arch the package is called java-openjfx, and on openSUSE the package is called java-1_8_0-openjfx.


After installing OpenJFX, you’ll see that the STM32CubeProgrammer runs without problems and that you now have access to a great tool for quickly flashing firmware onto your STM32 microcontroller.


If you are interested in a more customizable bootloader solution, compared to the STM32 on-chip bootloader, have a look at the OpenBLT project. By default, it ships with GUI and CLI tools for making firmware updates. Additionally, it comes with a host programming library (LibOpenBLT) that gives you a powerful and easy to use API for quickly building your own firmware update tool.


Following on from the previous article, I would like to extend the C++ bit field implementation to make it easier to represent the full range of hardware registers supported by STM32s. Basically, I want to add support for arrays of bit fields.


If you haven't read Part 1 at this point, I recommend you do so before continuing.


Arrays of bit fields

As I pointed out in Part 1, a lot of STM32 registers contain what are basically arrays of bit fields. A good example of this is GPIO.MODER. This register contains an array of 16 two-bit elements, one for each pin on a port.


Some arrays are spread over two or more registers. What we often see in the reference manual in this situation is a bunch of adjacent registers with very similar names. Sticking with GPIO, an example of this is GPIO.AFRL and GPIO.AFRH. These two registers could be regarded as forming a single 64-bit register containing sixteen 4-bit elements. Each element is the alternate function index for the associated pin.


Some arrays are discontiguous, meaning that the indices which correspond to valid fields do not form a contiguous range of integral values, but have gaps. A good example of this is SYSCFG.EXTICRn. There are four 32-bit registers each containing four 4-bit elements of the array. For reasons unknown, only the low 16 bits in each register are used. We can handle this quite easily by allowing the array index to be an enumeration rather than a plain integer.
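As a rough illustration of how an enumeration can encode the gaps, the sketch below assumes the layout described above: four 32-bit registers, each holding four 4-bit fields in its low 16 bits. The names ExtiLine, exti_reg and exti_bit are hypothetical and not part of the original code. Each enumerator value is chosen so that multiplying it by the field size gives the absolute bit offset from the start of the first register.

```cpp
#include <cstdint>

// Hypothetical index type for SYSCFG.EXTICRn. The jumps in value
// (3 -> 8, 11 -> 16, 19 -> 24) skip the unused upper half of each register.
enum class ExtiLine : uint32_t
{
    Line0  = 0,  Line1,  Line2,  Line3,
    Line4  = 8,  Line5,  Line6,  Line7,
    Line8  = 16, Line9,  Line10, Line11,
    Line12 = 24, Line13, Line14, Line15
};

constexpr uint32_t FIELD_BITS = 4;
constexpr uint32_t REG_BITS   = 32;

// Which of the four registers holds the field for this line (0-based).
constexpr uint32_t exti_reg(ExtiLine line)
{
    return (static_cast<uint32_t>(line) * FIELD_BITS) / REG_BITS;
}

// Bit offset of the field within that register.
constexpr uint32_t exti_bit(ExtiLine line)
{
    return (static_cast<uint32_t>(line) * FIELD_BITS) % REG_BITS;
}

static_assert(exti_reg(ExtiLine::Line3) == 0 && exti_bit(ExtiLine::Line3) == 12, "");
static_assert(exti_reg(ExtiLine::Line4) == 1 && exti_bit(ExtiLine::Line4) == 0, "");
static_assert(exti_reg(ExtiLine::Line13) == 3 && exti_bit(ExtiLine::Line13) == 4, "");
```

With an index type like this, the array code never needs to know about the gaps: the same multiply, divide and modulo arithmetic works for contiguous and discontiguous arrays alike.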


Using a template to represent an array

As before, we'll use a simple template to represent the array of fields. A template captures a lot of information at compile time, and allows the compiler to be more efficient. The key to having an object behave like an array is to overload the array subscript operator. We'll want to do something like this:

template <uint32_t REGCOUNT, typename IDXTYPE, uint8_t FLDSTART, uint8_t FLDSIZE, 
    typename FLDTYPE = uint32_t>
struct FieldArray
{
    Something operator[](IDXTYPE index);

    volatile uint32_t m_regs[REGCOUNT];
};

FieldArray is a struct similar to Field. REGCOUNT specifies the number of physical registers the array extends across. For GPIO.MODER it is 1; for GPIO.AFRx it is 2. Note that the data in the struct is an array of registers of just this size. There is also a new type parameter, IDXTYPE, being the type that is used to index the array. IDXTYPE will be some form of integral type, including enumerations. Enumerations are useful for several reasons: they can give meaningful names to the items; they can be used to specify a very small range of integers; and, as I mentioned, they can be used to specify discontiguous sets of indices.


The array subscript operator has to return something, but what? If we were implementing an array of concrete objects, such as uint32_t values, or even Field objects, the obvious thing to return would be a reference to the object corresponding to the index. That reference would then be used to directly read or modify the object. Unfortunately, it is not possible in C++ for us to return a reference to a bit field: you can't take the address of a bit field. So we'll need to think of something else.


I have copied the approach used in the C++ Standard Template Library classes std::bitset and std::vector<bool> to solve this problem. The array subscript operator returns a proxy for the bit field corresponding to the index. The proxy is an object in its own right - an instance of struct FieldRef - and its sole function is to represent a single bit field in a register. This sounds a lot like the Field template from Part 1, and it is. The difference is that the particular register and bit offset of the proxy's bit field are not known at compile time. We have to calculate them from the index. Like this:

    FieldRef operator[](IDXTYPE index)
    {
        uint32_t start = static_cast<uint32_t>(index) * FLDSIZE + FLDSTART;
        volatile uint32_t* ptr = m_regs + (start / 32);
        return FieldRef(*ptr, start % 32);
    }

The operator first calculates the offset to the first bit of the bit field indicated by the index. This is equivalent to the FLDSTART parameter of Field. In FieldArray, FLDSTART is used to represent a fixed offset before the start of the 0th bit field in the array. I'm not sure how much that will be used in practice, but you never know. The bit offset is then divided by 32 to find the register offset. Finally, the proxy object is created to reference the hardware register which contains the start of the indexed bit field, and is passed the offset of that field within that register. [Note: I used *, / and % rather than explicit shifts and masks for clarity - the compiler optimises them away.] 


This all means we can create and use a bit field array like this:

// Array of 4-bit fields of type AltFn, indexed by type Pin, extending over 2
// 32-bit registers.
FieldArray<2, Pin, 0, 4, AltFn> AFR;

// This resolves to the second register (i.e. AFRH), with a bit offset of 4 bits.
AFR[Pin::Pin9] = AltFn::AltFn7;

This is all very nice but is only part of the story. What does FieldRef actually do?


Proxy for a bit field array element

The FieldRef proxy is very similar to the Field struct. It overloads the cast operator to return values of the field's type, and it overloads the assignment operator to accept values of the field's type. It further overloads the assignment operator to accept other FieldRef objects. This is a convenience which avoids some explicit casting that might otherwise be necessary.

struct FieldRef
{
    operator FLDTYPE() const
    {
        return static_cast<FLDTYPE>((m_reg >> m_start) & MAX);
    }

    FieldRef& operator=(const FLDTYPE val)
    {
        m_reg = (m_reg & ~(MAX << m_start)) | (MAX & static_cast<uint32_t>(val)) << m_start;
        return *this;
    }

    FieldRef& operator=(const FieldRef& ref)
    {
        FLDTYPE val = ref;
        m_reg = (m_reg & ~(MAX << m_start)) | (MAX & static_cast<uint32_t>(val)) << m_start;
        return *this;
    }
};

The reference to the underlying register (m_reg), and the bit field offset (m_start) are calculated by FieldArray when the FieldRef is created, as shown above, and are stored in private member variables. FieldRef will need a constructor to pass in the values for these members.


Finally, FieldRef makes sense as a nested structure defined inside FieldArray: it uses the same compile time constants as used to instantiate the array. The constructor is made private so that only friends can create objects, and FieldArray is the only friend. This last bit emphasises that the proxy is really supposed to be an invisible temporary used to facilitate array indexing.


The whole definition for FieldArray now looks like this (static and run time assertions omitted). This version of the code generalises the template with a type parameter to represent the type of the underlying register. For STM32 parts this is most likely always uint32_t. I have also not assumed that there are 8 bits in a byte, just for good measure:

#include <climits>
#include <cstdint>

template <typename REGTYPE, uint32_t REGCOUNT, typename IDXTYPE, uint8_t FLDSTART, 
    uint8_t FLDSIZE, typename FLDTYPE = REGTYPE>
struct FieldArray
{
    struct FieldRef
    {
        friend struct FieldArray;
        private:
            // Only FieldArray can create a proxy.
            FieldRef(volatile REGTYPE& reg, uint8_t start)
            : m_reg(reg)
            , m_start(start)
            {
            }
            constexpr static REGTYPE MAX = (1 << FLDSIZE) - 1;

        public:
            operator FLDTYPE() const
            {
                return static_cast<FLDTYPE>((m_reg >> m_start) & MAX);
            }

            FieldRef& operator=(const FLDTYPE val)
            {
                m_reg = (m_reg & ~(MAX << m_start)) | (MAX & static_cast<REGTYPE>(val)) << m_start;
                return *this;
            }

            FieldRef& operator=(const FieldRef& ref)
            {
                FLDTYPE val = ref;
                m_reg = (m_reg & ~(MAX << m_start)) | (MAX & static_cast<REGTYPE>(val)) << m_start;
                return *this;
            }

        private:
            volatile REGTYPE& m_reg;
            uint8_t m_start;
    };

    constexpr static uint8_t  REGBITS  = sizeof(REGTYPE) * CHAR_BIT;
    // Wider than uint8_t so that eight or more 32-bit registers do not overflow.
    constexpr static uint32_t REGLIMIT = REGBITS * REGCOUNT;

    FieldRef operator[](IDXTYPE index)
    {
        uint32_t start = static_cast<uint32_t>(index) * FLDSIZE + FLDSTART;
        volatile REGTYPE* ptr = m_regs + (start / REGBITS);
        return FieldRef(*ptr, start % REGBITS);
    }

    volatile REGTYPE m_regs[REGCOUNT];
};

Sorry if that was a bit of a whistle stop tour. There is nothing much here that is really new compared to Field, apart from the array subscript operator.
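One way to try the proxy idea without any hardware is to place a FieldArray-like struct over ordinary memory and inspect the result. The sketch below is a cut-down stand-in rather than the definition above: MiniFieldArray, Ref and demo_af_pin9 are hypothetical names, the register type is fixed to uint32_t, the field start is fixed to zero, and the proxy constructor is left public to keep it short. The Pin and AltFn enumerations are assumed to be simple scoped enumerations counting up from zero.

```cpp
#include <cassert>
#include <cstdint>

enum class Pin : uint32_t
{
    Pin0, Pin1, Pin2,  Pin3,  Pin4,  Pin5,  Pin6,  Pin7,
    Pin8, Pin9, Pin10, Pin11, Pin12, Pin13, Pin14, Pin15
};

enum class AltFn : uint32_t
{
    AltFn0, AltFn1, AltFn2,  AltFn3,  AltFn4,  AltFn5,  AltFn6,  AltFn7,
    AltFn8, AltFn9, AltFn10, AltFn11, AltFn12, AltFn13, AltFn14, AltFn15
};

// Cut-down stand-in for FieldArray: same index arithmetic, fewer parameters.
template <uint32_t REGCOUNT, typename IDXTYPE, uint8_t FLDSIZE, typename FLDTYPE>
struct MiniFieldArray
{
    class Ref
    {
    public:
        Ref(volatile uint32_t& reg, uint8_t start) : m_reg(reg), m_start(start) {}

        // Read: shift the field down and mask off its neighbours.
        operator FLDTYPE() const
        {
            return static_cast<FLDTYPE>((m_reg >> m_start) & MAX);
        }

        // Write: clear the field, then OR in the new value.
        Ref& operator=(FLDTYPE val)
        {
            m_reg = (m_reg & ~(MAX << m_start))
                  | ((MAX & static_cast<uint32_t>(val)) << m_start);
            return *this;
        }

    private:
        static constexpr uint32_t MAX = (uint32_t{1} << FLDSIZE) - 1;
        volatile uint32_t& m_reg;
        uint8_t            m_start;
    };

    Ref operator[](IDXTYPE index)
    {
        const uint32_t start = static_cast<uint32_t>(index) * FLDSIZE;
        return Ref(m_regs[start / 32], static_cast<uint8_t>(start % 32));
    }

    volatile uint32_t m_regs[REGCOUNT];
};

// Writes AF7 for pin 9 and reports whether only bits [7:4] of the second
// register changed and the value reads back correctly.
inline bool demo_af_pin9()
{
    MiniFieldArray<2, Pin, 4, AltFn> afr = {};   // both registers start at zero
    afr[Pin::Pin9] = AltFn::AltFn7;
    const AltFn readback = afr[Pin::Pin9];
    return afr.m_regs[0] == 0u
        && afr.m_regs[1] == 0x70u
        && readback == AltFn::AltFn7;
}
```

Calling demo_af_pin9() from a host-side test exercises both the subscript arithmetic and the proxy's read and write paths before the code goes anywhere near a real peripheral.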


Using the array of fields

Now that we have FieldArray defined, we are able to write code something like this:

struct GPIO_T
{
    union MODER_T
    {
        FieldArray<uint32_t, 1, Pin,  0, 2, Mode> MODE;
        uint32_t MODER; // Raw access to underlying register
    } MODER;
    // ... registers omitted
    union LCKR_T
    {
        FieldArray<uint32_t, 1, Pin,  0, 1, bool> LCK;
        Field<uint32_t,              16, 1, bool> LCKK;
        uint32_t LCKR;
    } LCKR;
    union AFR_T
    {
        FieldArray<uint32_t, 2, Pin,  0, 4, AltFn> AF;
        struct // Anonymous struct: a widely supported compiler extension
        {
            uint32_t AFRL;
            uint32_t AFRH;
        };
    } AFR;
};

GPIO_T& GPIOA = *reinterpret_cast<GPIO_T*>(0x40002000);

constexpr AltFn AF_USART1_TX = AltFn::AltFn7;

void foo(Pin p)
{
    GPIOA.MODER.MODE[p] = Mode::Alternate;
    GPIOA.AFR.AF[p]     = AF_USART1_TX;
}

For me, using indexed fields for GPIO pin attributes feels very natural and convenient. We could also add individual fields for each pin index, with fields named as in the reference manual, but I'm not sure there's much gain in that. Using an enumeration for the pin index prevents any unfortunate silliness. We can give user-friendly names to generic enumerations like AltFn. For this I prefer constants over macros.


Comparison with equivalent C code

Using ARM gcc 7.2.1 with optimisation turned on, I copied the SPL's GPIO_PinAFConfig() function into Compiler Explorer and added a simple GPIO_TypeDef to make it compile. I compared this to FieldArray as follows:

void gpio_af_config(GPIO_T& GPIOx, Pin pin, AltFn af)
{
    GPIOx.AFR.AF[pin] = af;
}

void GPIO_PinAFConfig(GPIO_TypeDef* GPIOx, uint16_t GPIO_PinSource, uint8_t GPIO_AF)
{
    uint32_t temp = 0x00;
    uint32_t temp_2 = 0x00;

    temp = ((uint32_t)(GPIO_AF) << ((uint32_t)((uint32_t)GPIO_PinSource & (uint32_t)0x07) * 4)) ;
    GPIOx->AFR[GPIO_PinSource >> 0x03] &= ~((uint32_t)0xF << ((uint32_t)((uint32_t)GPIO_PinSource & (uint32_t)0x07) * 4)) ;
    temp_2 = GPIOx->AFR[GPIO_PinSource >> 0x03] | temp;
    GPIOx->AFR[GPIO_PinSource >> 0x03] = temp_2;
}

The two functions have about the same length of object code. We should expect this, as they do basically the same work, and the optimiser is pretty good. Both of these functions were then called from main(), with the same parameters. Both calls were inlined in that case, leading to 5 instructions for gpio_af_config() and 7 instructions for GPIO_PinAFConfig() - the optimiser left in an unnecessary str/ldr pair in this case (good but not perfect). Given that it is a one-liner, I wonder whether a function like gpio_af_config() is even necessary.

gpio_af_config(gpio::GPIO_T&, gpio::Pin, gpio::AltFn):
  str lr, [sp, #-4]!
  lsl r1, r1, #2
  add r0, r0, #32
  lsr ip, r1, #5
  ldr r3, [r0, ip, lsl #2]
  and r1, r1, #31
  mov lr, #15
  bic r3, r3, lr, lsl r1
  and r2, r2, lr
  orr r3, r3, r2, lsl r1
  str r3, [r0, ip, lsl #2]
  ldr lr, [sp], #4
  bx lr
GPIO_PinAFConfig(GPIO_TypeDef*, unsigned short, unsigned char):
  and r3, r1, #7
  lsl r3, r3, #2
  asr r1, r1, #3
  add r0, r0, r1, lsl #2
  ldr r1, [r0, #128]
  mov ip, #15
  bic r1, r1, ip, lsl r3
  str r1, [r0, #128]
  ldr r1, [r0, #128]
  orr r2, r1, r2, lsl r3
  str r2, [r0, #128]
  bx lr

Conclusion and next steps

I think we are fairly close to being able to write a hardware abstraction layer something like LL, but which requires no documentation other than the Programming Manual, Reference Manual and datasheet relevant to the processor used in your project. Maybe not, but it seems like a nice idea... It's been kind of a goal of mine for quite a while: ever since I learned the SPL had been deprecated.  


I think my next trick should be to flesh out a number of peripherals, and write an actual program. I don't think this is difficult so much as a bit tedious.


Anyway, thanks again for reading, and all comments, corrections, and whatnot appreciated.



As a long time advocate for C++ in embedded development, I thought it might be fun to explore something that's been bothering me for a while...




The STM32 microprocessors I've used - mainly F4s - have a lot of memory mapped hardware registers, and many of those registers contain two or more bit fields. There are several ways of dealing with bit fields in software, but they all basically (and necessarily) boil down to masking and shifting one or more bits either at compile time or at run time, or some combination thereof, often in conjunction with specific hardware base addresses cast to pointers to C structs whose types more or less map onto the underlying hardware peripherals. Files like stm32f4xx.h are good examples of this.


Some fields can be regarded as N-bit integers for which all possible bit combinations are valid. Some fields are integers for which only a range of the possible bit combinations are valid (e.g. RCC.PLLCFGR.PLLN - this notation denotes Peripheral.Register.Field). Quite a lot of fields have enumerated values with meaningful names, and it may be that only certain specific bit combinations are valid.


In addition to having a distinct type, there are many fields which can also be regarded as elements of an array of fields of the same type, which may reside in a single register, or be spread across two or more registers, such as GPIO.MODER.MODE[n] or GPIO.AFRx.AF[n]. These arrays may or may not be contiguous in memory (e.g. SYSCFG.EXTICRx.EXTI[n] is non-contiguous).


It is awfully easy to place invalid values in fields, or to place the intended values at the wrong location by shifting them incorrectly. The C compiler will be absolutely silent for most potential errors. Libraries such as the Standard Peripheral Library, HAL and LL go some way to reducing the potential for error by defining enumerations for some of the fields, and/or by providing an API with meaningfully named functions which take care of the bit twiddling needed to update one or more fields. The downside with libraries is that they can also obfuscate what is going on. I've used the SPL a fair bit, but was only happy with it once I had stepped through enough of it to understand what it actually does. I have not used LL, but this function from the documentation looks easy enough to understand.

// Make PA4 an output. The two index parameters are uint32_t. 

The mission


I would like to be able to treat the individual fields of a register in much the same way as I would treat the individual members of a struct. Each field should have a distinct type, which the compiler should enforce. All the bit twiddling to mask and shift values should be quietly managed in the background. Arrays of fields which spread over multiple registers should also be managed for the caller.


That is to say, I want to be able to write code something like these snippets:

// Make PA4 an output. Pin and Mode are scoped enumerations. 
// Other types won't compile.
GPIOA.MODER.MODE[Pin::P4] = Mode::Output;

// Turn on the HSI and wait for it to be ready.
// Both fields here are type bool.
RCC.CR.HSION = true;
while (!RCC.CR.HSIRDY);

And, naturally, I want this to be a cheap abstraction, with object code comparable to equivalent C.

Theoretically, C/C++'s native bit fields can go some way towards providing this functionality. The compiler silently inserts the masking and shifting necessary to access field values correctly. Native bit fields are implemented very efficiently, but are not guaranteed to have the memory layout you expect. I understand from the C standard that this is implementation dependent rather than hardware dependent, but notice that the CMSIS file core_cm4.h uses native bit fields. That seems odd, but perhaps my understanding is incorrect.
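For comparison, here is a minimal sketch of what native bit fields look like. The struct and field names (ControlBits, enable, mode, level) are made up for illustration and do not correspond to any real register. The masking and shifting are generated by the compiler, but how the fields are allocated within the storage unit is implementation-defined, which is the portability concern mentioned above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical register image described with native bit fields. Whether
// 'enable' ends up in the least or most significant bit is up to the
// compiler and ABI, so this is not guaranteed to match a hardware layout.
struct ControlBits
{
    uint32_t enable : 1;
    uint32_t mode   : 2;
    uint32_t        : 1;   // unnamed padding bit
    uint32_t level  : 4;
};

// Stores a value into the 2-bit field and reads it back.
inline uint32_t roundtrip_mode(uint32_t value)
{
    ControlBits bits = {};
    bits.mode = value;     // silently reduced to the low two bits
    return bits.mode;
}

inline void demo_native_bitfield()
{
    assert(roundtrip_mode(2) == 2u);
    assert(roundtrip_mode(5) == 1u);   // 5 is 0b101; only 0b01 fits
}
```

Note that the out-of-range assignment compiles without complaint and there is no type checking on the field value, which are two of the problems a template-based approach sets out to address.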

Using a template to represent a field

Most of the information about a bit field is known at compile time:

  • The type of the underlying register - I'll assume this to be uint32_t, but it is easy to generalise
  • The address of the register to which it belongs
  • The number of bits in the bit field
  • The start index of the bit field within the register
  • The type of the value which can be stored in the bit field
  • The default or reset value of the bit field - I'll ignore this because the value is set by hardware


A neat way to capture all this information and let the compiler do its thing as efficiently as possible is to use a C++ template. The bare bones of it looks something like this:

template <uint32_t BASEADDR, uint8_t FLDSTART, uint8_t FLDSIZE>
struct Field
{
};

Field is nothing more than an empty struct. The size of an empty struct is 1, which may or may not be important later. The key thing is that the struct is parameterised with three compile time constants:

  • BASEADDR is the address of the register to which the field belongs.
  • FLDSTART is the start index of the bit field within that register.
  • FLDSIZE is the number of bits in the bit field.


We could - but won't - use a bit field like this:

void foo()
{
    // This field is bits [6:4] of the register at 0x40002000.
    Field<0x40002000, 4, 3> FLD;
}

More realistically, we might represent a register as a collection of bit fields like this:

// All the fields have the same register address.
template <uint32_t BASEADDR>
union Register
{
    Field<BASEADDR, 0,  4> FLD1; // Bits [3:0]
    Field<BASEADDR, 4,  3> FLD2; // Bits [6:4]
    Field<BASEADDR, 7, 20> FLD3; // Bits [26:7]
};

// Identical registers at different locations.
Register<0x40002000> REG1;
Register<0x40002004> REG2;

void bar()
{
    REG1.FLD1 =   7;
    REG2.FLD3 = 120;
}

Here we have three fields as notional data members of a union. Using a union means that the register is the same size as a single field. I'm not sure if this matters much in practice. I say "notional data" because the field objects do not actually hold any data: rather, they are our interface for reading and writing the fields they represent. If each field did hold data (i.e. a uint32_t), using a union would be essential to ensure that the three representations occupied the same memory locations. The distinction is actually sort of irrelevant: the client software doesn't know or care whether the field objects actually contain data or not, they only have to look as if they do.
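The size remark can be checked at compile time. The sketch below uses a hypothetical EmptyField stand-in (same template parameters as Field, no members) and compares a union of three fields with a struct holding the same three fields. The exact sizes depend on the compiler, so the assertions are deliberately loose.

```cpp
#include <cstdint>

// Empty stand-in for Field: no data members, only template parameters.
template <uint32_t BASEADDR, uint8_t FLDSTART, uint8_t FLDSIZE>
struct EmptyField
{
};

// Three fields overlaid in a union...
union AsUnion
{
    EmptyField<0x40002000, 0,  4> FLD1;
    EmptyField<0x40002000, 4,  3> FLD2;
    EmptyField<0x40002000, 7, 20> FLD3;
};

// ...versus the same three fields laid out one after another.
struct AsStruct
{
    EmptyField<0x40002000, 0,  4> FLD1;
    EmptyField<0x40002000, 4,  3> FLD2;
    EmptyField<0x40002000, 7, 20> FLD3;
};

// An empty struct still has a non-zero size (typically 1).
static_assert(sizeof(EmptyField<0x40002000, 0, 4>) >= 1, "");
// The union is never larger than the struct holding the same members.
static_assert(sizeof(AsUnion) <= sizeof(AsStruct), "");
```

On mainstream compilers the union would typically come out at 1 byte and the struct at 3, which matters only if a register object is ever embedded in something whose layout is significant.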


Now we need to add some features to the struct to actually do some work for us. First up, calculate some additional constants at compile time. We'll need the maximum value for the field, which is the value when all of its bits are one, and the mask for the field, which is the maximum value shifted by the start index.

    constexpr static uint32_t MAX = (1 << FLDSIZE) - 1;
    constexpr static uint32_t MASK = MAX << FLDSTART;
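As a quick sanity check of these two constants, the sketch below evaluates them for the field used earlier, bits [6:4], so FLDSTART is 4 and FLDSIZE is 3. The free functions field_max and field_mask are hypothetical helpers written for this illustration, and they assume a field narrower than 32 bits.

```cpp
#include <cstdint>

// Largest value that fits in a field of 'size' bits (all bits set).
// Assumes size < 32; a full-width field would need special handling.
constexpr uint32_t field_max(uint8_t size)
{
    return (uint32_t{1} << size) - 1;
}

// The same bit pattern moved to the field's position within the register.
constexpr uint32_t field_mask(uint8_t start, uint8_t size)
{
    return field_max(size) << start;
}

static_assert(field_max(3) == 0x7, "3 bits hold the values 0..7");
static_assert(field_mask(4, 3) == 0x70, "bits [6:4]");
```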

To read the value of the bit field, we overload a cast operator. This allows field objects to be implicitly cast to uint32_t for assignments and other operations. The function casts BASEADDR to a pointer. The pointer is then dereferenced, masked and shifted to obtain the value of the field. reinterpret_cast is preferred to C-style casting, but is still pretty brutal: here it reinterprets the underlying bit pattern to convert an integer to a pointer:

    operator uint32_t() const
    {
        volatile uint32_t* ptr = reinterpret_cast<volatile uint32_t*>(BASEADDR);
        return ((*ptr & MASK) >> FLDSTART);
    }

To write the value of the field, we overload the assignment operator to accept a uint32_t parameter. This allows uint32_t values to be assigned to the object without casting or an explicit function call. As before, the function casts BASEADDR to a pointer, and then does some bit twiddling on it to insert the assigned value into the proper bits.

    Field& operator=(const uint32_t val)
    {
        volatile uint32_t* ptr = reinterpret_cast<volatile uint32_t*>(BASEADDR);
        *ptr = (*ptr & ~MASK) | ((MAX & val) << FLDSTART);
        return *this;
    }

And that's basically the whole idea in a nutshell. Here is the whole definition for Field:

template <uint32_t BASEADDR, uint8_t FLDSTART, uint8_t FLDSIZE>
struct Field
{
    constexpr static uint32_t MAX = (1 << FLDSIZE) - 1;
    constexpr static uint32_t MASK = MAX << FLDSTART;

    operator uint32_t() const
    {
        volatile uint32_t* ptr = reinterpret_cast<volatile uint32_t*>(BASEADDR);
        return ((*ptr & MASK) >> FLDSTART);
    }

    Field& operator=(const uint32_t val)
    {
        volatile uint32_t* ptr = reinterpret_cast<volatile uint32_t*>(BASEADDR);
        *ptr = (*ptr & ~MASK) | ((MAX & val) << FLDSTART);
        return *this;
    }
};

It is very simple, if you want them, to overload other operators such as &= and ++.
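A sketch of what such extra operators might look like follows. To make it runnable on a host, it departs from Field in one important way: the template is parameterised on a reference to a global variable (fake_reg) instead of a raw address. That substitution, and the names HostField and demo_compound_ops, are assumptions made purely for this example.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a hardware register so the sketch can run on a host.
volatile uint32_t fake_reg = 0;

// Same idea as Field, but bound to a variable rather than a fixed address.
template <volatile uint32_t& REG, uint8_t FLDSTART, uint8_t FLDSIZE>
struct HostField
{
    constexpr static uint32_t MAX  = (uint32_t{1} << FLDSIZE) - 1;
    constexpr static uint32_t MASK = MAX << FLDSTART;

    operator uint32_t() const { return (REG & MASK) >> FLDSTART; }

    HostField& operator=(uint32_t val)
    {
        REG = (REG & ~MASK) | ((MAX & val) << FLDSTART);
        return *this;
    }

    // Compound operators are built from the read and the write above.
    HostField& operator&=(uint32_t val) { return *this = (uint32_t(*this) & val); }
    HostField& operator|=(uint32_t val) { return *this = (uint32_t(*this) | val); }

    // Pre-increment; wraps within the field because operator= masks with MAX.
    HostField& operator++() { return *this = (uint32_t(*this) + 1); }
};

// Exercises the operators on bits [6:4] and returns the final register value.
inline uint32_t demo_compound_ops()
{
    fake_reg = 0xFFFFFFFFu;
    HostField<fake_reg, 4, 3> fld;
    fld = 5;            // field is now 5
    fld &= 0x6;         // 5 & 6 = 4
    ++fld;              // 5
    ++fld;              // 6
    ++fld;              // 7
    ++fld;              // 8 does not fit in 3 bits, so the field wraps to 0
    assert(uint32_t(fld) == 0u);
    return fake_reg;    // all bits outside [6:4] should be untouched
}
```

Each compound operator costs a read-modify-write of the register, exactly as the hand-written C equivalent would.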


Refining the template


There are a couple of things I want to change about this struct.
The first is to generalise the data type of the field. At the moment it is assumed to be uint32_t. It is better to support a variety of types, including smaller integers, signed integers, boolean values and, importantly, enumerations. This is quite straightforward to do. We just need to add another template parameter, being the type of the field, and modify the two operators to return and accept values of that type. The type parameter can be defaulted to uint32_t if it is not specified.


The second thing I want to do is add some compile time assertions to help avoid errors. Although I haven't done so, it would be trivial to add a run time assertion in the assignment operator to check that the value is in range. After the changes, the code looks like this:

#include <cstdint>
#include <type_traits>

template <uint32_t BASEADDR, uint8_t FLDSTART, uint8_t FLDSIZE, 
    typename FLDTYPE = uint32_t>
struct Field
{
    static_assert(std::is_enum<FLDTYPE>::value || std::is_integral<FLDTYPE>::value,
           "Field type must be integral or enum");
    static_assert(sizeof(FLDTYPE) <= sizeof(uint32_t), "Field type too large");
    static_assert((FLDSIZE + FLDSTART) <= 32, "Field size plus offset too large");

    constexpr static uint32_t MAX = (1 << FLDSIZE) - 1;
    constexpr static uint32_t MASK = MAX << FLDSTART;

    operator FLDTYPE() const
    {
        volatile uint32_t* ptr = reinterpret_cast<volatile uint32_t*>(BASEADDR);
        return static_cast<FLDTYPE>((*ptr & MASK) >> FLDSTART);
    }

    Field& operator=(const FLDTYPE val)
    {
        volatile uint32_t* ptr = reinterpret_cast<volatile uint32_t*>(BASEADDR);
        *ptr = (*ptr & ~MASK) | ((MAX & static_cast<uint32_t>(val)) << FLDSTART);
        return *this;
    }
};
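
The run time range check mentioned above could be as small as a single assert on the value about to be written. The sketch below shows only that check, pulled out into a free function template so that it can be tried in isolation. The name checked_field_value is hypothetical, and assert is just one option: a real project might prefer its own fault handler.

```cpp
#include <cassert>
#include <cstdint>

enum class Mode : uint32_t { Input, Output, Alternate, Analog };

// Returns the value as it would be written into a FLDSIZE-bit field,
// asserting (in builds without NDEBUG) that no significant bits are lost.
template <uint8_t FLDSIZE, typename FLDTYPE>
uint32_t checked_field_value(FLDTYPE val)
{
    static_assert(FLDSIZE > 0 && FLDSIZE < 32, "Unsupported field size");
    constexpr uint32_t MAX = (uint32_t{1} << FLDSIZE) - 1;
    const uint32_t raw = static_cast<uint32_t>(val);
    assert(raw <= MAX && "Value does not fit in field");
    return raw & MAX;
}
```

Inside Field::operator=, the equivalent would be one assert line ahead of the write. With a scoped enumeration as the field type the check is mostly redundant, but it is useful for plain integer fields where only part of the range is valid.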

Test driving the template

OK. So now it's time to take this beast for a spin. The following code contains a partial representation of a GPIO peripheral which makes use of the Field template, and a plain C representation of the same peripheral, which is meant to be similar to the code in stm32f4xx.h. The functions at the end each set the value of one of the fields, C style, and then C++ style.

// C++ representation of a peripheral
enum class Mode { Input, Output, Alternate, Analog };
enum class OType { PushPull, OpenDrain };

template <uint32_t BASE>
union GPIO_T
{
    union MODER_T
    {
        constexpr static uint32_t ADDR = BASE + 0x00;
        Field<ADDR, 0, 2, Mode> MODE0;
        Field<ADDR, 2, 2, Mode> MODE1; // ...
    } MODER;
    union OTYPER_T
    {
        constexpr static uint32_t ADDR = BASE + 0x04;
        Field<ADDR, 0, 1, OType> OT0;
        Field<ADDR, 1, 1, OType> OT1;
    } OTYPER;
};

GPIO_T<0x40002000> GPIOA;

// C representation of the same peripheral - kind of like CMSIS
struct GPIO
{
    volatile uint32_t MODER;
    volatile uint32_t OTYPER;
};

#define GPIOA2 ((GPIO*)(0x40002000))
#define GPIO_MODER_MAX     0x03
#define GPIO_MODER_OUTPUT  0x01
#define GPIO_MODER1_START     2
#define GPIO_MODER1_MASK   0x0C // ...

// Don't let the compiler optimise everything!
volatile Mode    mode  = Mode::Output;
volatile uint8_t cmode = GPIO_MODER_OUTPUT;

void foo_c()
{
    GPIOA2->MODER = (GPIOA2->MODER & ~GPIO_MODER1_MASK) |
        ((cmode & GPIO_MODER_MAX) << GPIO_MODER1_START);
}

void foo_cpp()
{
    GPIOA.MODER.MODE1 = mode;
}

For my money, the C++ version wins hands down. It is simple and intuitive, and much less prone to error. Mission accomplished! I'll admit that an SPL, HAL or LL function could also hide the bit twiddling (they do) and be just as brief and simple to understand but, even if they were type safe, they still wouldn't be as pretty in my view. And the field definition would be implicit in a load of macros, rather than explicit in the field declaration. Ah but wait... Granted it looks nice and all, how efficient is it? I've heard that templates cause code bloat.

I compiled the code online with the unbelievably excellent Compiler Explorer, which quickly shows what object code snippets will produce. I compiled the code with ARM GCC 7.2.1 with the options -O -std=c++11. The standard is specified just to make sure I stick to features no later than C++11 (for no particular reason). The template works fine with C++98, but you would have to drop or implement the static assertions, replace constexpr with const, and couldn't use enumeration classes.

The output speaks for itself - the two function implementations are pretty much identical. The template bit field abstraction has zero cost! In fact, as if to prove the point on safety, I accidentally mixed up the MAX and MASK C macros when preparing this example. I only spotted the typo because the compiler output was significantly different from what I expected from the C code.

foo_c():
        ldr     r1, .L2
        ldr     r2, [r1]
        ldr     r3, .L2+4
        ldrb    r3, [r3]        @ zero_extendqisi2
        lsl     r3, r3, #2
        and     r3, r3, #12
        bic     r2, r2, #12
        orr     r3, r3, r2
        str     r3, [r1]
        bx      lr
.L2:
        .word   1073750016
        .word   .LANCHOR0
foo_cpp():
        ldr     r3, .L5
        ldr     r3, [r3, #4]
        ldr     r1, .L5+4
        ldr     r2, [r1]
        lsl     r3, r3, #2
        and     r3, r3, #12
        bic     r2, r2, #12
        orr     r3, r3, r2
        str     r3, [r1]
        bx      lr
.L5:
        .word   .LANCHOR0
        .word   1073750016
cmode:
        .byte   1
mode:
        .word   1

Personally, even as an advocate of C++, I was a bit surprised it turned out quite so well as this. Other compilers may not do such a good job, especially older ones.

To be fair, the situation is not quite so pretty when the optimisation is turned off. It's not terrible, but the operator function is not inlined, and it is less efficient. The C version is also less efficient, of course. And every instantiation of the template with different parameters will have its own version of this function. On the plus side, the compiler only generates code for the template functions you actually call, but it's not hard to see how that could be accused of being a bit bloated. But then, again, you're going to optimise your code for release, aren't you? Or you don't get involved in the whole premature optimisation thing.

Finally for the example code, we need to demonstrate type safety. The names of the enumerators are scoped with the name of the enumeration, and there is much stronger type-checking associated with them than old-school enumerations. The compiler will not implicitly convert integer types to enumerations, or vice versa, and will not implicitly convert an enumeration of one type into another. You can always cast if necessary, which the template does internally:

void foo_cpp()
{
    GPIOA.MODER.MODE1 = mode;
    GPIOA.MODER.MODE1 = 1;         // Invalid type - doesn't compile

    Mode    x = GPIOA.MODER.MODE0;
    uint8_t y = GPIOA.MODER.MODE0; // Invalid type - doesn't compile
    OType   z = GPIOA.MODER.MODE0; // Invalid type - doesn't compile

    GPIOA.OTYPER.OT0  = OType::PushPull;
    GPIOA.OTYPER.OT0  = 1;         // Invalid type - doesn't compile
    GPIOA.OTYPER.OT1  = mode;      // Invalid type - doesn't compile
}

Summary and next steps


The template struct Field offers us type safe, portable, intuitive bit fields at zero (or at least low) cost over equivalent C code. There are also some useful static assertions, and it is trivial to include run time range checking on field values, if you want that. What's not to like? I have not used this code in a serious project yet, but I only wrote it last week. I hope you'll agree that it's definitely worth considering.
The title indicates that there is a Part 2 to follow. When I get around to writing it, there will be. I still need to discuss how to represent arrays of bit fields, which are a very useful addition. The exercise is a bit harder, but not too bad. And I also want to add "raw" access to the underlying registers, should that become necessary. With these features in place, we can look at creating a low level hardware access layer which is simple to use and pretty much directly maps onto the Peripheral.Register.Field descriptions in the various Programming Manuals and Reference Manuals. We should call it CMSIS++ or some such.
As a final note, I would like to acknowledge that I am far from the first to come up with a template for portable bit fields. I was inspired to create mine after watching some CPPCON videos recently. All the other implementations I have seen contain data in the field, for which it is important to compose fields in a union to ensure that the underlying data members occupy the same memory. This approach is the way to go if you use bit fields for things other than memory mapped hardware registers, such as communications packets and the like. I found my more abstract approach more suited to creating arrays of fields which extend to multiple physical registers.
Anyway, I hope all this was at least a teensy bit entertaining, or educational, or something. All comments, criticisms, and whatnot welcome. Thanks for reading.

PS Something weird happened to the formatting - the paragraphs are spaced in edit mode. Oh well...

On ST's FAQ page, the following could be read some time ago:

Use Cortex-M3 Bit-banding feature for interrupt clearing since it is an atomic operation and NVIC pending interrupts will be ignored during this operation, however Read-Modify-Write is not.

This FAQ page is gone now, but the quote is perpetuated on the web and in some materials. The problem with it is that it is not entirely true. It's true that NVIC pending interrupts will be ignored during bit-banding, but it's not true that it's a good method to clear interrupts. In fact, it's dangerous; don't do it unless you know exactly what you are doing.


So let's get this straight.


Bit-banding is a feature of ARM Cortex-M3 and Cortex-M4 processors, allowing certain portions of memory space (including a portion which is usually mapped to peripherals) to be accessed in a bit-wise manner. This feature was introduced to attract programmers used to bit-addressable memory from other MCU architectures, most prominently the x51. It is present only in Cortex-M3 and M4, i.e. it is not present in M0 and M0+, nor in M7. Even in M3 and M4 it is an optional feature, and implementers (semiconductor manufacturers) may choose whether to implement it or not. ST's implementations always include it, i.e. bit-banding is available on the 'F1, 'F3, 'F4, 'L1 and 'L4 subfamilies.


The bit-wise access is realized through a respective alias region, where every single bit in the original memory/peripheral address space is assigned a corresponding word (32 bits). Reading that word returns 0x00000000 or 0x00000001, depending on the state of the corresponding bit; when writing to that word, the lowermost bit is actually written into the original bit, without affecting the other bits in the word containing the original bit.
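
The mapping from a bit to its alias word can be sketched in a few lines of C. This is only an illustration, assuming the standard Cortex-M3/M4 region bases from ARM's documentation (0x20000000 aliased at 0x22000000 for SRAM, 0x40000000 aliased at 0x42000000 for peripherals, 1 MB each); the helper name `bitband_alias` is made up for this example and is not part of CMSIS.

```c
#include <assert.h>
#include <stdint.h>

/* Standard Cortex-M3/M4 bit-band regions and their alias regions. */
#define SRAM_BB_BASE       0x20000000u
#define SRAM_BB_ALIAS      0x22000000u
#define PERIPH_BB_BASE     0x40000000u
#define PERIPH_BB_ALIAS    0x42000000u
#define BB_REGION_SIZE     0x00100000u   /* 1 MB of bit-addressable space each */

/* Hypothetical helper: returns the address of the alias word for bit `bit`
 * of the word at `addr`, or 0 if `addr` is outside both bit-band regions.
 * Each bit gets one 32-bit alias word, hence offset * 32 + bit * 4. */
uint32_t bitband_alias(uint32_t addr, unsigned bit)
{
    assert(bit < 32u);

    if (addr >= SRAM_BB_BASE && addr - SRAM_BB_BASE < BB_REGION_SIZE)
        return SRAM_BB_ALIAS + (addr - SRAM_BB_BASE) * 32u + bit * 4u;

    if (addr >= PERIPH_BB_BASE && addr - PERIPH_BB_BASE < BB_REGION_SIZE)
        return PERIPH_BB_ALIAS + (addr - PERIPH_BB_BASE) * 32u + bit * 4u;

    return 0u;
}
```

On a real target, the returned address would be cast to `volatile uint32_t *` and read or written like any other word.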


This is how things look from the processor's (and thus the programmer's) point of view. But to understand what is going on, we need to get down to the nasty details of how this feature is implemented in hardware.


The truth is that, contrary to x51, there is no special hardware for flipping individual bits. The processor is still interfaced through a 32-bit bus matrix to 32-bit memories and peripherals, so it can only manipulate data in 32-bit chunks (more precisely, it can also do it in 8-bit and 16-bit chunks, if the attached memory or peripheral implements the byte-select signals of the AHB bus; but never in single bits). So the trick lies in a simple attachment between the processor's S-port and the bus matrix: when the processor attempts to read from the bit-addressable area, the attachment converts the bit address to the basic word's address, reads from the bus matrix at that address, takes the read word, rotates it by the required number of bits and submits that as the result to the processor (the processor is stalled by the attachment all that time). Writing is slightly more tricky: the attachment first issues a read on the real word address, then takes the read data, masks the required bit, replaces it with the written one, and then performs the writeback through the bus matrix.


So, a bit-banding write is in fact a read-modify-write operation on a whole 32-bit word, from the point of view of the attached memory or peripheral. During this time, the AHB bus is locked down (there's a special signal for that in the bus), so no other master (such as DMA) can interfere. The processor is left to run until it attempts to access the S-port again, when it is stalled until the operation ends.


This means, atomicity of the operation is preserved, as far as the program is concerned (in this the quote is true); and also the possibility of other busmasters interfering has been taken care of. So what could possibly go wrong?


The peripheral itself.


In many peripherals, there are status words containing individual status bits indicating the states through which the internal state machine of the peripheral has passed. As these are set by hardware, they are usually of the clear-by-writing-1 (c1) or clear-by-writing-0 (c0) type. In the former, writing 1 clears such a bit but writing 0 leaves it unaffected (and in the latter it's exactly the opposite), so the proper operation to clear certain bits in such a register is to write a mask, not to read-modify-write. And this applies not only to software RMW (i.e. register |= mask or register &= ~mask, depending on whether it's the c1 or c0 type, which many users already know is a no-no), but also to the hardware RMW. If the hardware sets a bit while another bit is being cleared through RMW, the writeback clears the newly set bit, too. The following scheme may perhaps illustrate this better on the case of the TIM_SR register (which is c0):


The write from BB's internal register unexpectedly clears the CC2 interrupt flag. I made up the particular numbers; I don't know exactly what the latencies will be, so the "sweet spot" for the bit-banding write instruction timing for the problem to occur will most likely be different from 30. Note that even then the CC interrupt *will* happen, as the signal has already started to be passed to NVIC; except that in that ISR, when checking for the interrupt source, none will be found.
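
The lost flag can also be reproduced on a desktop machine with a toy model of a c0-type status register. The sketch below is only a simulation under simplifying assumptions: `sim_sr` stands in for TIM_SR, the "hardware event" is a plain function argument rather than a real timer, and all function names are invented for this illustration. The flag positions (UIF at bit 0, CC2IF at bit 2) follow the STM32 timer register layout.

```c
#include <stdint.h>

#define TIM_SR_UIF    (1u << 0)   /* update interrupt flag */
#define TIM_SR_CC2IF  (1u << 2)   /* capture/compare 2 interrupt flag */

/* Simulated c0-type status register: writing 0 clears a bit,
 * writing 1 leaves it unchanged. */
static uint32_t sim_sr;

static void sim_sr_write(uint32_t value)
{
    sim_sr &= value;
}

/* Clears UIF the way a bit-band write does: read the whole word, change one
 * bit, write the whole word back. If the hardware sets CC2IF between the
 * read and the write-back, the stale 0 for CC2IF clears it again. */
uint32_t clear_uif_bitband(int cc2_event_in_window)
{
    sim_sr = TIM_SR_UIF;                 /* update event is pending */
    uint32_t latched = sim_sr;           /* read phase of the hardware RMW */
    if (cc2_event_in_window)
        sim_sr |= TIM_SR_CC2IF;          /* hardware sets CC2IF meanwhile */
    latched &= ~TIM_SR_UIF;              /* modify only the addressed bit */
    sim_sr_write(latched);               /* write-back phase */
    return sim_sr;
}

/* Clears UIF by writing a mask: every other bit is written as 1,
 * which a c0-type register ignores. */
uint32_t clear_uif_mask(int cc2_event_before_write)
{
    sim_sr = TIM_SR_UIF;
    if (cc2_event_before_write)
        sim_sr |= TIM_SR_CC2IF;
    sim_sr_write(~TIM_SR_UIF);
    return sim_sr;
}
```

With the event inside the window, `clear_uif_bitband(1)` leaves the simulated register at 0 (CC2IF is lost), while `clear_uif_mask(1)` leaves CC2IF set.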


I tried to visualize this risk in a simple example (to be compiled with augmented device headers) for the 'L476 DISCO. The whole system is run on a slow system clock, MSI set to 100kHz, so that the result is visible on blinking LEDs. There are no AHB/APB prescalers nor prescalers in the timer, as that's the simplest possible setting, directly corresponding to the scheme above. A timer (TIM1) is run with ARR set so that it overflows roughly at a 10Hz rate. There are two interrupts set, one from Update and the other from Capture2. The green LED is toggled at the update rate (in fact it is toggled by hardware through CH1; I might have done it in the Update ISR by software, the result would be the same); the red LED is toggled in the CC2 interrupt. To find the "sweet spot", the CC2 event is delayed from the start of the cycle more and more in each update cycle, simply by incrementing the CCR2 content (shadowing is switched on for the changing CCR2 to be accepted correctly). The fact that CC2 interrupts are missed because of the bit-banding clearing of the Update flag, when the "sweet spot" is reached, is visualized by the red LED stopping to toggle from time to time, while the green LED toggles continuously:



In the isrCnts struct-array there are counters counting the occurrences of the Update ISR with the Update flag set (.up), occurrences of the CC ISR itself (.cc) and occurrences of that ISR with the CC2 flag set (.cc2). This is what the vicinity of the "sweet spot" looks like in these counters:

{up = 28, cc = 28, cc2 = 28}, 
{up = 29, cc = 29, cc2 = 29},
{up = 30, cc = 30, cc2 = 30},
{up = 31, cc = 31, cc2 = 30},
{up = 32, cc = 32, cc2 = 30},
{up = 33, cc = 33, cc2 = 30},
{up = 34, cc = 34, cc2 = 30},
{up = 35, cc = 35, cc2 = 30},
{up = 36, cc = 36, cc2 = 31},
{up = 37, cc = 37, cc2 = 32},
{up = 38, cc = 38, cc2 = 33},



For reasons that aren't entirely clear Element14 decided to flush current inventory of STM32439I-EVAL2 boards last week, so instead of $500+S/H I picked one up for about $80 landed.

I had actually been searching for an MB1063 (640x480) panel to use on my H743I-EVAL, which I'd picked up with an MB1166 DSI panel, and tested with an MB1046 (480x272) I'd gotten with an earlier F439I-EVAL.


In previous experiments I've had the DSI-HDMI adapter on the F769I-DISCO driving 640x480 and 800x600 screens, which seems to be about the ceiling for the ADV7533 with 2 lanes, and learned that with very high video bandwidth SDIO/SDMMC writes would fail sooner or later if actually tested or mildly stressed. It turns out that POLLED mode gets a TX UNDERRUN error even at nominal 25 MHz bus speeds, and more so if you do it on fast cards at higher rates. You have to use DMA, and you need to port most of the examples to use DMA mode because they use POLLED mode. I had not really run into this before because a) most of my stuff in recent years is headless, and b) my SPL implementation uses DMA, and I have data loggers running for months on end without issues. The same app would die in under an hour on the F769I-DISCO, mainly because I would close a file and open a new one at the top of each hour, and earlier failures finally catch up with FatFs to the point where it stops working properly.


Anyway, with a new STM32(F)439I-EVAL2 in hand I decided to see how HAL would work. Turns out the MB1063 with a 25.17 MHz pixel clock can be broken in under a minute.


I have RS232 and SWV channels outputting diagnostics, and I instrument SD_write() to chirp on failure. This doesn't act as a drag on benchmarking, but does permit me to understand when SDIO/FatFs comes off the rails.


DRESULT SD_write(BYTE lun, const BYTE *buff, DWORD sector, UINT count)
{
  /* ... */
  if (res != RES_OK) printf("W %9d %3d %d (%08X)\n", sector, count, res, uSdHandle.ErrorCode);
  return res;
}


I then stress FatFs by writing a 650MB file of pseudo-random data. It is very easy to generate very long, complex, and yet reproducible data patterns which test the integrity of the file system and the media. I don't find writing the same dull 0x00 or 0xFF data pattern to every cluster on the media a helpful way to test things thoroughly; inducing failure quickly and obviously saves a lot of time in these experiments. The key with block storage devices is that they give you back what you put into them.
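
A reproducible pseudo-random pattern needs very little code. The post does not say which generator was used, so the sketch below assumes a plain xorshift32 generator; `pattern_fill` and `pattern_check` are names invented for this example. The same seed regenerates the same byte stream, so a file can be verified on read-back without keeping a copy of what was written.

```c
#include <stddef.h>
#include <stdint.h>

/* xorshift32 step; the state must be seeded with a non-zero value. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Fills `buf` with the next `len` bytes of the pattern. */
void pattern_fill(uint8_t *buf, size_t len, uint32_t *state)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = (uint8_t)(xorshift32(state) >> 24);
}

/* Compares `buf` with the next `len` bytes of the pattern.
 * Returns the index of the first mismatch, or `len` if all bytes match. */
size_t pattern_check(const uint8_t *buf, size_t len, uint32_t *state)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != (uint8_t)(xorshift32(state) >> 24))
            return i;
    }
    return len;
}
```

In a test like the one described, each sector-sized buffer would be filled before `f_write()` and checked after `f_read()`, with the generator reseeded to the same value before the read pass.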

In STM32, there is often a shortage of SPI modules, while USARTs/UARTs are relatively abundant. For example, the 'F20x/40x line has 3 SPIs and 4 USARTs + 2 UARTs. SPIs can also act as I2S, so in applications, where I2S are needed together with some high-speed SPI, it's not uncommon to run out of SPIs.


SPIs are, among other things, often used in master mode to control shift-registers, or shift-register-like peripherals. This is where USARTs in synchronous mode can come to help out.


The setup is relatively trivial: in GPIO, besides USART_RX and USART_TX, a third pin has to be set to the appropriate AF for USART_CK. In the USART itself, the baudrate has to be set in USART_BRR; and as a last step of the setup, TE/RE as appropriate, and UE, in USART_CR1. These steps are identical to setting up the USART for UART. Don't bother setting parity etc.; these may work just as with UART, but it would be surprising if any shift-register-like peripheral needed them. In USART_CR3, set DMAT/DMAR as appropriate, if you intend to use DMA, just as with UART.


The synchronous-specific part is in USART_CR2: CLKEN has to be set to enable the synchronous mode and the clock output onto the USART_CK pin. CPOL/CPHA have to be adjusted as needed; they work exactly as in "normal" SPI, and there's even a timing chart in the USART synchronous mode subchapter of the USART chapter in the RM, if you are unsure which to choose. There's a gotcha in the form of the LBCL bit: I fail to see any merit in not having this bit set.


So, an example setup of a transmitting-only USART used to control an OLED display with SPI-like interface may look like

#define OR |
OLED_SPI_USART->BRR = 16;  // ----> 2.625MHz
OLED_SPI_USART->CR2 = 0
    OR ( 0                       * USART_CR2_ADD_0    )  /* Address of the USART node */
    OR ( 0                       * USART_CR2_LBDL     )  /* LIN Break Detection Length */
    OR ( 0                       * USART_CR2_LBDIE    )  /* LIN Break Detection Interrupt Enable */
    OR ( 1                       * USART_CR2_LBCL     )  /* Last Bit Clock pulse */
    OR ( 0                       * USART_CR2_CPHA     )  /* Clock Phase */
    OR ( 0                       * USART_CR2_CPOL     )  /* Clock Polarity */
    OR ( 1                       * USART_CR2_CLKEN    )  /* Clock Enable */
    OR ( USART_CR2_STOP__1_BIT   * USART_CR2_STOP_0   )  /* Bit 0 */
    OR ( 0                       * USART_CR2_LINEN    )  /* LIN mode enable */
;
OLED_SPI_USART->CR1 = 0
    OR ( 0                       * USART_CR1_SBK      )  /* Send Break */
    OR ( 0                       * USART_CR1_RWU      )  /* Receiver wakeup */
    OR ( 0                       * USART_CR1_RE       )  /* Receiver Enable */
    OR ( 1                       * USART_CR1_TE       )  /* Transmitter Enable */
    OR ( 0                       * USART_CR1_IDLEIE   )  /* IDLE Interrupt Enable */
    OR ( 0                       * USART_CR1_RXNEIE   )  /* RXNE Interrupt Enable */
    OR ( 0                       * USART_CR1_TCIE     )  /* Transmission Complete Interrupt Enable */
    OR ( 0                       * USART_CR1_TXEIE    )  /* Transmitter Empty Interrupt Enable */
    OR ( 0                       * USART_CR1_PEIE     )  /* PE Interrupt Enable */
    OR ( 0                       * USART_CR1_PS       )  /* Parity Selection - 0 = even, 1 = odd */
    OR ( 0                       * USART_CR1_PCE      )  /* Parity Control Enable */
    OR ( 0                       * USART_CR1_WAKE     )  /* Wakeup method - 0 = Idle Line, 1 = Address Mark */
    OR ( 0                       * USART_CR1_M        )  /* Word length - 0 = 8-data-bit, 1 = 9-data-bit */
    OR ( 1                       * USART_CR1_UE       )  /* USART Enable */
    OR ( 0                       * USART_CR1_OVER8    )  /* USART Oversampling by 8 enable */
;
OLED_SPI_USART->CR3 = 0
    OR ( 1                       * USART_CR3_DMAT     )  /* DMA Enable Transmitter */
;


The runtime handling is again exactly as with normal UART, whether polled, interrupt-driven or DMA-based.
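
As a quick check of the BRR value used in the setup: with 16x oversampling (OVER8 = 0) on the 'F2/'F4-style USART, BRR holds the divider as a 12.4 fixed-point number, so the bit rate is simply the peripheral clock divided by the raw BRR value. The 42 MHz APB clock below is an assumption inferred from the "2.625MHz" comment, and `usart_sync_bitrate` is a name made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Bit rate for OVER8 = 0: USARTDIV = BRR / 16 and
 * rate = fPCLK / (16 * USARTDIV), which reduces to fPCLK / BRR. */
uint32_t usart_sync_bitrate(uint32_t pclk_hz, uint32_t brr)
{
    assert(brr >= 16u);   /* a divider below 1.0 is not usable */
    return pclk_hz / brr;
}
```

With an assumed 42 MHz APB clock, `usart_sync_bitrate(42000000u, 16u)` gives 2625000, matching the comment in the setup code.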



Major differences to SPI and potential gotchas:

  • master only
  • there's no framing signal (NSS)
  • USART transmits LSB-first, only. This may be a major pain for certain chips with SPI-like interfaces, as SPI is usually MSB-first. Thankfully, ARM has a bit-reversal instruction with an appropriate function in CMSIS; reversing the bits of a byte is then (__RBIT(X) >> 24)
  • Baudrate calculation applies just as with normal UART. This limits the achievable maximum data rate, as bit lengths are multiples of 8 input clock (APB clock) periods (if USART_CR1.OVER8 is set; otherwise multiples of 16 APB clocks, of course). Don't get your hopes up about setting a fractional baudrate divider below 1.
  • Maximum data rate limited further by the fact, that there are gaps between consecutive bytes (the clock is not continuous even with the tightest feeding of the data register). This is caused by the internal logic generating start and stop bits, and the clock is stopped during that time.
  • 8-bit data only. Oh, 9-bit mode probably works, too; but that's not that usual with the SPI-like protocols and chips.
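
For trying out the bit reversal from the list above on a PC, where the CMSIS `__RBIT()` intrinsic is not available, a portable stand-in can be used. The `rbit32` and `reverse_byte` names are made up for this sketch; on the target, `__RBIT(x) >> 24` does the same job in a couple of instructions.

```c
#include <stdint.h>

/* Portable stand-in for the CMSIS __RBIT() intrinsic:
 * reverses the bit order of a 32-bit word. */
static uint32_t rbit32(uint32_t x)
{
    x = ((x & 0x55555555u) << 1) | ((x >> 1) & 0x55555555u);  /* swap adjacent bits */
    x = ((x & 0x33333333u) << 2) | ((x >> 2) & 0x33333333u);  /* swap bit pairs */
    x = ((x & 0x0F0F0F0Fu) << 4) | ((x >> 4) & 0x0F0F0F0Fu);  /* swap nibbles */
    x = ((x & 0x00FF00FFu) << 8) | ((x >> 8) & 0x00FF00FFu);  /* swap bytes */
    return (x << 16) | (x >> 16);                             /* swap halfwords */
}

/* Converts an MSB-first byte into the LSB-first order the USART sends. */
uint8_t reverse_byte(uint8_t b)
{
    return (uint8_t)(rbit32(b) >> 24);
}
```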



[update] For those not owning a nucleo_f429zi, but willing to reproduce the demo, it's now working with Ethernet over USB. More info below.


In 2017, I introduced you to the Zephyr Project, the open source RTOS supported by the Linux Foundation.


Today, I’d like to share one part of the Zephyr ecosystem: its MicroPython port.

MicroPython is an “implementation of Python3, optimized to run on microcontrollers”.

A scripting language running on a microcontroller may seem weird: for a given task, it will be slower and have a larger footprint than the usual C. But such languages are also quite powerful and easy to learn. Since you don’t need to compile, MicroPython allows quick prototyping and hardware evaluation, and lets you go quickly from the idea to a prototype running on your board.


To demonstrate that MicroPython is powerful, I’d like to show you an HTTP dashboard running on an STM32F4. It has been developed with the help of Paul Sokolovsky, who initiated the Zephyr port in MicroPython.

A sensor shield is plugged onto the nucleo_f429zi board. PC and board are connected via Ethernet to the same local router.

Alternatively, you can also run the demo with a nucleo board with USB enabled and use ethernet over USB. This is for instance working with a nucleo_f412zg with user USB port connected to the PC.

With a browser, I connect to the board, which runs a MicroPython-powered HTTP server. Data collected from the sensors are displayed in the dashboard. Then, embedded JavaScript performs dynamic rendering of the web page:

  • 2 gauges widget for humidity and temperature (collected by HTS221 sensor)
  • Widget goes red when magnetic field is detected (by LIS3MDL magneto-sensor)
  • Widget reproduces the board movement (thanks to LSM6DS0 accelerometer)


Here is a video (If you're having trouble to open it, copy the link and check it directly on the hosting website):


One nice part of Zephyr is that it provides a hardware abstraction. The upper API does not depend on the SoC. Since the MicroPython port relies on this upper API, it has no dependency on the board, which means it could run on any board already ported to Zephyr (as long as it has enough memory to hold the MicroPython binary).


By the way, the footprint of this application is 200 KB of Flash and 60 KB of SRAM, so it would run on smaller parts, for instance on an STM32F401CC (256 KB Flash, 64 KB RAM). Though, for this exact application you'll need an Ethernet port...


To be able to run this demo, you need to install zephyr and micropython:


Following are the instructions to reproduce the demo for Ubuntu users. Please refer to each project instructions for use on Windows.


$ cd ~/zephyr-project

$ source

$ cd ~/micropython/ports/zephyr

$ make BOARD=nucleo_f429zi flash


$ make BOARD=nucleo_f412zg flash 


In the board console:

>>> import dashboard

If using Ethernet, connect with your browser to the IP address printed in the console:

>>> [net/dhcpv4] [INF] handle_ack: Received:

Or, in case you're using Ethernet over USB, you need to configure your PC to access the new Ethernet device (more info here), then connect to the board at the address configured ( in my case)





PS: Some hints, if you don’t want to download all branches available in my repo

If you don’t have a zephyr repo yet:

git clone -b sensor_dashboard_demo --single-branch

If you already have a zephyr repo:

git remote add erwango

git fetch erwango sensor_dashboard_demo

ST has extended the STM32L4 technology, providing more performance (up to 120 MHz), more embedded memory (up to 2 Mbytes of Flash memory and 640 Kbytes of SRAM) and richer graphics and connectivity features while keeping the best‐in‐class ultra‐low‐power capability.

The STM32L4+ series shatters processing capabilities limits in the ultra‐low‐power world by delivering 150 DMIPS/409 CoreMark score while executing from internal Flash memory and by embedding 640 Kbytes SRAM enabling more advanced consumer, medical and industrial low-power applications and devices.

Let's discover the amazing power of the STM32L4+ Discovery board with the 3 graphics demos already embedded.


If you want to learn more on STM32L4+, visit our STM32 Education web page with free online training!





In some of our STM32 devices, the bootloader (system memory) supports SPI communication to reflash the STM32 internal flash.


Here are some drivers for the Total Phase Aardvark.


Here is an example of usage.


I tested it on a Nucleo L476 and used SPI2 of the STM32 to communicate to the


Note: make sure that your STM32L476 is a revision 4 (revision 3 does not support SPI for the bootloader).


Make sure to have BOOT0 connected to VDD (please see jumper below connecting BOOT0 and VDD). This will permit the STM32 to boot in Bootloader mode after reset or power cycling.


Here are my connections to the Aardvark:

On your PC you will need to install CodeBlocks. I recommend that you download CodeBlocks with MinGW included (GCC/C++ compiler), named codeblocks-16.01mingw-setup.exe, from the following link:


Then open the project that is attached to this email with CodeBlocks:


Make sure that the Aardvark driver is installed correctly.


In the code you will need the following calls in the main function:


void main (void)
{
    //initialize Aardvark hardware
    handle = Hardware_Init();

    //Connect to Bootloader
    printf("\n Connecting to BL System Memory ... ");

    printf("\n Writing File to FLASH Memory ... ");
    WriteMemory(handle, 0x08000000,"FileUSER.bin");    // FileUSER.bin must be under project directory
                                                       // (0x08000000 is the FLASH address and it could be changed)
}




Then build the code in CodeBlocks.


Connect both the Nucleo board and the Aardvark.


Press reset on the Nucleo board and then press Run on the Codeblocks to run the app which will program the image to the STM32L4.


You can verify that the programming was done correctly by reading the flash using ST-LINK Utility.
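
For readers curious about what goes over the wire, here is a rough sketch of two frames the host sends during a Write Memory sequence. It is based on my reading of ST's SPI bootloader protocol application note (AN4286): a 0x5A start-of-frame byte, the command followed by its complement, and a 4-byte MSB-first address followed by an XOR checksum. The function names are invented for this example, and the attached project's WriteMemory() may well be structured differently, so treat this as an illustration and check the application note for the authoritative sequence.

```c
#include <stdint.h>

#define BL_SPI_SOF           0x5Au  /* start-of-frame / synchronization byte */
#define BL_ACK               0x79u
#define BL_NACK              0x1Fu
#define BL_CMD_WRITE_MEMORY  0x31u

/* Command frame: SOF, command code, bitwise complement of the command. */
void bl_build_command(uint8_t cmd, uint8_t out[3])
{
    out[0] = BL_SPI_SOF;
    out[1] = cmd;
    out[2] = (uint8_t)~cmd;
}

/* Address frame: 4 address bytes, MSB first, then the XOR of those bytes. */
void bl_build_address(uint32_t addr, uint8_t out[5])
{
    out[0] = (uint8_t)(addr >> 24);
    out[1] = (uint8_t)(addr >> 16);
    out[2] = (uint8_t)(addr >> 8);
    out[3] = (uint8_t)addr;
    out[4] = (uint8_t)(out[0] ^ out[1] ^ out[2] ^ out[3]);
}
```

For the flash base address 0x08000000 used above, the address frame is 08 00 00 00 followed by the checksum 08.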





This blog is dedicated to the attendees of the "Moving from 8 to 32 bits hands-on" workshops taking place in many cities across Europe through March 2018: 


You can find a nearby city and register using the following link : 


Moving from 8 to 32 bits hands-on workshop 



This is the right place for you to put your comments and feedback on the workshop.

Please specify in the comments where you attended the workshop.


You can use this place to ask questions you did not ask during the workshop.

Please also specify in the comments where you attended the workshop.


Best regards,


Thanks to the close collaboration between the stm32duino community and ST, Arduino™ IDE brings simplicity to the STM32 ecosystem of developers.

Indeed Arduino software libraries have been ported on top of the STM32 Cube software drivers to enable Arduino developers to take direct benefit from any STM32 mcu (see

The Arduino™ IDE can simply be configured to support boards from each STM32 mcu series: to help you get started with the Arduino™ IDE on STM32, follow the "Getting Started" wiki (stm32duino wiki:


Only one package to support all STM32 mcu series. See the Latest release to follow the core development.

Non exhaustive list of proposed targets:


Full list is available here: GitHub - stm32duino - boards available 

Compatibility of the core has been extended, so more Arduino libraries can now be used. See Libraries · stm32duino/wiki for further information.


STM32 libraries have been developed to support dedicated hardware features and X-NUCLEO :

  • Ethernet
  • SD
  • ...

So have a look here to see which ones are available, or use the Arduino "Library Manager" with "stm32" as the search filter.


Attached to this blog post you can find the presentation of the STM32duino workshop given by parata.carlo at Maker Faire Rome 2017. It shows how to start playing with the NUCLEO and X-NUCLEO libraries in the Arduino™ IDE.



From STM32F1, the world's first 32-bit ARM® Cortex®-M MCU, announced on June 11th, to STM32H7, the world's most efficient MCU, released in 2017, the STM32 ARM® Cortex®-M microcontroller has grown into a large product family and a wide ecosystem of tools and software to become a market leader.


Let’s look back at 10 of the main achievements that paved the road to this success over the last 10 years.


  1. From the first 72 MHz / 20 Kbyte Cortex-M3 up to the 400 MHz / 2 Mbyte Cortex-M7: a 10-year product journey and technological evolution
  2. A constant evolution of the portfolio to anticipate and match the market expectations in terms of embedded features, performance, power consumption and cost. The STM32 family now has more than 10 series and 700 part numbers, including the STM32H7, the world’s most efficient MCU in terms of processing performance (it scored 2020 in the EEMBC CoreMark™), and the STM32L4, the market leader in terms of power consumption performance (it scored 253 in the EEMBC ULPMark™; more on ULPBench).
  3. A decade of business records
  4. In 2010, we introduced the first STM32 Discovery Kit (the STM32VLDISCOVERY) an affordable and complete solution for the evaluation of the outstanding capabilities of STM32 MCUs, followed in 2014 by the 1st STM32 Nucleo, the new prototypers’ best friend.  
    We have sold today more than 1 million of these development kits.

  5. In 2014, we launched the STM32Cube, a 100% free software solution that makes developers’ lives easier thanks to the combination of
    - STM32CubeMX, a PC configuration and initialization tool
    - STM32Cube embedded software libraries : the generic embedded software components required to develop an application (HAL, LL, middleware components, and examples).
    With more than 300,000 downloads and with regular releases of new functionalities, STM32Cube is now the pillar of the STM32 Ecosystem.

  6. In 2017 the STM32 development ecosystem was ranked among the top three in semiconductor companies by EETimes (

  7. We probably have the most complete market portfolio to address the key segments of
    • IoT for Wired and Wireless connected Objects with
      • Ultra Low Power MCUs (wireless)
      • Very high performance MCUs (wired)
      • Graphical User Interface
      • Security 
    • Industrial
      • Safety operation
      • Very High performance
    • Motor Control with expertise for Consumer, Industrial and Appliances
    • Sensing Applications (motion, audio sensing) with a complete development kit offer featuring ST Sensors

  8. We have launched the ST-MCU-FINDER application that enables developers to explore from their mobile device, or from their desktop, the latest microcontrollers as well as hardware and software tools to connect the complete portfolio of STM32 32-bit and STM8 8-bit microcontrollers and development boards.
  9. We have made a longevity commitment for the whole STM32 family, ensuring a 10-year lifetime for every single part number, which is a strong requirement from the industrial market in particular.

  10. We have introduced the ST Community that enables customers, partners, developers, makers, schools, universities, ST employees and all ST-product enthusiasts to collaborate, connect, communicate, learn and share their insights via a powerful social community application.


While we are increasing our presence in the education, makers and hobbyist communities to make our technology even more accessible, the STM32 product portfolio and ecosystem will continue its evolution to accompany the next technology revolutions and help developers bring innovative applications on the market.


Jump on the fast train of innovation and release your creativity for another 10 years of success with STM32!




Zephyr Project is an open source project supported by the Linux Foundation. Publicly launched in February 2016, it aims at providing a secure and scalable RTOS for the IoT that fits into the smallest memory footprints, which Linux cannot address.


On June 15th, Zephyr OS V1.8 was released. Among various features, such as Tickless Kernel or MPU support, Zephyr RTOS now supports 16 different STM32 based boards, mostly ported by the community, but also the Disco L475 IoT (B-L475E-IOT01a) promoted by ST as a flagship for IoT applications.


In order to ease adoption and open contributions to a larger community, Zephyr also recently made some changes. The main source tree is now hosted on GitHub to speed up the review and acceptance of incoming pull requests. Besides, Windows environment support has been added recently and you can now work seamlessly on Zephyr on Linux, macOS and Windows.


Even though Zephyr takes advantage of the stability of Wind River's world-famous commercial OS VxWorks, from which it is derived, it is a young, innovative and ambitious project with a lot of ongoing work, such as Device Tree support to ease configuration, JavaScript runtime support for quick prototyping, and a wide choice of connectivity, starting with BT 5.0.


To make this happen fully, the Zephyr community looks forward to your contributions. If you're curious or interested, check the Zephyr project website and contribution guidelines, and come and enjoy the cool wind of open source!



Visit STM32 Education page for Motor Control here. This page is designed to help engineers find the necessary material (hardware, software and documentation) to develop the best solution for their applications by type of motor:

  • Permanent magnet synchronous motors (PMSM) 
  • Brushless DC electric motors (BLDC)
  • Stepper motors

Committed to motor control for more than 20 years, ST offers a complete and wide range of material to help you find the best solution and get your application to market quickly:

  • Software development tools including software development kits (SDKs) and the ST Motor Control Workbench which includes a user-friendly graphical user interface (GUI) and firmware library configurator
  • Firmware libraries and sample application code
  • Motor Control development kits and expansion boards for both control and power management
  • Dedicated documentation: application notes, user manuals, getting started guides and hands-on presentations