Bit-banding is dangerous when used on hardware-set status registers

Blog Post created by waclawek.jan on Mar 18, 2018

Some time ago, ST's FAQ page contained the following advice:

Use Cortex-M3 Bit-banding feature for interrupt clearing since it is an atomic operation and NVIC pending interrupts will be ignored during this operation, however Read-Modify-Write is not.

That FAQ page is gone now, but the quote lives on across the web and in some materials. The problem is that it is not entirely true. It is true that pending NVIC interrupts are ignored during a bit-banding access, but it is not true that this is a good method to clear interrupts. In fact, it is dangerous; don't do it unless you know exactly what you are doing.


So let's get this straight.


Bit-banding is a feature of the ARM Cortex-M3 and Cortex-M4 processors that allows certain portions of the memory space (including a portion which is usually mapped to peripherals) to be accessed in a bit-wise manner. This feature was introduced to attract programmers used to bit-addressable memory from other MCU architectures, most prominently the x51. It is present only in the Cortex-M3 and M4, i.e. not in the M0, M0+ or M7. Even in the M3 and M4 it is an optional feature, and implementers (semiconductor manufacturers) may choose whether to implement it or not - ST always does, i.e. bit-banding is available on the 'F1, 'F3, 'F4, 'L1 and 'L4 subfamilies.


The bit-wise access is realized through a respective alias region, in which every single bit of the original memory/peripheral address space is assigned a corresponding word (32 bits). Reading that word returns 0x00000000 or 0x00000001, depending on the state of the corresponding bit; writing to that word stores its lowermost bit into the original bit, without affecting the other bits of the word containing the original bit.
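The mapping between a bit and its alias word is plain arithmetic, so it can be sketched in a few lines of C. The region bases below are those of the ARMv7-M memory map; the helper name `bb_alias` and the peripheral address used in the usage note are only illustrative assumptions, not something taken from ST's headers.

```c
#include <assert.h>
#include <stdint.h>

/* Bit-band regions and their alias regions in the ARMv7-M memory map. */
#define BB_SRAM_BASE      0x20000000u
#define BB_SRAM_ALIAS     0x22000000u
#define BB_PERIPH_BASE    0x40000000u
#define BB_PERIPH_ALIAS   0x42000000u

/* Hypothetical helper (name is an assumption): returns the address of the
 * alias word that corresponds to bit `bit` of the location at `addr`.
 * Each byte of the region takes 32 bytes of alias space (8 bits x 4 bytes),
 * and each bit within it takes one 4-byte word. */
uint32_t bb_alias(uint32_t region_base, uint32_t alias_base,
                  uint32_t addr, unsigned bit)
{
    return alias_base + (addr - region_base) * 32u + bit * 4u;
}
```

For example, bit 7 of the SRAM byte at 0x200FFFFF maps to the alias word at 0x23FFFFFC. If a status register sat at 0x40012C10 (an assumed address, check the reference manual of the particular device), its bit 0 would be aliased at 0x42258200.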


This is how things look from the processor's (and thus the programmer's) point of view. But to understand what is really going on, we need to get down to the nasty details of how this feature is implemented in hardware.


The truth is that, unlike in the x51, there is no special hardware for flipping individual bits. The processor is still interfaced through a 32-bit bus matrix to 32-bit memories and peripherals, so it can only manipulate data in 32-bit chunks (more precisely, it can also do it in 8-bit and 16-bit chunks, if the attached memory or peripheral implements the byte-select signals of the AHB bus; but never in single bits). So the trick lies in a simple attachment between the processor's S-port and the bus matrix: when the processor attempts to read from the bit-addressable area, the attachment converts the bit address to the address of the underlying word, reads from the bus matrix at that address, shifts the requested bit of the read word down to the lowest position and submits that as the result to the processor (the processor is stalled by the attachment all that time). Writing is slightly more tricky: the attachment first issues a read at the real word address, then takes the read data, masks out the required bit, replaces it with the written one, and then performs the writeback through the bus matrix.
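What the attachment does can be modelled in software. The following is only a behavioural sketch that runs on any host; the function names `bb_read` and `bb_write` are made up for illustration, and the real thing is of course hardware, not code.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a bit-band read: one 32-bit bus read, then only the selected
 * bit is handed back to the processor as 0 or 1. */
uint32_t bb_read(const volatile uint32_t *word, unsigned bit)
{
    uint32_t w = *word;                 /* bus read of the whole word */
    return (w >> bit) & 1u;
}

/* Model of a bit-band write: the whole word is read, one bit is replaced,
 * and the whole word is written back. The target sees a 32-bit read
 * followed by a 32-bit write. */
void bb_write(volatile uint32_t *word, unsigned bit, uint32_t value)
{
    uint32_t w = *word;                 /* bus read of the whole word */
    w &= ~(1u << bit);                  /* mask out the addressed bit */
    w |= (value & 1u) << bit;           /* insert bit 0 of the written data */
    *word = w;                          /* bus write of all 32 bits */
}
```

With ordinary memory this is harmless: calling `bb_write(&m, 1, 1)` on a word holding 0xA5A5A5A5 leaves 0xA5A5A5A7, and all other bits are written back with the values they had at the moment of the read.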


So, a bit-banding write is in fact a read-modify-write operation on a whole 32-bit word, from the point of view of the attached memory or peripheral. During this time, the AHB bus is locked down (there's a special signal for that in the bus), so no other master (such as DMA) can interfere. The processor is left to run until it attempts to access the S-port again, when it is stalled until the operation ends.


This means that atomicity of the operation is preserved as far as the program is concerned (in this the quote is true), and the possibility of other bus masters interfering has also been taken care of. So what could possibly go wrong?


The peripheral itself.


In many peripherals, there are status words containing individual status bits which indicate the states through which the internal state machine of the peripheral has passed. As these are set by hardware, they are usually of the clear-by-writing-1 (c1) or clear-by-writing-0 (c0) type - in the former, writing 1 clears such a bit but writing 0 leaves it unaffected (and in the latter it's exactly the opposite), so the proper operation to clear certain bits in such a register is to write a mask, not to read-modify-write. And this applies not only to software RMW (i.e. register |= mask or register &= ~mask, depending on whether it's the c1 or c0 type, which many users already know is a no-no), but also to the hardware RMW. If the hardware sets one bit while another bit is being cleared through RMW, the writeback clears the newly set bit, too. The following scheme may perhaps illustrate this better on the case of the TIM_SR register (which is c0):


The write from the BB's internal register unexpectedly clears the CC2 interrupt flag. I made up the particular numbers - I don't know exactly what the latencies will be, so the "sweet spot" in the timing of the bit-banding write instruction at which the problem occurs will most likely be different from 30. Note that even then the CC interrupt *will* happen, as the signal has already started to be passed to the NVIC; except that in that ISR, when checking for the interrupt source, none will be found.
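The race can also be reproduced on a host with a tiny software model of a c0 status register. This is a sketch under assumptions: the type and function names are invented for the illustration, only the bit positions of UIF (bit 0) and CC2IF (bit 2) follow TIMx_SR, and the bit-band write is split into its read and writeback phases so that a hardware event can be injected in between.

```c
#include <assert.h>
#include <stdint.h>

#define UIF    (1u << 0)   /* update interrupt flag, bit 0 of TIMx_SR */
#define CC2IF  (1u << 2)   /* capture/compare 2 interrupt flag, bit 2 */

/* Hypothetical model of a status register with hardware-set, c0 bits. */
typedef struct { uint32_t sr; } tim_model_t;

/* The peripheral's state machine sets flags on its own. */
void hw_set(tim_model_t *t, uint32_t flags) { t->sr |= flags; }

/* A bus write with c0 semantics: a written 0 clears, a written 1 does nothing. */
void bus_write_sr(tim_model_t *t, uint32_t value) { t->sr &= value; }

/* Proper clear: a single write of a mask, no read involved. */
void clear_by_mask(tim_model_t *t, uint32_t flag) { bus_write_sr(t, ~flag); }

/* Bit-band clear, with CC2 firing between the read and the writeback. */
uint32_t scenario_bitband(void)
{
    tim_model_t t = { 0 };
    hw_set(&t, UIF);                    /* update event */
    uint32_t latched = t.sr;            /* read phase: sees UIF only */
    hw_set(&t, CC2IF);                  /* compare event arrives in the gap */
    bus_write_sr(&t, latched & ~UIF);   /* writeback: CC2IF position holds 0 */
    return t.sr;                        /* CC2IF has been cleared as well */
}

/* The same two events, but UIF is cleared by a mask write. */
uint32_t scenario_mask(void)
{
    tim_model_t t = { 0 };
    hw_set(&t, UIF);
    hw_set(&t, CC2IF);
    clear_by_mask(&t, UIF);             /* writes 1 to every bit except UIF */
    return t.sr;                        /* CC2IF is still pending */
}
```

In this model `scenario_bitband()` ends with an empty status register, while `scenario_mask()` leaves CC2IF set. On a real device with the usual CMSIS-style headers, the mask write would typically be spelled `TIM1->SR = ~TIM_SR_UIF;`.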


I tried to visualize this risk in a simple example (to be compiled with augmented device headers) for the 'L476 DISCO. The whole system is run on a slow system clock, MSI set to 100kHz, so that the result is visible on blinking LEDs. There are no AHB/APB prescalers nor prescalers in the timer, as that's the simplest possible setting, directly corresponding to the scheme above. A timer (TIM1) is run with ARR set so that it overflows roughly at a 10Hz rate. There are two interrupts set, one from Update and the other from Capture2. The green LED is toggled at the update rate (in fact it is toggled by hardware through CH1; I could have done it by software in the Update ISR, the result would be the same); the red LED is toggled in the CC2 interrupt. To find the "sweet spot", the CC2 event is delayed from the start of the cycle more and more in each update cycle, simply by incrementing the CCR2 content (shadowing is switched on so that the changing CCR2 is accepted correctly). The fact that CC2 interrupts are missed because of the bit-banding clearing of the Update flag, when the "sweet spot" is reached, is visualized by the red LED ceasing to toggle from time to time, while the green LED toggles continuously:



In the isrCnts struct-array there are counters counting the occurrences of the Update ISR with the Update flag set (.up), occurrences of the CC ISR itself (.cc) and occurrences of that ISR with the CC2 flag set (.cc2). This is what the vicinity of the "sweet spot" looks like in these counters:

{up = 28, cc = 28, cc2 = 28}, 
{up = 29, cc = 29, cc2 = 29},
{up = 30, cc = 30, cc2 = 30},
{up = 31, cc = 31, cc2 = 30},
{up = 32, cc = 32, cc2 = 30},
{up = 33, cc = 33, cc2 = 30},
{up = 34, cc = 34, cc2 = 30},
{up = 35, cc = 35, cc2 = 30},
{up = 36, cc = 36, cc2 = 31},
{up = 37, cc = 37, cc2 = 32},
{up = 38, cc = 38, cc2 = 33},