Misprint in F4/G4 Programming Manual (Bitband)

flyer31 · ‎2021-07-19

There is a misprint in your PM0214 programming manual "The Cortex M4 processor", in chapter 2.2.5 "Bit-Banding", above figure 9. Bit-banding is nicely described there, but of course all bit alias addresses have to be word-aligned, i.e. their two lowest address bits are "00". Your first example address there says "0x23FFFFED", but this is WRONG; please correct it to "0x23FFFFE0". (As your calculation formula is correct, this error is quite evident to anybody looking deeper into this, but it is still quite disturbing for many people, I think.)

(in PM0214 Rev10 this is on page 33/262)
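
As a quick check of the arithmetic, here is a minimal sketch of the alias formula (alias base + byte offset * 32 + bit number * 4) for the SRAM region. The helper name sram_bitband_alias and the macro names are made up for illustration; the base addresses are the ones quoted in this thread.

```c
#include <assert.h>
#include <stdint.h>

#define SRAM_BB_BASE    0x20000000u  /* start of the SRAM bit-band region */
#define SRAM_ALIAS_BASE 0x22000000u  /* start of the SRAM bit-band alias  */

/* Alias word address for one bit of one byte in the SRAM bit-band region.
 * byte_addr must lie inside the 1 MB region, bit must be 0..7.
 * (Hypothetical helper, only meant to check the address arithmetic.) */
static uint32_t sram_bitband_alias(uint32_t byte_addr, unsigned bit)
{
    assert(byte_addr >= SRAM_BB_BASE && byte_addr - SRAM_BB_BASE <= 0xFFFFFu);
    assert(bit < 8u);
    uint32_t byte_offset = byte_addr - SRAM_BB_BASE;
    return SRAM_ALIAS_BASE + byte_offset * 32u + bit * 4u;
}
```

With these assumptions, bit 0 of the byte at 0x200FFFFF gives 0x22000000 + 0xFFFFF * 32 = 0x23FFFFE0, so the trailing "D" in the manual cannot be right.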

flyer31 · ‎2021-07-19

One further point, which is quite important I think:

This programming manual says that SRAM bytes 0...1 MByte are supported (address range 0x20000000-0x200FFFFF). This is of course perfectly fine for the complete STM32G4 RAM range, as the STM32G4 has only 128 kB of RAM anyway, so the COMPLETE RAM can be bit-banded. This is very nice for interrupt programming.

But concerning Peripherals the description is a bit strange / ambiguous:

The programming manual says that the peripheral bit-banding region is 0x40000000-0x400FFFFF.
But in the STM32G4 many important peripherals (all GPIOs, also ADCs and DACs) are on AHB2, which is addressed at 0x48000000..., so OUTSIDE this "peripheral bit-banding area" defined in the programming manual. But the reference manual Rev. 4, chapter 47.9, page 2093/2126, clearly says that AHB supports bit-banding, so I hope also AHB2?

If I understand the bit-banding concept correctly, the offset for the bit address is always 0x02000000, so I assume that the bit-band alias for this AHB2 address range at 0x48000000... will be at 0x4A000000.

So if I use the following #defines to calculate the BB address, I hope all is fine, also for AHB2 in STM32G4:

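#include <stdint.h>

/* Bit number of the lowest set bit in BITMASK (ARM compiler intrinsics). */
#define BITNR(BITMASK)       __clz(__rbit(BITMASK))
/* Pointer to one word in a bit-band alias region; volatile so accesses are not optimized away. */
typedef volatile uint32_t* PBB;
#define BB_OFFSET   0x02000000
#define BB_SRAMMASK 0xF0000000
/* Alias address: top address nibble | alias offset | (low 25 address bits << 5) | (bit number << 2). */
#define BITADR_BB(Var, BitMask)       \
           ((PBB) ( ( ((uint32_t)&(Var)) & BB_SRAMMASK) | BB_OFFSET | (( ((uint32_t)&(Var)) & (BB_OFFSET-1)) << 5) | (BITNR(BitMask) << 2)) )
#define ClrBitMskAtomic( Var, BitMask)    \
           (*BITADR_BB(Var,BitMask) = 0)
#define SetBitMskAtomic( Var, BitMask)     \
           (*BITADR_BB(Var,BitMask) = 1)
#define SetBitMskBoolAtomic( Var, BitMask, bo)     \
           (*BITADR_BB(Var,BitMask) = (bo))
#define GetBitMskAtomic(Var, BitMask)       \
           (*BITADR_BB(Var,BitMask))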

(for such BB-Pointer I defined a separate typedef PBB, to make this more clear in cases that I pass such a bitband pointer address to some function).

Could somebody from ST please confirm that this will also work for AHB2 peripherals in STM32G4?

ChahinezC · ‎2021-09-01

Hello @flyer31,

I will check and get back to you.

Chahinez.

ChahinezC · ‎2021-09-01

Bit-banding is implemented only for the given SRAM and peripheral regions (two regions only). Outside those regions bit-banding is not implemented. This is a Cortex-M4 property. A peripheral address must be inside the address space that can be aliased by bit-banding (the 1 MB above 0x4000 0000, i.e. from 0x4000 0000 up to, but not including, 0x4010 0000). This is written in the M4 programming manual and also in the ARM documentation on the web.

In the reference manual there is a note inside the chapter "AHB-AP (AHB access port)": "Bitband transactions are supported". But this only means that a debugger can also generate bit-banded operations on the AHB bus, the same operations the CPU performs if bit-banding is used in user code (AHB to APB bridge).

Result is that AHB peripherals (at address 0x4800 0000) cannot be accessed by bit-banding.
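
The region rule described above can be written down as a small address check. This is only a sketch: the function and macro names are invented for illustration, and the bounds are simply the two 1 MB regions quoted in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SRAM_BB_START   0x20000000u  /* SRAM bit-band region start       */
#define PERIPH_BB_START 0x40000000u  /* peripheral bit-band region start */
#define BB_REGION_SIZE  0x00100000u  /* each region is 1 MB              */

/* True if addr lies in the 1 MB window beginning at start. */
static bool in_region(uint32_t addr, uint32_t start)
{
    /* Guard against a window that would wrap around the address space. */
    assert(start <= UINT32_MAX - (BB_REGION_SIZE - 1u));
    return addr >= start && (addr - start) < BB_REGION_SIZE;
}

/* True if addr is inside one of the two documented bit-band regions.
 * (Hypothetical helper; it encodes only the rule stated in this thread.) */
static bool is_bitband_addressable(uint32_t addr)
{
    return in_region(addr, SRAM_BB_START) || in_region(addr, PERIPH_BB_START);
}
```

Under this rule 0x48000000 (start of AHB2 on the STM32G4) is rejected, while 0x400FFFFF is still accepted.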

Generally: bit-banding is sometimes useful, but it also uses a read-modify-write operation in the background (same as "standard" code). So access is not faster, but using bit-banding does reduce code size.

Chahinez.

flyer31 · ‎2021-09-01

Hi ChahinezC,

thank you for your answer.

But are you sure that you are correct here?

1) As you write yourself, bit-banding is for the SRAM AND peripheral regions. So it definitely WORKS for SRAM. The example in my first post (which is taken from your programming manual, chapter 2.2.5) is for address 0x200FFFFF. This is the last byte of 1 MB of RAM. Do not ask me why they took this somewhat "bizarre" address for the example, but there are some M4 controllers that have 1 MB of RAM, and it should for sure work on such a controller. (Of course it would generally be much nicer to use a more "normal" SRAM address here, e.g. the byte at the end of a 64 kB RAM range, which would be address 0x2000FFFF.)
2) In the second example I tried address 0x48000000, which clearly IS inside the peripheral block, but outside the two regions specified in table 15 of 2.2.5 (for peripherals "0x40000000-0x400FFFFF"). But as I tried this, bit-banding also WORKS in this 0x48000000 address range.
3) I do not understand why you say that bit-band access to set bits is read-modify-write. If you use a bit-banded address to set or clear bits, you clearly do this in ONE ARM assembler STR instruction, so this is clearly NOT read-modify-write? (Or how could ONE ARM assembler instruction be "read-modify-write"? This would somehow be a "strange miracle".) You would of course not need such atomic bit access in "normal program flow", but for interrupt programming it is extremely nice and useful, especially e.g. for fast interrupts as they occur in typical motor drive applications. In such cases it can be VERY nice to have atomic bit access to some of the timer register bits.

Tesla DeLorean · ‎2021-09-01

3) It's a RMW at the bus level: the single instruction performs a compound interaction on the bus. Basically the core sequences AHB transactions to feign bit-level interactions.

On peripherals this should NOT be considered ATOMIC, for example actions on the TIM SR can be a hazard. Had some discussions on this a decade+ ago.
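
One way such a hazard can arise is sketched below as a host-side simulation. It assumes a status register whose flags are "rc_w0" (software can only clear a flag by writing 0, writing 1 has no effect), as in the STM32 timer SR, and it assumes a hardware event lands exactly between the read and the write of the bus-level RMW. The type fake_timer and all function names are invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define UIF   (1u << 0)  /* update flag (the one we want to clear) */
#define CC1IF (1u << 1)  /* capture/compare flag set by "hardware" */

/* Simplified model of an rc_w0 status register. */
typedef struct { uint32_t sr; } fake_timer;

/* Software write: a 0 clears the flag, a 1 leaves it unchanged. */
static void sw_write_sr(fake_timer *t, uint32_t value) { t->sr &= value; }
/* Hardware event: sets a flag. */
static void hw_set_flag(fake_timer *t, uint32_t flag)  { t->sr |= flag; }

/* Bit-band style clear of UIF, modelled as read / modify / write of the
 * whole word, with CC1IF being raised between the read and the write. */
static uint32_t clear_uif_via_rmw(fake_timer *t)
{
    assert((t->sr & CC1IF) == 0u);     /* scenario starts with CC1IF clear  */
    uint32_t snapshot = t->sr;         /* bus reads the whole register      */
    hw_set_flag(t, CC1IF);             /* event arrives in the gap          */
    sw_write_sr(t, snapshot & ~UIF);   /* stale word is written back        */
    return t->sr;
}

/* Plain word write of ~UIF: every other flag is written as 1 (no effect). */
static uint32_t clear_uif_via_plain_write(fake_timer *t)
{
    assert((t->sr & CC1IF) == 0u);
    hw_set_flag(t, CC1IF);             /* same event, same timing           */
    sw_write_sr(t, ~UIF);
    return t->sr;
}
```

In this model the RMW variant writes back a stale 0 for CC1IF and so loses the new flag, while the plain write of ~UIF keeps it.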


waclawek.jan · ‎2021-09-01

1) @ChahinezC yes, the 0x23FFFFED is obviously a rudimentary typo in PM0214 (as of rev. 10, and it has been there ever since), as flyer31 said in his original post, D being typed instead of 0.
2) @flyer31 "I tried this, bit banding also WORKS in this 0x48000000 address range": how did you try, exactly?
3) I tried to write up this particular RMW issue here; not sure it makes things clearer, but I tried.

@ChahinezC, a BB write may not be faster from the point of view of the peripheral, but from the processor's point of view it's a single write to the write buffer, which allows the program to proceed while the memory subsystem performs the RMW in parallel, so it speeds up the program.

@flyer31 Also note that bit-banding is an optional component (i.e. not even all Cortex-M3/M4-based MCUs have it, although AFAIK all M3/M4-based STM32s do have it), and it's not present in M0/M0+/M7, which may be an unpleasant surprise if you want to migrate.

Disclaimer: I do like and use bit-banding in my programs extensively.

JW

flyer31 · ‎2021-09-01

Thank you, this is very helpful, especially your RMW explanation in 3 with this link, this is interesting.

But I would typically use it for more "general configuration bits" in timer applications, which are clearly NOT accessed by hardware. So in this case bit-banding WILL clearly be very useful. Of course you are right about "bad surprises" when migrating to M7; I have had this already. But in M7 there are quite a few more "bit set / bit clear" 32-bit registers which I can then use instead, so I have got around this in M7 so far.

For interrupt programming, bit-banding is of course also useful for user-defined bits in SRAM, and this does not touch the hardware RMW issue in any way (as long as I do not use parallel memory DMA or other nasty things, but that is something I would not try on such basic configuration memory anyway).
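
For such user bits in SRAM on a core without bit-banding, one possible fallback is sketched here. It is not something proposed by anyone in this thread; it assumes a C11 toolchain that provides <stdatomic.h>, and the names user_status, status_set, status_clear and status_get are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* User-defined status word shared between main code and interrupts. */
static _Atomic uint32_t user_status;

/* Atomically set all bits in mask. */
static void status_set(uint32_t mask)
{
    assert(mask != 0u);
    atomic_fetch_or(&user_status, mask);
}

/* Atomically clear all bits in mask. */
static void status_clear(uint32_t mask)
{
    assert(mask != 0u);
    atomic_fetch_and(&user_status, ~mask);
}

/* True if any bit in mask is currently set. */
static bool status_get(uint32_t mask)
{
    return (atomic_load(&user_status) & mask) != 0u;
}
```

On cores with exclusive-access instructions, compilers typically turn these calls into LDREX/STREX retry loops, so the update cannot be torn by an interrupt, although it costs more code than a single bit-band store.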

So for me bit-banding in M4 clearly remains very nice and interesting, especially for STM32G4 applications, and this STM32G473 really is a super-great and flexible chip.

Concerning your question at 2: I just used my "standard bit-band macros", which I wrote in the code part of my second post here. There you see how the address is calculated (see my macro BITADR_BB), and this works nicely, "thank God", also for the range 0x48000000... (I would just like to get confirmation from ST that this DOES work in principle and is not some "strange lucky event".)
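
For what it is worth, the arithmetic of BITADR_BB can be evaluated on a host PC with plain numbers. The sketch below copies the expression from the macro; the function name bitadr_bb_value is invented, and the bit number is passed directly instead of being derived from a mask.

```c
#include <assert.h>
#include <stdint.h>

#define BB_OFFSET   0x02000000u
#define BB_SRAMMASK 0xF0000000u

/* Same expression as BITADR_BB, applied to a numeric address instead of
 * &(Var), with the bit number (0..31) given directly. */
static uint32_t bitadr_bb_value(uint32_t addr, unsigned bitnr)
{
    assert(bitnr < 32u);
    return (addr & BB_SRAMMASK) | BB_OFFSET
         | ((addr & (BB_OFFSET - 1u)) << 5)
         | ((uint32_t)bitnr << 2);
}
```

Because BB_SRAMMASK keeps only the top nibble, address bit 27 is dropped: for 0x48000000 the expression yields 0x42000000, the same value as for 0x40000000, and not 0x4A000000. So, at least by this arithmetic, the access would land in the documented alias of 0x40000000; whether that explains the observed behaviour would have to be checked on hardware.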

Tesla DeLorean · ‎2021-09-01

I think the power of it was significantly over-sold in the "This is a great feature of the core.." slide deck, and in subsequent iterations of the core its value was diminished, because enough people found the corner cases and it just complicated things (pipeline, in-order completion) and spawned a lot of hazards which ARM just doesn't like to add transistors/logic to remedy.

Back in the day this also came up with the NVIC / tail-chaining hazard, where if the write buffers hadn't cleared and the pipeline allowed, an interrupt would spuriously re-enter. Basically if you cleared the interrupt source too late in the handler this didn't reflect back to the NVIC/CPU when the BX LR to the call-gate executed and it tail-chained immediately to what it thought was the still pending IRQ, clock cycles later this would clear, and a read back of the peripheral register would indicate nothing was pending. This is generally why it's a good policy to check the interrupting source to validate it has something pending rather than plough forward into servicing.
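
The "check the source before servicing" policy could look roughly like this. It is a host-side sketch with a fake register; fake_timer, timer_irq_handler and the serviced counter are names made up for the example, and the flag clearing is simplified compared with a real rc_w0 status register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define UIF (1u << 0)  /* the flag that raises this interrupt */

/* Stand-in for a peripheral with a status register. */
typedef struct { volatile uint32_t sr; } fake_timer;

static unsigned serviced;  /* counts real service passes */

/* Returns true if the interrupt was really pending and got serviced,
 * false if the handler was entered with nothing pending. */
static bool timer_irq_handler(fake_timer *t)
{
    assert(t != NULL);
    if ((t->sr & UIF) == 0u) {
        return false;          /* spurious (e.g. tail-chained) entry */
    }
    t->sr &= ~UIF;             /* clear the source early in the handler
                                  (model only; real rc_w0 flags would be
                                  cleared by writing ~UIF) */
    serviced++;                /* actual servicing would happen here */
    return true;
}
```

Clearing the source at the top of the handler, rather than just before returning, also gives the write more time to reach the peripheral before the exception return.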


flyer31 · ‎2021-09-01

Thank you, yes, it is quite clear to me that I would not use it on a "hardware-writable" register such as the SR register.

Mainly two possible applications come to my mind immediately:

1) Changing the output enable bits CCxNE/CCxE in a TIMx_CCER register in some interrupt routine (which might then also happen in normal program flow and must not interfere / must be atomic). This would be in the peripheral address range 0x40000000...
2) Changing any of my "own" user bits in RAM in some "user status register" (of course marked volatile in C code). This would be in my own SRAM range 0x20000000...

With this I am perfectly happy, I do not want any miracles... .