USART help - basic IRQ handler

john · ‎2008-08-22

Posted on August 23, 2008 at 07:34

joseph239955 · ‎2011-05-17

Posted on May 17, 2011 at 12:38

Hi STOne-32,

I am not sure that using DSB will solve this issue.

The DSB instruction waits until the Cortex-M3 interfaces finish their outstanding operations. However, the peripheral bus bridge in the STM32 contains a write buffer which is not visible to the core. So for the core, all the outstanding transfers are completed, while the AHB-to-APB bridge has its own write buffer that is still doing the write, and the core is not aware of this. So the DSB could complete while the write buffer is still carrying out the last write. I might be wrong on this one - I don't know the design details of the AHB-to-APB bridge used by ST in the STM32 - but I would say doing a read after the write is the safest solution.
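A minimal sketch of the read-after-write idea, assuming a memory-mapped status register whose flag is cleared by writing it to 0. The names `fake_status_reg`, `FLAG_MASK` and `clear_flag_and_sync` are hypothetical, and a plain variable stands in for the register so the sketch also builds on a host:

```c
#include <stdint.h>

#define FLAG_MASK 0x20u  /* hypothetical interrupt flag bit */

/* Stand-in for a peripheral status register; on real hardware this would be
   a fixed address taken from the device header. */
volatile uint32_t fake_status_reg = 0x21u;

/* Clear the flag, then read the same register back before returning. */
uint32_t clear_flag_and_sync(volatile uint32_t *reg, uint32_t flag)
{
    *reg &= ~flag;  /* this write may still sit in the bridge's write buffer */
    return *reg;    /* read-back through the same bridge; per the reasoning
                       above it is expected to complete only after the write */
}
```

In an interrupt handler the read-back would be the last peripheral access before returning, so that the interrupt request is already deasserted when the handler exits.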

regards,

Joseph

lanchon · ‎2011-05-17

Posted on May 17, 2011 at 12:38

I don't really know much about memory models, and I don't know the first thing about the cortex-m3; I use C exclusively. I didn't know the m3 had mem barriers and a defined mem model, I thought it was a "sequential consistency assumed" kind of device since ST's manual doesn't mention a word.

> - all writes to the same location are serialized and

> - a read after a write will get the written value

this is too weak of a model to be useful, it doesn't order accesses to different registers of a peripheral. complying with this weak model would require paying the price of lots of (unnecessary, I think) mem barriers. and probably all the STM32 code I've seen to date would be incorrect.

> the peripheral bus bridge in STM32 contain a write buffer,

> which is not visible to the core.

why do you suppose the AHB/APB bridges are oblivious to the barriers? is the AHB unable to signal them? do you have more info about the bridges?

so a barrier wouldn't work to flush a pending write and a bogus read would. and I assume a bogus write followed by barrier would also work, right?

> a synchronisation barrier should be the fastest solution because there is no device access (read or write) necessary

this is assuming that a barrier works (flushes the AHB/APB bridges): I think your statement may not be true. I assume the barrier would flush all bridges, while an access would only flush the route to the peripheral. if both APBs have pending writes and one of the buses is configured with a slow clock, it could be faster to flush the write on the fast bus by using a bogus access instead of a barrier.

anyway, this is all theoretical since the GCC version of the C mem barrier function in the fwlib can't be inlined. so unless using a very slow peripheral clock, a bogus access would probably be faster from C.

thanks for all the info!

joseph239955 · ‎2011-05-17

Posted on May 17, 2011 at 12:38

Hi Lanchon,

The AHB interface does not provide any information about write buffer status.

As a result, the DSB mechanism would not know that the external buffer is still busy. (Inside the Cortex-M3, there is an additional signal from the write buffer that tells the processor the write buffer status.)

A bogus write should work if the buffer on the peripheral bridge is only one entry deep. Here I assume that the interrupt signal from the peripheral goes low as soon as the write takes place, and that the propagation delay of this signal to the Cortex-M3 is no more than the propagation delay of the peripheral bridge.

16-32micros · ‎2011-05-17

Posted on May 17, 2011 at 12:38

Hi Joseph,

Reading back the value after a write is the safest procedure to prevent this kind of interrupt race condition when the peripheral clocks are in the same clock domain (I mean synchronous) and are very slow compared to the AHB and system bus, which run at some xx MHz.

We have implemented a synchronization mechanism for some critical system peripherals which have asynchronous clocks (the RTC at 32 kHz, for instance), where we have to wait for a special bit before clearing the flags.

However, it might be very interesting, if that is the case, to have the Cortex-M3 implement in the NVIC path some feedback signals coming from outside the core (such as the AHB/APB bridges) when the system/CPU is handling exceptions. I will check with our designers how this was implemented and let you know.

Cheers,

STOne-32.

tibo · ‎2011-05-17

Posted on May 17, 2011 at 12:38

Hi all,

> I am not sure that using DSB will solve this issue.
> The DSB instruction waits until the Cortex-M3 interfaces finish their outstanding operations. However, the peripheral bus bridge in the STM32 contains a write buffer which is not visible to the core. So for the core, all the outstanding transfers are completed, while the AHB-to-APB bridge has its own write buffer that is still doing the write, and the core is not aware of this. So the DSB could complete while the write buffer is still carrying out the last write.

Joseph, in your book on page 94 you mention the HMASTLOCK signal for bit-band access. Here is my view: if the write buffer is not flushed after HMASTLOCK is released by the CPU following a read-modify-write access, then the device could change the value in the time between "release the lock" and "write the new value to the device" (because the value is only in the buffer, not yet written to the device, and the CPU has no knowledge of the buffer and so releases the lock after writing to the buffer). If this were true, the CPU could not guarantee that bit-banding is always atomic for peripherals. So I suppose that the CPU is able to signal the buffer to flush (and does so during bit-banding), and that it should do so during a DSB too. Do you agree?

> don't really know much about memory models, and I don't know the first thing about the cortex-m3; I use C exclusively. I didn't know the m3 had mem barriers and a defined mem model, I thought it was a "sequential consistency assumed" kind of device since ST's manual doesn't mention a word.

The model of the CM3 is a parallel world (you have at a minimum two "processors" working in parallel: CPU and DMA). You can see this if you look at the complex bus matrix with its parallel paths, the complex semaphore handling, etc. So I think that thinking only in terms of a sequential processor can lead to very rare and frustrating (because difficult to find) race conditions…

> > - all writes to the same location are serialized and
> > - a read after a write will get the written value
>
> this is too weak of a model to be useful, it doesn't order accesses to different registers of a peripheral. complying to this weak model would require paying the price of lots of (unnecessarily I think) mem barriers. and probably all the STM32 code I've seen to date would be incorrect.

I don't have the exact memory model (I only have the documents available on the Internet), but from the description ([1], page A3-23 ff.) it follows that writes are never reordered. So I think it is not necessary to "sync" after every write, only after the last write, and only if the situation is critical. E.g. if you configure a UART, you can do everything in the normal way; the last step is to switch the UART "on" (or to write data) and then to "sync", if the situation is critical. For most situations this is perhaps a somewhat theoretical discussion, because the problem is really rare, but if you cannot tolerate 1 wrong byte in, say, 1,000,000 correct bytes, it is essential (and everyone knows Murphy's law…).
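A small sketch of "configure normally, sync once at the end", assuming a read-back is used as the sync. The register block `uart_regs_t`, its fields and `UART_ENABLE` are hypothetical and not taken from a real device header; a plain object stands in for the peripheral so the code builds on a host:

```c
#include <stdint.h>

/* Hypothetical register block, for illustration only. */
typedef struct {
    volatile uint32_t baud;     /* baud-rate divider */
    volatile uint32_t control;  /* enable and mode bits */
} uart_regs_t;

#define UART_ENABLE 0x1u        /* hypothetical enable bit */

/* Stand-in for the memory-mapped peripheral. */
uart_regs_t fake_uart;

uint32_t uart_configure(uart_regs_t *u, uint32_t divider)
{
    u->baud = divider;          /* ordinary writes, no sync in between */
    u->control = 0u;
    u->control = UART_ENABLE;   /* last, critical write: switch the UART on */
    return u->control;          /* one read-back serves as the "sync" */
}
```

The volatile qualifiers keep the compiler from reordering or dropping the register writes; whether the hardware also keeps them in order is the assumption taken from the paragraph above.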

> > a synchronisation barrier should be the fastest solution because there is no device access (read or write) necessary
>
> I think your statement may not be true. I assume the barrier would flush all bridges, while an access would only flush the route to the peripheral. if both APBs have pending writes and one of the buses is configured with a slow clock, it could be faster to flush the write on the fast bus by using a bogus access instead of a barrier.

Yes, you are right, I think so too.

> anyway, this is all theoretical since the GCC version of the C mem barrier function in the fwlib can't be inlined. so unless using a very slow peripheral clock, a bogus access would probably be faster from C.

You can use: #define __DSB() __asm__("dsb")
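A slightly more defensive variant of such a macro, as a sketch only. It assumes a GCC-compatible compiler; `DSB_BARRIER` and `store_then_barrier` are hypothetical names. The "memory" clobber tells GCC that the statement may read or write arbitrary memory, so it acts as a compiler-level barrier as well. On non-ARM builds the sketch falls back to the compiler barrier alone:

```c
#include <stdint.h>

/* DSB_BARRIER is a hypothetical name, chosen to avoid clashing with a
   vendor-provided __DSB. */
#if defined(__GNUC__) && defined(__arm__)
/* ARM target: emit the instruction; "memory" also stops the compiler from
   moving or caching memory accesses across this point. */
#define DSB_BARRIER() __asm__ volatile ("dsb" ::: "memory")
#elif defined(__GNUC__)
/* Other GCC-compatible targets (e.g. a host build): compiler barrier only. */
#define DSB_BARRIER() __asm__ volatile ("" ::: "memory")
#else
/* Unknown compiler: no barrier at all in this sketch. */
#define DSB_BARRIER() ((void)0)
#endif

uint32_t shared_word;  /* deliberately not volatile */

uint32_t store_then_barrier(uint32_t value)
{
    shared_word = value;   /* with the GCC variants, emitted before the barrier */
    DSB_BARRIER();
    return shared_word;    /* with the GCC variants, reloaded after the barrier */
}
```

Because this is a macro expanding to an inline asm statement, there is no function-call overhead. It still only waits for the core's own interfaces, so the concern about the external bridge buffer raised earlier in the thread is unchanged.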

[1] ARMv7-M Architecture Application Level Reference Manual

joseph239955 · ‎2011-05-17

Posted on May 17, 2011 at 12:38

Hi tibo,

HMASTLOCK is only available on AHB; it is not present on APB (the peripheral bus). So the peripheral will not see the lock signal, and HMASTLOCK cannot stop an APB peripheral from changing its values during locked transfers.

But when you use bit-band on a GPIO port where some pins are inputs and some are outputs, this race condition does not cause any problem. If some of the inputs change state during the read-modify-write, the write to the bits configured as inputs has no effect, and among the bits configured as outputs only one bit is changed.
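For readers unfamiliar with bit-band accesses, here is a short sketch of how the alias address for one bit of a peripheral register is computed on the Cortex-M3. The function name `bitband_alias` and the example register address are illustrative choices, not taken from the thread:

```c
#include <stdint.h>

#define PERIPH_BASE_ADDR    0x40000000u  /* start of the peripheral bit-band region */
#define PERIPH_BB_BASE_ADDR 0x42000000u  /* start of the matching alias region */

/* Each bit of the bit-band region maps to one 32-bit word in the alias region:
   alias = alias_base + byte_offset * 32 + bit_number * 4 */
uint32_t bitband_alias(uint32_t reg_addr, uint32_t bit)
{
    return PERIPH_BB_BASE_ADDR
         + (reg_addr - PERIPH_BASE_ADDR) * 32u
         + bit * 4u;
}
```

For example, `bitband_alias(0x40010C0Cu, 2u)` gives `0x42218188`. A write to that alias word is what the core turns into the locked read-modify-write of the underlying register discussed above.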

In general, peripheral designers need to consider these types of issues. If a control register bit can be changed both by program accesses and by hardware, it should be put in a separate register. Otherwise it is too easy to make a programming mistake and lose information, and the peripheral cannot be reused on Cortex-M1 or ARM7TDMI, where bit-band is not available.

Writable status bits (flags) can be designed so that they are cleared only when you write 1 to them. In very rare cases, a peripheral might have to be developed with an AHB interface so that it can see the HMASTLOCK signal. The processor can only guarantee that the operation is atomic at the bus interface level; the rest of the system also needs to be designed correctly to make atomic operations possible.
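A small software model may help show why write-1-to-clear flags avoid the lost-update problem. Everything here is a host-side simulation with hypothetical names (`w1c_reg_t`, `w1c_bus_write`, `w1c_hw_set`); no real peripheral is involved:

```c
#include <stdint.h>

#define FLAG_A 0x1u
#define FLAG_B 0x2u

/* Model of a status register whose bits are write-1-to-clear. */
typedef struct { uint32_t flags; } w1c_reg_t;

/* Peripheral side of a bus write: only bits written as 1 are cleared. */
void w1c_bus_write(w1c_reg_t *r, uint32_t value) { r->flags &= ~value; }

/* Hardware event: sets a flag asynchronously to software. */
void w1c_hw_set(w1c_reg_t *r, uint32_t mask) { r->flags |= mask; }

/* Software acknowledges A while hardware raises B just before the write lands. */
uint32_t w1c_demo(void)
{
    w1c_reg_t r = { FLAG_A };
    w1c_hw_set(&r, FLAG_B);
    w1c_bus_write(&r, FLAG_A);   /* touches A only */
    return r.flags;              /* B is still pending */
}

/* Same scenario with a plain read-modify-write register. */
uint32_t plain_rmw_demo(void)
{
    uint32_t reg = FLAG_A;
    uint32_t snapshot = reg;     /* software reads: only A is pending */
    reg |= FLAG_B;               /* hardware sets B after the read */
    reg = snapshot & ~FLAG_A;    /* software writes back the stale value */
    return reg;                  /* B has been lost */
}
```

In the model, `w1c_demo()` leaves flag B set while `plain_rmw_demo()` ends with no flags set, which is the information loss described above.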

The term "flush" might be a bit misleading in this context. What DSB does is "wait" until the bus interface (including the internal write buffer) inside the Cortex-M3 finishes its operation before starting the next instruction. But neither bit-band nor DSB can detect whether the write buffer on the peripheral bus outside the core is still active.

[ This message was edited by: joseph.yiu on 19-08-2008 07:34 ]

picguy · ‎2011-05-17

Posted on May 17, 2011 at 12:38

I know this is not STM32 specific but it shows one very good way of doing what you want. Specifically separate logical and physical I/O. I could write it for you but then I would have to send a bill. So just read

http://www.hmtown.com/fifo.htm

lanchon · ‎2011-05-17

Posted on May 17, 2011 at 12:38

thanks guys.

> if the write buffer is not flushed after the HMASTLOCK is released by the CPU after a read-modify-write access, [...] the CPU can not guarantee, that bit-banding is always atomic for peripherals.

I think it can, because 1) async-changing bits shouldn't be writeable, like joseph said, and 2) since the APBs are single-"master" (the client is the AHB), accesses are ordered and the APBs don't need HMASTLOCK to behave atomically; so no source of changes would break a RMW action.

> it is not necessary to "sync" after every write, but to "sync" after the last write

that's the abstraction I'm using now, just that writes to the two APBs could be independently delayed and that APB reads have flushing semantics. I'm not using DMA for now and I'm seeing the core as sequential. I want to learn as little as possible about the CM3; I haven't read the core manual and know nothing about the characteristics of the buses. I might get burnt though...

> You can use: #define __DSB() __asm__("dsb")

well I'd like to avoid something like that. I don't know much about GCC and in particular I don't know what GCC would do with that statement. could loads or stores be moved around it? what about volatile accesses? will the asm block disable optimizations for the containing function? or the other way around, will my asm be optimized? will it be treated as a barrier of some kind?

again, I'd rather stay ignorant of the core and GCC as much as possible and use standard C++. anyway, joseph says dsb won't flush APB writes, so not much use...

> HMASTLOCK is only available in AHB, it is not present in APB (peripheral bus).

joseph, did you obtain this info from publicly available docs? isn't the APB an ST-proprietary bus? or is it defined by arm?

BTW, IMHO anything that departs from the "what you don't know won't hurt you" principle should be clearly stated in the STM32 ref manual. it's not reasonable to expect users to familiarize themselves with the core, that's why abstractions like C are used.

joseph239955 · ‎2011-05-17

Posted on May 17, 2011 at 12:38

Hi lanchon,

Yes, since APB is single master, there is no need for a lock signal.

In the STM32 case, the HMASTLOCK from Cortex-M3 should stop the DMA controller from inserting transfers between locked transfers.

APB is part of AMBA specification from ARM.

You can find information at

http://www.arm.com/products/solutions/amba2overview.html

and the specification (requires registration) at:

http://infocenter.arm.com/help/topic/com.arm.doc.set.amba/index.html

lanchon · ‎2011-05-17

Posted on May 17, 2011 at 12:38

Quote:

picguy:

I know this is not STM32 specific but it shows one very good way of doing what you want. [...]

http://www.hmtown.com/fifo.htm

this link proposes incorrect code. it says:

> Pointer and data must be updated in the correct order. This is always true in interruptible code and is good discipline in an ISR. In our getData() function the code should look something like this:

char getData() {
    char temp;
    if (in == out) {whatever;}
    temp = *in; // The order here
    in++;       // is important
    return temp;
}

that's a mistake, the compiler and the processor are free to reorder things the way they like. C only guarantees that things happen as if by sequential execution, but nothing specific can be said about the state of memory when you interrupt code. getData() needs to synchronize with the ISR using some kind of nonportable (non-C) functionality.
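For illustration, one way to get the required ordering is sketched below. It assumes a toolchain with C11 `<stdatomic.h>`, which is an assumption and was not available to anyone in this thread; `fifo_t`, `fifo_put`, `fifo_get` and `FIFO_SIZE` are hypothetical names. It is a single-producer, single-consumer ring buffer in which the index updates use release/acquire ordering, so the data access and the index update cannot be reordered past each other:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define FIFO_SIZE 16u  /* must be a power of two */

typedef struct {
    uint8_t data[FIFO_SIZE];
    atomic_uint head;  /* advanced only by the producer (e.g. the ISR) */
    atomic_uint tail;  /* advanced only by the consumer */
} fifo_t;

fifo_t demo_fifo;      /* static storage: indices start at zero */

bool fifo_put(fifo_t *f, uint8_t byte)
{
    unsigned head = atomic_load_explicit(&f->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&f->tail, memory_order_acquire);
    if (head - tail == FIFO_SIZE)
        return false;                        /* full */
    f->data[head % FIFO_SIZE] = byte;        /* write the data first... */
    atomic_store_explicit(&f->head, head + 1u, memory_order_release);
    return true;                             /* ...then publish the index */
}

bool fifo_get(fifo_t *f, uint8_t *out)
{
    unsigned tail = atomic_load_explicit(&f->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&f->head, memory_order_acquire);
    if (head == tail)
        return false;                        /* empty */
    *out = f->data[tail % FIFO_SIZE];        /* read the data first... */
    atomic_store_explicit(&f->tail, tail + 1u, memory_order_release);
    return true;                             /* ...then free the slot */
}
```

The sketch only covers one producer and one consumer; anything beyond that, or a toolchain without C11 atomics, would need the kind of non-C mechanism mentioned above.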