Erratum "Delay after an RCC peripheral clock enabling" - suggested workaround fails to work

[This is a reconstruction of a thread I started on Apr 7, 2017, which got lost in the two forum software transitions since then.]

In the STM32F40x and STM32F41x errata sheet DocID022183 Rev 8, section 2.1.13 "Delay after an RCC peripheral clock enabling", three workarounds are suggested.

The third one is:
3.  Or simply insert a dummy read operation from the corresponding register just after enabling the peripheral clock.

It is not clear what the "corresponding register" is: a register of RCC, or a register of the peripheral whose clock is being enabled?

However, regardless of this, it does not appear to work in either case. In the following code (with all clock settings at their reset defaults):

 

8000214:    2201          movs    r2, #1 
8000216:    6b18          ldr    r0, [r3, #48]    ; 0x30 
8000218:    f440 5080     orr.w    r0, r0, #4096    ; 0x1000 
800021c:    6318          str    r0, [r3, #48]    ; 0x30 
800021e:    6b18          ldr    r0, [r3, #48]    ; 0x30  <------------------ readback 
8000220:    600a          str    r2, [r1, #0] 
8000222:    2000          movs    r0, #0 
8000224:    bf00          nop 
8000226:    bf00          nop

 

where r3 is preloaded with the RCC address and r1 with the CRC address, the readback is performed just after the write to RCC, and subsequently the now-presumably-enabled CRC data register is written. This should result in the CRC being calculated for one word containing 0x00000001, so CRC->DR should read as 0xC3C5C0CC. However, placing a breakpoint at the last nop and reading out CRC->DR shows that it is at 0xFFFFFFFF, so the preceding write to CRC->DR has been ignored.

I also tried
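As a cross-check of the expected value quoted above, here is a minimal host-side sketch of what the CRC unit computes for one word. It assumes the fixed polynomial 0x04C11DB7, 32-bit input processed MSB first, no bit reflection and no final XOR; the function name `stm32_crc_word` is made up for this illustration and is not part of any ST library.

```c
#include <stdint.h>

/* Hypothetical helper: models one 32-bit write to CRC->DR.
 * Assumptions: polynomial 0x04C11DB7, MSB-first processing,
 * no input/output reflection, no final XOR. */
static uint32_t stm32_crc_word(uint32_t crc, uint32_t data)
{
    crc ^= data;                            /* fold the new word into the register */
    for (int i = 0; i < 32; i++) {
        if (crc & 0x80000000u)
            crc = (crc << 1) ^ 0x04C11DB7u; /* top bit set: shift and reduce */
        else
            crc <<= 1;                      /* top bit clear: shift only */
    }
    return crc;
}
```

Starting from the reset value 0xFFFFFFFF, `stm32_crc_word(0xFFFFFFFFu, 0x00000001u)` evaluates to 0xC3C5C0CC, which is the value CRC->DR should hold once the write has been accepted.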

 

 800021e:    6808          ldr    r0, [r1, #0]  <------------------ readback

 

with the same result.

The two nops or one dsb at that place (i.e. the two other suggested workarounds) work as expected, CRC->DR contains 0xC3C5C0CC at the breakpoint.

ST, please comment.

JW

PS. I experimented with this as it appears to have a similar timing relationship as the CRC-reset-to-data issue I reported here {link leads to my own thread titled "CRC data ignored after CRC Reset" which is another lost thread, to which a later thread links, too}{edit - original thread reconstructed}.
---

{Clive Two.Zero} {Apr 7, 2017 5:15 PM}
Perhaps it is not exactly the same errata, but one specific to the logic of the CRC peripheral. The original errata is one where a pipelined back-to-back write via the write buffer (enabling clock, writing peripheral) provided no margin.

Adding the read of the RCC register after the write gave the peripheral additional clocks to recognize it had just come out of a reset state with clocks for the first time.

The CRC peripheral clearly seems to need more cycles of its state machine to get its act together. It speaks to a deficiency in the CRC logic, because it should be possible to do single-cycle computation (in HW it's way faster than a 32-bit full adder).
---

{waclawek.jan} @ {Clive Two.Zero} on {Apr 8, 2017 12:17 PM}
> Perhaps it is not exactly the same errata, but one specific to the logic of the CRC peripheral.

Agree. I will retry on Monday with GPIO.

JW

---

{Clive Two.Zero} @ {waclawek.jan} on {Apr 8, 2017 6:15 PM}
I generally try to push the clock stuff up earlier in the initialization. Have you profiled the power consumption of the CRC peripheral? ie Clock disabled, vs clock enabled but peripheral unused.

The APB/AHB clock hazard is more systemic, a bit like the Pipeline/Write-Buffer issue getting IRQ clearing quickly enough to propagate to the Tail-Chaining logic.

ARM has generally eschewed putting logic in to hide all hazards. Intel erects a lot of cliff top fencing to save every lemming.

---

{waclawek.jan} @ {Clive Two.Zero} on {Apr 8, 2017 8:53 PM}
> I generally try to push the clock stuff up earlier in the initialization.

Me too.

> Have you profiled the power consumption of the CRC peripheral?

No. As I've said above, I was trying to find a simple, fast but safe solution to the {CRC problem} I discovered earlier (and it was lots of fun finding, as it was the compiler which kept putting the CRC-reset and CRC-data-write instructions a random number of other instructions apart whenever I tried to add debugging code around them). Yes, a number of nops appear to solve the problem, but then I'd have to resort to inline assembler, which is not my forte - gcc is kind enough to reorder NOPs inserted as standalone/macros, too.

I looked at the errata as a source of inspiration.

> ARM has generally eschewed putting logic in to hide all hazards. Intel erects a lot of cliff top fencing to save every lemming.

I'd say in this case it's ST's guilt. As in the case of consecutive writes to CRC->DR, the peripherals (including RCC) which require guard times may insert a sufficient number of waitstates, or provide a readback indicator. At the end of the day, I am absolutely content with things being dangerous, as long as they are clearly and concisely documented, possibly with an accompanying simple and clean program example - which unfortunately is far from being the case.

Jan

---

{waclawek.jan} @ {waclawek.jan} on {Apr 10, 2017 10:02 AM}
Confirming that the same problem persists when writing to GPIO. (An APB-related readback might make a difference when enabling APB-based peripherals, but I am not interested in experimenting further with what is ST's task to ensure.) [EDIT] OK, so I tried with APB1/TIM2, and the problem did not show up at all, i.e. a write to TIM2_CR1 immediately after the write to RCC_APB1ENR was successful, no matter what the AHB/APB divider was.

Interestingly, when the write buffer on the processor-to-bus interface is disabled, one nop is sufficient, but an ldr still is not (IMO this is related to the processor giving up/reacquiring the bus in the case of nops, while still keeping the bus in the case of successive st/ld - but again, this is ST's task to explain).

As I've said, my inline asm skills are not the top, so please don't laugh too loudly. {indentations, tabulators lost, sorry}

 

 

// SCnSCB->ACTLR = SCnSCB_ACTLR_DISDEFWBUF_Msk;  // reduces needed nr of nops
{
  register uint32_t tmp1, tmp2;
  __asm volatile(
    "movs  %[t1], #1"              "\n\t"
    "ldr   %[t2], [%[p2], #48]"    "\n\t"   // read RCC_AHB1ENR (offset 0x30)
    "orr   %[t2], #4"              "\n\t"   // set bit 2, GPIOCEN
    "str   %[t2], [%[p2], #48]"    "\n\t"   // enable the GPIOC clock
#if(0)
    "nop"                          "\n\t"
    "nop"                          "\n\t"
#elif(0)
    "dsb"                          "\n\t"
#elif(0)
    "ldr   %[t2], [%[p2], #48]"    "\n\t"   // readback from RCC
#elif(1)
    "ldr   %[t2], [%[p], #0x14]"   "\n\t"   // readback from GPIOC (ODR, offset 0x14)
#else
    // nothing
#endif
    "str   %[t1], [%[p], #0x14]"   "\n\t"   // the write under test
    "nop"                          "\n\t"
    "nop"                          "\n\t"
    "nop"                          "\n\t"
    : [t1] "=&r" (tmp1)
    , [t2] "=&r" (tmp2)
    : [p]  "r" (GPIOC)
    , [p2] "r" (RCC)
    : "cc", "memory"   // movs changes the flags; the stores touch memory
  );
}

 

 

JW

Piranha
Chief II

There is another topic about these issues. By the way, the document Jan talks about is now ES0182 Rev 14, and the section is "2.2.13 Delay after an RCC peripheral clock enabling". And for L4 and F7 series the issue is not present in errata, because they documented it in the reference manuals. Actually, that is where it belongs, because it is not a flaw but a deliberate design feature. Anyway, there are two potential code issues with peripheral clock enabling.

First, the code must ensure that the write to the corresponding RCC register has actually been completed. The DSB instruction introduces a delay of at least 1 cycle and drains the CPU write buffer, but it doesn't know about the buses, peripherals or anything outside of the CPU. For this issue the less restrictive DMB instruction does the same, but ST doesn't mention it because they don't understand the difference between these two instructions. The NOP instruction architecturally doesn't guarantee anything at all and should be used only for padding. It does, however, provide a delay of 1 cycle on at least the Cortex-M0/M3/M4 cores. I don't know about M23/M3x and I definitely wouldn't rely on it on M7/M55/M85. The only correct solution, which forces the write to always go all the way through to the RCC, is to read back the corresponding or any other RCC register.

Second, the code must wait for the clock enable logic to synchronize with the peripheral bus and actually enable the peripheral clock. For the F4 series, ST states that the delay is 2 cycles for AHB and 1+PRESC cycles for APB buses. For AHB they say that these are AHB cycles. For APB they don't say anything, but these should also be AHB (!) cycles. If so, this translates to 2 peripheral clock cycles for AHB and (1 .. 2] peripheral clock cycles for APB. This also becomes clear from reading the F7 reference manuals, such as RM0410 Rev 4 section "5.2.12 Peripheral clock enable register (RCC_AHBxENR, RCC_APBxENRy)", which simplifies this and explicitly states that the delay must be 2 peripheral clock cycles. The same is stated in the reference manuals for the L4 series, except that the very important word "peripheral" is missing in those.
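To make the arithmetic above concrete, here is a small sketch that returns the required delay in AHB (HCLK) cycles. It rests on the reading given above, namely that the APB figure of 1+PRESC is also counted in AHB cycles, which ST does not state explicitly. The type and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */
typedef enum { BUS_AHB, BUS_APB } rcc_bus_t;

/* Delay, in AHB (HCLK) cycles, between the clock-enable write and the
 * first access to the peripheral:
 *   AHB: 2 cycles
 *   APB: 1 + PRESC cycles (ASSUMED to be AHB cycles, see text)
 * apb_prescaler is the HCLK/PCLK ratio (1, 2, 4, 8 or 16); it is
 * ignored for AHB peripherals. */
static uint32_t rcc_enable_delay_hclk_cycles(rcc_bus_t bus, uint32_t apb_prescaler)
{
    if (bus == BUS_AHB)
        return 2u;
    return 1u + apb_prescaler;
}
```

For example, an APB peripheral with a prescaler of 4 gives 5 HCLK cycles, which is 1.25 APB clock cycles and so falls inside the (1 .. 2] range; a prescaler of 1 gives exactly 2.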

Taking it all into account, a correct code looks like this:

 

RCC->AHB1ENR |= RCC_AHB1ENR_GPIOAEN;
(void)RCC->AHB1ENR;
// Wait for 2 peripheral clock cycles.

// Use the peripheral.

 

So, of the three "workarounds" in the errata, the two DSB- and NOP-related ones are complete nonsense and don't guarantee anything. The third, read-back related one actually means a read-back from RCC, which they also fail to state; it solves just the first issue but ignores the second one, and this is what the HAL and LL implement. The HAL and LL in addition do a bitwise AND operation, but that is also nonsense, which doesn't guarantee anything. As I keep saying, almost all of the SPL/HAL/LL/Cube code is broken. And not only the code: even when ST's documentation gives some advice on how to write the code, the advice is also often inadequate, wrong or just nonsense.
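The two steps described in this post (read back RCC, then wait) could be wrapped roughly as follows. This is only a sketch: the helper name is made up, and the wait loop assumes that each iteration costs at least one core cycle on a Cortex-M4, which is an assumption rather than an architectural guarantee. The loop overhead makes the actual delay longer than `wait_cycles`, which errs on the safe side.

```c
#include <stdint.h>

/* Hypothetical helper, not part of CMSIS or the HAL.
 * enr         : address of the RCC enable register (e.g. &RCC->AHB1ENR)
 * mask        : enable bit(s) to set
 * wait_cycles : minimum number of core cycles to wait afterwards */
static inline void periph_clock_enable(volatile uint32_t *enr,
                                       uint32_t mask,
                                       uint32_t wait_cycles)
{
    *enr |= mask;        /* request the peripheral clock */
    (void)*enr;          /* read back so the write has reached the RCC */
    while (wait_cycles-- != 0u) {
        /* ASSUMPTION: at least one cycle per iteration on Cortex-M4 */
        __asm volatile ("nop");
    }
}
```

On the target, with the CMSIS names used in the snippet above, a call might look like `periph_clock_enable(&RCC->AHB1ENR, RCC_AHB1ENR_GPIOAEN, 2u);`.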

> And for L4 and F7 series the issue is not present in errata, because they documented it in the reference manuals.

Oh, I didn't know that, thanks.

> The NOP instruction architecturally doesn't guarantee anything at all and should be used only for padding.

I don't really care for the "architectural" things. That's just another word for "nonportable", but I don't subscribe to the notion of universal portability in microcontrollers, in the same way as I don't subscribe to, and thus don't seek, a universal bulletproof solution independent of context (a.k.a. driver).

So, if a NOP does provide a reliable at-least-one-cycle delay on CM4, and if the problem's reliable solution is at-least-N-cycles delay, then for me N NOPs on CM4 are a perfect solution too. I really don't care that it won't work on CM7 or CM23, as they are attached to a different harness which modifies the problem anyway.

JW

FBL
ST Employee

Hello @waclawek.jan  

Thank you all for this interesting discussion.

Workarounds are a list of applicable measures. So, it is possible that some measures are recommended only in specific situations and depend on the use case.

>if a NOP does provide a reliable at-least-one-cycle delay on CM4

The NOP instruction does not guarantee a one-cycle delay. It can increase execution time, leave it unchanged, or even reduce it. It is intended for instruction alignment and does not necessarily consume time. Maybe 2 LDR instructions introduce enough delay.

Did HAL macro fail? If not, what do you suggest?



Hi @FBL ,

An error is an error. No matter whether any Cube/HAL macro works or not, no matter that the workaround works in other STM32, no matter that there are specific circumstances in which the workaround works - if there are conditions on the STM32F407 under which the workaround does not work, as proven by the experiment, then it is not a workaround and shall be removed from the STM32F407 errata.

JW

PS. You are quoting from a Cortex-A manual, which is irrelevant for the Cortex-M4.