STM32U5A5 instruction fault after WFI

kamnxt
Associate

Struggling with a really weird issue here.

I'm working on a project using Zephyr on an STM32U5A5ZJT6Q. We are using an LSE-clocked LPTIM as the tick source, PM enabled with STOP0 and STOP1 modes.

At some point, we started noticing sporadic crashes, more frequent on some boards than others. These usually occur in Zephyr's `idle` thread, specifically right after a `wfi` instruction.

Digging around, I found some errata (specifically 2.2.26 for STM32U575xx STM32U585xx, related to flash prefetch, and now also 2.2.6 for STM32U59xxx STM32U5Axxx, related to wake-up from stop mode).

HardFault on wake-up from Stop mode may occur in debug mode
Description
A HardFault may occur at wake-up from Stop mode when the following conditions are met:
• Device is in debug mode.
• DBG_STOP bit is set in DBGMCU_CR.
• A wake-up event/interrupt from an SRD peripheral (except EXTI) occurs in a timing window of four clock cycles during Stop mode entry sequence. SRD peripherals are the ones connected to AHB3 and APB3.
Workaround
None.

What usually happens is a UsageFault with an undefined-instruction cause (not a HardFault) at the ISB after the WFI. We suspected this was related to flash prefetching, but it also happens (although apparently less often) when the idle function is placed in RAM:

Disassembly of section .ramfunc:

20000000 <arch_cpu_idle>:
20000000: b508 push {r3, lr}
20000002: f000 f80d bl 20000020 <__sys_trace_idle_veneer>
20000006: b672 cpsid i
20000008: 2300 movs r3, #0
2000000a: f383 8811 msr BASEPRI, r3
2000000e: f3bf 8f6f isb sy
20000012: f3bf 8f4f dsb sy
20000016: bf30 wfi
20000018: b662 cpsie i
2000001a: f3bf 8f6f isb sy
2000001e: bd08 pop {r3, pc}

20000020 <__sys_trace_idle_veneer>:
20000020: f85f f000 ldr.w pc, [pc] ; 20000024 <__sys_trace_idle_veneer+0x4>
20000024: 080d9dc7 .word 0x080d9dc7
...

With this code, the fault always happens at 2000001a, and checking from our fault handler, we see a pending interrupt from the LPTIM timer.
However, this also seems to happen with no debugger connected and after a full power cycle, which should mean we are not in debug mode.
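
To rule the debug-mode conditions of the erratum in or out at runtime, something like the following could be logged at boot or from the fault handler. This is only a sketch: the helper name is hypothetical, C_DEBUGEN is bit 0 of the Cortex-M DHCSR register, and DBG_STOP is assumed to be bit 1 of DBGMCU_CR (worth checking against the reference manual). The register values are passed in as arguments so the logic can be checked off-target.

```c
#include <stdbool.h>
#include <stdint.h>

/* C_DEBUGEN: bit 0 of the core DHCSR register (halting debug enabled). */
#define DHCSR_C_DEBUGEN        (1u << 0)
/* Assumption: DBG_STOP is bit 1 of DBGMCU_CR on this device family. */
#define DBGMCU_CR_DBG_STOP_BIT (1u << 1)

/* Hypothetical helper: returns true when both debug-related erratum
 * conditions hold (debug enabled AND DBG_STOP set). */
static bool erratum_debug_conditions_met(uint32_t dhcsr, uint32_t dbgmcu_cr)
{
    bool debug_enabled = (dhcsr & DHCSR_C_DEBUGEN) != 0u;
    bool dbg_stop_set  = (dbgmcu_cr & DBGMCU_CR_DBG_STOP_BIT) != 0u;

    return debug_enabled && dbg_stop_set;
}
```

On target, the inputs would presumably be read from `CoreDebug->DHCSR` and `DBGMCU->CR` (CMSIS / ST device header names). If this returns false while the fault still occurs, the debug-mode conditions of erratum 2.2.6 are not met.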

Could you confirm whether this is erratum 2.2.6, or whether it might be something else? I tried adding an extra ISB between `wfi` and `cpsie i`:

Disassembly of section .ramfunc:

20000000 <arch_cpu_idle>:
20000000: b508 push {r3, lr}
20000002: f000 f811 bl 20000028 <__sys_trace_idle_veneer>
20000006: b672 cpsid i
20000008: 2300 movs r3, #0
2000000a: f383 8811 msr BASEPRI, r3
2000000e: f3bf 8f6f isb sy
20000012: f3bf 8f4f dsb sy
20000016: bf30 wfi
20000018: f3bf 8f6f isb sy
2000001c: b662 cpsie i
2000001e: f3bf 8f6f isb sy
20000022: bd08 pop {r3, pc}
20000024: 0000 movs r0, r0

This moved the fault PC to 0x20000020, and the failure is now a HardFault, with bit 30 (FORCED) set in HFSR.
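
Since FORCED means a configurable fault was escalated to HardFault, the original cause should still be recorded in CFSR. A minimal sketch of how the two registers could be decoded in the fault handler is shown here; the type and function names are hypothetical, the bit positions are the architectural ones (FORCED is HFSR bit 30, and CFSR holds MMFSR in bits 0-7, BFSR in bits 8-15 and UFSR in bits 16-31, with UNDEFINSTR at bit 16).

```c
#include <stdbool.h>
#include <stdint.h>

#define HFSR_FORCED      (1u << 30)     /* escalated configurable fault */
#define CFSR_MMFSR_MASK  0x000000FFu    /* MemManage status bits */
#define CFSR_BFSR_MASK   0x0000FF00u    /* BusFault status bits */
#define CFSR_UFSR_MASK   0xFFFF0000u    /* UsageFault status bits */
#define CFSR_UNDEFINSTR  (1u << 16)     /* undefined instruction */

typedef enum {
    FAULT_SRC_NONE,
    FAULT_SRC_MEMMANAGE,
    FAULT_SRC_BUS,
    FAULT_SRC_USAGE
} fault_src_t;

/* True when the HardFault is an escalation of another fault. */
static bool hfsr_is_forced(uint32_t hfsr)
{
    return (hfsr & HFSR_FORCED) != 0u;
}

/* Reports which fault class has status bits set in CFSR.
 * If several classes are set, UsageFault is reported first. */
static fault_src_t cfsr_source(uint32_t cfsr)
{
    if ((cfsr & CFSR_UFSR_MASK) != 0u) {
        return FAULT_SRC_USAGE;
    }
    if ((cfsr & CFSR_BFSR_MASK) != 0u) {
        return FAULT_SRC_BUS;
    }
    if ((cfsr & CFSR_MMFSR_MASK) != 0u) {
        return FAULT_SRC_MEMMANAGE;
    }
    return FAULT_SRC_NONE;
}

/* True when the UsageFault cause is an undefined instruction. */
static bool cfsr_is_undef_instr(uint32_t cfsr)
{
    return (cfsr & CFSR_UNDEFINSTR) != 0u;
}
```

On target, the inputs would be `SCB->HFSR` and `SCB->CFSR`. Comparing the CFSR value from this HardFault with the one from the earlier UsageFault would show whether the underlying cause is the same in both variants.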

Is there a way to figure out what gets corrupted, and to find a workaround for this issue?
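
One check that might narrow this down is comparing the RAM copy of the code around the `wfi` with the expected encoding from inside the fault handler. The sketch below is an assumption-laden illustration: the function and array names are hypothetical, and the expected halfwords are taken from the first listing above, starting at the `wfi` at 0x20000016.

```c
#include <stddef.h>
#include <stdint.h>

/* Expected Thumb halfwords from the first listing, from wfi onward:
 * wfi; cpsie i; isb sy; pop {r3, pc} */
static const uint16_t expected_tail[] = {
    0xbf30, 0xb662, 0xf3bf, 0x8f6f, 0xbd08
};

/* Returns the index of the first halfword that differs from the
 * expected encoding, or n if all n halfwords match. */
static size_t first_mismatch(const volatile uint16_t *actual,
                             const uint16_t *expected, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (actual[i] != expected[i]) {
            return i;
        }
    }
    return n;
}
```

On target this could be called as `first_mismatch((const volatile uint16_t *)0x20000016, expected_tail, 5)`. A full match would suggest the RAM contents are intact and the problem lies in what the core fetched or executed, although it cannot prove that on its own.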
