
STM32H7 WWDG expires during Flash Erase

ADunc.1
Senior

I have been stuck on a tricky little issue for some time now.

Using an STM32H753 in a reasonably large and complex application, when I try to erase the upper flash bank, the watchdog expires, resetting the chip!

My suspicion is that the CPU is stalling during the erase (~5s), but the watchdog keeps counting until it resets the chip.

I am running low on ideas for debugging this and am open to any help. Here are some things of significance:

  • This is long-established firmware; the issue only appeared after some recent changes. Until then flash erase had worked flawlessly for years.
  • Rolling back versions and changes does not help isolate the issue. It seems to be more related to flash memory placement. Even inserting one assembly instruction can cause or stop the problem.
  • The watchdog has the EWI interrupt enabled. It fires if I force a watchdog timeout, but it does not fire before the reset during flash erase.
  • Flash is set up as dual bank. I am only erasing the upper bank. No memory in the upper bank is ever (intentionally) accessed by main code, with the exception of flash programming, which occurs well after the erase is complete. All code only ever executes out of the lower bank.
  • I checked for accesses to upper bank memory by setting up an MPU region. A mem manage fault would occur if I forced an access to this region, but never during normal operation or during flash erase. This was tricky to do without altering flash memory placement!
  • I checked errata, I am not using flash swap.
  • Setting a breakpoint on reset after the watchdog fires, then manually tracing back through running threads and interrupt stacks, did not lead us to anything useful.
  • I cannot see what the register values were right before reset, as all registers and peripherals are reset by the watchdog.
  • With the watchdog disabled and the STM32Cube tick counter (uwTick) shown in the SWV trace, I see it go flat during the flash erase, so the timer interrupt is not being serviced. If I pause execution during this time, the code stops in the timer ISR, but not until the flash erase finishes. I suspect the ISR code runs after I request the pause because the CPU needs some time to halt. I always see the same stack trace, but nothing in the registers, RAM, or variables around that area suggests any access to the upper flash bank.

It seems most likely something is trying to access the flash while it is being erased causing the CPU to stall. I guess this is due to a code bug. I am after ideas, or any insight into the STM32H7, that could help track this issue to its cause.

Thanks in advance...

4 REPLIES
alister
Lead

Some ideas...

  • Yes the core should stall if it attempts to read or execute from a flash bank during its erase.
  • The code to enable an MPU region to detect accesses of the upper bank during its erase could be faulty somehow and not cover every case of access. For the execution case, an inspection of the map file should reveal if .text extends into the upper bank.
  • You might experiment with masking interrupts during the erase to prevent other threads/interrupts executing during the erase.
  • You might configure a timer to count, say, every microsecond and use that to measure erase times. Or you might toggle an output before and after the erase that you can measure.
  • EWI not firing and uwTick not changing seems consistent with a stall.
  • Is your power supply ok during the erase?
  • Is __HAL_RCC_GET_FLAG(RCC_FLAG_WWDG1RST) set on boot after a suspected watchdog? Other unexpected flags in RCC_RSR?
  • If your watchdog is less than twice the max erase time, you would have to schedule your erase to begin immediately after the WWDG is serviced.
  • You might consider the watchdog's purpose and whether its period might be safely relaxed. Obviously you want to know why this is occurring though.
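
The reset-cause check in the list above can be sketched as a small decoder. This is only a sketch under assumptions: `decode_reset_cause` and the `RSR_*` macro names are made up for illustration, and the bit positions are the ones I believe `RCC_RSR` uses on the STM32H743/753, so check them against your device header. On target you would pass in the value of `RCC->RSR` read early in boot, before the flags are cleared.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* ASSUMED RCC_RSR bit positions for STM32H743/753; verify against the
 * device header before relying on them. */
#define RSR_BORRSTF   ((uint32_t)1 << 21)  /* brown-out reset            */
#define RSR_PINRSTF   ((uint32_t)1 << 22)  /* NRST pin reset             */
#define RSR_PORRSTF   ((uint32_t)1 << 23)  /* power-on/power-down reset  */
#define RSR_SFTRSTF   ((uint32_t)1 << 24)  /* software reset             */
#define RSR_IWDG1RSTF ((uint32_t)1 << 26)  /* independent watchdog reset */
#define RSR_WWDG1RSTF ((uint32_t)1 << 28)  /* window watchdog reset      */

/* Hypothetical helper: writes a comma-separated list of the reset causes
 * found in rsr into buf and returns how many names were fully written. */
int decode_reset_cause(uint32_t rsr, char *buf, size_t len)
{
    static const struct { uint32_t mask; const char *name; } flags[] = {
        { RSR_WWDG1RSTF, "WWDG1" },
        { RSR_IWDG1RSTF, "IWDG1" },
        { RSR_SFTRSTF,   "SFT"   },
        { RSR_PORRSTF,   "POR"   },
        { RSR_PINRSTF,   "PIN"   },
        { RSR_BORRSTF,   "BOR"   },
    };
    size_t used = 0;
    int count = 0;

    if (buf == NULL || len == 0) {
        return 0;
    }
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof flags / sizeof flags[0]; i++) {
        if ((rsr & flags[i].mask) == 0) {
            continue;  /* this cause is not flagged */
        }
        int n = snprintf(buf + used, len - used, "%s%s",
                         count ? "," : "", flags[i].name);
        if (n < 0 || (size_t)n >= len - used) {
            break;     /* buffer full: stop, output may be truncated */
        }
        used += (size_t)n;
        count++;
    }
    return count;
}
```

Logging the decoded string on every boot makes it easy to spot a second, unexpected flag alongside the watchdog one.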

Thanks for your ideas. Here is my progress on them so far:

  • Yes the core should stall if it attempts to read or execute from a flash bank during its erase.
  • The code to enable an MPU region to detect accesses of the upper bank during its erase could be faulty somehow and not cover every case of access. For the execution case, an inspection of the map file should reveal if .text extends into the upper bank.

I have messed around with the MPU a lot and added test cases. A forced access always raises a mem manage fault, but one never occurs during normal operation. The region is set to no access, so any access to any address in that region by the CPU should raise a mem manage fault. I checked the map file. Also, the linker script exposes only the lower bank, so nothing is located in the upper bank by the compiler.
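
For anyone repeating this MPU check, the region arithmetic can be sanity-checked with a few lines of plain C. This is a sketch, not the code used here: the helper names are hypothetical, and it assumes the 2 MB dual-bank layout with bank 2 starting at 0x08100000 and spanning 1 MB. It relies on the Armv7-M MPU rules that a region must be a power-of-two size of at least 32 bytes, that its base must be aligned to its size, and that the RASR SIZE field encodes a region of 2^(SIZE+1) bytes.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMED flash map for a 2 MB dual-bank STM32H753. */
#define BANK2_BASE UINT32_C(0x08100000)
#define BANK2_SIZE UINT32_C(0x00100000)  /* 1 MB */

/* Armv7-M RASR.SIZE encoding: region size = 2^(SIZE+1) bytes.
 * Returns -1 if bytes is not a valid MPU region size. */
int mpu_size_field(uint32_t bytes)
{
    if (bytes < 32u || (bytes & (bytes - 1u)) != 0u) {
        return -1;  /* too small or not a power of two */
    }
    int shift = 0;
    while ((bytes >> shift) != 1u) {
        shift++;    /* shift ends up as log2(bytes) */
    }
    return shift - 1;
}

/* An MPU region base must be a multiple of the region size. */
bool mpu_base_aligned(uint32_t base, uint32_t bytes)
{
    return bytes != 0u && (base & (bytes - 1u)) == 0u;
}

/* True if addr falls inside the assumed bank 2 address range. */
bool addr_in_bank2(uint32_t addr)
{
    return addr >= BANK2_BASE && (addr - BANK2_BASE) < BANK2_SIZE;
}
```

A misaligned base or a wrong size field would leave part of the bank uncovered, which is the failure mode the suggestion above is worried about.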

  • You might experiment with masking interrupts during the erase to prevent other threads/interrupts executing during the erase.

Currently working through this. Unfortunately changing almost anything stops the problem from happening!

  • You might configure a timer to count, say, every microsecond and use that to measure erase times. Or you might toggle an output before and after the erase that you can measure.

I can see the erase duration in the SWV graph. Approx 5-6 seconds.

  • EWI not firing and uwTick not changing seems consistent with a stall.
  • Is your power supply ok during the erase?

Yup, rock solid. This is also an existing product that is pretty solid. Literally tens of thousands of flash erases have been done before, never with any problem. This has only appeared since a big recent code restructure. Even now, adding a single nop can cause or fix the problem, so I think it is firmware/timing related.

  • Is __HAL_RCC_GET_FLAG(RCC_FLAG_WWDG1RST) set on boot after a suspected watchdog? Other unexpected flags in RCC_RSR?

Yup, watchdog reset flag is set. Also disabling the watchdog stops the reset happening, leaving just the CPU stall.

  • If your watchdog is less than twice the max erase time, you would have to schedule your erase to begin immediately after the WWDG is serviced.

The watchdog is serviced every 16 ms (10-20 ms window) by a system thread that runs during the erase operation. The thread that initiates the erase is suspended during the erase.
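
To make the window numbers concrete, here is a sketch of the WWDG timing arithmetic, using the reference manual formula t_WWDG = t_PCLK x 4096 x 2^WDGTB x (T[5:0] + 1). The function names are hypothetical, and the clock, prescaler, counter and window values in the usage note are illustrative assumptions chosen to land near a 10-20 ms window, not the values from this firmware.

```c
#include <stdint.h>

/* One WWDG downcounter tick in nanoseconds: 4096 * 2^wdgtb / pclk_hz.
 * pclk_hz is the APB clock feeding the WWDG; wdgtb is the prescaler field. */
uint64_t wwdg_tick_ns(uint32_t pclk_hz, unsigned wdgtb)
{
    return ((uint64_t)4096 << wdgtb) * UINT64_C(1000000000) / pclk_hz;
}

/* Approximate time from a reload with counter value t (0x40..0x7F) until
 * a refresh becomes legal, i.e. until the counter has dropped to w. */
uint64_t wwdg_window_open_ns(uint32_t pclk_hz, unsigned wdgtb,
                             uint8_t t, uint8_t w)
{
    if (t <= w) {
        return 0;  /* window is already open at reload */
    }
    return (uint64_t)(t - w) * wwdg_tick_ns(pclk_hz, wdgtb);
}

/* Approximate time from a reload with counter value t until the counter
 * rolls from 0x40 to 0x3F and the watchdog resets the chip. */
uint64_t wwdg_timeout_ns(uint32_t pclk_hz, unsigned wdgtb, uint8_t t)
{
    return (uint64_t)((t & 0x3Fu) + 1u) * wwdg_tick_ns(pclk_hz, wdgtb);
}
```

With an assumed 100 MHz clock, WDGTB = 4, T = 0x5E and W = 0x4F, this gives a window that opens about 9.8 ms after a refresh and a reset at about 20.3 ms. A 16 ms service period then leaves only around 4 ms of slack, so even a short stall of the servicing thread would be enough to trip the watchdog.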

  • You might consider the watchdog's purpose and whether its period might be safely relaxed. Obviously you want to know why this is occurring though.

There are safety-related compliance aspects that mean the watchdog needs to be left enabled at all times, and on a short leash!

Thanks for the feedback. Helps to discuss it and makes me think down alternate paths.

The CPU should not halt at all during the erase, as the flash is in dual-bank mode, so the main code, including watchdog servicing, should truck along as normal during the erase.

I am thinking there is a memory overrun or pointer issue somewhere and at runtime there is an access into that address space by accident. But buggered if I can find or prove that!

Stefano Ugolini
Associate II

Hi ADunc.1, 

Did you find the issue?
We have a similar behavior, our FW is running on bank 1 while the FW erases bank 2.

The CPU does not stall during the flash erase operation, although we noticed that DMA interrupts were significantly delayed.

Does your execution time depend on interrupts?

Does anyone know if a flash erase operation affects interrupt execution?

Amy
Associate

Hi,

I would like to know if you found the root cause of this. I had a similar issue: the watchdog resets the chip when erasing flash. It is not a watchdog problem.

I suspect the same thing ("It seems most likely something is trying to access the flash while it is being erased causing the CPU to stall.") but I have not found proof yet.

Thanks,