STM32H755: flash write/erase issue with IWDG1/WWDG1 enabled

murali.karicheri · 2024-02-07

Hi,

We have a custom board based on the STM32H755, which we use in two configurations. In normal operation, we run a bare-metal application on the M7 and a Zephyr OS application on the M4 in dual-core mode. When a firmware update is required, we run a Zephyr application on the M7 alone.

1) In dual-core mode, the M4 Zephyr OS sees two flash partitions:

image2 - read only - 896KiB
bootloader - read/write - 128KiB

2) In the firmware update application, the M7 sees both banks and the following partitions:

Bank 0
image1 - read/write - 896KiB
configuration - read/write - 128KiB

Bank 1
image 2 - read/write - 896KiB
bootloader - read only - 128KiB

I am able to update the bootloader partition in 1) without IWDG1/WWDG1 enabled. Similarly, I am able to update the image1, configuration, and image 2 partitions without IWDG1/WWDG1 enabled.

But when I enable IWDG1/WWDG1, the firmware update doesn't work on the Bank 1 partitions (the bootloader partition in case 1, and the image 2 partition on Bank 1). It either hangs, or the board resets with the reset cause reported as a watchdog reset.

So there is some interaction between flash erase/write on Bank 1 and the watchdog. I need your help to understand this issue and find a potential solution.

Pavel A. · 2024-02-07

Hmm, what comes to mind first... the speculative execution issue causing weird delays? ST recommends using the MPU to block the problematic address regions.

murali.karicheri · 2024-02-07

Thanks for the quick reply. We have that in our code base; I hope this is what you are referring to. We have the code below in the M7 code, which is the master:

/*
 * System memory attributes inhibit the speculative fetch,
 * preventing the RDSERR Flash error
 * Copied from https://github.com/zephyrproject-rtos/zephyr/pull/60765
 */
MPU_InitStruct.Enable = MPU_REGION_ENABLE;
MPU_InitStruct.BaseAddress = D1_AXIICP_BASE;
MPU_InitStruct.Size = MPU_REGION_SIZE_512KB;
MPU_InitStruct.AccessPermission = MPU_REGION_PRIV_RW;
MPU_InitStruct.IsBufferable = MPU_ACCESS_NOT_BUFFERABLE;
MPU_InitStruct.IsCacheable = MPU_ACCESS_NOT_CACHEABLE;
MPU_InitStruct.IsShareable = MPU_ACCESS_SHAREABLE;
MPU_InitStruct.Number = MPU_REGION_NUMBER3;
MPU_InitStruct.TypeExtField = MPU_TEX_LEVEL0;
MPU_InitStruct.SubRegionDisable = 0x00;
MPU_InitStruct.DisableExec = MPU_INSTRUCTION_ACCESS_DISABLE;

HAL_MPU_ConfigRegion(&MPU_InitStruct);

Pavel A. · 2024-02-07

@murali.karicheri No, this is something different. I don't know about the Flash RDSERR issue. This code seems to disable execution (and instruction fetch) from some internal AXI memory. What I meant is this:

https://drive.google.com/file/d/1g_-mDfAIYs99pRifeAvfpunHBVIdq_a8/view

and page 8 in https://www.st.com/resource/en/application_note/dm00272912-managing-memory-protection-unit-in-stm32-mcus-stmicroelectronics.pdf

It is about the unpopulated address areas of external memories.

TDK · 2024-02-07

It works without IWDG, so the CPU is getting stalled during the update, causing IWDG to trigger a reset. Likely this is during the erase procedure. What does your code for this update look like? The CPU will be stalled if it tries to read from a bank which has an erase operation in progress. This is likely what you're running into.
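One way to sanity-check this is to compare the configured IWDG timeout with how long a sector erase takes on the board. A minimal sketch, assuming the usual IWDG relation timeout = divider * (reload + 1) / f_LSI and a nominal 32 kHz LSI clock; the function name is hypothetical and the real LSI frequency varies from part to part:

```c
#include <stdint.h>

/* Hypothetical helper: approximate IWDG timeout in milliseconds.
 * divider: prescaler divider (4, 8, ..., 256)
 * reload : value of the 12-bit reload register (0..4095)
 * lsi_hz : LSI clock frequency, nominally 32000 (assumption) */
uint32_t iwdg_timeout_ms(uint32_t divider, uint32_t reload, uint32_t lsi_hz)
{
    /* Number of LSI cycles until the down-counter expires. */
    uint64_t ticks = (uint64_t)divider * ((uint64_t)reload + 1u);

    /* Convert LSI cycles to milliseconds. */
    return (uint32_t)((ticks * 1000u) / lsi_hz);
}
```

If the value computed for your prescaler and reload settings is shorter than the erase time measured on your board, a core that stalls for the whole erase cannot refresh the watchdog in time.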


murali.karicheri · 2024-02-08

Thanks @Pavel A. We have that enabled as well. This covers the entire 4GB space. Could you please confirm whether this is what you are referring to? The external memory area 0x60000000 to 0xE0000000 is part of this region.

/* Configure the MPU as Strongly ordered for not defined regions */
MPU_InitStruct.Enable = MPU_REGION_ENABLE;
MPU_InitStruct.BaseAddress = 0x00;
MPU_InitStruct.Size = MPU_REGION_SIZE_4GB;
MPU_InitStruct.AccessPermission = MPU_REGION_NO_ACCESS;
MPU_InitStruct.IsBufferable = MPU_ACCESS_NOT_BUFFERABLE;
MPU_InitStruct.IsCacheable = MPU_ACCESS_NOT_CACHEABLE;
MPU_InitStruct.IsShareable = MPU_ACCESS_SHAREABLE;
MPU_InitStruct.Number = MPU_REGION_NUMBER0;
MPU_InitStruct.TypeExtField = MPU_TEX_LEVEL0;
MPU_InitStruct.SubRegionDisable = 0x87;
MPU_InitStruct.DisableExec = MPU_INSTRUCTION_ACCESS_DISABLE;

HAL_MPU_ConfigRegion(&MPU_InitStruct);
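To see which addresses this region actually governs, the SubRegionDisable mask can be decoded by hand. A rough sketch, assuming the standard ARMv7-M rule that a region is split into 8 equal sub-regions and a set bit in the mask disables the matching sub-region (so that address falls back to other regions or the default map); the function name is hypothetical:

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if `addr` lies in an enabled sub-region
 * of the MPU region described by base/size/srd, 0 otherwise.
 * `size` is 64-bit so that a 4 GB region can be expressed. */
int mpu_region_applies(uint32_t base, uint64_t size, uint8_t srd, uint32_t addr)
{
    if (addr < base) {
        return 0;
    }
    uint64_t offset = (uint64_t)addr - base;
    if (offset >= size) {
        return 0;
    }
    /* Each of the 8 sub-regions covers size / 8 bytes. */
    unsigned idx = (unsigned)(offset / (size / 8u));

    /* Bit set in SRD means the sub-region is disabled. */
    return ((srd >> idx) & 1u) == 0u;
}
```

With base 0, size 4 GB, and a mask of 0x87, sub-regions 0, 1, 2, and 7 are disabled, so the no-access attributes apply to 0x60000000 through 0xDFFFFFFF, which matches the external memory range mentioned above.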

Pavel A. · 2024-02-08

Yes, it looks so. But as @TDK replied, if any of your code touches the affected flash bank, it will stall. Hopefully your erase code runs in RAM (including the timer interrupt handler, etc.).
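A quick way to reason about "touches the affected bank" is to map addresses from the linker map file onto banks. A minimal sketch, assuming internal flash starts at 0x08000000 with two banks of 1 MiB each (which matches the 896KiB + 128KiB partitions per bank described above) and using the thread's zero-based bank numbering; the names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

#define FLASH_BASE_ADDR 0x08000000u
#define FLASH_BANK_SIZE 0x00100000u /* 1 MiB per bank (assumption) */

/* Returns 0 or 1 for an address inside internal flash, -1 otherwise. */
int flash_bank_of(uint32_t addr)
{
    if (addr < FLASH_BASE_ADDR ||
        addr >= FLASH_BASE_ADDR + 2u * FLASH_BANK_SIZE) {
        return -1;
    }
    return (int)((addr - FLASH_BASE_ADDR) / FLASH_BANK_SIZE);
}

/* True if a fetch or read at `addr` targets the bank being erased. */
bool would_stall(uint32_t addr, int bank_being_erased)
{
    int bank = flash_bank_of(addr);
    return bank >= 0 && bank == bank_being_erased;
}
```

Running every function, vector table entry, and constant used during the erase through a check like this shows quickly whether something still lives in the bank under erase.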

murali.karicheri · 2024-02-09

Hi,

I have tried moving my application's flash erase/write functions, as well as the following related code, to RAM, and it doesn't seem to solve the issue:

zephyr_code_relocate(FILES ${ZEPHYR_BASE}/drivers/flash/flash_shell.c LOCATION RAM)
zephyr_code_relocate(FILES ${ZEPHYR_BASE}/drivers/flash/flash_stm32h7x.c LOCATION RAM)
zephyr_code_relocate(FILES ${ZEPHYR_BASE}/drivers/flash/flash_page_layout.c LOCATION RAM)
zephyr_code_relocate(FILES ${ZEPHYR_BASE}/drivers/timer/cortex_m_systick.c LOCATION RAM)
zephyr_code_relocate(FILES ${ZEPHYR_BASE}/drivers/timer/sys_clock_init.c LOCATION RAM)

What else needs to be moved?

But I still don't understand how the watchdog is triggered in this case. The watchdog is initialized and refreshed by the M7, so in theory it is watching the M7 bare-metal software. The flash erase/write is happening on M4, which is running Zephyr OS. Even if it stalls the CPU, it works perfectly fine without IWDG1 enabled. We don't enable IWDG2, which relates to the M4, so there shouldn't be any dependency in this case. Do you know why the M7 watchdog fires when the M4 does something? Also, the M7 code is running from Bank 0, so the M7 CPU should not be stalled, since the erase/write happens on Bank 1 (done by the M4).

Pavel A. · 2024-02-09

> What else needs to be moved?

IMHO it is better not to relocate the Zephyr stuff this way, but to write a small, self-contained erase function + reset the watchdog + maybe a timer interrupt handler for timeout. Read the disassembly & map. Otherwise you can never be sure that nothing has been forgotten.
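A rough sketch of what such a self-contained erase routine might look like, written against stand-in register structs so the logic can be exercised off-target. The struct types and function name are hypothetical, and the bit positions (SER = bit 2, START = bit 7, SNB = bits 10:8 in CR, QW = bit 2 in SR, IWDG reload key 0xAAAA) are my reading of the STM32H7 reference manual and should be verified. Unlocking, error-flag handling, and placing the function in a RAM section are omitted:

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for one flash bank's CR/SR and the IWDG key register
 * (hypothetical layout, not the real peripheral structs). */
typedef struct { volatile uint32_t CR; volatile uint32_t SR; } flash_bank_regs_t;
typedef struct { volatile uint32_t KR; } wdg_regs_t;

#define FLASH_CR_SER      (1u << 2)
#define FLASH_CR_START    (1u << 7)
#define FLASH_CR_SNB_POS  8u
#define FLASH_CR_SNB_MASK (7u << FLASH_CR_SNB_POS)
#define FLASH_SR_QW       (1u << 2)
#define IWDG_KEY_RELOAD   0xAAAAu

/* Erase one sector, refreshing the watchdog on every poll.
 * Returns false if the operation is still pending after max_polls. */
bool erase_sector_polling(flash_bank_regs_t *bank, wdg_regs_t *wdg,
                          uint32_t sector, uint32_t max_polls)
{
    /* Select sector-erase mode and the sector number, then start. */
    bank->CR = (bank->CR & ~FLASH_CR_SNB_MASK) | FLASH_CR_SER |
               ((sector & 7u) << FLASH_CR_SNB_POS);
    bank->CR |= FLASH_CR_START;

    for (uint32_t i = 0; i < max_polls; i++) {
        wdg->KR = IWDG_KEY_RELOAD;            /* refresh the watchdog */
        if ((bank->SR & FLASH_SR_QW) == 0u) { /* operation finished */
            bank->CR &= ~FLASH_CR_SER;
            return true;
        }
    }
    return false; /* timed out; caller decides how to recover */
}
```

On the target, this function and everything it calls would have to execute from RAM (for example via Zephyr's `__ramfunc`), and the poll limit would replace the timer-based timeout mentioned above.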

> The flash erase/write is happening on M4

If the M7 touches code in the erased bank it will hang too. Where is the code that resets the watchdog?

TDK · 2024-02-09

The evidence would suggest that the M7 code does get stalled. If you want to test, set up a pin to be toggled in a timer interrupt every 1 ms or so, and verify activity during the erase/write on the M4.

If it does get stalled, there should be a number of methods of tracking down where/why. Monitor an independent timer in an interrupt. When it has a large jump, hit a breakpoint and look at where the call stack is. This will require the watchdog to be disabled during debugging, naturally.

If it's not getting stalled, well, then you're either not petting the watchdog appropriately, or there is a critical silicon bug that is dependent on what another core is doing. You can decide for yourself which of those is more likely.
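The "large jump" check can be sketched as a small helper called from the periodic interrupt. This is only an illustration: the struct and function names are hypothetical, and the timestamp source (for example a free-running 32-bit cycle counter or an independent hardware timer) is left to the caller:

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t last;    /* timestamp seen on the previous call */
    uint32_t max_gap; /* largest gap observed so far */
    bool     primed;  /* false until the first timestamp is stored */
} stall_monitor_t;

/* Call from the periodic interrupt with the current timestamp.
 * Returns true when the gap since the previous call exceeds `threshold`. */
bool stall_monitor_update(stall_monitor_t *m, uint32_t now, uint32_t threshold)
{
    bool stalled = false;

    if (m->primed) {
        /* Unsigned subtraction stays correct across counter wrap-around. */
        uint32_t gap = now - m->last;
        if (gap > m->max_gap) {
            m->max_gap = gap;
        }
        stalled = gap > threshold;
    }
    m->last = now;
    m->primed = true;
    return stalled;
}
```

When the function returns true, the interrupt handler is a convenient place for a breakpoint, since the interrupted context on the stack points at the code that was stalled.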
