2024-02-07 03:30 PM
Hi,
We have a custom board based on the STM32H755, which we use in two configurations. In normal operation we run a bare-metal application on the M7 and a Zephyr OS application on the M4 in dual-core mode. When a firmware update is required, we run a Zephyr application on the M7 alone.
1) In dual-core mode, Zephyr OS on the M4 sees two flash partitions:
image2 - read-only - 896 KiB
bootloader - read/write - 128 KiB
2) In the firmware update application, the M7 sees both banks and the following partitions:
Bank 0
image1 - read/write - 896 KiB
configuration - read/write - 128 KiB
Bank 1
image2 - read/write - 896 KiB
bootloader - read-only - 128 KiB
I am able to update the bootloader partition in case 1) without IWDG1/WWDG1 enabled. Similarly, I am able to update the image1, configuration and image2 partitions without IWDG1/WWDG1 enabled.
But when I enable IWDG1/WWDG1, the firmware update doesn't work on the Bank 1 partitions: the bootloader partition in case 1) and the image2 partition on Bank 1. It either hangs or the board resets, with the reset cause reported as a watchdog reset.
So there is some interaction between flash erase/write on Bank 1 and the watchdog. I need your help to understand this issue and to find a potential solution.
2024-02-07 04:06 PM
Hmm, what comes to mind first... the speculative execution issue causing weird delays? ST recommends using the MPU to block the problematic address regions.
2024-02-07 04:12 PM
Thanks for the quick reply. We have that in our code base; I hope this is what you are referring to. The code below is in the M7 firmware, which is the master:
/*
* System memory attributes inhibit the speculative fetch,
* preventing the RDSERR Flash error
* Copied from https://github.com/zephyrproject-rtos/zephyr/pull/60765
*/
MPU_InitStruct.Enable = MPU_REGION_ENABLE;
MPU_InitStruct.BaseAddress = D1_AXIICP_BASE;
MPU_InitStruct.Size = MPU_REGION_SIZE_512KB;
MPU_InitStruct.AccessPermission = MPU_REGION_PRIV_RW;
MPU_InitStruct.IsBufferable = MPU_ACCESS_NOT_BUFFERABLE;
MPU_InitStruct.IsCacheable = MPU_ACCESS_NOT_CACHEABLE;
MPU_InitStruct.IsShareable = MPU_ACCESS_SHAREABLE;
MPU_InitStruct.Number = MPU_REGION_NUMBER3;
MPU_InitStruct.TypeExtField = MPU_TEX_LEVEL0;
MPU_InitStruct.SubRegionDisable = 0x00;
MPU_InitStruct.DisableExec = MPU_INSTRUCTION_ACCESS_DISABLE;
HAL_MPU_ConfigRegion(&MPU_InitStruct);
2024-02-07 04:31 PM - edited 2024-02-07 04:45 PM
@murali.karicheri No, this is something different. I don't know about the Flash RDSERR issue. This code seems to disable execution (and instruction fetch) from some internal AXI memory. What I meant is this:
https://drive.google.com/file/d/1g_-mDfAIYs99pRifeAvfpunHBVIdq_a8/view
It is about the un-populated address areas of external memories.
2024-02-07 05:45 PM
It works without the IWDG, so the CPU is getting stalled during the update, causing the IWDG to trigger a reset. Likely this happens during the erase procedure. What does your code for this update look like? The CPU will be stalled if it tries to read from a bank which has an erase operation in progress. This is likely what you're running into.
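To make the "stall outlasts the watchdog" argument concrete, here is a small host-side sketch of the arithmetic. It assumes the usual IWDG relation (timeout = (reload + 1) x prescaler / LSI frequency) and a nominal 32 kHz LSI; the function names are made up for illustration, and the stall duration passed in is a placeholder that should be replaced with the sector erase time from the datasheet.

```c
#include <stdint.h>
#include <stdbool.h>

/* Assumption: nominal LSI frequency feeding the IWDG. The real
 * oscillator has a wide tolerance, so leave margin. */
#define LSI_HZ 32000u

/* IWDG timeout in milliseconds for a prescaler divider (4..256)
 * and a 12-bit reload value. */
uint32_t iwdg_timeout_ms(uint32_t prescaler_div, uint32_t reload)
{
    uint64_t ticks = (uint64_t)(reload & 0xFFFu) + 1u;
    return (uint32_t)((ticks * prescaler_div * 1000u) / LSI_HZ);
}

/* True if the watchdog would expire while the CPU is stalled for
 * stall_ms without refreshing it. */
bool stall_trips_watchdog(uint32_t prescaler_div, uint32_t reload,
                          uint32_t stall_ms)
{
    return stall_ms >= iwdg_timeout_ms(prescaler_div, reload);
}
```

For example, with a divider of 32 and a reload of 999 the timeout is about one second, so a stall of two seconds would trip it, while the maximum setting (divider 256, reload 4095) gives roughly 32 seconds.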
2024-02-08 10:59 AM - edited 2024-02-08 11:00 AM
Thanks @Pavel A. We have that enabled as well. This covers the entire 4GB space. Could you please confirm whether this is what you are referring to? The external memory area 0x60000000 to 0xE0000000 is part of this region.
/* Configure the MPU as Strongly ordered for not defined regions */
MPU_InitStruct.Enable = MPU_REGION_ENABLE;
MPU_InitStruct.BaseAddress = 0x00;
MPU_InitStruct.Size = MPU_REGION_SIZE_4GB;
MPU_InitStruct.AccessPermission = MPU_REGION_NO_ACCESS;
MPU_InitStruct.IsBufferable = MPU_ACCESS_NOT_BUFFERABLE;
MPU_InitStruct.IsCacheable = MPU_ACCESS_NOT_CACHEABLE;
MPU_InitStruct.IsShareable = MPU_ACCESS_SHAREABLE;
MPU_InitStruct.Number = MPU_REGION_NUMBER0;
MPU_InitStruct.TypeExtField = MPU_TEX_LEVEL0;
MPU_InitStruct.SubRegionDisable = 0x87;
MPU_InitStruct.DisableExec = MPU_INSTRUCTION_ACCESS_DISABLE;
HAL_MPU_ConfigRegion(&MPU_InitStruct);
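As a cross-check of which addresses this region actually applies to: on the ARMv7-M MPU a region is split into eight equal subregions, and a set bit in SubRegionDisable switches the region's attributes off for that eighth. The helper below is a host-side sketch (the names mpu_range_t and mpu_enabled_ranges are invented for illustration) that turns a base, size and SRD mask into the address ranges where the region is in effect.

```c
#include <stdint.h>

/* One contiguous address range in which the region is in effect. */
typedef struct {
    uint64_t start;  /* first address covered */
    uint64_t end;    /* last address covered (inclusive) */
} mpu_range_t;

/*
 * Collapse the enabled subregions of an MPU region into contiguous
 * ranges. region_size is in bytes (4 GiB for MPU_REGION_SIZE_4GB).
 * A set bit i in srd disables subregion i.
 * Returns the number of ranges written to out (at most 4).
 */
int mpu_enabled_ranges(uint64_t base, uint64_t region_size, uint8_t srd,
                       mpu_range_t out[4])
{
    uint64_t sub = region_size / 8u;
    int n = 0;
    int open = 0;

    for (int i = 0; i < 8; i++) {
        int enabled = !((srd >> i) & 1u);
        if (enabled && !open) {
            out[n].start = base + sub * (uint64_t)i;
            open = 1;
        }
        if (enabled) {
            out[n].end = base + sub * (uint64_t)(i + 1) - 1u;
        }
        if (!enabled && open) {
            n++;
            open = 0;
        }
    }
    if (open) {
        n++;
    }
    return n;
}
```

With base 0, a 4 GiB size and SRD 0x87, subregions 0, 1, 2 and 7 are disabled, which leaves a single range from 0x60000000 to 0xDFFFFFFF. Addresses outside it fall through to other MPU regions or to the default memory map, depending on how the MPU is enabled.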
2024-02-08 12:05 PM
Yes, it looks so. But as @TDK replied, if any of your code touches the affected flash bank it will stall. Hopefully your erase code runs in RAM (including the timer interrupt handler etc.)
2024-02-09 10:24 AM - edited 2024-02-09 10:26 AM
Hi,
I have tried moving my application's flash erase/write functions, as well as the following related code, to RAM, but it doesn't seem to do anything to solve the issue.
2024-02-09 11:02 AM
> What else required to be moved?
IMHO it is better not to relocate the Zephyr stuff this way, but to write a small, self-contained erase function that also resets the watchdog, plus maybe a timer interrupt handler for a timeout. Read the disassembly and the map file. Otherwise you can never be sure that nothing has been forgotten.
> The flash erase/write is happening on M4
If the M7 touches code in the erased bank it will hang too. Where is the code that resets the watchdog?
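A rough sketch of the wait loop such a self-contained routine might use is shown below. It is deliberately hardware-agnostic so it can be tried on a host: the function name, parameters and poll limit are hypothetical, and the real status register address and busy mask have to come from the reference manual or the CMSIS device header. The one hardware fact it relies on is that writing 0xAAAA to the IWDG key register reloads the counter. On target, the function and anything it calls would need a RAM placement attribute (Zephyr offers `__ramfunc` where the architecture supports it). Note that a window watchdog (WWDG) must only be refreshed inside its window, so refreshing on every pass like this is not suitable for it.

```c
#include <stdint.h>

/* Writing this key to the IWDG key register reloads the counter. */
#define IWDG_KEY_RELOAD 0xAAAAu

/*
 * Poll a flash status register until the bits in busy_mask clear,
 * reloading the independent watchdog on every pass.
 * Returns 0 when the operation finished, -1 if max_polls passes elapsed.
 * Hypothetical helper: register pointers and mask are supplied by the
 * caller and are not taken from any particular device header.
 */
int flash_wait_and_kick(volatile const uint32_t *status_reg,
                        uint32_t busy_mask,
                        volatile uint32_t *wdg_key_reg,
                        uint32_t max_polls)
{
    for (uint32_t i = 0; i < max_polls; i++) {
        if ((*status_reg & busy_mask) == 0u) {
            return 0;  /* flash operation complete */
        }
        *wdg_key_reg = IWDG_KEY_RELOAD;  /* keep the IWDG from expiring */
    }
    return -1;  /* still busy after max_polls passes */
}
```

Because the loop touches only the two registers passed in, it is easy to confirm from the disassembly that it makes no calls back into flash.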
2024-02-09 11:10 AM - edited 2024-02-09 11:12 AM
The evidence would suggest that the M7 code does get stalled. If you want to test, set up a pin to get toggled in a timer interrupt every 1ms or something and verify activity during the erase/write on the M4.
If it does get stalled, there should be a number of methods for tracking down where and why. Monitor an independent timer in an interrupt. When it has a large jump, hit a breakpoint and look at where the call stack is. This will require the watchdog to be disabled during debugging, naturally.
If it's not getting stalled, well then you're either not petting the watchdog appropriately, or there is a critical silicon bug that is dependent on what another core is doing. You can decide yourself which of those is more likely.
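The independent-timer check described above could be sketched along these lines. The names are invented for illustration, and the timestamp source is an assumption: any free-running 32-bit counter that keeps counting while the core is stalled would do. The periodic interrupt passes in the current counter value, and the monitor reports when the gap since the previous sample exceeds a threshold.

```c
#include <stdint.h>

typedef struct {
    uint32_t last;       /* timestamp seen on the previous call */
    uint32_t max_gap;    /* largest gap observed so far */
    uint32_t threshold;  /* gap that counts as a stall */
    int      primed;     /* 0 until the first sample arrives */
} stall_monitor_t;

void stall_monitor_init(stall_monitor_t *m, uint32_t threshold)
{
    m->last = 0u;
    m->max_gap = 0u;
    m->threshold = threshold;
    m->primed = 0;
}

/*
 * Call from the periodic interrupt with a free-running counter value.
 * Returns 1 if the gap since the previous sample exceeds the threshold.
 */
int stall_monitor_sample(stall_monitor_t *m, uint32_t now)
{
    int stalled = 0;

    if (m->primed) {
        uint32_t gap = now - m->last;  /* unsigned math survives wraparound */
        if (gap > m->max_gap) {
            m->max_gap = gap;
        }
        stalled = (gap > m->threshold);
    }
    m->last = now;
    m->primed = 1;
    return stalled;
}
```

Placing a breakpoint on the path where the function returns 1 gives a place to stop and inspect the interrupted context; max_gap can also simply be read out after an erase to see how long the longest stall was.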