2025-09-29 11:40 AM - edited 2025-09-29 11:45 AM
I am implementing a firmware-update mechanism on STM32H755 (Nucleo board, MCU rev U) in which the two flash banks are independent. New firmware is flashed by erasing the "other" bank (which is always mapped at 0x0810 0000 and accessed with the ____2 register set, e.g. FLASH_CR2), then writing the candidate firmware image into that bank, verifying a SHA hash after programming, and finally toggling the SWAP_BANK bit in option bytes, performing an option-byte update, and software reset to boot into the new image. This works, and I can perform one update after another, ping-ponging back and forth between the banks. Note that the running application is always mapped at 0x0800 0000, is [much] less than 1 MByte, and never touches the other bank at 0x0810 0000 except to erase and program it when an update is requested.
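For readers following along, the ordering of the update steps described above can be sketched as below. This is only an illustration: `fw_update_ops_t`, `fw_update()` and the `mock_*` helpers are hypothetical names, not part of the ST HAL, and the hooks stand in for whatever code drives the second register set (FLASH_CR2 etc.) and the option bytes on the real part. The one design point it tries to show is that SWAP_BANK is only toggled after the hash check passes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hardware hooks (not ST HAL names). On target, each one would
 * wrap the real flash / option-byte code for the inactive bank. */
typedef struct {
    bool (*erase_inactive_bank)(void *ctx);
    bool (*program_inactive_bank)(void *ctx, const uint8_t *img, size_t len);
    bool (*verify_hash)(void *ctx, const uint8_t *expected, size_t len);
    bool (*toggle_swap_bank)(void *ctx);  /* SWAP_BANK + option-byte update */
    void (*system_reset)(void *ctx);
} fw_update_ops_t;

typedef enum {
    FW_OK, FW_ERR_ERASE, FW_ERR_PROGRAM, FW_ERR_VERIFY, FW_ERR_SWAP
} fw_status_t;

/* Runs the steps in order and stops at the first failure, so a bad image
 * never causes a bank swap. */
fw_status_t fw_update(const fw_update_ops_t *ops, void *ctx,
                      const uint8_t *img, size_t len, const uint8_t *hash)
{
    if (!ops->erase_inactive_bank(ctx))             return FW_ERR_ERASE;
    if (!ops->program_inactive_bank(ctx, img, len)) return FW_ERR_PROGRAM;
    if (!ops->verify_hash(ctx, hash, len))          return FW_ERR_VERIFY;
    if (!ops->toggle_swap_bank(ctx))                return FW_ERR_SWAP;
    ops->system_reset(ctx);  /* does not return on real hardware */
    return FW_OK;
}

/* Host-side mock that only records which steps ran, for trying the flow
 * off-target. */
typedef struct { char log[8]; size_t n; bool hash_ok; } mock_t;

static void mock_note(mock_t *m, char c)
{
    if (m->n + 1 < sizeof m->log) { m->log[m->n++] = c; m->log[m->n] = '\0'; }
}
static bool mock_erase(void *c) { mock_note(c, 'E'); return true; }
static bool mock_program(void *c, const uint8_t *i, size_t l)
{ (void)i; (void)l; mock_note(c, 'P'); return true; }
static bool mock_verify(void *c, const uint8_t *h, size_t l)
{ mock_t *m = c; (void)h; (void)l; mock_note(m, 'V'); return m->hash_ok; }
static bool mock_swap(void *c)  { mock_note(c, 'S'); return true; }
static void mock_reset(void *c) { mock_note(c, 'R'); }

const fw_update_ops_t mock_ops = {
    mock_erase, mock_program, mock_verify, mock_swap, mock_reset
};
```

With the mock, a passing hash records the steps E, P, V, S, R in that order, while a failing hash stops after V and leaves the swap bit untouched.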
The issue I am having is that when the code runs in Bank 2 and an update commences, the bank erase of Bank 1 is stalling the CPU for several seconds. The same thing does *not* happen when the code is running in Bank 1 and performs a bank erase on Bank 2. Also, the actual programming never stalls the CPU---only the initial bank erase, and only when it is Bank 1 being erased.
AFAICT, there is no difference at all between these two scenarios---everything should be identical between them, down to the contents of every single register and memory location, *except* for the SWAP_BANK bit in the option bytes. So why would one behave differently? (Note, for this test, I am updating with an identical firmware image, so there is no difference pre- and post-update.)
I was able to find corroborating evidence of this unexpected behavior in another forum post, here:
(Note the OP is not describing it---it's later in the thread, user @MDrew.1 , and they apparently never got to the bottom of it.)
I know better than to claim a silicon bug :) although I cannot think of another explanation. Can you? Best-case scenario, someone has figured out an undocumented way around this problem.
Why does this matter? Well, I'd like to use a watchdog timer, and 4+ seconds is a pretty long watchdog expiration. Also, other RTOS tasks are supposed to be running in the background during the update process, and they do, just fine---when the code is running in Bank 1.
2025-10-03 4:55 PM
I'm happy to report that I found the root cause of the bus stall. And it's not what I hypothesized above.
@mƎALLEm was on the right track, except the issue was not speculative access at all. (It would be surprising for the CPU to speculatively access system memory, out of the blue, unless something else was reading nearby addresses.) No, the issue was intentional access...
My ADC driver was reading the VREFINT calibration data at 0x1ff1e860, which I forgot is technically part of Bank 1 flash. It seems odd that a region you can't erase, and can't program, can nevertheless block a flash erase operation in that bank. Presumably there is a hardware reason for this? I fell into the trap of thinking of it as a separate ROM area, disconnected from flash.
Setting up the MPU to block system-memory accesses quickly gave me a MemManage fault, which pinpointed the offending code. In my project this is easy to resolve by reading the cal data once during initialization and keeping a copy in RAM, instead of accessing it repeatedly.
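In case it helps anyone who lands here with the same symptom, here is a rough sketch of both steps. Treat it as a sketch under assumptions, not as the exact code from my project: the system-memory range (0x1FF00000, 128 KB, for bank 1) and the 3.3 V calibration supply should be checked against the reference manual and datasheet for your part, and the function names are made up for illustration. The MPU helpers only compute the ARMv7-M RBAR/RASR values; writing them to the MPU and enabling the MemManage fault is left as comments. The calibration source is passed in as a pointer so the code can also be tried off-target.

```c
#include <stdint.h>

/* --- 1. MPU "tripwire" over system memory (ARMv7-M MPU encoding) --- */

/* Assumption: bank-1 system memory spans 0x1FF00000..0x1FF1FFFF (128 KB). */
#define SYSMEM_BANK1_BASE  0x1FF00000u
#define SYSMEM_BANK1_LOG2  17u          /* 2^17 bytes = 128 KB */

/* RBAR value: region base | VALID (bit 4) | region number (bits 3:0). */
static inline uint32_t mpu_rbar(uint32_t base, uint32_t region)
{
    return (base & 0xFFFFFFE0u) | (1u << 4) | (region & 0xFu);
}

/* RASR value for a no-access region: XN = 1 (bit 28), AP = 0b000 (bits 26:24),
 * SIZE = log2(bytes) - 1 (bits 5:1), ENABLE = 1 (bit 0). */
static inline uint32_t mpu_rasr_no_access(uint32_t log2_size)
{
    return (1u << 28) | ((log2_size - 1u) << 1) | 1u;
}

/* On target (CMSIS names), roughly:
 *   MPU->RBAR = mpu_rbar(SYSMEM_BANK1_BASE, 7);
 *   MPU->RASR = mpu_rasr_no_access(SYSMEM_BANK1_LOG2);
 *   SCB->SHCSR |= SCB_SHCSR_MEMFAULTENA_Msk;   // else it escalates to HardFault
 *   MPU->CTRL = MPU_CTRL_PRIVDEFENA_Msk | MPU_CTRL_ENABLE_Msk;
 *   __DSB(); __ISB();
 * Any later read of the region then raises MemManage at the offending code. */

/* --- 2. Read the calibration word once, then use only the RAM copy --- */

#define VREFINT_CAL_ADDR  0x1FF1E860u   /* address from the post above */
#define VREFINT_CAL_MV    3300u         /* assumed calibration supply in mV */

static uint16_t vrefint_cal_cached;

/* Call once at start-up, before any flash erase can be in progress.
 * On target: adc_cal_init((const volatile uint16_t *)VREFINT_CAL_ADDR); */
void adc_cal_init(const volatile uint16_t *cal_word)
{
    vrefint_cal_cached = *cal_word;
}

/* VDDA estimate in mV from a raw VREFINT conversion taken at the same
 * resolution as the calibration word. Touches only RAM, never system memory. */
uint32_t adc_vdda_mv(uint16_t vrefint_raw)
{
    if (vrefint_raw == 0u) {
        return 0u;                      /* avoid dividing by zero */
    }
    return (VREFINT_CAL_MV * (uint32_t)vrefint_cal_cached) / vrefint_raw;
}
```

Once the offending read is found, the MPU region can be removed again or left in place as a guard; note that the one-time read in `adc_cal_init()` has to happen before the region is enabled.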
2025-10-04 4:52 AM
Thanks for coming back with the solution. Nice catch, and far from being straightforward!
While the 'H7 DS+RM clearly place these "constants" into one of the FLASH banks, some other families (I've checked the mid-range 'L4) don't. But as the "constants" are surely factory-programmed into the system portion of FLASH, they are likely to be part of one of the banks there, too (probably Bank 1). I wonder what percentage of the dozens of unresolved threads reporting similar problems on this forum over the years could be traced to this very issue...
JW