STM32H7Sx: unknown setting causing data corruption

Steve Melnikoff · ‎2026-02-16

We are seeing a very odd data corruption issue which is highly repeatable, and appears to be related to a configuration setting which we haven't yet identified.

In our system (running on an STM32H7S7), we're using the STiROT + OEMuROT secure bootloader to load our application. After a minute or two, the application causes a hard fault.

The issue comes down to a POP instruction. I can see in the debugger that a value is being copied from the stack to a register, but one of the bits is flipped. Subsequent instructions do some indirection which ultimately causes the hard fault.
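For anyone checking a similar symptom, here is a minimal sketch of how the stack value and the register value can be compared to confirm that exactly one bit differs. The helper names `flipped_bits` and `count_set_bits` are hypothetical and are not part of any ST library; the two values would be read by hand from the debugger.

```c
#include <stdint.h>

/* XOR of the value seen in stack memory and the value that arrived in the
 * register: every set bit in the result marks a bit that differs. */
static uint32_t flipped_bits(uint32_t expected, uint32_t observed)
{
    return expected ^ observed;
}

/* Count the set bits in v, clearing the lowest set bit on each pass. */
static unsigned count_set_bits(uint32_t v)
{
    unsigned n = 0u;
    while (v != 0u) {
        v &= v - 1u;
        n++;
    }
    return n;
}
```

If `count_set_bits(flipped_bits(stack_value, register_value))` is 1, the two values differ by a single bit, and the XOR result shows which one.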

It's always the same instruction in the same line of code. It always occurs within the first minute or two of execution.

Using the debugger, we can see that the value in the stack (stored in DTCM) is correct. The error occurs when it's copied to the register. And although the error always occurs at the same address, it doesn't appear to be a fault with that location, as when I moved the stack to elsewhere in DTCM, exactly the same issue occurred in the relocated stack too.

But here's the thing: if we run the application without the secure bootloader, the error never occurs, and we can run the application for hours with no problem.

So: is there a configuration setting that could explain this?

(See also this post from @ken5, which describes similar symptoms when transferring data from external RAM - on the same project; we're working together on this.)

ST: whether or not you reply privately, please also post any comments here on the forum for the benefit of other users.

Steve Melnikoff · ‎2026-02-16

It looks like this may be due to RAM ECC being enabled, even though we had disabled it in the code.

boot_hal_cfg.h allows the user to disable RAM ECC by commenting out the line:

#define OEMIROT_USER_SRAM_ECC

After disabling this, we noticed that RAM ECC was nonetheless enabled, and tracked it down to the equivalent setting in STiROT_Config.xml ("SRAM ECC management activation") (this also causes ob_flash_programming.bat to be updated).

After disabling that, the data corruption issue appears to have gone away (pending further testing).
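The point that tripped us up can be sketched as a small check. This is only an illustration of the reasoning, under the assumption (based on what we observed on the STiROT + OEMuROT boot path) that either setting on its own is enough to leave RAM ECC enabled. The struct and function names are hypothetical; the two flags would be filled in by hand after inspecting boot_hal_cfg.h and STiROT_Config.xml.

```c
#include <stdbool.h>

/* Hypothetical summary of the two places where SRAM ECC can be requested. */
typedef struct {
    /* true if OEMIROT_USER_SRAM_ECC is defined in boot_hal_cfg.h */
    bool oemurot_macro_defined;
    /* true if "SRAM ECC management activation" is set in STiROT_Config.xml */
    bool stirot_xml_enabled;
} sram_ecc_settings_t;

/* Assumption: ECC may end up enabled if either source requests it. */
static bool sram_ecc_may_be_enabled(const sram_ecc_settings_t *s)
{
    return s->oemurot_macro_defined || s->stirot_xml_enabled;
}

/* The two sources disagree: the situation we were in after commenting
 * out the macro but leaving the XML setting untouched. */
static bool sram_ecc_settings_disagree(const sram_ecc_settings_t *s)
{
    return s->oemurot_macro_defined != s->stirot_xml_enabled;
}
```

In our case the macro was commented out but the XML setting was still active, so the first check returns true even though the code suggested ECC was off.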

Our application runs at 600 MHz. Both OEMuROT and the processor datasheet make passing reference to not using RAM ECC when the processor is running at full speed; but is there a definitive statement on this in a document?

smartsplay · ‎2026-02-16

Hi Steve,

Since the corruption only occurs when running via STiROT and OEMuROT and never when booting the application directly, this strongly suggests a configuration difference introduced by the secure boot path rather than a RAM or compiler issue.

On STM32H7Sx, please check whether the secure bootloader configures the MPU differently for DTCM or stack memory. In particular, verify access permissions, secure versus non-secure attribution, and that DTCM is not marked cacheable.

Also confirm that MSP is fully reinitialized and correctly aligned before jumping from OEMuROT to the application. A partial reset or leftover CPU state could affect POP behavior.

Another area to review is cache and ECC initialization. If caches or ECC are enabled in the secure bootloader but not handled consistently in the application, this could explain why the stack value is correct in memory but corrupted when loaded into a register.

Comparing the exact MPU, cache, and clock configuration between secure and non-secure boot flows may help identify the missing setting.

Hope this helps narrow it down.

Steve Melnikoff · ‎2026-02-16

Thank you for your suggestions. It looks like I posted shortly before you. :)

As you can see, it was indeed the ECC settings.

CMYL · ‎2026-03-13

Dear @Steve Melnikoff,

I just want to supplement your solution with the official information from the wiki page:

Security:OEMiRoT OEMuRoT for STM32H7S - stm32mcu, in the "system clock frequency" section.

The wiki states that the system clock can be increased up to 600 MHz if the power and temperature constraints are met and the RAMECC security feature is disabled (boot_hal_cfg.h, ob_flash_programming.bat (.sh) for the OEMiRoT boot path, or STiROT_Config.xml for the STiRoT + OEMuRoT boot path).

Best Regards

Steve Melnikoff · ‎2026-03-13

That's useful; thank you.

What I would like to see in addition is:

- An explicit statement in the chip reference manual, datasheet and/or errata document that RAMECC cannot be used above 400 MHz (or whatever the maximum is).
- Ideally, an explanation as to why this is the case.