Firmware Self-Upgrade from RAM sometimes fails with hard fault

JArwe.1 · 2022-08-10

Hi!

I want to update the firmware of my STM32H743 device over I2C. The way I would like to do this is:

Transfer the new firmware (~200 KB) over I2C and store it in RAM (all in user code)
Call a function that runs from RAM (ITCM, to be specific)
This function clears and reprograms flash, then issues a system reset to load the new firmware

The function looks like this:

__attribute__ ((section(".itcm"))) void UpdateFirmware(uint8_t* buffer, int size)
{
    __disable_irq();
    HAL_FLASH_Unlock_ITCM();
 
    __HAL_FLASH_CLEAR_FLAG(FLASH_FLAG_EOP | FLASH_FLAG_OPERR | FLASH_FLAG_WRPERR | FLASH_FLAG_PGSERR);
 
    for (int i = 0; i < size; i += FLASH_SECTOR_SIZE)
    {
        // FLASH_Erase_Sector expects a sector number, not a byte offset
        FLASH_Erase_Sector_ITCM(FLASH_SECTOR_0 + (i / FLASH_SECTOR_SIZE), FLASH_BANK_1, VOLTAGE_RANGE_3);
    }
 
    for (int i = 0; i < size; i += FLASH_PROGRAM_BYTE_SIZE)
    {        
        HAL_FLASH_Program_ITCM(FLASH_TYPEPROGRAM_FLASHWORD, FLASH_ADDRESS + i, (uint32_t)(buffer + i));
    }
 
    HAL_FLASH_Lock_ITCM();
 
    SystemReset_ITCM();
}
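FLASH_Erase_Sector() takes a sector number, while the erase loop steps through the image in bytes, so the byte offset has to be converted. A minimal host-side sketch of that arithmetic, assuming 128 KB sectors as on the STM32H743 (the H7 HAL defines FLASH_SECTOR_SIZE as 0x20000); SKETCH_SECTOR_SIZE, sectors_needed and sector_for_offset are illustrative names only.

```c
#include <stdint.h>

/* Assumption: 128 KB sectors, as on the STM32H743. Redefined here so the
   sketch builds on a host without the HAL headers. */
#define SKETCH_SECTOR_SIZE 0x20000u

/* Number of sectors needed to hold `size` bytes, rounded up. */
static uint32_t sectors_needed(uint32_t size)
{
    return (size + SKETCH_SECTOR_SIZE - 1u) / SKETCH_SECTOR_SIZE;
}

/* Sector number containing byte offset `offset` from the start of the bank.
   This is what the sector-erase helper expects, not the raw byte offset. */
static uint32_t sector_for_offset(uint32_t offset)
{
    return offset / SKETCH_SECTOR_SIZE;
}
```

For the ~200 KB image in the question, sectors_needed(200 * 1024) is 2, i.e. sectors 0 and 1. As far as I can tell from the H7 HAL source, HAL_FLASHEx_Erase() also calls FLASH_WaitForLastOperation() after each FLASH_Erase_Sector() and then clears the SER and SNB bits, so a loop that calls the sector-erase helper directly may need to do the same.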

The functions with the '_ITCM' suffix are essentially copies of the regular HAL functions that have the '__attribute__ ((section(".itcm")))' applied to them. The '.itcm' section is defined in my linker file, and is copied to ITCM at startup. The idea is that the whole 'UpdateFirmware' function can run entirely from RAM after the flash is erased.
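The startup copy of '.itcm' described above usually amounts to a word-by-word copy from the section's load address in flash to its run address in ITCM, done in the reset handler before main() and before any _ITCM function is called. A minimal sketch, written so the copy routine itself can be tried on a host; the linker symbol names in the comment are hypothetical and depend on the linker script.

```c
#include <stdint.h>

/* Copy a section's load image (in flash) to its run address (e.g. ITCM),
   one 32-bit word at a time. */
static void copy_section(uint32_t *dst, const uint32_t *dst_end, const uint32_t *src)
{
    while (dst < dst_end) {
        *dst++ = *src++;
    }
}

/* Hypothetical target usage (symbol names depend on your linker script):
 *
 *   extern uint32_t _sitcm, _eitcm;   // run-address range of .itcm in ITCM
 *   extern const uint32_t _siitcm;    // load address of .itcm in flash
 *   copy_section(&_sitcm, &_eitcm, &_siitcm);
 */
```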

I know that my code works in principle, because often it does work absolutely fine.

However, sometimes the MCU seems to stop after the erase step and goes into a hard fault. This is quite hard to debug, because by then the flash is already empty. The issue may be related to communication on the I2C interface, though I am not 100% sure yet.

I have a theory about why this happens. I think it has to do with the interrupt vector table being gone once the flash is erased: for some reason the MCU tries to execute a handler and goes into a hard fault because the handler is no longer there.

However, as you can see, I use '__disable_irq();' at the very beginning of the function. Is this not enough to prevent that from happening? Am I missing something?
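Some background on what __disable_irq() covers: it sets PRIMASK, which blocks interrupts and every exception with configurable priority, but not Reset, NMI or HardFault. A fault raised while PRIMASK is set escalates to HardFault, whose handler address is still read from the vector table. The sketch below is only a host-side model of that masking rule (the enum and function names are illustrative), not target code.

```c
#include <stdbool.h>

/* ARMv7-M exception numbers of the fixed-priority exceptions. */
enum {
    EXC_RESET     = 1,  /* priority -3 */
    EXC_NMI       = 2,  /* priority -2 */
    EXC_HARDFAULT = 3   /* priority -1 */
};

/* Model: with PRIMASK = 1 (as set by __disable_irq()), every exception with
   configurable priority is masked, while Reset, NMI and HardFault are not. */
static bool masked_by_primask(int exception_number)
{
    return exception_number != EXC_RESET &&
           exception_number != EXC_NMI &&
           exception_number != EXC_HARDFAULT;
}
```

So masked_by_primask(15) (SysTick) and masked_by_primask(16) (the first device IRQ) are true, while HardFault (3) and NMI (2) remain able to run, and both need a valid vector.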

Btw. I know that there might be better ways to update the firmware. I have considered them and unfortunately the way I've described seems to be the only option I have.

Any help would be much appreciated.

Piranha · 2022-08-11

And what will happen if the power goes out while the flash is erased/programmed?

Why is a normal bootloader in the first flash sector not an option?

Georgy Moshkin · 2022-08-11

Add a CRC check for the I2C transfer.
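A minimal sketch of such a check in software: a bitwise CRC-32 with the common IEEE parameters (reflected polynomial 0xEDB88320, initial value and final XOR 0xFFFFFFFF). It assumes the sender computes the same CRC-32 over the image and sends it along; that framing is an assumption, not something from the thread. The point is to verify the whole buffer before anything is erased.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (IEEE): slow but table-free, so it adds little code size. */
static uint32_t crc32_ieee(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift right; if the bit shifted out was 1, XOR in the polynomial. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```

The standard check value applies: crc32_ieee over the ASCII bytes "123456789" gives 0xCBF43926.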

Does section(".itcm") guarantee that FLASH_Erase_Sector_ITCM is located in RAM? For example:

void externalFunc(int c); // defined elsewhere, without the section attribute

__attribute__ ((section(".itcm"))) void someFunc(int a, int b, int c)
{
   a = b + c;
   externalFunc(c); // is it in ITCM RAM? I think that it may be located elsewhere.
}
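One way to answer that on the target is to check where each function actually ended up. A minimal sketch, assuming the STM32H743 memory map with 64 KB of ITCM-RAM at 0x00000000 (check against the reference manual); ITCM_BASE, ITCM_SIZE and addr_in_itcm are illustrative names, not HAL API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed STM32H743 layout: 64 KB ITCM-RAM starting at 0x00000000. */
#define ITCM_BASE 0x00000000u
#define ITCM_SIZE 0x00010000u

/* True if a code address lies in ITCM. Bit 0 of a Thumb function pointer is
   the Thumb state bit, so it is cleared before the range check. The unsigned
   subtraction wraps for addresses below ITCM_BASE, so one compare suffices. */
static bool addr_in_itcm(uintptr_t addr)
{
    addr &= ~(uintptr_t)1u;
    return (addr - ITCM_BASE) < ITCM_SIZE;
}
```

On the target this could be called as addr_in_itcm((uintptr_t)&FLASH_Erase_Sector_ITCM). The linker map file (e.g. generated with -Wl,-Map=output.map) shows the same placement at build time, including any callee that was left in flash.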


JArwe.1 · 2022-08-11

> And what will happen if the power goes out while the flash is erased/programmed?

Then the upgrade fails and the firmware needs to be repaired with an ST-Link. I am aware of this.

> Why a normal bootloader in the first flash sector is not an option?

Because I need all other flash sectors for other stuff.

This is why I mentioned in my post that I am aware that there are better ways to do this. In a future hardware revision I will do it differently. But for now I am stuck with what I have.

JArwe.1 · 2022-08-11

Thanks for your suggestions!

I am doing a CRC check on the incoming firmware, so I think this should be fine.

Regarding your other point: No, the 'section(".itcm")' by itself does not guarantee that any functions called in the body will be located in ITCM. But I have defined FLASH_Erase_Sector_ITCM like this:

__attribute__ ((section(".itcm"))) void FLASH_Erase_Sector_ITCM(uint32_t Sector, uint32_t Banks, uint32_t VoltageRange)
{
  // function body copied from HAL source
}

I did the same with the other functions, and with any sub-functions they call.

I don't think there is a general problem with this approach, because most of the time it actually works...

Georgy Moshkin · 2022-08-11

Not 100% sure, but maybe HAL_FLASH_IRQHandler has to run for the HAL driver logic to work correctly. Or an interrupt is re-enabled somewhere despite the call to __disable_irq();

I think you could dump the register areas to files and compare them (comp reg1.bin reg2.bin), then look up each difference in the reference manual.

reg1.bin - after hard fault, reg2.bin - after erase loop.
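A host-side sketch of the comparison step, assuming both dumps were read into memory as arrays of 32-bit words starting at the same base address (for example the FLASH register block). diff_words is an illustrative name; the byte offsets it reports can be added to that base address and looked up in the reference manual.

```c
#include <stddef.h>
#include <stdint.h>

/* Compare two register dumps word by word and record the byte offsets of the
   words that differ. Returns the number of differing words; at most max_out
   offsets are written to out. */
static size_t diff_words(const uint32_t *a, const uint32_t *b, size_t nwords,
                         size_t *out, size_t max_out)
{
    size_t count = 0;
    for (size_t i = 0; i < nwords; i++) {
        if (a[i] != b[i]) {
            if (count < max_out) {
                out[count] = i * sizeof(uint32_t);  /* byte offset in the dump */
            }
            count++;
        }
    }
    return count;
}
```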


Bob S · 2022-08-11

> This issue may be related to communication on the I2C interface, though I am not 100% sure at this point.

My first boss had a saying on his wall: "One measurement is worth a thousand guesses".

Find out WHY you get the fault. If you are using CubeIDE, it has a fault analyzer that will (should) tell you exactly which line of code generated the fault. If not, add code to handle the fault and display the fault registers (search this forum or the web, there are plenty of examples out there).
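A minimal sketch of decoding the fault status registers, covering only a subset of the ARMv7-M bits. It assumes the values are read on the target through the CMSIS names SCB->CFSR and SCB->HFSR; decode_faults and the tables are illustrative names. HFSR.VECTTBL (a bus fault while reading the vector table) would be the bit that supports the vector-table theory. Because the flash is already erased at that point, the HardFault handler and this decoder would also have to run from RAM for any of this to execute.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry per decoded status bit (ARMv7-M bit positions). */
struct fault_bit {
    uint32_t mask;
    const char *name;
};

static const struct fault_bit cfsr_bits[] = {
    { 1u << 0,  "CFSR.IACCVIOL: instruction access violation" },
    { 1u << 1,  "CFSR.DACCVIOL: data access violation" },
    { 1u << 7,  "CFSR.MMARVALID: MMFAR holds the faulting address" },
    { 1u << 8,  "CFSR.IBUSERR: bus fault on instruction fetch" },
    { 1u << 9,  "CFSR.PRECISERR: precise data bus error" },
    { 1u << 10, "CFSR.IMPRECISERR: imprecise data bus error" },
    { 1u << 15, "CFSR.BFARVALID: BFAR holds the faulting address" },
    { 1u << 16, "CFSR.UNDEFINSTR: undefined instruction" },
    { 1u << 17, "CFSR.INVSTATE: invalid execution state" },
    { 1u << 18, "CFSR.INVPC: invalid PC load" },
};

static const struct fault_bit hfsr_bits[] = {
    { 1u << 1,  "HFSR.VECTTBL: bus fault on vector table read" },
    { 1u << 30, "HFSR.FORCED: configurable fault escalated to HardFault" },
};

/* Writes up to max_names descriptions of the set bits to names[] and returns
   how many of the decoded bits are set. */
static size_t decode_faults(uint32_t cfsr, uint32_t hfsr,
                            const char **names, size_t max_names)
{
    size_t count = 0;
    for (size_t i = 0; i < sizeof cfsr_bits / sizeof cfsr_bits[0]; i++) {
        if (cfsr & cfsr_bits[i].mask) {
            if (count < max_names) names[count] = cfsr_bits[i].name;
            count++;
        }
    }
    for (size_t i = 0; i < sizeof hfsr_bits / sizeof hfsr_bits[0]; i++) {
        if (hfsr & hfsr_bits[i].mask) {
            if (count < max_names) names[count] = hfsr_bits[i].name;
            count++;
        }
    }
    return count;
}
```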