Jump to application from bootloader not working

DavidNaviaux · ‎2023-12-17

I have reviewed the many posts from others who have not been able to jump from a bootloader to the application code, but nothing that I tried worked.

My MCU has its 128kB of flash in two banks, one at 0x08000000 and the other at 0x08040000. When the MCU is reset, it starts with the code in the first bank, so this is where my bootloader is located.

I have 6kB of flash reserved for storing calibration and configuration information immediately following the bootloader. I have the application code in the second bank (0x08040000).

The sole purpose of my bootloader is to allow an application software package to update the application firmware over a Modbus connection (via USB or RS485). I am able to send the new firmware and successfully program the new application code. I have verified this by using the STM32CubeProgrammer to compare the file to the updated flash contents.

Without the bootloader, the application code works fine when debugging from STM32CubeIDE.

When the bootloader resets, it verifies the application code CRC and then jumps to the application code in the second bank. The following code is compiled from a few posts, but it is not working. Please help...

#define APP_ADDR	0x08040000		// my MCU app code base address
#define	MCU_IRQS	102u				// no. of NVIC IRQ inputs

struct app_vectable_ {
    uint32_t Initial_SP;
    void (*Reset_Handler)(void);
};

#define APPVTAB	((struct app_vectable_ *)APP_ADDR)
void JumpToApploader(void)
{
	/* Disable all interrupts */
	__disable_irq();

	/* Disable Systick timer */
	SysTick->CTRL = 0;

	/* Set the clock to the default state */
	HAL_RCC_DeInit();

	/* Clear Interrupt Enable Register & Interrupt Pending Register */
	for (uint8_t i = 0; i < (MCU_IRQS + 31u) / 32; i++)
	{
		NVIC->ICER[i]=0xFFFFFFFF;
		NVIC->ICPR[i]=0xFFFFFFFF;
	}

	/* Re-enable all interrupts */
	__enable_irq();

	// Set the MSP
	__set_MSP(APPVTAB->Initial_SP);

	// Jump to app firmware
	APPVTAB->Reset_Handler();
}


///////////////////////////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////////////////////////
void dbaseGoToApp(void)
{
	// start executing the application code if it appears to be valid
	if (dbaseIsAppOk())
	{
		JumpToApploader();
	}
}

DavidNaviaux · ‎2023-12-18

Thank you Piranha. I have added the synchronization barriers as you pointed out, and I changed my APPVTAB cast as you suggested. This code works great, at least on the STM32G474. Because I am also concerned about a "bricked" MCU due to an application bug or data corruption, I check a switch setting just before this code to see whether a request to erase the application code has been issued.

This code is inserted just after the SystemClock_Config() function call at the beginning of main(). I hope that this can help someone else.

I should point out that given that the firmware will seldom be updated, and the flash has a 10,000 write lifetime, I'm not too concerned about degrading the flash by writing to it to direct whether to run the bootloader or the application code.

Also, since my application code can be split between two areas, for me it simplified things by recording the locations and the # of pages for each area and the total CRC in my configuration structure. Of course, I used the CRC for validating the application flash before jumping to it.

Here is my new code (Thanks to you, gbm, and RHJ).

	/* USER CODE BEGIN SysInit */

	/////////////////////////////////////////////////////////////////////////////////////////////////
	// Get the rotary switch setting so we can check for special power on duties
	/////////////////////////////////////////////////////////////////////////////////////////////////

	// Temporarily enable the GPIOB and read the rotary switch position to check for bricked MCU
	// recovery request
	{
		// if the rotary switches are set to F7 we will erase the application code
		GPIO_InitTypeDef GPIO_InitStruct = {0};

		// enable GPIOB so that we can read the rotary switches
		__HAL_RCC_GPIOB_CLK_ENABLE();

		// put pull-up resistors on the rotary switch inputs
		GPIO_InitStruct.Pin =
				SW1_1_Pin|SW1_2_Pin|SW1_4_Pin|SW1_8_Pin|SW2_1_Pin|SW2_2_Pin|SW2_4_Pin|SW2_8_Pin;
		GPIO_InitStruct.Mode = GPIO_MODE_INPUT;
		GPIO_InitStruct.Pull = GPIO_PULLUP;
		HAL_GPIO_Init(GPIOB, &GPIO_InitStruct);

		// delay for the switch capacitance to charge with the pull-ups
		HAL_Delay(10);

		// read the switches
		ucRotarySwitch = GetSlaveIdHex();

		// we don't need the GPIOB clock any more at this point
		__HAL_RCC_GPIOB_CLK_DISABLE();
	}

	/////////////////////////////////////////////////////////////////////////////////////////////////
	// check if we should jump to the application code
	/////////////////////////////////////////////////////////////////////////////////////////////////

	// initialize the CRC functionality so we can validate the application firmware
	hcrc.Instance = CRC;
	hcrc.Init.DefaultPolynomialUse = DEFAULT_POLYNOMIAL_ENABLE;
	hcrc.Init.DefaultInitValueUse = DEFAULT_INIT_VALUE_ENABLE;
	hcrc.Init.InputDataInversionMode = CRC_INPUTDATA_INVERSION_NONE;
	hcrc.Init.OutputDataInversionMode = CRC_OUTPUTDATA_INVERSION_DISABLE;
	hcrc.InputDataFormat = CRC_INPUTDATA_FORMAT_BYTES;
	if (HAL_CRC_Init(&hcrc) == HAL_OK)
	{
		// read the config structure from flash and fix it if it is corrupt
		//
		// NOTE: there are two copies of the configuration struct in two separate 2kB flash pages
		//       so that we can recover if one of the copies is corrupted (e.g. power loss while
		//       writing to one of the copies).
		dbaseLoadAndFixConfig();

		// erase the application if the rotary switch is set to "F7" which will allow us to recover
		// from a bricked MCU because the application firmware is not working
		if (ucRotarySwitch == 0xf7)
		{
			// erase the vector table of the application and update the application data in the
			// config structure
			dbaseEraseApp();

			// save the updated configuration to flash
			dbaseSaveConfig();
		}


		// try to jump to the application code if requested
		if (config.bGoToApp)
		{
			uint32_t ulCrc;

			// make sure that the application vector table is not erased
			if (*(uint32_t *)(APP_ADDR)!=0xffffffff && *(uint32_t *)(APP_ADDR+4)!=0xffffffff)
			{
				// it appears that we have a vector table for the application code

				// now make sure our config struct agrees
				if ( config.nBank2Pages && config.uBotBank2Page==0)
				{
					// according to the config struct, we have at least one page in bank 2 that holds the application
					// code and that the code in bank 2 starts at the first page in the bank (the vector table)

					// calculate the CRC of the application code in bank2
					ulCrc = HAL_CRC_Calculate(&hcrc, (uint32_t*)APP_ADDR, config.nBank2Pages*FLASH_PAGE_SIZE);

					// add the crc from the first bank if we have some code there too
					if (config.nBank1Pages)
					{
						// some of the application code was placed in bank1 so we need to include it in the CRC
						ulCrc = HAL_CRC_Accumulate(&hcrc, (uint32_t*)(config.uBotBank1Page*FLASH_PAGE_SIZE+BOOT_ADDR),
									config.nBank1Pages*FLASH_PAGE_SIZE);
					}

					// make sure that the CRC is valid
					if (ulCrc == config.ulAppCrc)
					{
						// at this point, we know:
						// 1. a boot to the application code is requested
						// 2. that the vector table for the application code is not erased
						// 3. that our config struct shows that we have application code at the application vector table
						// 4. that the CRC of the application code is valid

						// undo some of our initialization in preparation for the jump to the application code

						/* Set the clock to the default state */
						HAL_RCC_DeInit();

						/* Disable all interrupts */
						__disable_irq();

						/* Disable Systick timer */
						SysTick->CTRL = 0;

						// set the vector table address to the application vector table
						SCB->VTOR = APP_ADDR;

						// Set the stack pointer
						__set_MSP(APPVTAB->Initial_SP);

						/* Re-enable all interrupts */
						__enable_irq();

						// and now jump to the application vector

						// provide data and instruction synchronization barriers before the jump
						__DSB();
						__ISB();

						// jump to the application
						APPVTAB->Reset_Handler();
					}
				}
			}
		}
	}

	// fall through to run the bootloader if for any reason we didn't boot to the application


	/* USER CODE END SysInit */
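
The erased-flash test on the first two vector-table words in the code above can be made a little stricter before the CRC is even computed. The following is only a minimal, host-testable sketch: the helper name `vtab_looks_valid` and the range parameters are hypothetical, and the real RAM and application flash ranges must be taken from the device's memory map.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: plausibility check of the first two vector-table words.
 * The RAM range [ram_start, ram_end] and the application flash range
 * [app_start, app_end) are supplied by the caller (assumptions here). */
bool vtab_looks_valid(uint32_t initial_sp, uint32_t reset_handler,
                      uint32_t ram_start, uint32_t ram_end,
                      uint32_t app_start, uint32_t app_end)
{
    /* Erased flash reads back as all ones. */
    if (initial_sp == 0xFFFFFFFFu || reset_handler == 0xFFFFFFFFu)
        return false;

    /* The initial stack pointer should be word aligned and lie in RAM.
     * It may equal ram_end because the stack grows downward. */
    if ((initial_sp & 3u) != 0u)
        return false;
    if (initial_sp <= ram_start || initial_sp > ram_end)
        return false;

    /* Cortex-M executes Thumb code only, so bit 0 of the handler must be set. */
    if ((reset_handler & 1u) == 0u)
        return false;

    /* The handler address (without the Thumb bit) must be inside the app area. */
    uint32_t target = reset_handler & ~(uint32_t)1u;
    return target >= app_start && target < app_end;
}
```

On the target, the two words would be read from `APP_ADDR` and `APP_ADDR+4`, exactly as the existing check does.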

DavidNaviaux · ‎2023-12-18

Should I locate the __DSB() and __ISB() before the __enable_irq(), just before the jump to the application code, or does it matter?

DavidNaviaux · ‎2023-12-18

In my application, there could be up to 247 of these boards on a common RS485 multi-drop line. I wrote a program that runs on a PC and uses Modbus to communicate with the boards. The PC software allows the firmware to be updated on all connected boards in sequence. It takes about 6 seconds per board to update the firmware.

When the software finishes transferring the firmware to a board, it instructs the board to save the information about the firmware, which includes the location and # of flash pages used in each of the two flash areas that it can occupy. In my application, the bootloader is responsible for generating the CRC across the non-contiguous flash regions.

I'm always concerned about an interrupted firmware update. Each system requires that 72 boards all function without any problems, so I typically go to a lot of effort to make sure that that is the case. Even one bricked board renders the product useless.

In the past, I have also started a watchdog timer just before the jump to the application. The application must either keep triggering the watchdog or disable it. When the bootloader starts after a reset, it can check whether the reset was caused by the watchdog timer, i.e. something went wrong with the application. A retry count could be maintained to prevent jumping to the application if it is exceeded. 1000 boards are being delivered to the customer within the next two weeks, so the watchdog timer will have to wait.
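
The watchdog and retry-count idea described above could look roughly like the following. This is only a sketch of the decision logic, written so it can be tested on a host: `boot_state_t`, `choose_boot_target`, and `MAX_APP_RETRIES` are hypothetical names, and how the reset cause is read and where the counter is kept across resets are left as assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_APP_RETRIES 3u  /* hypothetical limit on consecutive watchdog resets */

typedef enum { BOOT_TARGET_APP, BOOT_TARGET_BOOTLOADER } boot_target_t;

/* Hypothetical state that must survive a reset (storage is an assumption). */
typedef struct {
    uint32_t wdg_resets;  /* consecutive watchdog resets seen so far */
} boot_state_t;

/* Decide where to go after a reset.
 * reset_by_watchdog: true if the last reset was caused by the watchdog.
 * app_crc_ok:        true if the application image passed its CRC check. */
boot_target_t choose_boot_target(boot_state_t *st, bool reset_by_watchdog,
                                 bool app_crc_ok)
{
    if (!app_crc_ok)
        return BOOT_TARGET_BOOTLOADER;

    if (reset_by_watchdog) {
        if (st->wdg_resets < UINT32_MAX)
            st->wdg_resets++;
    } else {
        st->wdg_resets = 0u;  /* any other reset cause starts a fresh count */
    }

    /* Too many failed attempts in a row: stay in the bootloader. */
    if (st->wdg_resets > MAX_APP_RETRIES)
        return BOOT_TARGET_BOOTLOADER;

    return BOOT_TARGET_APP;
}
```

With the limit of 3 assumed here, the application gets three more attempts after its first watchdog reset, and the fourth consecutive watchdog reset keeps the board in the bootloader.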

As I said in another post, I'm not concerned about using flash to transfer important information between the bootloader and the application, since it will not be written very often (mostly when updating firmware) and the 10,000-write life of the flash will far outlast the product.

Each device that my software can update transmits a unique firmware ID #. I haven't incorporated this feature yet, but the firmware ID could be required in the file name to prevent programming an incompatible device. Is there another way to do that?

Piranha · ‎2023-12-19

It seems that you saw the initial version of my post, which was a bit incorrect. I updated it a little later, so check my previous post, which now shows the correct code. Yes, the barriers should come before enabling the interrupts, because the VTOR and SP changes must be complete by then so that the interrupts use the updated values.

If the checksum/hash does not come with the firmware, you cannot be sure that the file was not corrupted during the transfer, by software on the PC, or in any other way. Generally, the best way of storing metadata such as the device ID, firmware version, anti-rollback counter, addresses and sizes of the FLASH blocks, checksum/hash, signature, shared secret for asymmetric cryptography, etc. is to add a header in front of the firmware. The header is also flashed to the device, and that is where the bootloader looks first.

Another, simpler option is to insert some data in the free locations of the vector table. On Cortex-M, even the first 16 words include at least 5 reserved words, which gives 20 bytes that can definitely store the device ID, firmware version, CRC-32 and some more data. In addition, one can use any other words in the vector table that are certain never to be needed by any future firmware version.

In any case, the device ID must be validated and the update refused by the bootloader, not by a PC application or anywhere else. By the way, by "device ID" I mean the "model number", not the "serial number". And, of course, the device has to have those values stored at production time in a different FLASH/OTP location that is never modified.
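
To make the header approach a bit more concrete, here is a minimal sketch of a header and the check a bootloader might run before accepting an update. Everything in it is an assumption made for illustration: the layout `fw_header_t`, the marker `FW_HEADER_MAGIC`, the status codes, and the function `fw_header_check` are hypothetical. The CRC over the image itself is deliberately left out to keep the example short.

```c
#include <stdint.h>

#define FW_HEADER_MAGIC 0x46574844u  /* hypothetical marker, ASCII "FWHD" */

/* Hypothetical header placed in front of the firmware image. */
typedef struct {
    uint32_t magic;       /* identifies a valid header */
    uint32_t model_id;    /* device model the image was built for */
    uint32_t fw_version;  /* monotonically increasing version number */
    uint32_t image_size;  /* number of image bytes following the header */
    uint32_t image_crc;   /* CRC of those bytes (not checked in this sketch) */
} fw_header_t;

typedef enum {
    FW_OK = 0,
    FW_BAD_MAGIC,
    FW_WRONG_MODEL,
    FW_BAD_SIZE,
    FW_ROLLBACK
} fw_status_t;

/* Validate a received header against values stored in the device.
 * device_model:   model number programmed at production time
 * max_image_size: space available for the application, in bytes
 * min_version:    lowest version the device will accept (anti-rollback) */
fw_status_t fw_header_check(const fw_header_t *h, uint32_t device_model,
                            uint32_t max_image_size, uint32_t min_version)
{
    if (h->magic != FW_HEADER_MAGIC)
        return FW_BAD_MAGIC;
    if (h->model_id != device_model)
        return FW_WRONG_MODEL;
    if (h->image_size == 0u || h->image_size > max_image_size)
        return FW_BAD_SIZE;
    if (h->fw_version < min_version)
        return FW_ROLLBACK;
    return FW_OK;
}
```

Because the check runs in the bootloader, an image for the wrong model is refused no matter which PC tool sent it.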

g_xBootRAM = 0xB00720AD;
NVIC_SystemReset();

Even if wearing out FLASH is not an issue, there is still no point in implementing relatively large, complex code (and, to a lesser extent, that is also true for RTC registers) when there is a much simpler and better way like this.
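
The two sides of this handshake could be sketched as follows. It is only an illustration of the logic: the helper names `request_bootloader` and `bootloader_requested` are hypothetical, and on a real target the variable has to be placed in a RAM region that the startup code neither zeroes nor initializes (how that is done depends on the linker script and is an assumption here).

```c
#include <stdbool.h>
#include <stdint.h>

#define BOOT_MAGIC 0xB00720ADu  /* magic value from the snippet above */

/* On the target this variable would live in a no-init RAM region, at the same
 * address in both the application and the bootloader builds. In a host build
 * it is an ordinary variable, so only the logic can be exercised. */
volatile uint32_t g_xBootRAM;

/* Application side: mark the request. On the target this would be followed
 * by NVIC_SystemReset(). */
void request_bootloader(void)
{
    g_xBootRAM = BOOT_MAGIC;
}

/* Bootloader side: check the request and consume it, so that the next reset
 * boots normally again. */
bool bootloader_requested(void)
{
    bool requested = (g_xBootRAM == BOOT_MAGIC);
    g_xBootRAM = 0u;
    return requested;
}
```

After a power-on the RAM contents are arbitrary, so the flag only counts as set when it holds the exact magic value.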

DavidNaviaux · ‎2023-12-20

Yes, I noticed your update and that is why I asked. I have updated my bootloader to properly position the barriers before enabling the interrupts.

When updating the firmware, the new firmware is being sent to the bootloader via Modbus which verifies that the data was sent and programmed correctly. The process aborts if any problem was detected before the entire firmware update has been sent.

When the last 2kB page is sent, the bootloader calculates the firmware CRC across the two non-contiguous flash regions and saves all of the information the bootloader needs to verify that the application code is valid.
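
The idea of one CRC spanning two separate regions can be shown with a small software CRC that carries its running value from the first region into the second. This is a host-side sketch, not the HAL code: it uses the polynomial 0x04C11DB7 and initial value 0xFFFFFFFF with no inversion, which are the defaults selected in the CRC setup earlier in the thread, but whether it matches the hardware unit bit for bit with a given input format is an assumption. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC32_INIT 0xFFFFFFFFu

/* Feed len bytes into a running CRC-32 (polynomial 0x04C11DB7, MSB first,
 * no input or output reflection, no final XOR). */
uint32_t crc32_update(uint32_t crc, const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint32_t)data[i] << 24;
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x80000000u)
                crc = (crc << 1) ^ 0x04C11DB7u;
            else
                crc <<= 1;
        }
    }
    return crc;
}

/* CRC over two separate regions, processed in a fixed order. The result is
 * the same as the CRC of the two regions concatenated. */
uint32_t crc32_two_regions(const uint8_t *a, size_t a_len,
                           const uint8_t *b, size_t b_len)
{
    uint32_t crc = crc32_update(CRC32_INIT, a, a_len);
    return crc32_update(crc, b, b_len);
}
```

The order in which the regions are fed must be the same when the CRC is generated and when it is verified.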

If I use the filename of the update firmware to include the device ID, the PC update software could inform the user that the firmware file that they selected is not valid for the attached device(s) before writing any new firmware to the flash. I prefer this to the alternative.

I like your suggestion to use an uninitialized variable to signal that the bootloader should run and not jump to the application code. I don't have time to implement that at the moment, but I will likely do so for my next project using this MCU (coming in a couple of weeks).

Thank you for your detailed responses. They have been very helpful to me.

DavidNaviaux · ‎2023-12-25

Piranha, I had time to try to run the bootloader as you suggested. However, I have tried to create the uninitialized "g_xBootRAM" variable but have not been successful. I am sure that it is very simple, but none of my modifications to the linker script files have worked.

Could you give me an example of how to create that uninitialized variable?

DavidNaviaux · ‎2023-12-25

Piranha,

I got it all working. I'm using the top 8 bytes of RAM as my uninitialized flag variable and the top 8 bytes of flash to hold the application CRC.

I feel better that I am not updating a flash record to indicate whether I jump to the app or not. It is much simpler, as you said.

Thanks,

David