FLASH startup code does not branch to RAM

obiewhistler
Associate II

We have an STM32H750 (ARM Cortex-M7) processor in our product and compile C code using STM32CubeIDE. We want default code in FLASH that can run the default application by itself, but that can also load an updated application into AXI RAM and run the update from RAM. Eventually the update will have different initialized variables and jump tables, so my thinking is that the branch from FLASH to RAM belongs in the startup code, before that initialization takes place. For the moment, however, the default and update source code are identical. The update and the default are compiled separately and linked with different .ld files. This produces .bin files that are identical except for 0x08 or 0x24 in the most significant byte of addresses. Both the default and the update contain exception vectors and startup code, and both set VTOR properly. We do not normally want to write the update back into FLASH, and eventually this has to work with RDP2 security.
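
The claim that the two .bin files differ only in the most significant byte of addresses can be checked mechanically. The helper below is a hypothetical host-side sketch (the name images_differ_only_in_msb is not part of the project). It assumes absolute addresses are stored as word-aligned, little-endian 32-bit literals, which holds for vector-table entries and typical literal pools but is still an assumption here, and it accepts a mismatch only when the top byte is 0x08 in the FLASH image and 0x24 in the RAM image.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit word from a byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical check: returns the number of words that differ only by
 * 0x08 (FLASH image) versus 0x24 (AXI RAM image) in the top byte, or -1
 * if any other difference is found. Assumes equal-length images whose
 * length is a multiple of 4; trailing bytes beyond that are ignored.
 */
long images_differ_only_in_msb(const uint8_t *flash_img,
                               const uint8_t *ram_img, size_t len)
{
    long relocated = 0;

    for (size_t off = 0; off + 4 <= len; off += 4) {
        uint32_t f = read_le32(flash_img + off);
        uint32_t r = read_le32(ram_img + off);

        if (f == r)
            continue;
        /* Low 24 bits must match; only the top byte may change. */
        if ((f & 0x00FFFFFFu) == (r & 0x00FFFFFFu) &&
            (f >> 24) == 0x08u && (r >> 24) == 0x24u) {
            relocated++;
            continue;
        }
        return -1;
    }
    return relocated;
}
```

A return value of -1 would point at a difference other than the expected relocation, which is worth knowing before suspecting the processor.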

I have included the startup code and linker file listings at the end of this post.

The process should work as follows:

  1. Power-up reset.
  2. FLASH startup initializes and runs the default application from FLASH.
  3. The default loads the update into AXI RAM from an external interface.
  4. Reset via the reset pin.
  5. FLASH startup sees the update.
  6. FLASH startup branches to RAM to initialize and run the update from RAM.

... but it doesn't. The last step goes off into the weeds, or maybe it simply locks up.
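
The decision the FLASH startup code has to make (run the update only after a reset-pin reset with a validated update in AXI RAM, otherwise run the default) can be modeled as a pure function and tested on a host before it is written in assembly. This is a sketch under stated assumptions: the constants 0x00460000 (the RCC_RSR value the startup code compares against for an external reset) and 0x12345678 (the magic cookie) come from the post, while the enum and function names are hypothetical.

```c
#include <stdint.h>

/* Values taken from the startup code in the post. */
#define RSR_EXTERNAL_RESET 0x00460000u /* RCC_RSR after a reset-pin reset */
#define UPDATE_COOKIE      0x12345678u /* set by the default after a good CRC */

typedef enum { BOOT_DEFAULT_FLASH, BOOT_UPDATE_RAM } boot_target_t;

/*
 * Hypothetical model of the branch logic in Reset_Handler: run the RAM
 * update only when the reset came from the reset pin AND the cookie left
 * in DTCM by the default application matches.
 */
boot_target_t choose_boot_target(uint32_t rcc_rsr, uint32_t cookie)
{
    if (rcc_rsr != RSR_EXTERNAL_RESET)
        return BOOT_DEFAULT_FLASH; /* power-up or other reset source */
    if (cookie != UPDATE_COOKIE)
        return BOOT_DEFAULT_FLASH; /* no validated update in AXI RAM */
    return BOOT_UPDATE_RAM;
}
```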

The FLASH default runs fine by itself with FLASH_BOOT = 0x08000800.

The same RAM update code runs properly if I set FLASH_BOOT = 0x24002400, write the update to AXI RAM via JTAG, and reset the chip.

The FLASH default continues to run properly if I manually coerce __UpdateContinueInit_veneer to do a ldr pc, [pc] branch back to ContinueInit in FLASH.

The FLASH default checks the CRC of the update once it has finished loading it from the external interface. I have also checked the contents of RAM locations at the beginning, middle, and end of the update before resetting, to make sure they are correct. I am therefore confident that the RAM update is loaded properly.

It just doesn't seem that the processor likes the ldr pc, [pc] to the AXI RAM. I am hoping that someone can tell me what nuance of this processor I am missing. Or perhaps I have found an unpublished errata?

The startup code is as follows:

Reset_Handler:  
  ldr   sp, =_estack       /* set stack pointer */
 
  ldr   r1, =0x580244dc
  ldr   r2, =0xe0000000
  str   r2, [r1]           /* turn on the RAM1-3 clocks - this is essential! */
  ldr   r1, =0x51008108
  ldr   r2, =0x00000001
  str   r2, [r1]           /* Set READ_ISS_OVERRIDE in AXI_TARG7_FN_MOD      */
 
  ldr   r0, =0x580244d0
  ldr   r1, [r0]           /* get the value in the reset status register     */
  ldr   r2, =0x00460000
  cmp   r2, r1             /* compare reset status to external reset value   */
  ldr   r2, =0x00010000
  str   r2, [r0]           /* clear the reset status register                */
  bne   ContinueInit       /* use default code if not from external reset    */
 
  ldr   r0, =magic_cookie
  ldr   r1, [r0]           /* get the value in the magic cookie              */
  ldr   r2, =0x12345678
  cmp   r2, r1             /* compare magic cookie to update value           */
  bne   ContinueInit       /* use default code if no cookie match            */
 
  ldr   r2, =0x00100010
  str   r2, [r0]           /* clear the magic cookie                         */
  b     UpdateContinueInit /* branch to the update in RAM                    */
 
ContinueInit:
/* Copy the data segment initializers from code to SRAM */
  movs  r1, #0
  b  LoopCopyDataInit
CopyDataInit:
  ldr   r3, =_sidata
  ldr   r3, [r3, r1]
  str   r3, [r0, r1]
  adds  r1, r1, #4
LoopCopyDataInit:
  ldr   r0, =_sdata
  ldr   r3, =_edata
  adds  r2, r0, r1
  cmp   r2, r3
  bcc   CopyDataInit
  ldr   r2, =_sbss
  b     LoopFillZerobss
 
/* Zero fill the bss segment. */
FillZerobss:
  movs  r3, #0
  str  r3, [r2], #4
LoopFillZerobss:
  ldr  r3, = _ebss
  cmp  r2, r3
  bcc  FillZerobss
 
/* Call the clock system initialization function. */
  bl    SystemInit
 
/* Call static constructors */
  bl    __libc_init_array
 
/* branch to the default main program */
  bl    main
  b     .                  /* main() should not return; spin here if it does */
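
For readers more comfortable with C, the ContinueInit loops above do the equivalent of the following. This is a host-testable sketch: on the target the bounds would be the linker symbols _sidata, _sdata, _edata, _sbss and _ebss, but here they are plain pointer parameters so the logic can be exercised on ordinary arrays. The function name is hypothetical.

```c
#include <stdint.h>

/*
 * C equivalent of the ContinueInit loops: copy the .data initializers
 * from their load address (FLASH for the default, AXI RAM for the update)
 * to their run address in DTCM, then zero .bss. Both loops work in 32-bit
 * words, matching the ALIGN(4) in the linker script.
 */
void init_data_and_bss(const uint32_t *sidata, uint32_t *sdata,
                       uint32_t *edata, uint32_t *sbss, uint32_t *ebss)
{
    while (sdata < edata)
        *sdata++ = *sidata++; /* CopyDataInit / LoopCopyDataInit */

    while (sbss < ebss)
        *sbss++ = 0u;         /* FillZerobss / LoopFillZerobss */
}
```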

The startup code also contains the exception vectors. The FLASH default linker .ld file is:

ENTRY(Reset_Handler)
 
/* Highest address of the user mode stack */
_estack = 0x20020000;    /* end of DTCMRAM */
/* Generate a link error if heap and stack don't fit into RAM */
_Min_Heap_Size  = 0x400; /* required amount of heap  */
_Min_Stack_Size = 0x800; /* required amount of stack */
 
/* 1 ms counter location used by the ISR */
uwTick = 0x20000000;
magic_cookie  = 0x20000004;
UpdateContinueInit = 0x240146fa;
 
/* Specify the memory areas */
MEMORY
{
   FLASH (rx)     : ORIGIN = 0x08000000, LENGTH = 128K
   DTCMRAM (xrw)  : ORIGIN = 0x20000008, LENGTH = 0x1fff8
   RAM123 (xrw)   : ORIGIN = 0x30000000, LENGTH = 288K
}
 
/* Define output sections */
SECTIONS
{
  /* The startup code goes first into FLASH */
  .isr_vector :
  {
    . = ALIGN(4);
    KEEP(*(.isr_vector)) /* Startup code */
    . = ALIGN(4);
  } >FLASH
 
  /* The program code and other data goes into FLASH */
  .text :
  {
    . = ALIGN(4);
    *(.text)           /* .text sections (code) */
    *(.text*)          /* .text* sections (code) */
    *(.glue_7)         /* glue arm to thumb code */
    *(.glue_7t)        /* glue thumb to arm code */
    *(.eh_frame)
 
    KEEP (*(.init))
    KEEP (*(.fini))
 
    . = ALIGN(4);
    _etext = .;        /* define a global symbol at end of code */
  } >FLASH
 
  /* Constant data goes into FLASH */
  .rodata :
  {
    . = ALIGN(4);
    *(.rodata)         /* .rodata sections (constants, strings, etc.) */
    *(.rodata*)        /* .rodata* sections (constants, strings, etc.) */
    . = ALIGN(4);
  } >FLASH
 
  .ARM.extab   : { *(.ARM.extab* .gnu.linkonce.armextab.*) } >FLASH
  .ARM : {
    __exidx_start = .;
    *(.ARM.exidx*)
    __exidx_end = .;
  } >FLASH
 
  .preinit_array     :
  {
    PROVIDE_HIDDEN (__preinit_array_start = .);
    KEEP (*(.preinit_array*))
    PROVIDE_HIDDEN (__preinit_array_end = .);
  } >FLASH
    .init_array :
  {
    PROVIDE_HIDDEN (__init_array_start = .);
    KEEP (*(SORT(.init_array.*)))
    KEEP (*(.init_array*))
    PROVIDE_HIDDEN (__init_array_end = .);
  } >FLASH
    .fini_array :
  {
    PROVIDE_HIDDEN (__fini_array_start = .);
    KEEP (*(SORT(.fini_array.*)))
    KEEP (*(.fini_array*))
    PROVIDE_HIDDEN (__fini_array_end = .);
  } >FLASH
 
  /* used by the startup to initialize data */
  _sidata = LOADADDR(.data);
 
  /* Initialized data sections go into RAM; the LMA copy is loaded after code */
  .data : 
  {
    . = ALIGN(4);
    _sdata = .;        /* create a global symbol at data start */
    *(.data)           /* .data sections */
    *(.data*)          /* .data* sections */
 
    . = ALIGN(4);
    _edata = .;        /* define a global symbol at data end */
  } >DTCMRAM AT> FLASH
 
  
  /* Uninitialized data section */
  . = ALIGN(4);
  .bss :
  {
    /* This is used by the startup in order to initialize the .bss section */
    _sbss = .;         /* define a global symbol at bss start */
    __bss_start__ = _sbss;
    *(.bss)
    *(.bss*)
    *(COMMON)
 
    . = ALIGN(4);
    _ebss = .;         /* define a global symbol at bss end */
    __bss_end__ = _ebss;
  } >RAM123
 
  /* User_heap_stack section, used to check that there is enough RAM left */
  ._user_heap_stack :
  {
    . = ALIGN(8);
    PROVIDE ( end = . );
    PROVIDE ( _end = . );
    . = . + _Min_Heap_Size;
    . = . + _Min_Stack_Size;
    . = ALIGN(8);
  } >DTCMRAM
 
  
 
  /* Remove information from the standard libraries */
  /DISCARD/ :
  {
    libc.a ( * )
    libm.a ( * )
    libgcc.a ( * )
  }
 
  .ARM.attributes 0 : { *(.ARM.attributes) }
}
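
The fixed addresses in this script fit together in a specific way: uwTick and magic_cookie occupy the two words just below the DTCMRAM origin, so the linker-managed region never overlaps them, and DTCMRAM ends exactly at _estack. A hedged C11 sketch that encodes those relationships as compile-time checks could live next to the linker script so that editing one number without the others fails the build. The macro names are hypothetical copies of the .ld values.

```c
/* Values copied from the FLASH default .ld file (hypothetical macro names). */
#define UWTICK_ADDR       0x20000000u
#define MAGIC_COOKIE_ADDR 0x20000004u
#define DTCM_ORIGIN       0x20000008u
#define DTCM_LENGTH       0x0001FFF8u
#define ESTACK            0x20020000u

/* The two reserved words sit just below the linker-managed DTCM region. */
_Static_assert(MAGIC_COOKIE_ADDR == UWTICK_ADDR + 4u, "cookie follows uwTick");
_Static_assert(MAGIC_COOKIE_ADDR + 4u == DTCM_ORIGIN, "no gap or overlap");

/* The stack starts at the very end of DTCM. */
_Static_assert(DTCM_ORIGIN + DTCM_LENGTH == ESTACK, "DTCM ends at _estack");
```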

The RAM update linker .ld file is the same except for ...

UpdateContinueInit = 0x080146fa;

MEMORY
{
   DTCMRAM (xrw) : ORIGIN = 0x20000008, LENGTH = 0x1fff8
   AXIRAM (xrw)  : ORIGIN = 0x24000000, LENGTH = 0x80000
   RAM123 (xrw)  : ORIGIN = 0x30000000, LENGTH = 288K
}

... and AXIRAM replaces FLASH in SECTIONS.

(For the moment, UpdateContinueInit in the update .ld file points back to FLASH, to preserve the symmetry between the update and the default.)

obiewhistler
Associate II

I should also have included this from the default listing:

080151f8 <__UpdateContinueInit_veneer>:
 80151f8: f85f f000 ldr.w pc, [pc] ; 80151fc <__UpdateContinueInit_veneer+0x4>
 80151fc: 240146fb .word 0x240146fb
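
The veneer can be checked by hand. In Thumb state, ldr.w pc, [pc] uses the instruction address plus 4, rounded down to a word boundary, as its base, so the literal it loads is the word right after it at 0x80151fc. That literal, 0x240146fb, has bit 0 set, which keeps the core in Thumb state, so execution should continue at 0x240146fa, the UpdateContinueInit address from the .ld file. A small host-side sketch of that arithmetic (the function names are hypothetical):

```c
#include <stdint.h>

/* Base used by a Thumb PC-relative LDR: Align(insn + 4, 4), plus the offset. */
uint32_t thumb_literal_addr(uint32_t insn_addr, uint32_t offset)
{
    return ((insn_addr + 4u) & ~3u) + offset;
}

/* Bit 0 of a value loaded into PC selects Thumb state; M-profile needs 1. */
int is_thumb_target(uint32_t value)
{
    return (value & 1u) != 0u;
}

/* Address of the first instruction actually executed after the branch. */
uint32_t branch_dest(uint32_t value)
{
    return value & ~1u;
}
```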

Tesla DeLorean

Suggest you have a HardFault_Handler that outputs actionable data, so you can see it fault for illegal instructions, bad addresses, et al.
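
As a starting point for such a handler: the Configurable Fault Status Register (CFSR, 0xE000ED28 on ARMv7-M) records the cause of MemManage, BusFault and UsageFault exceptions, which escalate to HardFault when their own handlers are not enabled. On the target the handler would read CFSR and HFSR (0xE000ED2C), plus MMFAR (0xE000ED34) or BFAR (0xE000ED38) when the matching valid bit is set, and report them. The decoder below is a host-testable sketch with a hypothetical name; INVSTATE is the flag to expect if execution continues after a branch to an even address.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* CFSR bit names (ARMv7-M): MMFSR bits 0-7, BFSR 8-15, UFSR 16-31. */
static const struct { uint32_t mask; const char *name; } cfsr_bits[] = {
    { 1u << 0,  "IACCVIOL" },    { 1u << 1,  "DACCVIOL" },
    { 1u << 3,  "MUNSTKERR" },   { 1u << 4,  "MSTKERR" },
    { 1u << 5,  "MLSPERR" },     { 1u << 7,  "MMARVALID" },
    { 1u << 8,  "IBUSERR" },     { 1u << 9,  "PRECISERR" },
    { 1u << 10, "IMPRECISERR" }, { 1u << 11, "UNSTKERR" },
    { 1u << 12, "STKERR" },      { 1u << 13, "LSPERR" },
    { 1u << 15, "BFARVALID" },   { 1u << 16, "UNDEFINSTR" },
    { 1u << 17, "INVSTATE" },    { 1u << 18, "INVPC" },
    { 1u << 19, "NOCP" },        { 1u << 24, "UNALIGNED" },
    { 1u << 25, "DIVBYZERO" },
};

/* Writes a space-separated list of the set flags into buf (always terminated). */
void decode_cfsr(uint32_t cfsr, char *buf, size_t len)
{
    if (len == 0)
        return;
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof cfsr_bits / sizeof cfsr_bits[0]; i++) {
        if ((cfsr & cfsr_bits[i].mask) == 0u)
            continue;
        size_t used = strlen(buf);
        size_t need = strlen(cfsr_bits[i].name) + (used ? 1u : 0u);
        if (used + need + 1u > len)
            break; /* out of room: keep what already fits */
        if (used)
            strcat(buf, " ");
        strcat(buf, cfsr_bits[i].name);
    }
}
```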

Use the debugger, step the transition code. Should be able to rapidly determine what's going wrong.

I don't quite understand the use of, or need for, hard-coded symbols.

Do long branches indirectly via a register, perhaps through a vector table entry, so you don't have to keep track of every build and link, i.e. a fixed table, constructed when the RAM code is linked, that describes the deeper RAM entry point(s).

Control needs to transfer to an ODD address (bit 0 set, for Thumb state).
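
A minimal way to apply both points is to validate the entry word before transferring control: it should be odd, and it should point inside the region the update was linked for (AXIRAM, origin 0x24000000, length 0x80000, from the update .ld file). The check below is a hypothetical sketch; on the target it would run in the FLASH startup path just before the indirect branch, falling back to the default code when it fails.

```c
#include <stdint.h>

#define AXIRAM_ORIGIN 0x24000000u /* from the update .ld file */
#define AXIRAM_LENGTH 0x00080000u

/*
 * Hypothetical sanity check on a RAM entry point before branching to it:
 * bit 0 must be set (Thumb state) and the address must lie inside the
 * region the update was linked for.
 */
int is_valid_ram_entry(uint32_t entry)
{
    uint32_t dest = entry & ~1u;

    if ((entry & 1u) == 0u)
        return 0; /* even address: the core would leave Thumb state */
    if (dest < AXIRAM_ORIGIN)
        return 0;
    return dest - AXIRAM_ORIGIN < AXIRAM_LENGTH;
}
```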

I don't think it needs an MPU setting to execute from 0x24000000 space.

>>It just doesn't seem that the processor likes the ldr pc, [pc] to the AXI RAM

There are usually big caveats to doing that; it is more implementation-agnostic to do it via a register:

        LDR   R0, =__some_far_off_code  /* ODD address */
        BX    R0
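
In C, the same register-indirect transfer is normally written as a call through a function pointer, which GCC for Thumb targets compiles to a BLX (or, as a tail call, a BX) through a register. A hedged sketch: call_entry is a hypothetical name, on the target its argument would be the odd RAM address and the call would not return, and demo_entry exists only so the sketch can run on a host.

```c
#include <stdint.h>

typedef void (*entry_fn)(void);

/*
 * Transfer control to the code at 'entry' through a register, the C
 * counterpart of "LDR R0, =addr; BX R0". On a Cortex-M target 'entry'
 * must have bit 0 set. Converting an integer to a function pointer is
 * implementation-defined; GCC supports it.
 */
void call_entry(uintptr_t entry)
{
    entry_fn fn = (entry_fn)entry;
    fn();
}

/* Host-only stand-in for a RAM entry point. */
static volatile int demo_entry_ran;
static void demo_entry(void) { demo_entry_ran = 1; }
```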

>>Or perhaps I have found an unpublished errata?

Somehow I doubt it.

obiewhistler
Associate II

Thank you, Tesla DeLorean, for your feedback. I would like to say that I understood and fixed the problem, but I cannot. What I can say is that I stumbled upon something that works, without understanding why it works. Allow me to explain.

Per your suggestion, I modified the test-and-jump code:

  ldr   r0, =magic_cookie
  ldr   r1, [r0]  /* get the value in the magic cookie       */
  ldr   r2, =0x12345678
  cmp   r2, r1 /* compare magic cookie to update value           */
  bne   ContinueInit /* use default code if no cookie match  */
 
  ldr   r2, =10
  str   r2, [r0]  /* clear the magic cookie                         */
  ldr   r1, =0x24000034 /* unused vector holds RAM jump address */
  ldr   r1, [r1]
  bx    r1     /* branch to the RAM update at its ContinueInit   */
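
The word at 0x24000034 is entry 13 of the update's vector table (0x34 / 4 = 13), one of the slots the ARMv7-M architecture leaves reserved, so it is free to carry the update's own ContinueInit address. A host-side sketch of reading that slot from a copy of the image, for example when checking a build before sending it (the function name is hypothetical; the image is little-endian, as on the STM32H7):

```c
#include <stddef.h>
#include <stdint.h>

#define ENTRY_SLOT_OFFSET 0x34u /* vector table index 13, reserved on ARMv7-M */

/*
 * Returns the little-endian word stored at ENTRY_SLOT_OFFSET in 'image',
 * or 0 if the image is too short to contain it.
 */
uint32_t read_entry_slot(const uint8_t *image, size_t len)
{
    if (len < ENTRY_SLOT_OFFSET + 4u)
        return 0u;
    const uint8_t *p = image + ENTRY_SLOT_OFFSET;
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```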

It still locked up, and writing to the magic cookie from the various exception handlers did not produce any results, so I put together some software to read out the RAM program and see whether it was getting corrupted after the CRC check. The normal sequence of events is:

  1. Program the FLASH and BOOT_ADD0/1 via JTAG sometime in the past.
  2. Allow the ARM to boot from FLASH.
  3. Send the ARM the RAM update program via its external interface.
  4. The ARM verifies the update's CRC and sets the magic cookie.
  5. Reset the ARM via its reset pin.
  6. The ARM boots from FLASH, sees the magic cookie, and branches to RAM.

The ARM would lock up after step 6, and resetting it would get it running from FLASH again. If I then read the RAM, I found that it was indeed corrupted in 32-byte blocks at random places that differed on each trial. But if I read the RAM after step 4, the RAM was uncorrupted and steps 5 and 6 then succeeded. I then modified the CRC checking and ran some experiments:
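
A readback comparison like the one described can report corruption at the same 32-byte granularity. The hypothetical helper below compares the RAM contents read back after a failed boot against the original update image and counts the blocks that differ; the block size is a parameter so other granularities can be tried.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Counts the 'block'-sized chunks (block must be nonzero) in which
 * 'readback' differs from 'expected'. A final partial chunk is compared
 * over its actual length. If first_bad is not NULL, stores the offset of
 * the first bad chunk there.
 */
size_t count_bad_blocks(const uint8_t *expected, const uint8_t *readback,
                        size_t len, size_t block, size_t *first_bad)
{
    size_t bad = 0;

    for (size_t off = 0; off < len; off += block) {
        size_t n = (len - off < block) ? len - off : block;
        if (memcmp(expected + off, readback + off, n) != 0) {
            if (bad == 0 && first_bad != NULL)
                *first_bad = off;
            bad++;
        }
    }
    return bad;
}
```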

    ///////////////////////////////////////////////////////////////////////////
    // Step 3 - finish computing and check the CRC after the last code block //
    ///////////////////////////////////////////////////////////////////////////
 
	if (3 == UpdateState)
	{
		/* code that calculates the CRC */
 
		pkgCrc = ((*(char *)(readaddr    ) & 0xFF) << 8)
			   |  (*(char *)(readaddr + 1) & 0xFF);
 
		readaddr = writeaddr = 0x24000000;
 
		if (crcAccumulator == pkgCrc)
		{
			UpdateState = 4;
		}
		else
		{
			UpdateState = 5;
		}
		return(TRUE);
	}
 
    //////////////////////////////////////////////
    // Step 4 - good code, dummy read all of it //
    //////////////////////////////////////////////
 
	if (4 == UpdateState )
	{
		if (readaddr < end_of_update)
		{
			for (i = 0; i < 512; i++)
			{
				checksum += *(char *)readaddr;
				readaddr++;
			}
			return(TRUE);
		}
		else
		{
			magic_cookie = 0x12345678;
			UpdateState = 0;
		}
	}
 
    ////////////////////////////////////////
    // Step 5 - bad code, erase all of it //
    ////////////////////////////////////////

This code is executed once per main-loop iteration to keep the watchdog happy and, as shown, the branch to the RAM update at step 6 then works. But if I change ...

checksum += *(char *)readaddr;

... to ...

checksum += *(char *)writeaddr;

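... then the ARM locks up at step 6. (writeaddr stays at 0x24000000 during step 4 while readaddr walks through the whole update, so the working version reads every byte of the image and the failing version reads the same byte over and over.) It appears that reading through the RAM after writing it, and before trying to execute from it, makes all the difference. Why, I have no idea.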
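
One detail that may be relevant: the Cortex-M7 data-cache line is 32 bytes, the same size as the corrupted blocks. If the D-cache is enabled with a write-back policy for AXI RAM (an assumption; the post does not show the cache or MPU configuration), data written to the update image can sit in dirty cache lines that are lost on a pin reset, and a read pass over a large range tends to evict such lines and force them out to RAM. Under that assumption, explicitly cleaning the image range before setting magic_cookie, for example with the CMSIS-Core functions SCB_CleanDCache() or SCB_CleanDCache_by_Addr() from core_cm7.h, would be worth testing. The sketch below (hypothetical helper name) only computes the line-aligned range to hand to a by-address clean; cleaning a few extra bytes at either end never discards data.

```c
#include <stdint.h>

#define DCACHE_LINE 32u /* Cortex-M7 data-cache line size in bytes */

/*
 * Widen [addr, addr + size) to whole cache lines: round the start down
 * and the end up to a line boundary.
 */
void cache_line_range(uint32_t addr, uint32_t size,
                      uint32_t *aligned_addr, uint32_t *aligned_size)
{
    uint32_t start = addr & ~(DCACHE_LINE - 1u);
    uint32_t end = (addr + size + DCACHE_LINE - 1u) & ~(DCACHE_LINE - 1u);

    *aligned_addr = start;
    *aligned_size = end - start;
}
```

On the target, the resulting address and size would be passed to SCB_CleanDCache_by_Addr() right after the CRC check succeeds and before magic_cookie is written.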

Can anyone explain this?

Thanks!

I have added more info to "FLASH startup code does not branch to RAM."