2022-12-07 12:11 PM
We have an STM32H750 ARM Cortex-M7 processor in our product and are compiling C code using STM32CubeIDE. We want default code in FLASH that can run the default application by itself but can also load an updated application into AXI RAM and run the update from RAM. Eventually the update will have different initialized variables and jump tables, so my thinking is that the branch from FLASH to RAM belongs in the startup code, before that initialization takes place. For the moment, however, the default and update source code are identical. The update and default are compiled separately and linked with different .ld files. This produces .bin files that are identical except for 0x08 or 0x24 in the most significant byte (MSB) of addresses. Both the default and the update contain exception vectors and startup code, and both set VTOR properly. We do not normally want to write the update back into FLASH, and eventually this has to work with RDP2 security.
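One way to double-check the claim that the two .bin files differ only in the MSB of addresses is a small host-side comparison. This is a sketch: `compare_images` is a hypothetical helper, and it assumes addresses appear as word-aligned little-endian 32-bit literals (vector table, literal pools). An address built with a MOVW/MOVT pair would be reported as a mismatch rather than a top-byte difference.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit word (Cortex-M7 images are little-endian). */
static uint32_t rd32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Compare the FLASH build and the AXI RAM build word by word.
 * Returns how many words differ only by 0x08 -> 0x24 in the most
 * significant byte, or -1 on any other kind of difference. */
long compare_images(const uint8_t *flash_img, const uint8_t *ram_img, size_t len)
{
    long relocated = 0;
    if (len % 4u != 0) return -1;                  /* expect whole words */
    for (size_t off = 0; off < len; off += 4u) {
        uint32_t a = rd32le(flash_img + off);
        uint32_t b = rd32le(ram_img + off);
        if (a == b) continue;                      /* identical word */
        if ((a & 0x00FFFFFFu) != (b & 0x00FFFFFFu)) return -1;
        if ((a >> 24) != 0x08u || (b >> 24) != 0x24u) return -1;
        relocated++;                               /* pure FLASH -> AXI RAM relocation */
    }
    return relocated;
}
```

A return of -1 points at a difference that is not a simple 0x08 to 0x24 relocation and is worth inspecting in the two listings.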
I have included the startup code and linker file listings at the end of this post.
The process should work as follows ...
1. Power-up reset.
2. FLASH startup initializes and runs the default application from FLASH.
3. The default loads the update into AXI RAM from an external interface.
4. Reset via the reset pin.
5. FLASH startup sees the update.
6. FLASH startup branches to RAM to initialize and run the update from RAM.
... but it doesn't. The last step goes off into the weeds, or maybe it simply locks up.
The FLASH default runs fine by itself with FLASH_BOOT = 0x08000800.
The same RAM update code runs properly if I program the FLASH_BOOT = 0x24002400, write the update to AXI RAM via JTAG, and reset the chip.
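For reference, both FLASH_BOOT values decode to the expected vector-table addresses if, as assumed here, the register holds BOOT_ADD0 in bits 15:0 and BOOT_ADD1 in bits 31:16, each supplying bits 31:16 of the boot address selected by the BOOT0 pin. Because both halves are equal in 0x08000800 and 0x24002400, the BOOT0 level does not matter. `boot_address` is a hypothetical helper, not an ST HAL function.

```c
#include <stdint.h>

/* Hypothetical helper. Assumed layout: BOOT_ADD0 in bits [15:0] (BOOT0 pin
 * low) and BOOT_ADD1 in bits [31:16] (BOOT0 pin high); each field supplies
 * bits [31:16] of the boot (vector table) address. */
uint32_t boot_address(uint32_t flash_boot, int boot0_high)
{
    uint32_t field = boot0_high ? (flash_boot >> 16) : (flash_boot & 0xFFFFu);
    return field << 16;
}
```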
The FLASH default continues to run properly if I manually coerce __UpdateContinueInit_veneer to do a ldr pc, [pc] branch back to ContinueInit in FLASH.
The FLASH default checks the CRC of the update at the end of loading it from the external interface. I have also checked the contents of RAM locations at the beginning, middle, and end of the update before resetting, to make sure they are correct. I am therefore confident that the RAM update is loaded properly.
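The post does not say which CRC the loader uses. As an assumption, the sketch below uses CRC-16/CCITT-FALSE and a package whose last two bytes hold the CRC high byte first, matching how pkgCrc is assembled in the step 3 code later in the thread. `crc16_ccitt` and `package_crc_ok` are hypothetical names. One caveat: if the data cache is enabled and the CPU computes the CRC by reading through it, a match shows that the cached view of the update is correct, which is not necessarily the same as the SRAM itself holding it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* CRC-16/CCITT-FALSE: poly 0x1021, init 0xFFFF, no reflection, no final XOR.
 * The choice of CRC is an assumption, not taken from the post. */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Package layout assumed: payload followed by its CRC, high byte first. */
bool package_crc_ok(const uint8_t *pkg, size_t len)
{
    if (len < 2) return false;
    uint16_t stored = (uint16_t)((pkg[len - 2] << 8) | pkg[len - 1]);
    return crc16_ccitt(pkg, len - 2) == stored;
}
```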
It just doesn't seem that the processor likes the ldr pc, [pc] to the AXI RAM. I am hoping that someone can tell me what nuance of this processor I am missing. Or perhaps I have found an unpublished errata?
The startup code is as follows:
Reset_Handler:
ldr sp, =_estack /* set stack pointer */
ldr r1, =0x580244dc
ldr r2, =0xe0000000
str r2, [r1] /* turn on the RAM1-3 clocks - this is essential! */
ldr r1, =0x51008108
ldr r2, =0x00000001
str r2, [r1] /* Set READ_ISS_OVERRIDE in AXI_TARG7_FN_MOD */
ldr r0, =0x580244d0
ldr r1, [r0] /* get the value in the reset status register */
ldr r2, =0x00460000
cmp r2, r1 /* compare reset status to external reset value */
ldr r2, =0x00010000
str r2, [r0] /* clear the reset status register */
bne ContinueInit /* use default code if not from external reset */
ldr r0, =magic_cookie
ldr r1, [r0] /* get the value in the magic cookie */
ldr r2, =0x12345678
cmp r2, r1 /* compare magic cookie to update value */
bne ContinueInit /* use default code if no cookie match */
ldr r2, =0x00100010
str r2, [r0] /* clear the magic cookie */
b UpdateContinueInit /* branch to the update in RAM */
ContinueInit:
/* Copy the data segment initializers from code to SRAM */
movs r1, #0
b LoopCopyDataInit
CopyDataInit:
ldr r3, =_sidata
ldr r3, [r3, r1]
str r3, [r0, r1]
adds r1, r1, #4
LoopCopyDataInit:
ldr r0, =_sdata
ldr r3, =_edata
adds r2, r0, r1
cmp r2, r3
bcc CopyDataInit
ldr r2, =_sbss
b LoopFillZerobss
/* Zero fill the bss segment. */
FillZerobss:
movs r3, #0
str r3, [r2], #4
LoopFillZerobss:
ldr r3, = _ebss
cmp r2, r3
bcc FillZerobss
/* Call the clock system initialization function.*/
bl SystemInit
/* Call static constructors */
bl __libc_init_array
/* branch to the default main program */
bl main
bx lr
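The branch decision at the top of Reset_Handler can be restated in C to make its conditions explicit. This is a host-side model, not a replacement for the assembly: `choose_boot` and the enum are hypothetical names, and the constants are copied from the listing. One thing the model makes visible is that the reset status must equal 0x00460000 exactly; any additional reset flag set at the same time sends the boot to the default image, and in that case the cookie is left untouched.

```c
#include <stdbool.h>
#include <stdint.h>

#define RSR_EXTERNAL_RESET 0x00460000u /* reset-status value the listing compares against */
#define COOKIE_UPDATE      0x12345678u /* magic cookie meaning "update loaded" */
#define COOKIE_CLEARED     0x00100010u /* value the listing writes to clear the cookie */

typedef enum { BOOT_DEFAULT_FLASH, BOOT_UPDATE_RAM } boot_target;

/* Host-side model of the Reset_Handler decision. rsr and cookie stand in for
 * the RCC reset status register and the magic_cookie word in DTCM. */
boot_target choose_boot(uint32_t *rsr, uint32_t *cookie)
{
    bool from_pin = (*rsr == RSR_EXTERNAL_RESET); /* exact match, like cmp/bne */
    *rsr = 0u;                    /* the listing clears the reset flags unconditionally */
    if (!from_pin || *cookie != COOKIE_UPDATE)
        return BOOT_DEFAULT_FLASH;                /* b ContinueInit */
    *cookie = COOKIE_CLEARED;                     /* consume the cookie */
    return BOOT_UPDATE_RAM;                       /* branch to the RAM copy */
}
```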
The startup code also contains the exception vectors. The FLASH default linker .ld file is:
ENTRY(Reset_Handler)
/* Highest address of the user mode stack */
_estack = 0x20020000; /* end of DTCMRAM */
/* Generate a link error if heap and stack don't fit into RAM */
_Min_Heap_Size = 0x400; /* required amount of heap */
_Min_Stack_Size = 0x800; /* required amount of stack */
/* 1mS counter location used by ISR */
uwTick = 0x20000000;
magic_cookie = 0x20000004;
UpdateContinueInit = 0x240146fa;
/* Specify the memory areas */
MEMORY
{
FLASH (rx) : ORIGIN = 0x08000000, LENGTH = 128K
DTCMRAM (xrw) : ORIGIN = 0x20000008, LENGTH = 0x1fff8
RAM123 (xrw) : ORIGIN = 0x30000000, LENGTH = 288K
}
/* Define output sections */
SECTIONS
{
/* The startup code goes first into FLASH */
.isr_vector :
{
. = ALIGN(4);
KEEP(*(.isr_vector)) /* Startup code */
. = ALIGN(4);
} >FLASH
/* The program code and other data goes into FLASH */
.text :
{
. = ALIGN(4);
*(.text) /* .text sections (code) */
*(.text*) /* .text* sections (code) */
*(.glue_7) /* glue arm to thumb code */
*(.glue_7t) /* glue thumb to arm code */
*(.eh_frame)
KEEP (*(.init))
KEEP (*(.fini))
. = ALIGN(4);
_etext = .; /* define a global symbol at end of code */
} >FLASH
/* Constant data goes into FLASH */
.rodata :
{
. = ALIGN(4);
*(.rodata) /* .rodata sections (constants, strings, etc.) */
*(.rodata*) /* .rodata* sections (constants, strings, etc.) */
. = ALIGN(4);
} >FLASH
.ARM.extab : { *(.ARM.extab* .gnu.linkonce.armextab.*) } >FLASH
.ARM : {
__exidx_start = .;
*(.ARM.exidx*)
__exidx_end = .;
} >FLASH
.preinit_array :
{
PROVIDE_HIDDEN (__preinit_array_start = .);
KEEP (*(.preinit_array*))
PROVIDE_HIDDEN (__preinit_array_end = .);
} >FLASH
.init_array :
{
PROVIDE_HIDDEN (__init_array_start = .);
KEEP (*(SORT(.init_array.*)))
KEEP (*(.init_array*))
PROVIDE_HIDDEN (__init_array_end = .);
} >FLASH
.fini_array :
{
PROVIDE_HIDDEN (__fini_array_start = .);
KEEP (*(SORT(.fini_array.*)))
KEEP (*(.fini_array*))
PROVIDE_HIDDEN (__fini_array_end = .);
} >FLASH
/* used by the startup to initialize data */
_sidata = LOADADDR(.data);
/* Initialized data sections go into RAM, load LMA copy after code */
.data :
{
. = ALIGN(4);
_sdata = .; /* create a global symbol at data start */
*(.data) /* .data sections */
*(.data*) /* .data* sections */
. = ALIGN(4);
_edata = .; /* define a global symbol at data end */
} >DTCMRAM AT> FLASH
/* Uninitialized data section */
. = ALIGN(4);
.bss :
{
/* This is used by the startup in order to initialize the .bss section */
_sbss = .; /* define a global symbol at bss start */
__bss_start__ = _sbss;
*(.bss)
*(.bss*)
*(COMMON)
. = ALIGN(4);
_ebss = .; /* define a global symbol at bss end */
__bss_end__ = _ebss;
} >RAM123
/* User_heap_stack section, used to check that there is enough RAM left */
._user_heap_stack :
{
. = ALIGN(8);
PROVIDE ( end = . );
PROVIDE ( _end = . );
. = . + _Min_Heap_Size;
. = . + _Min_Stack_Size;
. = ALIGN(8);
} >DTCMRAM
/* Remove information from the standard libraries */
/DISCARD/ :
{
libc.a ( * )
libm.a ( * )
libgcc.a ( * )
}
.ARM.attributes 0 : { *(.ARM.attributes) }
}
The RAM update linker .ld file is the same except for ...
UpdateContinueInit = 0x080146fa;
MEMORY
{
DTCMRAM (xrw) : ORIGIN = 0x20000008, LENGTH = 0x1fff8
AXIRAM (xrw) : ORIGIN = 0x24000000, LENGTH = 0x80000
RAM123 (xrw) : ORIGIN = 0x30000000, LENGTH = 288K
}
... and AXIRAM replaces FLASH in SECTIONS.
(For the moment, UpdateContinueInit in the update .ld file points back to FLASH, to preserve the symmetry between the update and the default.)
2022-12-07 12:16 PM
I should have also included from the default listing:
080151f8 <__UpdateContinueInit_veneer>:
80151f8: f85f f000 ldr.w pc, [pc] ; 80151fc <__UpdateContinueInit_veneer+0x4>
80151fc: 240146fb .word 0x240146fb
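The veneer can be decoded by hand. On ARMv7-M, `ldr.w pc, [pc]` fetches its literal from the word-aligned instruction address plus 4, and a load to PC is an interworking branch: bit 0 of the loaded value must be 1 (Thumb state), otherwise the core takes an INVSTATE UsageFault. The sketch below uses hypothetical helper names; it checks that 0x240146fb is a valid Thumb target for UpdateContinueInit (0x240146fa) inside the 512 KB AXIRAM region declared in the update .ld file.

```c
#include <stdbool.h>
#include <stdint.h>

#define AXIRAM_BASE 0x24000000u  /* from the update .ld MEMORY block */
#define AXIRAM_SIZE 0x00080000u

/* Address the "ldr.w pc, [pc]" literal is fetched from: Align(insn + 4, 4). */
uint32_t literal_address(uint32_t insn_addr)
{
    return (insn_addr + 4u) & ~3u;
}

/* A load to PC interworks: bit 0 must be 1 (Thumb); the other bits are the
 * instruction address. Also require the target to lie inside AXIRAM. */
bool valid_ram_target(uint32_t literal, uint32_t *insn_addr)
{
    uint32_t addr = literal & ~1u;
    if ((literal & 1u) == 0) return false;              /* would raise INVSTATE */
    if (addr - AXIRAM_BASE >= AXIRAM_SIZE) return false; /* unsigned wrap rejects addr < base */
    *insn_addr = addr;
    return true;
}
```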
2022-12-07 02:43 PM
Suggest you have a HardFault_Handler that outputs actionable data, so you can see it fault for illegal instructions, bad addresses, et al.
Use the debugger and step through the transition code. You should be able to determine rapidly what's going wrong.
I don't quite understand the use of, or need for, the hard-coded symbols.
Do long branches indirectly via a register, perhaps with the target held in a vector-table entry, so you don't have to keep track of it across every build and link, i.e. a fixed table, constructed when the RAM code is linked, that describes its deeper entry point(s).
Control needs to transfer to an ODD address (bit 0 set, for Thumb state).
I don't think it needs an MPU setting to execute from 0x24000000 space.
>>It just doesn't seem that the processor likes the ldr pc, [pc] to the AXI RAM
There are usually big caveats to doing that; it is more implementation-agnostic to do it via a register:
LDR R0, =__some_far_off_code ; ODD address
BX R0
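A C-level sketch of the register-indirect branch through a fixed entry table: read the entry point from a reserved vector slot of the loaded image (0x34 is the slot used later in the thread), reject it unless bit 0 is set and it lands inside the image, and only then branch. `get_ram_entry` and its parameters are hypothetical. On the target, `img` would be `(const uint8_t *)0x24000000`, and the branch could be made through a function pointer, e.g. `((void (*)(void))entry)();`, which the compiler emits as a branch via a register.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ENTRY_SLOT_OFFSET 0x34u  /* reserved vector slot holding the RAM entry point */

/* Validate the entry point stored in the loaded image before branching to it.
 * img points at the image, which is linked to run at load_addr. */
bool get_ram_entry(const uint8_t *img, size_t img_len, uint32_t load_addr,
                   uint32_t *entry_out)
{
    if (img_len < ENTRY_SLOT_OFFSET + 4u) return false;
    const uint8_t *p = img + ENTRY_SLOT_OFFSET;
    uint32_t entry = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                     ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    if ((entry & 1u) == 0) return false;         /* BX to an even address faults */
    uint32_t addr = entry & ~1u;
    if (addr < load_addr || addr - load_addr >= img_len)
        return false;                            /* target outside the loaded image */
    *entry_out = entry;                          /* keep bit 0 set for BX/BLX */
    return true;
}
```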
>>Or perhaps I have found an unpublished errata?
Somehow I doubt it.
2022-12-14 01:18 PM
Thank you, Tesla DeLorean, for your feedback. I would like to say that I understand and have fixed the problem, but I cannot. What I can say is that I stumbled upon something that works, without understanding why it works. Allow me to explain.
Per your suggestion I modified the test and jump code:
ldr r0, =magic_cookie
ldr r1, [r0] /* get the value in the magic cookie */
ldr r2, =0x12345678
cmp r2, r1 /* compare magic cookie to update value */
bne ContinueInit /* use default code if no cookie match */
ldr r2, =10
str r2, [r0] /* clear the magic cookie */
ldr r1, =0x24000034 /* unused vector holds ram jump address */
ldr r1, [r1]
bx r1 /* branch to the RAM update at its ContinueInit */
It still locked up, and writing to the magic cookie in the various exception handlers did not produce results, so I put together some software to read out the RAM program to see whether it was getting corrupted after the CRC check. The normal sequence of events is:
The ARM would lock up after step 6, and resetting it would get it back to running from FLASH. If I then read the RAM, I would find that it was indeed corrupted in 32-byte blocks at random places that were different on each trial. But if I read the RAM after step 4, the RAM was uncorrupted and steps 5 and 6 were then successful. I then modified the CRC checking and ran some experiments:
///////////////////////////////////////////////////////////////////////////
// Step 3 - finish computing and check the CRC after the last code block //
///////////////////////////////////////////////////////////////////////////
if (3 == UpdateState)
{
/* code that calculates the CRC */
pkgCrc = ((*(char *)(readaddr ) & 0xFF) << 8)
| (*(char *)(readaddr + 1) & 0xFF);
readaddr = writeaddr = 0x24000000;
if (crcAccumulator == pkgCrc)
{
UpdateState = 4;
}
else
{
UpdateState = 5;
}
return(TRUE);
}
//////////////////////////////////////////////
// Step 4 - good code, dummy read all of it //
//////////////////////////////////////////////
if (4 == UpdateState )
{
if (readaddr < end_of_update)
{
for (i = 0; i < 512; i++)
{
checksum += *(char *)readaddr;
readaddr++;
}
return(TRUE);
}
else
{
magic_cookie = 0x12345678;
UpdateState = 0;
}
}
////////////////////////////////////////
// Step 5 - bad code, erase all of it //
////////////////////////////////////////
This code is executed once per main-loop iteration to keep the watchdog happy and, as shown, it branches properly to the RAM update at step 6. But if I change ...
checksum += *(char *)readaddr;
... to ...
checksum += *(char *)writeaddr;
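... then the ARM locks up at step 6. (In that variant writeaddr stays at 0x24000000 throughout step 4, so the same location is read repeatedly instead of sweeping through the whole update.) It appears that reading all of the RAM after writing it, and before trying to execute from it, makes all the difference. Why, I have no idea.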
Can anyone explain this?
Thanks!
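One hypothesis that fits both the 32-byte corruption blocks and the readaddr/writeaddr difference, offered as an assumption since the thread does not say whether the data cache is enabled: the Cortex-M7 L1 data cache uses 32-byte lines, and the 0x20000000-0x3FFFFFFF region is write-back, write-allocate by default (absent an MPU region saying otherwise). Bytes the loader writes may therefore still sit in dirty cache lines when the reset pin is asserted, and the reset discards them. Sweeping reads across a region larger than the 16 KB data cache evict, and so write back, every dirty line; reading one address over and over does not. The toy model below is direct-mapped, whereas the real cache is 4-way set associative, so it only illustrates the mechanism; `simulate` and the other names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Toy direct-mapped, write-back, write-allocate data cache with 32-byte
 * lines and 16 KB capacity, in front of a 64 KB "AXI RAM". */
#define LINE     32u
#define NLINES   (16u * 1024u / LINE)
#define RAM_SIZE (64u * 1024u)

static uint8_t ram[RAM_SIZE];
static struct { int valid, dirty; uint32_t tag; uint8_t data[LINE]; } cache[NLINES];

/* Bring the line holding addr into the cache, writing back a dirty victim. */
static uint8_t *lookup(uint32_t addr)
{
    uint32_t line = addr / LINE, idx = line % NLINES;
    if (!cache[idx].valid || cache[idx].tag != line) {
        if (cache[idx].valid && cache[idx].dirty)
            memcpy(&ram[cache[idx].tag * LINE], cache[idx].data, LINE);
        memcpy(cache[idx].data, &ram[line * LINE], LINE);
        cache[idx].valid = 1;
        cache[idx].dirty = 0;
        cache[idx].tag = line;
    }
    return &cache[idx].data[addr % LINE];
}

static void cpu_write(uint32_t addr, uint8_t v)
{
    *lookup(addr) = v;
    cache[(addr / LINE) % NLINES].dirty = 1;   /* write-back: RAM not updated yet */
}

static uint8_t cpu_read(uint32_t addr) { return *lookup(addr); }

/* Load an image, do the dummy read pass, then "reset" (cache contents lost
 * without write-back). Returns how many 32-byte lines in RAM are wrong. */
int simulate(int sweep_read)
{
    volatile uint32_t checksum = 0;
    int bad = 0;
    memset(ram, 0, sizeof ram);
    memset(cache, 0, sizeof cache);
    for (uint32_t a = 0; a < RAM_SIZE; a++)
        cpu_write(a, (uint8_t)(a * 7u + 1u));       /* loader writes the update */
    for (uint32_t a = 0; a < RAM_SIZE; a++)
        checksum += cpu_read(sweep_read ? a : 0u);  /* readaddr vs writeaddr */
    memset(cache, 0, sizeof cache);                 /* pin reset: dirty lines lost */
    for (uint32_t a = 0; a < RAM_SIZE; a += LINE)
        for (uint32_t i = 0; i < LINE; i++)
            if (ram[a + i] != (uint8_t)((a + i) * 7u + 1u)) { bad++; break; }
    return bad;
}
```

If this turns out to be the cause, a deterministic alternative to the dummy read pass would be to call the CMSIS function `SCB_CleanDCache()` after loading the update and before setting the magic cookie.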
2022-12-14 01:20 PM
I have added more info to "FLASH startup code does not branch to RAM."