
Need help with obscure Hard Fault on reset!!

AVoel.1
Associate III

We've got a bizarre Hard Fault on STM32L452. It happens immediately after reset, and ONLY after programming our bootloader with the CubeProgrammer following a full chip erase. If one subsequently programs the bootloader again using the same hex file, but without the full chip erase, the problem vanishes and never returns until the full chip erase and programming sequence is repeated. And it doesn't always happen. It's plaguing us. It feels like a tool issue, but until we understand it, it is very worrying.

I've instrumented the state of a number of relevant registers in the hard fault handler, using code found on the internet. But I'm having trouble understanding what the state of the registers implies. Any help from folks here who really understand this part of the architecture would be appreciated, particularly regarding the fact that the hard fault seems to be triggered by a debug event. It makes little sense to us, as does the fact that the PC is seemingly set to the beginning of data memory.

0693W00000KZc2PQAT.png

9 REPLIES
AVoel.1
Associate III

A couple more registers printed out from the handler ...

0693W00000KZcRjQAL.png

Harvey White
Senior III

Look at the fault analyzer. Generally, if it's a hard fault with a determined address, then it's trying to access memory somewhere it shouldn't. This kinda tells me that the chip is trying to access memory at 0x20000000. The kind of fault can be indicative.

Now:

Doing a full chip erase sets a lot of things to zero. Running a program goes through the setups, which initialize some things, but not all. So it's either your code that expects something to be set up (and it isn't), or the C/C++ code that does. Not at all sure, since I've not hit this problem. Things called out of order where a pointer isn't set up, yes. This? No.

About all that I can come up with.

Watch for how it maps the memory at zero, when the flash is blank the ROM is likely remapped there.

Make sure you set SCB->VTOR to your own code in FLASH, and have a Hard Fault Handler outputting actionable data.
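
For anyone trying to make sense of the fault registers, here is a minimal sketch that turns HFSR and CFSR values captured in a handler into flag names. The bit positions come from the ARMv7-M architecture; the function names (`describe_fault`, `append_flag`) are made up for illustration, and this is plain C meant to be run on a host against values copied from a dump, not the handler used in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* HFSR bits (SCB->HFSR), per the ARMv7-M architecture. */
#define HFSR_VECTTBL    (1u << 1)   /* fault on vector table read */
#define HFSR_FORCED     (1u << 30)  /* escalated configurable fault */
#define HFSR_DEBUGEVT   (1u << 31)  /* debug event, e.g. BKPT, with halting debug off */

/* A selection of CFSR bits (SCB->CFSR). */
#define CFSR_IACCVIOL   (1u << 0)   /* MemManage: instruction access violation */
#define CFSR_IBUSERR    (1u << 8)   /* BusFault: instruction bus error */
#define CFSR_PRECISERR  (1u << 9)   /* BusFault: precise data bus error */
#define CFSR_UNDEFINSTR (1u << 16)  /* UsageFault: undefined instruction */
#define CFSR_INVSTATE   (1u << 17)  /* UsageFault: invalid EPSR state (Thumb bit clear) */
#define CFSR_INVPC      (1u << 18)  /* UsageFault: invalid PC load */

/* Hypothetical helper: append `name` to `out` when `bit` is set in `reg`. */
static void append_flag(char *out, size_t cap, uint32_t reg, uint32_t bit,
                        const char *name)
{
    size_t used = strlen(out);
    if ((reg & bit) != 0u && used < cap) {
        snprintf(out + used, cap - used, "%s%s", used ? " " : "", name);
    }
}

/* Write a space-separated list of the set flags into `out`. */
void describe_fault(uint32_t hfsr, uint32_t cfsr, char *out, size_t cap)
{
    assert(out != NULL && cap > 0);
    out[0] = '\0';
    append_flag(out, cap, hfsr, HFSR_DEBUGEVT, "DEBUGEVT");
    append_flag(out, cap, hfsr, HFSR_FORCED, "FORCED");
    append_flag(out, cap, hfsr, HFSR_VECTTBL, "VECTTBL");
    append_flag(out, cap, cfsr, CFSR_IACCVIOL, "IACCVIOL");
    append_flag(out, cap, cfsr, CFSR_IBUSERR, "IBUSERR");
    append_flag(out, cap, cfsr, CFSR_PRECISERR, "PRECISERR");
    append_flag(out, cap, cfsr, CFSR_UNDEFINSTR, "UNDEFINSTR");
    append_flag(out, cap, cfsr, CFSR_INVSTATE, "INVSTATE");
    append_flag(out, cap, cfsr, CFSR_INVPC, "INVPC");
}
```

With an HFSR of 0x80000000 and a CFSR of 0, `describe_fault` writes "DEBUGEVT" into the buffer. Architecturally that flag means a debug event such as a BKPT instruction was hit while halting debug was not enabled, so it is worth checking what code could have been left running without a debugger attached.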

The programming software might use RAM to stage a loader, so be mindful of where the stack pointer ends up and how your startup code overwrites RAM.

Tips, Buy me a coffee, or three.. PayPal Venmo
Up vote any posts that you find helpful, it shows what's working..

It does seem that the execution is entering data memory, based on some debugging. Because the problem cannot be duplicated in debug mode, I added the following to the very start of the startup code:

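.section	.text.Reset_Handler
	.weak	Reset_Handler
	.type	Reset_Handler, %function
	.extern startup_state   /* debug marker variable, defined elsewhere */
Reset_Handler:
  ldr   sp, =_estack    /* Set stack pointer */

  /* Debug marker: store 0x55 in startup_state before anything else runs */
  ldr r1, =startup_state
  mov r2, #0x55
  str r2, [r1]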

Then when the problem occurs, I print out the value of startup_state, along with the rest of the trace shown previously. It appears that we don't even reach the Reset_Handler code, since startup_state is not set to 0x55. So it seems that execution starts in data memory. Why? I have no idea. Some tool problem with the option bytes, some undocumented factory bootloader feature??

It's worth remembering that once our bootloader runs once successfully, it never misbehaves again. So whatever state got cleaned out, it never comes back. It really does seem like a bizarre tool problem. After all, how can our code be at fault when it never even gets a chance to run (see above)!!

Check what you're doing with the BOOT pin(s) in your design, or OB equivalents/overrides.

The L4s will switch to ROM if the default boot memory is invalid/empty at POWER UP. A regular reset won't change this behaviour because the decision is latched, which I suppose relates to glitching protection, and to a desire for the System Loader to be functional on blank devices on a PCBA where the designers strapped the device for normal operation and made no accommodation with jumpers or test pads/pins.
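
To make the "invalid/empty" idea concrete, here is a rough sketch of a sanity check on the first two vector table words (initial SP and reset vector) as read back from the start of flash. Everything here is an assumption for illustration: the names (`check_vector_table`, `mem_range_t`, `vt_status_t`) are hypothetical, the flash and RAM ranges must be taken from your own linker script, and the exact rule the device uses to decide that flash is empty should be checked in the reference manual. It is host-side C operating on values from a memory dump.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of a memory region: [base, base + size). */
typedef struct {
    uint32_t base;
    uint32_t size;
} mem_range_t;

typedef enum { VT_OK, VT_ERASED, VT_BAD_SP, VT_BAD_RESET } vt_status_t;

static bool in_range(uint32_t addr, mem_range_t r)
{
    return addr >= r.base && (addr - r.base) < r.size;
}

/* vt[0] is the initial stack pointer, vt[1] is the reset vector. */
vt_status_t check_vector_table(const uint32_t vt[2], mem_range_t flash,
                               mem_range_t ram)
{
    assert(vt != 0);
    uint32_t sp = vt[0];
    uint32_t reset = vt[1];

    /* Erased flash reads back as all ones. */
    if (sp == 0xFFFFFFFFu) {
        return VT_ERASED;
    }
    /* The initial SP may point one past the end of RAM (descending stack),
       but must lie above the RAM base and be word aligned. */
    if (sp <= ram.base || (sp - ram.base) > ram.size || (sp & 3u) != 0u) {
        return VT_BAD_SP;
    }
    /* The reset vector needs the Thumb bit set and must point into flash. */
    if ((reset & 1u) == 0u || !in_range(reset & ~1u, flash)) {
        return VT_BAD_RESET;
    }
    return VT_OK;
}
```

A reset vector that points into RAM, or a first word of 0xFFFFFFFF, is a strong hint that the core did not start from the image you think it did.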

SPeac.1
Associate II

Hi.

How are you getting on with this.. I have the same issue but a different processor and have raised a support case.

I suspect you are using the "Download" button on the first tab...

Try the "Erase & Programming" tab (second tab).

Load the File Path, Tick "Run After Programming" then "start programming" and don't go back to the first tab.

I suspect this is a no-reset-after-programming "feature", so your code isn't as dangerous as you suspect.

If you notice your PC = 0x20000000, i.e. RAM .. that's the downloader. Dead give-away.
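
As a quick way to sanity-check a stacked PC from a fault dump, here is a small sketch that maps an address to its region in the default ARMv7-M memory map (which is architectural, not specific to one chip). The names `armv7m_region` and `region_name` are hypothetical, and the finer-grained layout (flash, system memory, SRAM sizes) is device-specific and not modelled here.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    REGION_CODE,        /* 0x00000000 - 0x1FFFFFFF: flash, system memory, aliases */
    REGION_SRAM,        /* 0x20000000 - 0x3FFFFFFF */
    REGION_PERIPHERAL,  /* 0x40000000 - 0x5FFFFFFF */
    REGION_EXT_RAM,     /* 0x60000000 - 0x9FFFFFFF */
    REGION_EXT_DEVICE,  /* 0xA0000000 - 0xDFFFFFFF */
    REGION_SYSTEM       /* 0xE0000000 - 0xFFFFFFFF: PPB (SCB, NVIC, ...) */
} region_t;

/* Each 512 MB block is selected by the top three address bits. */
region_t armv7m_region(uint32_t addr)
{
    switch (addr >> 29) {
    case 0:  return REGION_CODE;
    case 1:  return REGION_SRAM;
    case 2:  return REGION_PERIPHERAL;
    case 3:
    case 4:  return REGION_EXT_RAM;
    case 5:
    case 6:  return REGION_EXT_DEVICE;
    default: return REGION_SYSTEM;
    }
}

/* Human-readable name for printing next to a dumped address. */
const char *region_name(region_t r)
{
    static const char *const names[] = {
        "Code", "SRAM", "Peripheral", "External RAM", "External device", "System"
    };
    assert(r <= REGION_SYSTEM);
    return names[r];
}
```

Here `armv7m_region(0x20000000u)` returns `REGION_SRAM`, while an address in flash such as 0x08000410 falls in `REGION_CODE`.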

good hunting

Simon

AVoel.1
Associate III

We concluded that this had to be a tool issue, perhaps interacting with an undocumented hardware feature. It does not happen on the production line when using other programming tools.

SPeac.1
Associate II

Well ST confirmed the "Disconnect" button is just that.. it disconnects the unit.. no cleanup at all.

Hopefully the next person searching for this answer will find this useful...

Simon Peacock
Associate II

So a follow-up for anyone finding the same issue and ending up here..

This is expected behavior. Using the first tab to program will/may cause a Hard Fault on exit (even if you don't see it). This most likely affects all versions up to V2.11.0. So don't panic; read the message above to see how to avoid this situation. If using the command line, add "-s" to the end of the parameters for a clean reboot after programming.

If you like this, maybe we can get ST to fix this "Bug" "Feature"