How can I analyze the cause of a randomly triggered HardFault_Handler()?

RockfordC64 · ‎2022-10-12

Hello Community,

I have an STM32L051R6T6 Cortex-M0+ MCU on a customer board and I'm developing an application for it in CubeIDE and C++, using the HAL driver library. It reads from 2 UARTs by interrupt, verifies the data, converts it to telegram objects and puts them into a queue. The main loop processes both queues (which are pointer arrays for my telegram objects) and logs the current status to an LPUART.

My app randomly runs into HardFault_Handler(). When I enlarge the heap, it seems to happen a little later. When I look at the call stack, the HardFault is signaled in a new or a delete operation (different ones, not the same one every time). So it looks like I have a memory leak or something similar.

I triple-checked my code and added counters for the new and delete([]) operators. I can't find any problem. Allocated memory is freed again.

It would be nice if I could somehow query the currently allocated heap size, to be more sure that I don't run out of heap memory.

I also tried CubeProgrammer, because it has a hard fault detection and analyzer. I loaded the same ELF file onto the MCU. (I also looked at the SCB->CCR register because of the SCB_CCR_UNALIGN_TRP flag: the IDE says it's on, CubeProgrammer says it's off, with the same ELF file running.) But the hard fault detection doesn't trigger when the hard fault happens (I know it happens because of the logging and an LED on the board). So maybe I'm using it wrong.

Is there a way to analyze the HardFault_Handler in CubeIDE? What information can I read from the registers or other status values? Is there a way to debug the memory better, like I know from Microsoft's magic numbers in debug mode?

Or should I stop using std classes (std::string and std::queue at the moment)?

RockfordC64 · ‎2022-10-12

What I forgot:

The HardFault sometimes happens after 3 minutes, sometimes after 45 seconds. But the amount of data at the receiving UARTs is constant (40 bytes every 50 ms at 9600 baud). The amount of logging at the LPUART is also nearly constant.

Tesla DeLorean · ‎2022-10-12

I've published Hard Fault Handlers a couple of times to the forum.

For the M0 you would need to be particularly concerned about packed structures or byte streams where you attempt word, double-word or 64-bit load/store accesses. Unpacking via pointers is particularly prone to this, as the compiler may assume alignment.
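As a minimal sketch of the unpacking problem described above: instead of casting a byte pointer into a UART receive buffer to `uint32_t *`, the value can be copied out with `memcpy`, which is safe at any offset. The helper name `read_u32_unaligned` is made up for illustration, and the result is in the CPU's native byte order (little-endian on STM32).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit value from any byte address.
 * A direct *(const uint32_t *)p at an odd address can HardFault on a
 * Cortex-M0/M0+; memcpy lets the compiler use byte accesses instead. */
static uint32_t read_u32_unaligned(const uint8_t *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);   /* native byte order, no alignment assumption */
    return v;
}
```

For example, `read_u32_unaligned(&rx_buf[1])` would be fine even though `&rx_buf[1]` is not word-aligned (`rx_buf` being whatever receive buffer the application uses).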

Tips, Buy me a coffee, or three.. PayPal Venmo
Up vote any posts that you find helpful, it shows what's working..

SMacl.1 · ‎2022-10-12

This sounds a bit like a heap error that I encountered once. My C library is/was newlib (I use the GNU arm-none-eabi toolchain). Its malloc impl calls _sbrk when it needs more RAM. My build uses libnosys for its impl of _sbrk. The logic there is to just hand back chunks of memory, starting at _end, which your linker script supplies. NO attention is paid to where the heap might actually end.

In essence, there is no heap checking at all. malloc will never fail (return NULL), since _sbrk always hands back more memory. On my CPU, this overflowed heap, growing UP, collided with the main stack (i.e. the kernel stack of a system with an RTOS) growing DOWN. You can imagine the carnage when an ISR or SVC trap routine runs and uses its stack: it sprays areas of the heap with bad data.

The solution is simply to provide your OWN _sbrk impl, one that actually does honor your heap constraints. I can post mine here if of interest.

Check your impls of malloc and, particularly, _sbrk.
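A bounds-checked _sbrk along those lines could look roughly like the sketch below. This is not the implementation referred to in this post; the names `heap_region_t` and `bounded_sbrk` are invented here, and the heap limits are passed in explicitly so the logic can also be exercised on a PC.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical heap descriptor. On the target, start would typically be the
 * linker symbol _end and limit an address safely below the main stack. */
typedef struct {
    uint8_t *start;  /* lowest heap address                    */
    uint8_t *limit;  /* one past the highest usable heap byte  */
    uint8_t *brk;    /* current break, NULL until first call   */
} heap_region_t;

/* Move the break by incr bytes; fail instead of crossing start or limit. */
static void *bounded_sbrk(heap_region_t *h, ptrdiff_t incr)
{
    if (h->brk == NULL) {
        h->brk = h->start;
    }
    if (incr > (ptrdiff_t)(h->limit - h->brk) ||   /* would grow past limit   */
        incr < (ptrdiff_t)(h->start - h->brk)) {   /* would shrink below start */
        errno = ENOMEM;
        return (void *)-1;                         /* conventional sbrk error  */
    }
    uint8_t *prev = h->brk;
    h->brk += incr;
    return prev;                                   /* old break, as sbrk does  */
}

/* Assumed wiring on the target (symbol names depend on the linker script):
 *   extern uint8_t _end;
 *   void *_sbrk(ptrdiff_t incr) { return bounded_sbrk(&my_heap, incr); }
 * where my_heap is a heap_region_t initialised from the linker symbols. */
```

With a check like this, malloc (and therefore new) gets an error from _sbrk when the heap is exhausted and can report failure, rather than silently growing into the stack.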

BTW I too have done some work on fault handling, you might want to see

https://github.com/tobermory/faultHandling-cortex-m

Tesla DeLorean · ‎2022-10-12

https://github.com/cturvey/RandomNinjaChef/blob/main/KeilHardFault.c

RockfordC64 · ‎2022-10-12

I will study them. I have also read a lot about HardFault handling, but at the moment I do not understand much of what is written there, because I'm new to MCUs. And it's not easy to find out whether this works with my MCU, my toolchain, my OS...

I have written a lot of C++ and C# code in MS Visual Studio. There I had and solved other problems, but not things like reading and writing registers, programming interrupt routines, shifting bits... But I will learn.

Thank you.

Tesla DeLorean · ‎2022-10-12

It's akin to the blue screen of death, or bombs on the screen.

Decoding it allows you to identify the culprits.

Knowing the register content and the exact point of the fault allows you to view a code disassembly in context. Listing files can usually show the flow or the related C code. Most often the problem is with pointers; you could add asserts() or sanity checks on the code paths you identify as failing.
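To make "register content and exact point of the fault" concrete, here is a rough sketch of capturing the registers that the core pushes on exception entry. It is not the handler from the links in this thread. It assumes a small assembly stub (not shown) inside HardFault_Handler passes the active stack pointer (MSP or PSP) to `hard_fault_capture`; that function name and `g_last_fault` are made up for illustration.

```c
#include <stdint.h>

/* The eight words stacked by hardware on exception entry, in stack order. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12;
    uint32_t lr;    /* return address of the function that was running */
    uint32_t pc;    /* address of the instruction that faulted         */
    uint32_t xpsr;
} fault_frame_t;

/* volatile so the values stay visible in the debugger's watch window */
volatile fault_frame_t g_last_fault;

/* sp must point at the stacked frame (assumption: supplied by an asm stub). */
void hard_fault_capture(const uint32_t *sp)
{
    g_last_fault.r0   = sp[0];
    g_last_fault.r1   = sp[1];
    g_last_fault.r2   = sp[2];
    g_last_fault.r3   = sp[3];
    g_last_fault.r12  = sp[4];
    g_last_fault.lr   = sp[5];
    g_last_fault.pc   = sp[6];
    g_last_fault.xpsr = sp[7];
    /* On the target, the handler would normally halt or log after this. */
}
```

Looking up `g_last_fault.pc` and `g_last_fault.lr` in the disassembly or listing file then shows which instruction faulted and who called it.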

A fault reported as "imprecise" implies it was a write to memory, as those are deferred via write buffers, so you might have to walk a couple of instructions back to find the related STR instruction, as the faulting address is inexact.

Things which are entirely random or inconsistent are most likely memory corruption, stack or interrupt related. For example, a callback routine has too many auto/local variables, or modifies things beyond its scope.

Watch also that auto/local variables are frequently not initialized, the content will be random stack content unless explicitly cleared or set to some value.
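One common way to check the stack-related suspicion above is to "paint" unused memory with a known pattern and later see how much of it is still intact, similar in spirit to debug fill values on desktop toolchains. The sketch below is only an illustration; `STACK_PAINT`, `paint_region` and `untouched_words` are invented names, and on the target only the part of the stack below the current stack pointer may be painted.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_PAINT 0xDEADBEEFu   /* arbitrary, easily recognised pattern */

/* Fill a word-aligned region with the paint pattern. */
static void paint_region(uint32_t *lowest, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        lowest[i] = STACK_PAINT;
    }
}

/* Count words, from the lowest address upward, that still hold the pattern.
 * The stack grows downward, so this is the headroom that was never used. */
static size_t untouched_words(const uint32_t *lowest, size_t words)
{
    size_t n = 0;
    while (n < words && lowest[n] == STACK_PAINT) {
        n++;
    }
    return n;
}
```

If `untouched_words` reaches 0 at some point, the stack has very likely grown into whatever lies below it, for example the heap.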

Pavel A. · ‎2022-10-12

The STM32L051R6 has only 8K of RAM, so much for heap size, tasks and fancy C++ classes.

This can be refreshing for a Windows programmer used to megabytes of everything ((

RockfordC64 · ‎2022-10-13

Thank you,

First I have to find out what _sbrk, the arm-none-eabi toolchain and libnosys are.

What I know is: I use the CubeIDE toolchain as it is. I only had to switch the compiler's optimization to debug, because without optimization the flash was too small.

And the code runs without an RTOS. (This is called bare metal, right?)

I'm wondering when my boss will start to worry about the time I have to invest here. (~_ö)

RockfordC64 · ‎2022-10-13

The first thing I have done now is to avoid all classes and functions from the std namespace.

I no longer use std::string, only char[] and the str...() functions.

But the problem still persists.

Maybe the next step could be to convert all my code to plain C instead of C++...