Hard fault exception when building with optimisation

bm2 · ‎2024-07-29

Hello,

I have a project that is built with STM32CubeIDE version 1.16.0, using toolchain 11.3.rel1. When built with optimisation -O0, the application works without problems. It is a bare-metal implementation that uses the LL library version 1.6.1 for the G0.

Now, when I build the application with another optimisation level, such as -Og, the software runs into a hard fault exception.

After a few days, I wanted to investigate this problem today and collect all the information. And Murphy struck: I can't reproduce the problem.

I remember the following information from the hard fault exception:

it did not matter which optimisation level I used; the behaviour was the same
the LR register value was 0xfffffff8 in the hard fault exception handler
the call stack history showed me the same calls
the last function in the call stack history was an ISR whose IRQ was disabled

I have found another problem report here: https://community.st.com/t5/stm32cubeide-mcus/stm32g0-hardfault-and-signal-handler-fffffff1/m-p/676866 . Is this the same problem? My MCU has only one flash bank, but the main loop runs from RAM to get the performance without flash wait states.

Is there an errata sheet for this bug in the silicon?

Best regards,

Bernd

TDK · ‎2024-07-29

Typically if an error occurs during Release but not Debug it's due to a code bug, rather than a silicon error. The latter is extremely rare.

Optimizations can lead to faster code, which can expose race conditions. They can also expose improper or missing usage of the volatile qualifier, among other things. Without being able to reproduce it, it's going to be difficult to say anything with certainty. If it's a bug, you'll probably encounter it again.
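
To illustrate the volatile point, here is a minimal sketch, assuming a flag and a value shared between an interrupt handler and the main loop. The names data_ready, latest_sample, on_sample_irq and poll_sample are hypothetical and do not come from the project in this thread. Without volatile, an optimising build is allowed to keep the flag in a register, so the main loop may never see the write made in the interrupt, which fits the "works at -O0 only" pattern.

```c
#include <stdbool.h>
#include <stdint.h>

/* Shared between interrupt context and the main loop. 'volatile' forces the
 * compiler to access memory on every read and write of these variables. */
static volatile bool data_ready = false;
static volatile uint32_t latest_sample = 0;

/* Hypothetical ISR body: stores a value, then raises the flag. */
void on_sample_irq(uint32_t value)
{
    latest_sample = value;
    data_ready = true;
}

/* Main-loop side: returns true and copies the sample if one is pending. */
bool poll_sample(uint32_t *out)
{
    if (!data_ready) {
        return false;
    }
    *out = latest_sample;
    data_ready = false; /* on real hardware, protect this read-and-clear against the IRQ */
    return true;
}
```

volatile only addresses visibility of the accesses. The read-and-clear in poll_sample is still not atomic, so on a real target it would typically sit inside a short critical section.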

Tesla DeLorean · ‎2024-07-29

What part? The STM32G030F6?

Flaws surfaced by optimization are almost always latent software failings, infrequently the compiler, and even less frequently the chip.

Watch for things that need to be volatile.

Watch for things that need/expect alignment, the CM0(+) is intolerant of unaligned pointers, much like the older ARM7/ARM9 designs.

Get a properly functioning Hard Fault Handler, determine consistent points of failure, and instrument the code. If the failures are random, look at interrupt/callback issues. Check stack depth and utilization, and code exceeding bounds.
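
As a rough sketch of what such a handler usually collects: on exception entry the core pushes eight words (R0-R3, R12, LR, PC, xPSR) onto the active stack, and a handler can copy them out for inspection. The type fault_frame_t and the function fault_frame_read are hypothetical names. The sketch assumes the caller already has the stack pointer value from exception entry; the small assembly stub that picks MSP or PSP and passes it in is not shown.

```c
#include <stdint.h>

/* Registers pushed by the core on exception entry, in stacking order
 * (lowest address first). */
typedef struct {
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t lr;   /* LR of the interrupted code, not EXC_RETURN */
    uint32_t pc;   /* return address of the interrupted code */
    uint32_t xpsr;
} fault_frame_t;

/* Copy the eight stacked words starting at 'sp' into a struct. */
fault_frame_t fault_frame_read(const uint32_t *sp)
{
    fault_frame_t f = {
        sp[0], sp[1], sp[2], sp[3], sp[4], sp[5], sp[6], sp[7]
    };
    return f;
}
```

Storing the result in a global variable and halting on a breakpoint makes the values easy to read in the debugger, even when the call stack view is unreliable.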

bm2 · ‎2024-07-29

@TDK, OK, it is good to read that this is not a silicon failure. I will run more tests over the next day and hope that I can reproduce this hard fault exception. At the moment I don't understand why it no longer occurs. The source code itself has not changed.

@Tesla DeLorean, yes, it is the STM32G030F6. If this is a volatile failure, then it comes from ST's LL driver. I use only its definitions and macros for read/write access to the SFRs. The stack does not overflow or underflow; I have checked it and increased its size for the tests.

Alignment problem? OK, I know that the stack has 64-bit address alignment on the M0 and M0+. All other variables are placed in the address range by gcc. I have packed structures, but the size of the complete structure is 64 bits and I don't use pointers into this structure. But I will check the accesses to these structures.

I will give feedback in the next few days on what I have found out.

Tesla DeLorean · ‎2024-07-29

Pointer problems usually occur when pointers are passed as char* or void* and then cast to uint32_t* or double*.

Alignment in files and byte arrays, and unpacking structures, can cause issues, for example packed-structure expectations of external peripherals.

In the libraries, perhaps the handling of 16-bit values in SPI or other peripherals.

Just saying CM0(+) is more prone to this with LDRD/STRD or LDM,STM optimizations / folding.
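
A minimal sketch of the usual workaround, assuming a value has to be read from a byte buffer whose alignment is not known. The helper names is_aligned and read_u32 are hypothetical. Casting the byte pointer to uint32_t* and dereferencing it can fault on a CM0(+) when the address is not a multiple of four; copying through memcpy avoids this.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* True if 'p' is aligned to 'align' bytes ('align' must be a power of two). */
bool is_aligned(const void *p, uintptr_t align)
{
    return ((uintptr_t)p & (align - 1u)) == 0u;
}

/* Read a 32-bit value from any address. memcpy makes no alignment
 * assumption about 'p', so the compiler must not emit a plain word load
 * unless it can prove the address is aligned. */
uint32_t read_u32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

For example, read_u32(buffer + 1) is safe, whereas *(uint32_t *)(buffer + 1) is not. is_aligned can be used in an assertion at the places where a cast is unavoidable.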

LR in Handlers can in fact point to an internal "call-gate" type implementation that's going to un-stack context upon return.

https://github.com/cturvey/RandomNinjaChef/blob/main/KeilHardFault.c

Watch for unhandled interrupts

bm2 · ‎2024-07-30

Now I have found out why the hard fault exception no longer occurred. The application is controlled via SPI and, while SPI is inactive, goes into Low Power Sleep mode.

The tester had changed the test application. With the new test application I can't reproduce the hard fault exception; with the older version I can.

What is different between the two: the old version sends 7 pieces of information in single SPI frames, and the new one uses only one SPI frame.

Now I have read out the information from the hard fault handler:

LR = 0xFFFF FFF9 (use MSP)
SP       = MSP 0x2000 1FA8
R0       = 0x0000 0000
R1       = 0xFFFF FFF9
R2       = 0xFFF6 0810
R3       = 0xFFFF BFFF
R12 (IP) = 0x4000 7000
R14 (LR) = 0x0000 0380
R15 (PC) = 0x0800 09F5
xPSR     = 0x2000 05C3

OK, the R15 (PC) is not correct. Why? At the moment no idea.
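
For reading the LR value at handler entry, a small sketch may help. On ARMv6-M (Cortex-M0/M0+) only three EXC_RETURN values are defined, and they say which mode and which stack the core returns to. The enum and the function exc_return_classify are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* The three EXC_RETURN values defined for ARMv6-M. */
#define EXC_RETURN_HANDLER_MSP 0xFFFFFFF1u /* back to Handler mode, frame on MSP */
#define EXC_RETURN_THREAD_MSP  0xFFFFFFF9u /* back to Thread mode, frame on MSP */
#define EXC_RETURN_THREAD_PSP  0xFFFFFFFDu /* back to Thread mode, frame on PSP */

typedef enum {
    EXC_RET_INVALID = 0,
    EXC_RET_HANDLER_MSP,
    EXC_RET_THREAD_MSP,
    EXC_RET_THREAD_PSP
} exc_return_kind_t;

/* Classify the LR value seen on entry to an exception handler. */
exc_return_kind_t exc_return_classify(uint32_t lr)
{
    switch (lr) {
    case EXC_RETURN_HANDLER_MSP: return EXC_RET_HANDLER_MSP;
    case EXC_RETURN_THREAD_MSP:  return EXC_RET_THREAD_MSP;
    case EXC_RETURN_THREAD_PSP:  return EXC_RET_THREAD_PSP;
    default:                     return EXC_RET_INVALID;
    }
}
```

Applied to the dump above, 0xFFFFFFF9 classifies as a return to Thread mode with the frame on MSP, which agrees with the "(use MSP)" note. The value 0xfffffff8 quoted from memory in the first post is not one of the three defined values.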

bm2 · ‎2024-07-30

Now I have looked up in the disassembly which function is at this address:

But the call stack from the MCU shows me a different address:

At the moment I don't understand why a 32-bit instruction is split in the middle.

I use only RAM functions for the main loop, to get the needed performance. But the function Diag_IsActive() is placed in flash. Is this a problem?

bm2 · ‎2024-07-31

Now I have added more checks and run more tests, and I can now reproduce the hard fault exception without optimisation. So I can say, I hope, that it is not a problem caused by the optimisation.

The status registers CFSR and HFSR read as zero on entry to the hard fault exception handler:

LR = 0xFFFF FFF9 (use MSP)
SP       = MSP 0x2000 1E18
R0       = 0x2000 1E18
R1       = 0x4000 2800
R2       = 0xFFFF FFF9
R3       = 0x2000 1E18
R12 (IP) = 0x2000 1E50
R14 (LR) = 0xFFFF FFF9
R15 (PC) = 0x2000 1C08
xPSR     = 0x0000 000F

What I see is that R15 (PC) points to the start of a global variable and R12 (IP) points into the stack. The external trigger is a change on a digital input and the configuration of GPA2 as LSCO output.

But this exception is not triggered every time.

Bus fault, alignment fault, and so on can be ruled out: CFSR and HFSR are 0 at every hard fault exception.

Does anyone have an idea where this problem could come from?

Thanks, Bernd

bm2 · ‎2024-08-28

After a long time of searching and testing, I found the problem: the developer had enabled the low-power regulator at 64 MHz SYSCLK :(
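
A defensive check along these lines could catch this kind of misconfiguration early. This is only a sketch: the 2 MHz limit is my reading of the Low-power run mode description in the STM32G0 reference manual and should be verified against the documentation for the exact device, and the names LP_REGULATOR_MAX_SYSCLK_HZ and lp_regulator_allowed are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed upper SYSCLK limit for running from the low-power regulator
 * (Low-power run mode). Verify against the device reference manual. */
#define LP_REGULATOR_MAX_SYSCLK_HZ 2000000u

/* True if the low-power regulator may be enabled at this SYSCLK frequency. */
bool lp_regulator_allowed(uint32_t sysclk_hz)
{
    return sysclk_hz <= LP_REGULATOR_MAX_SYSCLK_HZ;
}
```

On the target, the result would gate the code that switches the regulator, for example the LL call that enables Low-power run mode (assumed here to be LL_PWR_EnableLowPowerRunMode()). With SYSCLK at 64 MHz the check returns false.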