Why am I getting a HardFault with -O2 optimization only?

SKled.1 · ‎2021-02-24

In my STM32F091 project, the Debug build works just fine.

Release build does not.

So I kept -O2 to keep the behavior, but added -g3 to be able to debug.

Fixed a couple of typical bugs, like a forgotten volatile on an ISR-accessed flag etc.

Now I get a HardFault during, if I read this right (my asm is rusty and I never actively did ARM asm), the preparations for a function call. Within function 'ReadBytes', another function 'Read' is called.

It crashes when I try to step over that call in the debugger.

In instruction stepping mode, it's the instruction at address 0800e880 which causes the hard fault.

I do not understand why the asm code is doing some of the things it does (see comments there).

It looks like it is doing a wrong memory access.

But why? Assuming the compiler did nothing wrong - what could I have done wrong?

With -O1, the asm code is somewhat different, and the crash does not happen.

(and at first glance, the program seems to run normally overall)

EDIT: The argument values going into the outer function, ReadBytes, are the same in both the -O2 and the -O1 builds (seen in debugger).

Does someone have an idea what's going on?

// code with -O2
//
// Breakpoint here @ line 173
//
173       	const auto ret = m_i2c.Read( m_addr, memoryWordAddr, dest, count );
          SerialEEPROM::ReadBytes(unsigned int, unsigned char*, unsigned int, unsigned int):
0800e870:   push    {r4, r5, lr}
0800e872:   movs    r4, r0
0800e874:   sub     sp, #12
0800e876:   movs    r3, r2         // r0 == 0 here, which is ok, it's EEPROM-internal address (see C++ func below, arg 'memoryWordAddr')
0800e878:   ldr     r0, [r0, #0]   // No idea why r0 is used as pointer here? And it's null...
0800e87a:   uxtb    r2, r1         // r0 == 0x20008000 here, i.e. END OF RAM (32KB)
0800e87c:   ldrb    r1, [r4, #4]
0800e87e:   ldr     r4, [sp, #24]
0800e880:   ldr     r5, [r0, #0]   // HARD FAULT. Did not like access after end of RAM, I guess.
0800e882:   uxtb    r4, r4
0800e884:   str     r4, [sp, #0]
0800e886:   ldr     r4, [r5, #4]
0800e888:   blx     r4             // so here it should jump
174       	if (ret != I2cResult::Ok)
0800e88a:   subs    r3, r0, #1
0800e88c:   sbcs    r0, r3
0800e88e:   negs    r0, r0
0800e890:   add     sp, #12
0800e892:   pop     {r4, r5, pc}

// code with -O1: there actually are some instructions before the breakpoint - with -O2 this was not so (as belonging to func ReadBytes()).
//
169       {
          SerialEEPROM::ReadBytes(unsigned int, unsigned char*, unsigned int, unsigned int):
0800e6e6:   push    {r4, r5, lr}
0800e6e8:   sub     sp, #12
0800e6ea:   movs    r4, r0
0800e6ec:   movs    r3, r2
//
// Breakpoint here @ line 173
//
173       	const auto ret = m_i2c.Read( m_addr, memoryWordAddr, dest, count );                                    
0800e6ee:   ldr     r0, [r0, #0]  // before this, r0 == 0x200009b8 -- not 0 like with -O2 above, hm?!
0800e6f0:   uxtb    r2, r1        // now r0 == 0x200008e8
0800e6f2:   ldrb    r1, [r4, #4]
0800e6f4:   ldr     r5, [r0, #0]  // hunky dory
0800e6f6:   ldr     r4, [sp, #24]
0800e6f8:   uxtb    r4, r4
0800e6fa:   str     r4, [sp, #0]
0800e6fc:   ldr     r4, [r5, #4]
0800e6fe:   blx     r4
174       	if (ret != I2cResult::Ok)
0800e700:   subs    r3, r0, #1
0800e702:   sbcs    r0, r3
0800e704:   negs    r0, r0
0800e706:   add     sp, #12
0800e708:   pop     {r4, r5, pc}

int SerialEEPROM::ReadBytes(uint_fast8_t memoryWordAddr, uint8_t* dest, unsigned destLen, unsigned count)
{
	ASSERT( memoryWordAddr + count <= TotalSizeBytes );
	ASSERT( count <= destLen );
 
// Breakpoint here
	const auto ret = m_i2c.Read( m_addr, memoryWordAddr, dest, count );
	if (ret != I2cResult::Ok)
		return -1;
 
	return 0;
}

KnarfB · ‎2021-02-24

Not a solution, but some thoughts.

When calling a member function, the this pointer is implicitly added as the first parameter to the parameter list. The first 4 parameters are passed in r0..r3.

Remaining parameters are passed on stack.

So r0 should be the this pointer which should never be 0.

When r0 is 0, ldr r0, [r0, #0] reads from address 0, which holds the initial stack pointer value loaded at reset (the first entry of the vector table). And, yes, that value is the top end of RAM.

blx r4 is the method call. The lines before it move the parameters into the registers where they belong.
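To make the register shuffling easier to follow, here is a rough host-side sketch of what the compiler generates for a virtual call, written out by hand. All type and function names (Eeprom, I2cBus, I2cVTable, FakeRead) are hypothetical stand-ins, the slot order in the table is an assumption, and 0 is assumed to mean "OK". It only illustrates the mechanism, not the real classes.

```cpp
#include <cassert>
#include <cstdint>

struct I2cBus;  // forward declaration for the function pointer types

// Hand-written "vtable": one function pointer per virtual method.
// The slot order here is an assumption for illustration.
struct I2cVTable {
    void (*destroy)(I2cBus* self);
    int (*read)(I2cBus* self, std::uint8_t devAddr, std::uint8_t memAddr,
                std::uint8_t* dest, std::uint8_t count);
};

struct I2cBus {
    const I2cVTable* vptr;  // first word of the object, like a real vptr
};

struct Eeprom {
    I2cBus* i2c;        // offset 0: a reference member is stored as a pointer
    std::uint8_t addr;  // next member: the device address
};

// Free-function version of the member function: `self` plays the role of
// the implicit `this`, which arrives in r0 on ARM.
int ReadBytes(Eeprom* self, std::uint8_t memAddr, std::uint8_t* dest,
              unsigned destLen, unsigned count) {
    assert(count <= destLen);
    I2cBus* bus = self->i2c;          // like: ldr r0, [r0, #0] (bad if self is null)
    const I2cVTable* vt = bus->vptr;  // like: ldr r5, [r0, #0]
    const int ret = vt->read(bus, self->addr, memAddr, dest,
                             static_cast<std::uint8_t>(count));  // like: blx r4
    return ret != 0 ? -1 : 0;
}

// Fake bus implementation: fills dest with memAddr, memAddr + 1, ...
static int FakeRead(I2cBus*, std::uint8_t, std::uint8_t memAddr,
                    std::uint8_t* dest, std::uint8_t count) {
    for (std::uint8_t i = 0; i < count; ++i) {
        dest[i] = static_cast<std::uint8_t>(memAddr + i);
    }
    return 0;  // 0 means "OK" in this sketch
}

static void FakeDestroy(I2cBus*) {}

static const I2cVTable kFakeVTable = {FakeDestroy, FakeRead};
```

With a valid Eeprom object the two loads yield the bus and its table. With a null `self`, the first load on the MCU returns whatever sits at address 0, and the second load then dereferences that value, which matches the faulting instruction in the -O2 listing.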

If it's not a compiler bug (unlikely), it smells like inconsistent linkage or inconsistent use of virtual or such stuff. Compare .cpp and .h files carefully.

hth

KnarfB

SKled.1 · ‎2021-02-25

Right! I forgot about "this". As for the stack pointer: should it really point one past the last word of RAM, not at the actual last word? (I don't know, it just seems funny.)

Thanks, that's definitely some food to continue the bughunt.

Could you elaborate on "inconsistent linkage"?

There is some mixing of C and C++ - can it have to do with that? Although not as preparation for this function call; and no C code calls this particular C++ function.

KnarfB · ‎2021-02-25

> As for stack pointer

ARM calls this a full-descending stack, i.e. pre-decrement on push, post-increment on pop. So the initial value does indeed point one past the end. The value is set in g_pfnVectors in the startup .s file.
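A toy model may help here. The sketch below imitates a full-descending stack with a small array; ToyStack and its size are made up for illustration and have nothing to do with the real startup code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy model of a full-descending stack. The size and names are illustrative.
struct ToyStack {
    static constexpr std::size_t kWords = 8;
    std::uint32_t ram[kWords] = {};
    std::size_t sp = kWords;  // initial SP: index one past the last word

    void push(std::uint32_t value) {
        assert(sp > 0);     // guard against overflow of the toy RAM
        ram[--sp] = value;  // pre-decrement, then store
    }

    std::uint32_t pop() {
        assert(sp < kWords);  // guard against popping an empty stack
        return ram[sp++];     // load, then post-increment
    }
};
```

The first push lands in the last word of the array, so the "one past the end" starting value never gets dereferenced by push or pop itself.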

> Could you elaborate on "inconsistent linkage"?

I mean some warning you get while linking because caller/callee have different ideas about how the parameters should be passed in the registers.

Feel free to post a stripped-down but more complete code snippet showing the classes, extern "C" etc.

hth

KnarfB

SKled.1 · ‎2021-02-25

Thanks, your hint about virtual was a good one.

In the code, you see a call to m_i2c.Read(...).

That m_i2c thing is a reference to a base class type; one derived class implements the HW I2C stuff on this MCU.

That base class had a protected virtual destructor, violating Guideline #4 here: http://www.gotw.ca/publications/mill18.htm

I made it public, turned optimization back to -O2, and now the program runs normally.
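For reference, the two forms that guideline allows look roughly like this. The class names are hypothetical and the sketch only illustrates the guideline; it is not meant as an explanation of the crash.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// Option A: public virtual destructor. Deleting through a base pointer is fine.
struct PolymorphicBase {
    virtual ~PolymorphicBase() = default;
    virtual int Read() = 0;
};

// Option B: protected non-virtual destructor. Deleting through a base
// pointer does not compile, so the object can only be used via the interface.
struct InterfaceOnlyBase {
    virtual int Read() = 0;
protected:
    ~InterfaceOnlyBase() = default;
};

struct HwI2c final : PolymorphicBase {
    static int destroyed;  // counts destructor runs, for demonstration only
    ~HwI2c() override { ++destroyed; }
    int Read() override { return 1; }
};
int HwI2c::destroyed = 0;

struct HwI2cB final : InterfaceOnlyBase {
    int Read() override { return 2; }
};

static_assert(std::has_virtual_destructor<PolymorphicBase>::value,
              "option A: destructor is virtual");
static_assert(!std::is_destructible<InterfaceOnlyBase>::value,
              "option B: destructor is not accessible from outside");

// Option A in use: the derived destructor runs when the base pointer dies.
int UseAndDestroy() {
    std::unique_ptr<PolymorphicBase> bus(new HwI2c);
    return bus->Read();
}

// Option B in use: the caller owns the object, users only see a reference.
int UseByReference(InterfaceOnlyBase& bus) {
    return bus.Read();
}
```

Either form keeps a `delete` through the base pointer from silently skipping the derived destructor: option A makes it work, option B makes it a compile error.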

I don't understand how this was a problem here, as no destructor was being called anywhere, no object leaving scope or anything.

But I guess some optimizations make assumptions about the layout of things when the language is being used as it should, and it goes wrong...?

It's still a bit surprising that something like that makes things actually crash, though.

While I'm not exactly expert on all the language implementation and compiler guts details, I have been programming for a while..., yet never so far managed to produce an error like that 😉 (though I likely never made a protected virtual d'tor before, no idea how that slipped in that region)

Edit:

I tried to replicate this in a minimal program that recreates what I thought were the essential components for this.

Not able to reproduce a crash in the minimal program with -O2.

SKled.1 · ‎2021-02-26

I guess that was too easy.

I added a few lines of printing something to the uart, no other changes - and ~~with any optimization on, even O1~~:

Correction: this time with C++ on -O1 and C still on -O2. When C is also on -O1, the program works normally again. Interesting.

Addendum: When C++ is on -O2, but C is on -O1, it also works.

The thing HardFaults again, this time shortly after __libc_init_array, in __do_global_dtors_aux.

So it's when the app starts, there isn't anything going on yet with my added code.

I wonder why it calls destructors (?) when the app is just starting.

The added code is rather C-ish than C++, function calls on string literals. Nothing to do with any C++ objects.

In debug build, it works without issues.

I'm confused ^^. Looking into it...

Edit:

I have a suspicion that this running into __libc_init_array and crashing there is not what really happens. I observed that my 3 status LEDs were turning on, and indeed only those 3 bits in that GPIO register were set, so it seems deliberate: the LED handling code seems to have been called at least once.

It seems the debugger is showing nonsense, for some reason.

If I make a little change in the program, I get a much more reasonable call stack, with a chain of function calls through all my init stuff to the application loop. There the hardfault happens because a function pointer was zero. It lives in an array of a struct type that has one function pointer among other members. Interestingly, it was zero only in the 2 array cells where the function pointer was assigned a lambda; the cells that were assigned local functions were fine. If I change those lambda inits to local functions as well, the weird behavior with the suspicious/nonsensical call stack comes up again.
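For anyone trying to picture the setup, a table like the one described could look roughly like the sketch below. Task, kTasks and RunTask are made-up names, and the null check is just a defensive habit shown for illustration, not a fix for the underlying problem.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical table entry: one function pointer among other members.
struct Task {
    const char* name;
    int (*run)(int);
};

static int Twice(int x) { return 2 * x; }

// Captureless lambdas convert implicitly to plain function pointers.
static const Task kTasks[] = {
    {"twice", Twice},
    {"plus1", [](int x) { return x + 1; }},
    {"unset", nullptr},  // deliberately empty slot
};

// Calls entry `index`; returns `fallback` if the index is out of range
// or the slot holds a null pointer, instead of jumping to address 0.
int RunTask(std::size_t index, int arg, int fallback) {
    constexpr std::size_t count = sizeof(kTasks) / sizeof(kTasks[0]);
    if (index >= count || kTasks[index].run == nullptr) {
        return fallback;
    }
    return kTasks[index].run(arg);
}
```

One detail that may or may not be relevant: before C++17 the lambda-to-function-pointer conversion is not constexpr, so a compiler is allowed to fill such slots at startup from a static initializer rather than placing them in the image as constants. Whether that is what happens in this project is not established here.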

No win so far.