Extremely strange behavior running on an STM32G0B1

JKaz.1
Associate III

Ok, I'm at a complete loss here, so I'm looking for any ideas or suggestions. We have a code base that runs on top of FreeRTOS, with versions that run on several different STM32L4s and on the STM32H753. I ported the code base to the G0 and just started to run into weird HardFaults that don't make any sense. Walking back up the stack looks completely normal. It looks like random memory corruption, but it doesn't make any sense. I've run a lot of tests and added an absurd amount of diagnostic code... but you shouldn't need to know any of that for what I'm posting about here, because it looks like strange compiler errors.

If I execute the following code, the system currently crashes (if I move things around in unrelated parts of flash/ram, then this code will suddenly start to work without issue).

std::string StringOp::sizeToString(size_t sz)
{
	std::string sout("");
	do
	{
		//__asm volatile("NOP");
		char digit = (char)(sz % 10) + '0';
		sout = digit + sout;
		sz /= 10;
	} while (sz);
	return sout;
}
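To rule out the C++ logic itself, the same loop can be exercised on a host build. This is a minimal sketch: sizeToStringHost is a hypothetical free-function copy of the method above, not part of the original code base, and it only checks the digit-by-digit conversion, not anything about the target.

```cpp
#include <cstddef>
#include <string>

// Hypothetical host-side copy of StringOp::sizeToString, kept as a free
// function so it can be compiled and checked without the firmware.
std::string sizeToStringHost(std::size_t sz)
{
    std::string sout;
    do
    {
        // Take the lowest decimal digit and prepend it to the result.
        char digit = static_cast<char>(sz % 10) + '0';
        sout = digit + sout;
        sz /= 10;
    } while (sz);
    return sout;
}
```

If this behaves on the host, the conversion itself is sound and the fault is more likely to come from where the code runs than from what it computes.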

When I debug it and look at the assembly, the relevant part looks like this:

do
	{
		//__asm volatile("NOP");
		char digit = (char)(sz % 10) + '0';
0x0802e2c0  ldr r3, [r7, #0] 
0x0802e2c2  movs r1, #10 
0x0802e2c4  movs r0, r3 
0x0802e2c6  bl 0x80004d8 <__aeabi_uidivmod> 
0x0802e2ca  movs r3, r1 
0x0802e2cc  uxtb r2, r3 
0x0802e2ce  movs r1, #39	; 0x27 
0x0802e2d0  adds r3, r7, r1 
0x0802e2d2  adds r2, #48	; 0x30 
0x0802e2d4  strb r2, [r3, #0] 
		sout = digit + sout;

See that branch instruction (the bl)? When I step into it, I don't land at 0x080004d8. I land at 0x080012c8, which is not the right address, so I crash when I try to return and it pops too many things off the stack.

See that NOP instruction that's commented out? If I uncomment it, the code simply hard faults when it tries to jump to 0x080004d8. If I instead put a NOP at the very beginning of the function, it executes the entire thing just fine.
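When chasing a fault like this, it can help to look at the registers the core stacked on exception entry rather than trusting the debugger's single-step. On Cortex-M cores the basic exception frame is eight words in the order R0, R1, R2, R3, R12, LR, PC, xPSR. The sketch below only decodes such a frame; StackedFrame and decodeStackedFrame are hypothetical names, and how the frame pointer is obtained in the HardFault handler (MSP or PSP, which for FreeRTOS tasks is normally PSP) is assumed to be handled elsewhere.

```cpp
#include <cassert>
#include <cstdint>

// Basic Cortex-M exception frame, in the order the core pushes it.
struct StackedFrame
{
    std::uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
};

// Hypothetical helper: copy the eight stacked words starting at sp.
// sp is assumed to be the stack pointer value captured on fault entry.
inline StackedFrame decodeStackedFrame(const std::uint32_t* sp)
{
    assert(sp != nullptr);
    return StackedFrame{sp[0], sp[1], sp[2], sp[3], sp[4], sp[5], sp[6], sp[7]};
}
```

The stacked PC and LR can then be compared against the addresses in the disassembly, for example to see whether the fault was taken at the bl target or somewhere else entirely.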

This code is running in a task and is not in an ISR. This exact same code base runs fine on two other STM32 processors from different families (well, like 90% the same, hardware layer is different but all this higher level stuff is the exact same code).

Can anyone think of anything that would cause the processor to go off into the weeds like this? The placement of NOP commands affects whether or not it functions properly, and I'm at a loss for words on what that could be. This is on custom hardware so could something there be causing an issue? Bad oscillator signal/capacitance/resistance? Unstable 3.3V rail? I am going to port this to a Nucleo board later this week but I was wondering if anyone could think of something I should check.

Also, I am compiling it using GCC 7.3 but I updated to the latest 10 and it still broke. I built the hardware init files using the latest version of CubeMX (6.4) and have compared my initialization code against the examples in the hardware framework that I'm using (the latest, 1.5)... Anyone got any ideas on where I should poke?

Thanks!

13 REPLIES

I checked the errata sheet that I downloaded while investigating this issue and it does not have sections 2.2.9 and 2.2.10. Interestingly enough, we've had some weird other problems that can absolutely be described by 2.2.9... Super.

@SHuds.2,

thanks for letting us know. This is a nice catch and serious issue indeed.

@JKaz.1,

> I checked the errata sheet that I downloaded while investigating this issue and it does not have sections 2.2.9 and 2.2.10.

No wonder, these are dated 11-Apr-2022 in the Errata changelog.

A bizarre disclaimer ("important security notice") has been added, too; apparently there are still users (by which I mean corporate ones, influential enough that their rambling gets propagated into errata) who believe in the absolute powers of the lockbits.

Similarly bizarre is 2.2.3, given that, because the FLASH has ECC, programming any value (i.e. not just zeros) to a field already programmed to all FF (or any other value) quite likely leads to a programming error. Why can't ST pay more attention when writing errata, I wonder. EDIT: I take this back, see Piranha's explanation below.

JW

> programming any value (i.e. not just zeros) to a field already programmed

Not really... RM0444:

> Programming a previously programmed address with a non-zero data is not allowed. Any such attempt sets PROGERR flag of the FLASH status register (FLASH_SR). ... In standard programming: PROGERR is set if the word to write is not previously erased (except if the value to program is full zero).

Full zero is an exception to make invalidation possible.
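The quoted rule can be written down as a small predicate. This is only a sketch of the rule as quoted, not ST's HAL or any real driver API: programmingAllowed and kErasedDoubleWord are hypothetical names, and the 64-bit width and the all-ones erased value are assumptions about how the G0 flash is programmed and read back.

```cpp
#include <cstdint>

// Assumption: an erased location reads back as all ones.
constexpr std::uint64_t kErasedDoubleWord = 0xFFFFFFFFFFFFFFFFULL;

// Hypothetical model of the quoted RM0444 rule: a write is accepted if the
// location is still erased, or if the value being written is all zeros.
// Anything else would be expected to set PROGERR.
constexpr bool programmingAllowed(std::uint64_t current, std::uint64_t value)
{
    return current == kErasedDoubleWord || value == 0;
}
```

Under this model, overwriting an already programmed entry with zeros to invalidate it is accepted, while overwriting it with any other value is not.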

Oh, I missed that detail. Thanks.

JW