Carving out HardFault error storage location in high RAM via linker script

CharlieMangold · ‎2024-07-16

STM32CubeIDE Version: 1.12 & STM32G0B0RE

We have had a HardFault error in the field and wanted to use an old technique: store the fault info in RAM until the micro has been reset and is more stable, then move it into flash. We used the STM32G0B0RETX_FLASH.ld file and modified two spots:

/* Memories definition */
MEMORY
{
  RAM    (xrw)    	: ORIGIN = 0x20000000, LENGTH = 143K
  RAM_FOR_FAULT (rw)	: ORIGIN = 0x20023C00, LENGTH = 1K
  FLASH   (rx)    	: ORIGIN = 0x08000000, LENGTH = 502K
...
}
and
  .faultData :
{
	. = ALIGN(4);
	__faultData_start__ = .;
	*faultData.o (.text .text*)
	__faultData_end__ = .;
	. = ALIGN(4);
} > RAM_FOR_FAULT /* new RAM for hard faults */

ASSERT( LENGTH(RAM_FOR_FAULT) >= (__faultData_end__ - __faultData_start__), "faultData overflowed !")
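
Editor's sketch of how a C object might be pinned into this region. It assumes the output section is changed to collect an input section named `.faultData` (for example `KEEP(*(.faultData))` inside a `(NOLOAD)` output section), which is not what the script above does: as posted, it collects the `.text` sections of faultData.o. The struct layout, field names, and the `FAULT_RAM` macro are hypothetical.

```c
#include <stdint.h>

/* Hypothetical record layout; the register list mirrors the frame a
 * Cortex-M core stacks on exception entry (r0-r3, r12, lr, pc, xPSR). */
typedef struct {
    uint32_t magic;     /* tag that marks the record as valid */
    uint32_t r0, r1, r2, r3, r12, lr, pc, psr;
    uint32_t checksum;  /* simple integrity check over the fields above */
} fault_record_t;

/* Only apply the section attribute when building for the ARM target, so
 * the same file still compiles on a host for unit testing. */
#if defined(__GNUC__) && defined(__arm__)
#define FAULT_RAM __attribute__((section(".faultData"), used))
#else
#define FAULT_RAM
#endif

/* Assumed to land in RAM_FOR_FAULT through the (modified) linker script.
 * Startup code only initializes .data and .bss, so the contents are not
 * touched by a reset. */
FAULT_RAM fault_record_t g_fault_record;

/* Compile-time counterpart of the linker ASSERT: the record must fit in 1K. */
_Static_assert(sizeof(fault_record_t) <= 1024u, "fault record exceeds 1K region");
```

The map file can then be checked to confirm that `g_fault_record` sits at 0x20023C00.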

CubeIDE shows the sections, so this seems to work:

But once the debugger is loaded there seems to be a lot of random data around the 0x20023C00 memory location:

Could the stack still be using the chip's high RAM addresses, or is there something else we need to do to isolate the highest 1K of RAM?

Thanks...

BarryWhit · ‎2024-07-16

Looks pretty random to me.

from collections import Counter
# OCR, don't freak out
a=[
    0xB550612A,0x5E9AC690,0x93284EAF,0xFEC34A5F,0x8D2BB763,0xFAC24870,0x5D0C675A,
    0x00AA07C7,0xAF3F2C4E,0x5067406A,0x84044680,0x1EF6E051,0xC525DD9A,0x7743A756,
    0x7E68F5D8,0x868F4586,0xD0A0E473,0x7D7AA7EB,0x1F9B6691,0xF0BC8E64,0xF5D5C5FA,
    0x26544CA9,0xBF4F3BF4,0x1BEED0CB,0x363633C8,0xC70C7328,0xC59348C6,0x08044928,
    0xFE06A9A0,0x1D002E12,0xDFA27AB8,0xD8FCA8FE,0x1ABCDA35,0x4A01646F,0x1F93779A,
    0x2508FD6E,0xD35F459B,0x22FE6406,0x0706E69B,0x8AC1E2DC,0xEB1657FD,0x8B9092FF,
    0x83B16520,0x136D0C38,0xEC2E9424,0xDEB91440,0xAC1479C8,0x14892E34,0x96A26F7A,
    0x3F0B0F64,0xCF1E4A4E,0xE179DA6D,0x6E3861D6,0x9A259BA1,0xFF384631,0x95659FCA,
]

Counter("".join(f"{_:032b}" for _ in a))

Counter({'1': 961, '0': 959})

from random import randint
s="\n".join(
    [" ".join(
        [f"{randint(0,2**32-1):8X}" for _ in range(4)] )
        for _ in range(16)])

print(s)
"""
7B067179 5D6F67C9 2B9DCFE2 3DC9B2F1
D354945D 1A5C53F9 EEADB970  B26F90A
F787EFB1 66D59FEA  8E71CCB 8F7AE442
31251D7C F42A339A B7334FF8 FA7B0CAD
C57A6626 41454269 760D112B 6BE9E5D7
823D61C3 FEABA3DB 95C90F0D 8229EAB3
89B9E8C4 658652A4 2B1B57E2 425910D7
A0075270 65E06E5E 5A754ABF 40EA9DD2
350FEFB1 4DB35048 549758EA 2E27D527
A13BD722  3B5CD16 385330F3 869A9F00
 B3A03CC A5A1691A C9701DE3 29E141BC
B3D24B42 3FB9E2B0 3EFDB25E 4A221D5E
6BF7FDE1 8D4D5A4A ACEE6C8F 95A87A4F
9114A141 AD0975D6 3C194936 155B1886
F581571B 6C5D5B57 9AEADDCB  4FDA2F1
FC884C11 C467CE3A 8D52DD09 D27766B9
"""

I encourage you to pull the power, wait for 5 minutes (and wave your heat-gun on low over the chip), then repeat the test, because I'm not sure what we should expect to see and I'm curious to know. But this looks very much like random data.

Update: follow up to DeLorean's suggestion below - the SRAM cells might exhibit preferential (value-dependent) bit decay behavior. Try filling with all-zeros/all-ones, and repeat the power cycle test. See if a pattern emerges.
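
Editor's sketch of how the fill-and-power-cycle test could be scored on the target: fill the region with a known pattern, power cycle, then count how many bits no longer match. The function names are hypothetical, and the buffer pointer stands in for the 1K region.

```c
#include <stdint.h>
#include <stddef.h>

/* Count the set bits in one 32-bit word. */
uint32_t popcount32(uint32_t v)
{
    uint32_t n = 0;
    while (v != 0u) {
        v &= v - 1u;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Count the bits in buf that differ from the fill pattern written before
 * the power cycle (e.g. 0x00000000 or 0xFFFFFFFF). */
size_t count_flipped_bits(const uint32_t *buf, size_t words, uint32_t fill)
{
    size_t flipped = 0;
    for (size_t i = 0; i < words; i++) {
        flipped += popcount32(buf[i] ^ fill);
    }
    return flipped;
}
```

Comparing the count after an all-zeros fill with the count after an all-ones fill would show whether the cells favor one value.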

Why are you looking so closely at conditions you explicitly don't care about?

- If someone's post helped resolve your issue, please thank them by clicking "Accept as Solution".
- Please post an update with details once you've solved your issue. Your experience may help others.

Tesla DeLorean · ‎2024-07-16

Define "Random" in this context.. looks like random non-specific junk to me.

How about, if you don't see your Hard Fault Magic Tag at reset, you explicitly clear this 1KB of RAM with something specific, like 0x00 or 0xCD or something.

Then next time you reset, post Hard Fault, you check/dump the content, and perhaps have your Hard Fault journal through this RAM space, so you can hold/see multiple frames.
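
Editor's sketch of the tag-check, fill, and journal idea described above. The tag value, frame size, slot count, and all names are hypothetical; on the target, the `fault_journal_t` object would live in the reserved 1KB region.

```c
#include <stdint.h>
#include <string.h>

#define FAULT_MAGIC  0xFA17DA7Au  /* hypothetical "Hard Fault Magic Tag" */
#define FILL_BYTE    0xCDu        /* known pattern written when no tag is found */
#define FRAME_WORDS  8u           /* r0-r3, r12, lr, pc, xPSR */
#define MAX_FRAMES   4u           /* journal slots; 136 bytes total, well under 1KB */

typedef struct {
    uint32_t magic;
    uint32_t count;                          /* frames stored so far */
    uint32_t frames[MAX_FRAMES][FRAME_WORDS];
} fault_journal_t;

/* Call early after reset. Returns the number of stored frames if the tag is
 * present; otherwise fills the whole journal with FILL_BYTE and returns 0. */
uint32_t fault_journal_boot_check(fault_journal_t *j)
{
    if (j->magic == FAULT_MAGIC && j->count <= MAX_FRAMES) {
        return j->count;
    }
    memset(j, FILL_BYTE, sizeof *j);
    return 0u;
}

/* Call from the fault handler. Appends one frame; once all slots are used,
 * later frames are dropped so the earliest faults are preserved. */
void fault_journal_append(fault_journal_t *j, const uint32_t frame[FRAME_WORDS])
{
    if (j->magic != FAULT_MAGIC) {
        j->magic = FAULT_MAGIC;
        j->count = 0u;
    }
    if (j->count < MAX_FRAMES) {
        memcpy(j->frames[j->count], frame, sizeof j->frames[0]);
        j->count++;
    }
}
```

After the frames have been dumped or copied to flash, overwriting `magic` makes the next boot check refill the region.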


CharlieMangold · ‎2024-07-17

> Why are you looking so closely at conditions you explicitly don't care about?

I'm just trying to do my due diligence to ensure I have not messed anything up (I've not dealt with RAM settings in linker files before). The RAM data could be random, but I honestly would be surprised at the distribution of ones and zeros. I kinda expected more clumps of one or the other(0x000 or 0xFFF), but the last processor I looked at this closely was 15 years ago.

Pavel A. · ‎2024-07-17

Just make sure the memory is correctly defined in the link script (and your code does not clobber it, and there is no activation of the built-in bootloader etc.). Test.
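
Editor's sketch of the arithmetic behind "correctly defined", using the ORIGIN and LENGTH values from the MEMORY block in the question. CubeIDE-generated scripts typically set `_estack = ORIGIN(RAM) + LENGTH(RAM)`; assuming that line is unchanged here, the initial stack pointer becomes the end of the shortened RAM region, and a descending stack grows away from the reserved 1K. The helper names are hypothetical.

```c
#include <stdint.h>

/* Values copied from the MEMORY block posted in the question. */
#define RAM_ORIGIN        0x20000000u
#define RAM_LENGTH        (143u * 1024u)
#define FAULT_RAM_ORIGIN  0x20023C00u
#define FAULT_RAM_LENGTH  (1u * 1024u)

/* First address past the end of a region. */
uint32_t region_end(uint32_t origin, uint32_t length)
{
    return origin + length;
}

/* Nonzero if addr lies inside [origin, origin + length). */
int region_contains(uint32_t origin, uint32_t length, uint32_t addr)
{
    return addr >= origin && (addr - origin) < length;
}

/* Nonzero if the two regions share at least one address. */
int regions_overlap(uint32_t o1, uint32_t l1, uint32_t o2, uint32_t l2)
{
    return o1 < region_end(o2, l2) && o2 < region_end(o1, l1);
}
```

With these values, RAM ends exactly where RAM_FOR_FAULT begins (0x20023C00), and the fault region ends at 0x20024000, the top of the 144K SRAM.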

The SRAM content should survive MCU reset, while the power is maintained. If this MCU has a 2nd SRAM module or "backup memory" try these too.

BarryWhit · ‎2024-07-17

> I kinda expected more clumps of one or the other(0x000 or 0xFFF)

I know that's true for DRAM (given enough time), but I don't know if it's true for SRAM, which uses a different kind of basic cell to hold a bit. The pattern of charge decay in SRAM may be different. That's why I suggested you do the extra tests I described. They would have told you whether memory remanence effects and non-deterministic decay times explain what you're seeing.

If you're talking about what true random data actually looks like, humans are notoriously bad at evaluating that. That's why human-generated "random" data usually fails statistics tests.

CharlieMangold · ‎2024-07-17

You are probably correct. Long power cycles still result in the same type of data in RAM, even after it had been cleared. It all seems to work after software-resetting the processor from an induced illegal address write, including the simple checksum verification. If anyone wants the code, let me know and I'll post the two functions needed (one in the HardFault handler and one in main).
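
Editor's sketch of what such a pair of functions could look like. This is not the poster's code: the record fields, the tag value, and the inverted-sum checksum are assumptions chosen for illustration.

```c
#include <stdint.h>
#include <stddef.h>

#define FAULT_INFO_MAGIC 0xDEADFA17u  /* hypothetical tag value */

typedef struct {
    uint32_t magic;
    uint32_t pc;        /* stacked program counter at the fault */
    uint32_t lr;        /* stacked link register */
    uint32_t psr;       /* stacked xPSR */
    uint32_t checksum;  /* must remain the last field */
} fault_info_t;

/* Inverted 32-bit sum over every word that precedes the checksum field. */
static uint32_t fault_info_sum(const fault_info_t *info)
{
    const uint32_t *w = (const uint32_t *)info;
    size_t n = offsetof(fault_info_t, checksum) / sizeof(uint32_t);
    uint32_t sum = 0u;
    for (size_t i = 0; i < n; i++) {
        sum += w[i];
    }
    return ~sum;
}

/* Intended for the HardFault handler: store the values and seal the record. */
void fault_info_save(fault_info_t *info, uint32_t pc, uint32_t lr, uint32_t psr)
{
    info->magic = FAULT_INFO_MAGIC;
    info->pc = pc;
    info->lr = lr;
    info->psr = psr;
    info->checksum = fault_info_sum(info);
}

/* Intended for main() after reset: nonzero only if tag and checksum match. */
int fault_info_valid(const fault_info_t *info)
{
    return info->magic == FAULT_INFO_MAGIC &&
           info->checksum == fault_info_sum(info);
}
```

Requiring both the tag and the checksum makes it unlikely that power-up garbage in SRAM is mistaken for a saved fault.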

After digging around the HardFault so much I'm going to put a quick stack fill to trace how much the current code base is using.
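
Editor's sketch of the stack-fill (watermark) technique mentioned above, written against a plain array so it can be tried on a host. The fill word and function names are hypothetical. On the target, only the part of the stack below the current stack pointer should be painted, otherwise live frames get overwritten.

```c
#include <stdint.h>
#include <stddef.h>

#define STACK_FILL_WORD 0xA5A5A5A5u  /* hypothetical paint pattern */

/* Paint the stack area, starting at its lowest address. */
void stack_paint(uint32_t *lowest, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        lowest[i] = STACK_FILL_WORD;
    }
}

/* The stack grows downward, so untouched paint remains at the low end.
 * Returns the peak usage in words: everything from the first overwritten
 * word up to the top of the area. */
size_t stack_peak_words(const uint32_t *lowest, size_t words)
{
    size_t untouched = 0;
    while (untouched < words && lowest[untouched] == STACK_FILL_WORD) {
        untouched++;
    }
    return words - untouched;
}
```

A stack word that happens to hold the paint value is indistinguishable from unused space, so the result is a close estimate, not a guarantee.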

Pavel A. · ‎2024-07-18

If we're talking about a simple MCU reset, due to watchdog or NVIC_SystemReset - there's no SRAM decay at all. The power (VCC) stays on, internal SRAMs hold their data. External (S)DRAMs can decay because reset of the FMC can affect their refresh.

BarryWhit · ‎2024-07-18

OP was asking about power cycle (SRAM decays) vs. reset (no decay)

CharlieMangold · ‎2024-07-19

@BarryWhit wrote:
> OP was asking about power cycle (SRAM decays) vs. reset (no decay)

I'm just not very familiar with the newer aspects of processor booting (security, built-in serial capabilities...). This forum has helped a lot in flushing out our issues and the clarifications we needed.