
HardFault on Cold Boot Only - STM32H7S78 with External XSPI Flash + TouchGFX

nico23
Senior III

I'm experiencing a HardFault that always occurs on cold boot (after the board has sat unpowered for ~1 hour). It also occurs sometimes when flashing and immediately running from the debugger.

I'm using FreeRTOS, and I've noticed that if I comment out a task and flash the new firmware, there are no hard faults. If I then uncomment the same code and flash again, the problem is still gone (with the debugger attached).

If I exit the debugger and power cycle the device, everything works.

But once the board has sat unpowered for a while and I turn it on again, the HardFault is back.

The fault is always the same: CFSR.IMPRECISERR (imprecise data access error).

I managed to capture a call stack and the assembly code:

CFSR.IMPRECISERR (imprecise data access error)
PC = 0x70017DD2
Stack trace:
HardFault_Handler
<Exception frame>
touchgfx::LCDGPU2D_AXI::blitCopyCompressedRGB565_16BPP() + 0x69d
0x7001'7dca: LDRGT     R2, [SP, #0x1c]         ; Load scratchBuffer address
0x7001'7dcc: STRHGT.W  R10, [R2, R5, LSL #1]   ; Store halfword
0x7001'7dd0: ADDGT     R5, R5, #1
0x7001'7dd2: SUBGT     R7, R7, #1              ; <-- Fault occurs here
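
A note on reading the dump above: with an imprecise bus fault, the stacked PC usually points somewhere past the access that actually failed, because the write was buffered. The marked SUBGT may therefore not be the culprit, and the STRHGT.W two instructions earlier looks like a more plausible candidate. A minimal sketch for decoding the BusFault bits of CFSR follows; the bit positions are the ARMv7-M ones, while the helper names are hypothetical and not part of any ST or TouchGFX API.

```c
#include <stdint.h>
#include <stdio.h>

/* BusFault status bits inside CFSR (ARMv7-M, SCB->CFSR). */
#define CFSR_IBUSERR      (1u << 8)   /* instruction bus error            */
#define CFSR_PRECISERR    (1u << 9)   /* precise data bus error           */
#define CFSR_IMPRECISERR  (1u << 10)  /* imprecise data bus error         */
#define CFSR_UNSTKERR     (1u << 11)  /* fault on exception return unstack */
#define CFSR_STKERR       (1u << 12)  /* fault on exception entry stacking */
#define CFSR_BFARVALID    (1u << 15)  /* BFAR holds the faulting address  */

/* Hypothetical helper: 1 if the fault is an imprecise data bus error. */
static int is_imprecise_bus_fault(uint32_t cfsr)
{
    return (cfsr & CFSR_IMPRECISERR) != 0u;
}

/* Hypothetical helper: 1 if BFAR can be trusted for this fault. */
static int bfar_is_valid(uint32_t cfsr)
{
    return (cfsr & CFSR_BFARVALID) != 0u;
}

/* Prints a short summary; on target, pass SCB->CFSR and SCB->BFAR. */
static void print_bus_fault(uint32_t cfsr, uint32_t bfar)
{
    if (cfsr & CFSR_PRECISERR)
        printf("precise data bus error\n");
    if (is_imprecise_bus_fault(cfsr))
        printf("imprecise data bus error (stacked PC is only approximate)\n");
    if (cfsr & (CFSR_STKERR | CFSR_UNSTKERR))
        printf("bus error while stacking/unstacking\n");
    if (bfar_is_valid(cfsr))
        printf("faulting address: 0x%08lx\n", (unsigned long)bfar);
}
```

Calling `print_bus_fault(SCB->CFSR, SCB->BFAR)` from the fault handler would be the typical use. For an IMPRECISERR, BFARVALID is normally clear, so no address is reported.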

The MPU configuration is:

// Region 1: XSPI2 Flash (code execution)
MPU_InitStruct.BaseAddress = 0x70000000;
MPU_InitStruct.Size = MPU_REGION_SIZE_128MB;
MPU_InitStruct.IsCacheable = MPU_ACCESS_CACHEABLE;
MPU_InitStruct.IsBufferable = MPU_ACCESS_BUFFERABLE;
MPU_InitStruct.DisableExec = MPU_INSTRUCTION_ACCESS_ENABLE;

// Region 3: External RAM (framebuffer)
MPU_InitStruct.BaseAddress = 0x90000000;
MPU_InitStruct.Size = MPU_REGION_SIZE_32MB;
MPU_InitStruct.IsCacheable = MPU_ACCESS_NOT_CACHEABLE;
MPU_InitStruct.IsBufferable = MPU_ACCESS_NOT_BUFFERABLE;

// Region 6: RAM_CMD (uncached GPU buffers)
MPU_InitStruct.BaseAddress = 0x24068000;
MPU_InitStruct.Size = MPU_REGION_SIZE_32KB;
MPU_InitStruct.IsCacheable = MPU_ACCESS_NOT_CACHEABLE;
MPU_InitStruct.IsBufferable = MPU_ACCESS_NOT_BUFFERABLE;
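
As a sanity check on the regions above: the Cortex-M7 (PMSAv7) MPU requires each region size to be a power of two of at least 32 bytes and each base address to be a multiple of that size. The sketch below checks this for a base/size pair; `mpu_region_is_valid` is a hypothetical helper, not a HAL function.

```c
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 if (base, size) is a legal PMSAv7 MPU
 * region, i.e. size is a power of two >= 32 bytes and base is aligned
 * to size. Size is 64-bit so that a 4 GB region can be expressed.
 */
static int mpu_region_is_valid(uint32_t base, uint64_t size)
{
    if (size < 32u || (size & (size - 1u)) != 0u) {
        return 0; /* not a power of two, or too small */
    }
    return (base & (size - 1u)) == 0u; /* base must be a multiple of size */
}
```

With the values posted here, all three regions pass: 0x70000000 / 128 MB, 0x90000000 / 32 MB and 0x24068000 / 32 KB. Note that 0x24068000 would not be a legal base for a 64 KB region, so growing RAM_CMD would mean moving it.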

I'm monitoring the stack usage of the various tasks and the free heap, and all of them are below 50% usage.

Attached the .icf file.

I've really no idea how to debug it; it happens sometimes, and it is not perfectly reproducible.

7 REPLIES
nico23
Senior III

After some testing, it seems there's an issue on the first boot, most likely memory corruption or a bad memory read/write, but I can't work out what's actually triggering it.

I've found that when I attach to the running target and reset the chip to the application's main(), it starts executing properly 99% of the time, without a HardFault.

After attaching to the target, I see that a BusFault is triggered:

A precise data access error has occurred (CFSR.PRECISERR, BFAR)
At data address 0xfc7ff7c0.

Exception occurred at PC = 0xcdcdcdcd, LR = 0xcdcdcdcd

I think 0xcdcdcdcd is a magic number indicating a memory access error or something similar. Is that right?
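
On the 0xcdcdcdcd question: a word made of one repeated byte, appearing in both PC and LR, more likely means the exception frame was read from memory that only holds a fill pattern than that the core was really executing there. As an assumption to verify against your tool settings, the IAR C-SPY debugger can fill the stack with 0xCD for its stack-usage display, which would match. The sketch below is a simple heuristic for flagging such frames; both helper names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: 1 if all four bytes of w are identical. */
static int is_repeated_byte_word(uint32_t w)
{
    uint32_t b = w & 0xFFu;
    return w == b * 0x01010101u;
}

/*
 * Hypothetical helper: 1 if the stacked PC and LR look like a memory
 * fill pattern (same repeated-byte word in both) rather than code
 * addresses. This is a heuristic, not proof of stack corruption.
 */
static int frame_looks_like_fill(uint32_t pc, uint32_t lr)
{
    return is_repeated_byte_word(pc) && pc == lr;
}
```

If this returns 1 for a captured frame, the stack pointer at fault time probably pointed at never-written memory, so the call stack derived from that frame should not be trusted.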

For instance, in another run of the same code, I always see a BusFault:

An imprecise data access error has occurred (CFSR.IMPRECISERR, BFAR).
Exception occurred at PC = 0x70017c1c, LR = 0xe

with this stack:

BusFault_Handler
<Exception frame>
[_ZN8touchgfx12LCDGPU2D_AXI30blitCopyCompressedRGB565_16BPPEPKhRKNS_4RectES5_h + 0x4e7]
[_ZN8touchgfx12LCDGPU2D_AXI21blitCopyCompressedRGBEPKhNS_6Bitmap12BitmapFormatERKNS_4RectES7_h + 0x1d]
[_ZN8touchgfx12LCDGPU2D_AXI17drawPartialBitmapERKNS_6BitmapEssRKNS_4RectEhb + 0xb9]
[_ZNK8touchgfx5Image4drawERKNS_4RectE + 0x8d]
[??JSMOC_3 + 0x5b]
[??JSMOC_4 + 0x2d]
[??JSMOC_9 + 0x5]
[??JSMOC_4 + 0x2d]
[??JSMOC_4 + 0x2d]
[_ZN8touchgfx6Screen9startSMOCERKNS_4RectE + 0x4f]
[_ZN8touchgfx6Screen4drawERNS_4RectE + 0x31]
[_ZN8touchgfx11Application4drawERNS_4RectE + 0x29]
[??drawCachedAreas_72 + 0x2b]
[??tick_6 + 0x13]
touchgfx::HAL::backPorchExited()
[??taskEntry_0 + 0x13]
touchgfx_taskEntry()
TouchGFX_Task
prvTaskExitError

and this disassembly:

  0x7001'7bee: 0x4588         CMP       R8, R1
  0x7001'7bf0: 0xd3fb         BCC.N     0x7001'7bea
  0x7001'7bf2: 0x930a         STR       R3, [SP, #0x28]
  0x7001'7bf4: 0x920c         STR       R2, [SP, #0x30]
  0x7001'7bf6: 0x2500         MOVS      R5, #0
  0x7001'7bf8: 0xf89d 0x6070  LDRB.W    R6, [SP, #0x70]
  0x7001'7bfc: 0x900b         STR       R0, [SP, #0x2c]
  0x7001'7bfe: 0xf8dd 0xb01c  LDR.W     R11, [SP, #0x1c]
  0x7001'7c02: 0x9c06         LDR       R4, [SP, #0x18]
  0x7001'7c04: 0xe009         B.N       0x7001'7c1a
  0x7001'7c06: 0x9805         LDR       R0, [SP, #0x14]
  0x7001'7c08: 0xf001 0x013f  AND.W     R1, R1, #63             ; 0x3f
  0x7001'7c0c: 0x2eff         CMP       R6, #255                ; 0xff
  0x7001'7c0e: 0xf830 0xa011  LDRH.W    R10, [R0, R1, LSL #1]
  0x7001'7c12: 0xd145         BNE.N     0x7001'7ca0
  0x7001'7c14: 0xf828 0xab02  STRH.W    R10, [R8], #0x2
  0x7001'7c18: 0x1e64         SUBS      R4, R4, #1
  0x7001'7c1a: 0x2c00         CMP       R4, #0
  0x7001'7c1c: 0xf340 0x80dc  BLE.W     0x7001'7dd8
  0x7001'7c20: 0xf819 0x1b01  LDRB.W    R1, [R9], #0x1
  0x7001'7c24: 0x0988         LSRS      R0, R1, #6
  0x7001'7c26: 0xd0ee         BEQ.N     0x7001'7c06
  0x7001'7c28: 0x2801         CMP       R0, #1
  0x7001'7c2a: 0xd13b         BNE.N     0x7001'7ca4
  0x7001'7c2c: 0xf64f 0x70e0  MOVW      R0, #65504              ; 0xffe0
  0x7001'7c30: 0xea00 0x000a  AND.W     R0, R0, R10
  0x7001'7c34: 0xf001 0x0203  AND.W     R2, R1, #3
  0x7001'7c38: 0x4492         ADD       R10, R10, R2
  0x7001'7c3a: 0xf1aa 0x0302  SUB.W     R3, R10, #2
  0x7001'7c3e: 0xf3c1 0x0281  UBFX      R2, R1, #2, #2
  0x7001'7c42: 0xf003 0x031f  AND.W     R3, R3, #31             ; 0x1f
  0x7001'7c46: 0x4318         ORRS      R0, R0, R3
  0x7001'7c48: 0xeb02 0x1250  ADD.W     R2, R2, R0, LSR #5
  0x7001'7c4c: 0xf64f 0x0a1f  MOVW      R10, #63519             ; 0xf81f
  0x7001'7c50: 0xea0a 0x0a00  AND.W     R10, R10, R0
  0x7001'7c54: 0x1e90         SUBS      R0, R2, #2
  0x7001'7c56: 0x0140         LSLS      R0, R0, #5

I'm noticing that, most of the time, the hard fault is reported at the instruction [_ZN8touchgfx12LCDGPU2D_AXI30blitCopyCompressedRGB565_16BPPEPKhRKNS_4RectES5_h + 0x4e7].

The issue seems to occur only on a cold boot, so debugging it is even harder.

nico23
Senior III

Update

If I comment out the task running TouchGFX, the issue no longer appears, so I'm guessing that something in TouchGFX creates a memory issue while loading the first view or even while initializing itself.

 

Can you check in your map file what is at address 0xfc7ff7c0?

Also check the address of the two scratch buffers - they should be 16-byte aligned.

There's nothing at 0xfc7ff7c0; it is not a valid address
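
To make that check repeatable for other fault addresses, a small lookup against the regions from the MPU configuration posted earlier can help. This is only a sketch: the table contains just those three regions (the real memory map also has internal SRAM, peripherals and so on), and `region_of` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;
    uint32_t    base;
    uint32_t    size;
} region_t;

/* Only the three regions shown in the posted MPU configuration. */
static const region_t k_regions[] = {
    { "XSPI2 flash",  0x70000000u, 128u * 1024u * 1024u },
    { "external RAM", 0x90000000u,  32u * 1024u * 1024u },
    { "RAM_CMD",      0x24068000u,  32u * 1024u },
};

/* Hypothetical helper: name of the region containing addr, or NULL. */
static const char *region_of(uint32_t addr)
{
    for (size_t i = 0; i < sizeof k_regions / sizeof k_regions[0]; ++i) {
        /* Unsigned subtraction wraps, so this also rejects addr < base. */
        if (addr - k_regions[i].base < k_regions[i].size) {
            return k_regions[i].name;
        }
    }
    return NULL;
}
```

With this table, 0x70017DD2 (the faulting PC) falls in the XSPI2 flash region, while 0xfc7ff7c0 matches none of them, which is consistent with a corrupted pointer rather than a slightly out-of-range access.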


@mathiasmarkussen wrote:

Also check the address of the two scratch buffers - they should be 16-byte aligned.


Where can I see it?
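
One place to look is the linker map file, which lists the address of every placed symbol. The alignment can also be checked at run time, as in the sketch below. The buffer name `example_scratch` is purely illustrative, since the thread does not say how the TouchGFX scratch buffers are declared in this project.

```c
#include <stdint.h>

/* Hypothetical helper: 1 if p is aligned to a 16-byte boundary. */
static int is_aligned_16(const void *p)
{
    return ((uintptr_t)p & 0xFu) == 0u;
}

/*
 * Illustrative buffer only: C11 _Alignas forces 16-byte alignment
 * regardless of where the linker places the object.
 */
static _Alignas(16) uint8_t example_scratch[64];
```

Checking `is_aligned_16(example_scratch)` early in startup (and trapping if it fails) would catch a misplaced buffer before the first blit touches it.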

Hamady
Senior

@nico23 I have a pretty similar scenario, where the board doesn't always work on first boot and I have to rely on the WWDG or press reset to make things work.

It is also hard to debug because it doesn't appear when I use the debugger.

nico23
Senior III

Hi @Hamady , 

The strange thing is that, judging from the call stack and the errors, the issue seems to be inside the TouchGFX library. During initialization everything returns HAL_OK and starts correctly (I2C, external flash, DMA2D, etc.), but when it comes to loading and decompressing images, it crashes.
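
Since every init call returning HAL_OK does not prove that memory-mapped reads return correct data right after a cold power-up, one possible experiment is to checksum the image data in external flash before starting the TouchGFX task and compare it with a value computed at build time. The sketch below uses a plain bitwise CRC-32 (the common reflected polynomial 0xEDB88320); the function names, and the idea of storing an expected CRC, are assumptions for illustration rather than anything TouchGFX provides.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected, polynomial 0xEDB88320). Start with crc = 0. */
static uint32_t crc32_update(uint32_t crc, const uint8_t *data, size_t len)
{
    crc = ~crc;
    for (size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            /* Mask is all ones when the low bit is set, else zero. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/*
 * Hypothetical helper: 1 if the bytes at [start, start + len) hash to
 * expected_crc. On target, start would point into the memory-mapped
 * flash region and expected_crc would come from the build.
 */
static int flash_region_matches(const void *start, size_t len,
                                uint32_t expected_crc)
{
    return crc32_update(0u, (const uint8_t *)start, len) == expected_crc;
}
```

If the check fails only on cold boot, that points at the flash interface (timing, power-up delay, memory-mapped mode setup) rather than at TouchGFX itself. If it always passes, the decompression buffers become the more likely suspect.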

The flash seems ok as the J-Link can easily erase, read, and write the whole flash with no issue.

Do you have a custom-made board?

Hamady
Senior

Hi @nico23 

I have a custom-made board that shares the same PSRAM and flash as the DK board, plus an SMPS supply for the H7.

In terms of hardware nothing seems to fail.


Thanks