STM32F303 imprecise data access error (debugging hard faults)

ccvelandres · 2019-07-24

I am encountering a consistent hard fault (CFSR.IMPRECISERR set, with the HFSR.FORCED bit set) in my program, which uses FreeRTOS v10. I can reproduce the fault every time under the same conditions, so I'm sure this can be fixed; I just need to know what causes it.

Specifically, it happens in HAL_SPI_Transmit_DMA (stm32f3xx_hal_spi.c, line 1571):

if ((hspi->Init.DataSize <= SPI_DATASIZE_8BIT) && (hspi->hdmatx->Init.MemDataAlignment == DMA_MDATAALIGN_HALFWORD))

From reading some other articles, it seems that the stacked frame does not point to the cause of the fault, since the write was buffered. The ARM Cortex-M4 documentation describes the Auxiliary Control Register, whose DISDEFWBUF bit, when set to 1, disables write buffering and turns imprecise faults into precise faults. But I can't find that register on my device.

My question is: how can I find the cause of the fault without being able to disable the write buffering? Or does anybody have an idea what the problem is?
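To see at a glance which fault bits are set, the status registers can be decoded in code instead of by hand in the debugger. A minimal sketch, assuming the standard ARMv7-M layout (BusFault bits in CFSR[15:8], FORCED in HFSR bit 30); `decode_fault` and `fault_info_t` are hypothetical names. It takes raw values so it can be exercised off-target; on the MCU you would pass in SCB->CFSR, SCB->HFSR and SCB->BFAR from the CMSIS core header. BFAR is only trusted when BFARVALID is set, which is never the case for an imprecise error.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions from the ARMv7-M architecture. */
#define CFSR_PRECISERR   (1u << 9)   /* precise data bus error */
#define CFSR_IMPRECISERR (1u << 10)  /* imprecise data bus error (buffered write) */
#define CFSR_BFARVALID   (1u << 15)  /* BFAR holds the faulting address */
#define HFSR_FORCED      (1u << 30)  /* configurable fault escalated to HardFault */

typedef struct {
    bool escalated;      /* HardFault came from an escalated configurable fault */
    bool precise_bus;    /* stacked PC points at the faulting instruction */
    bool imprecise_bus;  /* fault reported some instructions after the access */
    bool addr_valid;     /* fault_addr is meaningful */
    uint32_t fault_addr;
} fault_info_t;

/* Decode raw register values (on target: SCB->CFSR, SCB->HFSR, SCB->BFAR). */
static fault_info_t decode_fault(uint32_t cfsr, uint32_t hfsr, uint32_t bfar)
{
    fault_info_t info = {0};
    info.escalated     = (hfsr & HFSR_FORCED) != 0;
    info.precise_bus   = (cfsr & CFSR_PRECISERR) != 0;
    info.imprecise_bus = (cfsr & CFSR_IMPRECISERR) != 0;
    info.addr_valid    = (cfsr & CFSR_BFARVALID) != 0;
    info.fault_addr    = info.addr_valid ? bfar : 0u;
    return info;
}
```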

ccvelandres · 2019-07-25

Okay, I found the register. It's at 0xE000E008, but I don't know how to modify it.
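Setting the bit is a plain read-modify-write on that address. A minimal sketch, assuming DISDEFWBUF is bit 1 of ACTLR as documented for the Cortex-M4; `disable_write_buffer` is a hypothetical helper that takes the register as a pointer so it can be tried on an ordinary variable. ACTLR is only writable from privileged code, so call it early, e.g. at the start of main() before the scheduler starts. Disabling the write buffer slows down stores, so treat it as a debugging aid only.

```c
#include <stdint.h>

#define ACTLR_ADDR       0xE000E008u  /* Auxiliary Control Register */
#define ACTLR_DISDEFWBUF (1u << 1)    /* 1 = disable default write buffering */

/* Read-modify-write so the other ACTLR bits are preserved. */
static void disable_write_buffer(volatile uint32_t *actlr)
{
    *actlr |= ACTLR_DISDEFWBUF;
}

/* On target (privileged mode):
 *   disable_write_buffer((volatile uint32_t *)ACTLR_ADDR);
 * or, with the CMSIS core header:
 *   SCnSCB->ACTLR |= SCnSCB_ACTLR_DISDEFWBUF_Msk;
 */
```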

waclawek.jan · 2019-07-25

Regardless of whether the fault is precise or imprecise, you don't debug faults by relating them to C code; primarily, you have to look at the disassembled code (ideally the mixed C and disassembly view). If the data access error is imprecise, you'll need to look at a few instructions before the point where the fault occurred, but this is usually not hard, as the culprit is one of the STR instructions.

JW
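The usual starting point for that disassembly is the stacked PC (and LR) from the exception frame. A minimal sketch, assuming the standard Cortex-M order of the first eight stacked words (R0, R1, R2, R3, R12, LR, PC, xPSR; the same whether or not FPU state follows them); the small assembly shim that passes MSP or PSP in, depending on bit 2 of EXC_RETURN in LR, is not shown, and the function names are hypothetical. For an imprecise fault, treat the stacked PC as a point somewhere after the faulting store and read backwards from it.

```c
#include <stdint.h>
#include <stdio.h>

/* Word offsets of the registers pushed on exception entry. */
enum { FR_R0, FR_R1, FR_R2, FR_R3, FR_R12, FR_LR, FR_PC, FR_XPSR };

/* frame points at the stacked R0 (on MSP or PSP, chosen by EXC_RETURN bit 2). */
static uint32_t stacked_pc(const uint32_t *frame)
{
    return frame[FR_PC];
}

/* Print the registers most useful for locating the code in the disassembly. */
static void dump_frame(const uint32_t *frame)
{
    printf("PC=0x%08lX LR=0x%08lX R0=0x%08lX R1=0x%08lX\n",
           (unsigned long)frame[FR_PC], (unsigned long)frame[FR_LR],
           (unsigned long)frame[FR_R0], (unsigned long)frame[FR_R1]);
}
```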

ccvelandres · 2019-07-25

Sorry, this is my first time diagnosing hard faults, and I'm not familiar with assembly instructions yet.

I turned off the write buffering, and now it's a precise fault with the BFARVALID bit set. BFAR = 0x00000004.

In the disassembly, just before the <Exception Frame>, the program stopped at 0x800f1f6:

CLEAR_BIT(hspi->Instance->CR2, SPI_CR2_LDMATX)
0x800f1f6:  0x6820          LDR     R0, [R4]          << highlighted
0x800f1f8:  0x6840          LDR     R0, [R0, #0x4]
0x800f1fa:  0xf430 0x4080   BICS.W  R0, R0, #16384
0x800f1fe:  0x6821          LDR     R1, [R4]
0x800f200:  0x6048          STR     R0, [R1, #0x4]

CPU Registers	Value	Access	
R0	0x0801998D	ReadWrite	
R1	0	ReadWrite	
R2	2	ReadWrite	
R3	0x08006349	ReadWrite	
R4	0x2000DB34	ReadWrite	
R5	0x00000000	ReadWrite	
R6	0x2000E3D0	ReadWrite	
R7	0x0000000C	ReadWrite	
R8	0x20000078	ReadWrite	
R9	0x20008B78	ReadWrite	
R10	0x200073EC	ReadWrite	
R11	0xAE2E063D	ReadWrite	
R12	0x000005E2	ReadWrite	
xPSR	0x01000000	ReadWrite	
APSR	0x00000000	ReadWrite	
IPSR	0x00000000	ReadWrite	
EPSR	0x01000000	ReadWrite	
PC	0x0800F200	ReadWrite	
SP_main	0x2000E830	ReadWrite	
SP_process	0x20003CA8	ReadWrite	
LR	0x0800D1F5	ReadWrite	
PRIMASK	0x00000000	ReadWrite	
BASEPRI	0x00000000	ReadWrite	
BASEPRI_MAX	0x00000000	ReadWrite	
FAULTMASK	0x00000000	ReadWrite	
CONTROL	0x00000002	ReadWrite	
IAPSR	0x00000000	ReadWrite	
EAPSR	0x01000000	ReadWrite	
IEPSR	0x01000000	ReadWrite

The contents of 0x2000DB34 are all zeros:

0x2000db30         00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
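For reference, the five instructions from 0x800f1f6 to 0x800f200 are the read-modify-write that `CLEAR_BIT` expands to (the ST device header defines it as `((REG) &= ~(BIT))`). A minimal sketch in C, with hypothetical cut-down stand-ins for `SPI_TypeDef` and `SPI_HandleTypeDef` that keep only the members involved, maps each instruction to one C step. It also shows why the faulting address is so small: CR2 sits at offset 0x4 in the register block, so a store through a NULL `Instance` would land at 0x00000004.

```c
#include <stddef.h>
#include <stdint.h>

#define SPI_CR2_LDMATX (1u << 14)      /* 0x4000 = 16384, the BICS.W immediate */

typedef struct {                       /* cut-down stand-in for SPI_TypeDef */
    volatile uint32_t CR1;             /* offset 0x00 */
    volatile uint32_t CR2;             /* offset 0x04 */
} spi_regs_t;

typedef struct {                       /* cut-down stand-in for SPI_HandleTypeDef */
    spi_regs_t *Instance;              /* first member, so it is at [R4, #0] */
} spi_handle_t;

/* CLEAR_BIT(hspi->Instance->CR2, SPI_CR2_LDMATX), one instruction per step. */
static void clear_ldmatx(spi_handle_t *hspi)
{
    spi_regs_t *inst = hspi->Instance; /* LDR    R0, [R4]                */
    uint32_t cr2 = inst->CR2;          /* LDR    R0, [R0, #0x4]          */
    cr2 &= ~SPI_CR2_LDMATX;            /* BICS.W R0, R0, #16384          */
    inst = hspi->Instance;             /* LDR    R1, [R4] (loaded again) */
    inst->CR2 = cr2;                   /* STR    R0, [R1, #0x4]          */
}

/* Address the final STR targets for a given Instance value (no dereference). */
static uintptr_t cr2_store_address(uintptr_t instance)
{
    return instance + offsetof(spi_regs_t, CR2);
}
```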

Tesla DeLorean · 2019-07-25

The registers seem to describe a different address; look just before that.

Tips, Buy me a coffee, or three.. PayPal Venmo
Up vote any posts that you find helpful, it shows what's working..

Danish1 · 2019-07-25

If it were me, I'd look at all the pointers you're dereferencing.

I think that hspi contains a credible value.

But I suspect that either hspi->Instance or hspi->hdmatx is zero.

Hope this helps,

Danish
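One way to act on that suspicion is to check the pointers just before the call, so a bad handle stops at a breakpoint instead of surfacing later as a bus fault. A minimal sketch with hypothetical stand-in types; in real code the argument would be the `SPI_HandleTypeDef`, whose `Instance` and `hdmatx` members are the ones dereferenced in the DMA transmit path.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct { int dummy; } dma_handle_t;    /* stand-in for DMA_HandleTypeDef */
typedef struct { unsigned CR2; } spi_regs_t;   /* stand-in for SPI_TypeDef */
typedef struct {
    spi_regs_t   *Instance;
    dma_handle_t *hdmatx;
} spi_handle_t;                                /* stand-in for SPI_HandleTypeDef */

/* Hypothetical sanity check: true only if every pointer the DMA transmit
 * path dereferences is non-NULL. Put a breakpoint on the 'false' returns. */
static bool spi_handle_looks_valid(const spi_handle_t *hspi)
{
    if (hspi == NULL)           return false;
    if (hspi->Instance == NULL) return false;
    if (hspi->hdmatx == NULL)   return false;
    return true;
}
```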

waclawek.jan · 2019-07-25

As Clive said, you have to look at the instructions *before* the highlighted one (debuggers usually show the instruction at PC, and PC already points to the next instruction).

Also, you want to look at even more instructions before that, as you want to understand what leads to the actual faulting instruction (which will be an STR, as I said above).

JW

Tesla DeLorean · 2019-07-25

The disassembly doesn't even relate to the reported fault address, as best I can decipher it.

ccvelandres · 2019-07-25

I managed to catch the moment just before the fault occurs, and I found the cause. Danish was right: hspi->Instance was zero every time the fault occurred. And as JW pointed out, the faulting instruction was an STR.

The address 0x2000DB34 holds the struct hspi1, so hspi->Instance is at 0x2000DB34. The normal content of hspi->Instance is 0x04100300.

What happens is:

0x800f1fe: 0x6821 LDR R1, [R4]        -> R4 points to 0x2000DB34, which holds zero, so zero is loaded into R1.

0x800f200: 0x6048 STR R0, [R1, #0x4]  -> since R1 is zero, it tries to store R0 to R1 + 0x4 = 0x00000004.

Then the hard fault occurs because it tries to store to the reset vector location. Does my understanding sound right?

I think what I need to find next is when hspi->Instance turns to zero, which shouldn't happen. Am I right?
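To narrow down when the field changes, a cheap software check can complement a debugger: record `hspi1.Instance` right after `HAL_SPI_Init()` and compare it at a few points along the code path, with a breakpoint on the failing branch. A minimal sketch; `canary_t`, `canary_arm` and `canary_check` are hypothetical helpers, and where to place the checks is up to you. It only narrows the corruption down to the window between two consecutive checks.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Usage on target (hypothetical placement):
 *   canary_arm(&spi_canary, hspi1.Instance);             // after HAL_SPI_Init()
 *   canary_check(&spi_canary, hspi1.Instance, "task A"); // at checkpoints
 */
typedef struct {
    const char *where;   /* label of the first failing check, or NULL */
    uintptr_t expected;  /* pointer value recorded after initialisation */
} canary_t;

static void canary_arm(canary_t *c, const void *value)
{
    c->where = NULL;
    c->expected = (uintptr_t)value;
}

/* Returns false when the value differs; remembers the first failing label. */
static bool canary_check(canary_t *c, const void *value, const char *where)
{
    if ((uintptr_t)value != c->expected) {
        if (c->where == NULL) {
            c->where = where;  /* breakpoint here stops close to the change */
        }
        return false;
    }
    return true;
}
```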

Ozone · 2019-07-25

If you think the variable (struct member) gets overwritten elsewhere, set a data breakpoint on write access and try debugging.
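Most debuggers can set such a watchpoint directly (with GDB, for example, `watch hspi1.Instance`). For completeness, a minimal sketch of the values a Cortex-M4 DWT comparator needs, assuming the ARMv7-M DWT layout: TRCENA is bit 24 of DEMCR (0xE000EDFC), DWT_COMP0/MASK0/FUNCTION0 are at 0xE0001020/24/28, MASK is the number of ignored low address bits, and FUNCTION = 0b0110 means "watchpoint on write". `dwt_write_watch` and `dwt_watch_cfg_t` are hypothetical; the helper only computes the values so it can be checked off-target.

```c
#include <stdint.h>

#define DEMCR_TRCENA         (1u << 24) /* enable DWT before using it */
#define DWT_FUNC_WATCH_WRITE 0x6u       /* FUNCTION[3:0] = 0b0110: data write */

typedef struct {
    uint32_t comp;      /* value for DWT_COMP0: watched address */
    uint32_t mask;      /* value for DWT_MASK0: low address bits ignored */
    uint32_t function;  /* value for DWT_FUNCTION0 */
} dwt_watch_cfg_t;

/* size must be 1, 2 or 4 (a power of two); e.g. &hspi1.Instance is 4 bytes. */
static dwt_watch_cfg_t dwt_write_watch(uint32_t addr, uint32_t size)
{
    uint32_t mask = 0;
    while ((1u << mask) < size) {
        mask++;                         /* 1 -> 0, 2 -> 1, 4 -> 2 */
    }
    dwt_watch_cfg_t cfg = { addr & ~((1u << mask) - 1u), mask, DWT_FUNC_WATCH_WRITE };
    return cfg;
}

/* On target (not run here), roughly:
 *   *(volatile uint32_t *)0xE000EDFC |= DEMCR_TRCENA;
 *   dwt_watch_cfg_t c = dwt_write_watch((uint32_t)&hspi1.Instance, 4);
 *   *(volatile uint32_t *)0xE0001020 = c.comp;
 *   *(volatile uint32_t *)0xE0001024 = c.mask;
 *   *(volatile uint32_t *)0xE0001028 = c.function;
 * With a debugger attached, the core halts shortly after the offending store,
 * so the PC and call stack point at the code that wrote the field. */
```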