2019-07-24 11:50 PM
I am encountering a consistent hard fault (CFSR.IMPRECISERR with the HFSR.FORCED bit set) in my program running FreeRTOS v10. I can reproduce the fault every time under the same conditions, so I'm sure it can be fixed. I just need to know what causes it.
Specifically, it occurs in HAL_SPI_Transmit_DMA (stm32f3xx_hal_spi.c, line 1571):
if ((hspi->Init.DataSize <= SPI_DATASIZE_8BIT) && (hspi->hdmatx->Init.MemDataAlignment == DMA_MDATAALIGN_HALFWORD))
From reading some other articles, it seems the stack frame is not related to the cause of the fault, since the write was buffered. The ARM Cortex-M4 documentation describes the Auxiliary Control Register, whose DISDEFWBUF bit, when set to 1, disables write buffering and turns imprecise faults into precise faults. But I can't find the register on my device.
Question is, how can I find the cause of the fault without being able to disable the write buffering? Or does anybody have any idea what the problem is?
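For reference, the flags mentioned above can be decoded from the raw register values. This is only a minimal sketch: the bit positions are the ARMv7-M ones for CFSR (0xE000ED28) and HFSR (0xE000ED2C), while `fault_summary_t` and `decode_fault` are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions per the ARMv7-M architecture. */
#define CFSR_PRECISERR   (1u << 9)   /* BFSR: precise data bus error        */
#define CFSR_IMPRECISERR (1u << 10)  /* BFSR: imprecise data bus error      */
#define CFSR_BFARVALID   (1u << 15)  /* BFSR: BFAR holds the fault address  */
#define HFSR_FORCED      (1u << 30)  /* configurable fault escalated to hard fault */

/* Hypothetical helper type, not part of CMSIS or the HAL. */
typedef struct {
    bool forced;
    bool precise;
    bool imprecise;
    bool bfar_valid;
} fault_summary_t;

/* Fill *out from raw CFSR and HFSR values read in the fault handler. */
static void decode_fault(uint32_t cfsr, uint32_t hfsr, fault_summary_t *out)
{
    assert(out != NULL);
    out->forced     = (hfsr & HFSR_FORCED) != 0u;
    out->precise    = (cfsr & CFSR_PRECISERR) != 0u;
    out->imprecise  = (cfsr & CFSR_IMPRECISERR) != 0u;
    out->bfar_valid = (cfsr & CFSR_BFARVALID) != 0u;
}
```

On the target, the two inputs would be the values of SCB->CFSR and SCB->HFSR captured at the start of the hard fault handler.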
2019-07-25 12:08 AM
Okay, I found the register. It's at 0xE000E008. But I don't know how to modify the register.
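One way to set the bit, as a rough sketch: DISDEFWBUF is bit 1 of the Auxiliary Control Register on Cortex-M3/M4, and the write must happen in privileged mode. The helper name `disable_write_buffer` is made up here, and it takes the register address as a parameter only so the logic can also be tried off-target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ACTLR_ADDR       0xE000E008u  /* Auxiliary Control Register (Cortex-M3/M4) */
#define ACTLR_DISDEFWBUF (1u << 1)    /* disable default write buffering */

/* Hypothetical helper: set DISDEFWBUF with a read-modify-write so the
 * other ACTLR bits keep their values. */
static void disable_write_buffer(volatile uint32_t *actlr)
{
    assert(actlr != NULL);
    *actlr |= ACTLR_DISDEFWBUF;
}

/* On the target, early in main() before the scheduler starts:
 *     disable_write_buffer((volatile uint32_t *)ACTLR_ADDR);
 * With the CMSIS core header the same write should be:
 *     SCnSCB->ACTLR |= SCnSCB_ACTLR_DISDEFWBUF_Msk;
 */
```

Disabling the write buffer slows stores down, so this is best kept as a temporary debugging aid.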
2019-07-25 12:12 AM
Regardless of whether the fault is precise or imprecise, you don't debug faults by relating them to C code. Primarily, you have to look at the disassembled code (ideally the mixed C + disassembly view). If the data access error is imprecise, you'll need to look at a few instructions before the point where the fault occurred, but this is usually not hard, as it will be one of the STR instructions.
JW
2019-07-25 12:53 AM
Sorry, this is my first time diagnosing hard faults. Also, I'm not knowledgeable on assembly instructions yet.
I turned off the write buffering, and now it's a precise fault and the BFARVALID bit is set. BFAR -> 0x00000004
In the disassembly, just before the <Exception Frame>, the program stopped at 0x800f1f6:
CLEAR_BIT(hspi->Instance->CR2, SPI_CR2_LDMATX)
0x800f1f6: 0x6820 LDR R0, [R4] << highlighted
0x800f1f8: 0x6840 LDR R0, [R0, #0x4]
0x800f1fa: 0xf430 0x4080 BICS.W R0, R0, #16384
0x800f1fe: 0x6821 LDR R1, [R4]
0x800f200: 0x6048 STR R0, [R1, #0x4]
CPU Registers Value Access
R0 0x0801998D ReadWrite
R1 0 ReadWrite
R2 2 ReadWrite
R3 0x08006349 ReadWrite
R4 0x2000DB34 ReadWrite
R5 0x00000000 ReadWrite
R6 0x2000E3D0 ReadWrite
R7 0x0000000C ReadWrite
R8 0x20000078 ReadWrite
R9 0x20008B78 ReadWrite
R10 0x200073EC ReadWrite
R11 0xAE2E063D ReadWrite
R12 0x000005E2 ReadWrite
xPSR 0x01000000 ReadWrite
APSR 0x00000000 ReadWrite
IPSR 0x00000000 ReadWrite
EPSR 0x01000000 ReadWrite
PC 0x0800F200 ReadWrite
SP_main 0x2000E830 ReadWrite
SP_process 0x20003CA8 ReadWrite
LR 0x0800D1F5 ReadWrite
PRIMASK 0x00000000 ReadWrite
BASEPRI 0x00000000 ReadWrite
BASEPRI_MAX 0x00000000 ReadWrite
FAULTMASK 0x00000000 ReadWrite
CONTROL 0x00000002 ReadWrite
IAPSR 0x00000000 ReadWrite
EAPSR 0x01000000 ReadWrite
IEPSR 0x01000000 ReadWrite
The contents of 0x2000DB34 are just zeros.
0x2000db30 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
2019-07-25 01:29 AM
The registers seem to be describing a different address; look just before that.
2019-07-25 01:35 AM
If it were me, I'd look at all the pointers you're dereferencing.
I think that hspi contains a credible value.
But I suspect that either hspi->Instance or hspi->hdmatx is zero.
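A check along these lines could be dropped in just before the transmit call to confirm or rule that out. It is only a sketch: the struct types below are simplified stand-ins for the HAL's SPI and DMA handle types (only the members discussed here), and `spi_handle_check` is a hypothetical helper, not a HAL function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the HAL types, for illustration only. */
typedef struct { volatile uint32_t CR1, CR2; } spi_regs_t;
typedef struct { int unused; } dma_handle_t;
typedef struct {
    spi_regs_t   *Instance;
    dma_handle_t *hdmatx;
} spi_handle_t;

/* Returns 0 if every pointer the transmit path dereferences is non-NULL,
 * otherwise a small code naming the first bad one:
 * 1 = hspi, 2 = hspi->Instance, 3 = hspi->hdmatx. */
static int spi_handle_check(const spi_handle_t *hspi)
{
    if (hspi == NULL)           return 1;
    if (hspi->Instance == NULL) return 2;
    if (hspi->hdmatx == NULL)   return 3;
    return 0;
}
```

A breakpoint on a non-zero return value stops the program before the bad dereference rather than inside the fault handler.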
Hope this helps,
Danish
2019-07-25 03:47 AM
As Clive said, you have to look at instructions *before* the highlighted one (debuggers usually show what's in PC, and that already points to the next instruction).
Also, you want even more instructions before that, as you want to understand what leads to the actual faulting instruction (and that one will be a STR, as I've said above).
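To find the PC that was stacked when the fault was taken, independently of what the debugger highlights, the exception frame can be read directly. A minimal sketch, assuming the standard Cortex-M frame layout (R0, R1, R2, R3, R12, LR, PC, xPSR at ascending addresses) and that bit 2 of EXC_RETURN selects between MSP and PSP; `stacked_pc` is a made-up helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Word indices within the basic exception stack frame. */
enum { FRAME_R0, FRAME_R1, FRAME_R2, FRAME_R3,
       FRAME_R12, FRAME_LR, FRAME_PC, FRAME_XPSR };

/* Hypothetical helper: pick the stack that holds the frame
 * (EXC_RETURN bit 2 set means the process stack) and return the stacked PC. */
static uint32_t stacked_pc(uint32_t exc_return,
                           const uint32_t *msp, const uint32_t *psp)
{
    const uint32_t *frame = (exc_return & (1u << 2)) ? psp : msp;
    assert(frame != NULL);
    return frame[FRAME_PC];
}
```

In a FreeRTOS task the frame normally sits on the process stack, so the handler's LR value (EXC_RETURN) matters when choosing which stack pointer to inspect.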
JW
2019-07-25 04:07 AM
The disassembly doesn't even relate to the reported fault address, as best I can decipher it.
2019-07-25 08:27 PM
I managed to catch the moment just before the fault occurred, and I found the cause. Danish was right: hspi->Instance was zero every time the fault occurred. And as JW pointed out, the faulting instruction was a STR.
The address 0x2000DB34 contains the struct hspi1 and hspi->Instance is at 0x2000DB34. The normal content of hspi->Instance is 0x04100300.
What happens is:
0x800f1fe: 0x6821 LDR R1, [R4] -------------> (R4 points to 0x2000DB34, which holds zero, so zero is loaded into R1)
0x800f200: 0x6048 STR R0, [R1, #0x4] ------> (since R1 is zero, it tries to store R0 to R1 + offset 0x4 = 0x00000004)
The hard fault then occurs because it tries to write to address 0x00000004, where the reset vector is located. Does my understanding sound right?
I think what I need to find next is when hspi->Instance turns to zero, which shouldn't happen. Am I right?
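The link between BFAR = 0x00000004 and a zero hspi->Instance can be checked with a small sketch. It assumes CR2 is the second 32-bit register of the SPI block, which matches the #0x4 offset in the disassembly; `spi_regs_t` is a cut-down stand-in for the real register struct and `cr2_address` is a made-up helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the start of the SPI register block: CR1 at offset 0, CR2 at 4. */
typedef struct {
    volatile uint32_t CR1;
    volatile uint32_t CR2;
} spi_regs_t;

/* Address that an access to Instance->CR2 would hit for a given Instance value. */
static uintptr_t cr2_address(uintptr_t instance)
{
    return instance + offsetof(spi_regs_t, CR2);
}
```

With Instance equal to zero, `cr2_address(0)` gives 0x4, which is the address reported in BFAR.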
2019-07-25 10:18 PM
If you think the variable (struct member) gets overwritten elsewhere, set a data breakpoint on write access and try debugging.