Solved

STM32F303 imprecise data access error (debugging hard faults)

July 25, 2019

I am encountering a consistent hard fault (CFSR.IMPRECISERR with the HFSR.FORCED bit set) in my program with FreeRTOS v10. I can reproduce the fault every time under the same conditions, so I'm sure this can be fixed. I just need to know what causes it.
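
For readers decoding the same status bits, here is a minimal sketch of how the CFSR and HFSR values mentioned above can be interpreted. The bit positions are the ARMv7-M ones (IMPRECISERR is CFSR bit 10, PRECISERR bit 9, BFARVALID bit 15, FORCED is HFSR bit 30); the `fault_summary_t` type and `decode_fault` helper are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as defined for ARMv7-M (Cortex-M3/M4). */
#define CFSR_PRECISERR   (1u << 9)   /* precise data bus error          */
#define CFSR_IMPRECISERR (1u << 10)  /* imprecise data bus error        */
#define CFSR_BFARVALID   (1u << 15)  /* BFAR holds the faulting address */
#define HFSR_FORCED      (1u << 30)  /* escalated configurable fault    */

/* Hypothetical summary type, not part of CMSIS or the HAL. */
typedef struct {
    bool forced;
    bool precise;
    bool imprecise;
    bool bfar_valid;
} fault_summary_t;

/* Decode raw CFSR/HFSR values into individual flags. */
static fault_summary_t decode_fault(uint32_t cfsr, uint32_t hfsr)
{
    fault_summary_t s;
    s.forced     = (hfsr & HFSR_FORCED) != 0u;
    s.precise    = (cfsr & CFSR_PRECISERR) != 0u;
    s.imprecise  = (cfsr & CFSR_IMPRECISERR) != 0u;
    s.bfar_valid = (cfsr & CFSR_BFARVALID) != 0u;
    return s;
}
```

On the target, the inputs would typically be `SCB->CFSR` and `SCB->HFSR` from the CMSIS core header (addresses 0xE000ED28 and 0xE000ED2C).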

Specifically, it occurs in HAL_SPI_Transmit_DMA (stm32f3xx_hal_spi.c, line 1571):

if ((hspi->Init.DataSize <= SPI_DATASIZE_8BIT) && (hspi->hdmatx->Init.MemDataAlignment == DMA_MDATAALIGN_HALFWORD))

From reading some other articles, it seems that the stacked frame does not point to the cause of the fault, since the write was buffered. According to the ARM Cortex-M4 documentation, the Auxiliary Control Register has a DISDEFWBUF bit which, when set to 1, disables write buffering and turns imprecise faults into precise faults. But I can't find the register on my device.

Question is, how can I find the cause of the fault without being able to disable the write buffering? Or does anybody have any idea what the problem is?


ccvelandresAuthor

Associate

Okay, I found the register. It's at 0xE000E008. But I don't know how to modify the register.
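
A minimal sketch of setting that bit, assuming the standard Cortex-M3/M4 layout where DISDEFWBUF is bit 1 of ACTLR. The helper name `actlr_disable_write_buffer` is hypothetical; it takes the register pointer as a parameter so the logic can also be exercised off-target.

```c
#include <stdint.h>

#define ACTLR_ADDRESS    0xE000E008u  /* Auxiliary Control Register     */
#define ACTLR_DISDEFWBUF (1u << 1)    /* disable default write buffer   */

/* Set DISDEFWBUF through the given register pointer (read-modify-write). */
static void actlr_disable_write_buffer(volatile uint32_t *actlr)
{
    *actlr |= ACTLR_DISDEFWBUF;
}
```

On the target this would be called early in `main` as `actlr_disable_write_buffer((volatile uint32_t *)ACTLR_ADDRESS);`. With the CMSIS core header the equivalent is `SCnSCB->ACTLR |= SCnSCB_ACTLR_DISDEFWBUF_Msk;`. Disabling the write buffer costs performance, so it is probably best kept to debug builds.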

waclawek.jan

Super User

Regardless of whether the fault is precise or imprecise, you don't debug faults by relating them to C code. Primarily, you have to look at the disassembled code (ideally the mixed C + disassembly view). If the data access error is imprecise, you'll need to look at a few instructions before the point where the fault occurred, but this is usually not hard, as it will be one of the STR instructions.

JW

ccvelandresAuthor

Associate

Sorry, this is my first time diagnosing hard faults. Also, I'm not knowledgeable on assembly instructions yet.

I turned off the write buffering, and now it's a precise fault and the BFARVALID bit is set. BFAR -> 0x00000004

On disassembly, just before the <Exception Frame> the program stopped on 0x800f1f6

CLEAR_BIT(hspi->Instance->CR2, SPI_CR2_LDMATX)
0x800f1f6: 0x6820 LDR R0, [R4] << highlighted
0x800f1f8: 0x6840 LDR R0, [R0, #0x4]
0x800f1fa: 0xf430 0x4080 BICS.W R0, R0, #16384
0x800f1fe: 0x6821 LDR R1, [R4]
0x800f200: 0x6048 STR R0, [R1, #0x4]

CPU Registers	Value	Access	
R0	0x0801998D	ReadWrite	
R1	0	ReadWrite	
R2	2	ReadWrite	
R3	0x08006349	ReadWrite	
R4	0x2000DB34	ReadWrite	
R5	0x00000000	ReadWrite	
R6	0x2000E3D0	ReadWrite	
R7	0x0000000C	ReadWrite	
R8	0x20000078	ReadWrite	
R9	0x20008B78	ReadWrite	
R10	0x200073EC	ReadWrite	
R11	0xAE2E063D	ReadWrite	
R12	0x000005E2	ReadWrite	
xPSR	0x01000000	ReadWrite	
APSR	0x00000000	ReadWrite	
IPSR	0x00000000	ReadWrite	
EPSR	0x01000000	ReadWrite	
PC	0x0800F200	ReadWrite	
SP_main	0x2000E830	ReadWrite	
SP_process	0x20003CA8	ReadWrite	
LR	0x0800D1F5	ReadWrite	
PRIMASK	0x00000000	ReadWrite	
BASEPRI	0x00000000	ReadWrite	
BASEPRI_MAX	0x00000000	ReadWrite	
FAULTMASK	0x00000000	ReadWrite	
CONTROL	0x00000002	ReadWrite	
IAPSR	0x00000000	ReadWrite	
EAPSR	0x01000000	ReadWrite	
IEPSR	0x01000000	ReadWrite

The contents of 0x2000DB34 are just zeros.

0x2000db30 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Tesla DeLorean

Guru

The registers seem to describe a different address; look at the instructions just before that.


Danish1

Lead III

If it were me, I'd look at all the pointers you're dereferencing.

I think that hspi contains a credible value.

But I suspect that either hspi->Instance or hspi->hdmatx is zero

Hope this helps,

Danish

waclawek.jan

Super User

As Clive said, you have to look at the instructions *before* the highlighted one (debuggers usually show what's in PC, and that already points to the next instruction).

Also, you want even more instructions before that, as you want to understand what leads to the actual faulting instruction (and that one will be a STR, as I've said above).
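
To find the PC value in the first place, the stacked exception frame can be read from the stack pointer that was active when the fault was taken. A minimal sketch, assuming the basic 8-word ARMv7-M frame layout (r0-r3, r12, lr, pc, xPSR); `exception_frame_t` and `read_exception_frame` are hypothetical names, and the small assembly shim that picks MSP or PSP from EXC_RETURN is not shown.

```c
#include <stdint.h>

/* The eight words the core pushes on exception entry, in stacking order. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} exception_frame_t;

/* Copy the stacked registers out of the faulting context's stack. */
static exception_frame_t read_exception_frame(const uint32_t *stacked_sp)
{
    exception_frame_t f;
    f.r0   = stacked_sp[0];
    f.r1   = stacked_sp[1];
    f.r2   = stacked_sp[2];
    f.r3   = stacked_sp[3];
    f.r12  = stacked_sp[4];
    f.lr   = stacked_sp[5];
    f.pc   = stacked_sp[6];  /* return address; for a precise fault, the faulting instruction */
    f.xpsr = stacked_sp[7];
    return f;
}
```

For an imprecise fault the stacked `pc` is only a starting point: the offending store sits somewhere before it in the disassembly.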

JW

Tesla DeLorean

Guru

The disassembly doesn't even relate to the reported fault address, as best I can decipher it.


ccvelandresAuthor

Associate

I managed to catch the moment just before the fault occurred, and I found the cause. Danish was right: hspi->Instance was zero every time the fault occurred. And as JW pointed out, the cause was a STR instruction.

The address 0x2000DB34 contains the struct hspi1 and hspi->Instance is at 0x2000DB34. The normal content of hspi->Instance is 0x04100300.

What happens is:

0x800f1fe: 0x6821 LDR R1, [R4] -------------> (R4 points to 0x2000DB34 which is zero so zero is loaded to R1)

0x800f200: 0x6048 STR R0, [R1, #0x4] ------> (since R1 is zero, it then tries to store R0 to (R1 + offset 0x4) 0x00000004)

then the hard fault occurs since it tries to store it to the reset vector. Does my understanding sound right?

I think what I need to find next is when the hspi->Instance turns to zero which shouldn't happen. Am I right?
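
The failure mode described above can be sketched in C with a defensive check in front of the register access. The types below are hypothetical stand-ins for the HAL handle and register block (only the fields used here), and `spi_clear_ldmatx` is an illustrative helper, not a HAL function. Bit 14 matches the `#16384` immediate in the BICS instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the SPI register block: CR2 sits at offset 0x04. */
typedef struct {
    volatile uint32_t CR1;  /* offset 0x00 */
    volatile uint32_t CR2;  /* offset 0x04 */
} spi_regs_t;

/* Hypothetical stand-in for the HAL handle; only Instance matters here. */
typedef struct {
    spi_regs_t *Instance;
} spi_handle_t;

#define SPI_CR2_LDMATX_BIT (1u << 14)  /* 16384, as in the BICS above */

/* Returns 0 on success, -1 if the handle or its register pointer is NULL.
 * Without the check, Instance == NULL makes the store target address 0x4. */
static int spi_clear_ldmatx(spi_handle_t *hspi)
{
    if (hspi == NULL || hspi->Instance == NULL) {
        return -1;
    }
    hspi->Instance->CR2 &= ~SPI_CR2_LDMATX_BIT;
    return 0;
}
```

A check like this only turns the crash into an error code; it does not explain why the pointer was zeroed, which still has to be tracked down separately.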

OzoneBest answer

Principal

If you think the variable (struct member) gets overwritten elsewhere, set a data breakpoint on write access and try debugging.
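
Most IDEs expose this as a "data breakpoint" or "watchpoint" in the debugger UI, so no code is normally needed. For completeness, here is a rough sketch of what such a watchpoint amounts to on a Cortex-M3/M4, assuming the ARMv7-M DWT comparator layout (COMP, MASK, FUNCTION) and FUNCTION value 0x6 for "watchpoint on write". The helper name and the parameterized register pointers are assumptions made so the logic can be run off-target.

```c
#include <stdint.h>

/* One DWT comparator block: COMP, MASK, FUNCTION, then a reserved word. */
typedef struct {
    volatile uint32_t COMP;      /* address to match                     */
    volatile uint32_t MASK;      /* number of low address bits to ignore */
    volatile uint32_t FUNCTION;  /* comparator action                    */
    uint32_t reserved;
} dwt_comparator_t;

#define DEMCR_ADDRESS        0xE000EDFCu
#define DEMCR_TRCENA         (1u << 24)   /* enables the DWT unit */
#define DWT_COMP0_ADDRESS    0xE0001020u
#define DWT_FUNC_WATCH_WRITE 0x6u         /* watchpoint on write  */

/* Arm one comparator to raise a watchpoint event on writes to `address`. */
static void dwt_watch_writes(volatile uint32_t *demcr, dwt_comparator_t *cmp,
                             uint32_t address, uint32_t ignore_bits)
{
    *demcr |= DEMCR_TRCENA;
    cmp->FUNCTION = 0u;           /* disable while reconfiguring */
    cmp->COMP = address;
    cmp->MASK = ignore_bits;
    cmp->FUNCTION = DWT_FUNC_WATCH_WRITE;
}
```

On the target the pointers would be `(volatile uint32_t *)DEMCR_ADDRESS` and `(dwt_comparator_t *)DWT_COMP0_ADDRESS`, with `address` set to the member being watched (0x2000DB34 in this thread) and `ignore_bits` set to 2 to cover the whole word. The event only halts the core when a debugger is attached or the DebugMonitor exception is enabled.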

ccvelandresAuthor

Associate

Found the cause! Turns out one of the libraries I'm using was using a static buffer at 0x2000DAD0 to 0x2000DB34 (size 100 bytes). And it was having a buffer overflow, which then overwrote the hspi1 struct. Increasing the size of the buffer fixed the problem.

Thanks so much for the inputs! I'm just thinking, how would one better detect buffer overflows like these ones? Especially if the memory was dynamically allocated?
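
One common approach for buffers you declare yourself is a guard word placed directly after the buffer and checked periodically. A minimal sketch, assuming a 100-byte buffer like the one in this thread; `guarded_buffer_t`, `guarded_init` and `guarded_intact` are hypothetical names, and this does not help with buffers declared inside a third-party library unless you can change its source.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define GUARD_WORD 0xDEADBEEFu

/* Buffer followed immediately by a guard word that should never change. */
typedef struct {
    uint8_t  data[100];
    uint32_t guard;
} guarded_buffer_t;

/* Clear the buffer and arm the guard. */
static void guarded_init(guarded_buffer_t *b)
{
    memset(b->data, 0, sizeof b->data);
    b->guard = GUARD_WORD;
}

/* True while nothing has written past the end of data[]. */
static bool guarded_intact(const guarded_buffer_t *b)
{
    return b->guard == GUARD_WORD;
}
```

Calling `guarded_intact` from an idle task or after each use of the buffer catches the overflow close to where it happens, instead of much later when a neighboring variable is read.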

Ozone

Principal

I don't know (understand ?) all circumstances. Especially what you mean with "dynamically allocated".

Make sure stack sizes are appropriate for your application.

> Turns out one of the libraries I'm using was using a static buffer at 0x2000DAD0 to 0x2000DB34 (size 100 bytes). And it was having a buffer overflow...

If the library code made out-of-boundary array accesses, it is buggy. Your hspi instance variable might just have been neighboring on the stack.

ccvelandresAuthor

Associate

> ... Especially what you mean with "dynamically allocated".

Just a hypothetical situation where the buffer is dynamically allocated, and the other buffer getting overwritten is also dynamically allocated on the heap. How would you catch instances of buffer overflow? Of course, prevention is key, but what if a big library you imported was causing this?
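
For the heap case, one possible technique is to wrap the allocator so that every block carries a few guard bytes after the requested size, and to verify them before freeing or at checkpoints. A minimal sketch using the standard `malloc`; the names `heap_guard_alloc` and `heap_guard_intact` are hypothetical, and the caller is assumed to remember the requested size.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TAIL_GUARD_LEN 4u
static const uint8_t TAIL_GUARD[TAIL_GUARD_LEN] = { 0xDE, 0xAD, 0xBE, 0xEF };

/* Allocate `size` usable bytes followed by a fixed guard pattern. */
static void *heap_guard_alloc(size_t size)
{
    uint8_t *p = malloc(size + TAIL_GUARD_LEN);
    if (p != NULL) {
        memcpy(p + size, TAIL_GUARD, TAIL_GUARD_LEN);
    }
    return p;
}

/* True if the guard bytes after the first `size` bytes are untouched. */
static bool heap_guard_intact(const void *ptr, size_t size)
{
    return memcmp((const uint8_t *)ptr + size, TAIL_GUARD, TAIL_GUARD_LEN) == 0;
}
```

The same idea should carry over to an RTOS allocator by swapping `malloc` for the RTOS call. When the library can also be built and run on a PC, tools such as AddressSanitizer or Valgrind detect this class of bug without any code changes.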

> If the library code made out-of-boundary array accesses, it is buggy

Yeah.. I'll inform the developer regarding this

waclawek.jan

Super User

Thanks for coming back with the solution. Please choose your post as Best, so that the thread is marked as resolved.

> how would one better detect buffer overflows like these ones?

Using a different programming language...

C is inherently prone to programmer errors like these. Its basic premise was that everything is allowed and it's the programmer's responsibility not to shoot himself in the foot. It's anticipated that the programmer will exercise extreme care, and/or will write every check necessary. This did not work out well in practice, as witnessed by the innumerable bugs in the wild...

.. but that's life, programming languages are not chosen primarily based on their type safety.

JW
