Getting a hard fault on only one of two identical boards

MKola
Associate II

So I've made two prototypes of a new custom board built around an STM32F405RG microcontroller. One of them works totally fine, but on the second one I get a hard fault after it has been running for a few seconds or more.

To investigate the origin of the hard fault I enabled full assert in the StdPeriph library and found the library's DMA_Cmd function to be the first place where something goes wrong (or at least the first place where something wrong is detected).

It begins like this:

void DMA_Cmd(DMA_Stream_TypeDef* DMAy_Streamx, FunctionalState NewState)
{
    /* Check the parameters */
    assert_param(IS_DMA_ALL_PERIPH(DMAy_Streamx));
    assert_param(IS_FUNCTIONAL_STATE(NewState));
    (...)
}

Apparently, the IS_DMA_ALL_PERIPH macro produces a false result, suggesting that the DMAy_Streamx variable isn't actually a DMA stream.

Here's what the macro looks like:

#define IS_DMA_ALL_PERIPH(PERIPH) (((PERIPH) == DMA1_Stream0) || \
                                   ((PERIPH) == DMA1_Stream1) || \
                                   ((PERIPH) == DMA1_Stream2) || \
                                   ((PERIPH) == DMA1_Stream3) || \
... 
more of the same
...
                                   ((PERIPH) == DMA2_Stream6) || \
                                   ((PERIPH) == DMA2_Stream7))

The weird thing is that it's called from the same context every time (with a DMA1_Stream4 value), and the argument of the function definitely is a DMA stream.

So I decided to dive into the register values and assembly. Instead of the assert_param macro I set a simple trap to catch the first occurrence of the fault and inspect the register values (the function is called correctly thousands of times before the fault occurs):

if(IS_DMA_ALL_PERIPH(DMAy_Streamx) == 0) {
    while(1);
}

Here's the disassembly of the relevant part of the function:

08000aac:   ldr     r3, [pc, #136]  ; (0x8000b38 <DMA_Cmd+140>)
08000aae:   cmp     r0, r3
08000ab0:   push    {r4, lr}
08000ab2:   mov     r4, r0
 485        if(IS_DMA_ALL_PERIPH(DMAy_Streamx) == 0) {
08000ab4:   beq.n   0x8000b14 <DMA_Cmd+104>
08000ab6:   adds    r3, #24
08000ab8:   cmp     r0, r3
08000aba:   beq.n   0x8000b14 <DMA_Cmd+104>
08000abc:   adds    r3, #24
08000abe:   cmp     r0, r3
 
...
12 identical checks here
...
 
08000b0a:   beq.n   0x8000b14 <DMA_Cmd+104>
08000b0c:   adds    r3, #24
08000b0e:   cmp     r0, r3
08000b10:   beq.n   0x8000b14 <DMA_Cmd+104>
08000b12:   b.n     0x8000b12 <DMA_Cmd+102>

So, the R0 value gets copied to the R4 register and then is successively compared to different values, stored in R3.

But the weird thing happens when investigating the register values after the program falls into the trap (loops at 08000b12):

r0	0x40026078 (Hex)	
r1	0x0 (Hex)	
r2	0x40003800 (Hex)	
r3	0x400264b8 (Hex)	
r4	0x40026070 (Hex)	
(...)
sp	0x1000fe50	
lr	0x800225b (Hex)	
pc	0x8000b12 <DMA_Cmd+102>

What? The r0 value is copied to r4, then r0 is only compared against r3 without its value ever being written, and somehow after those comparisons r0 and r4 differ by 8?
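To make the dump easier to read, the three addresses can be decoded against the DMA register map. This is a minimal sketch that runs on a PC rather than on the target, assuming the STM32F405 layout from the reference manual (DMA1 at 0x40026000, DMA2 at 0x40026400, first stream at offset 0x10, 0x18 bytes per stream, which matches the `adds r3, #24` steps in the disassembly). The names `decode_dma_addr` and `dma_loc_t` are hypothetical helpers, not part of StdPeriph.

```c
#include <stdint.h>

/* Assumed STM32F405 DMA memory map (values from the reference manual). */
#define DMA1_BASE_ADDR   0x40026000u
#define DMA2_BASE_ADDR   0x40026400u
#define STREAM0_OFFSET   0x10u   /* first stream block starts here */
#define STREAM_STRIDE    0x18u   /* size of one stream register block */
#define STREAMS_PER_DMA  8u

/* Hypothetical result type: where an address falls in the DMA stream area. */
typedef struct {
    int      controller;  /* 1 or 2; 0 if the address is not in a stream block */
    int      stream;      /* 0..7; -1 if not in a stream block */
    uint32_t offset;      /* byte offset inside the stream block; 0 = stream base */
} dma_loc_t;

static dma_loc_t decode_dma_addr(uint32_t addr)
{
    const uint32_t bases[2] = { DMA1_BASE_ADDR, DMA2_BASE_ADDR };
    dma_loc_t loc = { 0, -1, 0 };

    for (int i = 0; i < 2; i++) {
        uint32_t first = bases[i] + STREAM0_OFFSET;
        uint32_t end   = first + STREAMS_PER_DMA * STREAM_STRIDE;

        if (addr >= first && addr < end) {
            uint32_t rel = addr - first;
            loc.controller = i + 1;
            loc.stream     = (int)(rel / STREAM_STRIDE);
            loc.offset     = rel % STREAM_STRIDE;
            break;
        }
    }
    return loc;
}
```

Applied to the dump above: r4 (0x40026070) decodes to DMA1 stream 4 at offset 0, which is exactly DMA1_Stream4, the expected argument. r0 (0x40026078) decodes to the same stream at offset 8, so it is not a stream base address and every comparison in the chain fails. r3 (0x400264b8) is DMA2 stream 7, the last value checked before the trap.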

I thought at first that maybe some interrupt code stops this function in the middle and corrupts the value of r0, but I don't think that's possible - the code I've shown gets called from the ISR of a 0 Preemption Priority, which means it cannot get preempted by other interrupts (other than fault exceptions, NMI and other stuff I don't use).
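The preemption rule this reasoning relies on can be written down as a small host-side model. It is only a sketch of the Cortex-M rule (a lower number means a more urgent exception, and equal preemption priorities never preempt each other). It ignores sub-priorities and masking registers such as PRIMASK and BASEPRI, and `can_preempt` is a hypothetical name, not a CMSIS function.

```c
#include <stdbool.h>

/* Fixed priorities of the system exceptions on Cortex-M. */
#define PRIO_RESET      (-3)
#define PRIO_NMI        (-2)
#define PRIO_HARDFAULT  (-1)

/* Returns true if a pending exception may preempt the active one.
 * Lower value = higher urgency; equal values never preempt. */
static bool can_preempt(int active_preempt_prio, int pending_preempt_prio)
{
    return pending_preempt_prio < active_preempt_prio;
}
```

Under this model an ISR running at preemption priority 0 can only be interrupted by the fixed-priority exceptions (reset, NMI, HardFault), while an interrupt configured at priority 0 or 3 has to wait until it returns.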

Does anyone have any idea what could be the cause of this weird behaviour?

2 REPLIES
Ozone
Lead

> ... the code I've shown gets called from the ISR of a 0 Preemption Priority, which means it cannot get preempted by other interrupts (other than fault exceptions, NMI and other stuff I don't use).

SysTick has a higher priority.

Cube uses SysTick by default.

Or a stack overflow caused by interrupt nesting?

Thanks for the reply and suggestions.

I manually set the SysTick priority to a lower one using the NVIC_SetPriority function (a preemption priority of 3, to be exact).

The interrupts are not consuming much stack, and I have plenty of stack left: the maximum stack usage I've measured was around 2 kB out of the 64 kB of memory available for the stack, so I don't think this is the cause of the problem either.
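How the stack usage was measured isn't stated above. One common way to get such a number is stack painting: fill the stack region with a known pattern at startup and later count how much of it has been overwritten. The sketch below is a host-side illustration of that technique, assuming a full-descending stack. `STACK_FILL`, `stack_paint` and `stack_high_water_bytes` are hypothetical names, and on the real target the painting would have to stay below the current stack pointer.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xA5A5A5A5u   /* arbitrary, easily recognised pattern */

/* Fill the whole stack region with the pattern (done once at startup). */
static void stack_paint(uint32_t *stack, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        stack[i] = STACK_FILL;
    }
}

/* Full-descending stack: stack[0] is the lowest address, i.e. the deepest
 * point the stack can reach. Count the words from the bottom that still
 * hold the pattern; everything above them has been used at some point. */
static size_t stack_high_water_bytes(const uint32_t *stack, size_t words)
{
    size_t untouched = 0;

    while (untouched < words && stack[untouched] == STACK_FILL) {
        untouched++;
    }
    return (words - untouched) * sizeof(uint32_t);
}
```

The result is a lower bound on the deepest usage: it can read low if the deepest frame left words unwritten or happened to store the fill value, and it says nothing about a single stray write outside the normal stack growth.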