2025-01-30 04:38 AM - edited 2025-01-30 06:35 AM
I am using an STM32G473 with FreeRTOS, and some interrupts occasionally cause an NMI. The processor then hangs in NMI_Handler until the watchdog resets it (if the watchdog is enabled).
The issue is intermittent, but I can always trigger it eventually by running CAN block transfers, which produce many CAN Rx interrupts in quick succession.
Judging from the call stack, every time I catch the issue the interrupt seems to arrive while a task switch is in progress. Example of the call stack:
Here, a task has called vTaskDelay, and the line the call stack points to in the delay function is portYIELD_WITHIN_API. My understanding is that this pends the PendSV exception, so execution continues in xPortPendSVHandler once PendSV is taken. My port of xPortPendSVHandler looks like this:
void xPortPendSVHandler( void )
{
/* This is a naked function. */
__asm volatile
(
" mrs r0, psp \n"
" isb \n"
" \n"
" ldr r3, pxCurrentTCBConst \n"/* Get the location of the current TCB. */
" ldr r2, [r3] \n"
" \n"
" tst r14, #0x10 \n"/* Is the task using the FPU context? If so, push high vfp registers. */
" it eq \n"
" vstmdbeq r0!, {s16-s31} \n"
" \n"
" stmdb r0!, {r4-r11, r14} \n"/* Save the core registers. */
" str r0, [r2] \n"/* Save the new top of stack into the first member of the TCB. */
" \n"
" stmdb sp!, {r0, r3} \n"
" mov r0, %0 \n"
" msr basepri, r0 \n"
" dsb \n"
" isb \n"
" bl vTaskSwitchContext \n"
" mov r0, #0 \n"
" msr basepri, r0 \n"
" ldmia sp!, {r0, r3} \n"
" \n"
" ldr r1, [r3] \n"/* The first item in pxCurrentTCB is the task top of stack. */
" ldr r0, [r1] \n"
" \n"
" ldmia r0!, {r4-r11, r14} \n"/* Pop the core registers. */
" \n"
" tst r14, #0x10 \n"/* Is the task using the FPU context? If so, pop the high vfp registers too. */
" it eq \n"
" vldmiaeq r0!, {s16-s31} \n"
" \n"
" msr psp, r0 \n"
" isb \n"
" \n"
#ifdef WORKAROUND_PMU_CM001 /* XMC4000 specific errata workaround. */
#if WORKAROUND_PMU_CM001 == 1
" push { r14 } \n"
" pop { pc } \n"
#endif
#endif
" \n"
" bx r14 \n"
" \n"
" .align 4 \n"
"pxCurrentTCBConst: .word pxCurrentTCB \n"
::"i" ( configMAX_SYSCALL_INTERRUPT_PRIORITY )
);
}
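The `tst r14, #0x10` lines test bit 4 of the EXC_RETURN value that the core places in LR on exception entry: bit 4 clear means the interrupted context stacked an extended (FPU) frame. The same value also encodes the return mode (bit 3) and which stack holds the frame (bit 2). A small host-side sketch that decodes those three bits, assuming the ARMv7-M (Cortex-M4) encoding; the type and function names are illustrative only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fields of an ARMv7-M EXC_RETURN value (what LR/r14 holds inside a handler). */
typedef struct {
    bool extended_frame; /* bit 4 == 0: s0-s15 and FPSCR were stacked as well */
    bool thread_mode;    /* bit 3 == 1: returns to Thread mode, else Handler mode */
    bool uses_psp;       /* bit 2 == 1: the frame is on the PSP, else on the MSP */
} exc_return_info;

static exc_return_info decode_exc_return(uint32_t exc_return)
{
    /* Valid EXC_RETURN values on ARMv7-M have bits [31:5] all set. */
    assert((exc_return & 0xFFFFFFE0u) == 0xFFFFFFE0u);

    exc_return_info info;
    info.extended_frame = (exc_return & 0x10u) == 0u; /* same test as "tst r14, #0x10" */
    info.thread_mode    = (exc_return & 0x08u) != 0u;
    info.uses_psp       = (exc_return & 0x04u) != 0u;
    return info;
}
```

For example, 0xfffffffd, 0xffffffed and 0xffffffe1, which also appear in the stack dumps in this post, decode to Thread/PSP/basic frame, Thread/PSP/FPU frame and Handler/MSP/FPU frame respectively.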
The instruction we are stopped at varies from run to run; in this case it is:
stmdb r0!, {r4, r5, r6, r7, r8, r9, sl, fp, lr}
At this point a CAN Rx interrupt appears to arrive. Execution enters its handler and goes down to where the received data is put into a queue. The queue is very large, so it should never be full. The memcpy in the call stack is the copy of the data into the queue's write position.
The variables and pointers used in the memcpy look OK.
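One way to make "the pointers look OK" checkable on every call is to assert, immediately before the memcpy, that the destination range lies inside the queue's buffer. The sketch below is a hypothetical single-producer/single-consumer queue (`byte_queue` and `byte_queue_put` are made-up names, not the queue used in this project), where the ISR is the only writer of `head`:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical SPSC queue of fixed-size items (e.g. one CAN payload per item). */
typedef struct {
    uint8_t        *storage;   /* backing buffer of capacity * item_size bytes */
    size_t          item_size; /* bytes per item */
    size_t          capacity;  /* number of slots; one is always kept empty */
    volatile size_t head;      /* next slot to write (producer/ISR only) */
    volatile size_t tail;      /* next slot to read (consumer/task only) */
} byte_queue;

/* Copy one item in; refuse oversized items and never write outside storage. */
static bool byte_queue_put(byte_queue *q, const void *item, size_t len)
{
    if (len > q->item_size) {
        return false;                       /* more than one item's worth of data */
    }
    size_t next = (q->head + 1u) % q->capacity;
    if (next == q->tail) {
        return false;                       /* queue full */
    }
    uint8_t *dst = q->storage + q->head * q->item_size;
    /* Bounds check on the memcpy destination before writing anything. */
    assert(dst >= q->storage &&
           dst + len <= q->storage + q->capacity * q->item_size);
    memcpy(dst, item, len);
    /* Keep the compiler from moving the copy after the head update. */
    atomic_signal_fence(memory_order_release);
    q->head = next;                         /* publish the item to the consumer */
    return true;
}
```

On target, `assert` would typically be replaced by `configASSERT()` or a breakpoint, so a bad destination stops the CPU before the copy rather than after it.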
After the memcpy we seem to end up in the NMI handler, and I do not know why. I have tried extending the NMI handler to identify the cause, without luck. My NMI handler:
void NMI_Handler(void)
{
if (SYSCFG->CFGR2 & 0x100)
{
/* SRAM parity err */
while (1) {
__asm("bkpt 5");
}
}
if (FLASH->ECCR & 0xf0000000)
{
/* FLASH ECC err */
while (1) {
__asm("bkpt 6");
}
}
if (RCC->CIFR & RCC_CIFR_CSSF) /* HSE clock security flag only; most other CIFR bits are ready flags */
{
/* CSS err */
while (1) {
__asm("bkpt 7");
}
}
while (1) {
__asm("bkpt 8");
}
}
I always end up at "bkpt 8", i.e. none of the checks match. I based the conditions on my reading of the reference manual, but they might be incorrect.
Contents of the MSP and PSP stacks when "bkpt 8" has been hit:
x/128x $msp
0x2001fea8: 0x20004c70 0x2001fefc 0x00000008 0x00000000
0x2001feb8: 0x20004c6c 0x0800682b 0x08001560 0x21000025
0x2001fec8: 0x200049a8 0x00000000 0xffffffff 0x08006d95
0x2001fed8: 0x00000000 0x40006400 0x2221201f 0x2001ff3c
0x2001fee8: 0x00000000 0x4000a4b0 0x00000000 0x08008de1
0x2001fef8: 0x00000000 0x00000000 0x00000601 0x3c080000
0x2001ff08: 0x1f1e1d1c 0x00222120 0xffffffff 0x20000170
0x2001ff18: 0x00000000 0x00000101 0x00000000 0xa5a5a5a5
0x2001ff28: 0x2001b760 0x2001b820 0x2001b860 0x08008ec1
0x2001ff38: 0xffffffff 0x00000000 0x00000000 0xffffffe1
0x2001ff48: 0x200135ec 0x2001371c 0x20013718 0x2000034c
0x2001ff58: 0x200001c8 0xffffffed 0x08007c36 0x6100700e
0x2001ff68: 0x00000601 0x3b080000 0x18171615 0x081b1a19
0x2001ff78: 0x2000034c 0x20000178 0x200001b0 0x200005c0
0x2001ff88: 0x200001b0 0x00000040 0x00000000 0x200005c0
0x2001ff98: 0x200001b0 0x20000198 0x200001b4 0x08007c55
0x2001ffa8: 0x200025d4 0x2000034c 0x00000000 0x00f00000
0x2001ffb8: 0x00000000 0xc0000000 0x08013593 0x08007db5
0x2001ffc8: 0x08007b0c 0x61000000 0x00000000 0x00000000
0x2001ffd8: 0x00000000 0x00000000 0x00000000 0x00000000
0x2001ffe8: 0x00000000 0x00000000 0x00000000 0x00000000
0x2001fff8: 0x00000010 0x08000895
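If NMI_Handler pushed nothing before reaching the breakpoint (worth confirming in its disassembly), the first eight words at $msp are the basic frame the core stacked on NMI entry: r0-r3, r12, lr, pc and xPSR. The helper below splits such a frame and extracts the IPSR field, i.e. which exception was active when the NMI was taken. The names are illustrative; the frame layout is the standard Cortex-M one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Basic exception frame pushed by hardware on exception entry (lowest address first). */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} stacked_frame;

static stacked_frame frame_from_words(const uint32_t w[8])
{
    assert(w != NULL);
    stacked_frame f = { w[0], w[1], w[2], w[3], w[4], w[5], w[6], w[7] };
    return f;
}

/* A stacked xPSR must have the Thumb bit (bit 24) set; if not, the offset is likely wrong. */
static bool frame_looks_valid(const stacked_frame *f)
{
    return (f->xpsr & (1u << 24)) != 0u;
}

/* IPSR field (xPSR[8:0]): exception number active when the frame was pushed. */
static uint32_t frame_exception_number(const stacked_frame *f)
{
    return f->xpsr & 0x1FFu;
}

/* Exception numbers >= 16 are external interrupts: IRQn = exception number - 16. */
static int frame_irq_number(const stacked_frame *f)
{
    return (int)frame_exception_number(f) - 16;
}
```

Fed with the first eight MSP words above, this gives a stacked pc of 0x08001560 and an IPSR of 37 (IRQ 21), and the Thumb bit in the xPSR word is set, which supports the zero-offset assumption. `info symbol 0x08001560` in GDB shows which function that pc belongs to, and the IRQ number can be compared with the FDCAN entries of `IRQn_Type` in the device header.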
x/128x $psp
0x20013650 <ucHeap+78504>: 0x00000000 0x2001371c 0x10000000 0xe000e000
0x20013660 <ucHeap+78520>: 0x200001c8 0x08005a53 0x08005d80 0x61000000
0x20013670 <ucHeap+78536>: 0x00000000 0x00000000 0x00000000 0x00000000
0x20013680 <ucHeap+78552>: 0x00000000 0x00000000 0x00000000 0x00000000
0x20013690 <ucHeap+78568>: 0x00000000 0x00000000 0x00000000 0x00000000
0x200136a0 <ucHeap+78584>: 0x00000000 0x00000000 0x00200000 0x00600000
0x200136b0 <ucHeap+78600>: 0x00000000 0x08005dc7 0x2001b820 0x0801146f
0x200136c0 <ucHeap+78616>: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
0x200136d0 <ucHeap+78632>: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0xfffffffd
0x200136e0 <ucHeap+78648>: 0x00000000 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5
0x200136f0 <ucHeap+78664>: 0xa5a5a5a5 0xa5a5a5a5 0xa5a5a5a5 0x08007b29
0x20013700 <ucHeap+78680>: 0xa5a5a5a5 0xa5a5a5a5 0x00000000 0x00000000
0x20013710 <ucHeap+78696>: 0x00000000 0x80000068 0x200135ec 0x000047ad
0x20013720 <ucHeap+78712>: 0x20000200 0x20000200 0x20013718 0x200001f8
0x20013730 <ucHeap+78728>: 0x0000000b 0x00000000 0x00000000 0x20013718
0x20013740 <ucHeap+78744>: 0x00000000 0x00000005 0x20011708 0x45504d49
0x20013750 <ucHeap+78760>: 0x00000058 0x00000000 0x00000008 0x0000000d
0x20013760 <ucHeap+78776>: 0x00000005 0x00000000 0x00000708 0x00000000
0x20013770 <ucHeap+78792>: 0x00000000 0x00000000 0x200183a0 0x00004c28
0x20013780 <ucHeap+78808>: 0x00000000 0x00000000 0x00000000 0x00000000
0x20013790 <ucHeap+78824>: 0x00000000 0x00000000 0x00000000 0x00000000
0x200137a0 <ucHeap+78840>: 0x00000000 0x00000000 0x00000000 0x00000000
0x200137b0 <ucHeap+78856>: 0x00000000 0x00000000 0x00000000 0x00000000
0x200137c0 <ucHeap+78872>: 0x00000000 0x00000000 0x00000000 0x00000000
0x200137d0 <ucHeap+78888>: 0x00000000 0x00000000 0x00000000 0x00000000
0x200137e0 <ucHeap+78904>: 0x00000000 0x00000000 0x00000000 0x00000000
0x200137f0 <ucHeap+78920>: 0x00000000 0x00000000 0x00000000 0x00000000
0x20013800 <ucHeap+78936>: 0x00000000 0x00000000 0x00000000 0x00000000
0x20013810 <ucHeap+78952>: 0x00000000 0x00000000 0x00000000 0x00000000
0x20013820 <ucHeap+78968>: 0x00000000 0x00000000 0x00000000 0x00000000
0x20013830 <ucHeap+78984>: 0x00000000 0x00000000 0x00000000 0x00000000
0x20013840 <ucHeap+79000>: 0x00000000 0x00000000 0x00000000 0x00000000
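The 0xa5a5a5a5 words in the PSP dump match FreeRTOS's stack fill byte (tskSTACK_FILL_BYTE, 0xa5), which the kernel writes into new task stacks when certain options, such as stack-overflow checking method 2 or high-water-mark support, are enabled. Fill words above the current PSP are not necessarily alarming (e.g. a local array not yet written); the more useful number is how many untouched fill words remain at the low end of a task's stack. A minimal host-side sketch of that scan, with illustrative names; on target, `uxTaskGetStackHighWaterMark()` does the equivalent:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL_WORD 0xA5A5A5A5u /* tskSTACK_FILL_BYTE (0xa5) repeated in a 32-bit word */

/* Count untouched fill words from the lowest address of a descending stack. */
static size_t unused_stack_words(const uint32_t *stack_base, size_t stack_words)
{
    assert(stack_base != NULL);
    size_t n = 0;
    while (n < stack_words && stack_base[n] == STACK_FILL_WORD) {
        ++n;
    }
    return n;
}
```

A result close to zero for the task that was running (or for the main stack, if it is filled the same way) would point towards a stack overflow rather than a peripheral-triggered NMI.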
The CAN Rx interrupt is the only interrupt enabled; I have even disabled the tick interrupt. I have also disabled the CAN Tx interrupts, since transmissions should be so infrequent that the Tx FIFO/queue never fills. A few tasks are still running to service the CAN messages.
Our FreeRTOSConfig.h is attached.
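One thing worth ruling out with FreeRTOS on Cortex-M is an ISR that calls `...FromISR()` APIs while its NVIC priority is more urgent (numerically lower) than `configMAX_SYSCALL_INTERRUPT_PRIORITY`; such an ISR is not masked by the `msr basepri` in xPortPendSVHandler. The check is simple arithmetic. The sketch assumes 4 implemented priority bits (as on STM32G4) and a library value of 5, a common CubeMX default; the macros are stand-ins, so substitute the values from the actual FreeRTOSConfig.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in values (assumptions), mirroring how FreeRTOSConfig.h usually derives them. */
#define configPRIO_BITS                              4u
#define configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY 5u
#define configMAX_SYSCALL_INTERRUPT_PRIORITY \
    (configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY << (8u - configPRIO_BITS))

/* Priority as it ends up in the NVIC register: value shifted into the top bits. */
static uint32_t nvic_register_priority(uint32_t preempt_prio)
{
    assert(preempt_prio < (1u << configPRIO_BITS));
    return preempt_prio << (8u - configPRIO_BITS);
}

/* BASEPRI masks priorities numerically >= its value, so an ISR that calls
 * FreeRTOS ...FromISR() APIs must be at or below the syscall limit in urgency. */
static bool isr_may_call_freertos(uint32_t preempt_prio)
{
    return nvic_register_priority(preempt_prio) >= configMAX_SYSCALL_INTERRUPT_PRIORITY;
}
```

With `configASSERT()` defined, the Cortex-M4F port performs a similar check at run time in `vPortValidateInterruptPriority()`. The port also expects all priority bits to be preemption bits (NVIC_PRIORITYGROUP_4 in the STM32 HAL).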
So, my question is split in two: