CPU suddenly hangs

mail239955_stm1_stmicro · 2016-07-06

Posted on July 06, 2016 at 10:41

I have an STM32F407 (Discovery board) connected to an RN1723 UART WLAN module.

I'm using STM32CubeMX + FreeRTOS (all up to date).

Now I have a problem I have never encountered before when working with STM32. I have multiple tasks running, UART communication, etc., and at a specific point the CPU suddenly starts behaving erratically.

When I halt the target with the debugger, I see that it hangs in HAL_TIM_IRQHandler, but the timer status register (SR) is 0, so nothing inside the handler is actually executed.

By toggling a digital output (DO) in the handler, I can see that the CPU re-enters the handler in an endless loop. The CNT register is then always 0 as well.

However, when I set breakpoints to catch the CPU just before the problem occurs, I see that SR is suddenly 0xfffffff7 (all other timer registers seem to hold random values).

The UART also stops working in this situation. When I reduce the code complexity a bit, the problem may never occur; when I add more code, it occurs earlier.

I have both stack overflow detection and the MallocFailed handler enabled, but neither is triggered. I tried increasing the stack size of the tasks, which did not change anything.

In the call stack I only see this:

HAL_TIM_IRQHandler() at stm32f4xx_hal_tim.c:2.918 0x80010c2

<signal handler called>() at 0xfffffffd

prvPortStartFirstTask() at port.c:290 0x8002b68

xPortStartScheduler() at port.c:382 0x8002db4

0x0

Sometimes the HardFault Handler kicks in, then ALL timer registers are 0 again and the stack looks like this:

prvGetRegistersFromStack() at handlers.c:49 0x8003d44

HardFault_Handler() at stm32f4xx_it.c:59 0x80044bc

<signal handler called>() at 0xfffffff1

HAL_TIM_IRQHandler() at stm32f4xx_hal_tim.c:2.802 0x8000f96

<signal handler called>() at 0xfffffffd

prvPortStartFirstTask() at port.c:290 0x8002b68

xPortStartScheduler() at port.c:382 0x8002db4

0x0

I can only see that it comes from the ISR, which does not help much.

Something very strange with random symptoms is happening here, maybe someone can give me a hint how to track this down.

Thanks

Jan

mail239955_stm1_stmicro · 2016-07-06

Posted on July 06, 2016 at 11:20

When I disable the instruction cache, prefetch buffer and data cache, the program hangs earlier with a hard fault:

r0: 0x2000180C

r1: 0xFFFFFFF1

r2: 0x00000002

r3: 0x20006D5F

r12: 0x0000279F

lr: 0x2000180C

pc: 0xA5A5A5A5

psr: 0x08002C97

0xA5A5A5A5 is the stack fill pattern used by FreeRTOS. How can I debug this?
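A PC of 0xA5A5A5A5 suggests the stacked exception frame was read from stack memory that still held the fill byte. One way to see how close each task gets to the end of its stack is to count how much of the fill pattern survives. FreeRTOS offers uxTaskGetStackHighWaterMark() for this when INCLUDE_uxTaskGetStackHighWaterMark is set to 1. The following is only a host-runnable sketch of the same idea; the names stack_fill and stack_unused_bytes are made up for illustration and are not part of FreeRTOS:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_FILL_BYTE 0xA5u /* same byte value FreeRTOS writes into fresh task stacks */

/* Hypothetical helper: paint a stack buffer with the fill byte, as the
 * kernel does when a task is created. */
void stack_fill(uint8_t *stack_low, size_t stack_size)
{
    memset(stack_low, (int)STACK_FILL_BYTE, stack_size);
}

/* Hypothetical helper: count the bytes at the low end of a descending
 * stack that still hold the fill byte. A result of 0 means the stack
 * has been written all the way down to its lowest address. */
size_t stack_unused_bytes(const uint8_t *stack_low, size_t stack_size)
{
    size_t unused = 0;
    while (unused < stack_size && stack_low[unused] == STACK_FILL_BYTE) {
        unused++;
    }
    return unused;
}
```

A small or zero result for one task points at that task's stack. A healthy result for every task, as in this thread, hints that something other than stack depth is overwriting memory.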

AvaTar · 2016-07-06

Posted on July 06, 2016 at 13:33

My first guess would have been a stack overflow, too. FreeRTOS with its per-task stacks surely complicates things.

Another possibility is an out-of-bounds access to local (stack) data, which would also trash the stack.

If the place (i.e. the stack address) is consistent, perhaps a data watchpoint could help debugging.

mckenney · 2016-07-06

Posted on July 06, 2016 at 15:01

Check the Flash Wait States in the ACR for compatibility with your CPU speed.
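As a rough aid for that check, here is a sketch that computes the minimum number of wait states for a given HCLK. It assumes an STM32F405/407 with a supply voltage between 2.7 V and 3.6 V, where the reference manual lists one additional wait state per started 30 MHz up to 168 MHz; lower supply voltages need more. The function name is made up for illustration:

```c
#include <stdint.h>

/* Hypothetical helper: minimum flash wait states for an STM32F405/407,
 * assuming VDD between 2.7 V and 3.6 V.
 *   0 WS: up to 30 MHz, 1 WS: up to 60 MHz, ... 5 WS: up to 168 MHz.
 * Returns -1 if the frequency is zero or above the 168 MHz limit. */
int flash_wait_states_for_hclk(uint32_t hclk_hz)
{
    if (hclk_hz == 0u || hclk_hz > 168000000u) {
        return -1;
    }
    return (int)((hclk_hz - 1u) / 30000000u);
}
```

On the target, the result can be compared with the LATENCY field actually programmed, for example `FLASH->ACR & FLASH_ACR_LATENCY`. At 168 MHz the field should read at least 5.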

My observation is that the CPU can run "on the edge" for quite some time, and the symptoms are always (engineering term) "goofy".

mail239955_stm1_stmicro · 2016-07-06

Posted on July 06, 2016 at 17:04

Oh god, after looking for days I fixed it.

First: because it was hard to track down with the debugger, I turned on a DO before a critical code section and off again after it. When the error occurred, the DO stayed on. This way I tracked the error down to a line with a memcpy(). I think you can guess what the problem was 😉
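The post does not say what exactly was wrong with the memcpy() call. A common cause of this kind of random corruption is a copy length larger than the destination buffer, for example when the length comes from received UART data. Under that assumption, a bounds-checked wrapper could look like the sketch below; checked_copy is a made-up name, not a standard or HAL function:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n bytes into dst only if they fit.
 * On failure dst is left untouched and false is returned, so the
 * caller can log the event or stop in the debugger. */
bool checked_copy(void *dst, size_t dst_size, const void *src, size_t n)
{
    if (dst == NULL || src == NULL || n > dst_size) {
        return false;
    }
    memcpy(dst, src, n);
    return true;
}
```

Calling it as `checked_copy(buf, sizeof buf, rx_data, rx_len)` turns a silent overwrite of neighbouring memory into a return value that can be tested.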

Thanks anyway for the help. And for anyone else who runs into such problems: try to isolate the failing code by turning an LED on before it and off after it.
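The LED bracketing can be sketched as follows. To keep the example runnable on a PC, the output is a plain variable standing in for a GPIO register; the names, the chosen bit and the sample step are all made up for illustration:

```c
#include <stdint.h>

/* Host-side stand-in for a GPIO output data register (hypothetical).
 * On the target this would be the port register that drives the LED. */
static volatile uint32_t fake_gpio_odr;

#define MARKER_PIN_MASK (1u << 12) /* arbitrary bit chosen for this sketch */

static inline void marker_on(void)  { fake_gpio_odr |= MARKER_PIN_MASK; }
static inline void marker_off(void) { fake_gpio_odr &= ~MARKER_PIN_MASK; }
static inline int marker_is_on(void) { return (fake_gpio_odr & MARKER_PIN_MASK) != 0u; }

/* Run one suspect step between marker_on() and marker_off().
 * If the system dies inside the step, the marker stays on. */
void run_bracketed(void (*suspect)(void))
{
    marker_on();
    suspect();
    marker_off();
}

/* Sample step that only records whether the marker was set while it ran. */
int sample_saw_marker;
void sample_step(void) { sample_saw_marker = marker_is_on(); }
```

On real hardware, marker_on() and marker_off() would call something like HAL_GPIO_WritePin() on a spare pin. Moving the bracket to ever smaller code sections narrows the fault down to a single line, as happened here.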

This thread can be closed.