How to debug code freeze in STM32F401

jhnlmn · 2014-03-04

Posted on March 05, 2014 at 01:28

Hi,

I am running my own code on STM32F401 Discovery board.

Sometimes (pretty rarely) the code freezes.

I added sufficient logging to HardFault_Handler, but it is not executed.

I am not sure, but most likely the code runs some loop with interrupts disabled.

I cannot send any input to it, cannot print the stack or examine registers, nothing.

I thought of using NMI, but it appears that the STM32F401 has no means of invoking NMI externally. The manual says that NMI is invoked when ''HSE clock happens to fail'', but how can I make it fail?

I also searched this forum and saw the advice ''use JTAG'', which in the case of the STM32F401 Discovery would translate to ''use SWD'', but I cannot figure out how to connect to a running board without resetting it.

I am using the latest IAR EWARM 6.70. It has the Debug option ''Attach to running target'', but it does not work. It shows the message ''The debugging session could not be started ...''

But if I disable ''Attach to running target'', then the debugger always resets the MCU and stops at main.

So, please tell me: how do you debug freezes on the STM32F401 Discovery board or similar boards?

I would prefer to use NMI rather than debugger if possible.

Thank you

#dbgmcu

Tesla DeLorean · 2014-03-04

Posted on March 05, 2014 at 02:45

Couldn't you just leave it running in the debugger, and simply hit the STOP button when you need to know where it is?

You could kill the HSE by grounding the OSC_IN side of the crystal. If you removed the crystal and set the solder bridges to derive the clock from the ST-LINK's F103 PA8 MCO pin, you could route that feed through a jumper and pull it to make the clock go away.

You need to get some diagnostic output from your application, either via a USART or via SWO using the Serial Wire Viewer (SWV) in the ST-LINK Utility. You could track heart-beat interrupts, or add check-points to see where the code is stuck. Once you can determine the rough location, you can add more checks as you bisect the issue.
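The check-point idea can be sketched as a small trace buffer in RAM. This is a minimal illustration, not code from the thread: `checkpoint`, `checkpoint_recent`, `CHECKPOINT` and the buffer size are hypothetical names and choices. Each call records a 16-bit ID; after a freeze, the newest entries (read over USART/SWO, or from a RAM dump) show which code paths ran last.

```c
#include <stdint.h>

/* Hypothetical check-point trace: each CHECKPOINT(id) stores a small ID
 * in a ring buffer. The newest entries show the last code paths reached. */

#define TRACE_LEN 32u                /* must be a power of two */

static volatile uint16_t trace_buf[TRACE_LEN];
static volatile uint32_t trace_head; /* total number of check-points recorded */

static void checkpoint(uint16_t id)
{
    /* Single writer assumed. If ISRs also record check-points, disable
     * interrupts around these two statements. */
    trace_buf[trace_head & (TRACE_LEN - 1u)] = id;
    trace_head = trace_head + 1u;
}

/* Return the n-th most recent check-point (0 = newest), or 0xFFFF if the
 * buffer does not hold that many entries. */
static uint16_t checkpoint_recent(uint32_t n)
{
    uint32_t head = trace_head;
    if (n >= head || n >= TRACE_LEN) {
        return 0xFFFFu;
    }
    return trace_buf[(head - 1u - n) & (TRACE_LEN - 1u)];
}

#define CHECKPOINT(id) checkpoint((uint16_t)(id))
```

Bisecting then means adding more CHECKPOINT() calls between the last ID that appears and the first one that does not.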

You could perhaps use a GPIO and an LED to observe where things get stuck. You could use buttons or EXTI interrupts to break out. You could use the watchdog.
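For the LED approach, a pass counter keeps a busy check-point from toggling the LED too fast to see. A sketch assuming a hypothetical `led_toggle()` hook; on the Discovery board it would wrap a GPIO write to one of the four user LEDs, but here it only flips a flag so the logic can run on a PC. `pass_marker` and `pass_marker_hit` are illustrative names.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical LED hook: on target this would toggle a GPIO pin. */
static bool led_state[4];
static void led_toggle(unsigned led) { led_state[led] = !led_state[led]; }

/* One counter per check-point/LED pair. The LED toggles once every
 * `every` passes; an LED that stops blinking marks a path that is no
 * longer being reached. */
typedef struct {
    uint32_t count;  /* passes since the last toggle */
    uint32_t every;  /* toggle period in passes */
    unsigned led;    /* which LED this check-point drives */
} pass_marker;

static void pass_marker_hit(pass_marker *m)
{
    if (++m->count >= m->every) {
        m->count = 0;
        led_toggle(m->led);
    }
}
```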

Tips, Buy me a coffee, or three.. PayPal Venmo
Up vote any posts that you find helpful, it shows what's working..

jhnlmn · 2014-03-04

Posted on March 05, 2014 at 03:37

Thank you for your response.

> Couldn't you just leave it run in the debugger

It is difficult for several reasons:

1. My code has __WFI() calls in a few places.

And I found that ST-LINK connection becomes unstable when __WFI() is called.

For example, if I run

ST-LINK_CLI.exe -Rst

then I get

No target connected

Unable to connect to ST-LINK!

It does not happen every time, but pretty often.

And the debugger loses the connection too; sometimes it terminates, sometimes it crashes.

Well, maybe I can remove all __WFI() calls from my code to get a stable ST-LINK, but that would be unfortunate.

2. I am running 3 Discovery boards with the same code at the same time, talking to each other, and there is no way to predict which one will freeze.

Unfortunately, ST-LINK does not support simultaneous connections to several Discovery boards. It always connects to whichever one happens to be first in the Windows Device Manager list. So there is no way to run 3 debuggers at the same time.

Maybe I can get 3 PCs to run 3 debuggers, but soon I will have to run 100 custom boards with the same code, so running 100 debuggers will not work.

So, if there is a way to connect a debugger to an already frozen board at run time, that would be great.

> You could kill the HSE by grounding the OSC_IN side of the crystal. If you removed the crystal ...

No, that is too big a change, and I would have to repeat it on multiple boards. Too much trouble.

Just out of curiosity, if I ground the crystal and kill the HSE, how is the MCU going to work?

Or do you mean that instead of the crystal I should use an external clock?

And should this external clock be used before or after the moment when I need the NMI?

> You need to get some diagnostic output from your application, either via a USART

I was preparing to do that, but then I saw this post:

https://my.st.com/public/STe2ecommunities/mcu/Lists/cortex_mx_stm32/Flat.aspx?RootFolder=%2Fpublic%2FSTe2ecommunities%2Fmcu%2FLists%2Fcortex_mx_stm32%2FSTM32%20freezing&FolderCTID=0x01200200770978C69A1141439FE559EB459D7580009C4E14902C3CDE46A77F0FFD06506F5B&currentviews=618

where a guy was trying to use the serial port for debugging, but you wrote

''break-in with JTAG, where is the program stuck''

So I thought that you knew some magic bullet for how to ''break-in with JTAG''.

Thank you

Tesla DeLorean · 2014-03-04

Posted on March 05, 2014 at 04:10

Well, if you've configured WFI to drop into a low power mode, it will drop the debug connection (i.e., the power regulator turns off, and the clocks/data into the part stop functioning). Check the DBGMCU options/settings.
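On STM32F4 parts these DBGMCU settings live in the DBGMCU_CR register; the StdPeriph `DBGMCU_Config(..., ENABLE)` calls quoted later in the thread set the same bits. A register-level sketch, assuming the bit positions (DBG_SLEEP = bit 0, DBG_STOP = bit 1, DBG_STANDBY = bit 2) and the 0xE0042004 address that the STM32F4 reference manual gives; check them against the manual for your exact part. The function name is hypothetical, and the register pointer is a parameter only so the logic can be exercised on a host.

```c
#include <stdint.h>

/* DBGMCU_CR address and bits, assumed from the STM32F4 reference manual. */
#define DBGMCU_CR_ADDR   0xE0042004u
#define DBG_SLEEP_BIT    (1u << 0)  /* keep debug clocks running in Sleep */
#define DBG_STOP_BIT     (1u << 1)  /* keep debug clocks running in Stop */
#define DBG_STANDBY_BIT  (1u << 2)  /* keep debug clocks running in Standby */

/* On target, call with (volatile uint32_t *)DBGMCU_CR_ADDR early in main(),
 * before the first __WFI(). Other bits in the register are preserved. */
static void dbgmcu_keep_debug_in_low_power(volatile uint32_t *dbgmcu_cr)
{
    *dbgmcu_cr |= DBG_SLEEP_BIT | DBG_STOP_BIT | DBG_STANDBY_BIT;
}
```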

I can't say I've done postmortem analysis on devices with IAR + SWD; I've done it with Keil/RealView + JTAG.

You said stopping the HSE generates an NMI. I'd presume that means it has a clock-monitoring mechanism and can switch to HSI, or something similar; refer to the documentation on that feature (the reference manual calls it the Clock Security System, CSS).

I can't speak to the impracticality of your situation; you should probably walk the code and try to isolate the reasons or mechanisms by which it might get stuck. ST has heaps of code that spins in endless loops, such as waiting for I2C bus states. You might want to review such constructs and put time-outs in those loops. You have a handful of LEDs; have those toggle each time you pass a check-point a few thousand times, and try to identify the code paths that are locking up.
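Putting a time-out in a status-polling loop can look like the sketch below. It assumes the wait is expressed as a predicate callback standing in for a status-bit read (for example an I2C event flag); `wait_until`, `fake_flag` and the plain iteration budget are hypothetical choices. A SysTick- or cycle-counter-based deadline would be more precise, but a count keeps the sketch portable.

```c
#include <stdint.h>
#include <stdbool.h>

/* Replaces `while (!flag_set()) { }`, which spins forever if the
 * peripheral never answers, with a bounded poll the caller can handle. */
typedef bool (*poll_fn)(void *ctx);

typedef enum { WAIT_OK = 0, WAIT_TIMEOUT = 1 } wait_result;

static wait_result wait_until(poll_fn done, void *ctx, uint32_t max_polls)
{
    for (uint32_t i = 0; i < max_polls; i++) {
        if (done(ctx)) {
            return WAIT_OK;
        }
    }
    return WAIT_TIMEOUT;  /* caller can log a check-point, reset the bus, etc. */
}

/* Host-side stand-in for a peripheral flag (hypothetical): reports ready
 * after `remaining` unsuccessful polls. */
typedef struct { uint32_t remaining; } fake_flag;

static bool fake_flag_ready(void *ctx)
{
    fake_flag *f = ctx;
    if (f->remaining == 0) {
        return true;
    }
    f->remaining--;
    return false;
}
```

Returning a status instead of spinning means a stuck bus shows up as a logged time-out rather than a silent freeze.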

Use a watchdog timer, store check-point information in RAM, catch the watchdog-flagged reset, and store/output what occurred immediately prior. Correlate that information, and use it to focus on possible causes or code paths involved in your failure.
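A sketch of the "catch the watchdog-flagged reset" step, assuming the STM32F4 RCC_CSR reset-flag layout (RMVF = bit 24 through LPWRRSTF = bit 31) from the reference manual; verify it for your device. The record must live in RAM that the startup code does not zero (for example an IAR `__no_init` variable; with GCC, a linker-script-dependent `.noinit` section). The magic value separates a record written before a watchdog reset from power-on garbage. All names are hypothetical, and the CSR value is passed in so the logic can run on a host.

```c
#include <stdint.h>

/* RCC_CSR reset flags (assumed STM32F4 layout). Flags are sticky until
 * software writes RMVF, and several can be set after one reset. */
#define CSR_RMVF     (1u << 24)  /* write 1 to clear all reset flags */
#define CSR_BORRSTF  (1u << 25)
#define CSR_PINRSTF  (1u << 26)
#define CSR_PORRSTF  (1u << 27)
#define CSR_SFTRSTF  (1u << 28)
#define CSR_IWDGRSTF (1u << 29)
#define CSR_WWDGRSTF (1u << 30)
#define CSR_LPWRRSTF (1u << 31)

typedef enum {
    RESET_POWER_ON, RESET_IWDG, RESET_WWDG, RESET_SOFTWARE, RESET_PIN, RESET_OTHER
} reset_cause;

/* Most specific flags first, so a stale or accompanying flag does not hide
 * a watchdog reset. */
static reset_cause decode_reset(uint32_t csr)
{
    if (csr & CSR_IWDGRSTF) return RESET_IWDG;
    if (csr & CSR_WWDGRSTF) return RESET_WWDG;
    if (csr & (CSR_PORRSTF | CSR_BORRSTF)) return RESET_POWER_ON;
    if (csr & CSR_SFTRSTF) return RESET_SOFTWARE;
    if (csr & CSR_PINRSTF) return RESET_PIN;
    return RESET_OTHER;
}

#define CRASH_MAGIC 0xC0DEF00Du

/* On target: __no_init crash_record crash_rec; (IAR) */
typedef struct {
    uint32_t magic;
    uint32_t last_checkpoint;
} crash_record;

static void crash_checkpoint(crash_record *rec, uint32_t id)
{
    rec->last_checkpoint = id;
}

/* Call at the top of main(). Returns the check-point saved before a
 * watchdog reset, or 0 if there is nothing valid to report, then re-arms
 * the record. The caller would print the result over USART/SWO and write
 * RMVF so the next boot sees fresh flags. */
static uint32_t postmortem_checkpoint(uint32_t csr, crash_record *rec)
{
    uint32_t cp = 0;
    reset_cause cause = decode_reset(csr);
    if ((cause == RESET_IWDG || cause == RESET_WWDG) && rec->magic == CRASH_MAGIC) {
        cp = rec->last_checkpoint;
    }
    rec->magic = CRASH_MAGIC;
    rec->last_checkpoint = 0;
    return cp;
}
```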


os_kopernika · 2014-03-07

Posted on March 07, 2014 at 10:37

''Check DBGMCU options/settings.''

+1

That is the first thing you should have done BEFORE starting the project.

Just after reading the errata, which is item zero on the list.

jhnlmn · 2014-03-09

Posted on March 10, 2014 at 01:43

> ''Check DBGMCU options/settings.''

> That is the first thing you should have done BEFORE starting the project.

I was studying the sample projects for the Discovery board from stsw-stm32136.zip, and they do not have any DBGMCU_ calls. How could I have guessed I needed to do something that STM did not do themselves?

I was using Projects\Demonstration for experiments. I added __WFI(); to the loop in Delay(). Now I have also added

DBGMCU_Config(DBGMCU_SLEEP, ENABLE);
DBGMCU_Config(DBGMCU_STOP, ENABLE);
DBGMCU_Config(DBGMCU_STANDBY, ENABLE);

But the problem still persists. For example,

ST-LINK_CLI.exe -Rst

causes:

No target connected

Unable to connect to ST-LINK!

on most runs. So I had to comment out all __WFI() calls to make ST-Link stable.

Anyway, as I explained before, even if the debugger worked well, I still could not run each of my 100 boards under a debugger. I need postmortem analysis.

> try to identify the code paths which are locking up.

I tried making global variables with the __no_init attribute, assigning them in several places (at the beginning and end of each ISR, at the __disable_interrupt and __enable_interrupt calls, etc.), enabling IWDG, and then printing those variables at the beginning of main(). But so far I have failed to isolate the problem this way.
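Since the suspicion is a loop running with interrupts disabled, one variant of this instrumentation records which call site turned interrupts off and whether they were ever turned back on. A sketch under stated assumptions: the IAR intrinsics `__disable_interrupt()`/`__enable_interrupt()` are replaced by hypothetical stubs so it builds on a PC, `irq_disable_at`/`irq_enable_at`/`IRQ_DISABLE` are illustrative names, and the nesting counter is a design choice (the plain intrinsics do not nest; drop the counter if the code relies on that). On target, `irq_log` would be the `__no_init` variable that main() prints after an IWDG reset, or that you read from a RAM dump.

```c
#include <stdint.h>

/* Stubs standing in for __disable_interrupt() / __enable_interrupt(). */
static void irq_off_stub(void) { }
static void irq_on_stub(void) { }

typedef struct {
    uint32_t depth;          /* current nesting level of disabled sections */
    uint32_t last_off_site;  /* ID of the most recent disable call */
    uint32_t outer_off_site; /* ID of the call that went from depth 0 to 1 */
} irq_trace;

static volatile irq_trace irq_log;  /* on target: __no_init */

static void irq_disable_at(uint32_t site)
{
    irq_off_stub();  /* interrupts off before touching the log */
    if (irq_log.depth == 0) {
        irq_log.outer_off_site = site;
    }
    irq_log.depth++;
    irq_log.last_off_site = site;
}

static void irq_enable_at(void)
{
    if (irq_log.depth > 0) {
        irq_log.depth--;
    }
    if (irq_log.depth == 0) {
        irq_on_stub();  /* re-enable only when the outermost section ends */
    }
}

/* __LINE__ is only unique within one file; combine it with a per-file ID
 * in a multi-file project. */
#define IRQ_DISABLE() irq_disable_at((uint32_t)__LINE__)
#define IRQ_ENABLE()  irq_enable_at()
```

A frozen board whose dump shows `depth > 0` points straight at `outer_off_site` as the section that never re-enabled interrupts.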

.....

It looks like I found a way to do postmortem analysis. Here are the steps:

1. Disable IWDG (to let my program freeze instead of resetting).

2. Once program is frozen, open ST-Link utility.

3. Change Settings | Connection Mode = Hot Plug

4. Click Connect to Target

5. Read RAM. Address 0x20000000 Size 0x10000 (or whatever your MCU has)

6. Save RAM: File | Save As. Select Intel Hex format. Name: ram.hex

7. Target | MCU Core. It should show the registers. Take a screenshot.

8. Target | Disconnect

9. Connect the IAR debugger (this will reset the MCU).

10. View | Registers. Manually enter the values from the screenshot.

11. Debug | Memory | Restore. Open ram.hex.

Now I can see the place where it froze, the stack, and the variables. It is almost as good as breaking into an existing debugger session, except that I cannot resume execution because the peripherals are in the wrong state.
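If you only need a few values from the dump (for example check-point variables at known addresses from the linker map), the ram.hex file saved in step 6 can also be read without the debugger. This is a host-side helper, not part of the procedure above; it assumes the file is standard Intel HEX using data (00), end-of-file (01) and extended linear address (04) records, and `parse_hex_record`/`hex_record` are hypothetical names.

```c
#include <stdint.h>
#include <stdbool.h>
#include <string.h>

static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

static int hex_byte(const char *s)
{
    int hi = hex_nibble(s[0]), lo = hex_nibble(s[1]);
    return (hi < 0 || lo < 0) ? -1 : (hi << 4) | lo;
}

typedef struct {
    uint8_t  type;      /* 00 data, 01 EOF, 04 extended linear address, ... */
    uint32_t addr;      /* full 32-bit address of the first data byte */
    uint8_t  len;
    uint8_t  data[255];
} hex_record;

/* Parse one ":LLAAAATT<data>CC" line. `base` carries the upper 16 address
 * bits set by type-04 records between calls; other record types do not
 * change it. Returns false on malformed input or a bad checksum. */
static bool parse_hex_record(const char *line, uint32_t *base, hex_record *out)
{
    size_t n = strlen(line);
    while (n > 0 && (line[n - 1] == '\n' || line[n - 1] == '\r')) n--;
    if (n < 11 || line[0] != ':' || (n - 1) % 2 != 0) return false;

    uint8_t bytes[5 + 255];          /* len + addr(2) + type + data + checksum */
    size_t count = (n - 1) / 2;
    if (count > sizeof bytes) return false;

    uint8_t sum = 0;
    for (size_t i = 0; i < count; i++) {
        int b = hex_byte(line + 1 + 2 * i);
        if (b < 0) return false;
        bytes[i] = (uint8_t)b;
        sum = (uint8_t)(sum + b);
    }
    if (sum != 0) return false;                       /* all bytes sum to 0 mod 256 */
    if (count != (size_t)bytes[0] + 5) return false;  /* length field must match */

    out->len = bytes[0];
    out->type = bytes[3];
    uint16_t offset = (uint16_t)((bytes[1] << 8) | bytes[2]);
    memcpy(out->data, bytes + 4, out->len);

    if (out->type == 0x04) {
        if (out->len != 2) return false;
        *base = ((uint32_t)out->data[0] << 24) | ((uint32_t)out->data[1] << 16);
    }
    out->addr = *base + offset;
    return true;
}
```

Feeding the file line by line with one shared `base` yields absolute addresses such as 0x20000010, which can be matched against the map file.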