2026-01-18 11:47 PM
Hi everyone,
I am working on an STM32-based project and occasionally facing an issue where the MCU freezes during normal operation. There’s no hard fault or obvious error, and the issue doesn’t happen consistently, which makes it harder to trace.
So far, I have checked:
Stack and heap usage
Watchdog configuration
Power supply stability
Basic clock and reset settings
The project runs fine most of the time, but under certain conditions (long uptime or repeated operations), it becomes unresponsive and requires a reset.
I wanted to ask the community:
What are your go-to steps for debugging intermittent freezes on STM32?
Are there specific STM32 tools or registers you usually monitor for this?
Any common pitfalls you’ve seen related to low-power modes, interrupts, or RTOS usage?
I’m mainly looking for practical debugging tips or real-world experiences that could help narrow this down.
Thanks in advance for your help.
2026-01-19 3:23 AM - edited 2026-01-19 3:40 AM
@Ozone wrote: This is not an issue of static analysis.
What I mean by "static analysis" here is examining the source code to look for potential infinite loops; eg,
while( some_status() )
{
    /* Wait - do nothing */
}
If, due to some fault or unforeseen circumstance, some_status() never goes false, that will be an infinite loop - and the system would hang/freeze.
@robert3 There should always be some escape from every loop - eg, a timeout or count - which ensures it cannot get stuck.
eg, How to detect potential infinite loops.
And make sure that your Watchdog is not updated within such a loop.
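As a rough illustration of giving a wait loop an escape, here is a minimal sketch of a bounded wait. The names wait_while_busy, status_fn and tick_fn are made up for this example; on a HAL-based project the tick source could be a thin wrapper around HAL_GetTick(). The fake_* and *_status helpers at the bottom are only host-side stand-ins so the logic can be tried on a PC.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hook types: a status poll and a millisecond tick source. */
typedef bool     (*status_fn)(void);
typedef uint32_t (*tick_fn)(void);

/* Wait while busy() returns true, but give up after timeout_ms.
 * Returns true if busy() went false in time, false on timeout. */
bool wait_while_busy(status_fn busy, tick_fn now_ms, uint32_t timeout_ms)
{
    const uint32_t start = now_ms();

    while (busy()) {
        /* Unsigned subtraction stays correct across tick wrap-around. */
        if ((uint32_t)(now_ms() - start) >= timeout_ms) {
            return false;   /* caller decides how to report or recover */
        }
    }
    return true;
}

/* --- Host-side stand-ins (not target code) --- */
static uint32_t fake_ticks;
static uint32_t fake_now_ms(void)  { return fake_ticks++; }
static bool     stuck_status(void) { return true;  }  /* never clears   */
static bool     clear_status(void) { return false; }  /* already clear  */
```

The point of returning a result rather than looping forever is that the caller can log the timeout, which also gives you a data point about where the system was stuck.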
Also, beware of the standard ST-style:
if( HAL_TIM_Base_Init(&handle) != HAL_OK )
{
    Error_Handler();
}
because that Error_Handler() is just an infinite loop:
/**
  * @brief  This function is executed in case of error occurrence.
  * @retval None
  */
void Error_Handler(void)
{
    /* USER CODE BEGIN Error_Handler_Debug */
    /* User can add his own implementation to report the HAL error return state */
    __disable_irq();
    while (1)
    {
    }
    /* USER CODE END Error_Handler_Debug */
}
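One possible alternative to spinning forever is to record where the error happened and then reset, so the cause can be reported at the next boot. The following is only a sketch under stated assumptions: error_report, error_take_record, REPORT_ERROR and the reset hook are hypothetical names, not part of the ST HAL. On a real target the record would need to sit in RAM that the start-up code does not clear, and the hook could call NVIC_SystemReset().

```c
#include <stddef.h>
#include <stdint.h>

#define ERROR_RECORD_MAGIC 0xE7704A11u  /* arbitrary marker value */

typedef struct {
    uint32_t    magic;  /* ERROR_RECORD_MAGIC when the record is valid */
    uint32_t    line;
    const char *file;   /* points at a string literal */
} error_record_t;

/* Assumption: on target, place this in RAM that start-up code does not
 * zero (a "no-init" section; the syntax is toolchain-specific). */
static error_record_t g_error_record;

/* Hypothetical reset hook. On a Cortex-M target it could call
 * NVIC_SystemReset(); it is injectable here so the logic runs on a PC. */
static void (*g_reset_hook)(void);

void error_set_reset_hook(void (*hook)(void)) { g_reset_hook = hook; }

void error_report(const char *file, uint32_t line)
{
    g_error_record.file  = file;
    g_error_record.line  = line;
    g_error_record.magic = ERROR_RECORD_MAGIC;  /* written last */
    if (g_reset_hook != NULL) {
        g_reset_hook();
    }
}

#define REPORT_ERROR() error_report(__FILE__, (uint32_t)__LINE__)

/* Call early at boot. Returns 1 and copies the record if the previous run
 * ended in error_report(); the record is cleared so it is reported once. */
int error_take_record(error_record_t *out)
{
    if (g_error_record.magic != ERROR_RECORD_MAGIC) {
        return 0;
    }
    *out = g_error_record;
    g_error_record.magic = 0;
    return 1;
}

/* --- Host-side stand-in for the reset (not target code) --- */
static int  fake_reset_count;
static void fake_reset(void) { fake_reset_count++; }
```

Whether resetting is the right reaction depends on the product; during bring-up, a breakpoint or a blinking LED in the handler may be more useful than a silent restart.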
2026-01-19 3:37 AM
@Ozone wrote: A hardware issue, like a supply ... instability, usually manifests either in the first milliseconds, or the first 15 minutes until the board reaches thermal equilibrium. Which is not the case here.
Hmmm ...
Supply issues can be due to things which don't happen often; eg a combination of things which rarely happen at once but, when they do, can cause a glitch.
They can also be caused by external things which cause the input to the system to fluctuate or become noisy.
@robert3 this is why it's important for you to describe the external environment in which your system operates.
I once had a tricky bug with a remote-monitoring unit which was being retro-fitted into coffee machines.
The units would occasionally lose comms, and require a site visit to restart.
We eventually found that it was due to the operator performing a cleaning cycle on the machine - which glitched the power supply.
The key to finding it was to add instrumentation - which allowed us to see exactly when it happened, and link that to the cleaning visit.
2026-01-19 4:24 AM
> What I mean by "static analysis" here is examining the source code to look for potential infinite loops; eg, ...
I think too many posts in one thread confuse me ... I thought this was posted by the OP.
Anyway, a somewhat experienced Cortex M / STM32 developer knows that basically all fault handlers in all toolchains come with such a template implementation. And he should be aware of the consequences, be it the resulting error behaviour, or the subsequent need to implement an improved handler.
On a related note, static analysis does not cover interrupts and faults ...
> Hmmm ...
> Supply issues can be due to things which don't happen often; ...
Yes, I'm aware such errors can happen at later times. But the likelihood is relatively small compared to startup and warm-up time issues.
I usually check for the most probable error causes first.
The "unresponsive" description most likely excludes supply issues, which would either cause a restart (not unresponsiveness), or likely persist after a reset.
A clock issue could usually be identified with a debugger as well.
2026-01-19 5:02 AM
@Ozone wrote: On a related note, static analysis does not cover interrupts and faults
I'm just using "static analysis" in the generic sense of inspecting the source code without running it - not implying any particular automated tool to do that.
Maybe call it "code inspection", if you prefer...
2026-01-19 5:03 AM
Double check stack usage. Note that total stack usage can exceed what the compiler thinks it is. If you enable an interrupt you need to add the stack usage of that interrupt to your worst case stack usage of function calls from main/threads. If you enable nested interrupts you can get 2 interrupts at the worst possible moment and need to add both of them to your worst case stack usage. You can calculate stack usage or measure it by putting a known pattern in memory.
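A minimal sketch of the "known pattern" measurement mentioned above: fill the stack region with a marker word, let the system run, then count how many words at the low end are still untouched. The function names and the 0xA5A5A5A5 marker are choices made for this example. On a real target the fill has to happen before the region is in use (typically in start-up code), and the region bounds would come from the linker script.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_PAINT_WORD 0xA5A5A5A5u  /* marker pattern, chosen arbitrarily */

/* Fill a memory region with the marker pattern. */
void stack_paint(uint32_t *lowest, size_t words)
{
    for (size_t i = 0; i < words; ++i) {
        lowest[i] = STACK_PAINT_WORD;
    }
}

/* Count untouched words starting at the lowest address. Cortex-M stacks
 * grow downward, so the untouched words at the bottom are the remaining
 * headroom since the region was painted. */
size_t stack_unused_words(const uint32_t *lowest, size_t words)
{
    size_t n = 0;
    while (n < words && lowest[n] == STACK_PAINT_WORD) {
        ++n;
    }
    return n;
}
```

If the unused count ever reaches zero, or gets close to it after a long run, the stack is a strong suspect. A value on the stack that happens to equal the marker can make the result slightly optimistic, so treat it as an estimate.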
Another common problem is hardfault. I recommend turning on an LED in the hardfault handler.
Infinite loops were mentioned before.
Uninitialized stack variables can cause problems that only show up after a long time. Their value depends on what happens to be on the stack at that moment.
You checked the power supply voltage, but you didn't check it at the time of the crash I presume. So there could still be a dip. I recommend enabling the brown out detector.
If you enable a watchdog timer you will automatically reset if your program freezes.
At boot you can check the reset cause. So if the program resets due to the watchdog or a brown-out, you can show that.
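A sketch of decoding the reset cause from a raw register value. The bit positions below are an assumption based on the RCC_CSR layout of the STM32F4 family; other families use different positions or a different register, so check the reference manual for your part. The enum and function names are made up for this example.

```c
#include <stdint.h>

/* ASSUMPTION: RCC_CSR reset flag positions as on STM32F4.
 * Verify against the reference manual for your device. */
#define CSR_LPWRRSTF (1u << 31)  /* low-power reset        */
#define CSR_WWDGRSTF (1u << 30)  /* window watchdog        */
#define CSR_IWDGRSTF (1u << 29)  /* independent watchdog   */
#define CSR_SFTRSTF  (1u << 28)  /* software reset         */
#define CSR_PORRSTF  (1u << 27)  /* power-on / power-down  */
#define CSR_PINRSTF  (1u << 26)  /* NRST pin               */
#define CSR_BORRSTF  (1u << 25)  /* brown-out              */

typedef enum {
    RESET_UNKNOWN,
    RESET_LOW_POWER,
    RESET_WINDOW_WATCHDOG,
    RESET_INDEPENDENT_WATCHDOG,
    RESET_SOFTWARE,
    RESET_POWER_ON,
    RESET_BROWN_OUT,
    RESET_PIN
} reset_cause_t;

/* Several flags can be set at once (the pin flag often accompanies other
 * causes), so the more specific causes are tested first and the pin last. */
reset_cause_t reset_cause_decode(uint32_t csr)
{
    if (csr & CSR_LPWRRSTF) return RESET_LOW_POWER;
    if (csr & CSR_WWDGRSTF) return RESET_WINDOW_WATCHDOG;
    if (csr & CSR_IWDGRSTF) return RESET_INDEPENDENT_WATCHDOG;
    if (csr & CSR_SFTRSTF)  return RESET_SOFTWARE;
    if (csr & CSR_PORRSTF)  return RESET_POWER_ON;
    if (csr & CSR_BORRSTF)  return RESET_BROWN_OUT;
    if (csr & CSR_PINRSTF)  return RESET_PIN;
    return RESET_UNKNOWN;
}

const char *reset_cause_name(reset_cause_t cause)
{
    switch (cause) {
    case RESET_LOW_POWER:            return "low-power";
    case RESET_WINDOW_WATCHDOG:      return "window watchdog";
    case RESET_INDEPENDENT_WATCHDOG: return "independent watchdog";
    case RESET_SOFTWARE:             return "software";
    case RESET_POWER_ON:             return "power-on";
    case RESET_BROWN_OUT:            return "brown-out";
    case RESET_PIN:                  return "NRST pin";
    default:                         return "unknown";
    }
}
```

On target you would read the register once at boot, pass the value to reset_cause_decode(), log or display the result, and then clear the flags so the next reset is reported correctly.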
Improper DMA usage can corrupt your memory and cause unexpected behavior of your program.
2026-01-19 5:05 AM
@robert3 is this a new project, or something old & established which has just recently started showing this behaviour?
Or even something old & established which has always done this, and is only now being investigated?
2026-01-19 6:39 AM
@unsigned_char_array wrote: If you enable a watchdog timer you will automatically reset if your program freezes.
Unless you do something silly like updating the watchdog in a timer interrupt ...
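One way to avoid that trap, sketched here with hypothetical names, is to refresh the watchdog only when every monitored part of the program has recently checked in. A stuck main loop then stops the refresh even if timer interrupts keep firing. The watchdog_hw_refresh() function below is a stand-in that just counts calls; on a HAL project it could call HAL_IWDG_Refresh() instead.

```c
#include <stdint.h>

/* Hypothetical task bits; add one per loop or thread you want supervised. */
#define TASK_MAIN_LOOP (1u << 0)
#define TASK_COMMS     (1u << 1)
#define ALL_TASKS      (TASK_MAIN_LOOP | TASK_COMMS)

static volatile uint32_t g_alive_flags;
static uint32_t          g_refresh_count;  /* stand-in for the hardware */

/* Stand-in for the real refresh; counts calls so the logic can be checked. */
static void watchdog_hw_refresh(void) { g_refresh_count++; }

/* Each supervised task calls this once per healthy iteration.
 * Note: the read-modify-write is not atomic; if tasks can preempt each
 * other, protect it (e.g. a short critical section). */
void watchdog_checkin(uint32_t task_bit)
{
    g_alive_flags |= task_bit;
}

/* Call periodically. The hardware is refreshed only if all tasks have
 * checked in since the previous refresh. */
void watchdog_service(void)
{
    if ((g_alive_flags & ALL_TASKS) == ALL_TASKS) {
        g_alive_flags = 0;
        watchdog_hw_refresh();
    }
}
```

With this arrangement the watchdog timeout has to be longer than the slowest task's check-in interval, otherwise a healthy system will reset itself.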