Why doesn't my STM32 start?

Tomas DRESLER · 2021-05-03

Let's look at the following aspects in more detail:

1. HW considerations
2. Start-up configuration
3. Software
4. Summary (TL;DR)

1. HW considerations

Datasheet and other documentation

Please always refer to:

the datasheet for electrical and mechanical properties
the reference manual for description of the functionality
the latest STM32 errata sheet; implement all recommended workarounds that may affect your design
the hardware getting started application note
the application notes associated with your STM32

Power supply

The power supply is critical to STM32 operation. For a proper start, the supply voltage must rise from zero to at least the minimum start-up voltage (the default BOR level from the Option Bytes) on the Vdd and Vdda supply pins. After the internal reset is released and the BOR level is updated from the user Option Bytes, the voltage may drop to the minimum allowed for the given BOR level, bearing in mind that the BOR circuitry has a defined hysteresis.
Beware that for a proper start-up, the supply voltage must drop to 0 V before it is reapplied, otherwise a proper power-on reset is not guaranteed. The supply may fail to reach 0 V if an external voltage is present on GPIOs while Vdd is disconnected. IO_FT pins are preferred for such signals.
You can monitor proper supply of STM32 by several means:

reading voltage with a multimeter on Vdd-Vss and Vdda-Vssa pairs. It must match designed supply voltage and operating range of STM32
reading the supply current of the STM32. On ST evaluation boards this is possible by means of a jumper in the split Vdd path. The start-up current should be in the range of several mA up to tens of mA. Too high a current (100 mA or more) can mean a damaged STM32, a short circuit on the PCB etc. Good practice during evaluation is to use an external power supply with the current limit set to the expected current draw.
measuring NRST pin - if it's permanently low, the supply voltage isn't sufficient to release the reset circuitry, or the STM32 is damaged
measuring the Vcap voltage, if the Vcap pins are present, in the range of 1.1 - 1.3 V

The STM32 is supplied via several voltage domains:

digital (IOs, backup domain, Vcore voltage regulator) via multiple Vdd-Vss supply pairs
analog (ADC, DAC, comparators, PLL, reset) via Vdda-Vssa pair
core voltage, available as Vcap (from none to multiple pins)
backup domain (RTC, backup RAM) via Vbat pin
voltage domains for specific peripherals (USB, MIPI DSI, LCD, GPIO_1.8V) with dedicated supply pins

There are some requirements for the supply organization, usually described in the Hardware Getting Started application note for the specific STM32 family. The allowable voltages and their sequencing are defined in the product datasheet.

There are common requirements for the STM32 supply:

all Vss pins must be connected together with maximum voltage difference 50 mV, meaning very good ground plane
all Vdd pins must be connected together with a maximum voltage difference of 50 mV, meaning a very good supply connection. This means no added inductance between any two Vdd pins; such inductance can have disastrous consequences for the STM32.
all Vdd-Vss pairs require a 100 nF decoupling capacitor as close as possible to the package
Vbat pin shall be connected to backup battery, if required by the design, or to Vdd. Connecting Vbat to Vss short-circuits the STM32 and damages it.
Vdda can be supplied from the same voltage source as Vdd and it is allowed to decouple it with an LC filter to guarantee stable supply for analog peripherals. Beware of timing characteristics from the datasheet.
The typical maximum voltage for any supply pin is 3.6 V; the absolute maximum for a short time is 4 V. Exceeding it, even temporarily or as an effect of added series inductance, can damage the chip permanently! Please design your power supply and ESD protection to guarantee these limits.
The core voltage, generated by the internal linear voltage regulator, may be available on the Vcap1 and Vcap2 pins. If these pins are available on the package, each of them must be fitted with a low-ESR capacitor with a typical capacitance of 2.2 - 4.7 uF (see the datasheet). Do not mistakenly fit a 4.7 nF capacitor instead, as it will heavily affect the stability of the regulator and of the whole chip!
In some STM32 families the core voltage can also be supplied externally to reduce the power loss in the internal linear voltage regulator. Please refer to the datasheet and the HW Getting Started application note for further details.
Another possibility for Vcore generation is an integrated SMPS. If the SMPS is not implemented or not available on the given STM32 (e.g. some STM32L4 or STM32H7 devices), switching the core power supply to the SMPS option during start may render the STM32 unresponsive. The workaround is to power on the chip with BOOT0=1, i.e. to start the system bootloader, which relies only on the linear voltage regulator, and to erase the chip.

Other supplies (Vdd_dsi, Vlcd, Vio_1.8V, Vusb) may or may not be present, or may be generated internally. Where such a supply exists, it is monitored internally; once the voltage is available and detected, the associated peripherals need to be enabled and the internal isolation lifted.

Clock

Every STM32 starts with internal RC oscillator on a default frequency (typically 8 or 16 MHz), thus there is no concern about proper boot.
Later, user SW can enable the crystal oscillator (HSE, LSE) and/or PLL. After the clock tree is supplied from the HSE and PLL, the external noise or ESD event may cause temporary frequency change, or, if the crystal is damaged by mechanical shock, failure to continue generating the system clock at all. To prevent deadlock in such situations, there are two major tools:

The STM32 will not switch to a clock source that is not enabled and stable (or, for the PLL, locked). In SW, this is handled by timeouts while waiting for the clock switch to complete.
enabling CSS (a HW clock "watchdog") guarantees that when HSE disappears, the whole clock tree is switched to HSI (the startup RC oscillator) and the user program is notified via the Non-Maskable Interrupt (NMI). This allows the application to recover the clock using internal resources, possibly restarting the crystal oscillator, or to shut down safely
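
The timeout mentioned in the first point can be sketched on a host PC as a bounded polling loop. This is only an illustration, not the HAL implementation: ready_fn, wait_ready_with_timeout and fake_ready_after are hypothetical names, and on real hardware the callback would test a ready flag of the clock controller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical callback: returns true once the clock source reports ready
 * (on real hardware this would test a ready flag of the oscillator or PLL). */
typedef bool (*ready_fn)(void *ctx);

/* Poll is_ready at most max_polls times.
 * Returns true if the source became ready, false on timeout; in the latter
 * case the caller should stay on the internal RC oscillator. */
bool wait_ready_with_timeout(ready_fn is_ready, void *ctx, uint32_t max_polls)
{
    for (uint32_t i = 0; i < max_polls; ++i) {
        if (is_ready(ctx)) {
            return true;
        }
    }
    return false;
}

/* Host-side stand-in for a ready flag: reports ready once the counter
 * pointed to by ctx has been decremented to zero. */
bool fake_ready_after(void *ctx)
{
    uint32_t *remaining = ctx;
    if (*remaining == 0u) {
        return true;
    }
    --*remaining;
    return false;
}
```

A real implementation would usually bound the wait by elapsed time (for example a system tick) rather than by a poll count, so that the timeout does not depend on the core frequency.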

Caution: when the STM32 wakes up or powers on with the Backup Domain supplied by an external battery and the RTC running, the standard LSE initialization routine may time out because the LSE is already running. If you don't test LSE/RTC operation prior to (re)initializing it, the timeout delay may be around 30 seconds.
Caution: please choose and verify your crystal choice with the AN2867 (Oscillator Design Guide for STM8 and STM32s)! Wrong selection of crystal or drive level may cause long or unreliable start of HSE or LSE, esp. at low temperatures.
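
The check recommended in the first caution could look roughly like the sketch below. It only inspects a copy of the backup domain control register value. The bit positions are assumptions that match many STM32 families but must be verified in the reference manual, and lse_needs_init is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions in the backup domain control register;
 * verify them against the reference manual of your device. */
#define BDCR_LSEON  (1u << 0)  /* LSE oscillator enabled */
#define BDCR_LSERDY (1u << 1)  /* LSE oscillator stable  */

/* Returns true if the LSE must be (re)started, false if it is already
 * enabled and ready, e.g. kept alive by Vbat across a power cycle. */
bool lse_needs_init(uint32_t bdcr)
{
    const uint32_t running = BDCR_LSEON | BDCR_LSERDY;
    return (bdcr & running) != running;
}
```

On a target, the argument would be the value read from the backup domain control register, and the LSE start sequence would be skipped whenever the function returns false.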

External setup of STM32

After the power supply is applied, the STM32 waits until the reset circuitry releases the reset signal. In most designs and with most STM32 families, the internal voltage supervisors (POR/PDR and BOR) are utilized without need of external voltage supervisor.
This, however, limits the minimum operating supply voltage for some STM32 families to 1.71 V. To enable operation at even lower supply voltage (down to 1.62 V), external voltage supervisor is needed and the POR has to be disabled. You can disable POR (Power-On Reset) by tying the pin PDR_ON to Vss instead of Vdd (normal situation with internal reset supervisor enabled).
When the internal reset is released, STM32 reads the logic level on the pins BOOT0 and BOOT1. In recent STM32 families, BOOT1 is replaced by a bit in the Option Bytes.
The logic level of BOOT0 decides whether the STM32 enters the user application (BOOT0 grounded to Vss), or the system bootloader or internal RAM (BOOT0 tied to Vdd; the BOOT1 pin or nBOOT1 bit decides between the two). Some other scenarios apply when the chip is blank and no code is present in the FLASH memory. For details, see AN2606.
Beware that if the System Bootloader is activated, it listens on all interfaces stated in AN2606. This calls for a specific design if the System Bootloader is to be utilized: only the interface intended for the primary communication may be active, and all pins associated with other interfaces must not show any activity (they must stay at a constant level until the primary communication is established). The reason is that the Bootloader scans for activity on all pins associated with all designated peripherals, and the peripheral with the first detected activity is selected for further communication.
If BOOT0 is disconnected or its connection to Vss is damaged (e.g. a missing jumper), the voltage on this input is random at power-on or reset, so the System Bootloader may be executed instead of the user application.
The BOOT0 sampling may be bypassed by Option Bytes setup. When the STM32 is read-out-protected in level 2, the BOOT0 is ignored completely.
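
The pin-based selection described above can be summarized in a small decision function. This sketch assumes the classic BOOT0/BOOT1 scheme and deliberately ignores Option Byte overrides, the empty-FLASH check and RDP level 2. The type and function names are hypothetical, and the mapping should be checked against AN2606 for the device in use.

```c
#include <stdbool.h>

typedef enum {
    BOOT_USER_FLASH,        /* user application in main FLASH   */
    BOOT_SYSTEM_BOOTLOADER, /* ST bootloader in system memory   */
    BOOT_EMBEDDED_SRAM      /* code previously loaded into SRAM */
} boot_target_t;

/* boot0: level sampled on the BOOT0 pin.
 * boot1: level of the BOOT1 pin; on families that use the nBOOT1 option bit
 *        instead, pass the inverted bit value (assumption, check AN2606). */
boot_target_t select_boot_target(bool boot0, bool boot1)
{
    if (!boot0) {
        return BOOT_USER_FLASH;  /* BOOT1 is ignored in this case */
    }
    return boot1 ? BOOT_EMBEDDED_SRAM : BOOT_SYSTEM_BOOTLOADER;
}
```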

Reset circuitry

The NRST pin is bidirectional, with an open-drain MOS transistor and an internal pull-up resistor. This allows the STM32 to be reset internally from various sources as well as externally, with a button, external watchdog, voltage supervisor etc. For proper function of the internal reset circuitry, all external devices must operate in open-drain mode, never in push-pull configuration.
The NRST pin doesn't need to be driven externally. The only component needed is a 100 nF capacitor connected between NRST and Vss. A bigger capacitor, an external pull-up resistor or a push-pull device may overload the internal MOS transistor or cause the reset pulse to be missed.
New low-pin-count STM32 devices may configure the NRST pin as input-only or disconnect the reset functionality from the pin, leaving it for use as a user GPIO.
The NRST signal should be routed to the debugging connector and/or a test pad. Monitoring it in various situations can reveal issues within or outside of the STM32.

Debug connections

The STM32 devices can be debugged via two distinct but overlapping interfaces: SWD or JTAG. The choice is made by the debug probe and a specific sequence pushed through the debug pins. While the protocol is different, the capabilities of both interfaces and their speeds match in terms of debugging. SWD doesn't have access to the Boundary Scan of the STM32, though.
When JTAG interface is selected, all JTAG pins including NJTRST need to be used. If SWD is utilized, only SWDIO and SWCLK are needed for proper debug and programming.
Further connections to a debug probe are needed:

Vdd used as a voltage monitoring input and supply to debug probe's voltage level translator: the Vdd on a JTAG connector is used as input to the probe, not as a supply to STM32!
NRST for HW reset of the STM32: if the connection to the STM32 fails, the connection method "Connect under reset" allows the probe to reconnect even when a SW misconfiguration (invalid clock, GPIO setup etc.) prevents a normal connection

Caution: during reset, the JTAG pins are configured as Alternate Functions with internal pull-ups or pull-downs and connected to the JTAG circuitry. This may pose risk to any external peripherals, possibly controlled by these pins, when set up as normal GPIOs. Driving these pins externally and providing multiple resets to STM32 may cause injection of JTAG sequences, causing unexpected behavior, like entry in boundary scan mode.

When using the ETM debug interface, beware that it is high-frequency circuitry (hundreds of MHz) and proper PCB layout needs to be considered. Several conditions have to be fulfilled for proper ETM operation:

ETM has to be enabled, together with appropriate GPIO pins in Alternate Function mode (typically by debug script)
proper core frequency has to be entered in the debug tool/IDE (there exist some frequency limitations in different probes and tools)
Cortex-M ETM-enabled debug probe has to be used
High-speed USB streaming trace is preferred over buffered trace
possibly data skew has to be compensated if ETM track lengths are not equal

The SWD interface can be extended with the SWO pin, which allows user and ITM data to be streamed via the SWD debug probe and used as debug printf-like output.

2. Start-up configuration

Option bytes

The Option Bytes are stored in separate area of FLASH memory. They can be configured by user and contain various startup options (depends on STM32 family) like:

start up addresses for BOOT0 options
nBOOT1 bit value
BOOT0 lock and preprogrammed value
launch of independent watchdog
automatic reset if SW enters low-power modes
read-out protection level
write-protected sectors
secured sectors
ROP-protected sectors
default core security mode
activation of dual-core system

Some options may help you to make your system more robust, like disabling bootloader entry (if BOOT0 can be controlled from Option Bytes), some may be annoying for debug (activating IWDG or RDP level above 0).
If you don't plan to use low-power modes, consider enabling reset when entering STOP and Stand-by.
Caution:

debugging your SW, while you activate RDP level above 0, will automatically render the FLASH memory unavailable until full power cycle!
RDP level 2 is irreversible and blocks any debug or System Bootloader access to STM32. If you misconfigure the STM32 or its SW is damaged and doesn't allow in-application reprogramming, the STM32 will be bricked permanently!
If the STM32 experiences a reset or power cycle during Option Bytes programming and the Option Bytes then do not match their complements, RDP Level 1 will be imposed. In this case, regression to Level 0 (with full FLASH erase) is needed to recover the Option Bytes and FLASH access.

When IWDG HW activation is enabled, the STM32 will start the independent watchdog immediately after reset. It is possible to temporarily block its operation only when in debug mode, but as soon as the STM32 starts or resumes SW execution, the IWDG needs to be refreshed periodically. The IWDG initial delay is only several ms, thus its reconfiguration may need to be the first operation of the start-up code, prolonging the IWDG timeout to hundreds or thousands of milliseconds, if needed. If you utilize a lot of RAM, its initialization before main() may take longer than the initial IWDG timeout, so a reconfiguration from main() comes too late and causes periodic resets. You can detect this situation from the period of the resets on the NRST pin.
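
To judge whether a given IWDG setup leaves enough time for start-up, the timeout can be estimated from the prescaler and reload value. The sketch below assumes the usual relation timeout = prescaler x (reload + 1) / f_LSI. The LSI frequency is family dependent and has a wide tolerance, so the datasheet value should be used. iwdg_timeout_us is a hypothetical helper name.

```c
#include <stdint.h>

/* Watchdog timeout in microseconds for a down-counter that is clocked at
 * lsi_hz / prescaler and counts from reload down to zero. */
uint64_t iwdg_timeout_us(uint32_t lsi_hz, uint32_t prescaler, uint32_t reload)
{
    return ((uint64_t)prescaler * (reload + 1u) * 1000000u) / lsi_hz;
}
```

For example, with an assumed 32 kHz LSI, a prescaler of 256 and a reload value of 0xFFF give about 32.8 s, while a prescaler of 4 and a reload value of 0 give 125 us.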

3. Software

Vector table and initial setup

The STM32 starts typically from fixed address: 0x0000 0000. Here it expects a vector table with 32-bit addresses of different routines (reset, faults, interrupts), very first address being value of initial stack pointer, loaded by core to MSP after reset, second being the address of reset handler.
The STM32, however, places the FLASH memory by default at address 0x0800 0000. This "makes space" for memory remapping, so that, using memory mirroring, the STM32 can offer various types of memory to the core at address 0x0: FLASH from 0x0800 0000, System Bootloader from 0x1FF0 0000 or RAM from 0x2000 0000, selected via the BOOT0/BOOT1 configuration. SW remapping at runtime is available, too.
When a custom bootloader is implemented in a multi-stage boot, it's useful to remap the vector table to the sector with the user application, so that the application can use its own set of fault and interrupt vectors. This is available through SCB->VTOR.
Caution: the VTOR address must be aligned to the size of the vector table, rounded up to a power of two (e.g. in 1 kB steps). If the alignment is not respected, the core enters an unrecoverable fault.
CMSIS code can manage remapping via macros USER_VECT_TAB_ADDRESS and VECT_TAB_OFFSET.
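
The alignment rule from the caution above can be checked with a few lines of C. This is a sketch under the assumption of 16 core exception slots, 4 bytes per vector and a minimum alignment of 128 bytes. The exact rule for a given core should be taken from its programming manual, and vtor_alignment and vtor_address_ok are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Required alignment in bytes for a vector table with num_irqs device
 * interrupts: 16 core exception slots plus the interrupts, 4 bytes each,
 * rounded up to a power of two, with an assumed minimum of 128 bytes. */
uint32_t vtor_alignment(uint32_t num_irqs)
{
    uint32_t table_bytes = (16u + num_irqs) * 4u;
    uint32_t align = 128u;
    while (align < table_bytes) {
        align <<= 1;
    }
    return align;
}

/* True if addr is a valid vector table base for the given interrupt count. */
bool vtor_address_ok(uint32_t addr, uint32_t num_irqs)
{
    return (addr & (vtor_alignment(num_irqs) - 1u)) == 0u;
}
```

With 150 interrupts the table occupies 664 bytes, so the base must be a multiple of 1 kB; an application placed at 0x0800 8200 would then violate the rule, while 0x0800 8000 would satisfy it.
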
When the STM32 boots, the core loads the initial SP and jumps to reset_handler in privileged thread mode. From now on, the user program is in control of the STM32.
The reset_handler proceeds to call function SystemInit, which, depending on the selected core, enables FPU, sets up the Vector table address matching the user setup, possibly enables F(S)MC interface for access to external RAM and FLASH memories, resets clock to internal RC source and possibly reconfigures other oscillators.
The next step after the return from SystemInit depends on the implementation of the compiler runtime library (RTL), but in general it performs RAM initialization (RW data copied from FLASH, ZI data cleared), stack and heap setup, IN/OUT/ERR channel initialization, C++ static constructor calls etc.
As a last step, a wrapper around main() and finally main() itself is called.
Normal STM32 application shall not exit, however there may be implemented functions like __rt_exit(), defining behavior for such occasion.
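
The RAM initialization step can be illustrated with the following host-runnable sketch. It is not the code of any particular runtime library: on a target the pointers would come from linker-defined symbols, whereas here they are ordinary parameters, and init_ram is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy initialized data (RW) from its load image to RAM and clear the
 * zero-initialized area (ZI). On a target, the pointers would come from
 * linker-defined symbols; here they are plain parameters. */
void init_ram(const uint32_t *rw_load, uint32_t *rw_start, size_t rw_words,
              uint32_t *zi_start, size_t zi_words)
{
    for (size_t i = 0; i < rw_words; ++i) {
        rw_start[i] = rw_load[i];
    }
    for (size_t i = 0; i < zi_words; ++i) {
        zi_start[i] = 0u;
    }
}
```

The run time of both loops grows with the amount of RAM used, which is why a hardware-started watchdog may expire before main() is reached.
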
Caution: if Hard Fault is entered during start-up, possibly:

the RAM isn't enabled/accessible. This may happen when additional RAM is in a different domain and needs its clock enabled, or the RAM is in external memory space. Can be solved by proper F(S)MC setup in SystemInit.
the FLASH memory is unavailable (if your code is stored in external FLASH, too, or ECC error is detected)
static C++ constructors are called in wrong order

Notice: Cortex-M7-based STM32s have configurable start-up address in Option Bytes for different BOOT0 configurations.

Memory setup

Memory layout of user application is defined by linker description file. The default memory setup may not cover all available memories, for a good reason. User can extend or modify the linker description file (or scatter file, by different terminology) to access these memory regions, however some caution and knowledge is required.
Various STM32 families offer different RAMs available through different buses: main data SRAM, D-TCM and I-TCM RAM, backup SRAM, SRAMs in different power domains. Some of them can be used only by the core, some can be shared with other bus masters like GPDMA, ETH DMA, USB DMA, SDMMC DMA or another core. Without prior knowledge, the user can introduce hard-to-debug issues, e.g. when core-private memory is included in the normal linker setup and the linker decides to allocate stack or heap in such memory. If such RAM is then allocated and used as a buffer for DMA, the DMA will fail and report a permanent error. DMAs typically require aligned access, thus proper allocation of buffers is a must.
Access to I-TCM RAM can cause Data Abort faults etc.

Caches and MPU

The usage of caches has significant impact on the STM32 performance, esp. with high core speeds or external memories like QSPI or SDRAM. Usage, however, must be well considered, esp. if the cached memory is shared with other bus masters. Proper caching strategies and partitioning need to be chosen to guarantee consistency of the data. Various blocks within a cache can be defined and different strategy selected via MPU utilization.
The MPU can define up to 8 or 16 overlapping blocks with different size and position within memory. Each block setup can influence whether the underlying memory will be cached, buffered, treated as a device or memory with random access etc.
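
Region size and position are constrained by the MPU hardware. As a sketch, assuming an ARMv7-M style MPU (region size is a power of two of at least 32 bytes, base aligned to the size, size encoded as log2(size) - 1), a requested region can be validated before it is programmed. Newer cores use a different scheme, and mpu_size_field is a hypothetical helper name.

```c
#include <stdint.h>

/* Returns the SIZE field value for an ARMv7-M style MPU region
 * (region size = 2^(SIZE+1) bytes), or -1 if the base/size pair is not
 * representable: the size must be a power of two of at least 32 bytes and
 * the base address must be a multiple of the size. */
int mpu_size_field(uint32_t base, uint32_t size_bytes)
{
    if (size_bytes < 32u || (size_bytes & (size_bytes - 1u)) != 0u) {
        return -1;
    }
    if ((base & (size_bytes - 1u)) != 0u) {
        return -1;
    }
    int log2_size = 0;
    while ((1u << log2_size) < size_bytes) {
        ++log2_size;
    }
    return log2_size - 1;
}
```

For instance, a 64 kB region at 0x3004 0000 is accepted and encodes as 15, while a 512 kB region at the same base is rejected because the base is not a multiple of 512 kB.
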
The cache lines may need to be invalidated. If unrelated data shares a cache line with a buffer, invalidating that line may corrupt the data, thus buffers may need to be aligned to the cache line size.
Should you enable the Data cache, please place all DMA buffers and control blocks for MDMA or ETH DMA in non-cached regions.
Private program data can be cached freely.
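
Where a DMA buffer has to stay in cached memory, the alignment requirement can be checked as sketched below. The 32-byte line size is an assumption (typical for the Cortex-M7 data cache) and should be verified for the core in use. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 32u  /* assumed data cache line size */

/* Smallest multiple of the cache line size that holds len bytes. */
uint32_t cache_padded_size(uint32_t len)
{
    return (len + CACHE_LINE_BYTES - 1u) & ~(CACHE_LINE_BYTES - 1u);
}

/* True if a buffer occupies whole cache lines only, so that cleaning or
 * invalidating it cannot touch neighbouring variables. */
bool buffer_owns_its_cache_lines(uint32_t addr, uint32_t len)
{
    return (addr % CACHE_LINE_BYTES) == 0u && (len % CACHE_LINE_BYTES) == 0u;
}
```

A 100-byte buffer would thus be declared with 128 bytes and a 32-byte alignment attribute, so that no other variable shares its first or last cache line.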

Debug tools, RTL and retargeting

When your application is compiled in Debug mode, the RTL code may contain routines allowing so-called retargeting, i.e. passing data to the debugger via a side-channel. The routines either pass the data to the debugger via vector catch using instruction BKPT, or implement other means of communication - memory buffer, ITM interface etc. Some configurations may prevent the program from running properly when it calls such routine. Then the code may get stuck.
The solution is to link the application with an RTL variant that has retargeting switched off.
Beware of the different versions of the RTL; they may offer full, mid or minimal functionality. Typically they implement various features of printf or scanf, but support for 64-bit arithmetic may be silently dropped, and other standard C features, like support for file streams or time/time-zone conversions, may be heavily simplified.
Relying on such standard C features may be dangerous and cause run-time troubles.
When debugging an application, using HAL library, it's useful to define macro USE_FULL_ASSERT and put breakpoint in the routine assert_failed. This setup generates code checking parameters in most of HAL functions. When wrong setup is detected, assert_failed is called with filename and line no. of the failing test. In Release mode, remove the macro to boost the performance!
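
The mechanism can be tried out on a host PC with the sketch below. The assert_param macro and the assert_failed signature follow the pattern used in the HAL configuration headers. The recording variables, IS_VALID_CHANNEL and set_channel are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Last failure seen by the hook. A target build would typically halt or
 * log here, and this is where the breakpoint goes. */
const char *last_assert_file = NULL;
uint32_t last_assert_line = 0u;

void assert_failed(uint8_t *file, uint32_t line)
{
    last_assert_file = (const char *)file;
    last_assert_line = line;
}

#define USE_FULL_ASSERT  /* normally defined in the project configuration */
#ifdef USE_FULL_ASSERT
#define assert_param(expr) \
    ((expr) ? (void)0 : assert_failed((uint8_t *)__FILE__, __LINE__))
#else
#define assert_param(expr) ((void)0)
#endif

/* Hypothetical driver function with a checked parameter. */
#define IS_VALID_CHANNEL(ch) ((ch) < 4u)

int set_channel(uint32_t channel)
{
    assert_param(IS_VALID_CHANNEL(channel));
    return (channel < 4u) ? 0 : -1;
}
```

Calling set_channel(7) records the file name and line of the failing check; without USE_FULL_ASSERT the check compiles to nothing.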

Catching errors

Typical issue with an application with stable HW design is a software or configuration fault, causing some type of Cortex-M fault or an algorithmic or peripheral deadlock. The application shall be ready to deal with them in expected and reliable manner.
The faults generated by Cortex-M can be split into usage faults (generated by the core, e.g. undefined instruction, bad privilege), bus faults (generated by the bus matrix) and memory protection faults (generated by the MPU). When no fault handler is enabled, all of these are escalated to Hard Fault. The Hard Fault doesn't indicate any specific fault in the system; it is a collector of other unhandled faults (usage, bus, MPU, priority inversion).
Except for Cortex-M0, all higher Cortex-M implementations in STM32 offer registers where the user can read the original source of the error and the associated flags together with the failing address. Other information, together with the context of the caller, is stored on the stack. It may be difficult to recognize the source of the error even with such detail, thus the ETM interface may come in handy, but in general your application shall handle these errors reliably and deterministically. You can implement a parser of the error context, store it to a log or non-volatile memory and reset your application, or, if the error may not be recoverable in your application, shut down gracefully and stop any operation, minimizing external risks. Such a fault implementation creates deterministic behavior, which allows future debugging and improves the user experience.
Examples of Hard Fault handler can be found on Keil.com and elsewhere.
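
As a starting point for such a parser, the sketch below classifies a copy of the Configurable Fault Status Register (SCB->CFSR on Cortex-M3 and higher). The field layout follows the Arm architecture documentation as far as we know it. The struct and the function name decode_cfsr are hypothetical, and a real handler would also capture the stacked registers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sub-fields of the Configurable Fault Status Register (SCB->CFSR). */
#define CFSR_MEMFAULT_MASK   0x000000FFu  /* MMFSR: memory management fault */
#define CFSR_BUSFAULT_MASK   0x0000FF00u  /* BFSR: bus fault */
#define CFSR_USAGEFAULT_MASK 0xFFFF0000u  /* UFSR: usage fault */
#define CFSR_MMARVALID       (1u << 7)    /* MMFAR holds the faulting address */
#define CFSR_BFARVALID       (1u << 15)   /* BFAR holds the faulting address */

typedef struct {
    bool mem_fault;
    bool bus_fault;
    bool usage_fault;
    bool address_valid;  /* a fault address register can be read */
} fault_summary_t;

/* Classify a CFSR value into the three configurable fault groups. */
fault_summary_t decode_cfsr(uint32_t cfsr)
{
    fault_summary_t s;
    s.mem_fault     = (cfsr & CFSR_MEMFAULT_MASK) != 0u;
    s.bus_fault     = (cfsr & CFSR_BUSFAULT_MASK) != 0u;
    s.usage_fault   = (cfsr & CFSR_USAGEFAULT_MASK) != 0u;
    s.address_valid = (cfsr & (CFSR_MMARVALID | CFSR_BFARVALID)) != 0u;
    return s;
}
```

The summary, together with the raw register value, is what would typically be written to the log or non-volatile memory before the reset.
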
Software faults may be difficult to spot, esp. if they cause a deadlock. HAL library functions take a user-defined timeout to prevent a deadlock if the underlying peripheral is misconfigured or an external fault or ESD event causes the peripheral to trip. In such a case the function returns HAL_TIMEOUT and the user shall consider a peripheral or system reset.
Algorithms may not be able to detect their own timeouts, or doing so may have a big impact on efficiency. For this purpose the STM32 offers two hardware watchdogs - WWDG and IWDG. The WWDG is well suited for monitoring various algorithms for minimum and maximum run time, while the IWDG can be used for long-term processes. The two watchdogs have independent clock sources, APIs and features. When your application is reset by either of these two watchdogs, you can recognize the reset source from RCC->RSR (the register is named RCC->CSR in some families). If you design your routines in the way described in the X-CUBE-CLASSB package, using function call stamps, you can detect in which routine the watchdog has timed out.
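
Reading the reset source could be organized as in the sketch below. Because the flag positions differ between families, the masks are passed in instead of being hard-coded and should be taken from the device header. All type and function names are hypothetical. The test order assumes that an internal reset may also set the pin reset flag, since it drives NRST low.

```c
#include <stdint.h>

typedef enum {
    RESET_UNKNOWN,
    RESET_IWDG,
    RESET_WWDG,
    RESET_SOFTWARE,
    RESET_POWER_ON,
    RESET_PIN
} reset_cause_t;

/* Flag masks for one device family. The positions differ between families,
 * so they are passed in rather than hard-coded. */
typedef struct {
    uint32_t iwdg, wwdg, software, power_on, pin;
} reset_flag_masks_t;

/* Watchdog and software flags are tested before the pin flag, assuming an
 * internal reset may also set the pin flag because it drives NRST low. */
reset_cause_t decode_reset_cause(uint32_t flags, const reset_flag_masks_t *m)
{
    if (flags & m->iwdg)     return RESET_IWDG;
    if (flags & m->wwdg)     return RESET_WWDG;
    if (flags & m->software) return RESET_SOFTWARE;
    if (flags & m->power_on) return RESET_POWER_ON;
    if (flags & m->pin)      return RESET_PIN;
    return RESET_UNKNOWN;
}
```

The flags are sticky, so after decoding them the application would normally clear them via the dedicated clear bit, otherwise the next start-up reports stale causes.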

4. Summary (TL;DR)

read datasheet, reference manual, errata sheet and application notes
design reliable power supply, measure it, don't cross AMR, ever!
beware of voltage difference between Vss and Vdd pins, decouple well with 100 nF capacitors
monitor supply current and Vcap voltage
Vbat must be always positive, never grounded
NRST shall be free (of push-pull devices, external pull-ups and capacity over 100 nF) and supply voltage shall reach 0 V for proper power cycle
monitor NRST for start-up issues, read reset reason in RCC->RSR
validate crystals against AN2867, enable CSS
always ground BOOT0 when System Bootloader is not in use, possibly with a pull-down and allow access via test-point
when System Bootloader is used, beware of all possible communication interfaces, see AN2606
when designing debug interface, add SWO, NRST and Vdd to debug probe cable
use 'Connect under reset' feature of your debugger
if your SW enables SMPS accidentally, boot to System Bootloader, then erase the chip via SWD/JTAG
set up Option Bytes properly, use as many security features as possible, but beware of debug limitations when enabling chip protections - a power cycle may be needed!
align Vector table to 1 kB boundary
define concise memory layout, respecting different access limitations of various bus masters
use MPU to define regions with caching/buffering/reordering enabled or disabled, esp. for buffers shared between DMA units and the core
choose the right RTL (C library linked with your project) with right set of features, test it properly
remove retargeting in Release version of your SW
define macro USE_FULL_ASSERT in Debug mode, allows catching runtime errors when calling HAL routines
implement fault handlers, log errors and define response strategy (system reset, halt, recoverable situations)
use watchdogs and implement SW structure and self-test according to X-CUBE-CLASSB
test stack and heap usage and use the benefit of error codes returned by critical routine calls