How to clear or bypass hard-fault in STM32L433?

Aatif Shaikh1 · ‎2021-06-16

Hello,

I'm working on a project in which I've interfaced my STM32L433 controller with a flash IC (NOR flash) on the SPI port. At present, bulk read and write operations run continuously. Due to random power cuts, some of the flash sectors get corrupted. When the device tries to rewrite something in those corrupted sectors, a hard fault is generated, and the watchdog then reboots the device (this creates an infinite loop). I just want to clear the generated hard fault and skip the processing of that particular corrupted sector.

Tilen MAJERLE · ‎2021-06-16

You cannot disable or mask out the HardFault exception the way you can a normal interrupt: it has a fixed priority of -1, just below the Non-Maskable Interrupt (-2) and above every configurable interrupt.

If power fails while you are programming the external flash and you get a hard fault the next time, it means that your code doesn't handle something properly. With the STM32CubeIDE tool, you can see where the fault occurred and which instruction triggered it.
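
Without a debugger attached, the same information can be narrowed down from the Cortex-M4 fault status registers. The following is a minimal sketch, assuming the raw values of SCB->HFSR and SCB->CFSR have already been read on the target and are passed in. The helper names (`hfsr_is_forced`, `cfsr_first_cause`, `cfsr_bfar_valid`) are made up for illustration; the bit positions follow the Cortex-M4 programming manual.

```c
#include <stddef.h>
#include <stdint.h>

/* One CFSR status bit and a short description of it. */
typedef struct {
    uint32_t    mask;
    const char *name;
} fault_bit_t;

/* CFSR = MMFSR (bits 0-7) | BFSR (bits 8-15) | UFSR (bits 16-31). */
static const fault_bit_t cfsr_bits[] = {
    { UINT32_C(1) << 0,  "IACCVIOL: MPU instruction access violation" },
    { UINT32_C(1) << 1,  "DACCVIOL: MPU data access violation" },
    { UINT32_C(1) << 3,  "MUNSTKERR: MemManage fault on unstacking" },
    { UINT32_C(1) << 4,  "MSTKERR: MemManage fault on stacking" },
    { UINT32_C(1) << 8,  "IBUSERR: instruction bus error" },
    { UINT32_C(1) << 9,  "PRECISERR: precise data bus error" },
    { UINT32_C(1) << 10, "IMPRECISERR: imprecise data bus error" },
    { UINT32_C(1) << 11, "UNSTKERR: bus fault on unstacking" },
    { UINT32_C(1) << 12, "STKERR: bus fault on stacking" },
    { UINT32_C(1) << 16, "UNDEFINSTR: undefined instruction" },
    { UINT32_C(1) << 17, "INVSTATE: invalid EPSR state (e.g. T bit clear)" },
    { UINT32_C(1) << 18, "INVPC: invalid PC load on exception return" },
    { UINT32_C(1) << 19, "NOCP: coprocessor/FPU not enabled" },
    { UINT32_C(1) << 24, "UNALIGNED: unaligned access" },
    { UINT32_C(1) << 25, "DIVBYZERO: divide by zero" },
};

/* Nonzero if HFSR.FORCED (bit 30) is set, i.e. a MemManage, BusFault or
 * UsageFault was escalated to HardFault and CFSR holds the real cause. */
int hfsr_is_forced(uint32_t hfsr)
{
    return (hfsr & (UINT32_C(1) << 30)) != 0;
}

/* Description of the lowest-numbered cause bit set in CFSR, or NULL. */
const char *cfsr_first_cause(uint32_t cfsr)
{
    for (size_t i = 0; i < sizeof cfsr_bits / sizeof cfsr_bits[0]; i++) {
        if (cfsr & cfsr_bits[i].mask) {
            return cfsr_bits[i].name;
        }
    }
    return NULL;
}

/* Nonzero if BFSR.BFARVALID (bit 15) is set, i.e. SCB->BFAR holds the
 * address of the access that caused the bus fault. */
int cfsr_bfar_valid(uint32_t cfsr)
{
    return (cfsr & (UINT32_C(1) << 15)) != 0;
}
```

On the target this would be called as `cfsr_first_cause(SCB->CFSR)` from inside the fault handler. The CFSR bits are sticky and are cleared by writing 1 to them, so a handler that intends to keep running would normally clear them after reporting.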

Tesla DeLorean · ‎2021-06-16

Sure, but what's being asked for here is basically structured exception handling, a try/catch type construct..

You can actually return from certain faults, you can advance or change the program counter, or fix/address the underlying cause in some cases and retry.

A good reading of the processor TRM might be in order, along with how to deconstruct/disassemble the offending instructions.

The simplest course would be to handle a limited or singular case where, for example, you probe a memory location, and the fault handler recognizes this and changes the state slightly so that it returns a flag the user application can observe before skipping to the next test.
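
A rough sketch of that singular case, assuming the fault handler has already obtained a pointer to the eight words the core stacked on exception entry (R0, R1, R2, R3, R12, LR, PC, xPSR). The `fault_guard_t` descriptor, `PROBE_FAILED` value and `fault_try_fixup` function are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Positions of R0 and PC in the 8-word frame the core pushes on
 * exception entry: R0, R1, R2, R3, R12, LR, PC, xPSR. */
enum { FRAME_R0 = 0, FRAME_PC = 6 };

/* Value handed back in R0 when a guarded probe faulted (hypothetical). */
#define PROBE_FAILED UINT32_C(0xFFFFFFFF)

/* Hypothetical descriptor of one guarded code region. */
typedef struct {
    uint32_t start;  /* first address of the code that may fault      */
    uint32_t end;    /* one past the last address of that code        */
    uint32_t fixup;  /* address to resume at if the region did fault  */
} fault_guard_t;

/* If the stacked PC lies inside the guarded region, patch the frame so the
 * exception return lands on the fixup code with R0 = PROBE_FAILED and
 * return 1. Otherwise leave the frame untouched and return 0, so the
 * caller can fall back to its normal fatal-fault path. */
int fault_try_fixup(uint32_t *frame, const fault_guard_t *guard)
{
    uint32_t pc = frame[FRAME_PC];

    if (pc < guard->start || pc >= guard->end) {
        return 0;
    }
    frame[FRAME_R0] = PROBE_FAILED;
    frame[FRAME_PC] = guard->fixup;
    return 1;
}
```

The fixup address has to sit in the same function as the guarded code, so that the stack depth is the same when execution resumes there. Faults whose stacked PC is outside the guard are left alone and still need to be treated as fatal.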

Aatif Shaikh1 · ‎2021-06-16

This is exactly what I've been looking for, but unfortunately I can't find any reference material on it.

Do you have any reference or document that could help me understand the internal architecture of the STM device and achieve this?

Tesla DeLorean · ‎2021-06-16

I come at this from an architectural understanding of micro-controllers and processors. College level texts probably have this well covered these days. Perhaps Patterson/Hennessy ARM Architecture type thing, or some of Furber's original works.

You probably want to start by understanding how the micro-controller stacks context for interrupts and exceptions. More generalized coverage of this can be found in books on the Cortex-Mx by Joseph Yiu, and in the ST Programming Manuals for the M3/M4.

ARM also has Technical Reference Manuals (TRM) explaining how the cores function.

If you understand how the context is saved, you can fish around in there, see where the program counter (PC) points and what instruction is there, and modify the PC and other registers, or emulate instructions, or retry them. The try/catch type stuff takes a narrow range of potential PC values in a region of interest, and then diverts execution, at the same stack scope/depth, to some code that handles the "what to do if that just failed" scenario. How polished or crude this is depends on your desire to do system level coding. In the simplest case you could handle one particular instruction you know might fail, and then just step beyond it, replacing R0 (or whatever) with the content you want, or with a value that flags failure, perhaps in a variable that you can check later.

https://www.st.com/resource/en/programming_manual/dm00046982-stm32-cortex-m4-mcus-and-mpus-programming-manual-stmicroelectronics.pdf

See also the hard fault routines I published in the past for outputting useful diagnostic information. Imagine how you can modify the PC, or other registers, in that context and then return.
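
As an illustration of stepping beyond one instruction, here is a minimal sketch. It assumes GCC, the basic 8-word exception frame (no FPU state stacked) and a precise fault, so that the stacked PC really is the address of the offending instruction. The names `exc_frame_t`, `thumb_insn_size`, `fault_step_over` and `fault_handler_c` are invented for this example and are not taken from the routines mentioned above.

```c
#include <stdint.h>

/* Registers pushed by the core on exception entry, lowest address first. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} exc_frame_t;

/* Size in bytes of the Thumb instruction whose first halfword is hw.
 * First halfwords whose top five bits are 0b11101, 0b11110 or 0b11111
 * start a 32-bit encoding; everything else is a 16-bit instruction. */
unsigned thumb_insn_size(uint16_t hw)
{
    return ((hw & 0xF800u) >= 0xE800u) ? 4u : 2u;
}

/* Advance the stacked PC past the faulting instruction and replace the
 * stacked R0, so the interrupted code sees r0_result after the return. */
void fault_step_over(exc_frame_t *f, uint16_t first_halfword,
                     uint32_t r0_result)
{
    f->pc += thumb_insn_size(first_halfword);
    f->r0  = r0_result;
}

#if defined(__GNUC__) && defined(__arm__)
/* Target-only part: fetch the faulting opcode and step over it. */
void fault_handler_c(exc_frame_t *f)
{
    uint16_t hw = *(const uint16_t *)(uintptr_t)f->pc;
    fault_step_over(f, hw, 0u);
}

/* EXC_RETURN bit 2 (in LR) says whether the frame went to MSP or PSP. */
__attribute__((naked)) void HardFault_Handler(void)
{
    __asm volatile(
        "tst   lr, #4          \n"
        "ite   eq              \n"
        "mrseq r0, msp         \n"
        "mrsne r0, psp         \n"
        "b     fault_handler_c \n");
}
#endif
```

A real handler would first check that the stacked PC is one it expects to fail before touching the frame, and would clear the sticky fault status bits before returning. Stepping over an instruction inside an IT block, or after an imprecise bus fault, is not covered by this sketch.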

Tesla DeLorean · ‎2021-06-16

https://www.amazon.com/Computer-Organization-Design-ARM-Architecture/dp/0128017333

Aatif Shaikh1 · ‎2021-06-17

Thanks a lot, big fan!

Your ideas, knowledge and methods are remarkable. Over the year, you have helped me a lot in growing and overcoming issues!

Cheers!

Tilen MAJERLE · ‎2021-06-17

While theory is one thing, checking why the hard fault happens and fixing that should be the preferred way. Bypassing a hard fault is like bypassing a blue screen on Windows and expecting to go back to the original state.

Good luck

Aatif Shaikh1 · ‎2021-06-17

After trying a few things, I found a method that seems to work quite well so far. It involves a few simple steps. Whenever the fault occurs, the program resets all the peripherals that are in use, waits for a few milliseconds, and then reinitializes all the peripherals as is done at boot-up (whatever the previous state was). This process does not seem to affect the program counter at all. Therefore, afterwards, control is transferred back to the last working state, or to the next instruction that would have been executed had the fault not occurred.

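/* ----------------------------------------------------------------------------
*                           INCLUDES
* ----------------------------------------------------------------------------
*/
#include <stdint.h>   /* uint8_t */
#include <stdio.h>    /* printf */
/* The project headers that provide the LL drivers, SET, FnDelay_1ms() and
 * FnsystemInit() are assumed to be included here as well. */

/* ----------------------------------------------------------------------------
*                           STRUCTURE VARIABLES
* ----------------------------------------------------------------------------
*/
/* Debug structure */
typedef struct
{
  uint8_t ucDebuggEnableDisable;
} stDebuggPara_All;

/* Fault structure */
typedef struct
{
  uint8_t ucSystemHardFault;
} stSystemFault;

/* ----------------------------------------------------------------------------
*                           GLOBAL VARIABLES
* ----------------------------------------------------------------------------
*/
/* Debug variables */
stDebuggPara_All        stVarDebuggPara_All;
/* System fault variables: volatile, because the flag is written in the
 * fault handler and read from the main program */
volatile stSystemFault  stVarSystemFault;
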
/*----------------------------------------------------------------------------
*                           MACROS
*----------------------------------------------------------------------------
*/
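/* Manually enable/disable the debug output. The macro expands to a bare
 * `if`, so it guards only the single statement that follows it. */
#define _DEBUG_ALL_ENABLE_DIABLE   if(stVarDebuggPara_All.ucDebuggEnableDisable == SET)

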
/*****************************************************************************
 **@Function 	  	: 	HardFault_Handler
 **@Descriptions	: 	This function handles the Hard Fault exception.
 **@parameters		: 	None
 **@return		: 	None
*****************************************************************************/
void HardFault_Handler(void)
{
#if _USE_DEBUG_COM
  /* Report the fault on the debug port if debug output is enabled */
  _DEBUG_ALL_ENABLE_DIABLE
    printf("\n\r*******HardFault_Handler*******\n\r");
#endif

  /* Enable the SYSCFG and PWR clocks and set the NVIC priority grouping,
   * as is done at boot-up */
  LL_APB2_GRP1_EnableClock(LL_APB2_GRP1_PERIPH_SYSCFG);
  LL_APB1_GRP1_EnableClock(LL_APB1_GRP1_PERIPH_PWR);
  NVIC_SetPriorityGrouping(NVIC_PRIORITYGROUP_4);
  /* Wait for a few ms */
  FnDelay_1ms(2);
  /* Initialize all the required peripherals again */
  FnsystemInit();
  /* Set the hard fault flag so the application can perform the error handling */
  stVarSystemFault.ucSystemHardFault = SET;
}
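
The handler only raises a flag; skipping the corrupted sector still has to happen in the main flow. A minimal sketch of that side, assuming execution does resume after the handler returns as described in the post. Everything here is hypothetical: `hard_fault_seen` stands in for `stVarSystemFault.ucSystemHardFault`, and `SECTOR_COUNT`, `sector_bad`, `process_sectors` and `demo_faulty_op` are illustration-only names.

```c
#include <stdint.h>

#define SECTOR_COUNT 8u  /* hypothetical number of sectors to walk */

/* Stand-in for stVarSystemFault.ucSystemHardFault, set by the handler. */
static volatile uint8_t hard_fault_seen;

/* RAM table of sectors that faulted once and should be skipped from now on. */
static uint8_t sector_bad[SECTOR_COUNT];

/* Per-sector operation (read, rewrite, ...) supplied by the application. */
typedef void (*sector_op_t)(uint32_t sector);

/* Run op on every sector not already marked bad. A sector whose operation
 * raised the fault flag is marked bad and the flag is cleared again.
 * Returns the number of sectors newly marked bad in this pass. */
unsigned process_sectors(sector_op_t op)
{
    unsigned newly_bad = 0;

    for (uint32_t s = 0; s < SECTOR_COUNT; s++) {
        if (sector_bad[s]) {
            continue;              /* known bad: skip without touching it */
        }
        hard_fault_seen = 0;
        op(s);
        if (hard_fault_seen) {
            sector_bad[s] = 1;
            newly_bad++;
            hard_fault_seen = 0;
        }
    }
    return newly_bad;
}

/* Host-side stand-in for an operation that "faults" on sector 3 by setting
 * the flag directly, the way the fault handler would on the target. */
static void demo_faulty_op(uint32_t sector)
{
    if (sector == 3u) {
        hard_fault_seen = 1;
    }
}
```

Calling `process_sectors(demo_faulty_op)` marks sector 3 as bad on the first pass and skips it on later passes. The table lives in RAM, so it does not survive a watchdog reset.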