HAL_FLASHEx_Erase HardFault_Handler STM32G030K8

sde c.1 · ‎2023-10-27

Hello,

I have developed an application that saves accelerometer data when a machine starts up. This data is stored in the last page of the controller's memory (page 31).

I'm encountering a HardFault exception that occurs intermittently, roughly 1 in 20 to 50 times when running the code with the debugger attached. However, it consistently happens when flashing and running the application without the debugger.

I'm seeking guidance on how to effectively debug this issue. Here's a snippet of the relevant code:

The page parameter is 31, with a size of 1 when this function is called.

Any assistance in troubleshooting this problem would be greatly appreciated.

static _Bool flash_erase_pages(uint8_t page,uint8_t size)
{
	static FLASH_EraseInitTypeDef EraseInitStruct;
	uint8_t bResult=0;
	uint8_t retry=0;

	static volatile uint32_t flasherror;
	uint32_t PAGEError;

	do{
		/* Unlock the Flash to enable the flash control register access *************/
		HAL_FLASH_Unlock();

		 /* Clear OPTVERR bit set on virgin samples */
		__HAL_FLASH_CLEAR_FLAG(FLASH_FLAG_OPTVERR);


		/* Fill EraseInit structure*/
		EraseInitStruct.TypeErase   = FLASH_TYPEERASE_PAGES;
		EraseInitStruct.Banks = FLASH_BANK_1 ;
		EraseInitStruct.Page = (uint32_t)page;
		//EraseInitStruct.NbPages     = ((EndPage - StartPage)) +1;
		EraseInitStruct.NbPages     = size;

		if (HAL_FLASHEx_Erase(&EraseInitStruct, &PAGEError) != HAL_OK)
		{
			/*Error occurred while page erase.*/
			flasherror = HAL_FLASH_GetError ();
		}
		else{
			bResult = 1;
		}

		HAL_FLASH_Lock();

		if(bResult) 	return true;
		HAL_Delay(1);
		if(++retry>5) 	return false;
	}while(1);
}

On the few occasions I managed to reproduce this while debugging, I collected the following data:

The HardFault happens in HAL_FLASH_Lock(); -> SET_BIT(FLASH->CR, FLASH_CR_LOCK);

Before entering the HardFault, I see these values in the flash_erase_pages function:
PAGEError = 2103
PAGEError = 536876972 (another time)
flasherror = 0

The PAGEError values seem to make no sense, as there are only 32 pages.

After a successful erase, the value of PAGEError is consistently 0xFFFFFFFF. I'm perplexed by what might be causing this unexpected behavior.
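
For reference, the two odd values convert to 2103 = 0x837 and 536876972 = 0x200017AC; the second looks like an SRAM address rather than a page index. A minimal sketch of a sanity check on PAGEError follows, assuming the STM32G030K8 layout of 32 flash pages (0..31) and taking 0xFFFFFFFF as the "no failing page" value, which matches what is observed after a successful erase. The macro, enum, and function names are hypothetical and not part of the HAL.

```c
#include <stdint.h>

/* Assumed STM32G030K8 layout: 64 KB flash in 2 KB pages, so pages 0..31. */
#define APP_FLASH_PAGE_COUNT  32u
/* Value observed in PAGEError after a successful erase. */
#define APP_NO_FAULTY_PAGE    0xFFFFFFFFu

typedef enum {
    PAGE_ERROR_NONE,        /* no page was reported as failing */
    PAGE_ERROR_VALID_PAGE,  /* a real page index, 0..31 */
    PAGE_ERROR_GARBAGE      /* cannot be a page index on this part */
} page_error_kind_t;

/* Hypothetical helper: classify the value found in PAGEError. */
static page_error_kind_t classify_page_error(uint32_t page_error)
{
    if (page_error == APP_NO_FAULTY_PAGE) {
        return PAGE_ERROR_NONE;
    }
    if (page_error < APP_FLASH_PAGE_COUNT) {
        return PAGE_ERROR_VALID_PAGE;
    }
    return PAGE_ERROR_GARBAGE;
}
```

Both 2103 and 536876972 classify as PAGE_ERROR_GARBAGE. Since PAGEError is an uninitialized local variable, this suggests it was either never written by the erase call or overwritten afterwards, rather than a specific page failing to erase.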

Here's the structure of EraseInitStruct for your reference:

Thank you

sde c.1 · ‎2023-10-30

The STM32G030K8 only has one bank.

sde c.1 · ‎2023-10-30

I've made some modifications to the code to initiate a page wipe 1 second after startup and then every 4 seconds subsequently. Additionally, I've set up an LED on my board to light up immediately before and right after the page erasure. The LED's behavior serves as an indicator for me: if it lights up briefly, the page erasure was successful. However, if it remains illuminated, it suggests a hard fault has occurred.

The page erases correctly in the following scenarios:
1) When I run the program with a debugger.
2) Right after bootloading the code (the application starts automatically after the bootload), the page wipe performs as expected.

However, a hard fault occurs when:
1) I bootload the code and then power cycle the controller (turning it off and back on).

I'm perplexed as to why the system behaves differently after a power cycle, especially since it operates smoothly after bootloading or while debugging without any power interruption.

Would you be open to assisting me in investigating this matter? I'd be happy to treat you to a few cups of coffee or more :) in appreciation for your help!

TDK · ‎2023-10-30

Hmm, I'm not sure I can help much more than what I'm writing here. I don't have a G030 board and I'll be gone for the next few weeks.

If you really need the problem solved, here's what I would do if it were my project:

When it hard faults, look at the hard fault to determine the root cause. STM32CubeIDE has a hard fault analyzer. You can attach a debugger to a chip in such a state without resetting it if you edit your debug/run configuration and set Startup -> Download -> False and Debugger -> Reset Behavior -> None.
Are you using that page of flash for anything? Consider removing that logic during debugging and see if the issue still remains. If not, it could be a program logic issue.
Remove all the "retry" logic from the code and only try once. Retrying multiple times should never be necessary and at worst it will mask bad behavior.
Consider implementing a debug output that you can view without a debugger connected. I use a UART stream for this, but there are other options. This will give you better debug output than a simple LED being on/off. You can write things like "FLASH->SR=...".
Buy an ST board (known good hardware) and try to replicate the issue there. Perhaps the STM32G031K8T6 is the closest, but I didn't look in detail at the hardware differences. If you can't replicate it there, it could be a hardware issue. Perhaps power is insufficient or takes too long to come up. You're delaying a few seconds after startup which should eliminate this as an issue, but who knows.
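
The "FLASH->SR=..." debug line suggested above could be built along these lines. This is only a sketch of the text formatting; sending the buffer over a UART is left out. The flag positions follow the STM32G0 FLASH_SR layout and should be double-checked against the reference manual, and the table and function names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed FLASH_SR flag positions for STM32G0 (verify in the reference manual). */
typedef struct {
    uint32_t    mask;
    const char *name;
} sr_flag_t;

static const sr_flag_t k_sr_flags[] = {
    { 1u << 0,  "EOP"     },
    { 1u << 1,  "OPERR"   },
    { 1u << 3,  "PROGERR" },
    { 1u << 4,  "WRPERR"  },
    { 1u << 5,  "PGAERR"  },
    { 1u << 6,  "SIZERR"  },
    { 1u << 7,  "PGSERR"  },
    { 1u << 15, "OPTVERR" },
    { 1u << 16, "BSY1"    },
};

/* Hypothetical helper: writes the raw value followed by the names of all
 * set flags into buf, truncating if buf is too small. */
static void format_flash_sr(uint32_t sr, char *buf, size_t len)
{
    if (len == 0) {
        return;
    }
    snprintf(buf, len, "FLASH->SR=0x%08lX", (unsigned long)sr);
    for (size_t i = 0; i < sizeof k_sr_flags / sizeof k_sr_flags[0]; i++) {
        if (sr & k_sr_flags[i].mask) {
            size_t used = strlen(buf);  /* always < len, buf is terminated */
            snprintf(buf + used, len - used, " %s", k_sr_flags[i].name);
        }
    }
}
```

On the target, the value of FLASH->SR would be passed in and the buffer sent with whatever UART routine is already in use. Printing the line both before and after the erase call shows whether error flags were already set going in.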


sde c.1 · ‎2023-10-30

Great advice!
1) "You can attach a debugger to a chip in such a state without resetting it".
I did not know this, but it can be a great help!
2) After removing the retry logic and introducing a brief delay after initializing the SPI CS pin, I noticed the print started to work again. However, I'm puzzled as to why an SPI read attempt leads to a Hard Fault when erasing the last page, especially given that this fault appears several instructions later. I double-checked the code that talks to the SPI chip, and it seems OK.

// In main.c, I used this retry code to initialize the SPI interface.
// The initialization did not work on the first attempt because I attempted communication
// too quickly after initializing the CS pin. After introducing a brief delay, the retry
// became unnecessary, and the chip no longer enters a Hard Fault state.



  uint8_t retry = 0;
  for (retry = 0; retry < 3; retry++) {
    paccelero_handle = lis2dw12_init(&hspi2, SPI_CS);
    if (!paccelero_handle) {
      pApp_h->status.flags.bAcceleroError = 1;
      SetLedPattern(led_Error);
    } else {
      SetLedPattern(led_Alive);
      break;
    }
  }

lis2dw12_t *lis2dw12_init(SPI_HandleTypeDef *phspi,Pins_t cs_pin){
	lis2dw12.pSPIinterface = phspi;
	lis2dw12.cs_pin = cs_pin;
	WritePin(lis2dw12.cs_pin,1) ; //SPI mode disable
	HAL_Delay(2); // -> adding this delay solved the Hard fault at page wipe 

	if(!lis2dw12_read_device_info()){
		return 0;
	}
	return &lis2dw12;
}

3) I've incorporated USART output and added several printf statements in the code. These were the results I observed when the Hard Fault occurred. However, my limited knowledge of STM assembler and its inner mechanisms prevents me from comprehending the entire situation. I'll need more time to study and analyze this thoroughly.

R0 = 0xFFFFFFFF
R1 = 0xFFFFFFFF
R2 = 0xFFFFFFFF
R3 = 0xFFFFFFFF
R12 = 0x20000410
LR [R14] = 0x200003EC subroutine call return address
PC [R15] = 0x00000001 program counter
PSR = 0xFFFFFFF9
ICSR = 0x0440F003
AIRCR = 0xFA050000
SCR = 0x00000000
CCR = 0x00000208
SHCSR = 0x00000000
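
Part of the dump above can be decoded by hand. As a sketch, assuming the ARMv6-M ICSR layout (VECTACTIVE in bits 8:0, VECTPENDING in bits 20:12, ISRPENDING in bit 22, PENDSTSET in bit 26; the Cortex-M0+ implements only the low bits of the two number fields), the following pulls those fields out of an ICSR value. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t vect_active;      /* exception number currently being handled */
    uint32_t vect_pending;     /* highest-priority pending exception number */
    bool     isr_pending;      /* ISRPENDING flag */
    bool     systick_pending;  /* PENDSTSET flag */
} icsr_fields_t;

/* Hypothetical helper: split an ICSR value into its main fields. */
static icsr_fields_t decode_icsr(uint32_t icsr)
{
    icsr_fields_t f;
    f.vect_active     = icsr & 0x1FFu;
    f.vect_pending    = (icsr >> 12) & 0x1FFu;
    f.isr_pending     = ((icsr >> 22) & 1u) != 0u;
    f.systick_pending = ((icsr >> 26) & 1u) != 0u;
    return f;
}
```

For ICSR = 0x0440F003 this gives VECTACTIVE = 3 (HardFault is the active exception) and VECTPENDING = 15 with PENDSTSET set (SysTick is pending behind it), so the dump was taken inside the HardFault handler while a SysTick tick was waiting.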

Pavel A. · ‎2023-10-30

> LR [R14] = 0x200003EC subroutine call return address

Does your code run in the RAM?

Piranha · ‎2023-10-30

> please check if your program occupied the second bank of flash. If so you can not program the flash.

You are wrong and it has nothing to do with read-while-write. Learn how these things work before you "consult" other people!

sde c.1 · ‎2023-10-31

no

Pavel A. · ‎2023-10-31

Then how come that return address in LR points to RAM? Stack overwrite, again?
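
One way to make this question concrete is to check where each stacked value points. The sketch below assumes the STM32G030K8 memory map (64 KB flash at 0x08000000, 8 KB SRAM at 0x20000000) and treats 0xFFFFFFF1, 0xFFFFFFF9 and 0xFFFFFFFD as the Cortex-M0+ EXC_RETURN codes. The enum and function names are hypothetical.

```c
#include <stdint.h>

typedef enum {
    ADDR_FLASH,       /* inside the assumed 64 KB main flash */
    ADDR_SRAM,        /* inside the assumed 8 KB SRAM */
    ADDR_EXC_RETURN,  /* one of the EXC_RETURN magic values */
    ADDR_OTHER        /* none of the above */
} addr_kind_t;

/* Hypothetical helper: say what kind of value a stacked register holds.
 * The flash alias at address 0 is deliberately not counted as code here. */
static addr_kind_t classify_address(uint32_t value)
{
    if (value == 0xFFFFFFF1u || value == 0xFFFFFFF9u || value == 0xFFFFFFFDu) {
        return ADDR_EXC_RETURN;
    }
    if (value >= 0x08000000u && value < 0x08010000u) {
        return ADDR_FLASH;
    }
    if (value >= 0x20000000u && value < 0x20002000u) {
        return ADDR_SRAM;
    }
    return ADDR_OTHER;
}
```

Applied to the dump earlier in the thread: the stacked LR (0x200003EC) is an SRAM address, the stacked PC (0x00000001) is not a code address, and the value printed as PSR (0xFFFFFFF9) is an EXC_RETURN code rather than a status word. None of the three looks like a genuine stack frame entry, which would fit a corrupted stack or a frame read from the wrong stack pointer, though it does not prove either.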

sde c.1 · ‎2023-10-31

If I had a clear answer, I would provide it. To truly understand what happens when a Hard Fault occurs, I need to delve deeper into the study of STM registers, assembly, and their inner workings. For now I have the issue fixed, thanks to checking all my code and making sure no retries are happening, just like TDK proposed, but I do not know what triggered the fault.