HAL_FLASHEx_Erase HardFault_Handler STM32G030K8

sde c.1 · ‎2023-10-27

Hello,

I have developed an application that saves accelerometer data when a machine starts up. This data is stored in the last page of the controller's memory (page 31).

I'm encountering a HardFault exception that occurs intermittently, roughly 1 in 20 to 50 times when running the code with the debugger attached. However, it consistently happens when flashing and running the application without the debugger.

I'm seeking guidance on how to effectively debug this issue. Here's a snippet of the relevant code:

The page parameter is 31, with a size of 1 when this function is called.

Any assistance in troubleshooting this problem would be greatly appreciated.

static _Bool flash_erase_pages(uint8_t page,uint8_t size)
{
	static FLASH_EraseInitTypeDef EraseInitStruct;
	uint8_t bResult=0;
	uint8_t retry=0;

	static volatile uint32_t flasherror;
	uint32_t PAGEError;

	do{
		/* Unlock the Flash to enable the flash control register access *************/
		HAL_FLASH_Unlock();

		 /* Clear OPTVERR bit set on virgin samples */
		__HAL_FLASH_CLEAR_FLAG(FLASH_FLAG_OPTVERR);


		/* Fill EraseInit structure*/
		EraseInitStruct.TypeErase   = FLASH_TYPEERASE_PAGES;
		EraseInitStruct.Banks = FLASH_BANK_1 ;
		EraseInitStruct.Page = (uint32_t)page;
		//EraseInitStruct.NbPages     = ((EndPage - StartPage)) +1;
		EraseInitStruct.NbPages     = size;

		if (HAL_FLASHEx_Erase(&EraseInitStruct, &PAGEError) != HAL_OK)
		{
			/*Error occurred while page erase.*/
			flasherror = HAL_FLASH_GetError ();
		}
		else{
			bResult = 1;
		}

		HAL_FLASH_Lock();

		if(bResult) 	return true;
		HAL_Delay(1);
		if(++retry>5) 	return false;
	}while(1);
}

On the occasional runs where I was able to reproduce this while debugging, I captured the following data:

The HardFault happens in HAL_FLASH_Lock(); -> SET_BIT(FLASH->CR, FLASH_CR_LOCK);

Before entering the HardFault, I see these values in the flash_erase_pages function:
PAGEError = 2103
PAGEError = 536876972 (another time)

These PAGEError values make no sense, as there are only 32 pages.
flasherror = 0

After a successful erase, the value of PAGEError is consistently 0xFFFFFFFF. I'm perplexed by what might be causing this unexpected behavior.
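
The two odd PAGEError values can be sanity-checked on a host machine. The sketch below assumes that HAL_FLASHEx_Erase only ever stores 0xFFFFFFFF (no faulty page) or the index of the failing page in PAGEError, so any other value would point to the variable being overwritten rather than set by the HAL. NUM_PAGES, the SRAM window (8 KB at 0x20000000) and both helper names are assumptions made for illustration; none of them come from the HAL.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_PAGES      32u          /* assumption: 32 pages, as stated in the post */
#define PAGE_ERR_NONE  0xFFFFFFFFu  /* value observed after a successful erase */
#define SRAM_BASE      0x20000000u  /* assumption: SRAM starts here on this part */
#define SRAM_SIZE      0x2000u      /* assumption: 8 KB of SRAM */

/* True if the value is something the erase call could plausibly report. */
bool page_error_is_plausible(uint32_t page_error)
{
    return page_error == PAGE_ERR_NONE || page_error < NUM_PAGES;
}

/* True if the value lies inside the assumed SRAM window, i.e. it looks like a pointer. */
bool looks_like_sram_address(uint32_t value)
{
    return value >= SRAM_BASE && value < SRAM_BASE + SRAM_SIZE;
}
```

With the values from the post, 2103 and 536876972 (0x200017AC) both fail the plausibility check, and the second one falls inside the assumed SRAM window. That would be consistent with stack or memory corruption rather than a real page index, though it does not prove it.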

Here's the structure of EraseInitStruct for your reference:

Thank you

Tesla DeLorean · ‎2023-10-27

Need to dump the full fault context. Want to see what address it's touching or the nature of the fault. Since it faults at the lock, assume the erase completed.

Most likely some attempt to execute code that's in the erased section, perhaps via interrupt.

https://github.com/cturvey/RandomNinjaChef/blob/main/KeilHardFault.c
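
As a rough idea of what "full fault context" means here, the sketch below decodes the eight words a Cortex-M core pushes onto the active stack on exception entry. It is host-runnable C only: the small assembly shim that picks MSP or PSP and passes it to the decoder is not shown (the linked file contains a complete handler), and fault_frame_t, fault_frame_decode and fault_frame_format are hypothetical names. On a Cortex-M0+ part such as the STM32G030 there are, as far as I know, no CFSR/BFAR fault status registers, so this stacked frame is most of what is available.

```c
#include <stdint.h>
#include <stdio.h>

/* The eight words pushed on exception entry, lowest address first. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} fault_frame_t;

/* Copy the stacked registers from the stack pointer captured at fault entry. */
fault_frame_t fault_frame_decode(const uint32_t *stacked_sp)
{
    fault_frame_t f = {
        .r0  = stacked_sp[0], .r1 = stacked_sp[1],
        .r2  = stacked_sp[2], .r3 = stacked_sp[3],
        .r12 = stacked_sp[4], .lr = stacked_sp[5],
        .pc  = stacked_sp[6], .xpsr = stacked_sp[7],
    };
    return f;
}

/* Format the frame as one line of text; returns the snprintf result. */
int fault_frame_format(const fault_frame_t *f, char *buf, size_t len)
{
    return snprintf(buf, len,
        "R0=%08lX R1=%08lX R2=%08lX R3=%08lX R12=%08lX LR=%08lX PC=%08lX PSR=%08lX",
        (unsigned long)f->r0, (unsigned long)f->r1, (unsigned long)f->r2,
        (unsigned long)f->r3, (unsigned long)f->r12, (unsigned long)f->lr,
        (unsigned long)f->pc, (unsigned long)f->xpsr);
}
```

The stacked PC identifies the faulting instruction; R0 to R3 often hold the address that instruction was reading or writing, which is the more interesting value for this kind of fault.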

On the hardware side, check that there's enough bulk capacitance on supply/VCAP.

TDK · ‎2023-10-27

Since it's not consistent, it's probably not an explicit bug in the code. Consider things like stack overflow or RTOS issues, if using one.

Could also be a power issue as Tesla suggests. Flash operations use more power and that can cause issues if your power rail is not stable or sufficient.

> SET_BIT(FLASH->CR, FLASH_CR_LOCK)

This line is harmless, shouldn't be causing any issues.

Tesla DeLorean · ‎2023-10-27

Is something in the interrupts using this data (Flash, EEPROM Emulation), with the blank flash generating an ECC failure? Or is this code being called from an interrupt/callback?

Perhaps add some interlocks around secondary access methods and this erase/initialize code. Zoom out a bit, look at the interactions, perhaps output diagnostic messages so you can see any patterns in the interactions and subsequent failure.
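
One possible shape for such an interlock is a busy flag that the erase path holds and every other accessor of the data page checks first. This is a minimal host-runnable sketch, not a drop-in fix: flash_guarded_run, flash_data_access_allowed and fake_erase are hypothetical names, and fake_erase merely stands in for the real HAL erase call so the behaviour can be observed.

```c
#include <stdbool.h>

/* Set while an erase/program sequence is running; volatile because ISRs read it. */
static volatile bool flash_busy = false;

/* Callback type standing in for the real erase/program call (hypothetical). */
typedef bool (*flash_op_fn)(void);

/* Run one flash operation with the busy flag held; refuses nested calls.
 * Note: the check-then-set below is not atomic. If several contexts can
 * start flash operations, it would need a critical section on the target. */
bool flash_guarded_run(flash_op_fn op)
{
    if (flash_busy) {
        return false;
    }
    flash_busy = true;
    bool ok = op();
    flash_busy = false;
    return ok;
}

/* Tasks, ISRs and callbacks call this before touching the data page. */
bool flash_data_access_allowed(void)
{
    return !flash_busy;
}

/* Stand-in for the erase: records what a reader would see mid-operation. */
static bool seen_allowed_during_op = true;
static bool fake_erase(void)
{
    seen_allowed_during_op = flash_data_access_allowed();
    return true;
}
```

Logging every refused access (for example over a debug UART) would also give the kind of diagnostic trail suggested above, showing whether anything tries to read the page while it is being erased.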

sde c.1 · ‎2023-10-28

The hardware wasn't the culprit. Even after adding extra decoupling capacitors, the issue remained. The PCB is bustling with activity, and I use FreeRTOS for task management. My plan is to dedicate a task to erasing/writing a page, ensuring it runs when no other processes or interrupt code are active. The page erase takes 20 ms. Should I turn off interrupts during this period? The RS485 bus might attempt communication while the page is being erased, and the PCB should respond, so I would prefer to keep interrupts enabled while erasing.

sde c.1 · ‎2023-10-29

I have stripped the code to a minimum and still get a HardFault_Handler. I'm very confused about what's going on.
I removed the RTOS and run this code 4 seconds after startup; at that moment the application crashes:


	static FLASH_EraseInitTypeDef EraseInitStruct __attribute__((aligned(8)));
	uint32_t PAGEError __attribute__((aligned(8)))=0xFFFFFFFFU;
	uint32_t StartPageAddress=0;
	volatile HAL_StatusTypeDef status;

	WritePin(ACTIVELED_OUT, 0);	// active led

	do{

	}while((FLASH->SR & FLASH_SR_BSY1));

	  /* Unlock the Flash to enable the flash control register access *************/
	  // __disable_irq();

	/* Fill EraseInit structure*/
	   EraseInitStruct.TypeErase   = FLASH_TYPEERASE_PAGES;
	   EraseInitStruct.Banks = FLASH_BANK_1 ;
	   EraseInitStruct.Page = 31;
	   EraseInitStruct.NbPages     = 1;

	   /* Clear any pending flash status and error flags */
	   __HAL_FLASH_CLEAR_FLAG(FLASH_FLAG_EOP | FLASH_FLAG_OPERR | FLASH_FLAG_WRPERR | FLASH_FLAG_PGAERR | FLASH_FLAG_MISERR | FLASH_FLAG_PGSERR | FLASH_FLAG_FASTERR);

	   HAL_FLASH_Unlock();
	   /* Erase the user Flash area*/

	   status = HAL_FLASHEx_Erase(&EraseInitStruct, &PAGEError);
	   FLASH_WaitForLastOperation(1000);
	   HAL_FLASH_Lock();
	   if (status != HAL_OK)
	   {
		 /*Error occurred while page erase.*/

		//   __enable_irq();
		  return HAL_FLASH_GetError();
	   }

	   WritePin(ACTIVELED_OUT, 1);	// active led

The HardFault triggers at HAL_FLASH_Lock(); (line 31).
I have captured some registers and see this:

pFlash.ErrorCode contains the value 0xE0. I triple-checked all possible causes of the corresponding error bits (PGAERR, SIZERR and PGSERR), but found nothing.
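
For reference, 0xE0 can be decoded mechanically. The sketch below assumes that the HAL error code uses the same bit positions as the error flags in FLASH_SR (OPERR bit 1, PROGERR bit 3, WRPERR bit 4, PGAERR bit 5, SIZERR bit 6, PGSERR bit 7, MISSERR bit 8, FASTERR bit 9); that mapping should be checked against the reference manual and the HAL headers for this part. flash_error_decode is a hypothetical helper, not a HAL function.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    uint32_t mask;
    const char *name;
} flash_err_bit_t;

/* Assumed bit layout of the flash error flags (verify against the manual). */
static const flash_err_bit_t flash_err_bits[] = {
    { 1u << 1, "OPERR"   },
    { 1u << 3, "PROGERR" },
    { 1u << 4, "WRPERR"  },
    { 1u << 5, "PGAERR"  },
    { 1u << 6, "SIZERR"  },
    { 1u << 7, "PGSERR"  },
    { 1u << 8, "MISSERR" },
    { 1u << 9, "FASTERR" },
};

/* Write the names of all set bits into buf, space separated.
 * Returns the number of known error bits that were set. */
int flash_error_decode(uint32_t code, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    int count = 0;
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof flash_err_bits / sizeof flash_err_bits[0]; i++) {
        if (code & flash_err_bits[i].mask) {
            size_t used = strlen(buf);
            snprintf(buf + used, len - used, "%s%s",
                     count ? " " : "", flash_err_bits[i].name);
            count++;
        }
    }
    return count;
}
```

Under that assumed layout, 0xE0 decodes to PGAERR, SIZERR and PGSERR, which matches the three bits named above.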

Any other ideas about what I can do to find the problem?

TDK · ‎2023-10-29

Does it happen every time?

Does it happen when interrupts are disabled?

Look at the disassembly at and around 0x08006C8A to see the exact statement that causes the fault.

Look at other relevant registers to determine more information about the cause of the hard fault. STM32CubeIDE has a fault analyzer, would recommend using that if you are unfamiliar. See "7.2 Using the Fault Analyzer"

https://www.st.com/resource/en/user_manual/dm00629856-stm32cubeide-user-guide-stmicroelectronics.pdf

Tesla DeLorean · ‎2023-10-29

Yes, it does kind of seem like something is blocking, or breaking context in the background, and something is getting corrupted. Perhaps disable/enable interrupts one level out from here and see if you can move the failure.

Have a fault handler that outputs the MCU registers, both the general ones and those holding the faulting details. I'm interested to see the memory address it's failing on: not the code address, but the read/write memory address it's acting upon.

sde c.1 · ‎2023-10-29

Every time, also with IRQs disabled. I'm going through the Fault Analyzer section right now.

Shirley.Ye · ‎2023-10-30

Please check whether your program occupies the second bank of flash. If so, you cannot program the flash.
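
A related check that applies regardless of bank layout is whether the program image reaches into the page being erased. The sketch below assumes 2 KB pages starting at 0x08000000 (which gives the 32 pages mentioned earlier in the thread for a 64 KB part) and a contiguous image starting at the flash base. image_end is a hypothetical value that would normally come from a linker symbol or the map file, and both helper names are made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLASH_BASE_ADDR   0x08000000u  /* assumption: start of main flash */
#define FLASH_PAGE_BYTES  0x800u       /* assumption: 2 KB per page */

/* First address of the given flash page. */
uint32_t flash_page_start(uint32_t page)
{
    return FLASH_BASE_ADDR + page * FLASH_PAGE_BYTES;
}

/* image_end is the first address past the program image.
 * Returns true if the image extends into (or beyond) the given page. */
bool image_overlaps_page(uint32_t image_end, uint32_t page)
{
    return image_end > flash_page_start(page);
}
```

Under these assumptions page 31 starts at 0x0800F800, so an image that ends below that address leaves the data page untouched. Comparing that boundary with the end of the image in the map file is a quick way to rule this cause in or out.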