Data-like flash read with memory barrier?

LCE · ‎2024-05-14

Heyho,

I recently finished - or so I thought - the ethernet bootloader for STM32H73x.

The bootloader can update the application and vice versa, updating, jumping back and forth, all working fine.

Until yesterday, when all of a sudden (while I was working on some other HTTP POST stuff unrelated to the internal flash) the bootloader got stuck while verifying the freshly written application.
Verification goes like:
- read (Octo-) SPI flash image page (256 bytes)
- compare SPI flash buffer to internal flash via pointer

The hard fault info that I got was not really useful, I was looking for all kinds of stuff, but could not solve the problem.

Until I added some memory barriers, as you can see below, combined with loading each flash byte into a variable and then comparing (before that I directly used the flash byte pointer sFlashIntCtl.pu8ChkAddr[i] in the for loop).
Now it's working again.

BUT... does that make sense?
Or was that just coincidence, and I have another problem?

				/* ++++++++++++++++++++++++++++++++++++++++++++++++++++++ */
				/* read from SPI flash
				 *	BLOCKING &
				 *	### outside of state machine ###
				 */
				sFlashIntCtl.u8ChkError = OspiFlashRdPage();
				__DSB();

				if( sFlashIntCtl.u8ChkError != HAL_OK )
				{
					#if DEBUG_FLASH_INT
						uart_printf("\n\r# ERR: FLINT_STATE_CHECK OspiFlashRdPage()\n\r");
					#endif 	/* DEBUG_FLASH_INT */

					sFlashIntCtl.u32ChkErrors++;
					u8FlashIntState = FLINT_STATE_ERROR;
				}
				else
				{
					/* compare */
					uint32_t u32ErrOld = sFlashIntCtl.u32ChkErrors;
					uint8_t u8FlashByte = 0;

					/* compare internal flash to SPI flash page buffer */
					for( uint32_t i = 0; i < (uint32_t)OSPI_FLASH_PAGE_SIZE; i++ )
					{
						u8FlashByte = sFlashIntCtl.pu8ChkAddr[i];
						__DSB();
						if( u8OspiFlashPageBuf[i] != u8FlashByte )
						{
							sFlashIntCtl.u32ChkErrors++;
						}
						if( (sFlashIntCtl.u32ChkBtDone + i) >= (sSpiFileInfo.u32Size - 1) ) break;
					}
					sFlashIntCtl.u32ChkBtDone += sFlashIntCtl.u32ChkLen;
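
For comparison, the compare step can be sketched as a small helper that reads the internal flash through a `const volatile` pointer, which makes the compiler emit one load per byte without any barrier. The comparison consumes the loaded value, so it cannot run ahead of the load. This is only a sketch: `flash_page_mismatches` and its parameters are hypothetical names, not part of the bootloader above, and it assumes the internal flash is plain memory-mapped read-only data.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: counts differing bytes between a memory-mapped
 * flash region and a RAM page buffer. `remaining` is the number of image
 * bytes not yet verified, so a partial last page is not over-read. */
static uint32_t flash_page_mismatches(const volatile uint8_t *flash,
                                      const uint8_t *page_buf,
                                      size_t page_size,
                                      size_t remaining)
{
    size_t n = (remaining < page_size) ? remaining : page_size;
    uint32_t errors = 0;

    for (size_t i = 0; i < n; i++) {
        /* volatile forces one real load from flash per iteration */
        uint8_t flash_byte = flash[i];
        if (flash_byte != page_buf[i]) {
            errors++;
        }
    }
    return errors;
}
```

Passing the remaining byte count instead of breaking out of the loop keeps the bound check in one place.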

PS: why is that very useful </> code insert button back in line 2? This should be the first one to appear, it's surely more often used than other stuff that appears first.

Pavel A. · ‎2024-05-14

> The hard fault info that I got was not really useful

Nevertheless, can you show it, please?

LCE · ‎2024-05-15

@Pavel A. Thanks for taking a look at it.

Here comes the problem: I just reverted back to the old version in source code without the memory barriers - now it's working, as it did before for some time. Grmph...

I remember I had:

PRECISERR
BFARVALID

But then in BFAR something like 0x43434343
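
For reference, PRECISERR and BFARVALID are bits 9 and 15 of SCB->CFSR on ARMv7-M, and when both are set BFAR holds the address of the faulting access. The sketch below decodes a saved CFSR/BFAR pair; the helper names are hypothetical and not part of any ST or CMSIS API. As a side note, 0x43434343 is four ASCII 'C' characters, which may hint at a pointer overwritten with text data, or may be a coincidence.

```c
#include <stdint.h>

/* BusFault status bits inside SCB->CFSR (ARMv7-M). */
#define CFSR_IBUSERR      (1u << 8)
#define CFSR_PRECISERR    (1u << 9)
#define CFSR_IMPRECISERR  (1u << 10)
#define CFSR_UNSTKERR     (1u << 11)
#define CFSR_STKERR       (1u << 12)
#define CFSR_BFARVALID    (1u << 15)

/* Hypothetical helper: returns 1 and stores the faulting address when the
 * saved registers describe a precise bus fault with a valid BFAR. */
static int precise_busfault_addr(uint32_t cfsr, uint32_t bfar, uint32_t *addr)
{
    if ((cfsr & CFSR_PRECISERR) && (cfsr & CFSR_BFARVALID)) {
        *addr = bfar;
        return 1;
    }
    return 0;
}

/* Hypothetical helper: returns 1 if all four bytes of the word are
 * printable ASCII, as a quick check for a pointer clobbered by text. */
static int looks_like_ascii(uint32_t word)
{
    for (int i = 0; i < 4; i++) {
        uint8_t b = (uint8_t)(word >> (8 * i));
        if (b < 0x20 || b > 0x7E) {
            return 0;
        }
    }
    return 1;
}
```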

Here's the hard fault handler, the struct is in no-init DTCM SRAM:

void HardFault_Handler(void)
{
	uint32_t *pu32Stack = NULL;

	sHardFault.u32Event = DEBUG_FAULT_HARD_EVENT;

	/* save some registers */

	register uint32_t temp0 asm("r0");
	register uint32_t temp1 asm("r1");
	register uint32_t temp2 asm("r2");
	register uint32_t temp3 asm("r3");
	register uint32_t temp4 asm("r4");
	register uint32_t temp5 asm("r5");
	register uint32_t temp6 asm("r6");
	register uint32_t temp7 asm("r7");

	register uint32_t tempSP asm("sp");		/* stack pointer = R13 */
	register uint32_t tempLR asm("lr");		/* link register = R14 */

	/* program counter PC = R15 is a little harder to get: */
	register uint32_t tempPC;
	asm volatile("mov %0, pc" : "=r"(tempPC));	/* destination first: copy PC into tempPC */

	sHardFault.u32CFSR 	= SCB->CFSR;
	sHardFault.u32MMFAR = SCB->MMFAR;
	sHardFault.u32BFAR 	= SCB->BFAR;

	sHardFault.u32CpuR[0] = temp0;
	sHardFault.u32CpuR[1] = temp1;
	sHardFault.u32CpuR[2] = temp2;
	sHardFault.u32CpuR[3] = temp3;
	sHardFault.u32CpuR[4] = temp4;
	sHardFault.u32CpuR[5] = temp5;
	sHardFault.u32CpuR[6] = temp6;
	sHardFault.u32CpuR[7] = temp7;

	sHardFault.u32CpuSP = tempSP;
	sHardFault.u32CpuLR = tempLR;
	sHardFault.u32CpuPC = tempPC;

	pu32Stack = (uint32_t *)sHardFault.u32CpuSP;
	sHardFault.u32CpuPCe = pu32Stack[6];

	sHardFault.u32EthTxErr[0] = 0;
	sHardFault.u32EthTxErr[1] = 0;
	sHardFault.u32EthTxErr[2] = 0;
	sHardFault.u32EthTxErr[3] = 0;
	sHardFault.u32EthTxErr[4] = 0;
	sHardFault.u32EthTxErr[5] = 0;
	sHardFault.u32EthTxErr[6] = 0;
	sHardFault.u32EthTxErr[7]++;		/* test increment */

	sHardFault.u32EthTxECode = u32EthTxErrorCode;
	sHardFault.u32EthTxELtst = u32EthErrTxLatest;

	/* save fault time */
	sHardFault.u32TimeNano = ETH->MACSTNR;
	sHardFault.u32TimeSecs = ETH->MACSTSR;

	while( 1 )
	{
	}
}

Edit: no cache used

LCE · ‎2024-05-15

So the error is back, with the memory barriers, so that was not the solution.

2 problems for debugging:

- right now the error only occurs in the bootloader, and that hangs but does NOT enter any fault handler

- I cannot debug the bootloader, because it's getting too big in debug mode

So I have to change the dedicated flash sizes to enable debugging - but I guess that as soon as I debug the problem will not occur... ;)

Let's see...

SofLit · ‎2024-05-15

Hello,

The first thing that comes to my mind: it could be a "failing" speculative access since you're using (Octo-) SPI memory.

Did you activate the MPU for non-used regions?

To give better visibility on the answered topics, please click on "Accept as Solution" on the reply which solved your issue or answered your question.
PS:
1 - This is NOT an online support (https://ols.st.com) but a collaborative space.
2 - Please be polite in your reply. Otherwise, it will be reported as inappropriate and you will be permanently blacklisted from my help.

LCE · ‎2024-05-15

> The first thing that comes to my mind: it could be a "failing" speculative access since you're using (Octo-) SPI memory.

@SofLit is that possible when the SPI flash is NOT used in memory mapped mode?
SPI flash is only used via "indirect-read / write" mode, and always only with complete page size (256 bytes, a MX25LM51245G on H735 Discovery).

I'm not using MPU at all.

SofLit · ‎2024-05-15

Hmm, well. Even if the external memory is not used, the user needs to configure the MPU, just in case, to avoid this behavior. In the STM32CubeH7/F7 examples, a default MPU configuration has been added just for this purpose.

static void MPU_Config(void)
{
  MPU_Region_InitTypeDef MPU_InitStruct;

  /* Disable the MPU */
  HAL_MPU_Disable();

  /* Configure the MPU as Strongly ordered for not defined regions */
  MPU_InitStruct.Enable = MPU_REGION_ENABLE;
  MPU_InitStruct.BaseAddress = 0x00;
  MPU_InitStruct.Size = MPU_REGION_SIZE_4GB;
  MPU_InitStruct.AccessPermission = MPU_REGION_NO_ACCESS;
  MPU_InitStruct.IsBufferable = MPU_ACCESS_NOT_BUFFERABLE;
  MPU_InitStruct.IsCacheable = MPU_ACCESS_NOT_CACHEABLE;
  MPU_InitStruct.IsShareable = MPU_ACCESS_SHAREABLE;
  MPU_InitStruct.Number = MPU_REGION_NUMBER0;
  MPU_InitStruct.TypeExtField = MPU_TEX_LEVEL0;
  MPU_InitStruct.SubRegionDisable = 0x87;
  MPU_InitStruct.DisableExec = MPU_INSTRUCTION_ACCESS_DISABLE;

  HAL_MPU_ConfigRegion(&MPU_InitStruct);

  /* Enable the MPU */
  HAL_MPU_Enable(MPU_PRIVILEGED_DEFAULT);
}
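
As a reading aid for `SubRegionDisable = 0x87`: an ARMv7-M MPU region is split into eight equal subregions, and a set bit in the mask switches the region off for that subregion. For a 4 GB region each subregion is 512 MB, so 0x87 leaves the no-access attributes active only for 0x60000000 to 0xDFFFFFFF, the external memory and device space in the default Cortex-M7 memory map. The sketch below checks this on a host; `mpu_region_applies` is a hypothetical helper, not a HAL function.

```c
#include <stdint.h>

/* Hypothetical helper: for an MPU region with eight subregions, returns 1
 * if `addr` lies in a subregion where the region attributes apply, i.e.
 * the matching bit in the subregion-disable mask `srd` is clear. */
static int mpu_region_applies(uint32_t base, uint64_t region_size,
                              uint8_t srd, uint32_t addr)
{
    if (addr < base) {
        return 0;
    }
    uint64_t offset = (uint64_t)addr - (uint64_t)base;
    if (offset >= region_size) {
        return 0;
    }
    unsigned sub = (unsigned)(offset / (region_size / 8u));
    return ((srd >> sub) & 1u) == 0u;
}
```

With base 0, size 4 GB and mask 0x87, internal flash, SRAM, peripherals and the system region keep the default map, because `HAL_MPU_Enable(MPU_PRIVILEGED_DEFAULT)` enables the background region for privileged code.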


waclawek.jan · ‎2024-05-15

> right now the error only occurs in the bootloader, and that hangs but does NOT enter any fault handler

And where does it hang, then?

> I cannot debug the bootloader, because it's getting too big in debug mode

Use the same optimization settings in debug mode as in release, and then you can debug the bootloader. Sure, it requires some thinking to interpret the results properly, but then there's also not much point in developing/debugging a different binary than you intend to deploy, as some errors are hidden by unoptimized code - and this sounds much like one of those.

JW

LCE · ‎2024-05-15

Debug and Release ~~both do NOT use any optimization.~~ Edit: it seems I had size optimization on for Release all the time.

Just for checking, I turned on size optimization for DEBUG, now the bootloader fits.

With this debug version the error happens again, debugger ends in the hard fault handler.

But... it hangs there, debugger stops at

sHardFault.u32Event = DEBUG_FAULT_HARD_EVENT;

and does not continue to save the fault registers (see above).

Stepping back or continuing with debugging kills the debugger, fault registers have not been saved.

When I start the same without debugger, it seems to hang at the same point - and also no fault registers get saved.

TO DO for me: MPU, read, understand, implement.

waclawek.jan · ‎2024-05-16

Hardfault means something already went wrong, so there's no guarantee that any code will execute there (it often does, that's why many people do have hardfault handlers; but I repeat, there's no guarantee. In fact, it may make things worse.)

Place a breakpoint at the very beginning of the hardfault handler (it would be better to have an empty infinite loop there, but at this point don't touch the sources, so that you retain the problem) and work from the content of the fault registers and the stacked registers (here for M4, don't know for M7 but it will be similar; the primary sources are ST's Programming Manual for the Cortex-M and ARM's material). Observe the stacked PC; in the disassembly (best mixed with source) look at the code a few instructions before that point and try to discern from the registers' content what might have gone wrong. Discuss here with screenshots etc.
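
One common way to read the stacked registers in code is to grab the stack pointer in a tiny assembly trampoline, before any compiler-generated prologue moves it, and hand it to a C function. The following is a minimal sketch assuming GCC and an ARMv7-M core; `fault_frame_t`, `g_fault_frame` and `hardfault_capture` are hypothetical names, and the trampoline only compiles for ARM targets.

```c
#include <stdint.h>

/* Basic exception frame pushed by the core on exception entry (ARMv7-M).
 * Any FPU context, if stacked, sits above these eight words. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} fault_frame_t;

/* Hypothetical capture area; on target this could live in no-init RAM. */
static volatile fault_frame_t g_fault_frame;

/* C part: `frame` must be the stack pointer that was active when the
 * fault was taken, not the SP after a C function prologue. */
void hardfault_capture(const uint32_t *frame)
{
    g_fault_frame.r0   = frame[0];
    g_fault_frame.r1   = frame[1];
    g_fault_frame.r2   = frame[2];
    g_fault_frame.r3   = frame[3];
    g_fault_frame.r12  = frame[4];
    g_fault_frame.lr   = frame[5];
    g_fault_frame.pc   = frame[6];   /* address of the faulting code */
    g_fault_frame.xpsr = frame[7];
}

#if defined(__GNUC__) && defined(__arm__)
/* Trampoline: EXC_RETURN bit 2 in LR tells whether MSP or PSP was in use.
 * The selected stack pointer is passed in r0, then the handler parks. */
__attribute__((naked)) void HardFault_Handler(void)
{
    __asm volatile(
        "tst   lr, #4            \n"
        "ite   eq                \n"
        "mrseq r0, msp           \n"
        "mrsne r0, psp           \n"
        "bl    hardfault_capture \n"
        "1: b  1b                \n");
}
#endif
```

Keeping the C part free of side effects other than the copy limits what can go wrong inside the handler itself.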

I don't quite believe that the MPU matters if you use purely register access to the OSPI, but I don't use the 'H7 and have no first-hand experience.

JW