Data-like flash read with memory barrier?

LCE
Principal

Heyho,

I recently finished - or so I thought - the ethernet bootloader for STM32H73x.

The bootloader can update the application and vice versa, updating, jumping back and forth, all working fine.

Until yesterday, when all of a sudden (while I was working on some HTTP POST stuff unrelated to the internal flash) the bootloader got stuck while verifying the freshly written application.
Verification goes like:
- read (Octo-) SPI flash image page (256 bytes)
- compare SPI flash buffer to internal flash via pointer

The hard fault info that I got was not really useful. I looked into all kinds of things, but could not solve the problem.

Until I added some memory barriers, as you can see below, combined with loading each flash byte into a variable and then comparing (before that I directly used the flash byte pointer sFlashIntCtl.pu8ChkAddr[i] in the for loop).
Now it's working again.

BUT... does that make sense?
Or was that just coincidence, and I have another problem?

				/* ++++++++++++++++++++++++++++++++++++++++++++++++++++++ */
				/* read from SPI flash
				 *	BLOCKING &
				 *	### outside of state machine ###
				 */
				sFlashIntCtl.u8ChkError = OspiFlashRdPage();
				__DSB();

				if( sFlashIntCtl.u8ChkError != HAL_OK )
				{
					#if DEBUG_FLASH_INT
						uart_printf("\n\r# ERR: FLINT_STATE_CHECK OspiFlashRdPage()\n\r");
					#endif 	/* DEBUG_FLASH_INT */

					sFlashIntCtl.u32ChkErrors++;
					u8FlashIntState = FLINT_STATE_ERROR;
				}
				else
				{
					/* compare */
					uint32_t u32ErrOld = sFlashIntCtl.u32ChkErrors;
					uint8_t u8FlashByte = 0;

					/* compare internal flash to SPI flash page buffer */
					for( uint32_t i = 0; i < (uint32_t)OSPI_FLASH_PAGE_SIZE; i++ )
					{
						u8FlashByte = sFlashIntCtl.pu8ChkAddr[i];
						__DSB();
						if( u8OspiFlashPageBuf[i] != u8FlashByte )
						{
							sFlashIntCtl.u32ChkErrors++;
						}
						if( (sFlashIntCtl.u32ChkBtDone + i) >= (sSpiFileInfo.u32Size - 1) ) break;
					}
					sFlashIntCtl.u32ChkBtDone += sFlashIntCtl.u32ChkLen;

 

PS: why is that very useful </> code insert button back in line 2? This should be the first one to appear, it's surely more often used than other stuff that appears first.

22 REPLIES

This may be a case of runaway pointer which causes random places in RAM to be overwritten.

That's why I recommended to capture the moment when it's getting overwritten using data breakpoint (watchpoint).

However, note that when using data breakpoints, execution does not stop exactly on the instruction which accesses the watched variable/area, but a couple of instructions after it.
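
Where a hardware watchpoint is not practical, a software check can at least narrow down when the damage happens. A minimal sketch in C, assuming the watched table is written once at start-up and never changed afterwards: hash it right after initialisation and re-check the hash at chosen points in the main loop. The names `guard_t`, `guard_arm` and `guard_intact` are made up for illustration, and FNV-1a is just one possible hash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guard record: which region is watched and what it hashed to. */
typedef struct {
    const void *addr;
    size_t      len;
    uint32_t    expected;
} guard_t;

/* 32-bit FNV-1a hash over a byte region. */
static uint32_t region_hash(const void *addr, size_t len)
{
    const uint8_t *p = (const uint8_t *)addr;
    uint32_t h = 2166136261u;          /* FNV offset basis */

    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;                /* FNV prime */
    }
    return h;
}

/* Call once, right after the table has been initialised. */
static void guard_arm(guard_t *g, const void *addr, size_t len)
{
    assert(g != NULL && addr != NULL);
    g->addr = addr;
    g->len = len;
    g->expected = region_hash(addr, len);
}

/* Call at checkpoints; returns 1 while the hash still matches, else 0. */
static int guard_intact(const guard_t *g)
{
    assert(g != NULL);
    return region_hash(g->addr, g->len) == g->expected;
}
```

The first checkpoint at which `guard_intact()` fails brackets the code that did the damage. A data watchpoint is still the more precise tool, since it catches the write itself.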

JW

@waclawek.jan thanks again...

> This may be a case of runaway pointer which causes random places in RAM to be overwritten.

How is that possible, any idea?

In this case, the CGI table is set / initialized at start, and then never written / set again.

I don't use any memory allocation, except for lwIP, which has its own heap, and that is big enough for sure (checked with lwIP memory statistics, and confirmed by lots of trouble if it gets too small).

I now know how to solve this particular case, but I'd like to understand what's going on to prevent that in the future.

Let me say up front that I don't claim a runaway pointer *is* the root of your problem (but it sounds very much like it).

> How is that possible, any idea?

The causes of a runaway pointer (or array index, which in C is the same) are many and may be very indirect. For example: accepting a pointer or index from "outside" of the program (e.g. from data stored in external memory which got damaged through a hardware problem, either with the memory or with the transport paths; from communication with another system, etc.); a bug in the program using data from an area that has already been reused (e.g. incorrectly using data from a circular buffer after it was overwritten by other incoming data, e.g. by DMA); a bug in the program stemming from atomicity-related issues (i.e. sharing non-atomic data between an interrupt and "main", or between threads in multitasking, which is the same). In other words, I can't pinpoint where the error stems from, or even give you a clue, as it may be very subtle and depends very much on everything (even timing) in your particular program and its surrounding environment.
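
For the first class of causes (an index taken from outside the program), a minimal sketch of the usual defence in C: validate the value against the array bounds before it is used. `RX_BUF_SIZE`, `rx_buf_t` and `rx_store_checked` are hypothetical names, not taken from the code in this thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RX_BUF_SIZE 16u   /* hypothetical buffer length */

typedef struct {
    uint8_t data[RX_BUF_SIZE];
} rx_buf_t;

/*
 * Store one byte at an index that came from an untrusted source
 * (external memory, a network packet, ...).
 * Returns true if the byte was stored, false if the index was rejected.
 */
static bool rx_store_checked(rx_buf_t *buf, uint32_t idx, uint8_t value)
{
    if (buf == NULL || idx >= RX_BUF_SIZE) {
        return false;     /* out of range: refuse instead of writing */
    }
    buf->data[idx] = value;
    return true;
}
```

A rejected index is also worth logging or counting, because it is evidence that the "outside" data is already damaged.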

> In this case, the CGI table is set / initialized at start, and then never written / set again.

The result of a runaway pointer is damaged variable(s), but the damaged variable has absolutely no relationship to the pointer which ran away. So you can't/shouldn't investigate the ways the CGI table is intended to be set in your program. In other words, you should mentally abstract from your source, and think about how the brutal (from Latin brūtus (dull, ***)) machine (the processor) works: it simply doesn't care about your source, variables, structures, etc.
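
To illustrate how the damaged variable can be unrelated to the buggy code, here is a small host-side sketch in C. The struct `ram_image_t` stands in for two variables that happen to be neighbours in RAM, and `buggy_store` is a deliberately wrong routine that only ever means to touch `rx_buf`. All names are hypothetical, and the layout is an assumption made for the demo.

```c
#include <stddef.h>
#include <stdint.h>

/* Two unrelated variables that happen to sit next to each other. */
typedef struct {
    uint8_t  rx_buf[8];    /* what the buggy code thinks it is writing */
    uint32_t cgi_count;    /* innocent neighbour, never touched on purpose */
} ram_image_t;

/* Writes value at rx_buf[idx] through a raw byte pointer. */
static void buggy_store(ram_image_t *ram, size_t idx, uint8_t value)
{
    unsigned char *raw = (unsigned char *)ram;
    size_t off = offsetof(ram_image_t, rx_buf) + idx;

    /* BUG (deliberate): idx is checked against the whole struct,
     * not against the size of rx_buf, so it can land in cgi_count. */
    if (off < sizeof *ram) {
        raw[off] = value;
    }
}
```

With `idx` one past the end of `rx_buf`, the write lands in `cgi_count`, and nothing in the source code that handles `cgi_count` would ever reveal why its value changed.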

> I'd like to understand what's going on to prevent that in the future.

That's why it's important to find the exact cause, not just cover up by a symptomatic patch.

JW