STM32F746 In-circuit Flash writing problem

hpipon957
Associate III

Hi

I am using the Nucleo STM32F746 and have a problem with in-circuit flash programming. The flash drivers have been used in various other STM32 M3, M4 and M7 parts without any issues (since about 2010), but with the F746 not all written data is actually programmed.

For example, I write new data via FTP (using Ethernet and a TCP/IP stack). If I write a short test file of just a few bytes, the following occurs at the low level (addresses and write ordering are controlled by the file system level):

  • 1 byte is written at 0x8040005 - written OK
  • two bytes (as short word write) are written to 0x8040006 - written OK
  • two bytes (as short word write) are written to 0x8040008 - write completes without any error but the flash is still 0xffff
  • 1 byte is written at 0x804000a - written OK
  • four bytes (as long word write) are written to 0x8040000 - written OK
  • 1 byte is written at 0x8040004 - written OK

If I add a 10us delay to the code after each write operation there are no issues and I can write many files with small or large amounts of data without any errors.

I use data cache workarounds in the F4 case but I find no errata for the F746, and the F4 data cache workaround doesn't change anything either.

Is any problem known? Although the 10us delay makes it reliable it would be best to know what is different with this part.

Note that the core is running at 168MHz in the tested case (APB1 42MHz, 84MHz AHB2)

Thanks

Regards

Mark

8 REPLIES
TDK
Guru

It sounds like you may not be waiting for the BSY flag to be de-asserted before writing to the next address. You may also need to __DSB() after writing but before polling for BSY to ensure operations are done in the right order.

hpipon957
Associate III

Hi TDK

The BSY flag was being checked but without __DSB().

I just tried __DSB() between writing the new value and checking BSY and it now looks to be OK without the delay.

Therefore it very probably was due to a synchronisation requirement which was not needed in the other parts that I had used with the same code.
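
For anyone finding this later, the fix described above can be sketched roughly as follows. This is an illustration that also builds on a PC, not the actual driver code: `flash_regs_t`, `flash_program_halfword` and `FLASH_BARRIER` are hypothetical names, the struct only stands in for the real FLASH register block, and the bit positions are written from memory of the F4/F7 layout, so check them against the reference manual. On the target `FLASH_BARRIER()` would simply be `__DSB()` from CMSIS.

```c
#include <stdint.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the FLASH register block (only SR and CR modelled). */
typedef struct {
    volatile uint32_t SR;
    volatile uint32_t CR;
} flash_regs_t;

#define CR_PG        (1u << 0)   /* programming enable */
#define CR_PSIZE_X16 (1u << 8)   /* PSIZE = 01: half-word parallelism */
#define CR_PSIZE_MSK (3u << 8)
#define SR_BSY       (1u << 16)  /* flash operation in progress */

/* Assumption: on the target this is __DSB(); the C11 fence only keeps
 * the sketch buildable on a host. */
#define FLASH_BARRIER() atomic_thread_fence(memory_order_seq_cst)

/* Program one half-word and wait until the flash is no longer busy.
 * Assumes the flash is unlocked and the destination is erased and
 * 2-byte aligned. */
static void flash_program_halfword(flash_regs_t *regs,
                                   volatile uint16_t *dst, uint16_t value)
{
    regs->CR = (regs->CR & ~CR_PSIZE_MSK) | CR_PSIZE_X16 | CR_PG;
    FLASH_BARRIER();             /* CR is set before the data write is issued */
    *dst = value;
    FLASH_BARRIER();             /* data write is issued before BSY is sampled */
    while (regs->SR & SR_BSY) {
        /* busy-wait until programming has finished */
    }
    regs->CR &= ~CR_PG;          /* leave programming mode */
}
```

Without the second barrier the first read of SR can apparently happen before the write has reached the flash interface, so BSY reads as clear and the next write is issued too early, which would fit the symptom in the first post.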

Many thanks!

Regards

Mark

Piranha
Chief II

__DSB() is unnecessarily restrictive, __DMB() is what you actually need. Is it so hard to read a few lines in documentation to understand how those two differ? Anyway, more details on this:

https://community.st.com/s/question/0D50X0000C4Nk4GSQS/bug-missing-compiler-and-cpu-memory-barriers

Also check out network related issues:

https://community.st.com/s/question/0D50X0000BOtfhnSQB/how-to-make-ethernet-and-lwip-working-on-stm32

hpipon957
Associate III

>> __DSB() is unnecessarily restrictive

Since the flash write operation takes several microseconds to complete, the exact barrier instruction used is not of any real consequence. The most restrictive one is fine.

>> Also check out network related issues:

The TCP/IP stack and Ethernet drivers used are from the uTasker project [https://github.com/uTasker/uTasker-Kinetis] and not lwip. There may be similar issues to those in the thread but there are presently no reports of related issues.

Regards

Mark

Personally I would buffer the data so that it can be written to the memory as aligned WORDs or DOUBLE WORDs, and so avoid pitfalls with ECC
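
A rough sketch of that buffering idea, in case it helps. Everything here is hypothetical (`word_stage_t` and `stage_byte` are made-up names): it just collects bytes that arrive in ascending address order and reports when a complete, aligned 32-bit word is ready, so the low-level driver only ever has to issue word writes. Padding and flushing a final partial word is left out.

```c
#include <stdint.h>

/* Hypothetical staging state for assembling aligned 32-bit words. */
typedef struct {
    uint32_t base;   /* word-aligned flash address of the word being built */
    uint32_t word;   /* bytes collected so far, little-endian */
    unsigned count;  /* number of valid bytes in 'word' (0..3) */
} word_stage_t;

/* Add one byte. Returns 1 and fills *addr and *word when a full aligned
 * word is ready to be programmed, otherwise returns 0. */
static int stage_byte(word_stage_t *s, uint8_t byte,
                      uint32_t *addr, uint32_t *word)
{
    s->word |= (uint32_t)byte << (8u * s->count);
    if (++s->count < 4u) {
        return 0;                /* still collecting */
    }
    *addr = s->base;
    *word = s->word;
    s->base += 4u;               /* move on to the next word */
    s->word = 0u;
    s->count = 0u;
    return 1;
}
```

The caller would program `*word` at `*addr` each time `stage_byte` returns 1.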

Check also MPU buffering and caching settings

hpipon957
Associate III

The flash driver writes data depending on the alignment (single bytes, half-words or full words) of the addresses and size of data - so is flexible.
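
As an illustration only (this is a guess at the policy, not the uTasker code, and `pick_write_width` is a hypothetical name), the width selection could look something like this:

```c
#include <stdint.h>
#include <stddef.h>

/* Returns the widest write (4, 2 or 1 bytes) that is naturally aligned at
 * 'addr' and does not exceed 'remaining'. Returns 0 if nothing is left. */
static unsigned pick_write_width(uint32_t addr, size_t remaining)
{
    if (remaining == 0) {
        return 0;
    }
    if ((addr & 3u) == 0 && remaining >= 4) {
        return 4;                /* full word */
    }
    if ((addr & 1u) == 0 && remaining >= 2) {
        return 2;                /* half-word */
    }
    return 1;                    /* single byte */
}
```

For a 6-byte payload starting at 0x8040005 this gives widths 1, 2, 2, 1, and for a 5-byte block at 0x8040000 it gives 4, 1, which matches the write sizes listed in the first post.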

The application can ensure that everything is aligned, to be a bit more efficient, if it wants, but the file system being used reserves 5 bytes which are written after the data content has been written. For processors/memory that can't handle flexible writes (e.g. flash with ECC that must be written as lines of 128 bits, as on LPC chips) there is an optional buffering that can be enabled. However, apart from the BSY issue due to data synchronisation, there have been no issues with using it without the buffering enabled on STM32 parts, so I am happy to keep it like this for the moment. If an issue ever arose that needed it I would of course enable it.

Regards

Mark

Piranha
Chief II

Actually this is the full correct implementation:

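FLASH->CR = FLASH_CR_PG;	/* enable programming (PSIZE = 0: byte writes) */

__DMB();	/* the CR store must complete before the first flash write */
for (; nbData > 0; --nbData) {
	*(volatile uint8_t*)pbMem++ = *(const uint8_t*)pbData++;	/* program one byte */
	__DMB();	/* order the flash write before the SR read below */
	while (FLASH->SR & FLASH_SR_BSY);	/* wait until the byte has been programmed */
}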

DMB is necessary not only after each write to flash memory, but also between configuring the FLASH peripheral and the first write. This ensures that the CPU doesn't start the memory write before the store to the CR register is complete, which is critical for the memory write to succeed. Reordering is allowed because flash memory is not device-type memory like the peripheral registers.

Still, using DSB delays the CPU unnecessarily: DMB blocks only load/store instructions, whereas DSB blocks all instructions. In the case of flash writing the difference is negligible, but in tighter and more performance-critical operations DSB can lose half of the Cortex-M7 dual-issue execution ability and speed. 🙂