FLASH ECC Codes Cause Bus Fault on STM32H743

jaakjensen · ‎2022-08-26

Hello,

For a few months now I have been having issues with writing data to flash memory on the STM32H743ZIT6. Most of the time, everything works great and I am able to read the data from flash successfully, but every now and again, the flash memory gets corrupted somehow and ECC codes are thrown (both single and double), which causes my device to have a bus fault.

I am running the flash peripheral / AXI bus at 240 MHz with the flash wait states set to 4 WS (5 flash clock cycles). I have followed all the guidelines related to the HW design and have the correct capacitance on the VCAP pins. I write 256 bits of data to flash approximately every 10 seconds to save the state of my device (I increment the write address after each write so that I'm not writing to the same position every time). When the sector is full, it is erased and I start writing again from the start of the sector. I also set the BOR bits to the highest voltage setting to try to prevent brown-out issues. I think these are the most important settings you need to know.
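For readers following along, the append-then-erase scheme described above can be sketched on the host as below. This is only an illustration under assumptions, not the code running on the device: `SLOT_SIZE`, `slot_is_erased`, and `find_next_free_slot` are hypothetical names, the sector is modelled as a plain byte buffer, and a slot is treated as free when every byte reads 0xFF (so a saved state that is itself all 0xFF would be mistaken for an empty slot).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SLOT_SIZE 32u /* one 256-bit flash word, the write size used in the post */

/* Returns 1 if the slot still holds the erased pattern (all 0xFF), else 0. */
static int slot_is_erased(const uint8_t *slot)
{
    for (size_t i = 0; i < SLOT_SIZE; i++) {
        if (slot[i] != 0xFFu) {
            return 0;
        }
    }
    return 1;
}

/* Returns the byte offset of the first erased slot, or sector_size when the
 * sector is full and has to be erased before the next write. */
size_t find_next_free_slot(const uint8_t *sector, size_t sector_size)
{
    assert(sector != NULL);
    assert(sector_size % SLOT_SIZE == 0u);

    for (size_t off = 0; off < sector_size; off += SLOT_SIZE) {
        if (slot_is_erased(&sector[off])) {
            return off;
        }
    }
    return sector_size;
}
```

On the real part this scan reads flash directly, so a slot left half-programmed by a reset is exactly where a double ECC error would surface during the scan.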

Today, while looking at my register settings in debug mode, I noticed that WRHIGHFREQ was set to 3 (binary 11) by default. I can't find anywhere in the HAL or my code where this is done, so it must be set automatically. The manual only lists valid settings for 0, 1, and 2 (see below). Can anyone tell me how the STM32H743ZIT6's flash module behaves when WRHIGHFREQ is set to 3? Is it just invalid / undefined? Maybe this is my issue?

Does anyone have any ideas?

jaakjensen · ‎2022-09-14

If I clear RDPERR2 using bit 23 of the FLASH CCR2 register before issuing an erase request, the issue disappears completely and I don't get any more issues with erasing.

@F.Belaid Any idea why the RDPERR2 bit in the SR2 register might be getting set?

jaakjensen · ‎2022-09-14

Hm. So strange. I disabled my debug pins (used for timing measurements on the 1 msec task) and now I can't recreate the issue I was mentioning earlier. The program is no longer blocked when erasing flash memory either.

All my timing measurements look like this, regardless of whether or not I clear the RDPERR2 bit:

Flash Writes: [timing capture not preserved]

Flash Erase (P1) Followed by Flash Write (P0): [timing capture not preserved]

jaakjensen · ‎2022-09-14

I feel I am back to square one. I still do not know what causes this issue. Based on the timing measurements, a reset landing in the middle of a flash write is very unlikely: a write takes 87 µs and happens once every 10 seconds, so the chance is roughly 1 in 100,000.

A reset during an erase is also very unlikely. The device would have to be powered on and writing data for about 11.4 hours to fill the sector (128 Kbytes / 32 bytes per write = 4096 writes, at 10 seconds per write), and then be powered off at exactly the moment the erase request occurs.
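The arithmetic behind these two estimates can be checked with a few lines of C. This is just a worked example using the numbers quoted above (128 Kbyte sector, 32-byte writes, 10 second period, 87 µs per write); the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of fixed-size records that fit in one sector. */
uint32_t writes_per_sector(uint32_t sector_bytes, uint32_t record_bytes)
{
    assert(record_bytes != 0u);
    return sector_bytes / record_bytes;
}

/* Seconds of continuous operation needed to fill the sector. */
uint32_t seconds_to_fill(uint32_t sector_bytes, uint32_t record_bytes,
                         uint32_t write_period_s)
{
    return writes_per_sector(sector_bytes, record_bytes) * write_period_s;
}

/* Fraction of the time the flash is busy programming, which is also the
 * chance that a reset at a random instant lands inside a write. */
double write_busy_fraction(double write_time_s, double write_period_s)
{
    assert(write_period_s > 0.0);
    return write_time_s / write_period_s;
}
```

With the values from the post, `writes_per_sector(128u * 1024u, 32u)` is 4096, `seconds_to_fill(128u * 1024u, 32u, 10u)` is 40960 seconds (about 11.4 hours), and `write_busy_fraction(87e-6, 10.0)` is 8.7e-6, or about 1 in 115,000.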

FBL · ‎2022-09-16

Hello @jaakjensen

Can you try

1- Check and clear the RDS and RDP errors prior to the erase/write operations

2- Disable all interrupts before erase and program.

Maybe, while debugging, the Cortex core tries to access memory and reaches a reserved zone. That could raise this error, which occurs only when accessing an RDP-protected area, so it would make sense.
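Step 1 could look roughly like the sketch below. It is a host-side model under assumptions, not HAL code: `flash_bank_regs_t` and `flash_clear_read_errors` are hypothetical names, the struct only stands in for the bank's status and clear registers, RDPERR at bit 23 is taken from the earlier reply in this thread, and RDSERR at bit 24 is an assumption that should be checked against the reference manual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FLASH_ERR_RDPERR (1u << 23) /* bit 23, as mentioned earlier in the thread */
#define FLASH_ERR_RDSERR (1u << 24) /* assumption: verify in the reference manual */

/* Host-side stand-in for one bank's status and clear registers. */
typedef struct {
    volatile uint32_t sr;  /* stands in for FLASH_SR2 */
    volatile uint32_t ccr; /* stands in for FLASH_CCR2 (write 1 to clear) */
} flash_bank_regs_t;

/* Clears pending RDP/RDS read errors and returns the flags that were set,
 * so the caller can log them before starting the erase or program. */
uint32_t flash_clear_read_errors(flash_bank_regs_t *bank)
{
    assert(bank != NULL);

    uint32_t pending = bank->sr & (FLASH_ERR_RDPERR | FLASH_ERR_RDSERR);
    if (pending != 0u) {
        bank->ccr = pending; /* request a clear of only the flags seen */
    }
    return pending;
}
```

On the target, the same logic would presumably read `FLASH->SR2` and write `FLASH->CCR2` from the CMSIS device header. In this model the status field does not change after the clear write, because no hardware is behind it.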

I have found some related posts that could help you

https://community.st.com/s/question/0D50X0000BaKiDBSQ0/spurious-rdperr-and-rdserr-when-all-protection-and-security-settings-are-off?t=1663333087127

To give better visibility on the answered topics, please click on Accept as Solution on the reply which solved your issue or answered your question.

I'm out of office with limited access to my emails.
Happy New Year!

Pavel A. · ‎2022-09-16

@jaakjensen In addition to finding the root cause of the programming failure, IMHO it is worth masking the bus fault exception while reading the troublesome flash area.

You will be able to detect and handle the _expected_ errors, without crashing the program.

jaakjensen · ‎2022-09-16

Hi @F.Belaid this is great information and I am feeling hopeful this may be my issue. I am now clearing the RDS and RDP errors prior to erase and write operations and so far so good. I have created a system to check the flash status register and log error codes before reading and writing, which should help to diagnose the cause in the future.
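A status check of this kind might be sketched as below, for anyone wanting to do the same. It is not my production code: `flash_err_desc_t`, `k_flash_errors`, and `flash_decode_errors` are hypothetical names, and the bit positions are assumptions based on my reading of the H743 status register layout (only RDPERR at bit 23 is confirmed in this thread), so check them against the reference manual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t mask;
    const char *name;
} flash_err_desc_t;

/* Assumed error-flag layout of the flash status register. */
static const flash_err_desc_t k_flash_errors[] = {
    { 1u << 17, "WRPERR" },   { 1u << 18, "PGSERR" },
    { 1u << 19, "STRBERR" },  { 1u << 21, "INCERR" },
    { 1u << 22, "OPERR" },    { 1u << 23, "RDPERR" },
    { 1u << 24, "RDSERR" },   { 1u << 25, "SNECCERR" },
    { 1u << 26, "DBECCERR" },
};

/* Writes the names of the error flags set in sr into out (at most out_len
 * entries, lowest bit first) and returns how many were written. */
size_t flash_decode_errors(uint32_t sr, const char **out, size_t out_len)
{
    assert(out != NULL || out_len == 0u);

    size_t n = 0;
    for (size_t i = 0; i < sizeof k_flash_errors / sizeof k_flash_errors[0]; i++) {
        if ((sr & k_flash_errors[i].mask) != 0u && n < out_len) {
            out[n++] = k_flash_errors[i].name;
        }
    }
    return n;
}
```

Calling this on the status register value before each read, write, and erase, and logging the returned names, gives a record of which flag was pending when something goes wrong.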

I am trying to avoid disabling interrupts before erase and program - the 1 msec interrupt is a critical part of the application and shouldn't be stopped, otherwise audio data frames will be dropped and the user of this device will notice glitches.

I will look into using the MPU to map partially undocumented parts of memory as "execute never". Can you recommend any relevant ST application notes on configuring the MPU for H7?

jaakjensen · ‎2022-09-16

This is great advice @Pavel A.. The user interface of the product I am developing is fairly simple, but I've put together a system that now shows error codes using LEDs when this exception is generated. The user can then clear them and the device will continue working as expected. I'm going to continue developing this further and see if I can find an elegant way to recover from it.

FBL · ‎2022-09-19

Hello @jaakjensen ,

I recommend checking application note AN4838. It covers the H7 series.

You can also refer to the article "How to configure the MPU of an STM32 using STM32CubeMX".

When your question is answered, please close this topic by choosing Select as Best. This will help other users find that answer faster.

jaakjensen · ‎2022-11-16

Hi @F.Belaid I just wanted to thank you again for the help with my issue. As of today, we have shipped 100+ units and had zero field failures since checking and clearing the RDS and RDP bits before erase/write operations.

FBL · ‎2022-11-17

Thank you again @jaakjensen. Your feedback made my day.

Hoping that using the MPU to separate data between processing tasks did help.
