We see that the new lot of production units is hitting a UsageFault with INVPC status in the UFSR.

Shinoy · 2021-05-14

We have a problem on one of our avionics products that is in ongoing production: an STM32F765 running FreeRTOS. Units from a new production lot are hitting a UsageFault with the INVPC bit set in the UFSR.
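For reference, INVPC is bit 18 of the Configurable Fault Status Register (CFSR), i.e. bit 2 of its upper UFSR half. A small decoder makes a full fault-register dump easier to read. This is a sketch that follows the ARMv7-M bit layout; `cfsr_decode` and the table are hypothetical names, not an ST or CMSIS API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* CFSR (0xE000ED28) bit names, ARMv7-M layout:
 * MMFSR = bits 7:0, BFSR = bits 15:8, UFSR = bits 31:16. */
static const struct { uint32_t mask; const char *name; } cfsr_bits[] = {
    { 1u << 0,  "IACCVIOL" },    { 1u << 1,  "DACCVIOL" },
    { 1u << 3,  "MUNSTKERR" },   { 1u << 4,  "MSTKERR" },
    { 1u << 5,  "MLSPERR" },     { 1u << 7,  "MMARVALID" },
    { 1u << 8,  "IBUSERR" },     { 1u << 9,  "PRECISERR" },
    { 1u << 10, "IMPRECISERR" }, { 1u << 11, "UNSTKERR" },
    { 1u << 12, "STKERR" },      { 1u << 13, "LSPERR" },
    { 1u << 15, "BFARVALID" },
    { 1u << 16, "UNDEFINSTR" },  { 1u << 17, "INVSTATE" },
    { 1u << 18, "INVPC" },       { 1u << 19, "NOCP" },
    { 1u << 24, "UNALIGNED" },   { 1u << 25, "DIVBYZERO" },
};

/* Write the names of all set CFSR bits, space separated, into buf.
 * len must be at least 1; output is truncated if buf is too small. */
static void cfsr_decode(uint32_t cfsr, char *buf, size_t len)
{
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof cfsr_bits / sizeof cfsr_bits[0]; i++) {
        if (cfsr & cfsr_bits[i].mask) {
            if (buf[0] != '\0')
                strncat(buf, " ", len - strlen(buf) - 1);
            strncat(buf, cfsr_bits[i].name, len - strlen(buf) - 1);
        }
    }
}
```

On the target it would be called with `SCB->CFSR` from CMSIS, printed alongside `SCB->HFSR`, `SCB->MMFAR` and `SCB->BFAR`; the two address registers only mean something when MMARVALID or BFARVALID is set.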

Can anyone advise how to find the last PC value before the exception happened? The main/process stack does not reflect it.

Also, looking at the errata, there seems to be an issue with the use of the data cache.

Could this in any way be related to the issue we are facing?

Tesla DeLorean · 2021-05-14

>>the main/process stack does not reflect this.

What does it reflect?

Perhaps provide a more thorough dump of the registers at the fault, and of the stack the fault used; look at LR (the EXC_RETURN value) to determine which stack that is.
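The LR hint in code: on exception entry LR holds EXC_RETURN, and its bit 2 says whether the frame was pushed on MSP (handler mode) or PSP (FreeRTOS tasks). The stacked PC is the seventh word of that frame. This is a minimal sketch assuming GCC (arm-none-eabi) and the CMSIS/ST startup-file name `UsageFault_Handler`; in a Cube project that handler already exists in `stm32f7xx_it.c` and would need replacing. UsageFault is only taken on its own if enabled in `SCB->SHCSR` (USGFAULTENA), otherwise it escalates to HardFault, where the same shim applies. `fault_entry`, `fault_frame` and the `FR_*` names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* Word offsets in the frame the core pushes on exception entry. With an FPU
 * context (EXC_RETURN bit 4 clear) S0-S15 and FPSCR follow xPSR, but these
 * first eight words keep the same layout. */
enum { FR_R0, FR_R1, FR_R2, FR_R3, FR_R12, FR_LR, FR_PC, FR_XPSR };

/* EXC_RETURN bit 2: 0 = frame on MSP, 1 = frame on PSP. */
static const uint32_t *fault_frame(uint32_t exc_return, uintptr_t msp, uintptr_t psp)
{
    return (const uint32_t *)((exc_return & 0x4u) ? psp : msp);
}

/* C side of the handler. If the stack was overrun the frame itself may be
 * garbage, so treat the values as clues. */
void fault_entry(uint32_t exc_return, uintptr_t msp, uintptr_t psp)
{
    const uint32_t *f = fault_frame(exc_return, msp, psp);
    printf("EXC_RETURN %08lX PC %08lX LR %08lX xPSR %08lX\n",
           (unsigned long)exc_return, (unsigned long)f[FR_PC],
           (unsigned long)f[FR_LR], (unsigned long)f[FR_XPSR]);
    for (;;) {
        /* halt here so a debugger can inspect the state */
    }
}

#if defined(__GNUC__) && defined(__ARM_ARCH_7EM__)
/* Naked shim: capture LR (EXC_RETURN), MSP and PSP before the compiler
 * pushes anything, then tail-branch to the C code. */
__attribute__((naked)) void UsageFault_Handler(void)
{
    __asm volatile(
        "mov r0, lr      \n"
        "mrs r1, msp     \n"
        "mrs r2, psp     \n"
        "b   fault_entry \n");
}
#endif
```

For INVPC the stacked PC usually points at the instruction that attempted the bad exception return, so the code or ISR that was returning at that point is the one to examine. Calling `printf` from a fault handler assumes the retargeted output works with the fault active; a breakpoint on `fault_entry` is the safer option.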

I would look for callbacks that aren't initialized, empty vectors, and stack corruption/overrun.
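For the stack overrun case, FreeRTOS already watches task stacks (`configCHECK_FOR_STACK_OVERFLOW` with `vApplicationStackOverflowHook()`, and `uxTaskGetStackHighWaterMark()`), but the main stack used by interrupt handlers is not covered. Below is a minimal stack-painting sketch for such a region. The function names and fill word are assumptions (the pattern mirrors the 0xA5 bytes FreeRTOS uses to fill task stacks), and the region bounds would come from your linker script. Paint only memory that is not yet in use, e.g. the part below the current stack pointer at startup.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fill pattern, chosen to match FreeRTOS's 0xA5 fill byte. */
#define STACK_FILL_WORD 0xA5A5A5A5u

/* Fill a stack region with the pattern before it is used. */
static void stack_paint(uint32_t *base, size_t words)
{
    for (size_t i = 0; i < words; i++)
        base[i] = STACK_FILL_WORD;
}

/* Count untouched words from the low end. Cortex-M stacks grow downwards,
 * so the lowest addresses are the last to be used; zero headroom means the
 * region was fully used or overrun. */
static size_t stack_headroom_words(const uint32_t *base, size_t words)
{
    size_t n = 0;
    while (n < words && base[n] == STACK_FILL_WORD)
        n++;
    return n;
}
```

Logging the headroom periodically, or in the fault handler, on a failing unit would show whether the main stack ever comes close to its limit.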

Tips, Buy me a coffee, or three.. PayPal Venmo
Up vote any posts that you find helpful, it shows what's working..

Tesla DeLorean · 2021-05-14

https://community.st.com/s/global-search/hardfault_handler_c

Shinoy · 2021-05-17

Thanks Tesla DeLorean for your response.

The fault is intermittent, and it happens at different places in the code.

I went one step further and commented out the code that enables the data cache and instruction cache, because I saw an erratum which says:

"Cortex®-M7 data corruption when using data cache configured in write-through. Description: This limitation is registered under Arm® ID number 1259864 and classified into “Category A”. If a particular sequence of stores and loads is performed to write-through memory, and some timing-based internal conditions are met, then a load might not get the last data stored to that address."

I don't see any fault after making this change in the code.

To verify this, I commented out the lines of code below. From this, can I conclude that the fault was triggered by the data cache issue described in the errata?

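/* Enable branch prediction (CCR.BP, bit 18; SCB_CCR_BP_Msk in CMSIS) */
SCB->CCR |= (1 << 18);
__DSB();

/* Enable the instruction cache */
SCB_InvalidateICache();
SCB_EnableICache();

/* Enable the data cache */
SCB_InvalidateDCache();
SCB_EnableDCache();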
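One way to narrow the cache question down, sketched under assumptions: commenting out this block turns off the instruction cache as well as the data cache, and also slows execution, so a clean run does not by itself single out the D-cache erratum. Enabling the caches one at a time (for example I-cache on, D-cache off) and logging the cache state each unit actually booted with would separate the cases. The helper below decodes the enable bits of the Cortex-M7 CCR register. `cache_state_from_ccr` and the `CCR_*` names are hypothetical; the bit positions match the `SCB_CCR_DC/IC/BP` masks in CMSIS `core_cm7.h`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Cortex-M7 SCB->CCR (0xE000ED14) enable bits. */
#define CCR_DC (1u << 16) /* data cache */
#define CCR_IC (1u << 17) /* instruction cache */
#define CCR_BP (1u << 18) /* branch prediction */

typedef struct {
    bool dcache;
    bool icache;
    bool bpred;
} cache_state_t;

/* Decode the cache/prediction enable bits from a raw CCR value. */
static cache_state_t cache_state_from_ccr(uint32_t ccr)
{
    cache_state_t s = { (ccr & CCR_DC) != 0, (ccr & CCR_IC) != 0, (ccr & CCR_BP) != 0 };
    return s;
}
```

On the target it would be called as `cache_state_from_ccr(SCB->CCR)` after the enable sequence has run, and the result logged at boot.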

Uwe Bonnes · 2021-05-18

Random errors often point to an erratic supply, an overheated device, too high a clock frequency, or too few flash wait states.

Tesla DeLorean · 2021-05-18

Yes, I would definitely back off on the flash wait states, i.e. use more of them. Some of the ST examples seem a bit aggressive, and to be honest, most of the parts with ART or other caching do a good job of masking the slowness of the flash, and have a more aggressive prefetch path than SRAM offers.
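A rough way to quantify "backing off": compute the minimum wait states for your HCLK, then add a margin. This sketch assumes one wait state per 30 MHz of HCLK, the STM32F7 figure for a 2.7 V to 3.6 V supply (7 wait states at 216 MHz). Lower supply ranges use a smaller step, so check the wait-state table in the reference manual for your board. The function names are hypothetical.

```c
#include <stdint.h>

/* Minimum flash wait states: one per hz_per_ws step of HCLK above the
 * first step (e.g. hz_per_ws = 30 MHz on STM32F7 at 2.7-3.6 V). */
static uint32_t flash_min_wait_states(uint32_t hclk_hz, uint32_t hz_per_ws)
{
    uint32_t steps = (hclk_hz + hz_per_ws - 1u) / hz_per_ws; /* ceiling */
    return steps ? steps - 1u : 0u;
}

/* Backed-off setting: minimum plus a safety margin, capped to the
 * 4-bit FLASH_ACR LATENCY field (0-15). */
static uint32_t flash_wait_states_with_margin(uint32_t hclk_hz, uint32_t hz_per_ws,
                                              uint32_t margin)
{
    uint32_t ws = flash_min_wait_states(hclk_hz, hz_per_ws) + margin;
    return ws > 15u ? 15u : ws;
}
```

With the HAL, the chosen value would be passed as the `FLatency` argument of `HAL_RCC_ClockConfig()` (as `FLASH_LATENCY_n`).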

The F2 parts did have a critical path in the ART/Prefetch, which seemed to be particularly triggered by GNU/GCC generated code.

This F7 part has a very early version of the CM7 core: only the F74x/F75x parts use it, and all the subsequent parts use newer cores.

I don't think one batch vs. the next should be particularly susceptible to the errata; you should read the device ID and stepping from the DBGMCU registers. Process variables could change the transistor speeds, and these might make the part more susceptible to supply voltage, but the process window should be fairly tight/constrained to meet specs. Of the things you can change, the flash wait states would be the first thing to look at.

Also look at what's happening on the VCAP pins: the voltages and the capacitors fitted. Issues there have been seen to generate the types of failure reported.

Shinoy · 2021-05-20

Our hardware has been in production for close to a year now, and this issue started appearing recently with newly produced units.

I checked the REV_ID in the DBGMCU register of one of the faulty units, and the good news is that it matches the one mentioned in the errata.

Uwe Bonnes · 2021-05-20

With the recent part shortage and the need to buy from less reputable sources, also consider that some other part on your board may be causing the error.

Tesla DeLorean · 2021-05-20

Yeah, well that's a conversation you're going to need to have with an ST FAE assigned to your account.

If the ICs and steppings are all the same, I'd still favour board-level parts as a contender. Try swapping the STM32 ICs: move a "working" IC onto a board that is currently problematic, and a new part onto an older build board. See if the behaviour follows the IC.

Tesla DeLorean · 2021-05-20

Please dump the core info. Perhaps @Ons KOOLI can help directly, or find staff familiar with the errata, its ramifications, and whether it is process-dependent.

#include <stdio.h>
#include "stm32f7xx.h" // CMSIS device header: SCB, DBGMCU

// Print the core type/revision (SCB->CPUID) and the STM32 device ID and
// silicon revision (DBGMCU->IDCODE)
void CORECheck(void) // sourcer32@gmail.com
{
  uint32_t cpuid = SCB->CPUID;
  uint32_t idcode = DBGMCU->IDCODE;
  unsigned var, pat;

  printf("CPUID %08X DEVID %03X REVID %04X\n", (unsigned)cpuid,
    (unsigned)(idcode & 0xFFF), (unsigned)((idcode >> 16) & 0xFFFF));

  pat = (unsigned)(cpuid & 0x0000000F);         // revision: the "p" in rNpM
  var = (unsigned)((cpuid & 0x00F00000) >> 20); // variant: the "r" in rNpM

  if ((cpuid & 0xFF000000) == 0x41000000) // implementer 0x41 = ARM
  {
    switch((cpuid & 0x0000FFF0) >> 4) // PartNo field
    {
      case 0xC20 : printf("Cortex M0 r%up%u\n", var, pat); break;
      case 0xC60 : printf("Cortex M0+ r%up%u\n", var, pat); break;
      case 0xC21 : printf("Cortex M1 r%up%u\n", var, pat); break;
      case 0xC23 : printf("Cortex M3 r%up%u\n", var, pat); break;
      case 0xC24 : printf("Cortex M4 r%up%u\n", var, pat); break;
      case 0xC27 : printf("Cortex M7 r%up%u\n", var, pat); break;

      default : printf("Unknown CORE\n");
    }
  }
  else
    printf("Unknown CORE IMPLEMENTER\n");
}
