intermittent reading wrong pointer value

ckw089 · ‎2022-06-30

Hi support,

I am working on a project that uses uC/OS-III on an STM32H750XB, built with IAR and debugged with a Segger J-Link.

Recently we encountered a strange issue where reading a pointer (itsScanSampleAnnun) intermittently returns a wrong value. The value looks random and is usually an invalid address, so accessing it generates a hard fault.

So I added debug code that caches the pointer value in a static variable (bkPtr) and breaks when the two values differ.
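A minimal sketch of that kind of shadow-copy check, for readers who want to try the same approach. The names itsScanSampleAnnun and bkPtr come from this post; the pointee type, the setter/getter functions, mismatchCount and the DEBUG_BREAK() hook are assumptions made up for illustration, not the actual project code.

```c
#include <stddef.h>

/* Hypothetical pointee type; the real type is not shown in the post. */
typedef struct { int id; } ScanSampleAnnun;

ScanSampleAnnun *itsScanSampleAnnun;   /* pointer under suspicion */
ScanSampleAnnun *bkPtr;                /* shadow copy, updated on every write */
unsigned mismatchCount;                /* number of detected mismatches */

/* Assumed hook: on the target this could be a breakpoint instruction.
 * It defaults to a no-op so the sketch also builds on a host. */
#ifndef DEBUG_BREAK
#define DEBUG_BREAK() ((void)0)
#endif

/* Every legitimate write goes through here so both copies stay in sync. */
void setScanSampleAnnun(ScanSampleAnnun *p)
{
    itsScanSampleAnnun = p;
    bkPtr = p;
}

/* Read both copies through volatile accesses so the compiler cannot
 * merge or reuse the loads, then compare them. */
ScanSampleAnnun *getScanSampleAnnun(void)
{
    ScanSampleAnnun *value  = *(ScanSampleAnnun *volatile *)&itsScanSampleAnnun;
    ScanSampleAnnun *shadow = *(ScanSampleAnnun *volatile *)&bkPtr;

    if (value != shadow) {
        mismatchCount++;
        DEBUG_BREAK();   /* stop here and inspect registers and memory */
    }
    return value;
}
```

If the two copies differ at the hook while the Memory window shows identical values, the bad value came from the load path rather than from a stray write to memory.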

But when I inspect the value of itsScanSampleAnnun in the Memory window, it is correct.

Looking at the assembly code, the pointer value of bkPtr is loaded into register R2 and the pointer value of itsScanSampleAnnun is loaded into register R3. As the debugger shows, R3 holds a wrong value. I am not sure where or how the wrong value gets loaded into the register.

I would appreciate any advice you can provide. Thanks …

rgds,

kc Wong

ckw089 · ‎2022-08-11

Hi support,

So far, from what we have observed, the hard fault occurs when a CPU register is loaded with the wrong value while the memory has the correct value.

Our suspicion now is that the CPU register loads the wrong value from the D-Cache.

We also found an errata sheet on the ST web site.

https://www.st.com/resource/en/errata_sheet/es0396-stm32h750xb-and-stm32h753xi-device-limitations-stmicroelectronics.pdf

For now, the most likely cause of the wrong value in the CPU register is the data corruption described in the errata sheet under "2.1.1 Cortex®-M7 data corruption when using Data cache configured in write-through".

Any comments? Has anyone encountered a similar issue?

rgds,

kc Wong

S.Ma · ‎2022-08-11

Could it be that the memory region is shared between the cache and any DMA? Or that a task stack overflows? How about changing the link order of the object files to check its effect on the code's behaviour?
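On the first question: when a DMA buffer lives in cacheable memory, cache maintenance has to cover whole cache lines, which are 32 bytes on the Cortex-M7. A minimal sketch of the range calculation follows; CacheRange and cache_align_range are hypothetical names for illustration. On the target, the resulting range would typically be handed to the CMSIS cache maintenance functions such as SCB_CleanDCache_by_Addr or SCB_InvalidateDCache_by_Addr.

```c
#include <stddef.h>
#include <stdint.h>

#define DCACHE_LINE_SIZE 32u   /* Cortex-M7 D-cache line size in bytes */

/* Hypothetical helper type: a cache-line-aligned address range. */
typedef struct {
    uintptr_t start;    /* rounded down to a cache line boundary */
    size_t    length;   /* multiple of the cache line size */
} CacheRange;

/* Expand [addr, addr + size) to the smallest range made of whole cache lines. */
CacheRange cache_align_range(uintptr_t addr, size_t size)
{
    const uintptr_t mask = (uintptr_t)DCACHE_LINE_SIZE - 1u;
    const uintptr_t start = addr & ~mask;                /* round start down */
    const uintptr_t end = (addr + size + mask) & ~mask;  /* round end up */
    CacheRange range = { start, (size_t)(end - start) };
    return range;
}
```

If the expanded range is larger than the buffer itself, the buffer shares a cache line with other variables, and invalidating that line can discard their pending writes. Aligning and padding DMA buffers to 32 bytes avoids this.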

ckw089 · ‎2022-08-11

Based on the MPU setting, the external SDRAM is configured as not shareable.

static void MPU_Config (void)
{
#if (DATA_AREA == USE_EXTERNAL_SDRAM) || (CODE_AREA == USE_EXTERNAL_SDRAM)
  MPU_Region_InitTypeDef MPU_InitStruct;
 
  /* Disable the MPU */
  HAL_MPU_Disable();
 
  /* Configure the MPU attributes for SDRAM.
   * TEX level 0 + cacheable + not bufferable selects
   * write-through, no write allocate. */
  MPU_InitStruct.Enable = MPU_REGION_ENABLE;
  MPU_InitStruct.BaseAddress = SDRAM_FMC_BANK1;
  MPU_InitStruct.Size = MPU_REGION_SIZE_32MB;
  MPU_InitStruct.AccessPermission = MPU_REGION_FULL_ACCESS;
  MPU_InitStruct.IsBufferable = MPU_ACCESS_NOT_BUFFERABLE;
  MPU_InitStruct.IsCacheable = MPU_ACCESS_CACHEABLE;
  MPU_InitStruct.IsShareable = MPU_ACCESS_NOT_SHAREABLE;
  MPU_InitStruct.Number = MPU_REGION_NUMBER0;
  MPU_InitStruct.TypeExtField = MPU_TEX_LEVEL0;
  MPU_InitStruct.SubRegionDisable = 0x00;
  MPU_InitStruct.DisableExec = MPU_INSTRUCTION_ACCESS_DISABLE;
  HAL_MPU_ConfigRegion(&MPU_InitStruct);
 
  /* The regions can overlap, and can be nested.
   * The region 15 has the highest priority and the region 0 has
   * the lowest one and this governs how overlapping the regions behave.
   */
 
#if (CODE_AREA == USE_EXTERNAL_SDRAM)
  /* Configure region 1 to allow instruction access for code area */
  MPU_InitStruct.Size = MPU_REGION_SIZE_4MB;
  MPU_InitStruct.Number = MPU_REGION_NUMBER1;
  MPU_InitStruct.DisableExec = MPU_INSTRUCTION_ACCESS_ENABLE;
  HAL_MPU_ConfigRegion(&MPU_InitStruct);
#endif
 
  /* Enable the MPU */
  HAL_MPU_Enable(MPU_PRIVILEGED_DEFAULT);
#endif
}
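For reference, the cache policy that results from a TEX/C/B combination can be checked with a small decoder. This is a minimal sketch based on the ARMv7-M MPU attribute encoding; mpu_cache_policy is a hypothetical helper, it takes the raw bit values rather than the HAL macros, and it only covers TEX levels 0 and 1.

```c
#include <stdint.h>

/* Decode ARMv7-M MPU attribute bits into a memory type / cache policy name.
 * Only TEX = 0 and TEX = 1 are handled; anything else returns "other". */
const char *mpu_cache_policy(uint8_t tex, uint8_t c, uint8_t b)
{
    if (tex == 0u) {
        if (!c) {
            return b ? "device, shareable" : "strongly-ordered";
        }
        return b ? "write-back, no write allocate"
                 : "write-through, no write allocate";
    }
    if (tex == 1u) {
        if (!c && !b) {
            return "non-cacheable";
        }
        if (c && b) {
            return "write-back, write and read allocate";
        }
    }
    return "other";
}
```

With the values used above (TEX level 0, cacheable, not bufferable), mpu_cache_policy(0, 1, 0) returns "write-through, no write allocate", which is the cache mode the erratum refers to.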

uC/OS-III has a built-in capability (redzone checking) to detect stack overflow on both the task stacks and the exception stack. So far, we have not seen any stack overflows.

I am not sure how to change the link order of the object files in an IAR project.

ckw089 · ‎2022-08-14

Hi support,

According to the errata sheet, the recommended workaround for "2.1.1 Cortex®-M7 data corruption when using Data cache configured in write-through" is as follows.

Workaround

There is no direct workaround for this erratum.

Where possible, Arm® recommends that you use the MPU to change the attributes on any write-through memory to write-back memory. If this is not possible, it might be necessary to disable the cache for sections of code that access write-through memory.

I have tried the two write-back configurations (rows #4 and #8) in Table 4 of AN4838, but it did not help.

https://www.st.com/resource/en/application_note/dm00272912-managing-memory-protection-unit-in-stm32-mcus-stmicroelectronics.pdf

Any comments? Has anyone encountered a similar issue?

rgds,

kc Wong

ckw089 · ‎2023-01-03

I just want to check: is there a new batch of STM32H750XB that fixes this device limitation?

Pavel A. · ‎2023-01-03

> I have tried the two write-back configurations (rows #4 and #8) in Table 4 of AN4838, but it did not help.

This is a good sign IMHO. It means the write-through erratum is not the culprit.

Could it be a "glitch" of the external memory?

Or is the register modified and not restored properly by an interrupt handler?

ckw089 · ‎2023-01-03

If it were a "glitch" of the external memory, or a register modified and not restored properly by an interrupt handler, then the issues should still be there when the D-Cache is disabled.

With the D-Cache disabled, we no longer see any of those strange issues. But of course that has a negative impact on the performance of the code.