STM32L1 + Instruction Prefetch + Voltage Scaling + Debugger = Unexpected Memory Values

Andrew Sund · ‎2017-08-11

Posted on August 11, 2017 at 20:44

Hi, we are using an STM32L151VET device, using CubeMX generated startup code (any modifications seem benign to me for reasons that follow) and have been encountering a strange error when running with a debugger attached.

Processor is running on MSI @ 2MHz, internal flash (flash latency not an issue at this point)
Instruction prefetch is enabled prior to this in HAL_Init(), using __HAL_FLASH_PREFETCH_BUFFER_ENABLE().
Multiple boards used.
Segger J-Link and ST-Link/v2 debuggers used.

The error occurs in SystemClock_Config(), code is as generated (with two modifications):

void SystemClock_Config(void)
{
    RCC_OscInitTypeDef RCC_OscInitStruct;
    RCC_ClkInitTypeDef RCC_ClkInitStruct;
    RCC_PeriphCLKInitTypeDef PeriphClkInit;

    __HAL_RCC_PWR_CLK_ENABLE();

    while ( READ_BIT( PWR->CSR, PWR_CSR_VOSF ) ); /* Coworkers added these after encountering this problem. */
    __HAL_PWR_VOLTAGESCALING_CONFIG(PWR_REGULATOR_VOLTAGE_SCALE1);
    while ( READ_BIT( PWR->CSR, PWR_CSR_VOSF ) ); /* Coworkers added these after encountering this problem. */

    RCC_OscInitStruct.OscillatorType = RCC_OSCILLATORTYPE_HSI|RCC_OSCILLATORTYPE_HSE|RCC_OSCILLATORTYPE_LSE;
    RCC_OscInitStruct.HSEState = RCC_HSE_ON;
    RCC_OscInitStruct.LSEState = RCC_LSE_BYPASS;
    RCC_OscInitStruct.HSIState = RCC_HSI_ON;
    RCC_OscInitStruct.HSICalibrationValue = 16;
    RCC_OscInitStruct.PLL.PLLState = RCC_PLL_ON;
    RCC_OscInitStruct.PLL.PLLSource = RCC_PLLSOURCE_HSE;
    RCC_OscInitStruct.PLL.PLLMUL = RCC_PLL_MUL12;
    RCC_OscInitStruct.PLL.PLLDIV = RCC_PLL_DIV3;

    if (HAL_RCC_OscConfig(&RCC_OscInitStruct) != HAL_OK)
    {
        Error_Handler();
    }

    ...

The polling loops on PWR_CSR_VOSF are recommended by the reference manual, but it also says the clock is stopped while it's switching voltages, so they seem pointless. CubeMX omits them. My coworkers added them when they first encountered this issue and it made it go away. The issue has been seen since, so I suspect it was a red herring 'fix'. I'd much prefer to get to the bottom of this than to pepper the code with delays and statements which have an unknown effect... Anyways, on to the problem:
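
If the VOSF polls stay in, bounding them keeps a stuck flag from hanging startup silently. The following is only a minimal sketch: the helper name wait_vosf_clear is hypothetical, the local mask is assumed to mirror PWR_CSR_VOSF (bit 4), and the register is passed as a pointer so the logic can be exercised off-target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror PWR_CSR_VOSF (bit 4) from the device header;
   on target, use the CMSIS definition instead. */
#define VOSF_MASK (1u << 4)

/* Hypothetical helper: poll a status register until VOSF reads as clear
   or the poll budget runs out. Returns true if the flag cleared in time. */
bool wait_vosf_clear(const volatile uint32_t *csr, uint32_t max_polls)
{
    assert(csr != NULL);
    while (max_polls-- > 0u) {
        if ((*csr & VOSF_MASK) == 0u) {
            return true;
        }
    }
    return false; /* budget exhausted while the flag was still set */
}
```

On target the call would look like wait_vosf_clear(&PWR->CSR, 100000u), with a false return routed to Error_Handler(). The budget of 100000 is illustrative only and would need to be chosen for the clock in use.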

When I reset with a debugger attached, Error_Handler is called. Setting a breakpoint at HAL_RCC_OscConfig line and viewing memory shows the first 4 members of RCC_OscInitStruct (code populates them in memory order) are not the expected 7,1,5,1. Inserting __NOP() calls after the last polling loop changes these values:

0 __NOPs: 0,7,5,7 (2nd and 4th values are the same because they come from same register, loaded once)
1 __NOP: 0,1,5,1
2 __NOPs: 8,1,5,1
3+ __NOPs: 7,1,5,1 (desired)

Inserting a multiple of 4 __NOPs (0,4,8) before the voltage scaling call preserves this behaviour. Inserting 1-3, 5-7, resolves it.
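
The multiple-of-4 pattern can be checked with simple arithmetic: four 16-bit NOPs are 8 bytes, which is exactly one 64-bit flash line, so the code that follows keeps the same position within its line. A small sketch of that calculation, using hypothetical helper names and an illustrative address that is not taken from the actual build:

```c
#include <assert.h>
#include <stdint.h>

#define NOP_BYTES      2u /* a Thumb NOP is a 16-bit instruction */
#define PREFETCH_BYTES 8u /* 64-bit flash read width, as stated in the post */

/* Byte offset of an instruction within its 64-bit flash line. */
uint32_t line_offset(uint32_t address)
{
    return address % PREFETCH_BYTES;
}

/* Address of the following code once nop_count NOPs are inserted before it. */
uint32_t shifted_address(uint32_t address, uint32_t nop_count)
{
    assert(nop_count <= (UINT32_MAX - address) / NOP_BYTES);
    return address + nop_count * NOP_BYTES;
}
```

For an illustrative address of 0x08001234, inserting 4 or 8 NOPs leaves line_offset() at 4, while 1, 2 or 3 NOPs move it to 6, 0 or 2. That matches the observation that only non-multiples of 4 change the behaviour, though it does not by itself explain why the alignment matters.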

When I step through the code with the debugger, the memory looks as expected. Unplugging the debugger and power cycling cause the exact same code to execute fine and the system starts up.

My Conclusion

From these results, I suspect something strange is going on with the debugger, instruction prefetch (64-bit prefetch, 16-bit nops, points to the multiple of 4 nops), voltage scaling setting (clock is supposed to be stopped during this time, maybe debugger or prefetch are doing something weird during this time), etc.

Disabling prefetch seems to resolve this, but I don't know why and would like an explanation before I make any changes.

Has anyone seen anything like this or have any other troubleshooting tips?

#debugger #stm32l #voltagescaling #prefetch

STOne-32 · ‎2017-08-11

Posted on August 11, 2017 at 21:33

Hi Andrew,

We need to debug and reproduce this behavior at our end. It might be an issue of crosstalk between the JTAG/SWD clock and such a low CPU frequency. Can you reduce the JTAG/SWD clock to the minimum, for example a few kHz, and see whether the behavior follows the debug clock or the CPU clock? For the voltage scaling, is it possible to execute that loop and the change from RAM instead of flash?

If the problem persists, I recommend you contact your STMicroelectronics FAE or distributor to have an official direct channel with us.

Cheers

STOne -32

Andrew Sund · ‎2017-08-16

Posted on August 17, 2017 at 01:20

I reduced the debugger speed to 200 kHz, to no avail.

Also:

This only seems to affect the system when the debugger is attached. I can also set a breakpoint after switching core voltage and then simply resume execution, and things work. Remove the breakpoint and restart the code and the issue comes back.
I tried eliminating the bootloader we're using. It has its own setup code, so I thought it could have put the chip into an unexpected state. No change.
I'm using the latest gcc-arm-embedded toolchain.
I tried changing the optimization level to -Os (was -Og) and now only RCC_OscInitStruct.LSEState gets the wrong value, even after __NOP fiddling. This value appears to be a memory address, e.g. 0x802362F. Adding multiples of 4 NOPs to this code only changed the value I find inside LSEState by 8... Odd, some offset of PC?
Back at -Og, I put an __ISB() before the first change to RCC_OscInitStruct and got a Hard Fault. HFSR says it's forced, CFSR says it's a Usage Fault and UNDEFINSTR. Unwinding the stack manually shows the debugger wasn't lying and the faulting instruction was the ISB... 0xbff36f8f.
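
As a cross-check on that last value: ISB SY is encoded in Thumb-2 as the halfwords 0xF3BF, 0x8F6F, each stored little-endian. A short sketch, with hypothetical helper names and assuming the debugger showed the bytes in memory order, confirms that this yields bf f3 6f 8f:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ISB SY in Thumb-2: first halfword 0xF3BF, second halfword 0x8F6F. */
#define ISB_HW1 0xF3BFu
#define ISB_HW2 0x8F6Fu

/* Write the little-endian memory image of a 32-bit Thumb-2 instruction. */
void thumb2_memory_bytes(uint16_t hw1, uint16_t hw2, uint8_t out[4])
{
    assert(out != NULL);
    out[0] = (uint8_t)(hw1 & 0xFFu);
    out[1] = (uint8_t)(hw1 >> 8);
    out[2] = (uint8_t)(hw2 & 0xFFu);
    out[3] = (uint8_t)(hw2 >> 8);
}

/* Pack four bytes the way they read left to right in a byte dump. */
uint32_t dump_order_word(const uint8_t bytes[4])
{
    assert(bytes != NULL);
    return ((uint32_t)bytes[0] << 24) | ((uint32_t)bytes[1] << 16) |
           ((uint32_t)bytes[2] << 8)  |  (uint32_t)bytes[3];
}
```

Feeding ISB_HW1 and ISB_HW2 through both helpers gives 0xBFF36F8F, so the bytes in flash look like a well-formed ISB. Under that assumption, the UNDEFINSTR fault points at what the core fetched or decoded rather than at what the compiler emitted.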

It feels like there is a brief period where the instructions being executed are corrupted (that fault when trying to execute an ISB, or the 'movs' or 'str' populating that struct with the wrong operands). I have yet to try running this code from RAM to see if it's an issue with flash when changing core voltages, but I feel like increasing the voltage shouldn't be the issue as I'm running off the MSI and the core clock is supposed to be stopped while the voltage is unstable.

I don't know if I'm really reaching here, but a __DSB() after __HAL_RCC_PWR_CLK_ENABLE() seems to resolve it as well, but I don't know if this is meaningful or if my problem will just come back when something else changes. This macro has a dummy read from the register after writing the enable bit, though this errata workaround isn't listed in the L15x errata sheet (DM00104204.pdf), only those for other product lines. This could just be a documentation oversight or just another case of a few extra bytes hiding the issue again. I'm hesitant to suspect this code is incorrect as it's probably running just fine on many demo boards and in other people's designs.

Tesla DeLorean · ‎2017-08-16

Posted on August 17, 2017 at 01:48

>>I'm using the latest gcc-arm-embedded toolchain.

Not sure that means fewer bugs or different bugs. Which version specifically? Does going back to 4.6.x or 4.7.x builds show better results?

How about builds using professional tools like Keil or IAR using non-GNU/GCC based compilers?

What about debuggers from such tools, are they less or differently invasive?

The DSB is there to address an errata where the write buffers enable a clock and immediately write to the peripheral being enabled. Reading back the RCC->AHBENR after writing it would arguably have a better or equivalent effect.
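
The read-back pattern can be sketched as follows. The helper name is hypothetical, the mask is assumed to mirror RCC_APB1ENR_PWREN (bit 28, the register the HAL macro uses for the PWR clock), and the register is passed as a pointer so the logic can be exercised off-target:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror RCC_APB1ENR_PWREN (bit 28); use the device header on target. */
#define PWREN_MASK (1u << 28)

/* Hypothetical helper: set a clock-enable bit, then read the same register
   back. On target the volatile read is meant to ensure the enable write has
   completed before the peripheral itself is accessed. */
uint32_t enable_clock_with_readback(volatile uint32_t *enr, uint32_t mask)
{
    assert(enr != NULL);
    *enr |= mask;
    return *enr & mask; /* nonzero once the enable bit reads back as set */
}
```

On target this would be called as enable_clock_with_readback(&RCC->APB1ENR, PWREN_MASK) before the first access to PWR. Whether that alone addresses the behaviour described in this thread is not established.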


Jan Waclawek · ‎2017-08-16

Posted on August 17, 2017 at 02:18

Tell us about your hardware - is this a 'known good' hw such as Nucleo or Disco? If not, check all VDD/GND and related connections directly on pins. What's the primary source, how is it regulated, to what voltage, distributed, decoupled, what else is supplied from it, are there significant noise sources around? Can you observe the supply-related pins with an oscilloscope, best triggered with a GPIO toggled just before the problem? Can you try if the problem can be reproduced on a 'known good' hw? Does the problem occur if you physically connect the debugger but don't run the debugging program/IDE?

What is the state of processor before the presented sequence, what's its system freq., what else runs before it? Post the disasm.

JW

Andrew Sund · ‎2017-08-25

Posted on August 25, 2017 at 21:17

Thank you both for helping me with this. I am waiting on time to be able to further investigate this and for a coworker to come back from vacation to have a detailed discussion. I will update then and try to answer all questions. The only time this is triggered is when executing this code with the debugger attached and active, and only if the core voltage change is executed without pausing before doing some memory stores... so I suspect maybe some noise or power issue during this time and will focus on that.