STM32F427 NOCP UNDEFINSTR hard fault problem

Diego Dell'Orto · 2018-02-12

Posted on February 12, 2018 at 11:52

Hello,

I need help: we are facing hard faults on the STM32F427 MCU.

The same software works perfectly on some MCUs but crashes on others, and we cannot understand why.

Attached are screenshots from the debug session.

Any ideas?

Many Thanks,

Diego

#hard-fault #undefinstr #nocp #stm32f427

AvaTar · 2018-02-12

Posted on February 12, 2018 at 12:18

Looks like you use an OS and tasks, which makes it a bit more difficult.

The same software works perfectly on some MCUs but crashes on others, and we cannot understand why.

I understand this as 'crashes on some boards, runs fine on others'.

This might be a clue, but not necessarily. I assume the boards (MCUs?) are otherwise identical.

Seemingly 'random' crashes are often caused by resource overflow/corruption (mostly of the stack) triggered by external, asynchronous events (interrupts).

The path names of your images imply CoDeSys, which is ... not exactly small.

Besides providing more information about your hardware/software environment, I would first work backwards and decode the hard fault reason from the SCB registers.
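A minimal sketch of that decoding step, assuming the ARMv7-M register layout of the Cortex-M4 in the STM32F427: the Configurable Fault Status Register (CFSR, 0xE000ED28) holds the MemManage, BusFault and UsageFault flags, and bit 30 (FORCED) of the HardFault Status Register (HFSR, 0xE000ED2C) says a configurable fault was escalated to HardFault. The function names are hypothetical; on the target the raw values would come from SCB->CFSR and SCB->HFSR (CMSIS), and the decoding itself can be tested on a PC.

```c
#include <stdint.h>
#include <string.h>

/* Bit positions in the Configurable Fault Status Register (CFSR), ARMv7-M.
 * On the target: uint32_t cfsr = SCB->CFSR;  (address 0xE000ED28)
 *                uint32_t hfsr = SCB->HFSR;  (address 0xE000ED2C) */
typedef struct {
    uint32_t mask;
    const char *name;
} fault_bit_t;

static const fault_bit_t cfsr_bits[] = {
    /* MemManage fault status, bits 7:0 */
    { 1u << 0,  "IACCVIOL" },    { 1u << 1,  "DACCVIOL" },
    { 1u << 3,  "MUNSTKERR" },   { 1u << 4,  "MSTKERR" },
    { 1u << 5,  "MLSPERR" },     { 1u << 7,  "MMARVALID" },
    /* BusFault status, bits 15:8 */
    { 1u << 8,  "IBUSERR" },     { 1u << 9,  "PRECISERR" },
    { 1u << 10, "IMPRECISERR" }, { 1u << 11, "UNSTKERR" },
    { 1u << 12, "STKERR" },      { 1u << 13, "LSPERR" },
    { 1u << 15, "BFARVALID" },
    /* UsageFault status, bits 31:16 */
    { 1u << 16, "UNDEFINSTR" },  { 1u << 17, "INVSTATE" },
    { 1u << 18, "INVPC" },       { 1u << 19, "NOCP" },
    { 1u << 24, "UNALIGNED" },   { 1u << 25, "DIVBYZERO" },
};

/* Writes the names of all set CFSR bits into buf, separated by spaces.
 * The list is truncated at the first name that does not fit in buf. */
void cfsr_to_string(uint32_t cfsr, char *buf, size_t len)
{
    size_t used = 0;
    if (len == 0) return;
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof cfsr_bits / sizeof cfsr_bits[0]; i++) {
        if ((cfsr & cfsr_bits[i].mask) == 0) continue;
        size_t name_len = strlen(cfsr_bits[i].name);
        size_t sep = (used > 0) ? 1 : 0;
        if (used + sep + name_len + 1 > len) break;   /* keep room for the NUL */
        if (sep) buf[used++] = ' ';
        memcpy(buf + used, cfsr_bits[i].name, name_len);
        used += name_len;
        buf[used] = '\0';
    }
}

/* HFSR bit 30 (FORCED): a configurable fault was escalated to HardFault. */
int hardfault_was_forced(uint32_t hfsr)
{
    return (int)((hfsr >> 30) & 1u);
}
```

For example, a CFSR value of 0x000B0000 decodes to "UNDEFINSTR INVSTATE NOCP". If BFARVALID or MMARVALID is set, SCB->BFAR (0xE000ED38) or SCB->MMFAR (0xE000ED34) holds the faulting data address.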

Capture more than one hard fault event. Are the location and context the same, or do they change?

The second step would presumably be instrumenting the routines that could be causing it.
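One possible form of that instrumentation, sketched with hypothetical names (trace_point, trace_recent): a small ring buffer of checkpoint IDs in RAM, written at the entry of suspect routines and read in the debugger after the hard fault. Each call costs only a few instructions, so it disturbs timing less than printf-style logging.

```c
#include <stdint.h>

/* Hypothetical trace buffer: keeps the last TRACE_DEPTH checkpoint IDs in RAM
 * so they can be inspected in the debugger after a hard fault. */
#define TRACE_DEPTH 16u   /* must be a power of two for the index mask */

static volatile uint16_t trace_buf[TRACE_DEPTH];
static volatile uint32_t trace_head;   /* total number of records written */

/* If called from both tasks and interrupts on the target, wrap the body
 * in a short critical section (e.g. with interrupts disabled). */
void trace_point(uint16_t id)
{
    trace_buf[trace_head & (TRACE_DEPTH - 1u)] = id;
    trace_head++;
}

/* Returns the n-th most recent ID (0 = newest), or 0xFFFF if not recorded. */
uint16_t trace_recent(uint32_t n)
{
    if (n >= TRACE_DEPTH || n >= trace_head) return 0xFFFFu;
    return trace_buf[(trace_head - 1u - n) & (TRACE_DEPTH - 1u)];
}
```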

Tesla DeLorean · 2018-02-12

Posted on February 12, 2018 at 12:47

Look at the code that's actually faulting.

For chip/board-specific issues and instability, look at the voltages and the capacitors on the VCAP pins.

Diego Dell'Orto · 2018-02-12

Posted on February 12, 2018 at 12:43

Yes, the very strange thing is that it happens on some boards and not on others.

I tried replacing the MCU on one board, and that board now works properly, without the problems it had before.

Somehow I have the feeling that something is wrong with the MCU, but honestly I'm not sure.

The location is not always the same; 5 or 6 different program counter values seem to recur, but...

The errors are NOCP, UNDEFINSTR and INVSTATE (UsageFault); a few times I also get a BusFault with PRECISERR.

Another strange thing: if I set the PC to the value captured by the hard fault handler and step through, sometimes the instruction executes and sometimes it does not; the program seems to get stuck there with no way to go further.

Where can I see the SCB registers?

About interrupts, I was thinking the same, because one way to make the problem occur more often is to increase the number of Tx/Rx interrupts from the CAN bus interface. But again, it is very strange that on some boards it happens quite easily and on others never.

Many thanks, and I look forward to further info.

Diego Dell'Orto · 2018-02-12

Posted on February 12, 2018 at 12:53

We cannot find the faulting code, because the PC is always different and points to places where the code normally works fine.

The hardware could be a factor. On one board (the only one where we tried it), replacing the MCU seems to have solved the problem. Maybe it is not the microcontroller itself but the soldering of a VCAP pin, as you suggest.

I will take a look.

Thanks

AvaTar · 2018-02-12

Posted on February 12, 2018 at 13:00

The location is not always the same; 5 or 6 different program counter values seem to recur, but...

The errors are NOCP, UNDEFINSTR and INVSTATE (UsageFault); a few times I also get a BusFault with PRECISERR.

This is a typical symptom of a stack overflow, when 'odd' variables are interpreted as return addresses.

I would use a stack-check feature. Some toolchains offer this option, and so does FreeRTOS.
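If neither the toolchain nor the RTOS offers a stack check, a common manual technique is stack painting: fill the stack area with a known pattern before it is used, then count how much of the pattern is still intact. FreeRTOS does the same internally (fill byte 0xA5, reported by uxTaskGetStackHighWaterMark() and checked when configCHECK_FOR_STACK_OVERFLOW is enabled). The sketch below assumes a full-descending stack, as on Cortex-M, and uses hypothetical function names.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xA5A5A5A5u   /* arbitrary, easily recognisable pattern */

/* Paint the whole stack area once, before anything runs on it
 * (e.g. from startup code, or before creating the task that owns it). */
void stack_paint(uint32_t *stack, size_t words)
{
    for (size_t i = 0; i < words; i++) stack[i] = STACK_FILL;
}

/* The stack grows downwards from stack + words, so the lowest addresses are
 * reached last. Counting intact words from the bottom gives the headroom
 * that has never been used (the "high water mark"). */
size_t stack_unused_words(const uint32_t *stack, size_t words)
{
    size_t n = 0;
    while (n < words && stack[n] == STACK_FILL) n++;
    return n;
}
```

Run the CAN load test and then read stack_unused_words() in the debugger; a value at or near zero means the stack came close to (or past) overflowing.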

About interrupts, I was thinking the same, because one way to make the problem occur more often is to increase the number of Tx/Rx interrupts from the CAN bus interface.

This is another overflow symptom, because interrupts pound on the stack, too.

Though interrupts are not the cause, they tend to add a 'random' element and complicate debugging.

Code instrumentation is good for synchronous problems (i.e. systematic bugs in the code), but not for stack overflows.
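To put a rough number on how interrupts load the stack: on exception entry the Cortex-M4 pushes 8 words (32 bytes), or 26 words (104 bytes) when the interrupted code has an active FPU context, plus up to 4 bytes of alignment padding. Under an RTOS the outermost frame goes onto the interrupted task's stack, and nested interrupts then stack on the main stack. The estimator below is only a sketch; msp_worst_case and its parameters are hypothetical, and real per-handler usage should come from the toolchain's stack-usage analysis.

```c
#include <stdint.h>

/* Exception frame sizes on Cortex-M4 (ARMv7-M):
 *   basic frame    =  8 words: R0-R3, R12, LR, PC, xPSR         =  32 bytes
 *   extended frame = 26 words: basic + S0-S15, FPSCR, reserved  = 104 bytes
 * plus up to one padding word to keep the stack 8-byte aligned. */
#define FRAME_BASIC_BYTES     32u
#define FRAME_EXTENDED_BYTES 104u
#define FRAME_ALIGN_PAD        4u

/* Rough upper bound on main-stack (MSP) usage when `levels` interrupts nest,
 * each handler using up to `handler_bytes` for locals and saved registers.
 * Assumes an RTOS: the outermost frame is pushed onto the interrupted task's
 * stack (PSP), so only the frames of the nested levels land on MSP. */
uint32_t msp_worst_case(uint32_t levels, uint32_t handler_bytes, int fpu_frames)
{
    uint32_t frame = (fpu_frames ? FRAME_EXTENDED_BYTES : FRAME_BASIC_BYTES)
                     + FRAME_ALIGN_PAD;
    if (levels == 0) return 0;
    return levels * handler_bytes + (levels - 1u) * frame;
}
```

For example, three nesting levels of 64-byte handlers with FPU frames come to 408 bytes of MSP, and every task stack needs room for one 108-byte frame on top of its own usage. A higher CAN interrupt rate makes it more likely that the deepest nesting coincides with the deepest task call chain.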

Where can I see the SCB registers?

http://www.keil.com/appnotes/files/apnt209.pdf

Diego Dell'Orto · 2018-02-12

Posted on February 12, 2018 at 14:10

But why does it happen on some MCUs and not on others?

If we had a stack overflow, shouldn't we see the same 'random' problem on every board under the same conditions?

I have difficulty understanding how a software problem can cause failures on one piece of hardware and not on another...

Diego Dell'Orto · 2018-02-12

Posted on February 12, 2018 at 14:39

AvaTar · 2018-02-12

Posted on February 12, 2018 at 15:09

A correlation between hard faults and individual MCUs is not very likely.

One exception that comes to mind is Flash interface settings (clock rate, wait states) that are right at the limit.
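If the Flash settings are a suspect, it is easy to compare FLASH_ACR.LATENCY with what the clock actually needs. The sketch assumes the RM0090 wait-state table for STM32F42x/43x at a 2.7-3.6 V supply, where each additional wait state covers another 30 MHz of HCLK; lower supply ranges need more wait states, so check the reference manual against your VDD. The function name is hypothetical.

```c
#include <stdint.h>

/* Minimum FLASH_ACR LATENCY (wait states) for STM32F42x/43x at VDD 2.7-3.6 V,
 * assuming one wait state per 30 MHz of HCLK:
 *   0-30 MHz -> 0 WS, 30-60 MHz -> 1 WS, ..., 150-180 MHz -> 5 WS.
 * Compare the result with (FLASH->ACR & FLASH_ACR_LATENCY) on the target. */
uint32_t flash_latency_3v3(uint32_t hclk_hz)
{
    const uint32_t step_hz = 30000000u;
    if (hclk_hz == 0) return 0;
    return (hclk_hz - 1u) / step_hz;   /* boundaries belong to the lower band */
}
```

At 180 MHz this gives 5 wait states and at 120 MHz it gives 3. More wait states than necessary are safe, so if LATENCY was left at its 180 MHz value when the clock was lowered to 120 MHz and the faults persisted, the wait-state setting itself is a less likely cause.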

Your firmware might contain a race condition which depends on hardware differences (such as delays), for example nested interrupts or specific error conditions inside interrupt handlers.

You can try implementing handlers for the other fault exceptions (MemManage, BusFault, UsageFault), which escalate to hard faults unless they are enabled.
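By default MemManage, BusFault and UsageFault are disabled; their faults then escalate to HardFault and HFSR.FORCED is set. Enabling them in the System Handler Control and State Register (SHCSR) routes each fault to its own handler. A minimal sketch: with CMSIS the same bits exist as SCB_SHCSR_MEMFAULTENA_Msk, SCB_SHCSR_BUSFAULTENA_Msk and SCB_SHCSR_USGFAULTENA_Msk, and the pointer parameter of the hypothetical enable_fault_handlers() is only there so it can be exercised off-target.

```c
#include <stdint.h>

/* SHCSR enable bits (ARMv7-M). All three are cleared after reset. */
#define SHCSR_MEMFAULTENA (1u << 16)
#define SHCSR_BUSFAULTENA (1u << 17)
#define SHCSR_USGFAULTENA (1u << 18)

/* On the target call: enable_fault_handlers(&SCB->SHCSR); */
void enable_fault_handlers(volatile uint32_t *shcsr)
{
    *shcsr |= SHCSR_MEMFAULTENA | SHCSR_BUSFAULTENA | SHCSR_USGFAULTENA;
}
```

Call it early in main(); the three handlers then need real bodies instead of `B .` loops so the individual faults can be reported.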

Do you use the FPU, and the long (extended) stack frame that includes the FPU registers?
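On exception entry LR holds an EXC_RETURN value that describes the stacked frame. The LR test in the HardFault_Handler posted in this thread checks bit 2 (main or process stack); bit 4 is clear when the frame is the extended one with S0-S15 and FPSCR. A small decoder, as a sketch with hypothetical names:

```c
#include <stdint.h>

/* Fields of an EXC_RETURN value (LR on exception entry, ARMv7-M). */
typedef struct {
    int uses_psp;        /* bit 2 set: frame is on the process stack (RTOS task) */
    int to_thread_mode;  /* bit 3 set: the exception returns to Thread mode */
    int extended_frame;  /* bit 4 clear: frame also holds S0-S15 and FPSCR */
} exc_return_info_t;

exc_return_info_t decode_exc_return(uint32_t lr)
{
    exc_return_info_t info;
    info.uses_psp       = (lr & (1u << 2)) != 0;
    info.to_thread_mode = (lr & (1u << 3)) != 0;
    info.extended_frame = (lr & (1u << 4)) == 0;
    return info;
}
```

In both frame formats the stacked PC is the seventh word (offset 24 bytes) from the stack pointer passed to the C handler.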

Diego Dell'Orto · 2018-02-12

Posted on February 12, 2018 at 15:31

I've already tried slowing down from 180 MHz to 120 MHz, but the behaviour is the same.

Here is my system init procedure:

void SystemInit(void)
{
  /* FPU settings ------------------------------------------------------------*/
  #if (__FPU_PRESENT == 1) && (__FPU_USED == 1)
    SCB->CPACR |= ((3UL << 10*2)|(3UL << 11*2));  /* set CP10 and CP11 Full Access */
  #endif
  /* Reset the RCC clock configuration to the default reset state ------------*/
  /* Set HSION bit */
  RCC->CR |= (uint32_t)0x00000001;

  /* Reset CFGR register */
  RCC->CFGR = 0x00000000;

  /* Reset HSEON, CSSON and PLLON bits */
  RCC->CR &= (uint32_t)0xFEF6FFFF;

  /* Reset PLLCFGR register */
  RCC->PLLCFGR = 0x24003010;

  /* Reset HSEBYP bit */
  RCC->CR &= (uint32_t)0xFFFBFFFF;

  /* Disable all interrupts */
  RCC->CIR = 0x00000000;

#if defined (DATA_IN_ExtSRAM) || defined (DATA_IN_ExtSDRAM) || defined (PREMAIN_FSMC_SETUP) /* Keil */
  SystemInit_ExtMemCtl();
#endif /* DATA_IN_ExtSRAM || DATA_IN_ExtSDRAM || defined (PREMAIN_FSMC_SETUP) */ /* Keil */

  /* Configure the System clock source, PLL Multiplier and Divider factors,
     AHB/APBx prescalers and Flash settings ----------------------------------*/
  SetSysClock();

  /* Configure the Vector Table location add offset address ------------------*/
#ifdef VECT_TAB_SRAM
  SCB->VTOR = SRAM_BASE | VECT_TAB_OFFSET; /* Vector Table Relocation in Internal SRAM */
#else
  SCB->VTOR = FLASH_BASE | VECT_TAB_OFFSET; /* Vector Table Relocation in Internal FLASH */
#endif
}

where...

#define __FPU_PRESENT 1 /*!< FPU present */

so I suppose it is in use.
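Note that __FPU_PRESENT only says the hardware exists. In the CMSIS core_cm4.h header, __FPU_USED is set to 1 only when the compiler is also configured to generate FPU code, so the CPACR write in SystemInit is compiled in only in that case. A NOCP UsageFault means an FPU instruction executed while CP10/CP11 access was not enabled, so checking CPACR at runtime can rule this out. A minimal sketch with a hypothetical function name; on the target pass SCB->CPACR (address 0xE000ED88).

```c
#include <stdint.h>

/* CPACR: CP10 = bits 21:20, CP11 = bits 23:22; 0b11 means full access.
 * SystemInit's (3UL << 10*2) | (3UL << 11*2) sets exactly these bits.
 * If either field is not full access, an FPU instruction raises a
 * UsageFault with CFSR.NOCP set. */
int fpu_full_access(uint32_t cpacr)
{
    const uint32_t cp10_cp11 = (3u << 20) | (3u << 22);
    return (cpacr & cp10_cp11) == cp10_cp11;
}
```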

For the other faults, should I do the same as in the hard fault handler below?

HardFault_Handler\
                PROC
                EXPORT  HardFault_Handler          [WEAK]
                TST     LR, #4
                ITE     EQ
                MRSEQ   R0, MSP
                MRSNE   R0, PSP
                B       hard_fault_handler_c
                ENDP

MemManage_Handler\
                PROC
                EXPORT  MemManage_Handler          [WEAK]
                B       .
                ENDP

BusFault_Handler\
                PROC
                EXPORT  BusFault_Handler           [WEAK]
                B       .
                ENDP

UsageFault_Handler\
                PROC
                EXPORT  UsageFault_Handler         [WEAK]
                B       .
                ENDP

Can I find a sample of handlers for the other faults written in C?
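A sketch of the C side, assuming each assembly stub passes the stack pointer in R0 exactly like the posted HardFault_Handler (for example a UsageFault_Handler that ends in `B usage_fault_handler_c`, possibly with a matching IMPORT). All C names here are hypothetical; only the register order of the stacked frame is fixed by the architecture: R0, R1, R2, R3, R12, LR, PC, xPSR.

```c
#include <stdint.h>
#include <stdio.h>

/* Registers the core pushes on exception entry, lowest address first.
 * These 8 words start both the basic and the FPU-extended frame. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} stacked_frame_t;

stacked_frame_t read_stacked_frame(const uint32_t *sp)
{
    stacked_frame_t f;
    f.r0 = sp[0];  f.r1 = sp[1]; f.r2 = sp[2]; f.r3 = sp[3];
    f.r12 = sp[4]; f.lr = sp[5]; f.pc = sp[6]; f.xpsr = sp[7];
    return f;
}

/* Common reporting routine for all fault handlers. printf assumes retargeted
 * stdio (ITM or UART); without it, inspect `f` in the debugger instead. */
void report_fault(const char *which, const uint32_t *sp)
{
    stacked_frame_t f = read_stacked_frame(sp);
    printf("%s: PC=0x%08lX LR=0x%08lX xPSR=0x%08lX\n", which,
           (unsigned long)f.pc, (unsigned long)f.lr, (unsigned long)f.xpsr);
    printf("R0=0x%08lX R1=0x%08lX R2=0x%08lX R3=0x%08lX R12=0x%08lX\n",
           (unsigned long)f.r0, (unsigned long)f.r1, (unsigned long)f.r2,
           (unsigned long)f.r3, (unsigned long)f.r12);
}

/* Example C entry point for a UsageFault assembly stub. The same pattern
 * applies to MemManage and BusFault. */
void usage_fault_handler_c(uint32_t *sp)
{
    report_fault("UsageFault", sp);
    /* Also read SCB->CFSR, SCB->MMFAR and SCB->BFAR here if needed. */
    for (;;) { }   /* halt: returning would usually just fault again */
}
```

For precise faults such as UNDEFINSTR, NOCP or PRECISERR the stacked PC is the address of the faulting instruction; for imprecise bus faults it only points somewhere after the culprit. When the PC lands in different, normally working places, the stacked LR often shows more about how execution got there.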