HardFault Debug in STM32CubeIDE

rrnicolay · ‎2019-12-17

Some time ago, I was getting a HardFault in a bare-metal STM32F103 firmware. I even posted the question here, but I wasn't able to fix it. I have since moved to FreeRTOS with CMSIS (not to get rid of the problem; I was moving anyway), and I still have the issue.

I think it is related to printing some floating point numbers.

I checked "Use float with printf from newlib-nano (-u _printf_float)" in the project.

One task prints the g-force values for the 3 axes as floats to a UART (redirected through _write()) every 10 ms.

printf("%5.2f, %5.2f, %5.2f\r\n", accScaled[0], accScaled[1], accScaled[2]);

/* Redirection of printf */
int _write(int file, char *ptr, int len)
{
	/* Wait for the transaction to complete */
	while (HAL_UART_GetState(uartHandler) != HAL_UART_STATE_READY) {}
 
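	/* Fill buffer; txLen is the number of bytes actually placed in txBuffer */
	uint16_t txLen = (uint16_t)len;
	if(len < TX_BUF_SZ)
	{
		memcpy(txBuffer, ptr, len);
	}
	else
	{
		strcpy(txBuffer, "[WARN] Tx size exceeded\r\n");
		txLen = (uint16_t)strlen(txBuffer);
	}
 
	/* Transmit */
	HAL_UART_Transmit_DMA(uartHandler, (uint8_t *)txBuffer, txLen);
 
	/* Report the whole request as consumed so stdio does not retry */
	return len;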
}

It doesn't seem to be a stack overflow, since it runs for several seconds (sometimes minutes) and uxTaskGetStackHighWaterMark() reports 90 words of stack left on the task.

I started debugging the HardFault using this commit.

.section  .text.Reset_Handler
.weak  HardFault_Handler
.type  HardFault_Handler, %function
HardFault_Handler:
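  movs r0,#4              /* bit 2 of EXC_RETURN selects the stack in use */
  movs r1, lr             /* r1 = EXC_RETURN */
  tst r0, r1
  beq _MSP                /* bit clear: the frame was stacked on MSP */
  mrs r0, psp             /* bit set: the frame was stacked on PSP */
  b _HALT
_MSP:
  mrs r0, msp
_HALT:
  ldr r1,[r0,#20]         /* r1 = stacked LR; r0 = frame pointer, the C handler's argument */
  b hard_fault_handler_c
  bkpt #0                 /* not reached, the branch above does not come back here */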
 
.size  HardFault_Handler, .-HardFault_Handler

void hard_fault_handler_c(unsigned long *hardfault_args){
  volatile unsigned long stacked_r0 ;
  volatile unsigned long stacked_r1 ;
  volatile unsigned long stacked_r2 ;
  volatile unsigned long stacked_r3 ;
  volatile unsigned long stacked_r12 ;
  volatile unsigned long stacked_lr ;
  volatile unsigned long stacked_pc ;
  volatile unsigned long stacked_psr ;
  volatile unsigned long _CFSR ;
  volatile unsigned long _HFSR ;
  volatile unsigned long _DFSR ;
  volatile unsigned long _AFSR ;
  volatile unsigned long _BFAR ;
  volatile unsigned long _MMAR ;
 
  stacked_r0 = ((unsigned long)hardfault_args[0]) ;
  stacked_r1 = ((unsigned long)hardfault_args[1]) ;
  stacked_r2 = ((unsigned long)hardfault_args[2]) ;
  stacked_r3 = ((unsigned long)hardfault_args[3]) ;
  stacked_r12 = ((unsigned long)hardfault_args[4]) ;
  stacked_lr = ((unsigned long)hardfault_args[5]) ;
  stacked_pc = ((unsigned long)hardfault_args[6]) ;
  stacked_psr = ((unsigned long)hardfault_args[7]) ;
 
  // Configurable Fault Status Register
  // Consists of MMSR, BFSR and UFSR
  _CFSR = (*((volatile unsigned long *)(0xE000ED28))) ;
 
  // Hard Fault Status Register
  _HFSR = (*((volatile unsigned long *)(0xE000ED2C))) ;
 
  // Debug Fault Status Register
  _DFSR = (*((volatile unsigned long *)(0xE000ED30))) ;
 
  // Auxiliary Fault Status Register
  _AFSR = (*((volatile unsigned long *)(0xE000ED3C))) ;
 
  // Read the Fault Address Registers. These may not contain valid values.
  // Check BFARVALID/MMARVALID to see if they are valid values
  // MemManage Fault Address Register
  _MMAR = (*((volatile unsigned long *)(0xE000ED34))) ;
  // Bus Fault Address Register
  _BFAR = (*((volatile unsigned long *)(0xE000ED38))) ;
 
  (void) stacked_r0 ;
  (void) stacked_r1 ;
  (void) stacked_r2 ;
  (void) stacked_r3 ;
  (void) stacked_r12 ;
  (void) stacked_lr ;
  (void) stacked_pc ;
  (void) stacked_psr ;
  (void) _CFSR ;
  (void) _HFSR ;
  (void) _DFSR ;
  (void) _AFSR ;
  (void) _BFAR ;
  (void) _MMAR ;
 
  __asm("BKPT #0\n") ; // Break into the debugger
}

The disassembly points to this code (instruction 0x080105a2):

__i2b:
08010594:   push    {r4, lr}
08010596:   mov     r4, r1
08010598:   movs    r1, #1
0801059a:   bl      0x8010370 <_Balloc>
0801059e:   movs    r2, #1
080105a0:   str     r4, [r0, #20]
080105a2:   str     r2, [r0, #16]
080105a4:   pop     {r4, pc}

I know I could stop printing floating-point numbers and the fault would probably go away. But it is very handy for debugging, and it should work fine.

Right now, I'm stuck on this. Can anyone give me a hand?

Let me know if there is any other info I could post to better expose the problem.

STM32F103

CMSIS v1 (1.02)

FreeRTOS 10.0.1

Ozone · ‎2019-12-17

One thing that immediately caught my eye:

> 0801059a: bl 0x8010370 <_Balloc>

The malloc() functions take memory from the heap, not the stack.

But many IDEs set the default heap size to zero when creating a new project. Or, your heap could overflow.

rrnicolay · ‎2019-12-17

Thanks for your help, @Ozone!

In STM32CubeIDE, the values for heap and stack are set in the linker script. Right?

This is the default:

/* Highest address of the user mode stack */
_estack = 0x20018000;	/* end of "RAM" Ram type memory */
 
_Min_Heap_Size = 0x200;	/* required amount of heap  */
_Min_Stack_Size = 0x400;	/* required amount of stack */

I already played with these parameters: I increased the heap and stack values in the linker script by 10x, and the same thing happened.

While in FreeRTOS:

Is there anything else I could try?

Ozone · ‎2019-12-17

> In STM32CubeIDE, the values for heap and stack are set in the linker script. Right?

They are often defined somewhere during the project creation process, and are accessible via the project properties.

And finally, they end up in the linker script.

Not much experience with FreeRTOS, which complicates things a bit.

I'm no user of CubeIDE either.

But what do the SCB registers say about the reason for the hard fault?

Perhaps it is a fault escalated to HardFault because of a missing handler?
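
To answer the question about the SCB registers, the values captured in _HFSR and _CFSR by the handler can be decoded along these lines. This is a minimal sketch: the bit positions follow the Cortex-M3 fault status register layout, while classify_fault(), bus_fault_is_imprecise() and the fault_class_t names are hypothetical helpers, not part of CMSIS or the HAL.

```c
#include <stdint.h>

/* HFSR bits */
#define HFSR_VECTTBL      (1UL << 1)   /* fault while reading the vector table */
#define HFSR_FORCED       (1UL << 30)  /* a configurable fault was escalated */

/* CFSR is three status registers packed into one word */
#define CFSR_MMFSR_MASK   0x000000FFUL /* MemManage fault status */
#define CFSR_BFSR_MASK    0x0000FF00UL /* BusFault status */
#define CFSR_UFSR_MASK    0xFFFF0000UL /* UsageFault status */
#define CFSR_IMPRECISERR  (1UL << 10)  /* imprecise data bus error */

/* Hypothetical classification used only for this sketch */
typedef enum {
    FAULT_UNKNOWN = 0,
    FAULT_VECTOR_TABLE_READ,
    FAULT_ESCALATED_BUS,
    FAULT_ESCALATED_MEMMANAGE,
    FAULT_ESCALATED_USAGE
} fault_class_t;

/* Map raw HFSR/CFSR values to a coarse fault class. */
fault_class_t classify_fault(uint32_t hfsr, uint32_t cfsr)
{
    if (hfsr & HFSR_VECTTBL) {
        return FAULT_VECTOR_TABLE_READ;
    }
    if (!(hfsr & HFSR_FORCED)) {
        return FAULT_UNKNOWN;
    }
    if (cfsr & CFSR_BFSR_MASK) {
        return FAULT_ESCALATED_BUS;
    }
    if (cfsr & CFSR_MMFSR_MASK) {
        return FAULT_ESCALATED_MEMMANAGE;
    }
    if (cfsr & CFSR_UFSR_MASK) {
        return FAULT_ESCALATED_USAGE;
    }
    return FAULT_UNKNOWN;
}

/* Returns 1 when the bus error was reported as imprecise. */
int bus_fault_is_imprecise(uint32_t cfsr)
{
    return (cfsr & CFSR_IMPRECISERR) != 0;
}
```

In the C handler this would be called as classify_fault(_HFSR, _CFSR). When the bus error is imprecise, the stacked PC may point past the store that actually faulted, so the neighbouring instructions are worth checking too.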

Bob S · ‎2019-12-17

Not the answer to your original, non-RTOS version of this failure, but there are issues with the stock malloc() family when running an RTOS. @Dave Nadler has posted extensively about this (for example, here https://community.st.com/s/question/0D50X0000BB1eL7SQJ/bug-cubemx-freertos-projects-corrupt-memory), and has a web page describing in detail what is wrong and how to fix it:

http://www.nadler.com/embedded/newlibAndFreeRTOS.html

Since you now apparently know where in the code the fault is happening, step into the _Balloc function and see if you can tell why it is returning NULL. And for that matter, why is the code that CALLS Balloc not checking for error? Yeah, not your fault. It is buried in dtoa(), called from printf(). Inexcusably bad coding on the library's part.

FYI, here is the source to Balloc() from https://sourceware.org/newlib/ :

_Bigint *
Balloc (struct _reent *ptr, int k)
{
  int x;
  _Bigint *rv ;
 
  _REENT_CHECK_MP(ptr);
  if (_REENT_MP_FREELIST(ptr) == NULL)
    {
      /* Allocate a list of pointers to the mprec objects */
      _REENT_MP_FREELIST(ptr) = (struct _Bigint **) _calloc_r (ptr, 
						      sizeof (struct _Bigint *),
						      _Kmax + 1);
      if (_REENT_MP_FREELIST(ptr) == NULL)
	{
	  return NULL;
	}
    }
 
  if ((rv = _REENT_MP_FREELIST(ptr)[k]) != 0)
    {
      _REENT_MP_FREELIST(ptr)[k] = rv->_next;
    }
  else
    {
      x = 1 << k;
      /* Allocate an mprec Bigint and stick in in the freelist */
      rv = (_Bigint *) _calloc_r (ptr,
				  1,
				  sizeof (_Bigint) +
				  (x-1) * sizeof(rv->_x));
      if (rv == NULL) return NULL;
      rv->_k = k;
      rv->_maxwds = x;
    }
  rv->_sign = rv->_wds = 0;
  return rv;
}
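
To connect this source with the __i2b disassembly posted earlier, the two stores can be matched against the field order of struct _Bigint in newlib's reent.h. The sketch below is a model, not the library code: bigint_model and i2b_checked() are hypothetical names, and the _next pointer is modelled as a 32-bit integer so the offsets match the Cortex-M3 layout even when compiled on a 64-bit host.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of struct _Bigint as laid out on a 32-bit target. */
struct bigint_model {
    uint32_t next;    /* struct _Bigint *_next, offset 0 */
    int32_t  k;       /* _k,      offset 4  */
    int32_t  maxwds;  /* _maxwds, offset 8  */
    int32_t  sign;    /* _sign,   offset 12 */
    int32_t  wds;     /* _wds,    offset 16 */
    uint32_t x[1];    /* _x[0],   offset 20 */
};

/*
 * Performs the same two stores as the __i2b disassembly, but with the
 * NULL check that the library code does not make. The caller passes in
 * the result of the allocation.
 */
struct bigint_model *i2b_checked(struct bigint_model *b, int32_t i)
{
    if (b == NULL) {
        return NULL;            /* allocation failed, do not dereference */
    }
    b->x[0] = (uint32_t)i;      /* str r4, [r0, #20] */
    b->wds = 1;                 /* str r2, [r0, #16] */
    return b;
}
```

With r0 equal to NULL, the stores in the disassembly target addresses 20 and 16, which is consistent with a fault at or right after those instructions.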

RMcCa · ‎2019-12-17

You could also write your own float to string routine of some sort. Wouldn't be hard, especially if you know the expected data range.
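
A minimal sketch of such a routine, assuming two decimals are enough and the magnitude stays below 1e6 (both are assumptions, as is the name format_fixed2()). It uses only integer arithmetic for the digits, so it does not go through dtoa() or the heap.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Writes value with two decimals into buf (NUL-terminated).
 * Returns the number of characters written, or 0 if the value is
 * out of range, is NaN, or the buffer is too small.
 */
size_t format_fixed2(float value, char *buf, size_t size)
{
    if (buf == NULL || size == 0u) {
        return 0u;
    }
    /* The negated comparison also rejects NaN. */
    if (!(value > -1000000.0f && value < 1000000.0f)) {
        buf[0] = '\0';
        return 0u;
    }

    int negative = value < 0.0f;
    float magnitude = negative ? -value : value;
    /* Scale to hundredths and round half up. */
    uint32_t scaled = (uint32_t)(magnitude * 100.0f + 0.5f);
    if (scaled == 0u) {
        negative = 0;           /* avoid printing "-0.00" */
    }

    uint32_t whole = scaled / 100u;
    uint32_t frac = scaled % 100u;

    /* Collect the integer digits in reverse order. */
    char digits[12];
    size_t n = 0u;
    do {
        digits[n++] = (char)('0' + (whole % 10u));
        whole /= 10u;
    } while (whole != 0u);

    /* sign + integer digits + ".dd" + terminating NUL */
    size_t needed = (negative ? 1u : 0u) + n + 3u;
    if (needed + 1u > size) {
        buf[0] = '\0';
        return 0u;
    }

    size_t pos = 0u;
    if (negative) {
        buf[pos++] = '-';
    }
    while (n > 0u) {
        buf[pos++] = digits[--n];
    }
    buf[pos++] = '.';
    buf[pos++] = (char)('0' + (frac / 10u));
    buf[pos++] = (char)('0' + (frac % 10u));
    buf[pos] = '\0';
    return pos;
}
```

The three axis values could then be formatted into separate buffers and printed with "%s", which does not need the float support in printf. The sketch does not pad to a field width the way "%5.2f" does.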

rrnicolay · ‎2019-12-17

Thanks for your help, @Bob S !

I was already looking at Nadler's post and implemented his solution. But, for some reason, the UART transmission through DMA inside the _write() function isn't working anymore. It shouldn't have anything to do with that. I'm still trying to fix this.

The first transmission works, but the DMA transfer complete interrupt never gets triggered.

I'll post any news in the near future.

rrnicolay · ‎2019-12-18

I discovered something interesting by accident:

STM32CubeIDE generates a file called sysmem.c, which implements _sbrk(). If I remove this file from the project, using my original faulty firmware, everything works fine (or it's eating heap and will crash some time in the future, hehe).

Checked that _sbrk() was getting called until it returned ENOMEM, just before the hardfault.

Is there a possibility that removing _sbrk() solves the problem? It doesn't make any sense, I know.

Ozone · ‎2019-12-18

> Checked that _sbrk() was getting called until it returned ENOMEM, just before the hardfault.

It seems some routines can't handle the ENOMEM error return, which would not be surprising for Cube code.

_sbrk() is supposed to be the place where you implement system specifics, i.e. adapt it to the application's requirements and the heap memory actually available. An out-of-memory error is usually fatal for an embedded system without an MMU and a swap device.

Perhaps your application doesn't need that much heap in the release version.

But for debugging and the printf() calls, you might want to increase the available heap memory, and check that no malloc() call fails.
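
The _sbrk() contract discussed here can be sketched as a bump allocator over a fixed region. This is not the sysmem.c that CubeIDE generates: sbrk_sketch(), sketch_heap and SKETCH_HEAP_BYTES are hypothetical names, and a real port would take the heap limits from the linker script instead of a static array.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_HEAP_BYTES 64u

static uint8_t sketch_heap[SKETCH_HEAP_BYTES]; /* stand-in for the linker-defined heap */
static size_t sketch_used = 0u;                /* bytes handed out so far */

/*
 * Moves the program break by incr bytes and returns the previous break.
 * On exhaustion it sets errno to ENOMEM and returns (void *)-1, which is
 * the failure value malloc() expects from _sbrk().
 */
void *sbrk_sketch(ptrdiff_t incr)
{
    ptrdiff_t new_used = (ptrdiff_t)sketch_used + incr;

    if (new_used < 0 || new_used > (ptrdiff_t)SKETCH_HEAP_BYTES) {
        errno = ENOMEM;
        return (void *)-1;
    }

    void *previous_break = &sketch_heap[sketch_used];
    sketch_used = (size_t)new_used;
    return previous_break;
}
```

Once this returns (void *)-1, malloc() and calloc() return NULL, and any caller that does not check for NULL writes through a null pointer.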

rrnicolay · ‎2019-12-19

My thoughts about this, so far:

I got it working with two solutions. By working I mean that the firmware didn't crash while running for some minutes (it needs more testing time for a definitive answer):

The original faulty firmware (which crashed within 1 min of operation in the accelerometer example I described), with just the sysmem.c file (the one with _sbrk()) removed. Why does it work without the _sbrk() definition? No idea! I couldn't find a definition of _sbrk() anywhere else. It shouldn't even compile (who calls this, after all?). Anyway, this way it works using DMA for the UART transmission, and it's more practical because I can use the code generated by Cube as is.
Using Nadler's solution.

What I did:

Changed the heap management to the one provided in his post (thanks, Nadler!);
Removed sysmem.c (because it was now a duplicate definition of _sbrk());
Defined configUSE_NEWLIB_REENTRANT as 1 (#define configUSE_NEWLIB_REENTRANT 1);
Had to stop using DMA to transmit data on the UART (redirection using _write()). Despite being called with the right parameters, the _write() function was getting stuck in the second call, where I check the UART state (it was busy forever).

What worries me the most is that I can't understand what's going on.