rrnicolay
Associate II
December 17, 2019
Question

HardFault Debug in STM32CubeIDE

Some time ago, I was getting a HardFault in bare-metal firmware for an STM32F103. I even posted the question here, but I wasn't able to fix it. I have since moved to FreeRTOS with CMSIS (not to get rid of the problem; I was moving anyway), and I still have the issue.

I think it is related to printing some floating point numbers.

I checked "Use float with printf from newlib-nano (-u _printf_float)" in the project.

One task prints the g-force values for the 3 axes as floats to a UART (redirected through _write()) every 10 ms.

printf("%5.2f, %5.2f, %5.2f\r\n", accScaled[0], accScaled[1], accScaled[2]);

/* Redirection of printf */
int _write(int file, char *ptr, int len)
{
	/* Wait for the transaction to complete */
	while (HAL_UART_GetState(uartHandler) != HAL_UART_STATE_READY) {}
 
	/* Fill buffer */
	if(len < TX_BUF_SZ)
	{
		strncpy(txBuffer, ptr, len);
	}
	else
	{
		strcpy(txBuffer, "[WARN] Tx size exceeded\r\n");
	}
 
	/* Transmit */
	HAL_UART_Transmit_DMA(uartHandler, (uint8_t *)txBuffer, len);
 
 return len;
}
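
One detail in the _write() above is worth isolating: when len is not smaller than TX_BUF_SZ, the warning string is copied but len bytes are still handed to the DMA, so the transfer reads past the end of txBuffer. Whether this is related to the HardFault is not established in this thread. Below is a minimal sketch of a length-safe fill step. `fill_tx_buffer` is a hypothetical helper, not part of the HAL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy at most `cap` bytes into `dst` and return the
 * number of bytes that are actually valid in `dst`. The caller passes this
 * return value (not the original `len`) to the transmit call. */
size_t fill_tx_buffer(char *dst, size_t cap, const char *src, size_t len)
{
    static const char warn[] = "[WARN] Tx size exceeded\r\n";
    size_t n;

    assert(dst != NULL && src != NULL);

    if (len <= cap) {
        /* Normal case: the whole message fits. memcpy is used because the
         * data is a byte range, not necessarily a NUL-terminated string. */
        memcpy(dst, src, len);
        return len;
    }

    /* Oversized message: send the warning instead, clamped to the buffer. */
    n = sizeof(warn) - 1;   /* exclude the terminating NUL */
    if (n > cap) {
        n = cap;
    }
    memcpy(dst, warn, n);
    return n;
}
```

Inside _write(), the returned count would be the size passed to HAL_UART_Transmit_DMA(), while _write() itself can still return len so that the C library treats the message as consumed.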

It doesn't seem to be a stack overflow, since it runs for several seconds (sometimes minutes) and uxTaskGetStackHighWaterMark() reports 90 words of stack left for the task.
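
For context on the 90-word figure: uxTaskGetStackHighWaterMark() reports the smallest amount of stack that has been left so far, in words (4 bytes each on a Cortex-M3), so 90 words is 360 bytes. FreeRTOS derives it by pre-filling the task stack with a known byte and counting how much of that pattern is still untouched. The following host-side sketch illustrates the idea. The names `paint_stack`, `stack_unused_words` and `FILL_BYTE` are hypothetical, and this is not the kernel's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FILL_BYTE 0xA5u   /* fill pattern; FreeRTOS uses 0xA5 for this purpose */

/* Fill the whole stack area with the pattern before the task starts. */
void paint_stack(uint8_t *stack_low, size_t size_bytes)
{
    assert(stack_low != NULL);
    memset(stack_low, FILL_BYTE, size_bytes);
}

/* The stack grows downward on Cortex-M, so the unused part sits at the low
 * addresses. Count pattern bytes from the low end until the first byte that
 * was overwritten, then convert the count to 32-bit words. */
size_t stack_unused_words(const uint8_t *stack_low, size_t size_bytes)
{
    size_t untouched = 0;

    assert(stack_low != NULL);
    while (untouched < size_bytes && stack_low[untouched] == FILL_BYTE) {
        untouched++;
    }
    return untouched / sizeof(uint32_t);
}
```

A count like this only sees bytes that were actually overwritten, so a large local buffer that is reserved but not fully written can hide deeper stack usage.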

I started debugging the HardFault using the code from this commit.

.section .text.Reset_Handler
.weak HardFault_Handler
.type HardFault_Handler, %function
HardFault_Handler:
 movs r0,#4
 movs r1, lr
 tst r0, r1
 beq _MSP
 mrs r0, psp
 b _HALT
_MSP:
 mrs r0, msp
_HALT:
 ldr r1,[r0,#20]
 b hard_fault_handler_c
 bkpt #0
 
.size HardFault_Handler, .-HardFault_Handler

/* C part of the handler; r0 holds the pointer to the stacked exception frame */
void hard_fault_handler_c(unsigned long *hardfault_args){
 volatile unsigned long stacked_r0 ;
 volatile unsigned long stacked_r1 ;
 volatile unsigned long stacked_r2 ;
 volatile unsigned long stacked_r3 ;
 volatile unsigned long stacked_r12 ;
 volatile unsigned long stacked_lr ;
 volatile unsigned long stacked_pc ;
 volatile unsigned long stacked_psr ;
 volatile unsigned long _CFSR ;
 volatile unsigned long _HFSR ;
 volatile unsigned long _DFSR ;
 volatile unsigned long _AFSR ;
 volatile unsigned long _BFAR ;
 volatile unsigned long _MMAR ;
 
 stacked_r0 = ((unsigned long)hardfault_args[0]) ;
 stacked_r1 = ((unsigned long)hardfault_args[1]) ;
 stacked_r2 = ((unsigned long)hardfault_args[2]) ;
 stacked_r3 = ((unsigned long)hardfault_args[3]) ;
 stacked_r12 = ((unsigned long)hardfault_args[4]) ;
 stacked_lr = ((unsigned long)hardfault_args[5]) ;
 stacked_pc = ((unsigned long)hardfault_args[6]) ;
 stacked_psr = ((unsigned long)hardfault_args[7]) ;
 
 // Configurable Fault Status Register
 // Consists of MMSR, BFSR and UFSR
 _CFSR = (*((volatile unsigned long *)(0xE000ED28))) ;
 
 // Hard Fault Status Register
 _HFSR = (*((volatile unsigned long *)(0xE000ED2C))) ;
 
 // Debug Fault Status Register
 _DFSR = (*((volatile unsigned long *)(0xE000ED30))) ;
 
 // Auxiliary Fault Status Register
 _AFSR = (*((volatile unsigned long *)(0xE000ED3C))) ;
 
 // Read the Fault Address Registers. These may not contain valid values.
 // Check BFARVALID/MMARVALID to see if they are valid values
 // MemManage Fault Address Register
 _MMAR = (*((volatile unsigned long *)(0xE000ED34))) ;
 // Bus Fault Address Register
 _BFAR = (*((volatile unsigned long *)(0xE000ED38))) ;
 
 (void) stacked_r0 ;
 (void) stacked_r1 ;
 (void) stacked_r2 ;
 (void) stacked_r3 ;
 (void) stacked_r12 ;
 (void) stacked_lr ;
 (void) stacked_pc ;
 (void) stacked_psr ;
 (void) _CFSR ;
 (void) _HFSR ;
 (void) _DFSR ;
 (void) _AFSR ;
 (void) _BFAR ;
 (void) _MMAR ;
 
 __asm("BKPT #0\n") ; // Break into the debugger
}

The disassembly points to this code (instruction at 0x080105a2):

__i2b:
08010594: push {r4, lr}
08010596: mov r4, r1
08010598: movs r1, #1
0801059a: bl 0x8010370 <_Balloc>
0801059e: movs r2, #1
080105a0: str r4, [r0, #20]
080105a2: str r2, [r0, #16]
080105a4: pop {r4, pc}

I know I could stop printing floating-point numbers and the fault would probably go away, but printing them is very handy for debugging, and it should work fine.

Right now, I'm stuck on this. Can anyone give me a hand?

Let me know if there is any other info I could post to better describe the problem.

STM32F103

CMSIS v1 (1.02)

FreeRTOS 10.0.1

6 replies

Ozone
Principal
December 17, 2019

One thing that immediately caught my eye:

> 0801059a: bl 0x8010370 <_Balloc>

The malloc() functions take memory from the heap, not the stack.

But many IDEs set the default heap size to zero when creating a new project. Or, your heap could overflow.

rrnicolay (Author)
Associate II
December 17, 2019

Thanks for your help, @Ozone!

In STM32CubeIDE, the values for heap and stack are set in the linker script, right?

This is the default:

/* Highest address of the user mode stack */
_estack = 0x20018000;	/* end of "RAM" Ram type memory */
 
_Min_Heap_Size = 0x200;	/* required amount of heap */
_Min_Stack_Size = 0x400;	/* required amount of stack */

I already played with these parameters: I increased the heap and stack values in the linker script by 10x, and the same thing happened.

While in FreeRTOS:

[image: 0690X00000BuiH2QAJ.png]

Is there anything else I could try?

Ozone
Principal
December 17, 2019

> In STM32CubeIDE, the values for heap and stack are set in the linker script, right?

They are often defined somewhere during the project creation process, and are accessible via the project properties.

And finally, they end up in the linker script.

Not much experience with FreeRTOS, which complicates things a bit.

I'm no user of CubeIDE either.

But what do the SCB registers say about the reason for the HardFault?

Perhaps a propagated fault, because of a missing handler?
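
As an illustration of reading those registers, here is a small sketch that turns a CFSR value (address 0xE000ED28, as read into _CFSR by the handler in the question) into flag names, plus a check of the HFSR FORCED bit, which indicates that a configurable fault was escalated to HardFault. The bit positions follow the ARMv7-M definitions used by the Cortex-M3. The function names `decode_cfsr` and `hardfault_was_escalated` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef struct { uint32_t mask; const char *name; } fault_bit_t;

/* CFSR = MMFSR (bits 0-7) | BFSR (bits 8-15) | UFSR (bits 16-31) */
static const fault_bit_t cfsr_bits[] = {
    { 1u << 0,  "IACCVIOL" },  { 1u << 1,  "DACCVIOL" },
    { 1u << 3,  "MUNSTKERR" }, { 1u << 4,  "MSTKERR" },
    { 1u << 7,  "MMARVALID" },
    { 1u << 8,  "IBUSERR" },   { 1u << 9,  "PRECISERR" },
    { 1u << 10, "IMPRECISERR" }, { 1u << 11, "UNSTKERR" },
    { 1u << 12, "STKERR" },    { 1u << 15, "BFARVALID" },
    { 1u << 16, "UNDEFINSTR" }, { 1u << 17, "INVSTATE" },
    { 1u << 18, "INVPC" },     { 1u << 19, "NOCP" },
    { 1u << 24, "UNALIGNED" }, { 1u << 25, "DIVBYZERO" },
};

/* Write the names of all set flags into `out`, separated by spaces.
 * Returns the number of characters written (excluding the NUL). */
size_t decode_cfsr(uint32_t cfsr, char *out, size_t size)
{
    size_t used = 0;

    assert(out != NULL && size > 0);
    out[0] = '\0';
    for (size_t i = 0; i < sizeof cfsr_bits / sizeof cfsr_bits[0]; i++) {
        if ((cfsr & cfsr_bits[i].mask) == 0) {
            continue;
        }
        int n = snprintf(out + used, size - used, "%s%s",
                         used ? " " : "", cfsr_bits[i].name);
        if (n < 0 || (size_t)n >= size - used) {
            out[used] = '\0';   /* no room: drop the partial name and stop */
            break;
        }
        used += (size_t)n;
    }
    return used;
}

/* HFSR bit 30 (FORCED): a configurable fault was escalated to HardFault. */
int hardfault_was_escalated(uint32_t hfsr)
{
    return (hfsr & (1u << 30)) != 0;
}
```

For example, a CFSR of 0x00008200 decodes to PRECISERR and BFARVALID, in which case the BFAR value captured by the handler holds the faulting address.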

Bob S
Super User
December 17, 2019

Not the answer to your original, non-RTOS version of this failure, but there are issues with the stock malloc() family when running an RTOS. @Dave Nadler has posted extensively about this (for example, here https://community.st.com/s/question/0D50X0000BB1eL7SQJ/bug-cubemx-freertos-projects-corrupt-memory), and has a web page describing in detail what is wrong and how to fix it:

http://www.nadler.com/embedded/newlibAndFreeRTOS.html

Since you now apparently know where in the code the fault is happening, step into the _Balloc function and see if you can tell why it is returning NULL. And for that matter, why is the code that CALLS Balloc not checking for error? Yeah, not your fault. It is buried in dtoa(), called from printf(). Inexcusably bad coding on the library's part.

FYI, here is the source to Balloc() from https://sourceware.org/newlib/ :

_Bigint *
Balloc (struct _reent *ptr, int k)
{
  int x;
  _Bigint *rv;

  _REENT_CHECK_MP(ptr);
  if (_REENT_MP_FREELIST(ptr) == NULL)
    {
      /* Allocate a list of pointers to the mprec objects */
      _REENT_MP_FREELIST(ptr) = (struct _Bigint **) _calloc_r (ptr,
                                                      sizeof (struct _Bigint *),
                                                      _Kmax + 1);
      if (_REENT_MP_FREELIST(ptr) == NULL)
        {
          return NULL;
        }
    }

  if ((rv = _REENT_MP_FREELIST(ptr)[k]) != 0)
    {
      _REENT_MP_FREELIST(ptr)[k] = rv->_next;
    }
  else
    {
      x = 1 << k;
      /* Allocate an mprec Bigint and stick in in the freelist */
      rv = (_Bigint *) _calloc_r (ptr,
                                  1,
                                  sizeof (_Bigint) +
                                  (x-1) * sizeof(rv->_x));
      if (rv == NULL) return NULL;
      rv->_k = k;
      rv->_maxwds = x;
    }
  rv->_sign = rv->_wds = 0;
  return rv;
}

rrnicolay (Author)
Associate II
December 17, 2019

Thanks for your help, @Bob S!

I was already looking at Nadler's post and have implemented his solution. But, for some reason, the UART transmission through DMA inside the _write() function isn't working anymore. The change shouldn't have anything to do with that. I'm still looking into it.

The first transmission works, but the DMA transfer complete interrupt never gets triggered.

I'll post any news in the near future.

RMcCa
Senior II
December 17, 2019

You could also write your own float-to-string routine of some sort. It wouldn't be hard, especially if you know the expected data range.
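
A minimal sketch of such a routine, assuming two decimals are enough and the values stay well within the range of a 32-bit long after scaling by 100. It uses only integer printf conversions, so it does not go through the floating-point formatting path in the C library. `format_centi` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Format `v` with two decimals using only integer conversions.
 * Assumption: |v * 100| fits comfortably in a long.
 * Returns what snprintf returns (length that was or would be written). */
int format_centi(char *buf, size_t size, float v)
{
    assert(buf != NULL && size > 0);

    /* Scale to hundredths and round half away from zero. */
    float scaled = v * 100.0f;
    long centi = (long)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);

    /* Handle the sign separately so values in (-1, 0) keep their minus. */
    const char *sign = (centi < 0) ? "-" : "";
    unsigned long mag = (centi < 0) ? (unsigned long)(-centi)
                                    : (unsigned long)centi;

    return snprintf(buf, size, "%s%lu.%02lu", sign, mag / 100u, mag % 100u);
}
```

Each axis value would be formatted into its own small buffer and the three strings printed with %s, which keeps the printf call free of %f conversions.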

rrnicolay (Author)
Associate II
December 19, 2019

My thoughts about this, so far:

I got it working with two solutions. By "working" I mean that the firmware didn't crash while running for some minutes (it needs more testing time for a definitive answer):

  1. The original faulty firmware (crashing within 1 min of operation in the accelerometer example I described), with only the sysmem.c file (the one with _sbrk()) removed. Why does it work without the _sbrk definition? No idea! I couldn't find a definition of _sbrk anywhere else. It shouldn't even compile (who calls this, after all?). Anyway, this way it works with DMA for the UART transmission, and it's more practical because I can use the code generated by Cube as is.
  2. Using Nadler's solution.

What I did:

  • Changed the heap management to the one provided in his post (thanks, Nadler!);
  • Removed sysmem.c (because it was now a duplicate definition of _sbrk());
  • Defined configUSE_NEWLIB_REENTRANT as 1 (#define configUSE_NEWLIB_REENTRANT 1);
  • Had to stop using DMA to transmit data on the UART (redirection using _write()). Despite being called with the right parameters, the _write() function got stuck on the second call, at the point where I check the UART state (it was busy forever).
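
On the question of who calls _sbrk(): newlib's malloc() family (and therefore the _calloc_r() call inside _Balloc) grows the heap through _sbrk_r(), which calls _sbrk(). When sysmem.c is removed, a default _sbrk() may be coming from a stub library such as libnosys; that is an assumption about this toolchain setup, not something verified in this thread. The sketch below shows the general shape of a bounds-checked _sbrk(), using a static array as a stand-in for the heap region that the linker script would normally define. `bounded_sbrk` and `heap_area` are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the heap region; on the target the limits would come from
 * linker-script symbols instead of a fixed array. */
static uint8_t heap_area[256];
static uint8_t *heap_brk = heap_area;   /* current program break */

/* Move the break by `incr` bytes and return the previous break, or
 * (void *)-1 with errno = ENOMEM when the request does not fit. */
void *bounded_sbrk(ptrdiff_t incr)
{
    uint8_t *prev = heap_brk;
    ptrdiff_t used = heap_brk - heap_area;
    ptrdiff_t cap = (ptrdiff_t)sizeof heap_area;

    assert(used >= 0 && used <= cap);

    if (incr > cap - used || incr < -used) {
        errno = ENOMEM;          /* malloc() then returns NULL to its caller */
        return (void *)-1;
    }
    heap_brk += incr;
    return prev;
}
```

With a check like this, an exhausted heap shows up as a NULL return from the allocator, which matches the failure path Bob S describes for _Balloc.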

What worries me the most is that I can't understand what's going on.

shorai
Associate III
December 19, 2019

I often run into problems with sprintf overflowing a buffer, as well as with the RTOS not allocating enough space for a task; both are my fault.

Running in the debugger and catching the place where the error occurs is often the best approach, as the stacks (and memory) may be badly mangled before the fault occurs. That is, don't necessarily look at the HardFault registers; by then it may be too late. Rather, look at what is going on immediately before the fault.

Try switching tasks off and isolate the problem to a task or short piece of code.

Formatting and parsing floating point is non-trivial and may consume yards of stack with all its ifs, buts and maybes. You may want to try an alternative sprintf or even ftoa.

In embedded work you generally know the range of your FP numbers, so workarounds are easy, e.g. multiply or divide by 1000s and print as an integer with metric suffixes. I once wrote a function to auto-scale from 10^24 to 10^-24 (yotta to yocto); it was very simple and quick.
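
A sketch of what such an auto-scaling helper could look like. It is written from the description above, not taken from the poster's code, and `metric_scale` is a hypothetical name. It repeatedly multiplies or divides by 1000 until the magnitude lies in [1, 1000) or the prefix table runs out.

```c
#include <assert.h>
#include <stddef.h>

/* Scale `value` into the range [1, 1000) where possible and return the
 * matching metric prefix ("" for no prefix). The scaled value is stored
 * in *scaled. "u" stands in for micro. */
const char *metric_scale(double value, double *scaled)
{
    static const char *const prefix[] = {
        "y", "z", "a", "f", "p", "n", "u", "m", "",
        "k", "M", "G", "T", "P", "E", "Z", "Y"
    };
    size_t idx = 8;                          /* index of the empty prefix */
    double mag = (value < 0.0) ? -value : value;

    assert(scaled != NULL);

    if (mag != 0.0) {
        while (mag >= 1000.0 && idx < 16) {  /* too large: step up a prefix */
            mag /= 1000.0;
            value /= 1000.0;
            idx++;
        }
        while (mag < 1.0 && idx > 0) {       /* too small: step down a prefix */
            mag *= 1000.0;
            value *= 1000.0;
            idx--;
        }
    }
    *scaled = value;
    return prefix[idx];
}
```

The scaled value can then be rounded to an integer and printed with the prefix as a suffix, which avoids floating-point format conversions.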

For embedded work, I prefer to limit usage of malloc and free as sooner or later I get into garbage collection of some form.

As a result I usually use stack space or pre-allocated buffers. It's surprising how little you can get away with without adding complexity to the code.

My code simplified substantially once I learned to work with tasks and static buffers.

BBasn.1
Visitor II
January 23, 2024

I had the same problem, which kept me baffled for a very long time. 

I just doubled the "Stack Size(Words)" from "128" to "256" from RTOS -> "Tasks and Queues" and my problem was solved.