
Do HAL callbacks save registers r0-r3? Is it needed?

AlbertoGarlassi
Associate III

Hello,

We have random HardFaults on an STM32H750, sometimes only after several hours of uptime.

They seem to be caused by an access to a wrong RAM location, whose address is fetched from the stack and stored in r2.

But the stack seems OK.

This happens in a function frequently interrupted by a higher priority HAL DMA callback.

Inspecting it, AFAIK registers r0-r3 are not preserved by GCC when optimizing at -O2.

It could be that sometimes this interrupt takes place between loading of r2 and its use, overwriting it with a wrong value.

Inside my callback function there are calls to other functions. Is this OK? I read somewhere that GCC saves only the registers it uses in the main function of the ISR and doesn't take care of register use in called functions. I don't know if that makes sense.

Adding __attribute__((interrupt)) does not seem to make any difference.

For now I added push and pop of the scratch registers in the callback and it seems to work, but I'm not completely sure because a slight timing difference could be enough to mask the problem.

It is also inconvenient, because the HAL library needs to be patched.

I am not convinced of anything I wrote before because it would break most code and it would have been spotted long ago.

Any comment?

Thanks and regards.

Alberto

18 REPLIES
AlbertoGarlassi
Associate III

I'm checking your clues, really appreciated.

The second screenshot is from Eclipse indeed. It should contain the information you are asking for. If you need it in text file format, let me know. Yes, I walked back to the hard fault context by selecting its line in the call stack view, center left. The important thing anyway is that r2 changes value for no apparent reason and triggers a hard fault while executing vldr s14, [r2].

The most promising hint is the one about the cache, although I would then expect a corrupted stack, and this is not the case.

Another thing I'm going to try is to disable interrupts during the CMSIS FFT that is giving trouble.
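
A minimal sketch of that experiment, assuming a small save/restore wrapper around the FFT call. All names here (irq_save, irq_restore, run_with_irqs_masked, sim_primask) are hypothetical. To keep the sketch runnable on a PC, the PRIMASK register is simulated by a variable; on the target the two helpers would presumably wrap the CMSIS intrinsics __get_PRIMASK(), __disable_irq() and __set_PRIMASK() instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side stand-in for the PRIMASK register (assumption for this sketch).
   0 = interrupts enabled, 1 = interrupts masked. */
uint32_t sim_primask = 0u;

/* Mask interrupts and return the previous state.
   On target: read __get_PRIMASK(), then call __disable_irq(). */
uint32_t irq_save(void)
{
    uint32_t previous = sim_primask;
    sim_primask = 1u;
    return previous;
}

/* Restore the state returned by irq_save().
   On target: __set_PRIMASK(previous). */
void irq_restore(uint32_t previous)
{
    sim_primask = previous;
}

typedef void (*work_fn)(void *ctx);

/* Run fn(ctx) with interrupts masked, then restore the previous state,
   so the wrapper also behaves if interrupts were already masked. */
void run_with_irqs_masked(work_fn fn, void *ctx)
{
    assert(fn != NULL);
    uint32_t saved = irq_save();
    fn(ctx);
    irq_restore(saved);
}

/* Example payload: records the mask state seen while running. */
void record_mask_state(void *ctx)
{
    *(uint32_t *)ctx = sim_primask;
}
```

On the real board the payload would be the FFT call. Masking interrupts for that long delays the DMA handler, so this is only useful as a diagnostic: if the faults disappear, an interrupt is involved somehow; if they persist, it is not.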

Regards

Alberto

FBL
ST Employee

Hello @AlbertoGarlassi

It is possible that this behavior occurs and triggers a Hard Fault:

R2 is initially loaded with the address of a memory location that contains the data needed for the FFT calculation. However, if an interrupt occurs at a critical time between the load of R2 and the execution of VLDR S14, [R2], the interrupt could change the value of R2 to point to an invalid or unexpected memory location.

It's important to ensure that the software interrupt and lower priority ISR are properly configured and managed to avoid issues with timing and resource conflicts.


> the interrupt could change the value of R2

As the hardware stores/restores R2 at interrupt entry/exit, the only way this could happen would be if the interrupt erroneously wrote to the stack. The likelihood of this is lower than that of a zillion other causes, the first of which is a straightforward user bug.
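
To make the stacking concrete, here is a minimal sketch of the basic frame that Cortex-M hardware pushes at exception entry (R0-R3, R12, LR, PC, xPSR, lowest address first), plus a hypothetical helper pair that snapshots the frame at handler entry and reports the first word that differs at handler exit. The names are made up for illustration, and finding the frame base inside a real handler is left out because it depends on what the compiler's prologue pushed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Word indices into the basic exception frame, counted from the frame base
   (the value of SP right after exception entry). */
enum {
    FRAME_R0 = 0,
    FRAME_R1,
    FRAME_R2,
    FRAME_R3,
    FRAME_R12,
    FRAME_LR,    /* LR of the interrupted code */
    FRAME_PC,    /* address the interrupted code resumes at */
    FRAME_XPSR,
    FRAME_BASIC_WORDS
};

/* Copy the eight hardware-stacked words into dst. */
void frame_snapshot(uint32_t *dst, const uint32_t *frame_base)
{
    assert(dst != NULL && frame_base != NULL);
    for (int i = 0; i < FRAME_BASIC_WORDS; i++) {
        dst[i] = frame_base[i];
    }
}

/* Return the index of the first stacked word that no longer matches the
   snapshot, or -1 if the frame is unchanged. */
int frame_first_changed(const uint32_t *snapshot, const uint32_t *frame_base)
{
    assert(snapshot != NULL && frame_base != NULL);
    for (int i = 0; i < FRAME_BASIC_WORDS; i++) {
        if (snapshot[i] != frame_base[i]) {
            return i;
        }
    }
    return -1;
}
```

If a handler really did scribble over the stacked R2, frame_first_changed() would return FRAME_R2 at exit. A result of -1 on every run would point away from the ISR and towards the interrupted code itself.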

While r2 might've been loaded at 0x0800ffd8, there's an unconditional jump at 0x0800ffee, so there's some code executed until a jump to 0x0800fff0, and we don't see that code. It may or may not modify r2, directly or indirectly.

JW

I placed two breakpoints at the lines below the unconditional jump and they are never hit, even when a hard fault occurs. I don't know how to interpret this; it looks like dead code, and I don't know how likely that is. It comes precompiled from ST's CMSIS.

Meanwhile I'm trying to inspect the stack at top priority ISR entry and exit.

/******************************************************************************/
/* STM32H7xx Peripheral Interrupt Handlers                                    */
/* Add here the Interrupt Handlers for the used peripherals.                  */
/* For the available peripheral interrupt handler names,                      */
/* please refer to the startup file (startup_stm32h7xx.s).                    */
/******************************************************************************/
 
/**
  * @brief This function handles DMA1 stream0 global interrupt.
  */
void DMA1_Stream0_IRQHandler(void)
{
  /* USER CODE BEGIN DMA1_Stream0_IRQn 0 */
 
  /* Global variables used for instrumentation, defined as:
     volatile uint32_t StackInR2, StackOutR2; */

  /* The magic number 6 (in words) accounts for the stack pushes that
     happen before this line is reached. */
  register char *stack_ptr asm("sp");
  StackInR2 = *((uint32_t *)stack_ptr + 6);
  /* USER CODE END DMA1_Stream0_IRQn 0 */
 
  HAL_DMA_IRQHandler(&hdma_adc1);
 
  /* USER CODE BEGIN DMA1_Stream0_IRQn 1 */
  StackOutR2 = *((uint32_t *)stack_ptr + 6);
  if (StackInR2 != StackOutR2)
  {
    /* Placeholder statement for setting a breakpoint. */
    StackInR2 = StackInR2;
  }
  /* USER CODE END DMA1_Stream0_IRQn 1 */
}

I'm clearly hallucinating, because on first execution the stack is correctly loaded with r0-r3, but later on these stack entries are never updated. For subsequent executions the debugger shows that sp stays the same while r0-r3 change. I would expect those locations to follow r0-r3, but the debugger confirms this is not happening.

Dcache has been disabled.

I'm checking tail chaining. I have never dealt with it, but from what I read this feature could skip the register save. I can't understand how this could apply to this case.

There are some messages from other poor fellows that have experienced hard faults with CMSIS fft.

My bad, I overlooked that the unconditional jump just jumps over those two "instructions" - and they are not instructions, they form one literal word (i.e. a constant read by the program somewhere, not executable instructions). That's why breakpoints won't work there.

I looked at the library and this is the beginning of a loop, so there's a jump back to this same point from further down in the code. And it gets incremented by r3 in the meantime. This still does not explain the r2 corruption.

Try to run until the hardfault, and show us the content of the registers and the stack as they are in the hardfault, without walking back in the debugger.
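
When reading the raw stack in the hardfault, the EXC_RETURN value held in LR on handler entry tells which stack holds the frame and how big it is. Here is a minimal sketch of decoding it, assuming an Armv7-M core with FPU such as the Cortex-M7. The type and function names are hypothetical. Since the faulting code uses FPU registers, an extended frame (26 words instead of 8) is quite possible here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool uses_psp;         /* frame is on the process stack (bit 2 set) */
    bool extended_frame;   /* FP context was stacked too (bit 4 clear) */
    unsigned frame_words;  /* frame size in 32-bit words, without padding */
} fault_frame_info_t;

/* Decode the EXC_RETURN value found in LR at exception handler entry. */
fault_frame_info_t decode_exc_return(uint32_t exc_return)
{
    /* EXC_RETURN values always have the top nibble set to 0xF. */
    assert((exc_return >> 28) == 0xFu);

    fault_frame_info_t info;
    info.uses_psp = (exc_return & (1u << 2)) != 0u;
    info.extended_frame = (exc_return & (1u << 4)) == 0u;
    /* Basic frame: R0-R3, R12, LR, PC, xPSR = 8 words.
       Extended frame adds S0-S15, FPSCR and one reserved word = 26 words. */
    info.frame_words = info.extended_frame ? 26u : 8u;
    return info;
}

/* Bit 9 of the stacked xPSR is set when the hardware inserted one extra
   word of padding to keep the stack 8-byte aligned. */
unsigned frame_total_words(fault_frame_info_t info, uint32_t stacked_xpsr)
{
    return info.frame_words + ((stacked_xpsr >> 9) & 1u);
}
```

In both frame types R0-R3 sit at the lowest addresses, so the stacked R2 is always the third word from the frame base and the stacked PC the seventh.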

JW

Tail chaining is where the core keeps a dirty register context across multiple back-to-back IRQ handlers, which have NO expectations about initial register values at entry.

When all pending interrupts are cleared, the register context is restored and the processor returns to where it left off.

AlbertoGarlassi
Associate III

After many tests and much reasoning, nothing is clear.

We ended up including the relevant CMSIS source in our project instead of linking the precompiled lib.

No hard faults anymore, but this could just be the result of moving things around, with the root cause still there.

Anyway that's the best we could do for now.

Every time we had a hard fault it was caused by line 182 of arm_cfft_radix8_f32.c

Hard to believe that a timing issue or stack corruption or whatever always kicks in exactly at that line.

I will report back if there is any news.

Thanks to everybody.

Alberto

Piranha
Chief II

Are you using an RTOS, particularly FreeRTOS? V10.5.0 recently fixed a pretty nasty bug.

What about a potential stack overflow?

Otherwise it seems that the issue could also be related to the ABI settings for the compiler, runtime, floating point or some other library.

AlbertoGarlassi
Associate III

No, we don't use an RTOS. Thanks for pointing out this bug, I will check if it could somehow be relevant.

It looks like it is not a stack overflow. The stack and the SP register seem to be OK after the hard fault. But there could be something that the debugger is hiding, like a cache coherence issue. DMA should not use the RAM potentially involved, but who knows.
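
One way to check the stack beyond what the debugger shows after the fault is stack painting: fill the unused stack with a known pattern at startup and later count how much of the pattern survives. Below is a minimal sketch, assuming a stack that grows downwards. The function names and the fill pattern are hypothetical, and on the target the region bounds would come from linker-script symbols whose names depend on the project.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL_WORD 0xA5A5A5A5u  /* arbitrary, recognisable pattern */

/* Fill `words` words starting at the lowest stack address with the pattern.
   On target this must only cover the part of the stack below the current SP. */
void stack_paint(uint32_t *lowest, size_t words)
{
    assert(lowest != NULL);
    for (size_t i = 0; i < words; i++) {
        lowest[i] = STACK_FILL_WORD;
    }
}

/* Count how many words, starting from the lowest address, still hold the
   pattern. With a descending stack this is the never-used headroom. */
size_t stack_unused_words(const uint32_t *lowest, size_t words)
{
    assert(lowest != NULL);
    size_t n = 0;
    while (n < words && lowest[n] == STACK_FILL_WORD) {
        n++;
    }
    return n;
}
```

A headroom that drops to zero or close to it after hours of uptime would suggest an overflow even if SP looks sane at the moment of the fault.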

I'm writing a separate project to exercise the FFT routine in CMSIS.

Regards