Do HAL callbacks save registers r0-r3? Is it needed?

AlbertoGarlassi · ‎2023-03-03

Hello,

We are seeing random HardFaults on an STM32H750, sometimes only after several hours of uptime.

They seem to be caused by an access to a wrong RAM location, whose address is fetched from the stack and stored in r2.

But the stack seems OK.

This happens in a function frequently interrupted by a higher priority HAL DMA callback.

Inspecting the disassembly, AFAIK registers r0-r3 are not preserved by GCC when optimizing at -O2.

It could be that sometimes this interrupt takes place between loading of r2 and its use, overwriting it with a wrong value.

Inside my callback function there are calls to other functions. Is this OK? I read somewhere that GCC saves only the registers it uses in the main function of the ISR and doesn't take care of register use in called functions. I don't know if that makes sense.

Adding __attribute__((interrupt)) does not seem to make any difference.

For now I added push and pop of the scratch registers in the callback and it seems to work, but I'm not completely sure because a slight timing difference could be enough to mask the problem.

It is also inconvenient, because the HAL library needs to be patched.

I am not convinced of anything I wrote before because it would break most code and it would have been spotted long ago.

Any comment?

Thanks and regards.

Alberto

KnarfB · ‎2023-03-03

The callback will be at the 3rd or so stack nesting level below the native interrupt handler, so it would be way too late to do any interrupt-related register fixing there. Using floating point requires special attention. Could there be some 64-bit or larger non-atomic assignments that lead to temporarily inconsistent values?
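To illustrate the non-atomic assignment concern: on a 32-bit core a 64-bit access is not guaranteed to be atomic, so an interrupt landing between the two word accesses can leave the reader holding halves of two different values. A minimal sketch of one common guard, re-reading until two consecutive reads agree, is shown here. The names `shared_sample_count`, `isr_publish` and `read_stable_u64` are hypothetical and not taken from the code under discussion, and the sketch assumes the writer runs at a higher priority than the reader, so the writer itself is never interrupted by the reader.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 64-bit value written by an ISR and read by lower-priority code. */
volatile uint64_t shared_sample_count;

/* Writer side, meant to run in the higher-priority ISR. */
void isr_publish(uint64_t value)
{
    shared_sample_count = value;
}

/* Reader side: read twice and retry until both reads agree.
 * If the ISR fires in the middle of one read, the two reads differ
 * and the loop simply tries again. */
uint64_t read_stable_u64(const volatile uint64_t *src)
{
    assert(src != NULL);
    uint64_t first, second;
    do {
        first = *src;
        second = *src;
    } while (first != second);
    return first;
}
```

The lower-priority code would call `read_stable_u64(&shared_sample_count)` instead of reading the variable directly. Briefly masking the interrupt around the read is an alternative when the added latency is acceptable.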

In general, no special coding rules are needed for interrupt handlers, see https://interrupt.memfault.com/blog/arm-cortex-m-exceptions-and-nvic for a nice intro.

[Edit:] gcc has some (intrusive) stack-protection features, see https://gcc.gnu.org/onlinedocs/gcc-10.4.0/gcc/Instrumentation-Options.html

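Here is some simple code to pre-fill the stack area with a magic value for DIY dynamic stack size analysis:

{
  /* _estack comes from the linker script and marks the top of the stack. */
  extern uint32_t _estack;
  /* GCC extension: bind a variable to the stack pointer register. */
  register char *stack_ptr asm("sp");
  /* _sbrk(0) returns the current end of the heap, the lowest address the stack may reach. */
  char *heap_top = (char *)_sbrk(0);
  printf("stack: min: %p curr: %p top: %p\n\r", (void *)heap_top, (void *)stack_ptr, (void *)&_estack);

  /* Fill the currently unused gap between heap and stack pointer with 0xa5. */
  for (char *p = heap_top + 4; p < stack_ptr; p++) {
    *p = (char)0xa5;
  }
}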
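A possible companion to the fill loop, sketched under the assumption that the same 0xa5 fill value is used: after the firmware has run for a while, scan the region from its low end and count how many bytes still hold the fill value. The function name `stack_unused_bytes` is made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL_BYTE 0xa5u

/* Count how many bytes at the low end of [low, high) still hold the fill
 * value. The stack grows downward, so untouched bytes sit at the low
 * addresses and the first overwritten byte marks the high-water mark. */
size_t stack_unused_bytes(const uint8_t *low, const uint8_t *high)
{
    assert(low != NULL && high != NULL && low <= high);
    const uint8_t *p = low;
    while (p < high && *p == STACK_FILL_BYTE) {
        p++;
    }
    return (size_t)(p - low);
}
```

On the target, `low` would presumably be `heap_top + 4` and `high` the address of `_estack`. A result close to zero suggests the stack has reached the heap at some point. Stack data that happens to equal 0xa5 can make the unused count look slightly larger than it is.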

hth

KnarfB

gbm · ‎2023-03-03

There is some error in your code, probably resulting from an incorrect pointer value or a mismatch between declared and actual function argument types (the prototype in the .h file not matching the definition in the .c file). Check the warnings: the compiler should report none.


waclawek.jan · ‎2023-03-03

> It seems caused by an access to a wrong RAM location, whose address is fetched from the stack and stored in r2.

> But the stack seems OK.

Show.

> Inside my callback function there are calls to other functions.

printf()?

JW

Tesla DeLorean · ‎2023-03-03

There is no need for an interrupt attribute because the MCU core itself pushes R0 through R3, R12, LR, PC and xPSR as the NVIC pulls the handler entry from the vector table.

I'd say look elsewhere for stack corruption, either out-of-bounds accesses or excessive local/auto variable depth.
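To make the hardware stacking concrete, here is a sketch of the frame the Cortex-M core writes on exception entry, plus a helper that copies it out for inspection. The type and function names are invented for illustration. On the target, `stacked_sp` would be the MSP or PSP value at handler entry, usually obtained with a small assembly stub in the HardFault handler, which is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Basic frame pushed by the core on exception entry, lowest address first. */
typedef struct {
    uint32_t r0, r1, r2, r3;
    uint32_t r12;
    uint32_t lr;    /* LR of the interrupted code */
    uint32_t pc;    /* return address into the interrupted or faulting code */
    uint32_t xpsr;
} exception_frame_t;

/* EXC_RETURN bit 4 set: basic 8-word frame. Bit 4 clear: extended frame
 * that also reserves space for S0-S15, FPSCR and one reserved word. */
size_t exception_frame_words(uint32_t exc_return)
{
    return (exc_return & (1u << 4)) ? 8u : 26u;
}

/* Copy the stacked core registers so a debugger or log can show them. */
exception_frame_t capture_frame(const uint32_t *stacked_sp)
{
    assert(stacked_sp != NULL);
    exception_frame_t f = {
        .r0 = stacked_sp[0], .r1 = stacked_sp[1],
        .r2 = stacked_sp[2], .r3 = stacked_sp[3],
        .r12 = stacked_sp[4], .lr = stacked_sp[5],
        .pc = stacked_sp[6], .xpsr = stacked_sp[7],
    };
    return f;
}
```

Comparing the captured `pc` and `r2` with the disassembly around the faulting line shows what the interrupted code actually held, which is the information requested elsewhere in this thread.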


S.Ma · ‎2023-03-04

First sanity check would be stack size.

Second would be functions using global/static variables.

Third would be variables that need to be volatile yet aren't declared as such.

AlbertoGarlassi · ‎2023-03-04

Thanks for your attention.

Next week I will follow up on your kind advice.

For now please have a look at these screenshots. The first was taken after placing a breakpoint on the one and only line that sometimes triggers the HardFault.

The second was taken after a HardFault. As you can see, sp is the same and the stack looks OK; r2 is loaded at 0x0800ffd8 but something happens before reaching 0x800ff4.

Since r2 is, as brilliantly pointed out, automatically saved in ISRs, the only explanation I can imagine is that something jumps from elsewhere and lands exactly there, so the ldr r2 line is never executed. That is hard to believe because, apart from this HardFault, the board performs normally, and after patching with push/pop or shuffling the code it runs forever.

Some details I omitted:

The lower-priority function where the HardFault happens is an ISR too, triggered by a software interrupt from the higher-priority ISR.

We have a fast, high-priority, DMA-triggered ISR that acquires a buffer filled by DMA with data from the ADC. This buffer is decimated and the result is written as a single entry in another array. The ISR then returns.

When the output array is complete after 512 fast ISRs, the lower-priority function is scheduled to run at a later time by an EXTI->SWIER1 software interrupt request.

In other words, we have a long, low-priority ISR that is interrupted several times by the fast IRQ.
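The structure described above can be sketched roughly as follows. This is not the actual project code: decimation by plain averaging, the ping-pong (double-buffered) output banks, and every identifier are assumptions made for illustration. The software trigger is replaced by a counter so the sketch runs on a host; on the target that function would perform the EXTI->SWIER1 write.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OUT_LEN 512u  /* fast ISRs per completed output array */

float out_bank[2][OUT_LEN];       /* ping-pong output banks (assumption) */
volatile unsigned ready_bank;     /* bank the low-priority ISR may read */
volatile unsigned trigger_count;  /* host-side stand-in for the pending IRQ */
static unsigned write_bank;       /* bank the fast ISR is currently filling */
static unsigned out_index;        /* next free entry in the write bank */

/* Stand-in for the software trigger (EXTI->SWIER1 write on the target). */
static void trigger_low_priority_irq(void)
{
    trigger_count++;
}

/* Fast ISR body: reduce one DMA block to a single output entry. */
void fast_isr_on_dma_block(const uint16_t *adc, size_t n)
{
    assert(adc != NULL && n > 0);
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += adc[i];
    }
    out_bank[write_bank][out_index++] = (float)sum / (float)n;

    if (out_index == OUT_LEN) {
        ready_bank = write_bank;   /* publish the full bank */
        write_bank ^= 1u;          /* keep writing into the other one */
        out_index = 0;
        trigger_low_priority_irq();
    }
}
```

With two banks, the long low-priority ISR reads `out_bank[ready_bank]` while the fast ISR fills the other bank, so the data under the reader cannot change as long as it finishes within 512 fast ISRs. Whether the real code needs this depends on how its single output array is consumed.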

Things are somewhat more complicated because the code where the HardFault occurs comes precompiled from the CMSIS library. I grabbed the source code and built it from a C file instead, and it didn't trigger the HardFault, but who knows.

Yes, floating point is heavily used in both ISRs.

I-Cache and D-Cache are enabled. Hopefully they are correctly invalidated when needed. I don't know if it matters.

There are other ISRs and DMA transfers taking place. USB CDC is used.

Thanks for now.

waclawek.jan · ‎2023-03-04

I don't understand the second screenshot: is this something resulting from "walkback"? I don't use Eclipse. Show us content of registers, stack, disasm at the hardfault.

JW

S.Ma · ‎2023-03-05

Try disabling the cache(s) as a differential diagnostic.

Piranha · ‎2023-03-05

> I-Cache and D-Cache are enabled. Hopefully are correctly invalidated when needed. Don't know if it matters.

That means it almost definitely is not maintained correctly.

https://community.st.com/s/question/0D53W00001Z9K9TSAV/maintaining-cpu-data-cache-coherence-for-dma-buffers
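As a rough illustration of what maintaining the D-Cache for DMA buffers involves: the Cortex-M7 data cache works on 32-byte lines, so a DMA buffer is only safe to invalidate when it starts on a line boundary and covers whole lines. Otherwise the invalidate can discard unrelated variables sharing a line. The helpers below are a host-runnable sketch with invented names. On the target, the computed range would be passed to the CMSIS function `SCB_InvalidateDCache_by_Addr` after an incoming DMA transfer completes and before the CPU reads the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DCACHE_LINE 32u  /* Cortex-M7 data cache line size in bytes */

/* Example DMA buffer: line-aligned, size a multiple of the line size. */
uint16_t adc_dma_buf[64] __attribute__((aligned(32)));

typedef struct {
    uintptr_t start;  /* rounded down to a cache line boundary */
    size_t size;      /* rounded up to whole cache lines */
} cache_range_t;

/* Compute the whole-line range that covers [buf, buf + len). */
cache_range_t cache_range_for(const void *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    uintptr_t mask = ~(uintptr_t)(DCACHE_LINE - 1u);
    uintptr_t begin = (uintptr_t)buf & mask;
    uintptr_t end = ((uintptr_t)buf + len + DCACHE_LINE - 1u) & mask;
    cache_range_t r = { begin, (size_t)(end - begin) };
    return r;
}

/* Returns 1 when the buffer shares no cache line with other data. */
int buffer_is_line_exclusive(const void *buf, size_t len)
{
    return ((uintptr_t)buf % DCACHE_LINE == 0u) && (len % DCACHE_LINE == 0u);
}
```

For buffers the DMA reads from memory, the matching step is a cache clean before starting the transfer. Placing DMA buffers in a region configured as non-cacheable through the MPU avoids both steps, at the cost of slower CPU access to that region.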