2022-05-25 04:44 AM
This is a ticket I opened this year about what I believe is a fatal flaw in FreeRTOS, for which I never got a final answer or a proper solution. The team I've been working with managed to find a workaround that has so far proven very effective, but we need a permanent solution. If a watchdog is running and the programmers are unaware of this issue, the device might stall or brick until a full power cycle is done, or until it's reprogrammed if an upgrade (firmware over the air, for example) was in progress.
Here's the post; for the ST team, the case number is 00153229.
"Greetings,
Currently we are developing the firmware of a medical device using this microcontroller (STM32L4S9AI) and we are having the following issue: 50% of the time when we reset the core, the device doesn't boot. We found that it's getting stalled in the PendSV-related functions.
Since the hardware is from a client and there's a non disclosure agreement, we are limited to the information we can provide.
But here's the current information about the configuration that I can provide without prejudice:
This problem started once we started using the stop modes and other hardware features, but we realised that was just a coincidence. Further testing shows that if the boot is successful, the next time we reset the watch it will stall once "xPortPendSVHandler" is called. The first two firmwares run perfectly, but once the transition to the third firmware occurs, it stalls.
We managed to make a first workaround that actually works, but we are afraid it might be a temporary solution as the program grows.
One fix is, for the standby and reboot we de-initialize all hardware and make a reset (works 100% of the time); the other is to clear the RAM in the pre-bootloader, but it's not working properly all the time.
Thanks.
Best regards,
André Pereira"
Here's one of my answers with an important detail:
"Greetings Mr. ########,
In a power-up cycle everything works fine because all the memory is written from scratch, but on a reset, part of the RAM for some reason persists, causing a hard fault when PendSV is called, because a register is for some reason active/true/enabled during the boot. We also suspect there's a ghost process running that triggers the hard fault."
2022-05-26 02:03 AM
In the code marked in yellow, currently I can't precise where exactly happens, but I do know that the zero fill is interrupted and execution goes to the hard fault handler. This failure happens only in the application, before our code runs.
2022-05-26 02:22 AM
"I can't precise where exactly happens"
Surely, the Hard Fault handler tells you exactly where it was called from?
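To illustrate the point: on exception entry a Cortex-M core pushes eight registers (R0-R3, R12, LR, PC, xPSR) onto the active stack, so the stacked PC tells you which instruction faulted. Below is a minimal sketch, assuming the handler has already obtained the stack pointer that was in use when the fault hit; `fault_frame_t` and both helper names are made up for this example and are not part of FreeRTOS or the ST HAL.

```c
#include <stdint.h>

/* Registers pushed by the core on exception entry, lowest address first. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12;
    uint32_t lr;    /* LR of the interrupted code */
    uint32_t pc;    /* address of the instruction that was executing */
    uint32_t xpsr;  /* program status, including the active exception number */
} fault_frame_t;

/* Hypothetical helper: copy the eight stacked words into a struct for logging. */
fault_frame_t fault_frame_decode(const uint32_t *stacked_sp)
{
    fault_frame_t f;
    f.r0   = stacked_sp[0];
    f.r1   = stacked_sp[1];
    f.r2   = stacked_sp[2];
    f.r3   = stacked_sp[3];
    f.r12  = stacked_sp[4];
    f.lr   = stacked_sp[5];
    f.pc   = stacked_sp[6];
    f.xpsr = stacked_sp[7];
    return f;
}

/* Exception that was active when the fault hit (xPSR bits [8:0]):
 * 0 means thread mode, 14 means the PendSV handler was running. */
uint32_t fault_frame_active_exception(const fault_frame_t *f)
{
    return f->xpsr & 0x1FFu;
}
```

On the target, the HardFault handler would pick MSP or PSP (bit 2 of the EXC_RETURN value in LR says which one was in use) and pass it to `fault_frame_decode()`. The resulting `pc` can then be looked up in the map file or the disassembly.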
2022-05-26 04:33 AM
@Andrew Neil during the zero fill, a PendSV interrupt occurs and the hard fault happens. So this is why I'm saying, it's ST and FreeRTOS responsibility to solve this issue. Another detail is that this problem didn't appear immediately: no new peripherals were added, only code size and functionality were added.
2022-05-26 04:49 AM
"I'm saying, it's ST and FreeRTOS responsibility to solve this issue."
Not sure how this thread helps with that, as you say you've already got a case open with ST?
If you want help here, you're going to have to give a lot more details of your code, and your hardware, etc ...
2022-05-26 07:05 AM
Or even answer the questions already asked by people trying to help you. So one more time:
Does this happen in your pre-bootloader, bootloader, or application?
Do either your pre-bootloader or bootloader use FreeRTOS, or only your main application?
Make DARN SURE that your pre-boot and bootloaders are disabling any interrupts that they enabled before jumping to your application.
If the pendSV interrupt is firing, then my first presumption is that either (a) some code BEFORE that startup code you highlighted ran FreeRTOS and somehow it left some interrupt enabled that ran some (OLD) task that set the pendSV bit to call the RTOS, or (b) you have a pointer issue somewhere before that code (perhaps in the bootloader) that is setting the pendSV bit by accident (highly unlikely). Since this happens in the startup code, the interrupt vector table pointer has (probably) not been updated yet, so any interrupt or fault that happens will jump through an OLD vector table. For example, if this occurs in your application, the pendSV vector will be fetched from the vector table used by your pre-boot or bootloader. As will any OTHER interrupt that might still be enabled (systick, timer, UART, etc.).
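A rough sketch of the kind of cleanup meant here, assuming it runs with interrupts masked (for example after CMSIS `__disable_irq()`) either just before the jump or as the very first thing in the application. The function names are hypothetical, and the registers are passed as pointers only so the logic can be exercised off-target; on the device they would be `&SCB->ICSR`, `&SCB->VTOR` and `&SysTick->CTRL`.

```c
#include <stdint.h>

/* Bit positions in the Cortex-M SCB ICSR register. */
#define ICSR_PENDSVSET (1u << 28)  /* reads 1 while PendSV is pending */
#define ICSR_PENDSVCLR (1u << 27)  /* write 1 to clear a pending PendSV */
#define ICSR_PENDSTCLR (1u << 25)  /* write 1 to clear a pending SysTick */

/* Instrumentation aid: is a PendSV request pending in this ICSR value? */
int icsr_pendsv_is_pending(uint32_t icsr_value)
{
    return (icsr_value & ICSR_PENDSVSET) != 0u;
}

/* Hypothetical hand-off helper: stop SysTick, drop stale PendSV/SysTick
 * requests, and point the vector table at the image about to run. */
void handoff_prepare(volatile uint32_t *icsr,
                     volatile uint32_t *vtor,
                     volatile uint32_t *systick_ctrl,
                     uint32_t app_vector_table)
{
    *systick_ctrl = 0u;                       /* disable SysTick and its interrupt */
    *icsr = ICSR_PENDSVCLR | ICSR_PENDSTCLR;  /* zero bits in this write have no effect */
    *vtor = app_vector_table;                 /* later faults use the new table */
}
```

Logging `icsr_pendsv_is_pending(SCB->ICSR)` at the top of the startup code would also show whether PendSV really is pending at that point, rather than inferring it from the fault.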
2022-05-26 07:28 AM
That's what I've been doing so far. But to be clear, here are my answers again, so that everyone understands which question I'm answering.
"And your workaround is to clear all (?) RAM on startup? In the pre-bootloader? Bootloader? App?"
The workaround is in the pre-bootloader.
"Do either your pre-bootloader or bootloader use FreeRTOS, or only your main application?"
No, only in the App.
"Make DARN SURE that your pre-boot and bootloaders are disabling any interrupts that they enabled before jumping to your application."
First thing we tried, didn't work.
"If the pendSV interrupt is firing, then my first presumption is that either (a) some code BEFORE that startup code you highlighted ran FreeRTOS and somehow it left some interrupt enabled that ran some (OLD) task that set the pendSV bit to call the RTOS, or (b) you have a pointer issue somewhere before that code (perhaps in the bootloader) that is setting the pendSV bit by accident (highly unlikely). Since this happens in the startup code, the interrupt vector table pointer has (probably) not been updated yet, so any interrupt or fault that happens will jump through an OLD vector table. For example, if this occurs in your application, the pendSV vector will be fetched from the vector table used by your pre-boot or bootloader. As will any OTHER interrupt that might still be enabled (systick, timer, UART, etc.)."
De-initializing everything before firmware transitions and clearing pending IRQ requests didn't work either. One of my assumptions, well before we found the weird behavior shown in the yellow part of the code in the picture above, was a possible microcontroller protection triggering a hard fault because we were changing the uC frequency, but no, the problem wasn't there either.
2022-05-26 08:38 AM
I just re-read your original post. Sorry - I missed where you initially said this happened in the app, not the pre or bootloader.
"the next time we reset the watch"
Does this mean "next time we reset the watchdog" (i.e. next time the watchdog timer causes a reset)? Or does this mean your device is a "watch" style device, and the next time it resets?
"One fix is, for the standby and reboot we de-initialize all hardware and make a reset "
How do you "reboot" if not by causing a reset (like via the NVIC AIRCR register)? Which would already cause all (internal to the CPU) hardware to reset/de-initialize.
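For reference, a software reset through AIRCR comes down to one register write. The sketch below only computes the value to write, so it can be checked off-target; the function name is made up, and as far as I know this mirrors what CMSIS `NVIC_SystemReset()` does (with memory barriers around the write).

```c
#include <stdint.h>

#define AIRCR_VECTKEY       (0x05FAu << 16)  /* key required for the write to take effect */
#define AIRCR_PRIGROUP_MASK (7u << 8)        /* priority grouping, kept unchanged */
#define AIRCR_SYSRESETREQ   (1u << 2)        /* request a system reset */

/* Value to write to SCB->AIRCR to request a system reset, given the
 * value currently read from that register. */
uint32_t aircr_reset_request(uint32_t current_aircr)
{
    return AIRCR_VECTKEY
         | (current_aircr & AIRCR_PRIGROUP_MASK)
         | AIRCR_SYSRESETREQ;
}
```

A reset requested this way resets the core and the on-chip peripherals, but it does not by itself clear SRAM, so anything the startup code has not yet zeroed or initialised still holds whatever the previous run left there.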
2022-05-26 10:24 AM
"Does this mean "next time we reset the watchdog" (i.e. next time the watchdog timer causes a reset)? Or does this mean your device is a "watch" style device, and the next time it resets?"
It was not implemented at the time, so no issues here, but the device indeed is a watch with very unique features that I'm not allowed to talk about.
"How do you "reboot" if not by causing a reset (like via the NVIC AIRCR register)? Which would already cause all (internal to the CPU) hardware to reset/de-initialize."
Standard existing commands for software reset, and it worked 100% of the time; this was to reboot the device safely in normal conditions.
2022-05-26 05:18 PM
There is nothing here indicating a flaw in FreeRTOS.
As you possess all the source code, and can (presumably) reproduce the fault, you might instrument that code to isolate its cause.
Instrument means to change the code to collect and expose details to assist you to find what is happening and why.
How are you resetting the core?
Does the fault occur without the watchdog enabled? Instrumenting and debugging would be easier without it.
Disable/remove all features of your software that are not required to reproduce the fault. Just the exercise of ascertaining whether a feature is required or not would assist you to isolate the cause.
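One possible form of the instrumentation suggested above is a small "breadcrumb" record that survives a reset and shows how far the previous boot got. This is only a sketch under assumptions: the record would have to live in RAM that neither the startup code nor the pre-bootloader clears (for example a no-init linker section), and all names here are hypothetical.

```c
#include <stdint.h>

#define CRUMB_MAGIC 0xB007C0DEu  /* arbitrary marker for "record is valid" */

typedef struct {
    uint32_t magic;       /* CRUMB_MAGIC once the record has been initialised */
    uint32_t boot_count;  /* boots seen since the record was last initialised */
    uint32_t last_stage;  /* last checkpoint reached in the current run */
} crumb_t;

/* Call as early as possible after reset. Returns the stage the previous
 * run had reached, or 0 if the record was not valid (e.g. after power-up). */
uint32_t crumb_boot(volatile crumb_t *c)
{
    uint32_t previous = 0u;
    if (c->magic == CRUMB_MAGIC) {
        previous = c->last_stage;
        c->boot_count++;
    } else {
        c->magic = CRUMB_MAGIC;
        c->boot_count = 1u;
    }
    c->last_stage = 0u;
    return previous;
}

/* Record that a checkpoint was reached (stage numbers are up to you). */
void crumb_mark(volatile crumb_t *c, uint32_t stage)
{
    c->last_stage = stage;
}
```

Calling `crumb_mark()` at a few points (start of the application, before the scheduler starts, before a requested reset) and reading back the value returned by `crumb_boot()` after the next reset narrows down what the device was doing when it went down.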