2022-05-25 04:44 AM
This was a ticket I opened this year regarding a fatal flaw in FREE RTOS, for which I didn't get a final answer or a proper solution. The team I've been working with managed to find a workaround that has so far proven very effective, but we need a permanent solution. If a watchdog is running and the programmers don't know about this issue, the device might stall or brick until a full power cycle is done, or until it's reprogrammed if an upgrade (firmware over the air, for example) was in progress.
Here's the post; for the ST team, the case number is 00153229.
"Greetings,
Currently we are developing the firmware of a medical device using this microcontroller (STM32L4S9AI) and we are having the following issue: 50% of the time we reset the core, the device doesn't boot. We found that it's getting stalled in the PendSV-related functions.
Since the hardware is from a client and there's a non disclosure agreement, we are limited to the information we can provide.
But here's the current information about the configuration that I can provide without prejudice:
This problem started once we started using the stop modes and other hardware features, but we realised that it was just a coincidence. Further testing shows that if the boot is successful, the next time we reset the watch it will stall once "xPortPendSVHandler" is called. The first two firmwares run perfectly, but once the transition to the third firmware occurs, it stalls.
We managed to make a first workaround that actually works, but we are afraid it might be a temporary solution as the program grows.
One fix is that, for standby and reboot, we de-initialize all hardware and perform a reset (works 100% of the time). The other is to clear the RAM in the pre-bootloader, but it's not working properly all the time.
Thanks.
Best regards,
André Pereira"
Here's one of my answers with an important detail:
"Greetings Mr. ########,
In a power-up cycle everything works fine because all the memory is being written from scratch, but when resetting, part of the ram memory for some reason persists, causing a hardfault when it calls pending_SV because it has a register that is active/true/enabled for some reason during the boot. We suspect there's also a ghost process running that triggers the hardfault."
2022-05-25 05:17 AM
Not sure I learned much from reading that.
A team of people spent six months unsuccessfully debugging their project..
2022-05-25 07:28 AM
Like @Community member, I don't see anything in that which would allow anyone here to say what's going on.
@André Pereira "a fatal flaw in FREE RTOS"
FreeRTOS is an independent 3rd-party product - nothing to do with ST.
So, if you've identified a flaw in FreeRTOS, you need to report that to them:
https://www.freertos.org/RTOS-contact-and-support.html
"when resetting, part of the ram memory for some reason persists"
If power is retained, then all of the RAM will persist - that is to be expected.
"suspect there's also a ghost process running"
What makes you think that? What debugging have you done to find it?
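On the point that RAM persists whenever power is retained: one way to tell the two cases apart at boot is to look at the reset-cause flags. The following is a minimal host-side sketch, not target code. The bit positions are assumptions based on the STM32L4 `RCC_CSR` layout and should be checked against the reference manual for the exact part; `classify_reset` and `ram_may_be_stale` are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* ASSUMPTION: bit positions taken from the STM32L4 RCC_CSR layout.
   Verify against the reference manual for your exact device. */
#define CSR_PINRSTF  (UINT32_C(1) << 26)  /* NRST pin reset flag        */
#define CSR_BORRSTF  (UINT32_C(1) << 27)  /* brown-out / power-on reset */
#define CSR_SFTRSTF  (UINT32_C(1) << 28)  /* software reset             */
#define CSR_IWDGRSTF (UINT32_C(1) << 29)  /* independent watchdog reset */
#define CSR_WWDGRSTF (UINT32_C(1) << 30)  /* window watchdog reset      */

typedef enum {
    RESET_UNKNOWN,
    RESET_POWER,
    RESET_SOFTWARE,
    RESET_WATCHDOG,
    RESET_PIN
} reset_cause_t;

/* Classify a raw flag value. The pin flag is checked last because it can be
   set alongside other causes, so the more specific flags take priority. */
reset_cause_t classify_reset(uint32_t csr)
{
    if (csr & CSR_BORRSTF) {
        return RESET_POWER;
    }
    if (csr & CSR_SFTRSTF) {
        return RESET_SOFTWARE;
    }
    if (csr & (CSR_IWDGRSTF | CSR_WWDGRSTF)) {
        return RESET_WATCHDOG;
    }
    if (csr & CSR_PINRSTF) {
        return RESET_PIN;
    }
    return RESET_UNKNOWN;
}

/* Anything other than a power-related reset leaves SRAM powered, so old
   contents may still be there. Unknown causes are treated conservatively. */
int ram_may_be_stale(reset_cause_t cause)
{
    return cause != RESET_POWER;
}
```

On the target, the argument would be the value read from `RCC->CSR` before the flags are cleared. Logging the result on every boot would show whether the stalls line up with resets that leave the RAM powered.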
2022-05-25 07:31 AM
The part where this was failing isn't caught simply by debugging and happens during the third part, and we didn't dedicate 6 months to trying to solve this. We solved it, but as mentioned, it's a workaround; for all users, a permanent answer is required, not a workaround developed by us, the client.
2022-05-25 07:47 AM
Ok, but unless I missed some major plot points here, you never determined what the actual cause was, and it was attributed to some "ghost" process? Not sure how that's going to pass muster in a certification report.
And this for a system where you have access to all the source code?
Unless you can actually attribute it to a hardware failure/shortcoming in the IC, I'm not sure why ST would get deeply involved in debugging.
What compiler/tool chain are you using? GNU/GCC based?
Any other stages of the loader using the RTOS, or HAL/MX initialization?
2022-05-25 07:50 AM
After careful analysis, it's safe to say, @Andrew Neil, that it's an issue for both to solve, because it happens in a part of the code that is made by ST and only where the firmware with the RTOS runs. The other two don't have this issue and transitioned correctly even before our workaround was implemented.
2022-05-25 08:01 AM
But, like @Community member, I don't see that you've identified what the actual problem is.
2022-05-25 08:03 AM
Yes, we did. Below there's a picture of the "location" where the problem happens.
We use only tools from ST. The part marked in yellow never actually finishes. After a power cycle there's no problem; however, if you press reset, it will stall. So currently we force a RAM wipe in the startup of the first firmware during the transition to the next.
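To make the workaround concrete, the RAM wipe amounts to something like the following. This is a minimal host-side sketch under stated assumptions, not our actual code: `wipe_words` is a hypothetical name, and the region boundaries are passed in by the caller rather than taken from a linker script.

```c
#include <assert.h>
#include <stdint.h>

/* Zero the half-open range [start, end), one 32-bit word at a time.
   ASSUMPTION: both pointers are word-aligned and end >= start.
   On a real target the range must not include the stack that is in use
   while this runs, which is why this kind of wipe is usually done very
   early in the reset handler. */
void wipe_words(volatile uint32_t *start, volatile uint32_t *end)
{
    assert(start <= end);
    while (start < end) {
        *start++ = 0u;
    }
}
```

On the device, `start` and `end` would come from the memory map of the part (or from linker symbols), which is an assumption of this sketch and not something shown here.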
2022-05-25 08:10 AM
But we did; check my answer to Tesla Delorean.
2022-05-25 08:33 AM
Things are still not clear (at least to me).
That code is the standard C/C++ startup code: copying initialized variables from FLASH to RAM, then zeroing the uninitialized variables (BSS segment). Exactly how and where is this hanging? Is it stuck in an infinite loop (somehow?)? And is the code that hangs in your pre-bootloader, bootloader or application?
And your workaround is to clear all (?) RAM on startup? In the pre-bootloader? Bootloader? App?
And just for kicks, what is connected to the CPU's NRST line on the PCB? Is there anything that would prevent the CPU from driving that line low?
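For reference, the copy-then-zero step described above boils down to two simple loops. Here is a minimal host-side sketch in C, assuming word-aligned regions; the function name `init_data_and_bss` and its parameters are hypothetical stand-ins for the linker-provided symbols that the real startup file uses.

```c
#include <assert.h>
#include <stdint.h>

/* Copy the initial values of .data from their load address (FLASH) to their
   run address (RAM), then zero .bss. All ranges are half-open [start, end).
   ASSUMPTION: on a real target these pointers come from linker symbols;
   here they are plain parameters so the logic can be run on a host. */
void init_data_and_bss(const uint32_t *load_src,
                       uint32_t *data_start, uint32_t *data_end,
                       uint32_t *bss_start, uint32_t *bss_end)
{
    assert(data_start <= data_end);
    assert(bss_start <= bss_end);

    /* Copy initialized variables. */
    for (uint32_t *dst = data_start; dst < data_end; ++dst) {
        *dst = *load_src++;
    }

    /* Zero uninitialized variables. */
    for (uint32_t *dst = bss_start; dst < bss_end; ++dst) {
        *dst = 0u;
    }
}
```

Both loops are bounded by their end pointers, so one thing worth checking on the target is whether the boundary symbols actually hold the values the linker map says they should when the hang occurs.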