2023-03-28 02:57 AM
Hello all,
I am facing a strange issue where the kernel sometimes fails to reboot the system. Here is what I am seeing.
I use busybox reboot -f from the command line so that the reboot request goes directly to the kernel (no init/systemd involved), which simplifies things.
I am tracing the reboot syscall up to the psci_sys_reset function of the PSCI device driver (drivers/firmware/psci/psci.c). I am toggling a GPIO just before calling invoke_psci_fn:
static int psci_sys_reset(struct notifier_block *nb, unsigned long action,
			  void *data)
{
	pr_info("reset!\n");
	if ((reboot_mode == REBOOT_WARM || reboot_mode == REBOOT_SOFT) &&
	    psci_system_reset2_supported) {
		/*
		 * reset_type[31] = 0 (architectural)
		 * reset_type[30:0] = 0 (SYSTEM_WARM_RESET)
		 * cookie = 0 (ignored by the implementation)
		 */
		invoke_psci_fn(PSCI_FN_NATIVE(1_1, SYSTEM_RESET2), 0, 0, 0);
	} else {
		// TOGGLE GPIO HERE
		invoke_psci_fn(PSCI_0_2_FN_SYSTEM_RESET, 0, 0, 0);
	}
	return NOTIFY_DONE;
}
The GPIO does toggle, so I know for sure that this point is reached.
Then invoke_psci_fn is called; I assume this calls into SP_MIN (we are not using OP-TEE).
Sometimes this results in a proper reboot, as can be seen in the TF-A trace output:
INFO: Reset reason (0x54):
INFO: System reset generated by MPU (MPSYSRST)
Other times it does not work: the system hangs for some time (e.g. 20-25 seconds) and then reboots when the watchdog triggers. In these cases, the TF-A trace output shows:
INFO: Reset reason (0x214):
INFO: IWDG2 Reset (rst_iwdg2)
Is anyone else experiencing this? Can someone help?
Guillermo
2023-03-28 09:28 AM
Hello,
Do you get the same blocking behavior with the standard "reboot" command, i.e. "/bin/systemctl"?
It cleanly stops the services and unmounts the partitions before doing the hardware reboot...
2023-03-29 02:28 AM
Hello,
Thank you for your answer.
We are not using systemd; we are using busybox init instead.
However, the setup is similar: if we use reboot instead of reboot -f, the init process (PID 1) is signalled, which then stops services, kills any remaining processes, and unmounts all filesystems. The result is the same: all of these steps succeed, then init makes the reboot syscall, and the system hangs until the watchdog triggers the reboot.
We have verified that the kernel calls into SP_MIN, but SP_MIN apparently does not complete the reset. We are trying to see what is going on inside SP_MIN...
Guillermo
2023-03-29 05:37 AM
After further checks with Grodriguez (thanks to him), we have confirmed an issue in the console flush function of TF-A SP_MIN when TF-A is compiled in DEBUG mode:
=> psci_system_reset
==> console_flush (with console_state = CONSOLE_FLAG_RUNTIME)
===> console_stm32_core_flush
(in drivers/st/uart/aarch32/stm32_console.S, registered in console_stm32_register)
When the UART input clock is not enabled, this flush function gets stuck polling the UART status register, so the reset is never issued.
In the OpenSTLinux delivery, drivers/st/uart/aarch32/stm32_console.S partially handles this with a timeout, but on timeout it panics, so the system still ends up waiting for the watchdog.
This patch is not upstream yet:
@@ -209,7 +213,10 @@ func console_stm32_core_flush
ASM_ASSERT(ne)
#endif /* ENABLE_ASSERTIONS */
/* Check Transmit Data Register Empty */
+ mov r2, #USART_TIMEOUT
txe_loop_3:
+ subs r2, r2, #1
+ beq plat_panic_handler
ldr r1, [r0, #USART_ISR]
tst r1, #USART_ISR_TXE
beq txe_loop_3
We are thinking about a cleaner solution, for example ...
Regards
Patrick
2023-03-29 06:25 AM
Thank you @Community member. We can now work around this by patching console_stm32_core_flush:
This is tested and working.
Thanks again!
Guillermo