Kernel sometimes fails to reboot the system

Grodriguez
Senior

Hello all,

I am facing a strange issue where the kernel sometimes fails to reboot the system. Here is what I am seeing.

I use busybox reboot -f from the command line so that the reboot request goes directly to the kernel, i.e. no init/systemd involved. This simplifies things.

I am tracing the reboot syscall up to the psci_sys_reset function of the PSCI device driver (drivers/firmware/psci/psci.c). I am toggling a GPIO just before invoking invoke_psci_fn:

static int psci_sys_reset(struct notifier_block *nb, unsigned long action,
              void *data)
{
    pr_info("reset!\n");
    
    if ((reboot_mode == REBOOT_WARM || reboot_mode == REBOOT_SOFT) &&
        psci_system_reset2_supported) {
        /*
         * reset_type[31] = 0 (architectural)
         * reset_type[30:0] = 0 (SYSTEM_WARM_RESET)
         * cookie = 0 (ignored by the implementation)
         */
        invoke_psci_fn(PSCI_FN_NATIVE(1_1, SYSTEM_RESET2), 0, 0, 0);
    } else {
        // TOGGLE GPIO HERE
        invoke_psci_fn(PSCI_0_2_FN_SYSTEM_RESET, 0, 0, 0);
    }
 
    return NOTIFY_DONE;
}

The gpio is toggled so I know for sure that this point is being reached.

Then invoke_psci_fn is called; I assume this then calls into sp-min (not using Optee).

Sometimes this results in a proper reboot, as can be seen in the TF-A trace output:

INFO:  Reset reason (0x54):
INFO:   System reset generated by MPU (MPSYSRST)

But sometimes it doesn't seem to work; the system will then hang for some time, e.g. 20-25 seconds, and then reboot when the watchdog triggers. In these cases, the TF-A trace output shows:

INFO:  Reset reason (0x214):
INFO:   IWDG2 Reset (rst_iwdg2)

Is someone else experiencing this? Can someone help?

Guillermo

ACCEPTED SOLUTION

PatrickD
ST Employee

After further checks with Grodriguez (thanks to him):

An issue in the console flush function in TF-A SP_MIN is confirmed when TF-A is compiled in DEBUG mode:

=> psci_system_reset

==> console_flush (with console_state = CONSOLE_FLAG_RUNTIME)

===> console_stm32_core_flush

console_stm32_core_flush is defined in drivers/st/uart/aarch32/stm32_console.S and registered in console_stm32_register.

When the input clock of the UART is not activated, the flush function causes the problem.

In the OpenSTLinux delivery this is partially handled by a timeout, but the timeout causes a panic, so the system then waits for the watchdog.

This patch has not been upstreamed so far:

@@ -209,7 +213,10 @@ func console_stm32_core_flush
  ASM_ASSERT(ne)
 #endif /* ENABLE_ASSERTIONS */
  /* Check Transmit Data Register Empty */
+ mov r2, #USART_TIMEOUT
 txe_loop_3:
+ subs r2, r2, #1
+ beq plat_panic_handler
  ldr r1, [r0, #USART_ISR]
  tst r1, #USART_ISR_TXE
  beq txe_loop_3
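
For readers less used to ARM assembly, here is a rough C model of what the patched loop does. It is only a sketch: the register offset, the TXE bit position and the USART_TIMEOUT value are assumptions for illustration (the real definitions are in the TF-A STM32 UART headers), and patched_flush_model and the result enum are hypothetical names. The model returns a status where the real code branches to plat_panic_handler.

```c
#include <stdint.h>

/* Assumed values for illustration only, not copied from TF-A. */
#define USART_ISR_WORD  (0x1CU / 4U)  /* USART_ISR offset as a word index */
#define USART_ISR_TXE   (1U << 7)     /* transmit data register empty */
#define USART_TIMEOUT   0x1000U       /* iteration budget */

enum flush_result { FLUSH_DONE, FLUSH_PANIC };

/* Hypothetical C model of the patched txe_loop_3. */
static enum flush_result patched_flush_model(const volatile uint32_t *uart)
{
    uint32_t budget = USART_TIMEOUT;              /* mov  r2, #USART_TIMEOUT */

    for (;;) {
        if (--budget == 0U)                       /* subs r2, r2, #1 */
            return FLUSH_PANIC;                   /* beq  plat_panic_handler */
        if (uart[USART_ISR_WORD] & USART_ISR_TXE) /* ldr + tst */
            return FLUSH_DONE;                    /* TXE set: fall out of loop */
        /* TXE still clear: beq txe_loop_3 */
    }
}
```

If TXE never becomes set, the budget runs out and the panic path is taken. That would be consistent with the behavior reported in this thread, where the board then sits until IWDG2 resets it.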

We are thinking about a clean solution, for example:

  • check whether the UART is clocked / enabled in the flush function; if it is not, there is nothing to do
  • remove the panic on flush timeout (there is no reason to block TF-A on a flush error)

Regards

Patrick

4 REPLIES 4
Bernard PUEL
ST Employee

Hello,

Do you get the same blocking behavior with the standard command "reboot", i.e. "/bin/systemctl"?

It cleanly stops many services and unmounts the partitions before doing the HW reboot.

Grodriguez
Senior

Hello,

Thank you for your answer.

We are not using systemd -- we are using busybox init instead.

However, the setup is similar: if we use reboot instead of reboot -f, the init (pid 1) process is signalled, which then stops services, kills any remaining processes, and unmounts all filesystems. The result is the same: all the previous actions succeed, then init makes the reboot syscall, and the system hangs until the watchdog triggers.

We have verified that the kernel is calling into SPMIN, but SPMIN is apparently not completing the reset. We are trying to see what is going on inside SPMIN...

Guillermo

Grodriguez
Senior

Thank you @Community member​. We can now work around this by patching console_stm32_core_flush:

  • Check the UE bit in UART_CR1; if it is cleared, do nothing and return immediately
  • On timeout, return immediately instead of calling plat_panic_handler

This is tested and working.
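
In case it helps others, the logic of the two changes can be modelled in C roughly as follows. This is not the actual patch (the real function is assembly in stm32_console.S), only a sketch. The register offsets, bit positions and USART_TIMEOUT value are assumptions for illustration, and flush_workaround_model is a hypothetical name.

```c
#include <stdint.h>

/* Assumed values for illustration only, not copied from TF-A. */
#define USART_CR1_WORD  0U            /* control register 1 as a word index */
#define USART_ISR_WORD  (0x1CU / 4U)  /* USART_ISR offset as a word index */
#define USART_CR1_UE    (1U << 0)     /* UART enable */
#define USART_ISR_TXE   (1U << 7)     /* transmit data register empty */
#define USART_TIMEOUT   0x1000U       /* iteration budget */

/*
 * Hypothetical model of the workaround.
 * Returns 0 once TXE is seen, -1 if the flush was skipped or timed out.
 * It never panics, so the caller can carry on with the reset.
 */
static int flush_workaround_model(const volatile uint32_t *uart)
{
    /* Change 1: UART not enabled, so there is nothing to flush. */
    if ((uart[USART_CR1_WORD] & USART_CR1_UE) == 0U)
        return -1;

    /* Bounded wait for the transmit data register to empty. */
    for (uint32_t budget = USART_TIMEOUT; budget != 0U; budget--) {
        if (uart[USART_ISR_WORD] & USART_ISR_TXE)
            return 0;
    }

    /* Change 2: on timeout, give up quietly instead of panicking. */
    return -1;
}
```

Either early return lets psci_system_reset continue to the actual reset instead of ending up in the panic handler.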

Thanks again!

Guillermo