
STM32MP257 - Unable to handle kernel paging request at virtual address dead000000000108

altsir_sga
Associate II

Hi,

I have a few custom boards with the STM32MP257, mostly running PTP-related tasks. Once in a while (about once every few days of continuous running) I see the following error in dmesg. It seems to be UART-related: I have 4 UARTs enabled in the DT, 3 of them with DMA enabled (the console UART does not have DMA enabled), but only one UART (other than the console) is actually used in the system, for RX only:

[251158.390476] Unable to handle kernel paging request at virtual address dead000000000108
[251158.392934] Mem abort info:
[251158.395854] ESR = 0x0000000096000044
[251158.399679] EC = 0x25: DABT (current EL), IL = 32 bits
[251158.405114] SET = 0, FnV = 0
[251158.408235] EA = 0, S1PTW = 0
[251158.411457] FSC = 0x04: level 0 translation fault
[251158.416489] Data abort info:
[251158.419408] ISV = 0, ISS = 0x00000044, ISS2 = 0x00000000
[251158.425041] CM = 0, WnR = 1, TnD = 0, TagAccess = 0
[251158.430173] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[251158.435607] [dead000000000108] address between user and kernel address ranges
[251158.442949] Internal error: Oops: 0000000096000044 [#1] PREEMPT SMP
[251158.449289] Modules linked in: marvell cfg80211 rfkill usb_f_ncm u_ether libcomposite spidev crct10dif_ce phy_stm32_usb2phy rtc_stm32 mr75203 hantro_vpu v4l2_jpeg v4l2_vp9 v4l2_h264 v4l2_mem2mem videobuf2_dma_contig videobuf2_memops videobuf2_v4l2 videodev at24 videobuf2_common mc stm32_cryp dwc3_stm32 stm32_lptimer crypto_engine spi_stm32 libdes stm32_crc32 stm32_timers mailbox_client_cdev stm32_m0_rproc stm32_rproc irq_rpmsg sch_fq_codel nfnetlink ip_tables ipv6
[251158.490832] CPU: 0 PID: 243312 Comm: kworker/0:2 Not tainted 6.6.78-gaf978724e078-dirty #1
[251158.499183] Hardware name: STMicroelectronics STM32MP257F CARD (DT)
[251158.505923] Workqueue: pm pm_runtime_work
[251158.510064] pstate: 200000c5 (nzCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[251158.517110] pc : vchan_tx_submit+0x58/0xd0
[251158.521343] lr : vchan_tx_submit+0x28/0xd0
[251158.525574] sp : ffff8000828cbba0
[251158.528997] x29: ffff8000828cbba0 x28: 0000000000000000 x27: 0000000000000000
[251158.536250] x26: ffff0000040fdcf4 x25: 0000000000000000 x24: 0000000000000000
[251158.543502] x23: 0000000000000000 x22: ffff000006d5c780 x21: ffff00000494a6a0
[251158.550754] x20: 000000000003d505 x19: ffff0000064823c0 x18: 0000000000000000
[251158.558006] x17: ffff7fffb5e89000 x16: ffff800080000000 x15: 001015373f0be4e6
[251158.565259] x14: 0000000000000001 x13: 0000000000000001 x12: 0000000000000001
[251158.572510] x11: 0000000000000000 x10: 0000000000000003 x9 : 0000000000000000
[251158.579762] x8 : ffff00000494a748 x7 : 0000000000084000 x6 : 0000000000000800
[251158.587014] x5 : ffff00000494a758 x4 : dead000000000100 x3 : dead000000000122
[251158.594266] x2 : ffff000006482438 x1 : 0000000000000000 x0 : ffff00000494a740
[251158.601518] Call trace:
[251158.604038] vchan_tx_submit+0x58/0xd0
[251158.607968] stm32_usart_rx_dma_start_or_resume+0xd8/0x234
[251158.613509] stm32_usart_runtime_resume+0x78/0xe8
[251158.618344] pm_generic_runtime_resume+0x2c/0x44
[251158.623081] __genpd_runtime_resume+0x30/0x80
[251158.627517] genpd_runtime_resume+0x114/0x29c
[251158.631954] __rpm_callback+0x48/0x1d8
[251158.635788] rpm_callback+0x6c/0x78
[251158.639420] rpm_resume+0x490/0x6b4
[251158.642952] pm_runtime_work+0x84/0xc8
[251158.646882] process_one_work+0x144/0x29c
[251158.650917] worker_thread+0x324/0x43c
[251158.654848] kthread+0x110/0x114
[251158.658077] ret_from_fork+0x10/0x20
[251158.661813] Code: 1a9fc694 b9001074 b8078454 a9478e64 (f9000483)
[251158.668051] ---[ end trace 0000000000000000 ]---
[251158.672781] note: kworker/0:2[243312] exited with irqs disabled
[251158.679385] note: kworker/0:2[243312] exited with preempt_count 1

After this event, the system begins acting strangely: the heartbeat LED blinks much faster than usual (although the top command does not show high CPU load), and some services cannot be stopped. For example, I can run journalctl for some process, but I cannot stop it; the process just hangs and kill does not help.

Any ideas of what can cause it?

 

Thanks!

Alexey.

3 REPLIES
altsir_sga
Associate II

After investigating further, the call trace suggests the issue is related to runtime resume (waking up from suspension). My UART receives NMEA data once per second, so I guess the UART goes to sleep between NMEA packets and is woken up again about once per second.

Maybe this will help.
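
A note for anyone decoding the trace: the faulting address dead000000000108 is x4 + 8, and x4 and x3 hold dead000000000100 and dead000000000122, which match the kernel's LIST_POISON1 and LIST_POISON2 values on arm64 (assuming the default CONFIG_ILLEGAL_POINTER_VALUE of 0xdead000000000000). The faulting instruction in the Code: line, f9000483, decodes to str x3, [x4, #8], i.e. the "next->prev = prev" step of unlinking a list node whose pointers were already poisoned by an earlier list_del(). Since vchan_tx_submit() moves the descriptor node onto the channel's submitted list, one possible reading is that the RX DMA descriptor resubmitted on runtime resume had already been taken off its list. This is an interpretation of the dump, not a confirmed root cause. A minimal userspace sketch of the check (the helper name is made up for illustration):

```c
#include <stdbool.h>
#include <stdint.h>

/* Values from include/linux/poison.h, ASSUMING the arm64 default
 * CONFIG_ILLEGAL_POINTER_VALUE of 0xdead000000000000. */
#define POISON_POINTER_DELTA 0xdead000000000000ULL
#define LIST_POISON1 (POISON_POINTER_DELTA + 0x100ULL) /* poisoned ->next */
#define LIST_POISON2 (POISON_POINTER_DELTA + 0x122ULL) /* poisoned ->prev */

/* struct list_head on a 64-bit kernel: next at offset 0, prev at offset 8 */
#define LIST_HEAD_PREV_OFFSET 8ULL

/*
 * Hypothetical helper: returns true when a faulting store looks like
 * "next->prev = prev" executed on a node that list_del() already poisoned,
 * i.e. next == LIST_POISON1 and the stored value == LIST_POISON2.
 */
static bool looks_like_unlink_of_deleted_node(uint64_t fault_addr,
                                              uint64_t stored_value)
{
    return fault_addr == LIST_POISON1 + LIST_HEAD_PREV_OFFSET &&
           stored_value == LIST_POISON2;
}
```

With the values from both oopses in this thread (fault address 0xdead000000000108, x3 = 0xdead000000000122) the helper returns true.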

Alexey.

abuzarra
Associate

Hi,

We are facing a similar issue with USART1 (UART) connected to a Bluetooth chip on an STM32MP255 device. In many cases, after a suspend/resume cycle the system hits a kernel crash/oops, and the UART stops working afterwards (the Bluetooth interface is no longer functional):

[  822.398942] Unable to handle kernel paging request at virtual address dead000000000108
[  822.401306] Mem abort info:
[  822.404026]   ESR = 0x0000000096000044
[  822.407750]   EC = 0x25: DABT (current EL), IL = 32 bits
[  822.413085]   SET = 0, FnV = 0
[  822.416107]   EA = 0, S1PTW = 0
[  822.419228]   FSC = 0x04: level 0 translation fault
[  822.424059] Data abort info:
[  822.426878]   ISV = 0, ISS = 0x00000044, ISS2 = 0x00000000
[  822.432311]   CM = 0, WnR = 1, TnD = 0, TagAccess = 0
[  822.437343]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  822.442676] [dead000000000108] address between user and kernel address ranges
[  822.449718] Internal error: Oops: 0000000096000044 [#1] PREEMPT SMP
[  822.455957] Modules linked in: hci_uart btbcm brcmfmac_cyw brcmfmac brcmutil galcore(O) stm32_dcmipp stm32_lptimer stm32_csi stm32_cryp crypto_engine stm32_crc32 spi_stm32 optee_rng nfnetlink
[  822.472908] CPU: 0 PID: 89 Comm: kworker/0:7 Tainted: G        W  O       6.6.78-dey-37241-g4adaf6e6e4e4 #1
[  822.490211] Workqueue: pm pm_runtime_work
[  822.494154] pstate: 200000c5 (nzCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  822.501099] pc : vchan_tx_submit+0x58/0x8c
[  822.505137] lr : vchan_tx_submit+0x28/0x8c
[  822.509272] sp : ffff800081e9bba0
[  822.512494] x29: ffff800081e9bba0 x28: 0000000000000000 x27: 0000000000000000
[  822.519646] x26: ffff000005d6ecf4 x25: 0000000000000000 x24: 0000000000000000
[  822.526698] x23: 0000000000000000 x22: ffff000006b4a120 x21: ffff0000064d26a0
[  822.533850] x20: 00000000000000db x19: ffff00000948d240 x18: 0000000000000000
[  822.540901] x17: 0000000000000000 x16: 0000000000000000 x15: 000022af3f21217e
[  822.548052] x14: ffff000005ee6810 x13: 0000000000000000 x12: 0000000000000001
[  822.555104] x11: 0000000000000000 x10: 0000000000000001 x9 : 0000000000000002
[  822.562255] x8 : ffff0000064d2748 x7 : 0000000000084000 x6 : 0000000000000800
[  822.569307] x5 : ffff0000064d2758 x4 : dead000000000100 x3 : dead000000000122
[  822.576459] x2 : ffff00000948d2b8 x1 : 0000000000000000 x0 : ffff0000064d2740
[  822.583512] Call trace:
[  822.585931]  vchan_tx_submit+0x58/0x8c
[  822.589664]  stm32_usart_rx_dma_start_or_resume+0xd8/0x234
[  822.595108]  stm32_usart_runtime_resume+0x78/0xe8
[  822.599746]  pm_generic_runtime_resume+0x2c/0x44
[  822.604381]  __genpd_runtime_resume+0x30/0x80
[  822.608715]  genpd_runtime_resume+0x110/0x244
[  822.613049]  __rpm_callback+0x48/0x1d8
[  822.616781]  rpm_callback+0x6c/0x78
[  822.620211]  rpm_resume+0x490/0x6b4
[  822.623641]  pm_runtime_work+0x84/0xc8
[  822.627372]  process_one_work+0x144/0x29c
[  822.631409]  worker_thread+0x31c/0x434
[  822.635142]  kthread+0x110/0x114
[  822.638272]  ret_from_fork+0x10/0x20
[  822.641809] Code: 1a9fc694 b9001074 b8078454 a9478e64 (f9000483) 
[  822.647947] ---[ end trace 0000000000000000 ]---

We observed that reverting ST’s commit https://github.com/STMicroelectronics/linux/commit/83771e186d2abeccc38b0a5e6841b068499622ca improves the behavior (the issue becomes less frequent / the system is more stable), but the problem is not fully resolved and we still hit the same failure after suspend/resume.

 

Any help would be appreciated,

Arturo.

 

altsir_sga
Associate II

Hi,

I ended up disabling runtime PM (autosuspend) for all UARTs. My device is always powered, so I don't need it:

 

diff --git a/drivers/tty/serial/stm32-usart.c b/drivers/tty/serial/stm32-usart.c
index 5f51d9397..60782e2fe 100644
--- a/drivers/tty/serial/stm32-usart.c
+++ b/drivers/tty/serial/stm32-usart.c
@@ -2192,10 +2192,11 @@ static int stm32_usart_serial_probe(struct platform_device *pdev)
    if (ret)
        goto err_rtor;
 
+   pm_runtime_disable(&pdev->dev);
    pm_runtime_set_active(&pdev->dev);
-   pm_runtime_use_autosuspend(&pdev->dev);
-   pm_runtime_set_autosuspend_delay(&pdev->dev, STM32_USART_AUTOSUSPEND_DELAY_MS);
-   pm_runtime_enable(&pdev->dev);
+   //pm_runtime_use_autosuspend(&pdev->dev);
+   //pm_runtime_set_autosuspend_delay(&pdev->dev, STM32_USART_AUTOSUSPEND_DELAY_MS);
+   //pm_runtime_enable(&pdev->dev);
 
    clk_disable_unprepare(stm32port->clk);

 

I haven't had any issues since then.
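
For anyone who cannot rebuild the kernel: the generic runtime PM sysfs interface offers a similar knob per device. Writing "on" to a device's power/control attribute tells the PM core to keep it runtime-active, and "auto" restores the default. The sketch below is a hedged userspace helper. The function name is hypothetical, the path to the USART's power/control attribute depends on the board and must be located on the target, and whether this avoids the oops above has not been tested in this thread.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write "on" or "auto" to a device's power/control
 * sysfs attribute. "on" asks the PM core to keep the device runtime-active;
 * "auto" allows runtime suspend again.
 * Returns 0 on success, -1 on an invalid mode or an I/O error.
 */
static int set_runtime_pm_control(const char *control_path, const char *mode)
{
    /* Only the two values accepted by power/control are allowed */
    if (strcmp(mode, "on") != 0 && strcmp(mode, "auto") != 0)
        return -1;

    FILE *f = fopen(control_path, "w");
    if (!f)
        return -1;

    int ok = (fputs(mode, f) >= 0);

    /* Buffered data is flushed on close, so write errors can surface here */
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

It would need to run as root at every boot, once per UART, with control_path pointing at that UART's power/control file.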

 

Alexey.