
STM32MP257 - Unable to handle kernel paging request at virtual address dead000000000108

altsir_sga
Associate III

Hi,

I have a few custom boards based on the STM32MP257, mostly running PTP-related tasks. Occasionally (about once every few days of continuous running) I see the following error in dmesg. It appears to be UART related: I have 4 UARTs enabled in the DT, 3 of them with DMA enabled (the console UART has no DMA), but only one UART other than the console is actually used by the system, and for RX only:

[251158.390476] Unable to handle kernel paging request at virtual address dead000000000108
[251158.392934] Mem abort info:
[251158.395854] ESR = 0x0000000096000044
[251158.399679] EC = 0x25: DABT (current EL), IL = 32 bits
[251158.405114] SET = 0, FnV = 0
[251158.408235] EA = 0, S1PTW = 0
[251158.411457] FSC = 0x04: level 0 translation fault
[251158.416489] Data abort info:
[251158.419408] ISV = 0, ISS = 0x00000044, ISS2 = 0x00000000
[251158.425041] CM = 0, WnR = 1, TnD = 0, TagAccess = 0
[251158.430173] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[251158.435607] [dead000000000108] address between user and kernel address ranges
[251158.442949] Internal error: Oops: 0000000096000044 [#1] PREEMPT SMP
[251158.449289] Modules linked in: marvell cfg80211 rfkill usb_f_ncm u_ether libcomposite spidev crct10dif_ce phy_stm32_usb2phy rtc_stm32 mr75203 hantro_vpu v4l2_jpeg v4l2_vp9 v4l2_h264 v4l2_mem2mem videobuf2_dma_contig videobuf2_memops videobuf2_v4l2 videodev at24 videobuf2_common mc stm32_cryp dwc3_stm32 stm32_lptimer crypto_engine spi_stm32 libdes stm32_crc32 stm32_timers mailbox_client_cdev stm32_m0_rproc stm32_rproc irq_rpmsg sch_fq_codel nfnetlink ip_tables ipv6
[251158.490832] CPU: 0 PID: 243312 Comm: kworker/0:2 Not tainted 6.6.78-gaf978724e078-dirty #1
[251158.499183] Hardware name: STMicroelectronics STM32MP257F CARD (DT)
[251158.505923] Workqueue: pm pm_runtime_work
[251158.510064] pstate: 200000c5 (nzCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[251158.517110] pc : vchan_tx_submit+0x58/0xd0
[251158.521343] lr : vchan_tx_submit+0x28/0xd0
[251158.525574] sp : ffff8000828cbba0
[251158.528997] x29: ffff8000828cbba0 x28: 0000000000000000 x27: 0000000000000000
[251158.536250] x26: ffff0000040fdcf4 x25: 0000000000000000 x24: 0000000000000000
[251158.543502] x23: 0000000000000000 x22: ffff000006d5c780 x21: ffff00000494a6a0
[251158.550754] x20: 000000000003d505 x19: ffff0000064823c0 x18: 0000000000000000
[251158.558006] x17: ffff7fffb5e89000 x16: ffff800080000000 x15: 001015373f0be4e6
[251158.565259] x14: 0000000000000001 x13: 0000000000000001 x12: 0000000000000001
[251158.572510] x11: 0000000000000000 x10: 0000000000000003 x9 : 0000000000000000
[251158.579762] x8 : ffff00000494a748 x7 : 0000000000084000 x6 : 0000000000000800
[251158.587014] x5 : ffff00000494a758 x4 : dead000000000100 x3 : dead000000000122
[251158.594266] x2 : ffff000006482438 x1 : 0000000000000000 x0 : ffff00000494a740
[251158.601518] Call trace:
[251158.604038] vchan_tx_submit+0x58/0xd0
[251158.607968] stm32_usart_rx_dma_start_or_resume+0xd8/0x234
[251158.613509] stm32_usart_runtime_resume+0x78/0xe8
[251158.618344] pm_generic_runtime_resume+0x2c/0x44
[251158.623081] __genpd_runtime_resume+0x30/0x80
[251158.627517] genpd_runtime_resume+0x114/0x29c
[251158.631954] __rpm_callback+0x48/0x1d8
[251158.635788] rpm_callback+0x6c/0x78
[251158.639420] rpm_resume+0x490/0x6b4
[251158.642952] pm_runtime_work+0x84/0xc8
[251158.646882] process_one_work+0x144/0x29c
[251158.650917] worker_thread+0x324/0x43c
[251158.654848] kthread+0x110/0x114
[251158.658077] ret_from_fork+0x10/0x20
[251158.661813] Code: 1a9fc694 b9001074 b8078454 a9478e64 (f9000483)
[251158.668051] ---[ end trace 0000000000000000 ]---
[251158.672781] note: kworker/0:2[243312] exited with irqs disabled
[251158.679385] note: kworker/0:2[243312] exited with preempt_count 1

After this event, the system begins acting strangely: the heartbeat LED blinks much faster than usual, although the top command does not show a high CPU load, and some services cannot be stopped. For example, I can run journalctl for a process, but I cannot stop it; the process just hangs and kill does not help.

Any ideas of what can cause it?

 

Thanks!

Alexey.

1 ACCEPTED SOLUTION

Hello @altsir_sga , @MGO75 ,
Could you please try applying the enclosed patch to your Linux kernel and see whether you still have trouble?

We observed a use-after-free in the suspend/resume sequence, caused by a missing locking mechanism. This patch should fix it.

Kind regards,
Erwan.

In order to give better visibility on the answered topics, please click on 'Accept as Solution' on the reply which solved your issue or answered your question.

9 REPLIES 9
altsir_sga
Associate III

After further investigation, the call trace suggests the issue is related to waking up from suspend. My UART receives NMEA data once per second, so I guess the UART goes to sleep between NMEA packets and wakes up again about once per second.

Maybe this helps.

Alexey.

abuzarra
Associate

Hi,

We are facing a similar issue with USART1 (UART) connected to a Bluetooth chip on an STM32MP255 device. In many cases, after a suspend/resume cycle the system hits a kernel crash/oops, and the UART stops working afterwards (the Bluetooth interface is no longer functional):

[  822.398942] Unable to handle kernel paging request at virtual address dead000000000108
[  822.401306] Mem abort info:
[  822.404026]   ESR = 0x0000000096000044
[  822.407750]   EC = 0x25: DABT (current EL), IL = 32 bits
[  822.413085]   SET = 0, FnV = 0
[  822.416107]   EA = 0, S1PTW = 0
[  822.419228]   FSC = 0x04: level 0 translation fault
[  822.424059] Data abort info:
[  822.426878]   ISV = 0, ISS = 0x00000044, ISS2 = 0x00000000
[  822.432311]   CM = 0, WnR = 1, TnD = 0, TagAccess = 0
[  822.437343]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  822.442676] [dead000000000108] address between user and kernel address ranges
[  822.449718] Internal error: Oops: 0000000096000044 [#1] PREEMPT SMP
[  822.455957] Modules linked in: hci_uart btbcm brcmfmac_cyw brcmfmac brcmutil galcore(O) stm32_dcmipp stm32_lptimer stm32_csi stm32_cryp crypto_engine stm32_crc32 spi_stm32 optee_rng nfnetlink
[  822.472908] CPU: 0 PID: 89 Comm: kworker/0:7 Tainted: G        W  O       6.6.78-dey-37241-g4adaf6e6e4e4 #1
[  822.490211] Workqueue: pm pm_runtime_work
[  822.494154] pstate: 200000c5 (nzCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  822.501099] pc : vchan_tx_submit+0x58/0x8c
[  822.505137] lr : vchan_tx_submit+0x28/0x8c
[  822.509272] sp : ffff800081e9bba0
[  822.512494] x29: ffff800081e9bba0 x28: 0000000000000000 x27: 0000000000000000
[  822.519646] x26: ffff000005d6ecf4 x25: 0000000000000000 x24: 0000000000000000
[  822.526698] x23: 0000000000000000 x22: ffff000006b4a120 x21: ffff0000064d26a0
[  822.533850] x20: 00000000000000db x19: ffff00000948d240 x18: 0000000000000000
[  822.540901] x17: 0000000000000000 x16: 0000000000000000 x15: 000022af3f21217e
[  822.548052] x14: ffff000005ee6810 x13: 0000000000000000 x12: 0000000000000001
[  822.555104] x11: 0000000000000000 x10: 0000000000000001 x9 : 0000000000000002
[  822.562255] x8 : ffff0000064d2748 x7 : 0000000000084000 x6 : 0000000000000800
[  822.569307] x5 : ffff0000064d2758 x4 : dead000000000100 x3 : dead000000000122
[  822.576459] x2 : ffff00000948d2b8 x1 : 0000000000000000 x0 : ffff0000064d2740
[  822.583512] Call trace:
[  822.585931]  vchan_tx_submit+0x58/0x8c
[  822.589664]  stm32_usart_rx_dma_start_or_resume+0xd8/0x234
[  822.595108]  stm32_usart_runtime_resume+0x78/0xe8
[  822.599746]  pm_generic_runtime_resume+0x2c/0x44
[  822.604381]  __genpd_runtime_resume+0x30/0x80
[  822.608715]  genpd_runtime_resume+0x110/0x244
[  822.613049]  __rpm_callback+0x48/0x1d8
[  822.616781]  rpm_callback+0x6c/0x78
[  822.620211]  rpm_resume+0x490/0x6b4
[  822.623641]  pm_runtime_work+0x84/0xc8
[  822.627372]  process_one_work+0x144/0x29c
[  822.631409]  worker_thread+0x31c/0x434
[  822.635142]  kthread+0x110/0x114
[  822.638272]  ret_from_fork+0x10/0x20
[  822.641809] Code: 1a9fc694 b9001074 b8078454 a9478e64 (f9000483) 
[  822.647947] ---[ end trace 0000000000000000 ]---

We observed that reverting ST’s commit https://github.com/STMicroelectronics/linux/commit/83771e186d2abeccc38b0a5e6841b068499622ca improves the behavior (the issue becomes less frequent / the system is more stable), but the problem is not fully resolved and we still hit the same failure after suspend/resume.

 

Any help would be appreciated,

Arturo.

 

altsir_sga
Associate III

Hi,

I ended up disabling suspend mode for all UARTs (my device is always powered, so I don't need it):

 

diff --git a/drivers/tty/serial/stm32-usart.c b/drivers/tty/serial/stm32-usart.c
index 5f51d9397..60782e2fe 100644
--- a/drivers/tty/serial/stm32-usart.c
+++ b/drivers/tty/serial/stm32-usart.c
@@ -2192,10 +2192,11 @@ static int stm32_usart_serial_probe(struct platform_device *pdev)
    if (ret)
        goto err_rtor;
 
+   pm_runtime_disable(&pdev->dev);
    pm_runtime_set_active(&pdev->dev);
-   pm_runtime_use_autosuspend(&pdev->dev);
-   pm_runtime_set_autosuspend_delay(&pdev->dev, STM32_USART_AUTOSUSPEND_DELAY_MS);
-   pm_runtime_enable(&pdev->dev);
+   //pm_runtime_use_autosuspend(&pdev->dev);
+   //pm_runtime_set_autosuspend_delay(&pdev->dev, STM32_USART_AUTOSUSPEND_DELAY_MS);
+   //pm_runtime_enable(&pdev->dev);
 
    clk_disable_unprepare(stm32port->clk);

 

Didn't have any issues since then.

 

Alexey.

MGO75
Associate

We have the same problem using UART5 on the STM32MP233 and STM32MP257. After a couple of minutes, sometimes hours, the stm32-usart driver crashes (kernel Oops). It only occurs when DMA is used on this UART; if we delete the DMA properties, it runs fine. Thanks to @altsir_sga for sharing the patch for the USART driver. Disabling suspend mode for all UARTs is the better choice, I think: we can use DMA again and it now works without problems.

Best regards, Maik

[ 6720.887991] Unable to handle kernel paging request at virtual address dead000000000108
[ 6720.890352] Mem abort info:
[ 6720.893072] ESR = 0x0000000096000044
[ 6720.896897] EC = 0x25: DABT (current EL), IL = 32 bits
[ 6720.902131] SET = 0, FnV = 0
[ 6720.905153] EA = 0, S1PTW = 0
[ 6720.908276] FSC = 0x04: level 0 translation fault
[ 6720.913207] Data abort info:
[ 6720.916026] ISV = 0, ISS = 0x00000044, ISS2 = 0x00000000
[ 6720.921560] CM = 0, WnR = 1, TnD = 0, TagAccess = 0
[ 6720.926592] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 6720.931826] [dead000000000108] address between user and kernel address ranges
[ 6720.938969] Internal error: Oops: 0000000096000044 [#1] PREEMPT SMP
[ 6720.945210] Modules linked in: 8021q garp mrp sch_prio sch_mqprio sch_mqprio_lib ksz8863_smi ksz_switch dsa_core bridge stp llc edgx_pfm_lkm(O) stm32_deip(O)
[ 6720.959345] CPU: 0 PID: 479 Comm: kworker/0:0 Tainted: G O 6.6.78-gf01241fbba4d-dirty #1
[ 6720.968802] Hardware name: STMicroelectronics custom STM32CubeMX board - openstlinux-6.6-yocto-scarthgap-mpu-v24.11.06 (DT)
[ 6720.979863] Workqueue: pm pm_runtime_work
[ 6720.983908] pstate: 200000c5 (nzCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 6720.990854] pc : vchan_tx_submit+0x58/0xd0
[ 6720.994887] lr : vchan_tx_submit+0x28/0xd0
[ 6720.999018] sp : ffff80008452bba0
[ 6721.002341] x29: ffff80008452bba0 x28: 0000000000000000 x27: 0000000000000000
[ 6721.009394] x26: ffff0000041afcf4 x25: 0000000000000000 x24: 0000000000000000
[ 6721.016547] x23: 0000000000000000 x22: ffff000008cbf960 x21: ffff000004972080
[ 6721.023700] x20: 0000000000000183 x19: ffff000004030840 x18: 0000000000000001
[ 6721.030853] x17: 0000000000000000 x16: 0000000000000000 x15: 000aa41b009aa29e
[ 6721.037905] x14: 0000000000000001 x13: 0000000000000001 x12: 0000000000000001
[ 6721.045058] x11: 0000000000000000 x10: 0000000000000003 x9 : 0000000000000000
[ 6721.052211] x8 : ffff000004972128 x7 : 0000000000084000 x6 : 0000000000000800
[ 6721.059263] x5 : ffff000004972138 x4 : dead000000000100 x3 : dead000000000122
[ 6721.066416] x2 : ffff0000040308b8 x1 : 0000000000000000 x0 : ffff000004972120
[ 6721.073570] Call trace:
[ 6721.075989] vchan_tx_submit+0x58/0xd0
[ 6721.079720] stm32_usart_rx_dma_start_or_resume+0xd8/0x234
[ 6721.085162] stm32_usart_runtime_resume+0x78/0xe8
[ 6721.089899] pm_generic_runtime_resume+0x2c/0x44
[ 6721.094535] __genpd_runtime_resume+0x30/0x80
[ 6721.098869] genpd_runtime_resume+0x114/0x29c
[ 6721.103204] __rpm_callback+0x48/0x1d8
[ 6721.106936] rpm_callback+0x6c/0x78
[ 6721.110366] rpm_resume+0x490/0x6b4
[ 6721.113897] pm_runtime_work+0x84/0xc8
[ 6721.117629] process_one_work+0x144/0x29c
[ 6721.121561] worker_thread+0x31c/0x434
[ 6721.125291] kthread+0x110/0x114
[ 6721.128522] ret_from_fork+0x10/0x20
[ 6721.132158] Code: 1a9fc694 b9001074 b8078454 a9478e64 (f9000483)
[ 6721.138196] ---[ end trace 0000000000000000 ]---
[ 6721.142826] note: kworker/0:0[479] exited with irqs disabled
[ 6721.148745] note: kworker/0:0[479] exited with preempt_count 1

Christophe Guibout
ST Employee

Hello @altsir_sga,

We found several bugs in the UART driver related to auto-suspend: the fixes will be available in the next release, Ecosystem-v6.2.1, planned for the end of June.

BR,

Christophe

 

Hello @altsir_sga , @MGO75 ,
Could you please try applying the enclosed patch to your Linux kernel and see whether you still have trouble?

We observed a use-after-free in the suspend/resume sequence, caused by a missing locking mechanism. This patch should fix it.

Kind regards,
Erwan.

Hi Erwan,

Will give it a try and get back to you.

 

Regards,

Alexey.

Hi Erwan,

I have tested this version for a few days now, and it seems to resolve the issue: the crash has not happened again.

 

Thanks!

Alexey.

Hello @altsir_sga ,
That was nice of you to let us know. I am glad to see this situation solved on your side.

Kind regards,
Erwan.
