STM32MP15 ECO 5.0.0 Kernel boot Unhandled fault: imprecise external abort (0x1c06) [cut here]

debugging · ‎2023-10-11

ECO sources used (not git). Custom board based on STM32MP157AAA3.

The issue occurs with OP-TEE running from DDR as well as from SRAM.

The kernel prints a "cut here" marker and then reports:

[ 0.273935] Unhandled fault: imprecise external abort (0x1c06) at 0x97638f7a

This looks like a memory access violation. The 0x9xxxxxxx address seems to be in I/O space, but information is hard to find. One slide from ST labels 0x90000000-0x9FFFFFFF as "STM". Why would the kernel attempt to access this space? Even more curious: why does this happen at, or just after, the UART driver registration?

Log attached.

[ 0.236358] Serial: AMBA PL011 UART driver
[ 0.243557] stm32-pm-domain pm_domain: domain core-ret-power-domain registered
[ 0.250937] stm32-pm-domain pm_domain: subdomain core-power-domain registered
[ 0.258122] stm32-pm-domain pm_domain: domains probed
[ 0.270843] 8<--- cut here ---
[ 0.273935] Unhandled fault: imprecise external abort (0x1c06) at 0x97638f7a
[ 0.281026] [97638f7a] *pgd=00000000
[ 0.284637] Internal error: : 1c06 [#1] PREEMPT SMP ARM

According to the article below, it could be related to running two cores. To rule this out, I modified extlinux.conf so that only one core runs at boot. The log shows that only one CPU is enabled, but the error still occurs.

https://developer.toradex.com/software/real-time/freertos/freertos-on-the-cortex-m4-of-a-colibri-imx7/

This link mentions the same error, but it is from quite some time ago: https://bootlin.com/blog/building-a-linux-system-for-the-stm32mp1-implementing-factory-flashing/

Currently mainline kernels don’t work when using TFA because some of the clocks are only accessible from the secure world. The ST kernel has patches for this but they haven’t made it into mainline yet.
This manifests itself as a hang after booting kernel but if you enable debug_ll and early printk you can see this:

[ 0.000000] 8<— cut here —
[ 0.000000] Unhandled fault: imprecise external abort (0x1c06) at 0xd3eab612

While googling, I found this boot log with the same error, posted on Sep 19th by user ysqtg83639, but I can't find the post it belongs to: https://community.st.com/ysqtg83639/attachments/ysqtg83639/stm32-mpu-products-forum/10225/1/bootlog.txt

The error dump also contains "Register r3 information: NULL pointer". Is this a bug in the ST code?

ARM's explanation is very "encouraging": because the abort is imprecise, the reported address may have no relation to the actual error. So it is a "something happened, but we do not know where" error.

damien1 · ‎2023-10-12

I have just run into the same problem seemingly out of the blue.

My custom board has been booting and working for weeks. I made a few small changes to the device tree enabling timers and pwms and all of a sudden ran into this error.

I've reverted my device tree changes but it has not fixed the issue.

The only other thing I changed was the kernel defconfig, to enable IIO software triggering for the ADC I'm using.

I archived a set of working binaries from last week and that image still boots. Will post again if I can spot the differences between the two.

debugging · ‎2023-10-13

Well, I just came back here to note the cause and close this topic. Friday the 13th seems not so bad after all.

It was an incorrect include file for SCMI. Even though it was the wrong include file, U-Boot and the kernel compiled without complaint. I can't tell you where yours is, because the include structure that CubeMX generates is light-years away from that of ST's boards such as the 157A-DK1 (where I found the hint). I copied the SCMI file from the DK1 board, but I am not sure that is sufficient: the kernel boot process seems to touch a lot of I/O peripherals in the SCMI file (Ethernet, CAN, etc.) that I have not enabled ("okay"). But at least the kernel booted a bit further. Advice from ST on how to configure SCMI for custom boards, for the MPU and the M4 as well, would really help.

Perhaps your problem is not adding the new features to SCMI (just a wild guess)
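For readers wondering what an "SCMI include" changes in practice, here is a minimal sketch. It is modeled on the mainline stm32mp157a-dk1-scmi.dts as best I recall it, not on the file that fixed the board above. The include name, the set of overridden nodes, and the clock IDs are assumptions to check against your own tree, and the ECO and CubeMX layouts may differ. The idea matches the Bootlin remark quoted earlier: clocks owned by the secure world must be requested through SCMI instead of being touched directly.

```dts
/*
 * Sketch only: names recalled from mainline, verify against your tree.
 * Assumes the board file already pulls in the SoC dtsi and the
 * dt-bindings clock header that defines the CK_SCMI_* IDs.
 */
#include "stm32mp15-scmi.dtsi"	/* assumed to define scmi_clk / scmi_reset */

/* RCC is secured by the firmware: use the secure-aware compatible and
 * take the root oscillators from the SCMI clock provider. */
&rcc {
	compatible = "st,stm32mp1-rcc-secure", "syscon";
	clock-names = "hse", "hsi", "csi", "lse", "lsi";
	clocks = <&scmi_clk CK_SCMI_HSE>,
		 <&scmi_clk CK_SCMI_HSI>,
		 <&scmi_clk CK_SCMI_CSI>,
		 <&scmi_clk CK_SCMI_LSE>,
		 <&scmi_clk CK_SCMI_LSI>;
};

/* Consumers of secure-owned clocks are re-pointed in the same way. */
&cpu0 {
	clocks = <&scmi_clk CK_SCMI_MPU>;
};

&rtc {
	clocks = <&scmi_clk CK_SCMI_RTCAPB>, <&scmi_clk CK_SCMI_RTC>;
};
```

If the kernel device tree is built without such overrides while the firmware has secured the RCC, a non-secure register access is a plausible source of the external abort, which would fit the symptom in this thread.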

Skip the rant if preferred...

<rant> IMHO, any issue that two or more people report on the community should qualify for a knowledge base entry on ST's site with possible causes (and solutions where possible). Now that there are two of us, this issue should be a candidate.

This took me at least 80 working hours. If that had been for a company, it would be bankrupt by now: after 1 1/2 months I am still not able to get to the console prompt. If I had placed an order for 25,000 MPUs, it probably would have taken less than a day... But here it is, free of charge for the community: we are the product (by posting value-adding information and solutions free of charge), not the customer...

PM me if you need help, but I am no expert; I am just wasting my time getting a board up and working, time I could have used for hundreds of other things on the laundry list.

</rant>

Now I am stuck at this:

[ 2.756089] Warning: unable to open an initial console.
[ 2.765557] Freeing unused kernel image (initmem) memory: 2048K
[ 2.809249] Run /init as init process
[ 13.394599] platform 50000000.rcc: deferred probe pending
[ 13.400134] platform 5000d000.interrupt-controller: deferred probe pending
[ 13.407063] platform 48002000.dma-router: deferred probe pending
[ 13.413187] platform soc:pinctrl@50002000: deferred probe pending
[ 13.419346] platform soc:pinctrl@54004000: deferred probe pending
[ 13.425474] platform 48000000.dma-controller: deferred probe pending
[ 13.431881] platform 58000000.dma-controller: deferred probe pending
[ 13.438271] platform 5a006000.usbphyc: deferred probe pending
[ 13.444066] platform 50001000.pwr: deferred probe pending
[ 13.449524] platform 5800c000.usb: deferred probe pending
[ 13.454956] platform 40010000.serial: deferred probe pending
[ 13.460698] platform 49000000.usb-otg: deferred probe pending
[ 13.466480] platform 5c004000.rtc: deferred probe pending
[ 13.471937] platform stm32-cpufreq: deferred probe pending
[ 13.477455] amba 58005000.mmc: deferred probe pending
[ 13.482549] amba 58007000.mmc: deferred probe pending
[ 13.487629] platform leds: deferred probe pending

damien1 · ‎2023-10-16

I got to the bottom of the cause of my kernel hang.

I had started adding the device tree nodes to enable the M4 coprocessor. This involved reassigning SPI3 from the A7 to the M4.

It turns out that, whilst I had set the SPI device status correctly:

&spi3 { status="disabled"; }; 
&m4_spi3 { status="okay"; };

I had not removed the conflicting GPIO assignments. When the pinctrl node for the SPI3 pins was parsed in Linux, it caused the violation and the system fault.

Properly deconflicting peripheral accesses and GPIO accesses has my system booting again.
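As an illustration of what "deconflicting" can look like, here is a sketch in the style of CubeMX-generated device trees. The pin choice (PC10 to PC12) and the `m4_spi3_pins_mx` label are hypothetical, and the fragment assumes the board file already includes the ST pinctrl dtsi and the `stm32-pinfunc.h` bindings header. Pins handed to the M4 are declared with the `RSVD` function, which, as far as I know, makes the STM32 pinctrl driver record the reservation without programming the mux.

```dts
/* Sketch only: pins and *_pins_mx labels are hypothetical examples. */

&pinctrl {
	/* Pins owned by the Cortex-M4: reserved, not muxed by Linux. */
	m4_spi3_pins_mx: m4_spi3_mx-0 {
		pins {
			pinmux = <STM32_PINMUX('C', 10, RSVD)>, /* SPI3_SCK  */
				 <STM32_PINMUX('C', 11, RSVD)>, /* SPI3_MISO */
				 <STM32_PINMUX('C', 12, RSVD)>; /* SPI3_MOSI */
		};
	};
};

/* A7 side: the node is disabled, and no other enabled node should still
 * reference these pins through pinctrl-N, *-gpios, or a GPIO hog. */
&spi3 {
	status = "disabled";
};

/* M4 side: the coprocessor resource node owns the peripheral and pins. */
&m4_spi3 {
	pinctrl-names = "default";
	pinctrl-0 = <&m4_spi3_pins_mx>;
	status = "okay";
};
```

A quick check after such a change is to search the final, preprocessed device tree for any remaining reference to the reassigned pins under an A7-side node with status "okay".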

debugging · ‎2023-10-16

Very interesting: the same error with a totally different cause! Perhaps a way to prevent this is to make manual changes only in the USER sections of a DT, let CubeMX make the A7/M4 assignment changes, and then apply those to the DTs?