
STM32MP157C sporadic page faults / RAM issues

NicoH

Hi,

We use the STM32MP157C with almost the same RAM layout as the STM32MP157F-EV1, except that the SDRAM chips are from Micron (MT41K256M16). The PMIC is also the same as the one on the evaluation board, so its settings are identical. TF-A, OP-TEE, U-Boot and the kernel are all in use.

On almost all boards, we see sporadic page faults ("BUG: Bad page state") from the kernel, roughly once in 400 warm starts, at addresses distributed across the RAM. Each entry below comes from a different start:

[    0.000000] BUG: Bad page state in process swapper  pfn:d0cfc
[    0.000000] page:(ptrval) refcount:1 mapcount:0 mapping:00000000 index:0x0 pfn:0xd0cfc
--
[    8.840693] BUG: Bad page state in process tar  pfn:f77ae
[    8.840730] page:769addeb refcount:1 mapcount:0 mapping:00000000 index:0x4 pfn:0xf77ae
--
[    9.664655] BUG: Bad page state in process rc  pfn:f70c8
[    9.664691] page:c59745a8 refcount:1 mapcount:0 mapping:00000000 index:0x4 pfn:0xf70c8
--
[    0.000000] BUG: Bad page state in process swapper  pfn:eace6
[    0.000000] page:(ptrval) refcount:1 mapcount:0 mapping:00000000 index:0x0 pfn:0xeace6
--
[    7.866187] BUG: Bad page state in process cksum  pfn:f2b28
[    7.866226] page:dc752aeb refcount:1 mapcount:0 mapping:00000000 index:0x1 pfn:0xf2b28
--
[    0.000000] BUG: Bad page state in process swapper  pfn:d23be
[    0.000000] page:(ptrval) refcount:1 mapcount:0 mapping:00000000 index:0x0 pfn:0xd23be
--
[    0.000000] BUG: Bad page state in process swapper  pfn:c7a44
[    0.000000] page:(ptrval) refcount:1 mapcount:0 mapping:00000000 index:0x0 pfn:0xc7a44
--
[    6.143085] BUG: Bad page state in process rc  pfn:faf20
[    6.143121] page:8923cfe9 refcount:1073741824 mapcount:0 mapping:00000000 index:0x4 pfn:0xfaf20


On two boards we also got CRC errors when loading the FIT image into RAM; after a restart, the same FIT image booted the system without problems.

One board is more affected than the others: when page faults occur on it, there are many of them in a single boot, distributed across the RAM during startup (all entries below are from one startup). Even on this board, they occur in only about one of 400 starts:

[    0.000000] BUG: Bad page state in process swapper  pfn:c27a1
[    0.000000] BUG: Bad page state in process swapper  pfn:c27b1
[    0.000000] BUG: Bad page state in process swapper  pfn:c2d91
[    0.000000] BUG: Bad page state in process swapper  pfn:c3978
[    0.000000] BUG: Bad page state in process swapper  pfn:c5311
[    0.000000] BUG: Bad page state in process swapper  pfn:c5321
[    0.000000] BUG: Bad page state in process swapper  pfn:c9f91
[    0.000000] BUG: Bad page state in process swapper  pfn:d0401
[    0.000000] BUG: Bad page state in process swapper  pfn:d08a1
[    0.000000] BUG: Bad page state in process swapper  pfn:d0de1
[    0.000000] BUG: Bad page state in process swapper  pfn:d1261
[    0.000000] BUG: Bad page state in process swapper  pfn:d1fc1
[    0.000000] BUG: Bad page state in process swapper  pfn:d2248
[    0.000000] BUG: Bad page state in process swapper  pfn:d2251
[    0.000000] BUG: Bad page state in process swapper  pfn:d3601
[    0.000000] BUG: Bad page state in process swapper  pfn:d38a1
[    0.000000] BUG: Bad page state in process swapper  pfn:d4238
[    0.000000] BUG: Bad page state in process swapper  pfn:d4db1
[    0.000000] BUG: Bad page state in process swapper  pfn:d9038
[    0.000000] BUG: Bad page state in process swapper  pfn:df691
[    0.000000] BUG: Bad page state in process swapper  pfn:dfcc1
[    0.000000] BUG: Bad page state in process swapper  pfn:e2351
[    0.000000] BUG: Bad page state in process swapper  pfn:e2511
[    0.076264] BUG: Bad page state in process swapper/0  pfn:f5901
[    0.077817] BUG: Bad page state in process swapper/0  pfn:f8921
[    0.079658] BUG: Bad page state in process swapper/0  pfn:fc241
[    0.080282] BUG: Bad page state in process swapper/0  pfn:fc2b1



We checked the timing settings for the Micron chips in the advanced settings of CubeMX, and they matched those of the eval board. The resulting device tree is used by TF-A.
We ran DDRUTIL without any noticeable issues, even when testing intensively around RAM address 0xc27a1000 on the more affected board. Google's stressapptest only shows errors if the kernel has already reported page faults; after restarting the hardware, the errors are gone.
VREF_DDR and DDR_CORE look good, even when an error has occurred (at least when measured after the error was detected).

Should we verify the timing register settings independently of CubeMX, or can we rely on the values set in CubeMX?
Does anyone else have any tips or ideas for this problem?

Forgot to mention:
Kernel version: 6.1.28-rt10
Distribution: openstlinux-6.1-yocto-mickledore-mp1-v23.06.21

Thanks,

Nico
