2025-11-20 6:28 AM - edited 2025-11-20 6:38 AM
Hi,
we use the STM32MP157C with almost the same RAM layout as the STM32MP157F-EV1, except that the SDRAM chips are from Micron (MT41K256M16). The PMIC is also the same as the one on the evaluation board, so its settings are identical. TF-A, OP-TEE, U-Boot and the kernel are all in use.
On almost all boards, we see sporadic page faults ("Bad page state") from the kernel, in roughly one of 400 warm starts, distributed across the RAM. Each entry below is from a different start:
[ 0.000000] BUG: Bad page state in process swapper pfn:d0cfc
[ 0.000000] page:(ptrval) refcount:1 mapcount:0 mapping:00000000 index:0x0 pfn:0xd0cfc
--
[ 8.840693] BUG: Bad page state in process tar pfn:f77ae
[ 8.840730] page:769addeb refcount:1 mapcount:0 mapping:00000000 index:0x4 pfn:0xf77ae
--
[ 9.664655] BUG: Bad page state in process rc pfn:f70c8
[ 9.664691] page:c59745a8 refcount:1 mapcount:0 mapping:00000000 index:0x4 pfn:0xf70c8
--
[ 0.000000] BUG: Bad page state in process swapper pfn:eace6
[ 0.000000] page:(ptrval) refcount:1 mapcount:0 mapping:00000000 index:0x0 pfn:0xeace6
--
[ 7.866187] BUG: Bad page state in process cksum pfn:f2b28
[ 7.866226] page:dc752aeb refcount:1 mapcount:0 mapping:00000000 index:0x1 pfn:0xf2b28
--
[ 0.000000] BUG: Bad page state in process swapper pfn:d23be
[ 0.000000] page:(ptrval) refcount:1 mapcount:0 mapping:00000000 index:0x0 pfn:0xd23be
--
[ 0.000000] BUG: Bad page state in process swapper pfn:c7a44
[ 0.000000] page:(ptrval) refcount:1 mapcount:0 mapping:00000000 index:0x0 pfn:0xc7a44
--
[ 6.143085] BUG: Bad page state in process rc pfn:faf20
[ 6.143121] page:8923cfe9 refcount:1073741824 mapcount:0 mapping:00000000 index:0x4 pfn:0xfaf20
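A minimal sketch for inspecting entries like the ones above, assuming the detail lines keep the format shown in this log. It extracts pfn, refcount and mapcount, and lists which bits are set in the refcount. The helper names `parse_page_line` and `set_bits` are hypothetical and not part of any kernel tooling. Whether a single set bit points to a bit flip in RAM is an assumption that the log alone does not prove.

```python
import re

# Matches the detail line printed after "BUG: Bad page state" in the log above.
PAGE_LINE = re.compile(r"refcount:(-?\d+) mapcount:(-?\d+).*pfn:0x([0-9a-f]+)")

def parse_page_line(line):
    """Return (pfn, refcount, mapcount), or None if the line does not match."""
    m = PAGE_LINE.search(line)
    if m is None:
        return None
    return int(m.group(3), 16), int(m.group(1)), int(m.group(2))

def set_bits(value):
    """List the bit positions that are set in a non-negative integer."""
    return [i for i in range(value.bit_length()) if (value >> i) & 1]

line = ("[ 6.143121] page:8923cfe9 refcount:1073741824 mapcount:0 "
        "mapping:00000000 index:0x4 pfn:0xfaf20")
pfn, refcount, mapcount = parse_page_line(line)
print(hex(pfn), hex(refcount), set_bits(refcount))
```

For the last entry, the refcount 1073741824 is 0x40000000, so exactly one bit (bit 30) is set.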
On two boards, we had CRC errors when loading the FIT image into RAM; after a restart, the same FIT image loaded and booted the system.
One board stands out more than the others: when page faults occur there, many of them appear across the RAM during a single startup (the log below is from one startup). But on this board, too, they occur in only about one of 400 starts:
[ 0.000000] BUG: Bad page state in process swapper pfn:c27a1
[ 0.000000] BUG: Bad page state in process swapper pfn:c27b1
[ 0.000000] BUG: Bad page state in process swapper pfn:c2d91
[ 0.000000] BUG: Bad page state in process swapper pfn:c3978
[ 0.000000] BUG: Bad page state in process swapper pfn:c5311
[ 0.000000] BUG: Bad page state in process swapper pfn:c5321
[ 0.000000] BUG: Bad page state in process swapper pfn:c9f91
[ 0.000000] BUG: Bad page state in process swapper pfn:d0401
[ 0.000000] BUG: Bad page state in process swapper pfn:d08a1
[ 0.000000] BUG: Bad page state in process swapper pfn:d0de1
[ 0.000000] BUG: Bad page state in process swapper pfn:d1261
[ 0.000000] BUG: Bad page state in process swapper pfn:d1fc1
[ 0.000000] BUG: Bad page state in process swapper pfn:d2248
[ 0.000000] BUG: Bad page state in process swapper pfn:d2251
[ 0.000000] BUG: Bad page state in process swapper pfn:d3601
[ 0.000000] BUG: Bad page state in process swapper pfn:d38a1
[ 0.000000] BUG: Bad page state in process swapper pfn:d4238
[ 0.000000] BUG: Bad page state in process swapper pfn:d4db1
[ 0.000000] BUG: Bad page state in process swapper pfn:d9038
[ 0.000000] BUG: Bad page state in process swapper pfn:df691
[ 0.000000] BUG: Bad page state in process swapper pfn:dfcc1
[ 0.000000] BUG: Bad page state in process swapper pfn:e2351
[ 0.000000] BUG: Bad page state in process swapper pfn:e2511
[ 0.076264] BUG: Bad page state in process swapper/0 pfn:f5901
[ 0.077817] BUG: Bad page state in process swapper/0 pfn:f8921
[ 0.079658] BUG: Bad page state in process swapper/0 pfn:fc241
[ 0.080282] BUG: Bad page state in process swapper/0 pfn:fc2b1
We checked the timing settings for the Micron chips in the advanced settings of CubeMX, but they match those of the eval board. The resulting device tree is used by TF-A.
We ran DDRUTIL with no noticeable issues, even on the more affected board with intensive testing around RAM address 0xc27a1000. Google's 'stressapptest' only shows errors if the kernel has already reported page faults. After restarting the hardware, the errors are gone.
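For reference, the relation between a reported pfn and the RAM address used for targeted testing can be sketched as follows. This assumes 4 KiB pages and a DDR base address of 0xC0000000. Both constants are assumptions to check against the actual platform, and the function names are hypothetical.

```python
PAGE_SHIFT = 12        # assumption: 4 KiB pages
DDR_BASE = 0xC0000000  # assumption: start of DDR in the physical address map

def pfn_to_phys(pfn):
    """Convert a page frame number to the physical address of the page start."""
    return pfn << PAGE_SHIFT

def phys_to_ddr_offset(phys):
    """Return the byte offset of a physical address from the start of DDR."""
    if phys < DDR_BASE:
        raise ValueError("address is below the assumed DDR base")
    return phys - DDR_BASE

# A few pfn values taken from the single-startup log above.
for pfn in (0xC27A1, 0xC27B1, 0xC2D91, 0xC3978):
    phys = pfn_to_phys(pfn)
    print(f"pfn {pfn:#x} -> phys {phys:#010x}, DDR offset {phys_to_ddr_offset(phys):#x}")
```

With these assumptions, pfn 0xc27a1 corresponds to the address 0xc27a1000 mentioned above.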
VREF_DDR and DDR_CORE look good, even in the event of an error, at least when measured after the error was detected.
Should we check the register settings for the timings independently of CubeMX? Or is it sufficient to rely on what is set in CubeMX?
Does anyone else have any tips or ideas for this problem?
Forgot to mention:
Kernel-Version 6.1.28-rt10
openstlinux-6.1-yocto-mickledore-mp1-v23.06.21
Thanks,
Nico
2025-11-27 7:33 AM
Hello Nico,
Indeed, your description may be linked to an SDRAM robustness issue. Even if you're close to the STM32MP157F-EV1 layout, Micron and Nanya devices have different internal layouts, which could explain the behavior.
To confirm these assumptions, it would be interesting to evaluate how robust the DDR sub-system is on your board.
We propose running memtester from the Linux console as below:
root@stm32mp15-eval-42-45-b0:~# memtester 200M 10
memtester version 4.6.0 (32-bit)
Copyright (C) 2001-2020 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).
pagesize is 4096
pagesizemask is 0xfffff000
want 200MB (209715200 bytes)
got 200MB (209715200 bytes), trying mlock ...locked.
Loop 1/10:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : ok
Block Sequential : ok
Checkerboard : ok
Bit Spread : ok
Bit Flip : ok
Walking Ones : ok
Walking Zeroes : ok
...
In this example, all the standard test algorithms are executed within an allocated area of 200 MB for 10 loops.
This puts a high level of stress on the DDR, which we can't reach with the STM32DDRFW-UTIL tool.
Depending on the results, we'll have more information about the robustness itself, but also about the conditions in which the errors occur. This could help to tune the DDR settings.
We hope this will help you.
Please keep us informed of the results.
Best Regards
Nicolas LB
2025-12-08 12:19 AM
Hi Nicolas,
thank you for your answer. Just want to keep you informed: we are still investigating the issue. Ideas and thoughts are still welcome :)
We have already created several hardware revisions due to minor design errors, and it appears that the first version of the HW is not prone to RAM problems. The memtester here runs without error on 10 loops.
But on the newer revisions there are problems:
root@xmaster5:/mnt/data# ./memtester 200M 10
memtester version 4.6.0 (32-bit)
Copyright (C) 2001-2020 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).
pagesize is 4096
pagesizemask is 0xfffff000
want 200MB (209715200 bytes)
got 200MB (209715200 bytes), trying mlock ...locked.
Loop 1/10:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : testing 0FAILURE: 0x00000000 != 0x00000001 at offset 0x020b7344.
FAILURE: 0x00000000 != 0x00000001 at offset 0x020b734c.
FAILURE: 0x00000000 != 0x00000001 at offset 0x020b7354.
Block Sequential : ok
Checkerboard : ok
Bit Spread : ok
Bit Flip : testing 32FAILURE: 0x00000011 != 0x00000010 at offset 0x01ebb04c.
Walking Ones : ok
Walking Zeroes : testing 23
In addition, we have encountered problems with the PMIC in the newer revisions. On some boards the PMIC only supplies 0.9V on VDDCORE at startup, although according to its registers it should supply 1.2V. ALTERNATE mode should be off; we are still investigating here.
However, there have been no changes in hardware design on these paths, neither to the RAM nor to the PMIC.
Whether PMIC and RAM are related to each other is also unclear to us at the moment.
And last but not least, we are in contact with the manufacturer of the PCB and have now heard that there was a change of its supplier between the first and newer revisions of the board.
Will keep you informed.
Best regards,
Nico
2025-12-08 5:51 AM
Hello Nico,
Your last message makes us think that the first thing to investigate is the PMIC sub-system. Seeing 0.9V on VDDCORE is not normal, as you can imagine. This could be a layout issue, but also the result of a too-high voltage accidentally applied to the PMIC supply (the maximum without damage is 6V).
Once this first part is fully understood, it will be time to focus on the DDR. VDDCORE is a pillar of the DDR sub-system; it has to be at the expected value for the DDR sub-system (and its PLL) to work correctly.
Your memtester logs point to a weakness on data bit 0, where the failures appear.
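The bit-0 observation can be reproduced from the log by XOR-ing the two values in each FAILURE line. The sketch below assumes the memtester output format shown in this thread, and the helper names are hypothetical. Note that a bit position in the 32-bit word maps to a physical DQ line only through the board routing (byte lanes and any bit swapping), which this sketch does not know about.

```python
import re
from collections import Counter

# Matches memtester lines such as
# "FAILURE: 0x00000000 != 0x00000001 at offset 0x020b7344."
FAILURE = re.compile(
    r"FAILURE: 0x([0-9a-fA-F]+) != 0x([0-9a-fA-F]+) at offset 0x([0-9a-fA-F]+)"
)

def failing_bits(line):
    """Return the bit positions that differ in a FAILURE line, or None."""
    m = FAILURE.search(line)
    if m is None:
        return None
    diff = int(m.group(1), 16) ^ int(m.group(2), 16)
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]

def count_failing_bits(lines):
    """Count how often each bit position differs across all FAILURE lines."""
    counts = Counter()
    for line in lines:
        bits = failing_bits(line)
        if bits:
            counts.update(bits)
    return counts

log = [
    "Solid Bits : testing 0FAILURE: 0x00000000 != 0x00000001 at offset 0x020b7344.",
    "FAILURE: 0x00000000 != 0x00000001 at offset 0x020b734c.",
    "FAILURE: 0x00000000 != 0x00000001 at offset 0x020b7354.",
    "Block Sequential : ok",
    "Bit Flip : testing 32FAILURE: 0x00000011 != 0x00000010 at offset 0x01ebb04c.",
]
print(count_failing_bits(log))
```

For the four FAILURE lines quoted in this thread, only bit 0 differs.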
It would be worth contacting the PCB supplier to confirm that the single-ended impedance of the data lines meets the specification, i.e. 55 ohms (+/- 10%).
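As a quick aid when reviewing the supplier's impedance report, the tolerance window can be computed as below. The nominal value and tolerance are taken from the sentence above; the function names and the sample measurements are hypothetical.

```python
NOMINAL_OHMS = 55.0  # single-ended target impedance stated above
TOLERANCE = 0.10     # +/- 10 %

def impedance_window(nominal=NOMINAL_OHMS, tolerance=TOLERANCE):
    """Return the (low, high) limits of the acceptable impedance range."""
    return nominal * (1.0 - tolerance), nominal * (1.0 + tolerance)

def within_spec(measured_ohms, nominal=NOMINAL_OHMS, tolerance=TOLERANCE):
    """Check whether a measured impedance lies inside the tolerance window."""
    low, high = impedance_window(nominal, tolerance)
    return low <= measured_ohms <= high

low, high = impedance_window()
print(f"acceptable range: {low:.1f} to {high:.1f} ohms")
for measured in (52.0, 62.0):  # hypothetical measurement values
    print(measured, "ok" if within_spec(measured) else "out of spec")
```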
These are some leads to follow up, with possible further steps, but the way forward also depends on the progress of the investigations you are currently carrying out.
Yes, please keep us informed.
Best regards
Nicolas