Boot failure at BL2 or OP-TEE. Maybe wrong DDR Tuning or not?

Adam_Carpenter
Associate II

Hi,

We are building Linux for STM32MP157D using Buildroot, and we are experiencing an issue where the boot process sometimes does not proceed from BL2 to OP-TEE, and sometimes OP-TEE does not proceed to U-Boot.
Occasionally, the kernel does boot successfully.

Could this be caused by incorrect DDR3 controller settings? We are using a custom board.

If so, would compiling the STM32DDRFW-UTIL source code and using it in engineering mode be the right solution?
I have read that for some chips it is possible to perform automatic tuning and receive the optimal configuration as a result.
Is this possible in this case? If not, what is the proper method to determine the correct DDR controller settings?

If DDR is not the issue, do you have any other suggestions?

For testing with DDR-UTIL, I see that there is no development board listed for the 157D series, as the utility supports "STM32MP157C-EV1_DDR_UTILITIES_A7".
If we order the "STM32MP157D-EV1", will I be able to run DDR tests on it to compare with our custom board, or do we need to order the "STM32MP157C-EV1" because only that one is compatible with DDR-UTIL?

I am attaching 3 log files. One for the BL2 stop, one for the OP-TEE stop, and one for a successful kernel boot.

Thanks in advance,
Adam

https://wiki.st.com/stm32mpu/wiki/STM32DDRFW-UTIL

1 ACCEPTED SOLUTION


Hello,

We found the problem :)

TL;DR
It was a wrong DTS configuration in optee.dts.

The RTC was enabled:

 
&rtc {
    status = "okay";
};

 

But the clock source was configured incorrectly.
There is an LSE crystal on the PCB, but the LSE was configured for an external clock source (bypass mode).

Wrong configuration:

 

clk_lse: clk-lse {
    clock-frequency = <32768>;
    st,css;
    st,bypass;
};

Good config:

clk_lse: clk-lse {
    clock-frequency = <32768>;
    st,drive = <LSEDRV_MEDIUM_HIGH>;
};


Longer explanation

 

The main misleading symptom (which I already mentioned in my first post) was that sometimes Linux booted successfully, sometimes the boot stopped in OP-TEE.

There was a reason for this.

We had a kernel that was built by the PCB designer. That kernel always booted, but we had no sources for it — only an SD card. It was basically a test kernel.

If the board booted once with this kernel, then our own kernel usually booted as well afterwards.

I started probing the LSE with an oscilloscope and noticed the following:

  • With the “foreign” kernel, I always measured a sine wave on LSE
  • With our kernel, I did not always see it

Because the STM32MP157 has no internal flash, no persistent configuration should survive a power cycle.

However, the PCB had a supercapacitor connected to VBAT, which could preserve data.
I even desoldered it — but the sine wave was still present.

The stdout (serial0) is connected to UART4 of the STM32MP157 via a CH340 USB‑UART converter.
The CH340 keeps its RX and TX lines pulled up to VCC, because that is the UART idle level.

This idle voltage internally fed the VBAT domain of the STM32MP1 (most likely through protection diodes), thereby keeping the LSE oscillator configuration alive.

So even if I unplugged the main power supply from the PCB:

  • either the supercapacitor
  • or the USB cable

was enough to preserve the correct LSE configuration that had been set up by the foreign kernel.

When our wrongly configured kernel was started after a “good” kernel had booted successfully, two situations were possible:

  • If the supercapacitor (0.22 F) was not yet discharged or the USB cable was plugged in → our Linux booted
  • If enough time passed, the supercapacitor was discharged and the USB cable was unplugged → our Linux did not boot

Our wrong kernel configuration did not overwrite the good LSE clock configuration that was preserved in the backup domain.
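The behaviour described above can be sketched as a small state model. This is only an illustration of the reasoning in this post, not OP-TEE or TF-A code: the class `Board`, its fields, and the rule "an LSE setup that is already present in the backup domain is left untouched" are assumptions inferred from the observations here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Board:
    """Hypothetical model of the board's backup domain (names are illustrative)."""
    lse_mode: Optional[str] = None   # None means the backup domain has been reset
    supercap_charged: bool = False   # supercapacitor on VBAT still holds charge
    usb_uart_plugged: bool = False   # CH340 idle level leaks into the VBAT domain

    def power_cycle(self) -> None:
        # The backup domain keeps its content only while something feeds VBAT.
        if not (self.supercap_charged or self.usb_uart_plugged):
            self.lse_mode = None

    def boot(self, dts_lse_mode: str) -> bool:
        # Assumption: an existing LSE setup is not overwritten at boot.
        if self.lse_mode is None:
            self.lse_mode = dts_lse_mode
        # The crystal on the PCB only oscillates when the LSE is in crystal mode.
        return self.lse_mode == "crystal"


board = Board(usb_uart_plugged=True)
print(board.boot("crystal"))   # foreign kernel with the correct LSE setup

board.power_cycle()            # USB cable still plugged in, setup is retained
print(board.boot("bypass"))    # our kernel boots thanks to the retained setup

board.usb_uart_plugged = False
board.power_cycle()            # nothing feeds VBAT, backup domain is reset
print(board.boot("bypass"))    # our kernel applies bypass mode and does not boot
```

The three calls correspond to the foreign kernel booting, our kernel booting while VBAT was still fed, and our kernel failing once both the supercapacitor and the USB cable were out of the picture.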

That was the magic... 1 month of painful debugging.


10 REPLIES
PatrickF
ST Employee

Hi @Adam_Carpenter ,

Running DDRFW-UTIL on your custom board is a mandatory step to confirm that all tests are OK.
I am not sure that having an ST reference board will help find a potential issue on your board.
You could try the STM32MP157C-EV1_DDR_UTILITIES_A7 binary if your HW is very close (the A/C/D/F suffix does not matter, as this SW does not use crypto and runs at 650 MHz); otherwise it is worth recompiling your own binary aligned with your HW (e.g. UART port, etc.).


By the way, when Linux does boot successfully, does memtester report anything?

It is strange that you do not get any error message when it crashes.

Usually, corrupted DDR data ends up in an early panic or some other error message.

Did you check that the supplies are correct and stable and do not fade out? Are you using an STPMIC1?

 

Regards.

In order to give better visibility on the answered topics, please click on 'Accept as Solution' on the reply which solved your issue or answered your question.
Tip of the day: Try Sidekick STM32 AI agent, see here

Dear Patrick,

I managed to run the DDR-UTIL tests.
I opened the STM32MP157C-EVAL example project, removed the dev board–specific dependencies from the project, and then ran the interactive mode in engineering mode.

I executed the "test all" command, and since the 1 GByte memory is composed of two DDR3 chips, I ran the tests on a 16 kByte region within the address range of each chip:
one at address 0xC0000000 and one at 0xFF000000.

They both passed successfully; I am attaching the logs.

I set the OP-TEE logging level to verbose mode, and I was able to capture logs for both successful and unsuccessful boot cases.

Differences are only visible during HSI and CSI measurements.

Interestingly, in one case the boot process just suddenly stops…

I attach the two verbose logs and the DDR test log.

Thanks,
Adam

It is worth testing the whole addressable DDR range (1 GByte) to exclude any unwanted hole or shadowing in the addressable space.

Maybe also check your schematics again (e.g. DQS swapping with respect to bytes, any other mistakes, etc.). You could share them here (or by private message) so that I can have a look.

On the STM32MP1 series, our experience shows that DDR3L signal integrity is quite robust and that issues often come from big, visible mistakes.

Regards.


Dear Patrick,

I am currently running the tests on 2 x 512 MBytes.
(test 0 1 536870912 0xC0000000) and (test 0 1 536870912 0xE0000000)
So far, it has passed tests 1-10. It’s taking several hours...

result 1: Test Simple DataBus = Passed
result 2: Test DataBusWalking0 = Passed
result 3: Test DataBusWalking1 = Passed
result 4: Test AddressBus = Passed
result 5: Test MemDevice = Passed
result 6: Test SimultaneousSwitchingOutput = Passed
result 7: Test Noise = Passed
result 8: Test NoiseBurst = Passed
result 9: Test Random = Passed
result 10: Test FrequencySelectivePattern = Passed
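
As a quick sanity check on the two test commands above, the sketch below confirms that two 536870912-byte ranges starting at 0xC0000000 and 0xE0000000 cover the 1 GByte DDR space with no gap or overlap. It assumes that DDR is mapped from 0xC0000000, as the test addresses in this thread suggest; the helper name `covers_whole_ddr` is made up for illustration.

```python
DDR_BASE = 0xC0000000   # assumed DDR start, taken from the test addresses above
DDR_SIZE = 1 << 30      # 1 GByte in total, as stated in the thread
HALF = 536870912        # size argument passed to each test command


def covers_whole_ddr(ranges, base=DDR_BASE, size=DDR_SIZE):
    """Return True if the (start, length) ranges tile [base, base + size) exactly."""
    cursor = base
    for start, length in sorted(ranges):
        if start != cursor:      # a gap or an overlap before this range
            return False
        cursor = start + length
    return cursor == base + size


full_run = [(0xC0000000, HALF), (0xE0000000, HALF)]
spot_checks = [(0xC0000000, 16 * 1024), (0xFF000000, 16 * 1024)]

print(hex(HALF))                      # size of each half in hexadecimal
print(covers_whole_ddr(full_run))     # the two 512 MByte runs
print(covers_whole_ddr(spot_checks))  # the earlier 16 kByte spot checks
```

The earlier 16 kByte spot checks leave almost all of the address space untested, which is why the full-range run is the more meaningful result.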

I noticed two things:

1.
There is an error log that stands out to me, which appears both during successful and failed boot:
"E/TC:0 0 check_rcc_secure_configuration:628 Error RCC TZEN=0 MCKPROT=1 and RNG1 (12) secure"

This is logged by the OP-TEE function check_rcc_secure_configuration().
This might not be correct, even if Linux sometimes still boots.

2.
This function is the last one executed in OP-TEE during a failed boot:

void __weak boot_init_primary_late(unsigned long fdt, unsigned long manifest)

This function is never reached when the boot fails:
static void tee_entry_exchange_capabilities(struct thread_smc_args *args)

I am now trying to find a connection between these things...

I am attaching the schematic. It is a general schematic and does not contain any private information.
R1 has been changed to 120k to set VDD_CORE to 1.32 V.

Thank you,
Adam

Thanks for the schematics.

 

Only one comment after a quick check:

  • The DDR address/command terminations to VTT are missing. When using two devices, it is strongly recommended to have them unless you carefully carry out your own signal integrity analysis. Please refer to AN5122 and AN5031.
    However, this would normally show up as data corruption, which is unlikely to make the boot just stop without any message.

I cannot comment on the OP-TEE message, but it is unlikely to be the root cause.

It is still puzzling that the boot just stops randomly without any message; maybe one buck output supply is shutting down or the 24 MHz clock is stopping.

Are you sure about the STM32MP157DAA1 library? If it was built on your side, please check carefully that there is no ball number mismatch versus the datasheet.

Regards.

 


Hello,

I have determined that the error message reported by the check_rcc_secure_configuration() function —
"Error RCC TZEN=0 MCKPROT=1 and RNG1 (12) secure" — appears because the Random Number Generator (RNG) was defined as a secure resource, while the TZEN bit of the TrustZone control register was set to 0 at the time of the check.

I conclude that this condition cannot be the cause of the boot process stopping.

OP‑TEE uses the RNG, therefore it cannot be configured as non‑secure.

I believe the next step is to determine why execution does not continue past the boot_init_primary_late() function.


Hello,

 

The "Error RCC TZEN=0 MCKPROT=1 and RNG1 (12) secure" message was resolved by modifying the .dts file used by OP-TEE:

- inserting compatible = "st,stm32mp1-rcc-secure"; into the rcc settings

- and enabling rng1 with:

&rng1 {
    status = "okay"; //AAJ-II
};

We continue troubleshooting...

Hello,

After all error messages disappeared from the boot log, the following error still occasionally appears:

I/TC: Primary CPU switching to normal world boot
E/TC:0   tzc_it_handler:80 TZC permission failure
E/TC:0   dump_fail_filter:420 Permission violation on filter 0
E/TC:0   dump_fail_filter:425 Violation @0xfffffe8a, non-secure privileged write, AXI ID 420
E/TC:0   Panic

 or:

E/TC:0   tzc_it_handler:80 TZC permission failure
E/TC:0   dump_fail_filter:420 Permission violation on filter 0
E/TC:0   dump_fail_filter:425 Violation @0xfffffa80, non-secure privileged read, AXI ID 400
E/TC:0   Panic
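
To see where the two violations land, the following sketch parses the `dump_fail_filter` lines quoted above and computes how far each address lies below the top of DDR. It assumes that the 1 GByte DDR occupies 0xC0000000 to 0xFFFFFFFF, as in the DDR tests earlier in the thread; the function names are illustrative and are not part of OP-TEE.

```python
import re

DDR_BASE = 0xC0000000            # assumed DDR start address
DDR_END = DDR_BASE + (1 << 30)   # first address past the assumed 1 GByte of DDR

# Matches the "Violation @<addr>, <access type>, AXI ID <id>" part of the log line.
VIOLATION_RE = re.compile(r"Violation @(0x[0-9a-fA-F]+), (.+), AXI ID (\w+)")


def parse_violation(line):
    """Extract address, access type and AXI ID from a dump_fail_filter log line."""
    match = VIOLATION_RE.search(line)
    if match is None:
        return None
    return {
        "address": int(match.group(1), 16),
        "access": match.group(2),
        "axi_id": match.group(3),
    }


def bytes_below_ddr_top(address):
    """Distance in bytes from the address to the end of the assumed DDR range."""
    if not DDR_BASE <= address < DDR_END:
        raise ValueError("address is outside the assumed DDR range")
    return DDR_END - address


LOG_LINES = [
    "E/TC:0   dump_fail_filter:425 Violation @0xfffffe8a, non-secure privileged write, AXI ID 420",
    "E/TC:0   dump_fail_filter:425 Violation @0xfffffa80, non-secure privileged read, AXI ID 400",
]

for line in LOG_LINES:
    violation = parse_violation(line)
    print(hex(violation["address"]), violation["access"],
          bytes_below_ddr_top(violation["address"]))
```

Under that assumption, both addresses fall within the last 2 KiB of the DDR range (374 and 1408 bytes below the top).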

 

What is even more interesting is that if Linux boots successfully with the factory kernel, then we switch the boot pins and reset the board using the reset button, Linux also boots successfully afterwards with our own custom kernel.

Could it be that TF‑A or OP‑TEE is using something from the Backup RAM that we are not defining or initializing correctly?

Thanks,
Adam
