STM32MP157C: How to increase the M4 image size

rhaberkorn
Associate III

Hello,

I am running a Linux image based on meta-st-stm32mp (hardknott branch).

I currently have the following in my host device tree:

reserved-memory {
	#address-cells = <1>;
	#size-cells = <1>;
	ranges;

	mcuram2: mcuram2@10000000 {
		compatible = "shared-dma-pool";
		reg = <0x10000000 0x40000>;
		no-map;
	};

	vdev0vring0: vdev0vring0@10040000 {
		compatible = "shared-dma-pool";
		reg = <0x10040000 0x1000>;
		no-map;
	};

	vdev0vring1: vdev0vring1@10041000 {
		compatible = "shared-dma-pool";
		reg = <0x10041000 0x1000>;
		no-map;
	};

	vdev0buffer: vdev0buffer@10042000 {
		compatible = "shared-dma-pool";
		reg = <0x10042000 0x4000>;
		no-map;
	};

	mcuram: mcuram@30000000 {
		compatible = "shared-dma-pool";
		reg = <0x30000000 0x40000>;
		no-map;
	};

	retram: retram@38000000 {
		compatible = "shared-dma-pool";
		reg = <0x38000000 0x10000>;
		no-map;
	};

	gpu_reserved: gpu@d4000000 {
		reg = <0xd4000000 0x4000000>;
		no-map;
	};
};

mlahb: ahb {
	compatible = "st,mlahb", "simple-bus";
	#address-cells = <1>;
	#size-cells = <1>;
	ranges;
	dma-ranges = <0x00000000 0x38000000 0x10000>,
	             <0x10000000 0x10000000 0x60000>,
	             <0x30000000 0x30000000 0x60000>;

	m4_rproc: m4@10000000 {
		compatible = "st,stm32mp1-m4";
		reg = <0x10000000 0x40000>,
		      <0x30000000 0x40000>,
		      <0x38000000 0x10000>;
		/* ... */
	};
};

mlahb comes from stm32mp151.dtsi.
Let's say we'd like to increase the allowed image size to 128 kB by changing 0x10000 to 0x20000:

&retram {
	reg = <0x38000000 0x20000>;
};

&mlahb {
	dma-ranges = <0x00000000 0x38000000 0x20000>,
	             <0x10000000 0x10000000 0x60000>,
	             <0x30000000 0x30000000 0x60000>;

	m4_rproc {
		reg = <0x10000000 0x40000>,
		      <0x30000000 0x40000>,
		      <0x38000000 0x20000>;
	};
};

Unfortunately, when trying to load a firmware image (echo start > /sys/class/remoteproc/remoteproc0/state),
I still get a kernel crash:

[  376.591779] 8<--- cut here ---
[ 376.593848] Unhandled fault: imprecise external abort (0x1c06) at 0x004627a4
[ 376.600885] pgd = fc518049
[ 376.603571] [004627a4] *pgd=c5ef0835, *pte=00000000, *ppte=00000000
[ 376.609836] Internal error: : 1c06 [#1] PREEMPT SMP ARM
[ 376.615042] Modules linked in: rpmsg_tty rpmsg_core etnaviv gpu_sched spi_stm32 stm32_rproc sch_fq_codel ipv6
[ 376.624955] CPU: 1 PID: 298 Comm: sh Not tainted 5.10.10 #1
[ 376.630507] Hardware name: STM32 (Device Tree Support)
[ 376.635648] PC is at memcpy+0x54/0x330
[ 376.639374] LR is at 0x6263000a
[ 376.642499] pc : [<c060d914>] lr : [<6263000a>] psr: 20000013
[ 376.648753] sp : c5cfbe2c ip : 2172656c fp : e0d20000
[ 376.653965] r10: 00000000 r9 : 00000000 r8 : 646e6168
[ 376.659180] r7 : 206b6361 r6 : 626c6c61 r5 : 63206f4e r4 : 09007265
[ 376.665696] r3 : 6c646e61 r2 : 000011b4 r1 : e0bc9220 r0 : e0d30120
[ 376.672213] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
[ 376.679336] Control: 10c5387d Table: c5ea006a DAC: 00000051
[ 376.685072] Process sh (pid: 298, stack limit = 0x86cbec92)
[ 376.690629] Stack: (0xc5cfbe2c to 0xc5cfc000)
[ 376.694979] be20: 00011354 00011354 00000001 00000000 e0d20000
[ 376.703147] be40: e0bb9054 c0a0a584 00011354 10041000 00000000 00000000 00000000 c01175f4
[ 376.711314] be60: 00000100 00000000 00000006 00000001 00000020 c34cd800 c5ea7cc0 c121f270
[ 376.719482] be80: e0bb9000 c34cd820 c0a06c88 c34cd800 c34cd800 c34cd820 c5ea7cc0 c35c2840
[ 376.727649] bea0: 00000000 00000000 00000000 c0be8e54 00000000 c34cd800 c5ea7cc0 c34cd820
[ 376.735817] bec0: c35c2840 c0be9380 c34cd800 c34cd9e8 00000000 c34cd9f4 c34cd820 c0a07f78
[ 376.743984] bee0: c5ea7cc0 3a12e441 00000cc0 c34cd820 c5ea7a80 00000006 c34cd800 c5ea6610
[ 376.752151] bf00: 00000051 c0a097bc 00000006 c5ea6600 c5ea7a80 c5cfbf80 c5ea6610 c03a3c48
[ 376.760317] bf20: 00000000 00000000 0000000b c3337c00 00000006 c03a3b50 00534fb0 c5cfbf80
[ 376.768484] bf40: c2de3c00 00000004 0052f2f0 c0302348 c316d900 3a12e441 c316d900 00000000
[ 376.776650] bf60: c1894900 c3337c00 c3337c00 00000000 00000000 c0100264 c5cfa000 c03026e8
[ 376.784816] bf80: 00000000 00000000 c5cfa000 3a12e441 00000003 00000006 00534fb0 b6f6c1e0
[ 376.792984] bfa0: 00000004 c0100060 00000006 00534fb0 00000001 00534fb0 00000006 00000000
[ 376.801152] bfc0: 00000006 00534fb0 b6f6c1e0 00000004 00000000 bee1b7f0 00534fb0 0052f2f0
[ 376.809318] bfe0: 00000004 bee1b7a0 b6e7135f b6dfd386 60000030 00000001 00000000 00000000
[ 376.817517] [<c060d914>] (memcpy) from [<c0a0a584>] (rproc_elf_load_segments+0x1a0/0x288)
[ 376.825669] [<c0a0a584>] (rproc_elf_load_segments) from [<c0be8e54>] (rproc_start+0x24/0x154)
[ 376.834171] [<c0be8e54>] (rproc_start) from [<c0be9380>] (rproc_fw_boot+0x170/0x1a4)
[ 376.841898] [<c0be9380>] (rproc_fw_boot) from [<c0a07f78>] (rproc_boot+0x150/0x1a4)
[ 376.849541] [<c0a07f78>] (rproc_boot) from [<c0a097bc>] (state_store+0x40/0xc8)
[ 376.856841] [<c0a097bc>] (state_store) from [<c03a3c48>] (kernfs_fop_write+0xf8/0x21c)
[ 376.864750] [<c03a3c48>] (kernfs_fop_write) from [<c0302348>] (vfs_write+0xc0/0x318)
[ 376.872481] [<c0302348>] (vfs_write) from [<c03026e8>] (ksys_write+0x60/0xe4)
[ 376.879603] [<c03026e8>] (ksys_write) from [<c0100060>] (ret_fast_syscall+0x0/0x54)
[ 376.887237] Exception stack(0xc5cfbfa8 to 0xc5cfbff0)
[ 376.892282] bfa0: 00000006 00534fb0 00000001 00534fb0 00000006 00000000
[ 376.900451] bfc0: 00000006 00534fb0 b6f6c1e0 00000004 00000000 bee1b7f0 00534fb0 0052f2f0
[ 376.908613] bfe0: 00000004 bee1b7a0 b6e7135f b6dfd386
[ 376.913659] Code: f5d1f07c e8b151f8 e2522020 e8a051f8 (aafffffa)
[ 376.919737] ---[ end trace e8ef0d82ecc3eec4 ]---
[ 376.925593] 8<--- cut here ---
[ 376.927384] Unhandled fault: imprecise external abort (0x1c06) at 0x004627a4
[ 376.934422] pgd = 6e154e4d
[ 376.937107] [004627a4] *pgd=c5c11835, *pte=00000000, *ppte=00000000

This only happens if the actual firmware image has a code+data section larger than 64 kB.
Does anybody have a clue what I am missing in the device tree?
I should also try to get a proper backtrace of that crash...

What's the hard limit for M4 firmware?
In this comment, @PatrickF said that it's at most 448 kB:
https://community.st.com/t5/stm32-mpus-products/possible-advisable-to-increase-the-size-of-m4-coprocessor-text/m-p/51879/highlight/true#M52

Is something colliding with the RETRAM section, or is it that the M4 core won't execute directly from
RETRAM but copies code into one of the mcuram regions?
See https://wiki.st.com/stm32mpu/wiki/STM32MP15_MCU_SRAM_internal_memory
Is it the mcuram region that I have to increase?

Yours sincerely,
Robin Haberkorn

7 REPLIES
rhaberkorn
Associate III

m4_rproc is a label, so the device tree override should instead be:

&retram {
	reg = <0x38000000 0x20000>;
};

&mlahb {
	dma-ranges = <0x00000000 0x38000000 0x20000>,
	             <0x10000000 0x10000000 0x60000>,
	             <0x30000000 0x30000000 0x60000>;
};

&m4_rproc {
	reg = <0x10000000 0x40000>,
	      <0x30000000 0x40000>,
	      <0x38000000 0x20000>;

	m4_system_resources {
		status = "okay";
	};
};

I also added it to U-Boot's device tree, just to be on the safe side. Unfortunately, I am still getting exactly the same error (memory violation?) when loading a ~70 kB ELF image.

ArnaudP
ST Employee

Hello,

As described in STM32MP15 RAM mapping, the MCU RETRAM is a 64 kB physical RAM. Your update to the Linux device tree declares a 128 kB RETRAM, which leads to accesses to invalid addresses.
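
To make the point about the physical size concrete, here is a minimal host-side C sketch that checks whether a segment fits into RAM that physically exists. The region table is an assumption assembled from this thread (64 kB RETRAM at Cortex-M4 address 0x0, and the 0x60000-byte MCU SRAM window from the dma-ranges in the original post). The names m4_map, m4_find_region and m4_segment_fits are made up for illustration and are not part of the remoteproc driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One physical RAM window as seen from the Cortex-M4 (device addresses). */
struct m4_region {
    const char *name;
    uint32_t base;  /* first device address of the window */
    uint32_t size;  /* physical size in bytes */
};

/* Assumed values, taken from this thread and not from a datasheet lookup.
 * Note that this board's device tree also reserves part of the MCU SRAM
 * window for the vring and buffer carveouts. */
static const struct m4_region m4_map[] = {
    { "RETRAM",  0x00000000u, 0x10000u },
    { "MCUSRAM", 0x10000000u, 0x60000u },
};

/* Return the region that fully contains [da, da + memsz), or NULL. */
static const struct m4_region *m4_find_region(uint32_t da, uint32_t memsz)
{
    for (size_t i = 0; i < sizeof(m4_map) / sizeof(m4_map[0]); i++) {
        const struct m4_region *r = &m4_map[i];

        assert(r->size != 0); /* table sanity check */
        if (da < r->base)
            continue;
        /* Work with offsets so that da + memsz cannot overflow. */
        uint32_t off = da - r->base;
        if (off <= r->size && memsz <= r->size - off)
            return r;
    }
    return NULL;
}

/* True if the segment lies entirely inside one physical region. */
static bool m4_segment_fits(uint32_t da, uint32_t memsz)
{
    return m4_find_region(da, memsz) != NULL;
}
```

With this table, m4_segment_fits(0x0, 0x11800) (a roughly 70 kB segment at the RETRAM base) is rejected, whereas the same segment placed at 0x10000000 is accepted. Enlarging the reg size in the device tree does not change what is physically there.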

To extend the Cortex-M4 image size, you should only need to update your Cortex-M4 firmware linker script to use the MCU SRAMs.

A simple way is to reuse this linker script configuration
https://github.com/STMicroelectronics/STM32CubeMP1/blob/master/Projects/STM32MP157C-DK2/Applications/OpenAMP/OpenAMP_for_signed_fw/STM32CubeIDE/CM4/STM32MP157CAAX_RAM.ld

It stores:

  • the vector table in the RETRAM (mandatory, as the Cortex-M4 boot address is the RETRAM start address),
  • the code in the MCUSRAM 1 (0x10000000 to 0x1001FFFC),
  • the data in the MCUSRAM 2 (0x10020000 to 0x1002FFFC).

 

rhaberkorn
Associate III

I see. I was somehow hoping that the 64 kB RETRAM size was arbitrary and could be extended.

So to effectively get more than 64 kB of code into the image, you would have to move code into the MCUSRAMx sections without copying, i.e. the ELF loader will have to load code directly into these areas. On Zephyr I believe we can achieve this with in-place execution (XIP), NOCOPY code relocation and custom linker scripts. Not trivial, but doable.

I hope we can execute directly from RETRAM as well, so its 64 kB are not "lost".
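
Independently of any Zephyr-specific mechanism, the basic idea of steering individual functions into a separate region can be sketched with a GCC/Clang section attribute. This is only a sketch under assumptions: the section name .mcusram_code, the macro IN_MCUSRAM_CODE and the function checksum32 are hypothetical, and the firmware linker script would still need an output section that maps .mcusram_code to the MCU SRAM at 0x10000000. On a host build the attribute only changes the section name and the function behaves as usual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical section name; the firmware linker script must place it
 * in an MCU SRAM memory region for this to have any effect on the M4. */
#if defined(__ELF__) && (defined(__GNUC__) || defined(__clang__))
#define IN_MCUSRAM_CODE __attribute__((section(".mcusram_code"), noinline))
#else
#define IN_MCUSRAM_CODE /* no-op on toolchains without ELF section attributes */
#endif

/* Example of a function that is emitted into the separate section
 * instead of the default .text. */
IN_MCUSRAM_CODE uint32_t checksum32(const uint8_t *buf, uint32_t len)
{
    assert(buf != NULL || len == 0);

    uint32_t sum = 0;
    for (uint32_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}
```

Running objdump -h on the resulting object file should then list .mcusram_code as a separate section, which makes it easy to check what ended up where before involving the loader.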

rhaberkorn
Associate III

In principle, I think I got it to link properly. But I still get crashes as soon as I call a function that the linker placed in the mcusram section. I can confirm (objdump -d) that the code is emitted correctly and that it is supposed to be loaded into a 64 kB region beginning at 0x10000000. elfdump -p on my firmware file shows an appropriate segment:

entry: 4
        p_type: PT_LOAD
        p_offset: 62544
        p_vaddr: 0x10000000
        p_paddr: 0x10000000
        p_filesz: 4096
        p_memsz: 4372
        p_flags: PF_X|PF_W|PF_R
        p_align: 8

I consciously link only a few functions into this section. The remoteproc driver does appear to load it properly, as my debug kernel log confirms:

[ 6122.454631] remoteproc remoteproc0: phdr: type 1 da 0x0 memsz 0xf09c filesz 0xf09c
[ 6122.454660] remoteproc remoteproc0: ptr 5572fed9
[ 6122.454820] remoteproc remoteproc0: phdr: type 1 da 0xf09c memsz 0x20c filesz 0x20c
[ 6122.454837] remoteproc remoteproc0: ptr c9e9fa5a
[ 6122.454854] remoteproc remoteproc0: phdr: type 1 da 0xf2a8 memsz 0x90 filesz 0x90
[ 6122.454869] remoteproc remoteproc0: ptr e5c6f2e5
[ 6122.454885] remoteproc remoteproc0: phdr: type 1 da 0x10000000 memsz 0x1114 filesz 0x1000
[ 6122.454899] remoteproc remoteproc0: ptr 7678d1e6
[ 6122.454925] remoteproc remoteproc0: phdr: type 1 da 0x10010210 memsz 0x242b0 filesz 0x0
[ 6122.454939] remoteproc remoteproc0: ptr 3678c0ce

This is from rproc_elf_load_segments() with the addition of "ptr" outputs for the kernel addresses. So it seems to load the correct code to 0x10000000. Still, once the M4 tries to execute code from this memory region, it immediately crashes with "Faulting instruction address (r15/pc): 0x100000c8". This is right at the beginning of the first function I call from this memory region. This leads me to believe that the ELF loader did not in fact load the segment properly...
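
For reference, the segment at 0x10000000 has memsz 0x1114 but filesz 0x1000. As far as I understand ELF loading in general, filesz bytes are copied from the file and the remaining memsz - filesz bytes are zero-filled. Below is a simplified host-side model of that step in C. It is not the kernel code, and load_segment is a name made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified model of loading one PT_LOAD segment:
 * dst    - where the segment lives in target memory (already mapped)
 * image  - start of the ELF file in memory
 * offset - p_offset of the segment within the file
 * filesz - number of bytes present in the file (p_filesz)
 * memsz  - number of bytes occupied in memory (p_memsz) */
static void load_segment(uint8_t *dst, const uint8_t *image,
                         size_t offset, size_t filesz, size_t memsz)
{
    assert(filesz <= memsz);

    /* Copy the part that is backed by file contents. */
    if (filesz)
        memcpy(dst, image + offset, filesz);

    /* Zero the tail that exists only in memory (e.g. .bss). */
    if (memsz > filesz)
        memset(dst + filesz, 0, memsz - filesz);
}
```

For the segment above this would mean 0x1000 bytes copied and 0x114 bytes zeroed, so the instruction at 0x100000c8 falls into the copied part.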

PS: I also wonder what the mcuram region at 0x30000000 is all about, considering that it's not in the RAM mapping or in the linker script that @ArnaudP has sent. It is originally defined in the Seeed Odyssey BSP I am using. It's probably just a mistake in their device tree!?

 

The 0x30000000 address is an alias of 0x10000000. Both point to the physical MCU SRAM but through different buses.
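
As a small illustration of the aliasing, the C sketch below folds an address from the 0x30000000 window back onto its 0x10000000 counterpart. The window size of 0x60000 is an assumption taken from the dma-ranges in the original post, and the names MCUSRAM_SPAN, in_window and mcusram_canonical are hypothetical helpers and not an existing API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MCUSRAM_BASE  0x10000000u
#define MCUSRAM_ALIAS 0x30000000u
#define MCUSRAM_SPAN  0x60000u /* assumed, from the dma-ranges in the first post */

/* True if addr lies inside the window starting at base. */
static bool in_window(uint32_t addr, uint32_t base)
{
    return addr >= base && addr - base < MCUSRAM_SPAN;
}

/* Fold an alias address onto the 0x10000000 window.
 * Addresses outside the alias window are returned unchanged. */
static uint32_t mcusram_canonical(uint32_t addr)
{
    if (in_window(addr, MCUSRAM_ALIAS)) {
        uint32_t folded = addr - MCUSRAM_ALIAS + MCUSRAM_BASE;

        assert(in_window(folded, MCUSRAM_BASE));
        return folded;
    }
    return addr;
}
```

For example, mcusram_canonical(0x30041000) gives 0x10041000, the same cell the vdev0vring1 carveout in the first post refers to.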

So your environment is not STM32Cube but Zephyr.

The Zephyr linker script should be reviewed to be more flexible. I made a proof of concept (POC) a long time ago here: https://github.com/arnopo/zephyr_stm32mp1/commits/linker_poc/. Perhaps there is now another way to do it. Feel free to propose something upstream to improve the linker.

I hope that this will help you.

@ArnaudP I didn't know that SRAM4 is accessible from the M4 if you don't use DMA buffers on Linux, as it's not shown in the RAM mapping. Good to know! I see that you don't use retram for code, but only mcusramcode at 0x10000000. I guess you could go either way to make use of all the available memory: either relocate code into two non-contiguous memory regions or manually declare retram sections on some larger objects. Probably the latter will be simpler at the end of the day.

Do you know whether there is any way to access the DDR RAM from the M4 core? I find contradictory statements about that here in the community.



@rhaberkorn wrote:

@ArnaudP I didn't know that SRAM4 is accessible from the M4 if you don't use DMA buffers on Linux, as it's not shown in the RAM mapping. Good to know! I see that you don't use retram for code, but only mcusramcode at 0x10000000. I guess you could go either way to make use of all the available memory: either relocate code into two non-contiguous memory regions or manually declare retram sections on some larger objects. Probably the latter will be simpler at the end of the day.


Yes, this is something that can be customized in the customer project. In Zephyr upstream, the linker script should be kept generic, as it is used for all examples.

FYI, some toolchains offer options to help manage non-contiguous memory regions. This is the case with the GNU toolchain, whose linker (ld) offers the option "--enable-non-contiguous-regions".


@rhaberkorn wrote:

Do you know whether there is any way to access the DDR RAM from the M4 core? I find contradictory statements about that here in the community.


The DDR is accessible by the Cortex-M4, but we advise against using it. The Cortex-M4 has no cache, so it performs burst accesses on the interconnect to fetch its code or data, which impacts system performance.