STM32MP157C: How to increase the M4 image size

rhaberkorn · 2025-01-12

Hello,

I am running a Linux image based on meta-st-stm32mp (hardknott branch).

I currently have the following in my host device tree:

reserved-memory {
                #address-cells = <1>;
                #size-cells = <1>;
                ranges;

                mcuram2: mcuram2@10000000 {
                        compatible = "shared-dma-pool";
                        reg = <0x10000000 0x40000>;
                        no-map;
                };

                vdev0vring0: vdev0vring0@10040000 {
                        compatible = "shared-dma-pool";
                        reg = <0x10040000 0x1000>;
                        no-map;
                };

                vdev0vring1: vdev0vring1@10041000 {
                        compatible = "shared-dma-pool";
                        reg = <0x10041000 0x1000>;
                        no-map;
                };

                vdev0buffer: vdev0buffer@10042000 {
                        compatible = "shared-dma-pool";
                        reg = <0x10042000 0x4000>;
                        no-map;
                };

                mcuram: mcuram@30000000 {
                        compatible = "shared-dma-pool";
                        reg = <0x30000000 0x40000>;
                        no-map;
                };

                retram: retram@38000000 {
                        compatible = "shared-dma-pool";
                        reg = <0x38000000 0x10000>;
                        no-map;
                };

                gpu_reserved: gpu@d4000000 {
                        reg = <0xd4000000 0x4000000>;
                        no-map;
                };
        };

        mlahb: ahb {
                compatible = "st,mlahb", "simple-bus";
                #address-cells = <1>;
                #size-cells = <1>;
                ranges;
                dma-ranges = <0x00000000 0x38000000 0x10000>,
                             <0x10000000 0x10000000 0x60000>,
                             <0x30000000 0x30000000 0x60000>;

                m4_rproc: m4@10000000 {
                        compatible = "st,stm32mp1-m4";
                        reg = <0x10000000 0x40000>,
                              <0x30000000 0x40000>,
                              <0x38000000 0x10000>;
                        /* ... */
                };
        };

The mlahb node comes from stm32mp151.dtsi.
Let's say we'd like to increase the allowed image size to 128 KiB by changing the RETRAM size from 0x10000 to 0x20000:

&retram {
	reg = <0x38000000 0x20000>;
};

&mlahb {
	dma-ranges = <0x00000000 0x38000000 0x20000>,
	             <0x10000000 0x10000000 0x60000>,
	             <0x30000000 0x30000000 0x60000>;

	m4_rproc {
		reg = <0x10000000 0x40000>,
		      <0x30000000 0x40000>,
		      <0x38000000 0x20000>;
	};
};
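
For reference, each dma-ranges entry reads as <child address, parent address, size> according to the devicetree specification, so here: an address as seen by the M4, the matching address as seen by Linux, and the window length. A minimal Python sketch of that lookup with the modified values from above (the function name m4_to_linux is made up for illustration, and this is not how the kernel driver is written):

```python
# dma-ranges entries from the overlay above: (M4 address, Linux address, size)
DMA_RANGES = [
    (0x00000000, 0x38000000, 0x20000),  # RETRAM window (enlarged)
    (0x10000000, 0x10000000, 0x60000),  # 0x10000000 window
    (0x30000000, 0x30000000, 0x60000),  # 0x30000000 window
]

def m4_to_linux(addr, length=1):
    """Translate an M4-side address; None if [addr, addr+length) fits in no window."""
    for child, parent, size in DMA_RANGES:
        if child <= addr and addr + length <= child + size:
            return parent + (addr - child)
    return None

print(hex(m4_to_linux(0x00000000)))                  # 0x38000000
print(m4_to_linux(0x00010000, 0x10000) is not None)  # True
print(m4_to_linux(0x00020000))                       # None
```

This only checks the device tree's own bookkeeping. It says nothing about whether memory physically exists behind the enlarged window.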

Unfortunately, when I try to start the firmware (echo start > /sys/class/remoteproc/remoteproc0/state),
I still get a kernel crash:

[  376.591779] 8<--- cut here ---
[  376.593848] Unhandled fault: imprecise external abort (0x1c06) at 0x004627a4
[  376.600885] pgd = fc518049
[  376.603571] [004627a4] *pgd=c5ef0835, *pte=00000000, *ppte=00000000
[  376.609836] Internal error: : 1c06 [#1] PREEMPT SMP ARM
[  376.615042] Modules linked in: rpmsg_tty rpmsg_core etnaviv gpu_sched spi_stm32 stm32_rproc sch_fq_codel ipv6
[  376.624955] CPU: 1 PID: 298 Comm: sh Not tainted 5.10.10 #1
[  376.630507] Hardware name: STM32 (Device Tree Support)
[  376.635648] PC is at memcpy+0x54/0x330
[  376.639374] LR is at 0x6263000a
[  376.642499] pc : [<c060d914>]    lr : [<6263000a>]    psr: 20000013
[  376.648753] sp : c5cfbe2c  ip : 2172656c  fp : e0d20000
[  376.653965] r10: 00000000  r9 : 00000000  r8 : 646e6168
[  376.659180] r7 : 206b6361  r6 : 626c6c61  r5 : 63206f4e  r4 : 09007265
[  376.665696] r3 : 6c646e61  r2 : 000011b4  r1 : e0bc9220  r0 : e0d30120
[  376.672213] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[  376.679336] Control: 10c5387d  Table: c5ea006a  DAC: 00000051
[  376.685072] Process sh (pid: 298, stack limit = 0x86cbec92)
[  376.690629] Stack: (0xc5cfbe2c to 0xc5cfc000)
[  376.694979] be20:                            00011354 00011354 00000001 00000000 e0d20000
[  376.703147] be40: e0bb9054 c0a0a584 00011354 10041000 00000000 00000000 00000000 c01175f4
[  376.711314] be60: 00000100 00000000 00000006 00000001 00000020 c34cd800 c5ea7cc0 c121f270
[  376.719482] be80: e0bb9000 c34cd820 c0a06c88 c34cd800 c34cd800 c34cd820 c5ea7cc0 c35c2840
[  376.727649] bea0: 00000000 00000000 00000000 c0be8e54 00000000 c34cd800 c5ea7cc0 c34cd820
[  376.735817] bec0: c35c2840 c0be9380 c34cd800 c34cd9e8 00000000 c34cd9f4 c34cd820 c0a07f78
[  376.743984] bee0: c5ea7cc0 3a12e441 00000cc0 c34cd820 c5ea7a80 00000006 c34cd800 c5ea6610
[  376.752151] bf00: 00000051 c0a097bc 00000006 c5ea6600 c5ea7a80 c5cfbf80 c5ea6610 c03a3c48
[  376.760317] bf20: 00000000 00000000 0000000b c3337c00 00000006 c03a3b50 00534fb0 c5cfbf80
[  376.768484] bf40: c2de3c00 00000004 0052f2f0 c0302348 c316d900 3a12e441 c316d900 00000000
[  376.776650] bf60: c1894900 c3337c00 c3337c00 00000000 00000000 c0100264 c5cfa000 c03026e8
[  376.784816] bf80: 00000000 00000000 c5cfa000 3a12e441 00000003 00000006 00534fb0 b6f6c1e0
[  376.792984] bfa0: 00000004 c0100060 00000006 00534fb0 00000001 00534fb0 00000006 00000000
[  376.801152] bfc0: 00000006 00534fb0 b6f6c1e0 00000004 00000000 bee1b7f0 00534fb0 0052f2f0
[  376.809318] bfe0: 00000004 bee1b7a0 b6e7135f b6dfd386 60000030 00000001 00000000 00000000
[  376.817517] [<c060d914>] (memcpy) from [<c0a0a584>] (rproc_elf_load_segments+0x1a0/0x288)
[  376.825669] [<c0a0a584>] (rproc_elf_load_segments) from [<c0be8e54>] (rproc_start+0x24/0x154)
[  376.834171] [<c0be8e54>] (rproc_start) from [<c0be9380>] (rproc_fw_boot+0x170/0x1a4)
[  376.841898] [<c0be9380>] (rproc_fw_boot) from [<c0a07f78>] (rproc_boot+0x150/0x1a4)
[  376.849541] [<c0a07f78>] (rproc_boot) from [<c0a097bc>] (state_store+0x40/0xc8)
[  376.856841] [<c0a097bc>] (state_store) from [<c03a3c48>] (kernfs_fop_write+0xf8/0x21c)
[  376.864750] [<c03a3c48>] (kernfs_fop_write) from [<c0302348>] (vfs_write+0xc0/0x318)
[  376.872481] [<c0302348>] (vfs_write) from [<c03026e8>] (ksys_write+0x60/0xe4)
[  376.879603] [<c03026e8>] (ksys_write) from [<c0100060>] (ret_fast_syscall+0x0/0x54)
[  376.887237] Exception stack(0xc5cfbfa8 to 0xc5cfbff0)
[  376.892282] bfa0:                   00000006 00534fb0 00000001 00534fb0 00000006 00000000
[  376.900451] bfc0: 00000006 00534fb0 b6f6c1e0 00000004 00000000 bee1b7f0 00534fb0 0052f2f0
[  376.908613] bfe0: 00000004 bee1b7a0 b6e7135f b6dfd386
[  376.913659] Code: f5d1f07c e8b151f8 e2522020 e8a051f8 (aafffffa)
[  376.919737] ---[ end trace e8ef0d82ecc3eec4 ]---
[  376.925593] 8<--- cut here ---
[  376.927384] Unhandled fault: imprecise external abort (0x1c06) at 0x004627a4
[  376.934422] pgd = 6e154e4d
[  376.937107] [004627a4] *pgd=c5c11835, *pte=00000000, *ppte=00000000

This only happens if the actual firmware image has a code+data section larger than 64 KiB.
Does anybody have any clue what I am missing in the device tree?
I should also try to get a proper backtrace of that crash...
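
Even without a better backtrace, one thing can be read off the register dump: on ARM, r0 is memcpy's destination pointer. If fp (0xe0d20000, a value that also appears on the stack) is the start of the mapping being written to, which is only an assumption here, the destination had just moved past the 64 KiB mark when the abort was reported. The arithmetic, as a small Python sketch:

```python
dest = 0xE0D30120  # r0 in the oops: memcpy destination when the abort was reported
base = 0xE0D20000  # fp in the oops: ASSUMED to be the start of the mapped region

offset = dest - base
print(hex(offset))         # 0x10120
print(offset > 64 * 1024)  # True
```

Since the abort is imprecise, the offending write may have happened somewhat earlier than r0 suggests.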

What's the hard limit for the M4 firmware size?
In this comment, @PatrickF said that it's at most 448 KiB:
https://community.st.com/t5/stm32-mpus-products/possible-advisable-to-increase-the-size-of-m4-coprocessor-text/m-p/51879/highlight/true#M52
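
For what it's worth, that figure matches the default dma-ranges windows quoted above if the 0x10000000 and 0x30000000 entries are counted only once. Whether those two entries really describe the same RAM is an assumption in this sketch, not something verified:

```python
KIB = 1024
retram_window = 0x10000  # first default dma-ranges entry
sram_window = 0x60000    # second entry; the third is ASSUMED to be the same RAM

print(retram_window // KIB)                  # 64
print(sram_window // KIB)                    # 384
print((retram_window + sram_window) // KIB)  # 448
```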

Is something colliding with the RETRAM section? Or is it because the M4 core won't execute directly from
RETRAM, but copies code into one of the mcuram regions?
See https://wiki.st.com/stm32mpu/wiki/STM32MP15_MCU_SRAM_internal_memory
Is it the mcuram region that I have to increase instead?

Yours sincerely,
Robin Haberkorn

ArnaudP · 2025-01-27

Do you also update the "vdev0XXXX" memory regions accordingly in the Linux Device tree?
https://elixir.bootlin.com/linux/v6.13-rc3/source/arch/arm/boot/dts/st/stm32mp15xx-dkx.dtsi#L22

rhaberkorn · 2025-01-29

@ArnaudP wrote:
Do you also update the "vdev0XXXX" memory regions accordingly in the Linux Device tree?
https://elixir.bootlin.com/linux/v6.13-rc3/source/arch/arm/boot/dts/st/stm32mp15xx-dkx.dtsi#L22

Yes, of course. For instance, I tried these adaptations on top of the default memory layout (which I summarized at the very beginning of this thread):

/ {
	reserved-memory {
		mcuram3: mcuram3@10020000 {
			compatible = "shared-dma-pool";
			reg = <0x10020000 0x3A000>;
			no-map;
		};

		/delete-node/vdev0vring0;
		vdev0vring0@1005A000 {
			compatible = "shared-dma-pool";
			reg = <0x1005A000 0x1000>;
			no-map;
		};

		/delete-node/vdev0vring1;
		vdev0vring1@1005B000 {
			compatible = "shared-dma-pool";
			reg = <0x1005B000 0x1000>;
			no-map;
		};

		/delete-node/vdev0buffer;
		vdev0buffer@1005C000 {
			compatible = "shared-dma-pool";
			reg = <0x1005C000 0x4000>;
			no-map;
		};
	};
};

&mcuram2 {
	reg = <0x10000000 0x20000>;
};

&m4_rproc {
	memory-region = <&retram>, <&mcuram>, <&mcuram2>, <&mcuram3>,
	                <&vdev0vring0>, <&vdev0vring1>, <&vdev0buffer>;

	/* Why are the vdev0vringX sections missing in this node? */
	reg = <0x10000000 0x20000>, /* mcuram2 */
	      <0x10020000 0x3A000>, /* mcuram3 */
	      <0x30000000 0x40000>, /* mcuram (???) */
	      <0x38000000 0x10000>; /* retram */

	m4_system_resources {
		status = "okay";
	};
};

I had analogous declarations in the Zephyr device tree:

        zephyr,ipc_shm = &mcuram4;
        /* ... */

    mcuram4: memory4@1005A000 {
        compatible = "mmio-sram";
        reg = <0x1005A000 0x6000>;
    };
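
As a sanity check, the addresses in the two snippets above can be tested for overlaps and for fitting into the 0x60000 window from dma-ranges. A small Python sketch (values copied from the snippets; the helper name is made up):

```python
# (name, start, size) as declared in the Linux overlay above
regions = [
    ("mcuram2",     0x10000000, 0x20000),
    ("mcuram3",     0x10020000, 0x3A000),
    ("vdev0vring0", 0x1005A000, 0x1000),
    ("vdev0vring1", 0x1005B000, 0x1000),
    ("vdev0buffer", 0x1005C000, 0x4000),
]
window = (0x10000000, 0x60000)         # matching dma-ranges entry
zephyr_ipc_shm = (0x1005A000, 0x6000)  # mcuram4 in the Zephyr tree

def overlaps(regs):
    """Return name pairs of address-adjacent regions that intersect."""
    regs = sorted(regs, key=lambda r: r[1])
    return [(a[0], b[0]) for a, b in zip(regs, regs[1:]) if a[1] + a[2] > b[1]]

end = max(start + size for _, start, size in regions)
print(overlaps(regions))                       # []
print(hex(end), end <= window[0] + window[1])  # 0x10060000 True
ipc = [r for r in regions if r[0].startswith("vdev0")]
print(zephyr_ipc_shm == (ipc[0][1], sum(r[2] for r in ipc)))  # True
```

This only covers the address arithmetic; it does not check how the nodes are named, labelled or referenced.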

Unfortunately, I could not get rpmsg_create_ept() to initialize with these settings. I didn't do any further debugging, as this reshuffling of memory sections wasn't strictly necessary for what I was trying to achieve. Instead, I just declared a third code section at 0x10046000 without moving the IPC buffers from their default location.

rhaberkorn · 2025-02-21

@rhaberkorn wrote:
I did get it to work with Zephyr's code relocation feature.

Except that it didn't actually work. The problem is that global symbols/variables aren't correctly initialized. I have XIP enabled and use NOCOPY, since there shouldn't be any reason to copy anything around after memory has been initialized by the Linux kernel. Let's say I relocate a file like this:

zephyr_code_relocate(FILES src/main.c LOCATION MCUSRAM1 NOCOPY NOKEEP)

I would expect both code (text) and (ro)data to be mapped into the MCUSRAM1 section. The linker should output ELF program header entries that map the ELF file contents (offset) to the appropriate addresses in MCUSRAM1 (vaddr; in my case beginning at 0x10000000). For the data section, I get something like the following ELF entry (cf. elfdump):

p_type: PT_LOAD
p_offset: 58204
p_vaddr: 0x10010000
p_paddr: 0xe23c
p_filesz: 648
p_memsz: 648
p_flags: PF_W|PF_R
p_align: 4

I can confirm that the initializer of a certain global variable is in the file after the given p_offset, which gives that symbol a certain vaddr. That vaddr is the same as the address assigned by the linker (i.e. the address C thinks the symbol is at). If the Linux kernel's drivers/remoteproc/remoteproc_elf_loader.c loaded the file contents to the correct virtual address (beginning at 0x10010000), everything should be fine. But that is obviously not what happens.

I thought it might be because the loader actually uses paddr (see rproc_elf_load_segments()) and tried patching it to use vaddr, but that didn't help either. I do not yet understand where the ELF header's paddr fields actually come from and why they sometimes differ from the vaddr fields. Perhaps the memory is also overwritten by something else, as it appears to contain garbage data. It doesn't overlap with any ELF entry, though.

I am certain this can be fixed somehow and I might just be overlooking something, but currently this breaks code relocation for me.
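
On the paddr question: with GNU ld, p_paddr reflects the load address (LMA) of the sections in a segment and p_vaddr their run-time address (VMA). The two differ whenever the linker script gives a section a separate load address, for example with an AT clause. A minimal Python sketch that unpacks an ELF32 program header, fed with a synthetic header rebuilt from the values quoted above (not read from the real firmware file):

```python
import struct

PT_LOAD = 1
PF_X, PF_W, PF_R = 1, 2, 4
PHDR_FMT = "<8I"  # Elf32_Phdr, little-endian: eight 32-bit words

def parse_phdr(raw):
    """Unpack one ELF32 program header into a dict."""
    names = ("p_type", "p_offset", "p_vaddr", "p_paddr",
             "p_filesz", "p_memsz", "p_flags", "p_align")
    return dict(zip(names, struct.unpack(PHDR_FMT, raw)))

# Synthetic header rebuilt from the values quoted in the post
raw = struct.pack(PHDR_FMT, PT_LOAD, 58204, 0x10010000, 0xE23C,
                  648, 648, PF_W | PF_R, 4)
ph = parse_phdr(raw)
print(hex(ph["p_vaddr"]), hex(ph["p_paddr"]))  # 0x10010000 0xe23c
print(ph["p_vaddr"] == ph["p_paddr"])          # False
```

On the real file, readelf -l lists the same VirtAddr and PhysAddr columns for every segment, which makes it easy to spot which segments have a load address different from their run-time address.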