Decreasing the RPMSG buffer size to 256 bytes in the Linux kernel returns "Failed to insert all hardware breakpoints.." when flashing or debugging M4 firmware from Cube IDE.

KChar.1
Senior

Hi there,

I am trying to decrease the RPMSG buffer size from 512 to 256 bytes. In the past I successfully increased the buffer size to 1024 and allocated more memory for 32 VRINGS. When I attempt to decrease the buffer size below 512 on the Linux side, Cube IDE prompts "Failed to insert all hardware breakpoints...". I am really curious how openamp_conf relates to hardware breakpoints.

The configuration for both sides is as follows:

Linux Side

virtio_rpmsg_bus.c :

#define MAX_RPMSG_NUM_BUFS	(256)
#define MAX_RPMSG_BUF_SIZE	(256)

stm32mp15xx-dkx.dtsi:

vdev0vring0: vdev0vring0@10040000 {
	compatible = "shared-dma-pool";
	reg = <0x10040000 0x0800>;
	no-map;
};

vdev0vring1: vdev0vring1@10040800 {
	compatible = "shared-dma-pool";
	reg = <0x10040800 0x0800>;
	no-map;
};

vdev0buffer: vdev0buffer@10041000 {
	compatible = "shared-dma-pool";
	reg = <0x10041000 0x10000>;
	no-map;
};

M4 Side

STM32MP157CACX_RAM.ld

MEMORY
{
  RETRAM_interrupts	(xrw)	: ORIGIN = 0x00000000,	LENGTH = 0x00000298
  FLASH_text		(rx)	: ORIGIN = 0x10000000,	LENGTH = 128K
  RAM1_data		(xrw)	: ORIGIN = 0x10020000,	LENGTH = 128K
  RAM2_ipc_shm		(xrw)	: ORIGIN = 0x10040000,	LENGTH = 0x00011000 
}

1 ACCEPTED SOLUTION
ArnaudP
Senior

The size of the buffers impacts only the vdev0buffer memory area:

vdev0buffer_size = buffer_size * number_of_buffers * 2

=> the factor of 2 is for RX + TX

The number of buffers used impacts not only vdev0buffer but also the vdev0vringX areas.

Indeed, the vrings contain the buffer descriptor structures plus the used and available ring structures.

=> the calculation is here: https://elixir.bootlin.com/linux/v5.10.87/source/drivers/remoteproc/remoteproc_core.c#L332

https://elixir.bootlin.com/linux/v5.10.87/source/include/uapi/linux/virtio_ring.h#L203

So if you increase the number of buffers, you may have to increase the vdev0vringX size accordingly.
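To make the linked calculation concrete, here is a minimal sketch of it, assuming the upstream struct sizes (16-byte `vring_desc`, 2-byte `__virtio16`, 8-byte `vring_used_elem`) and the 4K `PAGE_ALIGN` that remoteproc applies to each vring carveout:

```python
PAGE_SIZE = 4096

def align_up(x, a):
    return (x + a - 1) & ~(a - 1)

def vring_size(num, align=16):
    """Mirrors vring_size() from virtio_ring.h: descriptor table plus
    the avail ring (rounded up to `align`), followed by the used ring."""
    desc_avail = 16 * num + 2 * (3 + num)   # vring_desc[num] + avail ring
    used = 2 * 3 + 8 * num                  # used ring header + used elems
    return align_up(desc_avail, align) + used

def vdev0vring_region(num):
    """remoteproc_core.c page-aligns each vring's carveout."""
    return align_up(vring_size(num), PAGE_SIZE)

print(hex(vdev0vring_region(96)))    # 0x1000: 96 buffers fit in one 4K page
print(hex(vdev0vring_region(256)))   # 0x2000: 256 buffers need two pages
```

So the vring carveout grows in whole 4K pages as the number of buffers increases, independently of the buffer size itself.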


4 REPLIES
ArnaudP
Senior

Hello,

I suggest you do not update stm32mp15xx-dkx.dtsi; vdev0vring<X> should be 4K page-aligned.

Then on the Cube side you also have to update the size in the OpenAMP library:

https://github.com/STMicroelectronics/STM32CubeMP1/blob/master/Middlewares/Third_Party/OpenAMP/open-amp/lib/include/openamp/rpmsg_virtio.h#L26

Hope this helps.

KChar.1
Senior

Hi @ArnaudP,

Thanks a lot for your reply and suggestion.

Apologies, I forgot to mention in the original post that I had already changed the rpmsg_virtio.h buffer size to 256. This works if I leave the vdev0vring<X> at 4K or above. To my understanding the alignment happens in openamp_conf.h on the M4 side with a preprocessor directive. I am wondering if I am missing something here:

#endif
 
#if defined LINUX_RPROC_MASTER
#define VRING_RX_ADDRESS     ((unsigned int)-1)  /* allocated by Master processor: CA7 */
#define VRING_TX_ADDRESS     ((unsigned int)-1)  /* allocated by Master processor: CA7 */
#define VRING_BUFF_ADDRESS   ((unsigned int)-1)  /* allocated by Master processor: CA7 */
#define VRING_ALIGNMENT         16        /* fixed to match with linux constraint */
#define VRING_NUM_BUFFS         96		  /* number of rpmsg buffer */
#else
 
#define VRING_RX_ADDRESS     0x10040000             /* allocated by Master processor: CA7 */
#define VRING_TX_ADDRESS     0x10040400             /* allocated by Master processor: CA7 */
#define VRING_BUFF_ADDRESS   0x10040800             /* allocated by Master processor: CA7 */
#define VRING_ALIGNMENT      4         /* fixed to match with 4k page alignment requested by linux  */
#define VRING_NUM_BUFFS      4             /* number of rpmsg buffer */
#endif

If I leave the vdev0vring<X> at 4K, my memory allocation for buffer size x vrings appears identical whether I use 128B, 256B, or 512B.

To explain myself a bit better: I am able to achieve a buffer size of 1024 with 48 vrings by combining SRAM2+SRAM3 on the Linux side. That configuration looks like this:

vdev0vring0: vdev0vring0@10040000 {
	compatible = "shared-dma-pool";
	reg = <0x10040000 0x2000>;
	no-map;
};

vdev0vring1: vdev0vring1@10042000 {
	compatible = "shared-dma-pool";
	reg = <0x10042000 0x2000>;
	no-map;
};

vdev0buffer: vdev0buffer@10044000 {
	compatible = "shared-dma-pool";
	reg = <0x10044000 0x40000>;
	no-map;
};

Using a similar configuration, I can increase the number of vrings to 96 by dropping the buffer size to 512. That configuration looks like this:

vdev0vring0: vdev0vring0@10040000 {
	compatible = "shared-dma-pool";
	reg = <0x10040000 0x1000>;
	no-map;
};

vdev0vring1: vdev0vring1@10041000 {
	compatible = "shared-dma-pool";
	reg = <0x10041000 0x1000>;
	no-map;
};

vdev0buffer: vdev0buffer@10042000 {
	compatible = "shared-dma-pool";
	reg = <0x10042000 0x40000>;
	no-map;
};

Based on this I hypothesised that I could increase the number of vrings to 192 by decreasing the buffer size to 256. If I do not change the vdev0vring<X> size from 4K, the memory allocation stays the same and allows a maximum of 96 vrings regardless of buffer size. If I decrease the vdev0vring<X> to 2K, I get the "Failed to insert all hardware breakpoints.." error in Cube IDE.
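For reference, a rough size check of the hypothesised 192-buffer / 256-byte configuration, assuming the vring layout from virtio_ring.h (16-byte descriptors, 2-byte avail entries, 8-byte used elements) and the 4K page alignment remoteproc applies to each vring carveout:

```python
PAGE = 4096

def align(x, a):
    return (x + a - 1) & ~(a - 1)

num, buf_size = 192, 256
# descriptor table + avail ring (16-byte aligned), then the used ring,
# rounded up to whole 4K pages per vdev0vring<X> carveout
vring_region = align(align(16 * num + 2 * (3 + num), 16) + 6 + 8 * num, PAGE)
buffer_region = buf_size * num * 2          # RX + TX buffer pools

print(hex(vring_region))    # 0x2000: each vring would need two 4K pages
print(hex(buffer_region))   # 0x18000 for vdev0buffer
```

Under these assumptions a 2K vdev0vring<X> carveout could never suffice: remoteproc rounds each vring up to at least one full 4K page, and 192 buffers would in fact need two.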

Any pointers or similar experiences are highly appreciated!


KChar.1
Senior

Thanks so much for this @ArnaudP! After checking the calculation I was able to see my mistake. I was under the false impression that vdev0vring<X> represents the buffer size and vdev0buffer represents number_of_buffers * buffer_size. After checking remoteproc_core.c I could see what you indicated. By increasing both vdev0vring<X> and vdev0buffer I was able to get 256 buffers with a buffer size of 128, or 48 buffers with a buffer size of 1024. I leave my configuration below in case someone else finds it useful.

RPMSG buffer size in virtio_rpmsg_bus.c: 128

RPMSG buffer size in rpmsg_virtio.h: 128

VRING_NUM_BUFFS in openamp_conf.h: 256

vdev0vring0: vdev0vring0@10040000 {
	compatible = "shared-dma-pool";
	reg = <0x10040000 0x2000>;
	no-map;
};

vdev0vring1: vdev0vring1@10042000 {
	compatible = "shared-dma-pool";
	reg = <0x10042000 0x2000>;
	no-map;
};

vdev0buffer: vdev0buffer@10044000 {
	compatible = "shared-dma-pool";
	reg = <0x10044000 0x40000>;
	no-map;
};

which results in the following resource table:

Entry 0 is of type vdev
  ID 7
  Notify ID 0
  Device features 0x1
  Guest features 0x1
  Config length 0x0
  Status 0x7
  Number of vrings 2
  Reserved (should be zero) [0][0]
 
  Vring 0
    Device Address 0x10040000
    Alignment 16
    Number of buffers 256
    Notify ID 0
    Physical Address 0x0
 
  Vring 1
    Device Address 0x10042000
    Alignment 16
    Number of buffers 256
    Notify ID 1
    Physical Address 0x0
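As a sanity check of this final configuration, the required sizes can be recomputed, assuming the vring layout from virtio_ring.h (16-byte descriptors, 2-byte ring entries, 8-byte used elements) and remoteproc's 4K page alignment of each vring carveout:

```python
PAGE = 4096

def align(x, a):
    return (x + a - 1) & ~(a - 1)

num, buf_size = 256, 128
# one vring: descriptor table + avail ring (16-byte aligned), then the
# used ring, rounded up to whole 4K pages by remoteproc
vring_needed = align(align(16 * num + 2 * (3 + num), 16) + 6 + 8 * num, PAGE)
buffer_needed = buf_size * num * 2          # RX + TX buffer pools

print(hex(vring_needed))    # 0x2000, matching each vdev0vring<X> carveout
print(hex(buffer_needed))   # 0x10000; the 0x40000 vdev0buffer leaves headroom
```

Both carveouts above are at least as large as required, which is all the driver needs; the oversized vdev0buffer simply leaves room to grow the buffer count or size later.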