STM32H7S78-DK nema_cl_submit hangs, on STM32U5G9J-DK2 it works fine. - Fixed

Jack3
Senior III

NemaGFX already works baremetal on the STM32U5G9J-DK2.
It probably only needs a little tweaking to get it working baremetal on the STM32H7S78-DK.

So the software runs from external flash (XSPI2 0x70000000) and the frame buffer is located in external PSRAM (XSPI1 0x90000000).

When the MCU writes to the frame buffer, it's displayed correctly on the screen.

When I then tried to use NemaGFX, it hung or triggered memory faults (fixed by now).

I know the GPU reads its data straight from RAM and does not see the CPU's data cache, so I've checked my MPU settings to rule out cache issues.

My LinkerScript.ld:

/*
******************************************************************************
** @file        : LinkerScript.ld
** @author      : STM32CubeIDE
**  Abstract    : Linker script for STM32H7Sxx Device
**                       64KBytes FLASH
**                      456KBytes RAM
**                      256MBytes EXTFLASH
**                       32MBytes EXTRAM
**
**  Target      : STM32H7S7L8
******************************************************************************
** @attention
**
** Copyright (c) 2025 STMicroelectronics.
** All rights reserved.
**
** This software is licensed under terms that can be found in the LICENSE file
** in the root directory of this software component.
** If no LICENSE file comes with this software, it is provided AS-IS.
**
******************************************************************************
*/

/* Entry Point */
ENTRY(Reset_Handler)

/* Highest address of the user mode stack */
_estack = ORIGIN(DTCM) + LENGTH(DTCM); /* end of "DTCM" Ram type memory */

_Min_Heap_Size   = 0x1000;      /* required amount of heap */
_Min_Stack_Size  = 0x2000;      /* required amount of stack */

/* Memories definition */
MEMORY
{
  RAM       (rw)  : ORIGIN = 0x24000000, LENGTH = 0x0006e000  /* 0x24000000 - 0x2406E000 440kB */
  RAM_CMD   (rw)  : ORIGIN = 0x2406E000, LENGTH = 0x00004000  /* 0x2406E000 - 0x24072000 16kB Nemagfx_Memory_Pool_Buffer nemagfx_pool_mem */

  ITCM      (xrw) : ORIGIN = 0x00000000, LENGTH = 0x00010000
  DTCM      (rw)  : ORIGIN = 0x20000000, LENGTH = 0x00010000
  SRAMAHB   (rw)  : ORIGIN = 0x30000000, LENGTH = 0x00008000
  BKPSRAM   (rw)  : ORIGIN = 0x38800000, LENGTH = 0x00001000

  FLASH     (xr)  : ORIGIN = 0x70000000, LENGTH = 0x00200000  /* XSPI2 0x70000000 - 0x701FFFFF EXTFLASH 2MB */
  FLASH_GFX (r)   : ORIGIN = 0x70200000, LENGTH = 0x0FE00000  /* XSPI2 0x70200000 - 0x7FFFFFFF EXTFLASH 254MB */

  EXTFLASH  (xr)  : ORIGIN = 0x70000000, LENGTH = 0x10000000  /* XSPI2 0x70000000 - 0x7FFFFFFF EXTFLASH 256MB */
  
  EXTRAM    (rw)  : ORIGIN = 0x90000000, LENGTH = 0x02000000  /* XSPI1 0x90000000 - 0x92000000 EXTRAM    32MB */
}

/* Sections */
SECTIONS
{
  /* The startup code into "FLASH" FLASH type memory */
  .isr_vector :
  {
    . = ALIGN(4);
    KEEP(*(.isr_vector)) /* Startup code */
    . = ALIGN(4);
  } >FLASH

  /* The program code and other data into "FLASH" FLASH type memory */
  .text :
  {
    . = ALIGN(4);
    *(.text)           /* .text sections (code) */
    *(.text*)          /* .text* sections (code) */
    *(.glue_7)         /* glue arm to thumb code */
    *(.glue_7t)        /* glue thumb to arm code */
    *(.eh_frame)

    KEEP (*(.init))
    KEEP (*(.fini))

    . = ALIGN(4);
    _etext = .;        /* define a global symbols at end of code */
  } >FLASH

  /* Constant data into "FLASH" FLASH type memory */
  .rodata :
  {
    . = ALIGN(4);
    *(.rodata)         /* .rodata sections (constants, strings, etc.) */
    *(.rodata*)        /* .rodata* sections (constants, strings, etc.) */
    . = ALIGN(4);
  } >FLASH

  .ARM.extab (READONLY) : /* The READONLY keyword is only supported in GCC11 and later, remove it if using GCC10 or earlier. */
  {
    . = ALIGN(4);
    *(.ARM.extab* .gnu.linkonce.armextab.*)
    . = ALIGN(4);
  } >FLASH
  .ARM (READONLY) : /* The READONLY keyword is only supported in GCC11 and later, remove it if using GCC10 or earlier. */
  {
    . = ALIGN(4);
    __exidx_start = .;
    *(.ARM.exidx*)
    __exidx_end = .;
    . = ALIGN(4);
  } >FLASH

  .preinit_array (READONLY) : /* The READONLY keyword is only supported in GCC11 and later, remove it if using GCC10 or earlier. */
  {
    . = ALIGN(4);
    PROVIDE_HIDDEN (__preinit_array_start = .);
    KEEP (*(.preinit_array*))
    PROVIDE_HIDDEN (__preinit_array_end = .);
    . = ALIGN(4);
  } >FLASH

  .init_array (READONLY) : /* The READONLY keyword is only supported in GCC11 and later, remove it if using GCC10 or earlier. */
  {
    . = ALIGN(4);
    PROVIDE_HIDDEN (__init_array_start = .);
    KEEP (*(SORT(.init_array.*)))
    KEEP (*(.init_array*))
    PROVIDE_HIDDEN (__init_array_end = .);
    . = ALIGN(4);
  } >FLASH

  .fini_array (READONLY) : /* The READONLY keyword is only supported in GCC11 and later, remove it if using GCC10 or earlier. */
  {
    . = ALIGN(4);
    PROVIDE_HIDDEN (__fini_array_start = .);
    KEEP (*(SORT(.fini_array.*)))
    KEEP (*(.fini_array*))
    PROVIDE_HIDDEN (__fini_array_end = .);
    . = ALIGN(4);
  } >FLASH

  /* Used by the startup to initialize data */
  _sidata = LOADADDR(.data);

  /* Initialized data sections into "RAM" Ram type memory */
  .data :
  {
    . = ALIGN(4);
    _sdata = .;        /* create a global symbol at data start */
    *(.data)           /* .data sections */
    *(.data*)          /* .data* sections */
    *(.RamFunc)        /* .RamFunc sections */
    *(.RamFunc*)       /* .RamFunc* sections */

    . = ALIGN(4);
    _edata = .;        /* define a global symbol at data end */

  } >RAM AT> FLASH

  /* Uninitialized data section into "RAM" Ram type memory */
  . = ALIGN(4);
  .bss :
  {
    /* This is used by the startup in order to initialize the .bss section */
    _sbss = .;         /* define a global symbol at bss start */
    __bss_start__ = _sbss;
    *(.bss)
    *(.bss*)
    *(COMMON)

    . = ALIGN(4);
    _ebss = .;         /* define a global symbol at bss end */
    __bss_end__ = _ebss;
  } >RAM

  /* User_heap_stack section, used to check that there is enough "RAM" Ram  type memory left */
  ._user_heap_stack :
  {
    . = ALIGN(8);
    PROVIDE ( end = . );
    PROVIDE ( _end = . );
    . = . + _Min_Heap_Size;
    . = . + _Min_Stack_Size;
    . = ALIGN(8);
  } >DTCM

  /* Remove information from the compiler libraries */
  /DISCARD/ :
  {
    libc.a ( * )
    libm.a ( * )
    libgcc.a ( * )
  }

  .ARM.attributes 0 : { *(.ARM.attributes) }

  BufferSection (NOLOAD) :
  {
    *(Nemagfx_Framebuffer Nemagfx_Framebuffer.*)
    *(.gnu.linkonce.r.*)
    . = ALIGN(0x8);
  } >EXTRAM
  
  UncachedSection (NOLOAD) :
  {
    *(Nemagfx_Memory_Pool_Buffer Nemagfx_Memory_Pool_Buffer.*)
    *(.gnu.linkonce.r.*)
    . = ALIGN(0x8);

    *(Nemagfx_Ring_Buffer Nemagfx_Ring_Buffer.*)
    *(.gnu.linkonce.r.*)
    . = ALIGN(0x8);
  
    *(Nemagfx_Memory_Pool_Buffer_RAM_CMD Nemagfx_Memory_Pool_Buffer_RAM_CMD.*) /* nemagfx_pool_mem 16kB 0x2406E000 */
    *(.gnu.linkonce.r.*)
    . = ALIGN(0x8);
  } >RAM_CMD
  
  FontFlashSection :
  {
    *(FontFlashSection FontFlashSection.*)
    *(.gnu.linkonce.r.*)
    . = ALIGN(0x4);
  } >FLASH_GFX

  TextFlashSection :
  {
    *(TextFlashSection TextFlashSection.*)
    *(.gnu.linkonce.r.*)
    . = ALIGN(0x4);
  } >FLASH_GFX

  ExtFlashSection :
  {
    *(ExtFlashSection ExtFlashSection.*)
    *(.gnu.linkonce.r.*)
    . = ALIGN(0x4);
  } >FLASH_GFX
  
}
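
The sizes in the MEMORY block can be cross-checked with a few lines of C. This is only a host-side sketch: the constants are copied by hand from the linker script above and from the display defines in nema_hal.h further down, and the helper names extram_bytes_used and extram_fits are made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the MEMORY block of the linker script. */
#define RAM_ORIGIN        0x24000000u
#define RAM_LENGTH        0x0006E000u
#define RAM_CMD_ORIGIN    0x2406E000u
#define EXTRAM_LENGTH     0x02000000u

/* Display constants as defined in nema_hal.h for the STM32H7S7xx. */
#define DISPLAY_SIZE_W              800u
#define DISPLAY_SIZE_H              480u
#define DISPLAY_BYTES_PER_PIXEL     4u
#define DISPLAY_NO_OF_FRAMEBUFFERS  20u
#define DISPLAY_FRAMEBUFFER_SIZE    (DISPLAY_SIZE_W * DISPLAY_SIZE_H * DISPLAY_BYTES_PER_PIXEL)

/* RAM_CMD has to start exactly where RAM ends, otherwise they overlap or leave a gap. */
static_assert(RAM_ORIGIN + RAM_LENGTH == RAM_CMD_ORIGIN, "RAM and RAM_CMD are not contiguous");

/* Bytes placed in EXTRAM: all framebuffers plus one stencil buffer of the same size. */
static inline uint32_t extram_bytes_used(void)
{
    return (DISPLAY_NO_OF_FRAMEBUFFERS + 1u) * DISPLAY_FRAMEBUFFER_SIZE;
}

/* Returns 1 when the buffers fit into the 32 MB EXTRAM region, 0 otherwise. */
static inline int extram_fits(void)
{
    return extram_bytes_used() <= EXTRAM_LENGTH;
}
```

With 20 framebuffers and one stencil buffer, 21 x 1536000 = 32256000 bytes are needed, which fits in the 33554432 bytes of EXTRAM.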

This is my MPU configuration:

static void MPU_Config(void)
{
  MPU_Region_InitTypeDef MPU_InitStruct = {0};

  /* Disables the MPU */
  HAL_MPU_Disable();

  /* Disables all MPU regions */
  for(uint8_t i=0; i<__MPU_REGIONCOUNT; i++)
  {
    HAL_MPU_DisableRegion(i);
  }

  /** Initializes and configures the Region and the memory to be protected
  */
  MPU_InitStruct.Enable = MPU_REGION_ENABLE;
  MPU_InitStruct.Number = MPU_REGION_NUMBER0;
  MPU_InitStruct.BaseAddress = 0x00000000;
  MPU_InitStruct.Size = MPU_REGION_SIZE_4GB;
  MPU_InitStruct.SubRegionDisable = 0x87;
  MPU_InitStruct.TypeExtField = MPU_TEX_LEVEL0;
  MPU_InitStruct.AccessPermission = MPU_REGION_NO_ACCESS;
  MPU_InitStruct.DisableExec = MPU_INSTRUCTION_ACCESS_DISABLE;
  MPU_InitStruct.IsShareable = MPU_ACCESS_SHAREABLE;
  MPU_InitStruct.IsCacheable = MPU_ACCESS_NOT_CACHEABLE;
  MPU_InitStruct.IsBufferable = MPU_ACCESS_NOT_BUFFERABLE;

  HAL_MPU_ConfigRegion(&MPU_InitStruct);

  /**
   * Initializes and configures the Region and the memory to be protected
   * XSPI2 Range 0x70000000 - 0x7FFFFFFF, Size 0x10000000 = 256MB
   */
  MPU_InitStruct.Number = MPU_REGION_NUMBER1;
  MPU_InitStruct.BaseAddress = 0x70000000;
  MPU_InitStruct.Size = MPU_REGION_SIZE_256MB;
  MPU_InitStruct.SubRegionDisable = 0x0;
  MPU_InitStruct.TypeExtField = MPU_TEX_LEVEL1;
  MPU_InitStruct.AccessPermission = MPU_REGION_FULL_ACCESS;
  MPU_InitStruct.DisableExec = MPU_INSTRUCTION_ACCESS_DISABLE;
  MPU_InitStruct.IsShareable = MPU_ACCESS_NOT_SHAREABLE;
  MPU_InitStruct.IsCacheable = MPU_ACCESS_CACHEABLE;
  MPU_InitStruct.IsBufferable = MPU_ACCESS_BUFFERABLE;

  HAL_MPU_ConfigRegion(&MPU_InitStruct);

  /**
   * Initializes and configures the Region and the memory to be protected
   * XSPI2 Range 0x70000000 - 0x70200000, Size 0x00200000 = 2MB
   */
  MPU_InitStruct.Number = MPU_REGION_NUMBER2;
  MPU_InitStruct.BaseAddress = 0x70000000;
  MPU_InitStruct.Size = MPU_REGION_SIZE_2MB;
  MPU_InitStruct.SubRegionDisable = 0x0;
  MPU_InitStruct.TypeExtField = MPU_TEX_LEVEL1;
  MPU_InitStruct.AccessPermission = MPU_REGION_FULL_ACCESS;
  MPU_InitStruct.DisableExec = MPU_INSTRUCTION_ACCESS_ENABLE;
  MPU_InitStruct.IsShareable = MPU_ACCESS_NOT_SHAREABLE;
  MPU_InitStruct.IsCacheable = MPU_ACCESS_CACHEABLE;
  MPU_InitStruct.IsBufferable = MPU_ACCESS_BUFFERABLE;

  HAL_MPU_ConfigRegion(&MPU_InitStruct);

  /**
   * Initializes and configures the Region and the memory to be protected
   * XSPI1 Range 0x90000000 - 0x91FFFFFF, Size 0x02000000 = 32MB
   */
  MPU_InitStruct.Number = MPU_REGION_NUMBER3;
  MPU_InitStruct.BaseAddress = 0x90000000;
  MPU_InitStruct.Size = MPU_REGION_SIZE_32MB;
  MPU_InitStruct.TypeExtField = MPU_TEX_LEVEL1;
  MPU_InitStruct.AccessPermission = MPU_REGION_FULL_ACCESS;
  MPU_InitStruct.DisableExec = MPU_INSTRUCTION_ACCESS_DISABLE;
  MPU_InitStruct.IsShareable = MPU_ACCESS_NOT_SHAREABLE;
  MPU_InitStruct.IsCacheable = MPU_ACCESS_NOT_CACHEABLE;
  MPU_InitStruct.IsBufferable = MPU_ACCESS_NOT_BUFFERABLE;

  HAL_MPU_ConfigRegion(&MPU_InitStruct);

  /**
   * Initializes and configures the Region and the memory to be protected
   */
  MPU_InitStruct.Number = MPU_REGION_NUMBER4;
  MPU_InitStruct.BaseAddress = 0x20000000;
  MPU_InitStruct.Size = MPU_REGION_SIZE_64KB;
  MPU_InitStruct.TypeExtField = MPU_TEX_LEVEL1;
  MPU_InitStruct.AccessPermission = MPU_REGION_FULL_ACCESS;
  MPU_InitStruct.DisableExec = MPU_INSTRUCTION_ACCESS_DISABLE;
  MPU_InitStruct.IsShareable = MPU_ACCESS_NOT_SHAREABLE;
  MPU_InitStruct.IsCacheable = MPU_ACCESS_NOT_CACHEABLE;
  MPU_InitStruct.IsBufferable = MPU_ACCESS_NOT_BUFFERABLE;

  HAL_MPU_ConfigRegion(&MPU_InitStruct);

  /**
   * Initializes and configures the Region and the memory to be protected
   * RAM Range 0x24000000 - 0x24080000, Size 0x00080000 = 512kB
   */
  MPU_InitStruct.Number = MPU_REGION_NUMBER5;
  MPU_InitStruct.BaseAddress = 0x24000000;
  MPU_InitStruct.Size = MPU_REGION_SIZE_512KB;
  MPU_InitStruct.TypeExtField = MPU_TEX_LEVEL1;
  MPU_InitStruct.AccessPermission = MPU_REGION_FULL_ACCESS;
  MPU_InitStruct.DisableExec = MPU_INSTRUCTION_ACCESS_DISABLE;
  MPU_InitStruct.IsShareable = MPU_ACCESS_SHAREABLE;
  MPU_InitStruct.IsCacheable = MPU_ACCESS_CACHEABLE;
  MPU_InitStruct.IsBufferable = MPU_ACCESS_BUFFERABLE;

  HAL_MPU_ConfigRegion(&MPU_InitStruct);

  /**
   * Initializes and configures the Region and the memory to be protected
   * RAM_CMD for GPU2D - Nemagfx_Memory_Pool_Buffer nemagfx_pool_mem 16kB
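   * RAM Range 0x2406E000 - 0x24072000, Size 0x00004000 = 16kB
   * NOTE: 0x2406E000 is 8kB aligned, not 16kB aligned; ARMv7-M MPU regions need a base aligned to their size.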
   */
  MPU_InitStruct.Number = MPU_REGION_NUMBER6;
  MPU_InitStruct.BaseAddress = 0x2406E000;
  MPU_InitStruct.Size = MPU_REGION_SIZE_16KB;
  MPU_InitStruct.TypeExtField = MPU_TEX_LEVEL0;
  MPU_InitStruct.AccessPermission = MPU_REGION_FULL_ACCESS;
  MPU_InitStruct.DisableExec = MPU_INSTRUCTION_ACCESS_DISABLE;
  MPU_InitStruct.IsShareable = MPU_ACCESS_SHAREABLE;
  MPU_InitStruct.IsCacheable = MPU_ACCESS_NOT_CACHEABLE; /* For GPU2D - no cache! */
  MPU_InitStruct.IsBufferable = MPU_ACCESS_BUFFERABLE;

  HAL_MPU_ConfigRegion(&MPU_InitStruct);

  /* Enables the MPU */
  HAL_MPU_Enable(MPU_PRIVILEGED_DEFAULT);
}
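
On the Cortex-M7 (ARMv7-M) MPU, a region's base address has to be a multiple of the region size. A quick way to check the regions in MPU_Config is a pair of small helpers like the ones below. This is a host-side sketch, and mpu_base_is_aligned and mpu_align_down are hypothetical names, not part of the HAL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when base is a multiple of size; size must be a non-zero power of two. */
static inline bool mpu_base_is_aligned(uint32_t base, uint32_t size)
{
    assert(size != 0u && (size & (size - 1u)) == 0u);
    return (base & (size - 1u)) == 0u;
}

/* Nearest size-aligned address at or below base; size must be a non-zero power of two. */
static inline uint32_t mpu_align_down(uint32_t base, uint32_t size)
{
    assert(size != 0u && (size & (size - 1u)) == 0u);
    return base & ~(size - 1u);
}
```

Applied to region 6, mpu_base_is_aligned(0x2406E000, 0x4000) is false, and the nearest 16kB-aligned base below it is 0x2406C000. Regions 1 to 5 pass the check. Whether the misaligned base of region 6 plays a role in the behaviour described in this post has not been verified.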

 

 

Update: I got it working.

I modified my nema_sys_init in hal.c because the supplied baremetal template doesn't work on this platform.
I updated nema_hal.h and nema_hal.c so they can be used both on the STM32U5G9J-DK2 and STM32H7S78-DK.

nema_hal.c:

/*
 * File            : nema_hal.c
 * Function        : Part of the NemaGFX library for STM32
 * Modified        : 13-09-2025
 */

/* Includes ----------------------------------------------------------------- */

#include <stdlib.h>
#include <string.h>
#include "nema_hal.h"
#include "main.h"
#include "gpu2d.h"

/* Include platform-specific header files ----------------------------------- */

#if defined(STM32V7R7xx) || defined(STM32V7R5xx) || defined(STM32V7S7xx) || defined(STM32V7S5xx)
  #include "stm32v7xx_hal.h"
  #include "stm32v7xx_hal_gpu2d.h"
#elif defined(STM32H7S7xx)
  #include "stm32h7rsxx_hal.h"
  #include "stm32h7rsxx_hal_gpu2d.h"
#elif defined(STM32U599xx) || defined(STM32U5A9xx) || defined(STM32U5G9xx)
  #include "stm32u5xx_hal.h"
  #include "stm32u5xx_hal_gpu2d.h"
#else
  #error "Unsupported Platform"
#endif /* STM32V7R7xx | STM32V7R5xx | STM32V7S7xx | STM32V7S5xx */

/* Defines ------------------------------------------------------------------ */

#define RING_SIZE                     1024                     /* Ring Buffer Size in byte */

#if defined(STM32H7S7xx)
  #define NEMAGFX_MEM_POOL_SIZE         10240                    /* NemaGFX byte pool size in byte */
  #define NEMAGFX_STENCIL_POOL_SIZE     DISPLAY_FRAMEBUFFER_SIZE /* NemaGFX stencil buffer pool size in byte */

  /* NemaGFX frame buffer memory 0x90000000 */
  Color_RGB888_t FrameBuffer[DISPLAY_NO_OF_FRAMEBUFFERS][DISPLAY_SIZE_H][DISPLAY_SIZE_W] __attribute__((section("Nemagfx_Framebuffer")));

  /* NemaGFX stencil buffer memory 0x90000000  DISPLAY_FRAMEBUFFER_SIZE=800*480*4=1536000 */
  uint8_t StencilBuffer[DISPLAY_FRAMEBUFFER_SIZE] __attribute__((section("Nemagfx_Framebuffer")));

  /* NemaGFX memory pool 10240 bytes */
  static uint8_t nemagfx_pool_mem[NEMAGFX_MEM_POOL_SIZE] __attribute__((section("Nemagfx_Memory_Pool_Buffer"))); /* Good */
#endif

/* NemaGFX ring buffer memory */
static nema_ringbuffer_t ring_buffer_str = {{0}};

static volatile int last_cl_id = -1;

/* Function implementations ------------------------------------------------- */

void HAL_GPU2D_CommandListCpltCallback(GPU2D_HandleTypeDef *hgpu2d, uint32_t CmdListID)
{
  /* Prevent unused argument(s) compilation warning */
  UNUSED(hgpu2d);

  last_cl_id = CmdListID;
}

int32_t nema_sys_init(void)
{
  /*
   * NemaGFX includes the following API calls for memory allocation, deallocation and mapping:
   * - nema_buffer_create() - Allocate memory
   * - nema_buffer_create_pool() - Allocate memory from specific memory pool
   * - nema_buffer_map() - Map allocated memory space for CPU access
   * - nema_buffer_unmap() - Unmap previously mapped memory space
   * - nema_buffer_destroy() - Deallocate memory space
   *
   */
  int error_code = 0;

  /* Initialize GPU2D */
  hgpu2d.Instance = GPU2D;
  HAL_GPU2D_Init(&hgpu2d);

#if (USE_HAL_GPU2D_REGISTER_CALLBACKS == 1)
  /* Register Command List Complete Callback (the handler defined above in this file) */
  HAL_GPU2D_RegisterCommandListCpltCallback(&hgpu2d, HAL_GPU2D_CommandListCpltCallback);
#endif /* USE_HAL_GPU2D_REGISTER_CALLBACKS = 1 */

  /* Allocate ring_buffer memory */
#if defined(STM32U599xx) || defined(STM32U5A9xx) || defined(STM32U5G9xx)
  ring_buffer_str.bo = nema_buffer_create_pool(0, RING_SIZE);
//  (void)nema_buffer_map(&ring_buffer_str.bo);
#elif defined(STM32H7S7xx)
  ring_buffer_str.bo.base_virt = nemagfx_pool_mem;
  ring_buffer_str.bo.base_phys = (uint32_t)ring_buffer_str.bo.base_virt;
  ring_buffer_str.bo.size      = RING_SIZE;
  ring_buffer_str.bo.fd        = 0; /* Buffer allocated */
#endif

  /* Initialize Ring Buffer */
  error_code = nema_rb_init(&ring_buffer_str, 1);
  if (error_code < 0)
  {
    return error_code;
  }

  /* Reset last_cl_id counter */
  last_cl_id = 0;

  return error_code;
}

int nema_wait_irq(void)
{
  return 0;
}

int nema_wait_irq_cl(int cl_id)
{
  while (last_cl_id < cl_id)
  {
    (void)nema_wait_irq();
  }

  return 0;
}

int nema_wait_irq_brk(int brk_id)
{
  /* Prevent unused argument(s) compilation warning */
  UNUSED(brk_id);
  while (nema_reg_read(GPU2D_BREAKPOINT) == 0U)
  {
    (void)nema_wait_irq();
  }

  return 0;
}

uint32_t nema_reg_read(uint32_t reg)
{
  return HAL_GPU2D_ReadRegister(&hgpu2d, reg);
}

void nema_reg_write(uint32_t reg, uint32_t value)
{
  HAL_GPU2D_WriteRegister(&hgpu2d, reg, value);
}

/* Used to select the framebuffer */
nema_buffer_t nema_buffer_create_pool(int pool, int size)
{
  nema_buffer_t bo;

#if defined(STM32U599xx) || defined(STM32U5A9xx) || defined(STM32U5G9xx)
  /* Prevent unused argument(s) compilation warning */
  UNUSED(pool);

  bo.base_virt = malloc(size);
#elif defined(STM32H7S7xx)
  bo.base_virt = FrameBuffer[pool];
#endif
  bo.base_phys = (uint32_t)bo.base_virt;
  bo.size      = size;
  bo.fd        = 0; /* Buffer allocated */

  return bo;
}

void *nema_buffer_map(nema_buffer_t *bo)
{
  return bo->base_virt;
}

void nema_buffer_unmap(nema_buffer_t *bo)
{
  /* Prevent unused argument(s) compilation warning */
  UNUSED(bo);
}

void nema_buffer_destroy(nema_buffer_t *bo)
{
  if (bo->fd == -1)
  {
    return; /* Buffer was never allocated */
  }

#if defined(STM32U599xx) || defined(STM32U5A9xx) || defined(STM32U5G9xx)
  free(bo->base_virt);
#endif

  bo->base_virt = (void*)0;
  bo->base_phys = 0;
  bo->size      = 0;
  bo->fd        = -1; /* Buffer not allocated */
}

uintptr_t nema_buffer_phys(nema_buffer_t *bo)
{
  return bo->base_phys;
}

void nema_buffer_flush(nema_buffer_t * bo)
{
#if defined(NEMA_CACHED_MEMORY)
  SCB_CleanInvalidateDCache_by_Addr((uint32_t *)bo->base_virt, bo->size);
#else /* !NEMA_CACHED_MEMORY */
  /* Prevent unused argument(s) compilation warning */
  UNUSED(bo);
#endif /* NEMA_CACHED_MEMORY */
}

void nema_host_free(void *ptr)
{
  if (ptr)
  {
    free(ptr);
  }
}

void *nema_host_malloc(size_t size)
{
  return malloc(size);
}

int nema_mutex_lock(int mutex_id)
{
  /* Prevent unused argument(s) compilation warning */
  UNUSED(mutex_id);

  return 0;
}

int nema_mutex_unlock(int mutex_id)
{
  /* Prevent unused argument(s) compilation warning */
  UNUSED(mutex_id);

  return 0;
}

void platform_disable_cache(void)
{

}

void platform_invalidate_cache(void)
{

}

/* End of File -------------------------------------------------------------- */

nema_hal.h:

/*
 * File            : nema_hal.h
 * Function        : Part of the NemaGFX library for STM32
 * Modified        : 13-09-2025
 */

#ifndef __NEMA_HAL_H__
#define __NEMA_HAL_H__

/* Includes ----------------------------------------------------------------- */

#include "nema_sys_defs.h"

#ifdef __cplusplus
extern "C" {
#endif

/* Defines ------------------------------------------------------------------ */

#if defined(STM32H7S7xx)
#define DISPLAY_SIZE_W                800
#define DISPLAY_SIZE_H                480
#define DISPLAY_BYTES_PER_PIXEL       4
#define DISPLAY_STRIDE                (DISPLAY_SIZE_W * DISPLAY_BYTES_PER_PIXEL)
#define DISPLAY_FRAMEBUFFER_SIZE      (DISPLAY_SIZE_W * DISPLAY_SIZE_H * DISPLAY_BYTES_PER_PIXEL)
#define DISPLAY_NO_OF_FRAMEBUFFERS    20

/* Global variables --------------------------------------------------------- */

typedef struct
{
  /* Little-endian */
  /* LSB */
  uint8_t B;
  uint8_t G;
  uint8_t R;
  uint8_t A;
  /* MSB */
} __attribute__((packed)) Color_RGB888_t;

typedef struct {
  union {
    struct
    {
      /* Little-endian */
      /* LSB */
      uint8_t A; /* [7:0] */
      uint8_t B; /* [15:8] */
      uint8_t G; /* [23:16] */
      uint8_t R; /* [31:24] */
      /* MSB */
    };
    struct {
      /* Little-endian */
      uint32_t Uint32;
    };
  };
} Display_Color_RGBA_32_t;

/* NemaGFX frame buffer memory 0x90000000 */
extern Color_RGB888_t FrameBuffer[DISPLAY_NO_OF_FRAMEBUFFERS][DISPLAY_SIZE_H][DISPLAY_SIZE_W] __attribute__((section("Nemagfx_Framebuffer")));

/* NemaGFX stencil buffer memory 0x90000000 */
extern uint8_t StencilBuffer[DISPLAY_FRAMEBUFFER_SIZE] __attribute__((section("Nemagfx_Framebuffer")));
#endif

/* Type definitions --------------------------------------------------------- */

typedef struct nema_buffer_t_
{
  int           size;                 /**< Size of buffer */
  int           fd;                   /**< File Descriptor of buffer */
  void         *base_virt;            /**< Virtual address of buffer */
  uintptr_t     base_phys;            /**< Physical address of buffer */
} nema_buffer_t;

#if defined(STM32H7S7xx)
  extern nema_buffer_t stencil_bo;
#endif

/* Private */
typedef struct nema_ringbuffer_t_
{
  nema_buffer_t bo;
  int           offset;               /**< Number of 32-bit entries */
  int           last_submission_id;
} nema_ringbuffer_t;

/* Function declarations ---------------------------------------------------- */

/** \brief Initialize system. Implementor defined. Called in nema_init()
 *
 * \param void
 * \return 0 if no errors occurred
 * \see nema_init()
 *
 */
int32_t nema_sys_init(void);

/** \brief Wait for interrupt from the GPU
 *
 * \param void
 * \return 0 on success
 *
 */
int nema_wait_irq(void);

/** \brief Wait for a Command List to finish
 *
 * \param cl_id Command List ID
 * \return 0 on success
 *
 */
int nema_wait_irq_cl(int cl_id);

/** \brief Wait for a Breakpoint
 *
 * \param brk_id Breakpoint ID
 * \return 0 on success
 *
 */
int nema_wait_irq_brk(int brk_id);

/** \brief Read Hardware register
 *
 * \param reg Register to read
 * \return Value read from the register
 * \see nema_reg_write
 *
 */
uint32_t nema_reg_read(uint32_t reg);

/** \brief Write Hardware Register
 *
 * \param reg Register to write
 * \param value Value to be written
 * \return void
 * \see nema_reg_read()
 *
 */
void nema_reg_write(uint32_t reg, uint32_t value);

/** \brief Used to select the framebuffer
 *
 * \param pool ID of the desired memory pool
 * \param size Size of buffer in bytes
 * \return nema_buffer_t struct
 *
 */
nema_buffer_t nema_buffer_create_pool(int pool, int size);

/** \brief Maps buffer
 *
 * \param bo Pointer to buffer struct
 * \return Virtual pointer of the buffer (same as in bo->base_virt)
 *
 */
void *nema_buffer_map(nema_buffer_t *bo);

/** \brief Unmaps buffer
 *
 * \param bo Pointer to buffer struct
 * \return void
 *
 */
void nema_buffer_unmap(nema_buffer_t *bo);

/** \brief Destroy/deallocate buffer
 *
 * \param bo Pointer to buffer struct
 * \return void
 *
 */
void nema_buffer_destroy(nema_buffer_t *bo);

/** \brief Get physical (GPU) base address of a given buffer
 *
 * \param bo Pointer to buffer struct
 * \return Physical base address of a given buffer
 *
 */
uintptr_t nema_buffer_phys(nema_buffer_t *bo);

/** \brief Write-back buffer from cache to main memory
 *
 * \param bo Pointer to buffer struct
 * \return void
 *
 */
void nema_buffer_flush(nema_buffer_t * bo);

/** \brief Allocate memory for CPU to use (typically, standard malloc() is called)
 *
 * \param size Size in bytes
 * \return Pointer to allocated memory (virtual)
 * \see nema_host_free()
 *
 */
void *nema_host_malloc(size_t size);

/** \brief Free memory previously allocated with nema_host_malloc()
 *
 * \param ptr Pointer to allocated memory (virtual)
 * \return void
 * \see nema_host_malloc()
 *
 */
void  nema_host_free(void *ptr );

/** \brief Initialize Ring Buffer. Should be called from inside nema_sys_init().
 *   This is a private function, the user should never call it.
 *
 * \param rb    Pointer to nema_ringbuffer_t struct
 * \param reset Resets the Ring Buffer if non-zero
 * \return      Negative number on error
 * \see nema_sys_init()
 *
 */
/** \private */
int nema_rb_init(nema_ringbuffer_t *rb, int reset);

#define MUTEX_RB     0
#define MUTEX_MALLOC 1
#define MUTEX_FLUSH  2
#define MUTEX_MAX    2

/** \brief Mutex Lock for multiple processes/threads
 *
 * \param mutex_id MUTEX_RB, MUTEX_MALLOC or MUTEX_FLUSH
 * \return int
 *
 */
int nema_mutex_lock(int mutex_id);

/** \brief Mutex Unlock for multiple processes/threads
 *
 * \param mutex_id MUTEX_RB, MUTEX_MALLOC or MUTEX_FLUSH
 * \return int
 *
 */
int nema_mutex_unlock(int mutex_id);

/** \brief Disable DCache
 *
 * \param void
 * \return void
 *
 */
void platform_disable_cache(void);

/** \brief Invalidate DCache
 *
 * \param void
 * \return void
 *
 */
void platform_invalidate_cache(void);

#ifdef __cplusplus
}
#endif

#endif  /* __NEMA_HAL_H__ */

/* End of File -------------------------------------------------------------- */
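
The two pixel structs in nema_hal.h put their bytes in a different order, which is easy to mix up. The sketch below mirrors only the member order of Color_RGB888_t and of the struct inside Display_Color_RGBA_32_t and shows the 32-bit value each one has when read as a little-endian word. PixelBgra, PixelAbgr, bgra_word and abgr_word are hypothetical names used for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Same member order as Color_RGB888_t: B is the lowest byte, A the highest. */
typedef struct { uint8_t B, G, R, A; } PixelBgra;

/* Same member order as the struct inside Display_Color_RGBA_32_t: A is the lowest byte. */
typedef struct { uint8_t A, B, G, R; } PixelAbgr;

static_assert(sizeof(PixelBgra) == 4, "pixel must be 4 bytes");
static_assert(sizeof(PixelAbgr) == 4, "pixel must be 4 bytes");

/* Value of the pixel when its 4 bytes are read as a little-endian word: 0xAARRGGBB. */
static inline uint32_t bgra_word(PixelBgra p)
{
    return (uint32_t)p.B | ((uint32_t)p.G << 8) | ((uint32_t)p.R << 16) | ((uint32_t)p.A << 24);
}

/* Value of the pixel when its 4 bytes are read as a little-endian word: 0xRRGGBBAA. */
static inline uint32_t abgr_word(PixelAbgr p)
{
    return (uint32_t)p.A | ((uint32_t)p.B << 8) | ((uint32_t)p.G << 16) | ((uint32_t)p.R << 24);
}
```

So despite its name, Color_RGB888_t is a 4-byte pixel with alpha in the top byte, which matches the 4 bytes per pixel used for DISPLAY_FRAMEBUFFER_SIZE.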

 

Note: there's still one small issue I have to figure out, but for now I can work around it.
I have multiple framebuffers (for double buffering, backups, pre-rendered assets, and so on).
When I select buffer 0, which starts at the beginning of PSRAM, a series of pixels near the start of the framebuffer lights up, which suggests NemaGFX has placed some of its working data there.

This happens after calling nema_vg_init, and even after nema_vg_init_stencil_prealloc with a safe location for the stencil buffer.
So when I select buffer 0 and the GPU draws over that part of the framebuffer, it is bound to crash.
Since I have enough space, I simply use framebuffers 1 and up and leave all of framebuffer 0 to NemaGFX as working space. That keeps it working; I'll try to sort it out later.
Perhaps somebody can tell me how to avoid this.
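
A possible explanation for the stray pixels, not verified on hardware: the STM32H7S7xx branch of nema_buffer_create_pool() in nema_hal.c returns FrameBuffer[pool] and ignores size. If NemaGFX requests its internal working buffers through that function with pool 0, they would all land at the start of framebuffer 0. Whether the library actually does this is an assumption. If it does, one way around it could be a small bump allocator over a dedicated GPU-visible pool. The sketch below builds on a host; sketch_buffer_t, sketch_pool and sketch_buffer_create are hypothetical names, and the 4096-byte pool size is arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for nema_buffer_t from nema_hal.h so the sketch builds on its own. */
typedef struct {
    int       size;       /* Size of buffer in bytes */
    int       fd;         /* 0 = allocated, -1 = not allocated */
    void     *base_virt;  /* CPU address */
    uintptr_t base_phys;  /* GPU address (identical on this platform) */
} sketch_buffer_t;

#define SKETCH_POOL_SIZE  4096u  /* hypothetical pool size */
#define SKETCH_ALIGN      8u     /* every buffer starts on an 8-byte boundary */

static_assert(SKETCH_POOL_SIZE % SKETCH_ALIGN == 0u, "pool size must be a multiple of the alignment");

/* On the target this array would be placed in non-cached, GPU-visible memory. */
static _Alignas(8) uint8_t sketch_pool[SKETCH_POOL_SIZE];
static size_t sketch_pool_used;

/* Hands out consecutive aligned slices of the pool; buffers are never freed. */
static sketch_buffer_t sketch_buffer_create(int size)
{
    sketch_buffer_t bo = { 0, -1, NULL, 0 };

    if (size <= 0)
    {
        return bo;
    }

    size_t rounded = ((size_t)size + (SKETCH_ALIGN - 1u)) & ~(size_t)(SKETCH_ALIGN - 1u);
    if (rounded > SKETCH_POOL_SIZE - sketch_pool_used)
    {
        return bo; /* Pool exhausted: fd stays -1 */
    }

    bo.base_virt = &sketch_pool[sketch_pool_used];
    bo.base_phys = (uintptr_t)bo.base_virt;
    bo.size      = size;
    bo.fd        = 0;
    sketch_pool_used += rounded;

    return bo;
}
```

On the target, the pool could get a section attribute like the one used for nemagfx_pool_mem, and nema_buffer_create_pool() could call such an allocator for internal requests instead of returning FrameBuffer[pool]. The framebuffers themselves would then be handed to NemaGFX separately.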
Despite this little issue, I hope this post helps someone!

What about the graphics speed difference between these two platforms?

Drawing 80x80-pixel rectangles with line width 10, at random positions and in random colors:

STM32U5G9J-DK2 (Cortex-M33 170MHz): 11165 rectangles/sec.
STM32H7S78-DK (Cortex-M7 600MHz): 7320 rectangles/sec (XSPI bus optimally tuned).

So the STM32U5G9J-DK2 is the clear winner when it comes to graphics performance: it's over 50% faster!
This is probably because of the large amount of zero-wait-state internal SRAM that hosts the framebuffer on the low-power M33.

Also, the framebuffer on the 600MHz Cortex-M7 has to be 32 bits/pixel, while the 170MHz Cortex-M33 uses a 24 bits/pixel framebuffer in this case, so the two GPUs write different amounts of data per pixel.
This is where we lose performance, especially when the framebuffers have to live in external PSRAM.
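
To get a feeling for how much of the gap comes from the pixel size, the rectangle rates can be converted into framebuffer bytes written per second. This is a rough estimate only: it assumes the 10-pixel stroke lies completely inside the 80x80 box, and it ignores any reads, blending or overdraw. The helper names outline_pixels and bytes_per_second are made up for the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Pixels covered by a square outline, assuming the stroke lies inside the box. */
static inline uint32_t outline_pixels(uint32_t side, uint32_t stroke)
{
    assert(2u * stroke <= side);
    uint32_t inner = side - 2u * stroke;
    return side * side - inner * inner;
}

/* Framebuffer bytes written per second for a given rectangle rate and pixel size. */
static inline uint64_t bytes_per_second(uint32_t rects_per_sec,
                                        uint32_t pixels_per_rect,
                                        uint32_t bytes_per_pixel)
{
    return (uint64_t)rects_per_sec * pixels_per_rect * bytes_per_pixel;
}
```

Under these assumptions each rectangle covers 2800 pixels. The STM32H7S78-DK then writes 7320 x 2800 x 4 = 81984000 bytes/sec and the STM32U5G9J-DK2 writes 11165 x 2800 x 3 = 93786000 bytes/sec. Measured in bytes the difference is about 14%, compared to about 52% when measured in rectangles.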

So the 600MHz STM32H7S7L8 MCU can't beat the 170MHz STM32U5G9ZJ MCU when it comes to graphics performance. For CPU workloads it is of course a lot faster, and because you could pre-render assets and blit them, it offers more flexibility in the use of external memory.

Cheers, and have fun developing with graphics!

https://www.youtube.com/watch?v=HA2m78PKuPc
STM32 Graphics on STM32H7S78-DK (Cortex-M7 600MHz):
Rectangle 80x80 pixels, line width 10: 7320 rectangles/sec.

https://www.youtube.com/watch?v=XblQPOOXQ34
STM32 Graphics on STM32U5G9J-DK2 (Cortex-M33 170MHz):
Rectangle 80x80 pixels, line width 10: 11165 rectangles/sec.


I have now ordered the STM32N6570-DK because it has sufficient SRAM for double-buffer framebuffers AND external PSRAM.
Performance results will follow soon.

1 REPLY
LouisB
ST Employee

Hello @Jack3,

I can't really comment on the performance, but for your information, the different boards (H7/N6/U5) differ in their Nema implementation; see "TouchGFX on NeoChrom/NeoChromVG" in the TouchGFX Documentation.

BR,

Louis BOUDO
ST Software Engineer | TouchGFX