2025-09-11 3:46 AM - edited 2026-02-20 10:47 AM
NemaGFX already works baremetal on the STM32U5G9J-DK2.
It probably only needs a little tweaking to get it working baremetal on the STM32H7S78-DK.
So the software runs from external flash (XSPI2 0x70000000) and the frame buffer is located in external PSRAM (XSPI1 0x90000000).
When the MCU writes to the frame buffer, it's displayed correctly on the screen.
Then I tried to use NemaGFX, but it kept hanging or generating memory traps (fixed by now, see the update below).
I know the GPU behind NemaGFX reads and writes RAM directly and bypasses the CPU cache, so I've checked my MPU settings to rule out cache issues.
My LinkerScript.ld:
/*
******************************************************************************
** @file : LinkerScript.ld
** @author : STM32CubeIDE
** Abstract : Linker script for STM32H7Sxx Device
** 64KBytes FLASH
** 456KBytes RAM
** 256MBytes EXTFLASH
** 32MBytes EXTRAM
**
** Target : STM32H7S7L8
******************************************************************************
** @attention
**
** Copyright (c) 2025 STMicroelectronics.
** All rights reserved.
**
** This software is licensed under terms that can be found in the LICENSE file
** in the root directory of this software component.
** If no LICENSE file comes with this software, it is provided AS-IS.
**
******************************************************************************
*/
/* Entry Point */
ENTRY(Reset_Handler)
/* Highest address of the user mode stack */
_estack = ORIGIN(DTCM) + LENGTH(DTCM); /* end of "DTCM" Ram type memory */
_Min_Heap_Size = 0x1000; /* required amount of heap */
_Min_Stack_Size = 0x2000; /* required amount of stack */
/* Memories definition */
MEMORY
{
RAM (rw) : ORIGIN = 0x24000000, LENGTH = 0x0006e000 /* 0x24000000 - 0x2406E000 440kB */
RAM_CMD (rw) : ORIGIN = 0x2406E000, LENGTH = 0x00004000 /* 0x2406E000 - 0x24072000 16kB Nemagfx_Memory_Pool_Buffer nemagfx_pool_mem */
ITCM (xrw) : ORIGIN = 0x00000000, LENGTH = 0x00010000
DTCM (rw) : ORIGIN = 0x20000000, LENGTH = 0x00010000
SRAMAHB (rw) : ORIGIN = 0x30000000, LENGTH = 0x00008000
BKPSRAM (rw) : ORIGIN = 0x38800000, LENGTH = 0x00001000
FLASH (xr) : ORIGIN = 0x70000000, LENGTH = 0x00200000 /* XSPI2 0x70000000 - 0x701FFFFF EXTFLASH 2MB */
FLASH_GFX (r) : ORIGIN = 0x70200000, LENGTH = 0x0FE00000 /* XSPI2 0x70200000 - 0x7FFFFFFF EXTFLASH 254MB */
EXTFLASH (xr) : ORIGIN = 0x70000000, LENGTH = 0x10000000 /* XSPI2 0x70000000 - 0x7FFFFFFF EXTFLASH 256MB */
EXTRAM (rw) : ORIGIN = 0x90000000, LENGTH = 0x02000000 /* XSPI1 0x90000000 - 0x92000000 EXTRAM 32MB */
}
/* Sections */
SECTIONS
{
/* The startup code into "FLASH" FLASH type memory */
.isr_vector :
{
. = ALIGN(4);
KEEP(*(.isr_vector)) /* Startup code */
. = ALIGN(4);
} >FLASH
/* The program code and other data into "FLASH" FLASH type memory */
.text :
{
. = ALIGN(4);
*(.text) /* .text sections (code) */
*(.text*) /* .text* sections (code) */
*(.glue_7) /* glue arm to thumb code */
*(.glue_7t) /* glue thumb to arm code */
*(.eh_frame)
KEEP (*(.init))
KEEP (*(.fini))
. = ALIGN(4);
_etext = .; /* define a global symbols at end of code */
} >FLASH
/* Constant data into "FLASH" FLASH type memory */
.rodata :
{
. = ALIGN(4);
*(.rodata) /* .rodata sections (constants, strings, etc.) */
*(.rodata*) /* .rodata* sections (constants, strings, etc.) */
. = ALIGN(4);
} >FLASH
.ARM.extab (READONLY) : /* The READONLY keyword is only supported in GCC11 and later, remove it if using GCC10 or earlier. */
{
. = ALIGN(4);
*(.ARM.extab* .gnu.linkonce.armextab.*)
. = ALIGN(4);
} >FLASH
.ARM (READONLY) : /* The READONLY keyword is only supported in GCC11 and later, remove it if using GCC10 or earlier. */
{
. = ALIGN(4);
__exidx_start = .;
*(.ARM.exidx*)
__exidx_end = .;
. = ALIGN(4);
} >FLASH
.preinit_array (READONLY) : /* The READONLY keyword is only supported in GCC11 and later, remove it if using GCC10 or earlier. */
{
. = ALIGN(4);
PROVIDE_HIDDEN (__preinit_array_start = .);
KEEP (*(.preinit_array*))
PROVIDE_HIDDEN (__preinit_array_end = .);
. = ALIGN(4);
} >FLASH
.init_array (READONLY) : /* The READONLY keyword is only supported in GCC11 and later, remove it if using GCC10 or earlier. */
{
. = ALIGN(4);
PROVIDE_HIDDEN (__init_array_start = .);
KEEP (*(SORT(.init_array.*)))
KEEP (*(.init_array*))
PROVIDE_HIDDEN (__init_array_end = .);
. = ALIGN(4);
} >FLASH
.fini_array (READONLY) : /* The READONLY keyword is only supported in GCC11 and later, remove it if using GCC10 or earlier. */
{
. = ALIGN(4);
PROVIDE_HIDDEN (__fini_array_start = .);
KEEP (*(SORT(.fini_array.*)))
KEEP (*(.fini_array*))
PROVIDE_HIDDEN (__fini_array_end = .);
. = ALIGN(4);
} >FLASH
/* Used by the startup to initialize data */
_sidata = LOADADDR(.data);
/* Initialized data sections into "RAM" Ram type memory */
.data :
{
. = ALIGN(4);
_sdata = .; /* create a global symbol at data start */
*(.data) /* .data sections */
*(.data*) /* .data* sections */
*(.RamFunc) /* .RamFunc sections */
*(.RamFunc*) /* .RamFunc* sections */
. = ALIGN(4);
_edata = .; /* define a global symbol at data end */
} >RAM AT> FLASH
/* Uninitialized data section into "RAM" Ram type memory */
. = ALIGN(4);
.bss :
{
/* This is used by the startup in order to initialize the .bss section */
_sbss = .; /* define a global symbol at bss start */
__bss_start__ = _sbss;
*(.bss)
*(.bss*)
*(COMMON)
. = ALIGN(4);
_ebss = .; /* define a global symbol at bss end */
__bss_end__ = _ebss;
} >RAM
/* User_heap_stack section, used to check that there is enough "RAM" Ram type memory left */
._user_heap_stack :
{
. = ALIGN(8);
PROVIDE ( end = . );
PROVIDE ( _end = . );
. = . + _Min_Heap_Size;
. = . + _Min_Stack_Size;
. = ALIGN(8);
} >DTCM
/* Remove information from the compiler libraries */
/DISCARD/ :
{
libc.a ( * )
libm.a ( * )
libgcc.a ( * )
}
.ARM.attributes 0 : { *(.ARM.attributes) }
BufferSection (NOLOAD) :
{
*(Nemagfx_Framebuffer Nemagfx_Framebuffer.*)
*(.gnu.linkonce.r.*)
. = ALIGN(0x8);
} >EXTRAM
UncachedSection (NOLOAD) :
{
*(Nemagfx_Memory_Pool_Buffer Nemagfx_Memory_Pool_Buffer.*)
*(.gnu.linkonce.r.*)
. = ALIGN(0x8);
*(Nemagfx_Ring_Buffer Nemagfx_Ring_Buffer.*)
*(.gnu.linkonce.r.*)
. = ALIGN(0x8);
*(Nemagfx_Memory_Pool_Buffer_RAM_CMD Nemagfx_Memory_Pool_Buffer_RAM_CMD.*) /* nemagfx_pool_mem 16kB 0x2406E000 */
*(.gnu.linkonce.r.*)
. = ALIGN(0x8);
} >RAM_CMD
FontFlashSection :
{
*(FontFlashSection FontFlashSection.*)
*(.gnu.linkonce.r.*)
. = ALIGN(0x4);
} >FLASH_GFX
TextFlashSection :
{
*(TextFlashSection TextFlashSection.*)
*(.gnu.linkonce.r.*)
. = ALIGN(0x4);
} >FLASH_GFX
ExtFlashSection :
{
*(ExtFlashSection ExtFlashSection.*)
*(.gnu.linkonce.r.*)
. = ALIGN(0x4);
} >FLASH_GFX
}
This is my MPU configuration:
static void MPU_Config(void)
{
MPU_Region_InitTypeDef MPU_InitStruct = {0};
/* Disables the MPU */
HAL_MPU_Disable();
/* Disables all MPU regions */
for(uint8_t i=0; i<__MPU_REGIONCOUNT; i++)
{
HAL_MPU_DisableRegion(i);
}
/** Initializes and configures the Region and the memory to be protected
*/
MPU_InitStruct.Enable = MPU_REGION_ENABLE;
MPU_InitStruct.Number = MPU_REGION_NUMBER0;
MPU_InitStruct.BaseAddress = 0x00000000;
MPU_InitStruct.Size = MPU_REGION_SIZE_4GB;
MPU_InitStruct.SubRegionDisable = 0x87;
MPU_InitStruct.TypeExtField = MPU_TEX_LEVEL0;
MPU_InitStruct.AccessPermission = MPU_REGION_NO_ACCESS;
MPU_InitStruct.DisableExec = MPU_INSTRUCTION_ACCESS_DISABLE;
MPU_InitStruct.IsShareable = MPU_ACCESS_SHAREABLE;
MPU_InitStruct.IsCacheable = MPU_ACCESS_NOT_CACHEABLE;
MPU_InitStruct.IsBufferable = MPU_ACCESS_NOT_BUFFERABLE;
HAL_MPU_ConfigRegion(&MPU_InitStruct);
/**
* Initializes and configures the Region and the memory to be protected
* XSPI2 Range 0x70000000 - 0x7FFFFFFF, Size 0x10000000 = 256MB
*/
MPU_InitStruct.Number = MPU_REGION_NUMBER1;
MPU_InitStruct.BaseAddress = 0x70000000;
MPU_InitStruct.Size = MPU_REGION_SIZE_256MB;
MPU_InitStruct.SubRegionDisable = 0x0;
MPU_InitStruct.TypeExtField = MPU_TEX_LEVEL1;
MPU_InitStruct.AccessPermission = MPU_REGION_FULL_ACCESS;
MPU_InitStruct.DisableExec = MPU_INSTRUCTION_ACCESS_DISABLE;
MPU_InitStruct.IsShareable = MPU_ACCESS_NOT_SHAREABLE;
MPU_InitStruct.IsCacheable = MPU_ACCESS_CACHEABLE;
MPU_InitStruct.IsBufferable = MPU_ACCESS_BUFFERABLE;
HAL_MPU_ConfigRegion(&MPU_InitStruct);
/**
* Initializes and configures the Region and the memory to be protected
* XSPI2 Range 0x70000000 - 0x70200000, Size 0x00200000 = 2MB
*/
MPU_InitStruct.Number = MPU_REGION_NUMBER2;
MPU_InitStruct.BaseAddress = 0x70000000;
MPU_InitStruct.Size = MPU_REGION_SIZE_2MB;
MPU_InitStruct.SubRegionDisable = 0x0;
MPU_InitStruct.TypeExtField = MPU_TEX_LEVEL1;
MPU_InitStruct.AccessPermission = MPU_REGION_FULL_ACCESS;
MPU_InitStruct.DisableExec = MPU_INSTRUCTION_ACCESS_ENABLE;
MPU_InitStruct.IsShareable = MPU_ACCESS_NOT_SHAREABLE;
MPU_InitStruct.IsCacheable = MPU_ACCESS_CACHEABLE;
MPU_InitStruct.IsBufferable = MPU_ACCESS_BUFFERABLE;
HAL_MPU_ConfigRegion(&MPU_InitStruct);
/**
* Initializes and configures the Region and the memory to be protected
* XSPI1 Range 0x90000000 - 0x91FFFFFF, Size 0x02000000 = 32MB
*/
MPU_InitStruct.Number = MPU_REGION_NUMBER3;
MPU_InitStruct.BaseAddress = 0x90000000;
MPU_InitStruct.Size = MPU_REGION_SIZE_32MB;
MPU_InitStruct.TypeExtField = MPU_TEX_LEVEL1;
MPU_InitStruct.AccessPermission = MPU_REGION_FULL_ACCESS;
MPU_InitStruct.DisableExec = MPU_INSTRUCTION_ACCESS_DISABLE;
MPU_InitStruct.IsShareable = MPU_ACCESS_NOT_SHAREABLE;
MPU_InitStruct.IsCacheable = MPU_ACCESS_NOT_CACHEABLE;
MPU_InitStruct.IsBufferable = MPU_ACCESS_NOT_BUFFERABLE;
HAL_MPU_ConfigRegion(&MPU_InitStruct);
/**
* Initializes and configures the Region and the memory to be protected
*/
MPU_InitStruct.Number = MPU_REGION_NUMBER4;
MPU_InitStruct.BaseAddress = 0x20000000;
MPU_InitStruct.Size = MPU_REGION_SIZE_64KB;
MPU_InitStruct.TypeExtField = MPU_TEX_LEVEL1;
MPU_InitStruct.AccessPermission = MPU_REGION_FULL_ACCESS;
MPU_InitStruct.DisableExec = MPU_INSTRUCTION_ACCESS_DISABLE;
MPU_InitStruct.IsShareable = MPU_ACCESS_NOT_SHAREABLE;
MPU_InitStruct.IsCacheable = MPU_ACCESS_NOT_CACHEABLE;
MPU_InitStruct.IsBufferable = MPU_ACCESS_NOT_BUFFERABLE;
HAL_MPU_ConfigRegion(&MPU_InitStruct);
/**
* Initializes and configures the Region and the memory to be protected
* RAM Range 0x24000000 - 0x24080000, Size 0x00080000 = 512kB
*/
MPU_InitStruct.Number = MPU_REGION_NUMBER5;
MPU_InitStruct.BaseAddress = 0x24000000;
MPU_InitStruct.Size = MPU_REGION_SIZE_512KB;
MPU_InitStruct.TypeExtField = MPU_TEX_LEVEL1;
MPU_InitStruct.AccessPermission = MPU_REGION_FULL_ACCESS;
MPU_InitStruct.DisableExec = MPU_INSTRUCTION_ACCESS_DISABLE;
MPU_InitStruct.IsShareable = MPU_ACCESS_SHAREABLE;
MPU_InitStruct.IsCacheable = MPU_ACCESS_CACHEABLE;
MPU_InitStruct.IsBufferable = MPU_ACCESS_BUFFERABLE;
HAL_MPU_ConfigRegion(&MPU_InitStruct);
/**
* Initializes and configures the Region and the memory to be protected
* RAM_CMD for GPU2D - Nemagfx_Memory_Pool_Buffer nemagfx_pool_mem 16kB
* RAM Range 0x2406E000 - 0x24072000, Size 0x00004000 = 16kB
*/
MPU_InitStruct.Number = MPU_REGION_NUMBER6;
MPU_InitStruct.BaseAddress = 0x2406E000;
MPU_InitStruct.Size = MPU_REGION_SIZE_16KB;
MPU_InitStruct.TypeExtField = MPU_TEX_LEVEL0;
MPU_InitStruct.AccessPermission = MPU_REGION_FULL_ACCESS;
MPU_InitStruct.DisableExec = MPU_INSTRUCTION_ACCESS_DISABLE;
MPU_InitStruct.IsShareable = MPU_ACCESS_SHAREABLE;
MPU_InitStruct.IsCacheable = MPU_ACCESS_NOT_CACHEABLE; /* For GPU2D - no cache! */
MPU_InitStruct.IsBufferable = MPU_ACCESS_BUFFERABLE;
HAL_MPU_ConfigRegion(&MPU_InitStruct);
/* Enables the MPU */
HAL_MPU_Enable(MPU_PRIVILEGED_DEFAULT);
}
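One thing that may be worth double-checking in the configuration above: as far as I know, the ARMv7-M MPU in the Cortex-M7 requires a region's base address to be a multiple of the region size, and 0x2406E000 is a multiple of 8kB but not of 16kB. Below is a minimal sketch of that check in plain C. The helper names are made up for illustration and are not part of the HAL, and the "effective base" assumes the hardware simply ignores the low address bits, which is my understanding of the architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if size is a power of two and at least 32 bytes,
 * the smallest region size the ARMv7-M MPU supports. A 4GB region does not
 * fit in uint32_t and is not covered by this sketch. */
static bool mpu_size_is_valid(uint32_t size)
{
    return (size >= 32u) && ((size & (size - 1u)) == 0u);
}

/* Hypothetical helper: true if base is a multiple of size. */
static bool mpu_region_is_aligned(uint32_t base, uint32_t size)
{
    assert(mpu_size_is_valid(size));
    return (base & (size - 1u)) == 0u;
}

/* Hypothetical helper: base address with the bits below the region size
 * cleared (assumption: this is what the MPU effectively uses). */
static uint32_t mpu_effective_base(uint32_t base, uint32_t size)
{
    assert(mpu_size_is_valid(size));
    return base & ~(size - 1u);
}
```

For region 6, mpu_region_is_aligned(0x2406E000, 0x4000) is false and mpu_effective_base(0x2406E000, 0x4000) is 0x2406C000, so under that assumption the non-cacheable window would cover 0x2406C000 - 0x2406FFFF rather than the intended RAM_CMD range. The other regions in the listing pass the check.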
Update: I got it working.
I modified my nema_sys_init in hal.c because the supplied baremetal template doesn't work on this platform.
I updated nema_hal.h and nema_hal.c so they can be used both on the STM32U5G9J-DK2 and STM32H7S78-DK.
nema_hal.c:
#include <display.h>
#include <assert.h> /* assert() is used in nema_sys_init() and the buffer helpers */
#include <stdlib.h>
#include <stm32h7s7xx.h>
#include <string.h>
#include <tsi_malloc.h>
#include "nema_hal.h"
#include "nema_vg.h"
#include "nema_graphics.h"
#include "main.h"
#include "gpu2d.h"
#include "gfxmmu.h"
#if defined(STM32V7R7xx) || defined(STM32V7R5xx) || defined(STM32V7S7xx) || defined(STM32V7S5xx)
#include "stm32v7xx_hal.h"
#include "stm32v7xx_hal_gpu2d.h"
#elif defined(STM32H7S7xx)
#include "stm32h7rsxx_hal.h"
#include "stm32h7rsxx_hal_gpu2d.h"
#else
#error "Unsupported Platform"
#endif /* STM32V7R7xx | STM32V7R5xx | STM32V7S7xx | STM32V7S5xx */
#define RING_SIZE 1024
#define NEMAGFX_MEM_POOL_SIZE 24320
#define NEMAGFX_STENCIL_POOL_SIZE 389120 /* NemaGFX stencil buffer pool size in byte 800*480+5120 */
#if defined(STM32H7S7xx)
/* RAM_CMD */
static uint8_t nemagfx_pool_mem[NEMAGFX_MEM_POOL_SIZE] __attribute__((section("Nemagfx_Memory_Pool_Buffer")));
static uint8_t nemagfx_stencil_buffer_mem[NEMAGFX_STENCIL_POOL_SIZE] __attribute__((section("Nemagfx_Stencil_Buffer")));
static uint8_t nemagfx_ring_mem[RING_SIZE];
/* NemaGFX frame buffer memory 0x90000000 */
#if (DISPLAY_BYTES_PER_PIXEL == 2)
Color_RGB565_t FrameBuffer[DISPLAY_NO_OF_FRAMEBUFFERS][DISPLAY_SIZE_H][DISPLAY_SIZE_W] __attribute__((section("Nemagfx_Framebuffer")));
#elif (DISPLAY_BYTES_PER_PIXEL == 3)
Color_RGB888_t FrameBuffer[DISPLAY_NO_OF_FRAMEBUFFERS][DISPLAY_SIZE_H][DISPLAY_SIZE_W] __attribute__((section("Nemagfx_Framebuffer")));
#elif (DISPLAY_BYTES_PER_PIXEL == 4)
Color_RGBA8888_t FrameBuffer[DISPLAY_NO_OF_FRAMEBUFFERS][DISPLAY_SIZE_H][DISPLAY_SIZE_W] __attribute__((section("Nemagfx_Framebuffer")));
#endif /* DISPLAY_BYTES_PER_PIXEL */
#endif /* STM32H7S7xx */
/* NemaGFX ring buffer memory */
static nema_ringbuffer_t ring_buffer_str = {{0}};
static volatile int last_cl_id = -1;
#if (USE_HAL_GPU2D_REGISTER_CALLBACKS == 1)
static void GPU2D_CommandListCpltCallback(GPU2D_HandleTypeDef* hgpu2d, uint32_t CmdListID)
#else /* USE_HAL_GPU2D_REGISTER_CALLBACKS = 0 */
void HAL_GPU2D_CommandListCpltCallback(GPU2D_HandleTypeDef* hgpu2d, uint32_t CmdListID)
#endif /* USE_HAL_GPU2D_REGISTER_CALLBACKS = 1 */
{
UNUSED(hgpu2d);
last_cl_id = CmdListID;
}
void HAL_GPU2D_ErrorCallback(GPU2D_HandleTypeDef *hgpu2d)
{
uint32_t val = nema_reg_read(GPU2D_SYS_INTERRUPT); /* clear the ER interrupt */
nema_reg_write(GPU2D_SYS_INTERRUPT, val);
/* external GPU2D cache maintenance */
if (val & (1UL << 2))
{
HAL_ICACHE_Disable();
nema_ext_hold_deassert_imm(2);
}
if (val & (1UL << 3))
{
HAL_ICACHE_Enable();
HAL_ICACHE_Invalidate();
nema_ext_hold_deassert_imm(3);
}
}
int32_t nema_sys_init(void)
{
int error_code = 0;
/* Initialize GPU2D */
hgpu2d.Instance = GPU2D;
HAL_GPU2D_Init(&hgpu2d);
#if (USE_HAL_GPU2D_REGISTER_CALLBACKS == 1)
/* Register Command List Complete Callback */
HAL_GPU2D_RegisterCommandListCpltCallback(&hgpu2d, GPU2D_CommandListCpltCallback);
// HAL_GPU2D_RegisterCommandListCpltCallback(&hgpu2d, HAL_GPU2D_CommandListCpltCallback);
#endif /* USE_HAL_GPU2D_REGISTER_CALLBACKS = 1 */
/* Initialize Mem Space */
error_code = tsi_malloc_init_pool_aligned(0, (void*)nemagfx_pool_mem, (uintptr_t)nemagfx_pool_mem, NEMAGFX_MEM_POOL_SIZE, 1, 8);
assert(error_code == 0);
error_code = tsi_malloc_init_pool_aligned(1, (void*)nemagfx_stencil_buffer_mem, (uintptr_t)nemagfx_stencil_buffer_mem, NEMAGFX_STENCIL_POOL_SIZE, 1, 8);
assert(error_code == 0);
/* Allocate ring_buffer memory */
ring_buffer_str.bo = nema_buffer_create_pool(0, RING_SIZE);
assert(ring_buffer_str.bo.base_virt);
/* Initialize Ring Buffer */
error_code = nema_rb_init(&ring_buffer_str, 1);
if (error_code < 0)
{
return error_code;
}
/* Reset last_cl_id counter */
last_cl_id = 0;
return error_code;
}
void nema_components_init()
{
/* Initialize NemaGFX library */
nema_init();
nema_reg_write(0xFFC, 0x7E); /* Enable bus error interrupts */
nema_vg_handle_large_coords(1, 1);
nema_ext_hold_enable(2);
nema_ext_hold_irq_enable(2);
nema_ext_hold_enable(3);
nema_ext_hold_irq_enable(3);
nema_sys_init();
/* Initialize the stencil pool after nema_sys_init() */
nema_vg_init_stencil_pool(DISPLAY_SIZE_W, DISPLAY_SIZE_H, STENCIL_MEM_POOL_ID);
}
uint32_t nema_reg_read(uint32_t reg)
{
return HAL_GPU2D_ReadRegister(&hgpu2d, reg);
}
void nema_reg_write(uint32_t reg, uint32_t value)
{
HAL_GPU2D_WriteRegister(&hgpu2d, reg, value);
}
int nema_wait_irq(void)
{
/* Wait indefinitely for a free semaphore - baremetal, not implemented */
return 0;
}
int nema_wait_irq_cl(int cl_id)
{
while (last_cl_id < cl_id)
{
(void)nema_wait_irq();
}
return 0;
}
int nema_wait_irq_brk(int brk_id)
{
UNUSED(brk_id);
while (nema_reg_read(GPU2D_BREAKPOINT) == 0U)
{
(void)nema_wait_irq();
}
return 0;
}
void nema_host_free(void *ptr)
{
if (ptr)
{
tsi_free(ptr);
}
}
void *nema_host_malloc(unsigned size)
{
return tsi_malloc(size);
}
nema_buffer_t nema_buffer_create(int size)
{
nema_buffer_t bo;
memset(&bo, 0, sizeof(bo));
bo.base_virt = tsi_malloc(size);
bo.base_phys = (uint32_t)bo.base_virt;
bo.size = size;
assert(bo.base_virt != 0 && "Unable to allocate memory in nema_buffer_create");
return bo;
}
nema_buffer_t nema_buffer_create_pool(int pool, int size)
{
nema_buffer_t bo;
memset(&bo, 0, sizeof(bo));
bo.base_virt = tsi_malloc_pool(pool, size);
bo.base_phys = (uint32_t)bo.base_virt;
bo.size = size;
bo.fd = 0;
assert(bo.base_virt != 0 && "Unable to allocate memory in nema_buffer_create_pool");
return bo;
}
/* Used to select the framebuffer */
void MX_GFXMMU_Select_FB(uint8_t index)
{
GFXMMU_PackingTypeDef pPacking = {0};
hgfxmmu.Instance = GFXMMU;
hgfxmmu.Init.BlockSize = GFXMMU_12BYTE_BLOCKS;
hgfxmmu.Init.DefaultValue = 0;
hgfxmmu.Init.AddressTranslation = DISABLE;
hgfxmmu.Init.Buffers.Buf0Address = (uint32_t)FrameBuffer[index];
hgfxmmu.Init.Buffers.Buf1Address = (uint32_t)FrameBuffer[index] + 0x00200000; /* 2MB ahead 0x00200000 */
hgfxmmu.Init.Buffers.Buf2Address = 0;
hgfxmmu.Init.Buffers.Buf3Address = 0;
hgfxmmu.Init.Interrupts.Activation = DISABLE;
if (HAL_GFXMMU_Init(&hgfxmmu) != HAL_OK)
{
Error_Handler();
}
pPacking.Buffer0Activation = ENABLE;
pPacking.Buffer0Mode = GFXMMU_PACKING_MSB_REMOVE;
pPacking.Buffer1Activation = ENABLE;
pPacking.Buffer1Mode = GFXMMU_PACKING_MSB_REMOVE;
pPacking.Buffer2Activation = DISABLE;
pPacking.Buffer2Mode = GFXMMU_PACKING_MSB_REMOVE;
pPacking.Buffer3Activation = DISABLE;
pPacking.Buffer3Mode = GFXMMU_PACKING_MSB_REMOVE;
pPacking.DefaultAlpha = 0xFF;
if (HAL_GFXMMU_ConfigPacking(&hgfxmmu, &pPacking) != HAL_OK)
{
Error_Handler();
}
}
/* Used to select the framebuffer */
nema_buffer_t nema_select_framebuffer(int index)
{
nema_buffer_t bo;
memset(&bo, 0, sizeof(bo));
/* STM32H7S7 */
#if (DISPLAY_BYTES_PER_PIXEL == 3)
MX_GFXMMU_Select_FB(index);
bo.base_virt = (void *)GFXMMU_VIRTUAL_BUFFER0_BASE;
#else
bo.base_virt = FrameBuffer[index];
#endif
bo.base_phys = (uint32_t)bo.base_virt;
bo.size = DISPLAY_FRAMEBUFFER_SIZE;
bo.fd = 0; /* Buffer allocated */
return bo;
}
void *nema_buffer_map(nema_buffer_t *bo)
{
return bo->base_virt;
}
void nema_buffer_unmap(nema_buffer_t *bo)
{
UNUSED(bo);
}
void nema_buffer_destroy(nema_buffer_t *bo)
{
if (bo->fd == -1)
{
return; /* Buffer wasn't allocated! */
}
tsi_free(bo->base_virt);
bo->base_virt = (void*)0;
bo->base_phys = 0;
bo->size = 0;
bo->fd = -1; /* Buffer not allocated */
}
uintptr_t nema_buffer_phys(nema_buffer_t *bo)
{
return bo->base_phys;
}
void nema_buffer_flush(nema_buffer_t * bo)
{
UNUSED(bo);
}
int nema_mutex_lock(int mutex_id)
{
UNUSED(mutex_id);
return 0;
}
int nema_mutex_unlock(int mutex_id)
{
UNUSED(mutex_id);
return 0;
}
void platform_disable_cache(void)
{
nema_ext_hold_assert(2, 1);
}
void platform_invalidate_cache(void)
{
nema_ext_hold_assert(3, 1);
}
Note: there's still one small issue I have to figure out, but for now it can be worked around.
I have multiple framebuffers (for double buffering, backup, pre-rendered assets, and so on).
When I select buffer 0, which starts at the beginning of PSRAM, I see a series of pixels near the start of the framebuffer lighting up, which suggests NemaGFX has placed some workspace there.
This happens after calling nema_vg_init, and even after calling nema_vg_init_stencil_prealloc with a safe location for the stencil buffer.
So when I select buffer 0 and the GPU draws over that part of the framebuffer, it is doomed to crash.
Since I have enough space, I simply use framebuffers 1 and up and leave all of framebuffer 0 as NemaGFX working space. This way it keeps working. I'll try to sort that out later.
Perhaps somebody can guide me on how to avoid this.
Despite this little issue, I hope this post helps someone!
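To narrow down the framebuffer 0 issue, one option is to check at startup whether any NemaGFX pool overlaps a framebuffer. As far as I can tell, no output section in the linker script above lists Nemagfx_Stencil_Buffer, so the linker decides where that array ends up, and the .map file shows the final address. The following is a minimal sketch of an overlap test; ranges_overlap is a hypothetical helper and not part of NemaGFX or the HAL.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if [a_base, a_base + a_size) and
 * [b_base, b_base + b_size) share at least one byte.
 * End addresses are computed in 64 bits so that ranges ending at
 * the top of the 32-bit address space do not wrap around. */
static bool ranges_overlap(uint32_t a_base, uint32_t a_size,
                           uint32_t b_base, uint32_t b_size)
{
    uint64_t a_end = (uint64_t)a_base + a_size;
    uint64_t b_end = (uint64_t)b_base + b_size;

    /* An empty range overlaps nothing. */
    if ((a_size == 0u) || (b_size == 0u))
    {
        return false;
    }
    return (a_base < b_end) && (b_base < a_end);
}
```

On the target this could be called from nema_sys_init() with (uint32_t)(uintptr_t)FrameBuffer[0] and sizeof(FrameBuffer[0]) against (uint32_t)(uintptr_t)nemagfx_stencil_buffer_mem and NEMAGFX_STENCIL_POOL_SIZE, and likewise against nemagfx_pool_mem, with an assert on the result.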
What about the graphics speed difference between these two platforms?
Drawing 80x80 pixel rectangles with line width 10, at random positions and in random colors:
STM32U5G9J-DK2 (Cortex-M33 170MHz): 11165 rectangles/sec.
STM32H7S78-DK (Cortex-M7 600MHz): 7320 rectangles/sec (XSPI bus optimally tuned).
So the STM32U5G9J-DK2 is the clear winner when it comes to graphics performance: it's over 50% faster!
This is probably because the framebuffer on the low-power M33 lives in its large zero-wait-state internal SRAM.
In addition, the framebuffer on the 600MHz Cortex-M7 has to be 32 bits/pixel, while the 170MHz Cortex-M33 uses a 24 bits/pixel framebuffer in this case, so the two GPUs write different amounts of data to the framebuffer.
This is where we lose performance, especially when we need to use external PSRAM for the framebuffers.
So the 600MHz STM32H7S7L8 MCU can't beat the 170MHz STM32U5G9ZJ MCU when it comes to graphics performance. For general MCU workloads it is of course a lot faster, and you can pre-render assets and blit them; it also offers more flexibility in the use of external memory.
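The numbers above can be turned into a rough framebuffer write bandwidth. This is only a back-of-the-envelope sketch: it assumes the 10-pixel stroke lies entirely inside the 80x80 box, and it ignores overdraw, blending reads, and command list overhead. The function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Pixels covered by a rectangle outline of the given stroke width,
 * assuming the stroke lies entirely inside the w x h box. */
static uint32_t outline_pixels(uint32_t w, uint32_t h, uint32_t stroke)
{
    assert((2u * stroke <= w) && (2u * stroke <= h));
    return (w * h) - ((w - 2u * stroke) * (h - 2u * stroke));
}

/* Framebuffer bytes written per second at a given drawing rate. */
static uint64_t bytes_per_second(uint32_t pixels, uint32_t bytes_per_pixel,
                                 uint32_t shapes_per_second)
{
    return (uint64_t)pixels * bytes_per_pixel * shapes_per_second;
}

/* Integer percentage by which rate a exceeds rate b, rounded down. */
static uint32_t percent_faster(uint32_t a, uint32_t b)
{
    assert((b != 0u) && (a >= b));
    return (uint32_t)(((uint64_t)(a - b) * 100u) / b);
}
```

With these assumptions each rectangle covers 2800 pixels. At 3 bytes/pixel and 11165 rectangles/sec that is about 93.8 MB/s on the STM32U5G9J-DK2, and at 4 bytes/pixel and 7320 rectangles/sec about 82.0 MB/s on the STM32H7S78-DK. The gap in rectangles/sec is about 52%, while the gap in raw bytes written is only about 14%, which fits the idea that the wider pixel format accounts for a good part of the difference.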
Cheers, and have fun developing with graphics!
https://www.youtube.com/watch?v=HA2m78PKuPc
STM32 Graphics on STM32H7S78-DK (Cortex-M7 600MHz):
Rectangle 80x80 pixels, line width 10: 7320 rectangles/sec.
https://www.youtube.com/watch?v=XblQPOOXQ34
STM32 Graphics on STM32U5G9J-DK2 (Cortex-M33 170MHz):
Rectangle 80x80 pixels, line width 10: 11165 rectangles/sec.
I have now ordered the STM32N6570-DK because it has sufficient SRAM for double-buffer framebuffers AND external PSRAM.
Performance results will follow soon.
2025-09-24 6:29 AM
Hello @Jack3,
I can't really comment on the performance, but for your information, the different boards (H7/N6/U5) differ in their Nema implementation; see "TouchGFX on NeoChrom/NeoChromVG" in the TouchGFX Documentation.
BR,