2025-09-21 4:27 AM - edited 2025-10-03 1:26 AM
Hi, I have been using NemaGFX on the STM32U5G9J-DK2 kit with no problems; however, I have had less luck with the STM32H7S78-DK kit.
1. NemaGFX on this platform only seems to work with 32-bit framebuffers, not 24-bit ones. Or am I doing something wrong?
2. NemaGFX on this platform uses the beginning of the framebuffer as a sort of workspace: I see random dots there. I think I'm not allocating buffers correctly.
3. I can draw filled rectangles, but filled circles drawn with 'nema_vg_draw_circle()' only faintly show their edge and produce small images in that workspace at the beginning of the framebuffer.
nema_vg_set_fill_rule(NEMA_VG_FILL_NON_ZERO); does not fill the shape as it should.
By contrast, 'nema_vg_draw_rect()' works fine with nema_vg_set_fill_rule(NEMA_VG_FILL_NON_ZERO).
4. Many more graphics functions don't work properly on the STM32H7S78, but they all work fine on the STM32U5G9J. The STM32U5G9J uses a 24-bit framebuffer, while the STM32H7S78 seems to work only with 32-bit framebuffers. This could be the reason.
nema_vg_set_fill_rule(NEMA_VG_FILL_EVEN_ODD); does fill the circle, but the four small images are still visible in the framebuffer, drawn in white.
What am I doing wrong?
So I switched to the next framebuffer. The random pixels at the top are gone, but that doesn't fix drawing a filled circle or a filled rounded rectangle, and many other functions still don't work properly.
Note that a filled rectangle without rounded corners works, but a filled rounded rectangle only partially shows its corners; it has the same problem as the filled circle:
nema_vg_set_fill_rule(NEMA_VG_FILL_NON_ZERO);
nema_vg_set_fill_rule(NEMA_VG_FILL_EVEN_ODD); (this is not the solution!)
What did I miss?
I think I have been careful with caching; the GPU2D works from memory, not from the cache.
When I run nema_stencil_init(), a row of white dots appears at the beginning of the framebuffer.
This happens if this function calls one of these:
- nema_vg_init()
- nema_vg_init_stencil_pool(DISPLAY_SIZE_W, DISPLAY_SIZE_H, 1);
- nema_vg_init_stencil_prealloc(fbo.w, fbo.h, stencil_bo);
So something is definitely wrong. This happens even with nema_vg_init() alone, regardless of where my framebuffer is located.
What can I do better?
Thanks for any help, it is much appreciated!
2025-09-23 2:28 AM
Hi @Jack3
Thorough work, and interesting details.
First and foremost, I can clarify that the GPU2D in the STM32H7R/S does not handle 24bpp natively. This applies to both input and output (framebuffer). What I see people do is choose either a 16bpp or a full 32bpp framebuffer. But there is a third option that uses another IP of the STM32H7R/S called the GFXMMU. It can be configured in "packing mode". In this mode the virtual buffers of the GFXMMU can expose a 24bpp framebuffer as if it were a 32bpp framebuffer. So while the GPU2D in the STM32H7R/S does not handle 24bpp, the overall system does. If you want a fast start using the GFXMMU in packing mode, I suggest starting from TouchGFX Designer. It has a TBS template called "STM32H7S78 DK 24 BPP". Have a look at MX_GFXMMU_Init(); the most important settings here are BlockSize, BufXAddress and Packing:
static void MX_GFXMMU_Init(void)
{
    GFXMMU_PackingTypeDef pPacking = {0};
    hgfxmmu.Instance = GFXMMU;
    hgfxmmu.Init.BlockSize = GFXMMU_12BYTE_BLOCKS;
    hgfxmmu.Init.DefaultValue = 0;
    hgfxmmu.Init.AddressTranslation = DISABLE;
    hgfxmmu.Init.Buffers.Buf0Address = 0x90000000;
    hgfxmmu.Init.Buffers.Buf1Address = 0x90200000;
    hgfxmmu.Init.Buffers.Buf2Address = 0;
    hgfxmmu.Init.Buffers.Buf3Address = 0;
    hgfxmmu.Init.Interrupts.Activation = DISABLE;
    if (HAL_GFXMMU_Init(&hgfxmmu) != HAL_OK)
    {
        Error_Handler();
    }
    pPacking.Buffer0Activation = ENABLE;
    pPacking.Buffer0Mode = GFXMMU_PACKING_MSB_REMOVE;
    pPacking.Buffer1Activation = ENABLE;
    pPacking.Buffer1Mode = GFXMMU_PACKING_MSB_REMOVE;
    pPacking.Buffer2Activation = DISABLE;
    pPacking.Buffer2Mode = GFXMMU_PACKING_MSB_REMOVE;
    pPacking.Buffer3Activation = DISABLE;
    pPacking.Buffer3Mode = GFXMMU_PACKING_MSB_REMOVE;
    pPacking.DefaultAlpha = 0xFF;
    if (HAL_GFXMMU_ConfigPacking(&hgfxmmu, &pPacking) != HAL_OK)
    {
        Error_Handler();
    }
}
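To make the "24bpp exposed as 32bpp" idea more concrete, here is a small host-side model of what MSB-remove packing does to one pixel. It is an illustration only, assuming an ARGB8888 word laid out as 0xAARRGGBB and a plain linear packing; the real GFXMMU does this in hardware on every bus access, and the helper names below are hypothetical, not HAL functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of GFXMMU_PACKING_MSB_REMOVE (illustration only).
 * A 32-bit pixel 0xAARRGGBB written through the virtual buffer is stored
 * as 3 bytes in physical memory: the most significant byte is dropped. */
static void pack_msb_remove(uint32_t argb, uint8_t out[3])
{
    out[0] = (uint8_t)(argb & 0xFFu);         /* blue  */
    out[1] = (uint8_t)((argb >> 8) & 0xFFu);  /* green */
    out[2] = (uint8_t)((argb >> 16) & 0xFFu); /* red   */
}

/* On a read through the virtual buffer, the missing MSB is filled in
 * with the configured DefaultAlpha (0xFF in MX_GFXMMU_Init above). */
static uint32_t unpack_msb_remove(const uint8_t in[3], uint8_t default_alpha)
{
    return ((uint32_t)default_alpha << 24) |
           ((uint32_t)in[2] << 16) |
           ((uint32_t)in[1] << 8) |
           (uint32_t)in[0];
}

/* Physical bytes needed for one w x h framebuffer. */
static size_t fb_bytes(size_t w, size_t h, size_t bytes_per_pixel)
{
    assert(bytes_per_pixel >= 2 && bytes_per_pixel <= 4);
    return w * h * bytes_per_pixel;
}
```

For an 800x480 display the packed physical buffer needs 1,152,000 bytes instead of 1,536,000, which fits in the 0x200000-byte gap between Buf0Address and Buf1Address in the configuration above. The 12-byte block size is consistent with this model: 12 physical bytes hold four 24bpp pixels, i.e. 16 bytes of the 32bpp virtual view.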
The second thing I can point out is that the texture cache of the STM32H7R/S is a little different from the one on the STM32U5G9. In the STM32H7R/S the texture cache is implemented with the ICACHE instance. It is correct that the GPU2D works with memory, but the texture cache (ICACHE) boosts performance when reading source images from external flash. This is great, but when drawing vector graphics the GPU2D uses the stencil buffer for intermediate calculations in some operations. Without proper cache maintenance, the ICACHE ends up with stale data when the GPU2D reads back from the stencil buffer. All the TBS templates in TouchGFX Designer have this setup, so I suggest having a look at one of them. The highlights of the maintenance are:
In nema_hal.c:
void HAL_GPU2D_ErrorCallback(GPU2D_HandleTypeDef *hgpu2d)
{
    uint32_t val = nema_reg_read(GPU2D_SYS_INTERRUPT); /* clear the ER interrupt */
    nema_reg_write(GPU2D_SYS_INTERRUPT, val);
    /* external GPU2D cache maintenance */
    if (val & (1UL << 2))
    {
        HAL_ICACHE_Disable();
        nema_ext_hold_deassert_imm(2);
    }
    if (val & (1UL << 3))
    {
        HAL_ICACHE_Enable();
        HAL_ICACHE_Invalidate();
        nema_ext_hold_deassert_imm(3);
    }
    return;
}

void platform_disable_cache(void)
{
    nema_ext_hold_assert(2, 1);
}

void platform_invalidate_cache(void)
{
    nema_ext_hold_assert(3, 1);
}
platform_disable_cache() and platform_invalidate_cache() are called from inside the nemagfx library.
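The bit handling in HAL_GPU2D_ErrorCallback() can be checked on a host before chasing it on the target. The sketch below is a hypothetical model: function pointers stand in for HAL_ICACHE_Disable/Enable/Invalidate and nema_ext_hold_deassert_imm, and the recording stubs exist only so the order of operations can be verified. It mirrors the callback above; it is not ST code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the HAL/NemaGFX calls used in the callback. */
typedef struct {
    void (*cache_disable)(void);
    void (*cache_enable)(void);
    void (*cache_invalidate)(void);
    void (*hold_deassert)(uint32_t hold_id);
} ext_hold_ops;

#define EXT_HOLD_DISABLE_CACHE    2u /* asserted by platform_disable_cache()    */
#define EXT_HOLD_INVALIDATE_CACHE 3u /* asserted by platform_invalidate_cache() */

/* Same decisions as HAL_GPU2D_ErrorCallback(): act on each hold bit,
 * then release that hold so the GPU2D can continue. */
static void dispatch_ext_hold(uint32_t irq_status, const ext_hold_ops *ops)
{
    assert(ops != NULL);
    if (irq_status & (1UL << EXT_HOLD_DISABLE_CACHE)) {
        ops->cache_disable();
        ops->hold_deassert(EXT_HOLD_DISABLE_CACHE);
    }
    if (irq_status & (1UL << EXT_HOLD_INVALIDATE_CACHE)) {
        ops->cache_enable();
        ops->cache_invalidate(); /* drop stale stencil data from the ICACHE */
        ops->hold_deassert(EXT_HOLD_INVALIDATE_CACHE);
    }
}

/* Recording stubs for host testing: each call appends one character. */
static char trace[32];
static int trace_len;

static void trace_reset(void)
{
    memset(trace, 0, sizeof trace);
    trace_len = 0;
}

static void log_op(char c)
{
    if (trace_len < (int)sizeof trace - 1) {
        trace[trace_len++] = c;
        trace[trace_len] = '\0';
    }
}

static void stub_disable(void) { log_op('D'); }
static void stub_enable(void) { log_op('E'); }
static void stub_invalidate(void) { log_op('I'); }
static void stub_deassert(uint32_t id) { log_op((char)('0' + id)); }

static const ext_hold_ops stub_ops = {
    stub_disable, stub_enable, stub_invalidate, stub_deassert
};
```

With both bits set, the trace reads "D2EI3": disable and release hold 2, then enable, invalidate and release hold 3. If the real callback never runs, none of these steps happen and the GPU2D stays on hold or reads stale data.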
In TouchGFXConfiguration.cpp (or any other init section) we need to enable the user-configurable interrupts from the GPU2D. Look for nema_ext_hold_xxx:
void touchgfx_components_init()
{
    nema_init();
    nema_reg_write(0xFFC, 0x7E); /* Enable bus error interrupts */
    nema_vg_init_stencil_pool(800, 480, 1);
    nema_vg_handle_large_coords(1, 1);
    nema_ext_hold_enable(2);
    nema_ext_hold_irq_enable(2);
    nema_ext_hold_enable(3);
    nema_ext_hold_irq_enable(3);
}
Regarding memory allocation and framebuffer location, this is similar to the STM32U5G9. In short, the nemagfx library needs a small amount of memory to allocate from. This memory is always drawn from memory pool 0 unless another pool is specified. This means that the implementation of nema_buffer_create_pool() (in nema_hal.c) needs to allow multiple allocations from pool 0. One way of solving this can be seen in the nema_hal.c file generated by the STM32H7S78 TouchGFX TBS. Pool 0 typically does not need to be especially big if the stencil buffer is allocated from a different pool.
Stencil buffer creation is done as part of the call to nema_vg_init(). Instead, we can control which pool it is allocated from like this:
nema_vg_init_stencil_pool(800, 480, 1).
Each pool can be placed at a specific location in memory with the help of the linker or a manually maintained memory map. Note that performance can be affected if the framebuffers and stencil buffer are not 8-byte aligned; again, this can be seen in the TBS for the STM32H7S78.
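The key requirement for pool 0 is that repeated allocations return distinct blocks. As a minimal sketch, here is a hypothetical bump allocator with per-pool regions and 8-byte alignment. It is not the tsi_malloc allocator used by the generated nema_hal.c (which can also free memory); it only demonstrates the two properties discussed above. The names pool_init and pool_alloc are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bump allocator: pool 0 for command lists / ring buffer,
 * pool 1 for the stencil buffer. No free(); illustration only. */
#define NUM_POOLS 2

typedef struct {
    uint8_t *base; /* 8-byte aligned start of the pool */
    size_t size;   /* usable bytes after alignment */
    size_t used;   /* bytes handed out so far */
} pool_t;

static pool_t pools[NUM_POOLS];

static uintptr_t align8(uintptr_t v)
{
    return (v + 7u) & ~(uintptr_t)7u;
}

static void pool_init(int id, void *mem, size_t size)
{
    assert(id >= 0 && id < NUM_POOLS && mem != NULL);
    uintptr_t start = align8((uintptr_t)mem);
    size_t skip = (size_t)(start - (uintptr_t)mem);
    assert(skip <= size);
    pools[id].base = (uint8_t *)start;
    pools[id].size = size - skip;
    pools[id].used = 0;
}

/* Every call returns a new, 8-byte aligned block, or NULL when the pool
 * is exhausted, so the caller can assert instead of reusing memory. */
static void *pool_alloc(int id, size_t bytes)
{
    assert(id >= 0 && id < NUM_POOLS);
    pool_t *p = &pools[id];
    size_t need = (size_t)align8(bytes);
    if (need > p->size - p->used) {
        return NULL;
    }
    void *blk = p->base + p->used;
    p->used += need;
    return blk;
}
```

Where each pool's backing memory lives (internal SRAM for pool 0, wherever fits for the stencil pool) is decided by the linker script, not by the allocator.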
I hope it helps you further.
Regards,
2025-09-23 10:36 AM - edited 2025-10-03 1:24 AM
Hi Jakob,
Thank you! This is very helpful information!
This gives me the opportunity to learn more about GPU2D/NemaGFX on the STM32H7S78.
I was confident I would get it working with your help!
However, I run it bare-metal, without an RTOS, and there is still some small glitch in my project that prevents it from working.
It probably has to do with one of these things:
1. Initializing the GPU properly.
2. Initializing "nemagfx_pool_mem": how to do this correctly in a bare-metal project.
3. Initializing "nemagfx_ring_buffer_mem": how to do this correctly in a bare-metal project.
4. Telling NemaGFX to use a small memory space *outside* the framebuffer. It now produces ghost images.
Observations:
1. ICACHE: it does not seem to work yet.
2. I called these, in this particular order:
nema_init();
nema_reg_write(0xFFC, 0x7E); /* Enable bus error interrupts */
nema_vg_init_stencil_pool(800, 480, 1);
nema_vg_handle_large_coords(1, 1);
nema_ext_hold_enable(2);
nema_ext_hold_irq_enable(2);
nema_ext_hold_enable(3);
nema_ext_hold_irq_enable(3);
nema_sys_init();
Note:
"GPU2D_CommandListCpltCallback" in nema_hal.c is called.
"HAL_GPU2D_ErrorCallback" is never called.
3. In the end, I had two buffers of size NEMAGFX_MEM_POOL_SIZE: nemagfx_pool_mem and nemagfx_ring_buffer_mem, so I had to modify the linker file to accommodate both. That doesn't seem right.
What's the correct way to initialize both in a bare-metal project?
This is what I changed:
1. I forgot to include the HAL_ICACHE_MODULE (ICACHE_GPU2D) on the STM32H7S78!
So I've added the ICACHE_GPU2D functionality to the project.
2. I updated my nema_hal to use the ICACHE module.
3. I called the functions in "touchgfx_components_init()", enabling the required interrupts.
I noticed "HAL_GPU2D_ErrorCallback" never fires, so the ICACHE maintenance probably doesn't work yet.
I also tried calling it from GPU2D_IRQHandler. Then it is triggered, but the graphics output didn't change.
4. I've also added the GFXMMU functionality to the project to support 24-bit framebuffers, and modified "MX_LTDC_Init" to use the GFXMMU base address as the LTDC start address.
Despite this, in 24-bit mode vertical lines were displayed across the full width, and no GPU2D output was visible.
I should be able to get this working, but only once the GPU is up and running. First things first, I think.
5. I got "nema_buffer_create_pool" working: it effectively selects a framebuffer to draw to, which works, and it no longer hangs.
I don't quite understand initializing the stencil buffer and the concept of POOL_IDs.
And I don't understand the part: "Regarding memory allocation and framebuffer location, this is similar to the STM32U5G9. In short the nemagfx library needs to have a small amount of memory to allocate from."
The framebuffer is quite large (800x480x4 bytes), so it resides in the external RAM of the STM32H7S78. The GPU can apparently access it, as it can draw some shapes, so I assume that part is configured correctly.
But I think there is a small pool of working space that really needs to be placed outside the framebuffer.
The unwanted ghosting seems to come from that workspace being borrowed from the actual framebuffer for no apparent reason.
It's this part of the initializing that I don't understand. I need to find a way to specify another place for this workspace, not inside the framebuffer. I can't get this done, yet.
So I can render some things (rectangles with sharp corners, for example), but a lot of things aren't working yet, like filled circles or filled rounded rectangles.
The stroked versions of these work surprisingly well, as before. I just wanted to note this again.
nema_vg_set_fill_rule(NEMA_VG_STROKE);
nema_vg_set_fill_rule(NEMA_VG_FILL_EVEN_ODD);
Note: Ghost images at the top: NemaGFX uses workspace inside the framebuffer!
The ghost images shown only reflect the rendering of the two rightmost primitives.
nema_vg_set_fill_rule(NEMA_VG_FILL_NON_ZERO);
Note: Ghost images at the top: NemaGFX uses workspace inside the framebuffer!
Remember, the ghost images shown in the picture above reflected the two rightmost primitives.
They no longer appear in the image below, using NEMA_VG_FILL_NON_ZERO.
After this, I start drawing lots of randomly colored, stroked 80x80 rectangles (stroke width 10) at random positions.
STM32H7S7L8 600MHz 32BPP: 7120 rectangles/sec.
STM32U5G9ZJ 170MHz 24BPP: 11200 rectangles/sec.
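As a rough way to put the two measurements side by side, they can be divided by the core clock. This is only arithmetic on the numbers above and deliberately ignores everything else that differs between the two setups (GPU2D clock, framebuffer depth, memory type and location, cache configuration), so it is a sanity check, not a benchmark result. The helper name is hypothetical.

```c
#include <assert.h>

/* Divide a measured rate by the core clock in MHz. Ignores every other
 * difference between the two boards, so treat the result as a rough hint. */
static double rate_per_mhz(double rects_per_sec, double core_mhz)
{
    assert(core_mhz > 0.0);
    return rects_per_sec / core_mhz;
}
```

With the figures above this gives roughly 11.9 rectangles/s per MHz on the STM32H7S7L8 versus roughly 65.9 on the STM32U5G9ZJ, which underlines that the H7 setup is bottlenecked somewhere other than the CPU clock.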
My reworked nema_hal.
I briefly thought TouchGFX might be useful for initializing and then using NemaGFX to create my own graphical widgets.
So I created a project with TouchGFX 4.26.0 that displays a button.
I looked at the generated code, but it doesn't contain any NemaGFX functions, so it might not be using GPU2D.
2025-09-25 2:50 AM
Hi @Jack3
The TouchGFX application you created does use GPU2D, but the calls to the nema library are inside the TouchGFX libraries. The initialization of GPU2D, GFXMMU, LTDC etc., however, is all part of your generated application.
As I see it there are three topics in this:
1) Setting up memory pools so the nemagfx library can allocate memory dynamically.
2) Setting up cache management when drawing vector graphics.
3) GFXMMU config
For 1) The concept of pool IDs is designed to have a way to direct memory allocation to different memory types depending on what the platform can provide. On the STM32H7 one will often have the framebuffer in external memory but try to have the stencil buffer in internal memory.
Memory wise we often have these categories of memory needs:
Command list and ring buffer memory - This needs to be the fastest memory as seen from the GPU2D. It is easiest if you make this pool ID 0. The size of this pool is typically around 16kB. Make sure to have an assertion or similar in place in case allocations exceed the pool and you run out of memory (see nema_buffer_create_pool() in the generated application).
Stencil buffer - This pool needs to match the size of your bound output destination; in your case that is one byte per pixel of your framebuffer. Having it as a separate pool makes it possible to move it around if your application grows. Ideally, place the stencil buffer in internal SRAM if possible; it also works in external memory, just with a performance penalty. Speaking of performance, the stencil buffer likes to be 8-byte aligned in memory, which makes reads/writes more efficient.
Frame buffer - It is optional whether you allocate the framebuffers the same way as the two other memory categories. The TouchGFX implementation chose to allocate the framebuffer without nema_buffer_create_pool(), but then you need another way of making sure it is 8-byte aligned.
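The sizes above follow directly from the display resolution. Below is a small sketch of the arithmetic, plus one portable way (C11 `_Alignas`) to get an 8-byte aligned framebuffer without going through a pool. The macro and variable names are hypothetical, and on the target you would additionally place the array in external RAM with a linker section, which is toolchain-specific and omitted here.

```c
#include <assert.h>
#include <stdint.h>

#define DISPLAY_W 800u
#define DISPLAY_H 480u

/* Stencil buffer: one byte per pixel of the bound destination. */
#define STENCIL_BYTES (DISPLAY_W * DISPLAY_H * 1u)
/* ARGB8888 framebuffer as seen by the GPU2D: four bytes per pixel. */
#define FB32_BYTES (DISPLAY_W * DISPLAY_H * 4u)

/* Framebuffer allocated outside nema_buffer_create_pool(): alignment is
 * requested explicitly. A linker section attribute would be added on target. */
static _Alignas(8) uint8_t framebuffer0[FB32_BYTES];

static int is_aligned8(const void *p)
{
    return ((uintptr_t)p & 7u) == 0;
}

static void check_fb_alignment(void)
{
    assert(is_aligned8(framebuffer0)); /* GPU2D prefers 8-byte aligned buffers */
}
```

For this display the stencil buffer is 384,000 bytes and each 32bpp framebuffer is 1,536,000 bytes, which is why the stencil buffer is usually the one worth fitting into internal SRAM.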
The initialization of the pools happens in nema_sys_init(); you already have this in your nema_hal.c. Try to debug the inputs to the calls to tsi_malloc_init_pool_aligned(). In your current implementation, every attempt to allocate from pool zero returns the address of FrameBuffer[0]. This will not work: multiple variables needed by the nemagfx lib will then use the same memory. I recommend using the tsi_malloc solution, but it is entirely up to you; it is only a memory allocator.
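One way to catch this class of bug while debugging is to record every block your pool allocator hands out and check that no two blocks overlap and that none lies inside the framebuffer. The sketch below is a hypothetical debug helper, not part of NemaGFX; an allocator that always returns the framebuffer start fails both checks, which would match the ghost images at the top of the screen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical debug helper: one entry per block returned by the allocator. */
typedef struct {
    uintptr_t start;
    size_t len;
} mem_range;

/* Half-open ranges [start, start + len) overlap if each starts before
 * the other ends. */
static bool ranges_overlap(mem_range a, mem_range b)
{
    return a.start < b.start + b.len && b.start < a.start + a.len;
}

/* True if all n blocks are disjoint from each other and from fb. */
static bool allocations_ok(const mem_range *blocks, size_t n, mem_range fb)
{
    assert(blocks != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (ranges_overlap(blocks[i], fb)) {
            return false; /* workspace drawn inside the visible framebuffer */
        }
        for (size_t j = i + 1; j < n; j++) {
            if (ranges_overlap(blocks[i], blocks[j])) {
                return false; /* two library buffers share memory */
            }
        }
    }
    return true;
}
```

In practice you would append to a small array inside nema_buffer_create_pool() and call allocations_ok() from an assertion after nema_init() and nema_vg_init_stencil_pool().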
For 2) Setting up the ICACHE (texture cache) is of course a step in the right direction. When the ICACHE is enabled, cache maintenance is needed; if the ICACHE is disabled, it does not matter. So if your initial setup did not configure the texture cache, you would not have seen these problems yet. When the texture cache is enabled and you draw with e.g. nema_vg_draw_ring(), you should see calls to:
void platform_disable_cache(void)
{
    nema_ext_hold_assert(2, 1);
}

void platform_invalidate_cache(void)
{
    nema_ext_hold_assert(3, 1);
}
It seems from your code that the nema_ext_hold signals are enabled. Try setting breakpoints in these functions and exercising the nema_vg_draw_xxx calls. If the breakpoints are still not being hit, check whether your application links the correct nemagfx library. The STM32H7S7 is a Cortex-M7, so the lib you need is in this folder:
\YOUR_STM32H7S78_TOUCHGFX_APP\Appli\Middlewares\ST\touchgfx_components\gpu2d\NemaGFX\lib\core\cortex_m7\
If you are using a different compiler than the TBS, you can find the full set of libs and headers here:
C:\TouchGFX\4.26.0\touchgfx_components\gpu2d\NemaGFX_NemaP_m7_r01\
For 3) The GFXMMU initialization I mentioned previously only links some physical memory to be mapped by the GFXMMU virtual buffers. To access the 32bpp view of the framebuffers, you should go through the memory-mapped addresses GFXMMU_VIRTUAL_BUFFER0_BASE and GFXMMU_VIRTUAL_BUFFER1_BASE.
So, if you are rendering to framebuffer 0, you will have something like this in code:
nema_bind_dst_tex(GFXMMU_VIRTUAL_BUFFER0_BASE, 800, 480, NEMA_RGBA8888, -1);
You do not need to have the LTDC access the framebuffer through the virtual buffers, but of course make sure the LTDC reads the physical memory as 24bpp, as you did on the STM32U5G9.
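The two views of the same pixel differ only in bytes per pixel. Below is a sketch of the offset arithmetic, assuming linear packing and no line padding (stride equals width times bytes per pixel). It computes offsets only; the actual base addresses (GFXMMU_VIRTUAL_BUFFER0_BASE for the GPU2D, the physical buffer address for the LTDC) come from your device header and configuration. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of pixel (x, y) in the 32bpp virtual view used by the GPU2D. */
static uint32_t virtual_offset(uint32_t x, uint32_t y, uint32_t width)
{
    assert(x < width);
    return (y * width + x) * 4u;
}

/* Offset of the same pixel in the packed 24bpp physical buffer read by
 * the LTDC (assumes linear packing, no line padding). */
static uint32_t physical_offset(uint32_t x, uint32_t y, uint32_t width)
{
    assert(x < width);
    return (y * width + x) * 3u;
}
```

For an 800-pixel-wide display, one line is 3200 bytes in the virtual view but 2400 bytes in physical memory, so the LTDC layer pitch must be set for 24bpp.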
I hope this helps you get further. In any case, you have a working template in the TouchGFX application.
Regards,
2025-09-25 9:29 AM - edited 2025-10-03 4:25 AM
Hi Jakob, thank you!
I addressed the issues above. However, the rendering problems with the shapes remain the same.
Ruled out:
- The framebuffer depth.
- The version of NemaGFX.
- The initialization of NemaGFX.
What changed:
1) Corrected set up of memory pools for the nemagfx library.
2) Configured cache management when drawing vector graphics.
3) Tested 24-bit framebuffers using GFXMMU (shows no difference).
Note:
- NEMAGFX_STENCIL_POOL_SIZE = 389120, which is 800*480 + 5120.
Using 384000 does not work. Why are the extra 5120 bytes required for the GPU to work?
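For reference, the pool size above can be computed rather than hard-coded. This sketch simply reproduces the numbers in this post: one byte per pixel, rounded up to 8 bytes, plus a headroom value. The 5120-byte headroom is the value that worked here empirically; why it is needed is not established in this thread (allocator bookkeeping is one unconfirmed possibility), so treat the constant as an assumption. Names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Empirical headroom from this post; the underlying reason is unconfirmed. */
#define STENCIL_HEADROOM_BYTES 5120u

static size_t stencil_pool_size(size_t w, size_t h, size_t headroom)
{
    size_t raw = w * h;                         /* 1 byte per pixel */
    size_t aligned = (raw + 7u) & ~(size_t)7u;  /* round up to 8 bytes */
    assert(aligned >= raw);
    return aligned + headroom;
}
```

stencil_pool_size(800, 480, STENCIL_HEADROOM_BYTES) yields 389120, the value used for NEMAGFX_STENCIL_POOL_SIZE.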
In this example, the last two functions do not render correctly. There are more functions that show rendering issues as well.
nema_vg_set_fill_rule(NEMA_VG_FILL_NON_ZERO);
nema_vg_draw_rect(80, 140, 200, 200, NemaGFX.m, NemaGFX.paint); /* Works */
nema_vg_draw_circle(400, 240, 100, NemaGFX.m, NemaGFX.paint); /* Fails*/
nema_vg_draw_rounded_rect(520, 140, 200, 200, 40, 40, NemaGFX.m, NemaGFX.paint); /* Fails*/
The difference between the STM32H7S7 and the STM32U5G9:
The STM32H7S7 (Cortex-M7) shows rendering issues in the 2nd and 3rd shapes:
The STM32U5G9 (Cortex-M33) works flawlessly:
Due to the failing GPU2D/NemaGFX on the STM32H7RS (Cortex-M7), we will instead use the STM32U5G9J (Cortex-M33), which works flawlessly, or the STM32N6 (Cortex-M55) after successful tests.
For those interested, here are the revised nema_hal.c, linker file, and .ioc file.
Feedback is always welcome!
If I find a solution for the STM32H7RS, I'll be sure to post an update on the cause of the problem.
It's probably my mistake, though I have no issues with the STM32U5G9J.