Use a 4-channel (H x W x 4) input for NPU acceleration on STM32N6 (STM32Cube.AI)?

qldrh
Associate

Hello ST Community,

I am currently developing a coin recognition vision model (MobileNetV4) to be deployed on the NUCLEO-N657X0-Q (STM32N657X0H3Q) board. I am trying to fully utilize the integrated NPU using STM32Cube.AI.

[My Approach] To maximize feature extraction of the coin's engraved patterns, I am using a Photometric Stereo-like approach. I capture 4 grayscale images using a global shutter camera with 4 directional lights (Top, Bottom, Left, Right). My goal is to stack these 4 images into a single 4-channel tensor (H x W x 4) and feed it directly into the NPU. I have already modified the first Conv2d layer of my PyTorch model to accept in_chans=4 and trained it successfully.
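To make the intended layout concrete, here is a minimal sketch of the stacking step in plain Python. It assumes each capture arrives as a flat, row-major buffer of H*W UINT8 values; the helper name `stack_planes_hwc` and the tiny 2 x 2 frames are hypothetical and only illustrate the channel-last (H x W x 4) ordering, not the actual capture or deployment code.

```python
def stack_planes_hwc(planes, height, width):
    """Interleave single-channel uint8 planes into one H x W x C buffer."""
    channels = len(planes)
    pixels = height * width
    for plane in planes:
        if len(plane) != pixels:
            raise ValueError("each plane must hold height * width bytes")
    out = bytearray(pixels * channels)
    for c, plane in enumerate(planes):
        # Channel c of every pixel lives at offsets c, c + C, c + 2C, ...
        out[c::channels] = bytes(plane)
    return bytes(out)


# Tiny 2 x 2 example with one constant-valued plane per light direction
# (hypothetical pixel values, chosen so each channel is easy to spot).
H, W = 2, 2
top, bottom, left, right = (bytes([v] * (H * W)) for v in (10, 20, 30, 40))
tensor = stack_planes_hwc([top, bottom, left, right], H, W)
```

With this layout, the four bytes of each pixel are stored contiguously in the order Top, Bottom, Left, Right, which is the same byte arrangement a 4-byte-per-pixel image format would use.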

[The Issue] When trying to validate and deploy this model, I encountered the following constraints regarding the Neural Network Input:

  • "Only RGB888 format is tested for Neural Network Input."

  • "Only UINT8 format is supported."

The UINT8 constraint is perfectly fine for our quantization pipeline. However, the RGB888 (3-channel) restriction seems to block my 4-channel approach.

[My Questions]

  1. Is there any way to bypass the RGB888 constraint and feed a custom 4-channel (H x W x 4) UINT8 tensor into the STM32N6 NPU?

  2. If the tool strictly enforces 3-channel (RGB) input only for "Image" inputs, is there a workaround? For example, could the input be configured as a generic tensor/array rather than an image, so that the NPU simply processes it as standard 4-channel data?

  3. Alternatively, does the NPU hardware/DMA2D inherently limit input buffers to standard color formats (like ARGB8888 or RGB888)?

Any guidance or workarounds for deploying a 4-channel input model on the STM32N6 NPU would be greatly appreciated.

Thank you in advance!
