2026-04-29 4:20 AM - last edited on 2026-04-29 4:47 AM by Andrew Neil
Hello ST Community,
I am currently developing a coin recognition vision model (MobileNetV4) to be deployed on the NUCLEO-N657X0-Q (STM32N657X0H3Q) board. I am trying to fully utilize the integrated NPU using STM32Cube.AI.
[My Approach] To maximize feature extraction of the coin's engraved patterns, I am using a Photometric Stereo-like approach. I capture 4 grayscale images using a global shutter camera with 4 directional lights (Top, Bottom, Left, Right). My goal is to stack these 4 images into a single 4-channel tensor (H x W x 4) and feed it directly into the NPU. I have already modified the first Conv2d layer of my PyTorch model to accept in_chans=4 and trained it successfully.
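For illustration, the stacking step described above could look like the minimal sketch below. It assumes a channel-last (H x W x C) byte layout and uses tiny hypothetical 2 x 2 frames. The helper names `stack_channels_hwc` and `pixel` are made up for this example and are not part of any ST tool. The layout the generated network actually expects should be checked against the tool's report.

```python
from itertools import chain


def stack_channels_hwc(planes, height, width):
    """Interleave equally sized 8-bit grayscale planes into one H x W x C buffer."""
    expected = height * width
    for i, plane in enumerate(planes):
        if len(plane) != expected:
            raise ValueError(f"plane {i} has {len(plane)} bytes, expected {expected}")
    # zip(*planes) yields one (c0, c1, ..., cN) tuple per pixel, in row-major order,
    # so flattening those tuples gives a channel-last byte stream.
    return bytes(chain.from_iterable(zip(*planes)))


def pixel(buf, width, channels, y, x):
    """Return the channel values stored for pixel (y, x) in a channel-last buffer."""
    start = (y * width + x) * channels
    return tuple(buf[start:start + channels])


# Hypothetical 2 x 2 frames, one per light direction (values are placeholders).
top = bytes([10, 11, 12, 13])
bottom = bytes([20, 21, 22, 23])
left = bytes([30, 31, 32, 33])
right = bytes([40, 41, 42, 43])

tensor = stack_channels_hwc([top, bottom, left, right], height=2, width=2)
```

Here `pixel(tensor, 2, 4, 1, 0)` returns the four light-direction samples for the pixel at row 1, column 0, which is a quick way to confirm that the channel order matches the one used during training.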
[The Issue] When trying to validate and deploy this model, I encountered the following constraints regarding the Neural Network Input:
"Only RGB888 format is tested for Neural Network Input."
"Only UINT8 format is supported."
The UINT8 constraint is perfectly fine for our quantization pipeline. However, the RGB888 (3-channel) restriction seems to block my 4-channel approach.
[My Questions]
Is there any way to bypass the RGB888 constraint and feed a custom 4-channel (H x W x 4) UINT8 tensor into the STM32N6 NPU?
If the tool strictly enforces 3-channel (RGB) specifically for "Image" inputs, is there a workaround? (For example, configuring the input as a generic tensor/array rather than an image, so the NPU just processes it as standard 4-channel data?)
Alternatively, does the NPU hardware/DMA2D inherently limit input buffers to standard color formats (like ARGB8888 or RGB888)?
Any guidance or workarounds for deploying a 4-channel input model on the STM32N6 NPU would be greatly appreciated.
Thank you in advance!
2026-05-04 7:22 AM
Hi @qldrh,
Which package are you using?
I believe the error messages you get are not errors from the ST Edge AI Core, but from an application we provide, right?
Have a good day,
Julian
2026-05-06 9:45 PM
Hi Julian,
To clarify my setup:
- I am using STM32Cube.AI Studio (local installation) for model conversion, validation, and quantization.
- The constraint I mentioned — "Only RGB888 format is tested for Neural Network Input" and "Only UINT8 format is supported" — was from the STM32N6 documentation / NPU_Validation firmware reference, not a direct error from the ST Edge AI Core conversion step itself.
The ST Edge AI Core conversion of my 4-channel ONNX model actually completes successfully. My concern was whether the NPU runtime on the STM32N6 hardware can actually accept and process a 4-channel (H×W×4) UINT8 tensor at inference time, given that the documented pipeline seems to assume RGB888 as the standard input format.
So my main question is: if I bypass the standard camera pipeline and manually feed a 4-channel UINT8 buffer directly to the NPU input memory, is that supported at the hardware/runtime level?
Thanks again for your help!
2026-05-07 6:36 AM - edited 2026-05-07 6:37 AM
Hi @qldrh
Question: if I bypass the standard camera pipeline and manually feed a 4-channel UINT8 buffer directly to the NPU input memory, is that supported at the hardware/runtime level?
Short answer: yes, it should be.
From the NPU point of view, there is no inherent restriction on the number of input channels.
So H×W×3, H×W×4, and more generally H×W×C inputs are supported in principle. This is also consistent with what happens inside the network itself, since intermediate feature maps naturally use many different channel counts.
tips:
https://stedgeai-dc.st.com/assets/embedded-docs/stneuralart_operator_support.html
That said, the channel count and the data type / quantization format are two slightly different topics.
UINT8 vs INT8 can be more subtle, since it may depend on a UINT8 to INT8 conversion step, but as far as I know, that is supported by the hardware.
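As a worked illustration of what a UINT8 to INT8 conversion amounts to arithmetically, here is a minimal sketch. It only shows the general quantization identity (shift both the samples and the zero point by 128) and does not describe how the NPU or ST Edge AI Core implements the step. The function names and the scale and zero-point values are hypothetical.

```python
import struct


def uint8_to_int8(buf):
    """Shift unsigned 8-bit samples by 128 so they land in the signed 8-bit range."""
    return [b - 128 for b in buf]


def flip_msb_as_int8(b):
    """Same conversion done on the bit pattern: flip the top bit, reinterpret as signed."""
    return struct.unpack("b", bytes([b ^ 0x80]))[0]


def dequantize(q, scale, zero_point):
    """Affine dequantization: real_value = scale * (q - zero_point)."""
    return scale * (q - zero_point)


# Hypothetical quantization parameters for a UINT8 input tensor.
scale = 0.5
zero_point_u8 = 3
zero_point_i8 = zero_point_u8 - 128  # the zero point shifts together with the data

samples_u8 = bytes([0, 3, 128, 255])
samples_i8 = uint8_to_int8(samples_u8)

# The real values represented before and after the shift are identical.
real_from_u8 = [dequantize(q, scale, zero_point_u8) for q in samples_u8]
real_from_i8 = [dequantize(q, scale, zero_point_i8) for q in samples_i8]
```

Because the same offset is subtracted from both the sample and the zero point, `real_from_u8` and `real_from_i8` are equal, so the conversion itself loses no information.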
Could you please link me to where you found the statement below? That would be great, thanks.
"- The constraint I mentioned — "Only RGB888 format is tested for Neural Network Input" and "Only UINT8 format is supported" — was from the STM32N6 documentation / NPU_Validation firmware reference, not a direct error from the ST Edge AI Core conversion step itself."
Have a good day,
Julian