
Use a 4-channel (H x W x 4) input for NPU acceleration on STM32N6 (STM32Cube.AI)?

qldrh
Associate II

Hello ST Community,

I am currently developing a coin recognition vision model (MobileNetV4) to be deployed on the NUCLEO-N657X0-Q (STM32N657X0H3Q) board. I am trying to fully utilize the integrated NPU using STM32Cube.AI.

[My Approach] To maximize feature extraction of the coin's engraved patterns, I am using a Photometric Stereo-like approach. I capture 4 grayscale images using a global shutter camera with 4 directional lights (Top, Bottom, Left, Right). My goal is to stack these 4 images into a single 4-channel tensor (H x W x 4) and feed it directly into the NPU. I have already modified the first Conv2d layer of my PyTorch model to accept in_chans=4 and trained it successfully.

[The Issue] When trying to validate and deploy this model, I encountered the following constraints regarding the Neural Network Input:

  • "Only RGB888 format is tested for Neural Network Input."

  • "Only UINT8 format is supported."

The UINT8 constraint is perfectly fine for our quantization pipeline. However, the RGB888 (3-channel) restriction seems to block my 4-channel approach.

[My Questions]

  1. Is there any way to bypass the RGB888 constraint and feed a custom 4-channel (H x W x 4) UINT8 tensor into the STM32N6 NPU?

  2. If the tool strictly enforces 3-channel (RGB) specifically for "Image" inputs, is there a workaround? (For example, configuring the input as a "generic tensor/array" rather than an image, so the NPU just processes it as standard 4-channel data?)

  3. Alternatively, does the NPU hardware/DMA2D inherently limit input buffers to standard color formats (like ARGB8888 or RGB888)?

Any guidance or workarounds for deploying a 4-channel input model on the STM32N6 NPU would be greatly appreciated.

Thank you in advance!

1 ACCEPTED SOLUTION

Accepted Solutions

Hi @qldrh 

 

Question: if I bypass the standard camera pipeline and manually feed a 4-channel UINT8 buffer directly to the NPU input memory, is that supported at the hardware/runtime level?

Short answer: yes, it should be.

 

From the NPU's point of view, there is no inherent restriction on the number of input channels.
So H×W×3, H×W×4, and more generally H×W×C inputs are supported in principle. This is also consistent with what happens inside the network itself, since intermediate feature maps naturally use many different channel counts.
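
To make the H×W×C layout concrete, here is a minimal host-side sketch in plain Python of how four grayscale captures could be interleaved into a single channel-last UINT8 buffer. The function name `pack_hwc` and the 2×2 sample data are hypothetical, and the sketch assumes the converted model expects channel-last (H×W×4) input; the actual input layout and size should be checked against the generated network's input description before copying a buffer to the NPU input memory.

```python
def pack_hwc(planes, height, width):
    """Interleave C single-channel uint8 planes into one H x W x C buffer."""
    channels = len(planes)
    for plane in planes:
        if len(plane) != height * width:
            raise ValueError("every plane must hold height * width bytes")
    packed = bytearray(height * width * channels)
    for c, plane in enumerate(planes):
        # Channel c of pixel i lands at byte offset i * channels + c.
        packed[c::channels] = plane
    return bytes(packed)


# Hypothetical 2 x 2 captures, one per light direction.
top = bytes([10, 11, 12, 13])
bottom = bytes([20, 21, 22, 23])
left = bytes([30, 31, 32, 33])
right = bytes([40, 41, 42, 43])

tensor = pack_hwc([top, bottom, left, right], height=2, width=2)
```

Each pixel then occupies four consecutive bytes (top, bottom, left, right), so the buffer is H × W × 4 bytes long. On the target, the same interleaving would be done in firmware rather than in Python.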

 

Tip: see the operator support documentation:

https://stedgeai-dc.st.com/assets/embedded-docs/stneuralart_operator_support.html

 

That said, the channel count and the data type / quantization format are two slightly different topics:

  • On the channel dimension, the NPU should not be limited to RGB-only input.
  • On UINT8 vs INT8, it can be more subtle depending on:
    • the original model format
    • how the model was quantized
    • whether the source model comes from TFLite or ONNX
    • the exact input tensor expectations after conversion

 There may still be a UINT8 to INT8 conversion step, but as far as I know, that is supported by the hardware.
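
As an illustration of what such a UINT8 to INT8 step usually amounts to, here is a small sketch in plain Python. It assumes the common affine-quantization convention in which both formats share the same scale and the INT8 zero point equals the UINT8 zero point minus 128. Whether a given converted model needs this step, and where it is performed, depends on how the model was quantized; the function names are hypothetical.

```python
def uint8_to_int8_bytes(buf):
    """Re-centre uint8 samples to int8 by subtracting 128.

    Flipping the top bit (XOR 0x80) gives the same bit pattern as
    subtracting 128 in two's complement.
    """
    return bytes(b ^ 0x80 for b in buf)


def as_signed(raw):
    """Interpret raw bytes as signed 8-bit integers, for inspection only."""
    return [b - 256 if b >= 128 else b for b in raw]


def shift_zero_point(zero_point_uint8):
    """Assumed convention: same scale, zero point shifted by 128."""
    return zero_point_uint8 - 128


samples = bytes([0, 128, 255])
converted = as_signed(uint8_to_int8_bytes(samples))
```

Under this assumption the real value represented by each sample is unchanged, since scale × (q − zero_point) is the same before and after shifting both q and the zero point by 128.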

 

Could you please link me to where you found this? Thanks.
"- The constraint I mentioned — "Only RGB888 format is tested for Neural Network Input" and "Only UINT8 format is supported" — was from the STM32N6 documentation / NPU_Validation firmware reference, not a direct error from the ST Edge AI Core conversion step itself."

 

Have a good day,

Julian


In order to give better visibility on the answered topics, please click on 'Accept as Solution' on the reply which solved your issue or answered your question.


3 REPLIES
Julian E.
ST Employee

Hi @qldrh,

 

Which package are you using? 

I believe the error messages you get are not errors from the ST Edge AI Core, but from an application we provide, right?

 

Have a good day,

Julian



Hi Julian,

To clarify my setup:
- I am using STM32Cube.AI Studio (local installation) for model conversion, validation, and quantization.
- The constraint I mentioned — "Only RGB888 format is tested for Neural Network Input" and "Only UINT8 format is supported" — was from the STM32N6 documentation / NPU_Validation firmware reference, not a direct error from the ST Edge AI Core conversion step itself.

The ST Edge AI Core conversion of my 4-channel ONNX model actually completes successfully. My concern was whether the NPU runtime on the STM32N6 hardware can actually accept and process a 4-channel (H×W×4) UINT8 tensor at inference time, given that the documented pipeline seems to assume RGB888 as the standard input format.

So my main question is: if I bypass the standard camera pipeline and manually feed a 4-channel UINT8 buffer directly to the NPU input memory, is that supported at the hardware/runtime level?

Thanks again for your help!
