2026-02-18 12:50 AM
Hello ST Support Team,
I am using STM32N6 with Neural-ART NPU (ST Edge AI 3.0) and comparing inference outputs between:
1. TFLite on PC, and
2. the converted model running on STM32N6 NPU.
Could you please clarify how `rsqrt` is implemented on STM32N6 NPU?
I would like to know:
1. Is `rsqrt` a native HW operator, or is it decomposed (for example into `sqrt + reciprocal`)?
2. What approximation method is used internally (LUT / polynomial / Newton-Raphson / other)? (For reference, a textbook Newton-Raphson sketch is shown after this list.)
3. What numeric format is used in the HW path (fixed-point details, precision)?
4. What rounding and saturation rules are applied?
5. Are there documented error bounds or expected max/mean error vs float reference?
6. Is there any compiler/runtime option to select a more accurate vs faster mode for this operation?
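To clarify what I mean by a Newton-Raphson approach in point 2, here is the textbook float32 variant in Python. This is purely illustrative of the technique and is not a claim about how the NPU implements it; the seed uses the well-known fast inverse square root bit trick.

```python
import numpy as np

def rsqrt_newton(x, iterations=2):
    # Coarse seed via the classic exponent bit trick, then Newton-Raphson
    # refinement: y <- y * (3 - x * y^2) / 2, which roughly doubles the
    # number of correct bits per iteration.
    x = np.asarray(x, dtype=np.float32)
    i = x.view(np.int32)
    y = (np.int32(0x5F3759DF) - (i >> 1)).view(np.float32)
    for _ in range(iterations):
        y = y * (np.float32(1.5) - np.float32(0.5) * x * y * y)
    return y

# Quick sanity check against the float reference
x = np.linspace(0.01, 100.0, 5, dtype=np.float32)
print(np.abs(rsqrt_newton(x) - 1.0 / np.sqrt(x)).max())
```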
My model uses INT8 I/O and the same input tensor on both platforms.
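For context, this is roughly how I compare the two outputs (a minimal sketch; the file names and the scale/zero-point values are placeholders for my actual setup):

```python
import numpy as np

# Placeholder file names: raw INT8 output tensors captured on each platform.
ref = np.load("tflite_pc_output_int8.npy").astype(np.int8)
npu = np.load("stm32n6_npu_output_int8.npy").astype(np.int8)

# Dequantize with the output tensor's scale / zero-point (placeholders here,
# read from the model in practice) so errors are measured in real units.
SCALE, ZERO_POINT = 0.0472, -3
ref_f = (ref.astype(np.float32) - ZERO_POINT) * SCALE
npu_f = (npu.astype(np.float32) - ZERO_POINT) * SCALE

diff = np.abs(ref_f - npu_f)
print("max abs error   :", diff.max())
print("mean abs error  :", diff.mean())
print("mismatched codes:", np.count_nonzero(ref != npu), "/", ref.size)
```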
Thank you.
2026-03-06 2:34 AM
Hi @retertert,
RSQRT is not supported on the NPU:
https://stedgeai-dc.st.com/assets/embedded-docs/stneuralart_operator_support.html
It seems to be supported only for TFLite models (not ONNX):
https://stedgeai-dc.st.com/assets/embedded-docs/supported_ops_tflite.html#rsqrt
I have asked for more details on its implementation.
Regarding your last two points: we don't provide that level of detail for individual layers, and there is no way to customize the behavior of a particular layer. For targets without an NPU, there is an option to "optimize" the model in that sense, but it applies to the whole model, not just selected layers.
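If you want to isolate the operator in the meantime, one option (just a sketch using standard TensorFlow APIs, nothing ST-specific) is to build a single-op TFLite model containing RSQRT, quantize it to INT8, and run it through both the PC interpreter and the ST Edge AI flow to compare outputs:

```python
import numpy as np
import tensorflow as tf

class RsqrtOnly(tf.Module):
    # A single RSQRT op, so any PC-vs-target difference is isolated to it.
    @tf.function(input_signature=[tf.TensorSpec([1, 64], tf.float32)])
    def __call__(self, x):
        return tf.math.rsqrt(x)

def representative_data():
    # Positive inputs only; rsqrt is undefined for x <= 0.
    for _ in range(100):
        yield [np.random.uniform(0.1, 10.0, size=(1, 64)).astype(np.float32)]

m = RsqrtOnly()
converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [m.__call__.get_concrete_function()], m)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open("rsqrt_only_int8.tflite", "wb") as f:
    f.write(converter.convert())
```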
Have a good day,
Julian