2025-06-28 6:36 AM
Hello, we encountered the following issues when deploying a binary classification model using the ll library. Do you have any solutions?
1. The output of the AI model deployed following the example (without quantization) does not match the output computed locally on the PC (compared before softmax).
2. We compared the calculation results on the PC with those on the STM32N6 layer by layer and found that the values before and after the reshape (in the low-level code module that transfers data via DMA to the NPU) do not match, which leads to the mismatch in the final output. The results of the other layers (conv, maxpool, relu) all match. A sketch of the kind of comparison we ran follows this list.
3. How should we resolve this deployment issue?
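For illustration only, here is a minimal Python sketch of the layer-by-layer comparison we mean; the layer names, dump file paths, and tolerance are placeholders, not our actual setup:

import numpy as np

# Placeholder layer names and dump paths, not actual project files.
LAYERS = ["conv1", "relu1", "maxpool1", "reshape", "dense"]

for name in LAYERS:
    ref = np.load(f"pc_dumps/{name}.npy").astype(np.float32).ravel()  # PC reference
    tgt = np.load(f"n6_dumps/{name}.npy").astype(np.float32).ravel()  # STM32N6 dump
    max_abs_err = float(np.max(np.abs(ref - tgt)))
    print(f"{name:10s} max abs error = {max_abs_err:.6f}")
    if max_abs_err > 1e-3:  # arbitrary tolerance for flagging a mismatch
        print(f"First diverging layer: {name}")
        break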
2025-06-30 1:47 AM
Hello @Z-YF,
The C model and the Python model are not bit-to-bit exact, so you may see differences. The real concern is to determine whether these differences have an impact on the performance of the model.
In the validation report, do you get a high COS (>0.99)? If so, your model should behave as expected (like the Python model). If not, it may be a bug on our side.
If you get a bad COS, please share the model with us.
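For reference, the COS reported in the validation report is a cosine similarity between the reference (Python) output and the target (C) output. A minimal sketch of computing it yourself on the pre-softmax outputs (the numbers below are made up for illustration):

import numpy as np

def cos_metric(ref, out):
    # Cosine similarity between the flattened reference and target tensors.
    ref = np.asarray(ref, dtype=np.float32).ravel()
    out = np.asarray(out, dtype=np.float32).ravel()
    return float(np.dot(ref, out) / (np.linalg.norm(ref) * np.linalg.norm(out) + 1e-12))

# Illustrative pre-softmax logits from both runs.
pc_logits = np.array([2.31, -1.87], dtype=np.float32)   # Python / PC model
n6_logits = np.array([2.28, -1.90], dtype=np.float32)   # C model on the STM32N6
print(f"COS = {cos_metric(pc_logits, n6_logits):.4f}")   # > 0.99 is what you want to see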
Also, just in case you don't know, the NPU is only capable of running int8 operations, so, while using the non-quantized model, most of your epochs probably run in SW.
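When you move to a quantized model so that the epochs can map to the NPU, one common route (assuming your model is a Keras/TensorFlow model; the file names below are placeholders) is TensorFlow Lite full-int8 post-training quantization, for example:

import numpy as np
import tensorflow as tf

# Hypothetical file names; replace with your own model and calibration data.
model = tf.keras.models.load_model("binary_classifier.h5")
calib_data = np.load("calibration_samples.npy").astype(np.float32)

def representative_dataset():
    # Yield a few hundred representative inputs so the converter can
    # calibrate the int8 scales and zero-points.
    for sample in calib_data[:200]:
        yield [sample[np.newaxis, ...]]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open("binary_classifier_int8.tflite", "wb") as f:
    f.write(converter.convert())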
As I explained, there is a difference in results between the Python model and the C (MCU) model.
But note that, because the code that runs on the NPU is not the same as the code that runs on the MCU (HW and SW epochs), you may also see differences later, when deploying the quantized model. Again, please look at the COS to see whether these differences are impactful.
Have a good day,
Julian