Validation mismatch on C model

M_Benz
Associate II

Hey guys,

I am currently running an object detection model on the N6. To accelerate it on the NPU, I quantize it during training (QAT) in PyTorch. I made some adjustments to the exported QAT model in ONNX to make it run on the N6 NPU (I compared the ONNX output before and after my transformations and they are the same).
However, when I validate my model on the desktop (MCU), I get a completely different output from the C model:

M_Benz_0-1769088326906.png
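As a side note, the kind of output comparison behind a report like this can be sketched in pure NumPy (hypothetical tensors standing in for the ONNX and C-model outputs; the metrics are similar in spirit to those in the validation report):

```python
import numpy as np

def compare_outputs(ref, test):
    """Compare a reference output (e.g. ONNX) against a C-model output.

    Returns the maximum absolute error and the cosine similarity,
    metrics similar to those reported by model validation tools.
    """
    ref = ref.ravel().astype(np.float64)
    test = test.ravel().astype(np.float64)
    max_err = float(np.max(np.abs(ref - test)))
    cos = float(ref @ test / (np.linalg.norm(ref) * np.linalg.norm(test)))
    return max_err, cos

# Identical tensors -> zero error and cosine similarity 1.0.
a = np.linspace(-1.0, 1.0, 16).reshape(1, 4, 2, 2)
err, cos = compare_outputs(a, a.copy())
```

A mismatching C model shows up as a large max error and a cosine similarity well below 1.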

This is part of the ONNX model in Netron:

M_Benz_1-1769088444997.png

The input is quantized and the output is not.
What I observed is that when I remove the layers after the Conv->Transpose->Reshape, the output of the C model matches the ONNX model:

M_Benz_3-1769089693714.png

M_Benz_2-1769088536709.png

Why is this the case? Is there a problem with the Slice and Reshape layers because ST uses channels-last rather than channels-first?
Thanks for the help. If you need more context, let me know.

Julian E.
ST Employee

Hi @M_Benz,

 

This is indeed very strange.

Could you share both models?

 

Are you validating the model with random data, or are you using a dataset to get these results?

 

I will try to replicate it.

At first glance, there may be a bug in the compiler triggered by the last part of your model...

 

Have a good day,

Julian 


In order to give better visibility on the answered topics, please click on 'Accept as Solution' on the reply which solved your issue or answered your question.

Hey,

I don't have exactly the same model anymore, but I do have two with the same structure. I validated both of them again and they still show the same problem.

For quick context: I am trying to get a YOLO-based architecture to run on the N6. I use QAT in PyTorch and export to ONNX (that is a whole other problem where I also get mismatches in the output). Both models were not really trained, only exported at epoch 0 for testing, so some observers might not be calibrated perfectly, but I also had this problem with actually QAT-trained models. Removing the last "normal" layers seems to fix it. The model with those last layers that I attached to this post also produces inf values, probably due to the random input and the exp used for box decoding. But I also validated the outputs ignoring the inf values, and the cosine similarity is still bad, as are all the other metrics.
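The "cosine similarity ignoring the infs" check above can be sketched like this (pure NumPy; the example arrays are made up):

```python
import numpy as np

def masked_cosine_similarity(a, b):
    """Cosine similarity over the positions where both tensors are finite.

    Positions where either output is inf/NaN (e.g. from exp() overflowing
    on random, uncalibrated input) are dropped before comparing.
    """
    a = a.ravel().astype(np.float64)
    b = b.ravel().astype(np.float64)
    mask = np.isfinite(a) & np.isfinite(b)
    a, b = a[mask], b[mask]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

onnx_out = np.array([1.0, 2.0, np.inf, 3.0])
c_out = np.array([1.0, 2.0, np.inf, 3.0])
sim = masked_cosine_similarity(onnx_out, c_out)  # finite parts identical -> 1.0
```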

I zipped the models since I am not allowed to attach .onnx files.

Thanks for looking into it.

Have a good day,

Mike

I mostly validated with random data, but I also once used a representative image, which showed the same problems.

I am not sure if it could be a problem with the dimensions: ONNX is channels-first, and the slicing and reshaping might then behave differently.

M_Benz
Associate II

That's what I thought: the ONNX model is channels-first while the C model will be channels-last. Maybe that's the problem, and the compiler does not adapt the following Reshape/Slice layers to match the channels-last layout.
But this would mean it is an issue with the compiler, not with the Cube AI runtime. So I guess I should just stick to doing the bbox decoding myself outside of the model.

Should I expect this to stay like this, or will it be fixed in the compiler? If not, I guess this issue is closed.
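The layout suspicion is easy to illustrate with plain NumPy: if a runtime stores tensors channels-last but the original Reshape parameters from the channels-first ONNX graph are applied unchanged, the reshape reads the elements in a different order. (Hypothetical shapes, just to show the mechanism; whether the N6 compiler actually mishandles this is exactly what is in question here.)

```python
import numpy as np

# A channels-first (NCHW) tensor, as ONNX sees it.
nchw = np.arange(24).reshape(1, 6, 2, 2)   # N=1, C=6, H=2, W=2

# Reshape as written in the ONNX graph (operates on NCHW element order).
onnx_view = nchw.reshape(1, 3, 2, 4)

# A channels-last runtime stores the same data as NHWC.  Applying the
# *same* reshape parameters to that buffer reads the elements in a
# different order, so the result no longer matches.
nhwc = nchw.transpose(0, 2, 3, 1)           # physical NHWC layout
naive_view = nhwc.reshape(1, 3, 2, 4)       # same target shape, wrong order

print(np.array_equal(onnx_view, naive_view))  # False: the views diverge
```

If the compiler rewrites the Reshape/Slice parameters for the new layout (or inserts transposes around them), the two views match again; if not, the outputs diverge in the way described in this thread.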

Hello @M_Benz,

 

Maybe you are right, or it may come from the exp node creating a huge difference.
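The exp hypothesis is easy to reproduce in NumPy: with random, uncalibrated inputs, large dequantized logits overflow float32 through exp, and any small pre-exp difference is amplified enormously afterwards (made-up values, just to show the effect):

```python
import numpy as np

# With uncalibrated quantization scales, dequantized "box" logits can be
# large; exp() then overflows float32 to inf (float32 max is ~3.4e38,
# exceeded around exp(88.7)).
logits = np.array([5.0, 50.0, 100.0], dtype=np.float32)
decoded = np.exp(logits)          # last entry overflows to inf

# A small pre-exp difference becomes a huge post-exp difference,
# which wrecks metrics like cosine similarity.
a, b = np.float32(20.0), np.float32(20.1)
post_exp_gap = float(np.exp(b) - np.exp(a))
```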

 

We are looking at it internally; its resolution will depend on the priority.

If you feel like it is very important for your company project, please contact your local FAE.

 

Have a good day,

Julian

