
STM32N6 : CubeAI ?? Epoch Issue and Why PReLU Runs in Software After Quantization

qiqi
Associate II

 

Hello everyone,

I am using CubeAI 10.1.0 and ST Edge AI 2.1 to analyze my model for the STM32N6, and I encountered an issue where some epochs show ?? instead of the expected results. Here is the log:

epoch ID    HW/SW/EC    Operation (SW only)
epoch 1     EC
epoch 2     EC
epoch 3     -SW-        (DequantizeLinear)
epoch 4     -SW-        (PRelu)
epoch 5     -SW-        (QuantizeLinear)
epoch 6     -SW-        (MaxPool)
epoch 7     EC
epoch 8     EC
epoch 9     -SW-        (DequantizeLinear)
epoch 10    -SW-        (PRelu)
epoch 11    -SW-        (QuantizeLinear)
epoch 12    EC
epoch 13    EC
epoch 14    -SW-        (DequantizeLinear)
epoch 15    -SW-        (PRelu)
epoch 16    EC
epoch 17    -SW-        (Conv)
epoch 18    -SW-        (Add)
epoch 19    EC
epoch 20    -SW-        (Conv)
epoch 21    -SW-        (Add)
epoch 22    ??
epoch 23    -SW-        (Add)
epoch 24    ??
epoch 25    -SW-        (Add)
epoch 26    EC

For epochs 22 and 24, the result is shown as ??, and I could not retrieve any computation results for them. I have a few questions:

1. What does ?? mean?

  • Does ?? mean that some operators or operations failed to execute in these epochs? Does it imply that those operators are not supported on the STM32N6, or could it be due to hardware resource limitations?

2. Will this affect model results?

  • If an epoch shows ??, will it impact the final recognition or inference accuracy of the model? Should I be concerned that this issue could lead to unreliable results?

3. Why is the PReLU operator still executed in software after quantization?

  • The official documentation mentions that PReLU is supported on STM32N6, but after model quantization, the computation for PReLU is still executed in software rather than on the hardware. Why is that? Is it due to hardware limitations, or is STM32N6's hardware acceleration for this operator not fully optimized? Is there any other reason why PReLU still runs in software?

4. How can I optimize the model to avoid these issues?

  • If these issues occur, are there any recommended optimization methods or adjustment strategies to address them and ensure that the model runs smoothly and gives accurate results? Should I consider simplifying the model, or replacing PReLU with another activation function so that the operator no longer runs in software (a rough sketch of what I mean by that swap is shown right after this list)?
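
For concreteness, this is the kind of graph edit I have in mind for the PReLU swap. It is only a minimal sketch using the onnx Python package: the file name is just one of my models, PReLU and ReLU are not numerically equivalent (so the model would need re-evaluation or fine-tuning afterwards), and in an already-quantized QDQ graph the slope may come from a DequantizeLinear node rather than an initializer.

# Minimal sketch: replace every PRelu with Relu in an ONNX graph.
# Assumes the PRelu slope is a plain initializer; the swap changes the math,
# so accuracy has to be re-checked (or the model fine-tuned) afterwards.
import onnx

model = onnx.load("ONet.onnx")          # one of the attached models, as an example
graph = model.graph

slope_names = set()
for node in graph.node:
    if node.op_type == "PRelu":
        slope_names.add(node.input[1])  # second input is the learned slope tensor
        node.op_type = "Relu"
        del node.input[1:]              # Relu takes a single input

# Remove slope initializers that are no longer referenced by any node
still_used = {name for node in graph.node for name in node.input}
for i in reversed(range(len(graph.initializer))):
    name = graph.initializer[i].name
    if name in slope_names and name not in still_used:
        del graph.initializer[i]

onnx.checker.check_model(model)
onnx.save(model, "ONet_relu.onnx")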

Thank you in advance for your help and suggestions!


Julian E.
ST Employee

Hello @qiqi,

 

Could you please share your model in a .zip file?

 

Concerning the PReLU, it is indeed supported. As for why it ends up in a SW epoch, it could be that the compiler decided it is faster to run it in SW. I will look at it in more detail if you share your model.

 

Have a good day,

Julian



Dear Julian,

Thank you so much for your help! I have packed the models into a .zip file and attached it for your review. The zip file contains three models: mobilefacenet.onnx, ONet.onnx, and RNet.onnx, all of which are quantized models. During the analysis, both ONet.onnx and RNet.onnx showed ?? epochs. Could you kindly take a look and help identify any issues and suggest possible solutions?

Additionally, if you don't mind, I would like to ask one more question. The mobilefacenet.onnx feature extraction model has a relatively large number of parameters, and the analysis shows a total of 164 epochs, of which 111 are implemented in software. In practical testing, the inference time is around 100 ms, which feels a bit long. Is there a way to move more epochs to hardware execution instead of software?

Furthermore, the model's activations are 3.062 MB, so in addition to npuRAM3, npuRAM4, npuRAM5, and npuRAM6, they must also occupy some space in hyperRAM. According to the official documentation I reviewed, this might affect the inference speed. Is that the case? If so, can it be optimized by adjusting the options in the user_neuralart.json file?
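
As a side note, I tried to roughly cross-check the activation figure myself with the small script below. It only sums the intermediate tensors reported by ONNX shape inference, not the compiler's actual peak allocation with buffer reuse, so please treat it as a rough sanity check rather than the real number:

# Crude cross-check: sum the sizes of all intermediate tensors with static shapes.
# This is not the analyzer's peak activation figure (no buffer reuse is modeled),
# just a rough upper-bound sanity check.
import numpy as np
import onnx
from onnx import shape_inference

model = shape_inference.infer_shapes(onnx.load("mobilefacenet.onnx"))

total_bytes = 0
for vi in model.graph.value_info:
    tensor_type = vi.type.tensor_type
    dims = [d.dim_value for d in tensor_type.shape.dim]
    if not dims or 0 in dims:                 # skip tensors with unknown/dynamic dims
        continue
    # elem_type 1 is float32 (4 bytes); the int8/uint8 QDQ tensors take 1 byte
    elem_size = 4 if tensor_type.elem_type == onnx.TensorProto.FLOAT else 1
    total_bytes += int(np.prod(dims)) * elem_size

print(f"sum of intermediate tensor sizes: {total_bytes / 1e6:.3f} MB")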

Apologies for all the questions, and I really appreciate your help in answering them and optimizing the model.

Thanks again for your support, and I look forward to your reply!

Best regards,
QiQi

Hello @qiqi,

 

Thank you for the models, I will first take a look at this ?? issue.

 

Regarding optimization: if the activations do not fit into internal RAM, it will indeed have a big impact on the inference time. The weights are in external flash, but because they are only read once, when needed, the impact is limited. The activations, however, are read and written multiple times, so they require repeated accesses to external memory, which increases the inference time.

 

I will take a look with my colleague to see if we can provide you with some tips to help you.

In the meantime, you can look at this piece of information, if you have not already seen it:
https://stedgeai-dc.st.com/assets/embedded-docs/stneuralart_neural_art_compiler.html#tips-variations-around-the-basic-use-case 

 

Have a good day,

Julian



Dear Julian,

Thank you for your prompt and helpful response! I will carefully review the documentation you provided and look forward to hearing from you with any tips or insights you and your colleague may have.

Thanks again for your support, and I appreciate your assistance in helping me optimize the models.

Have a great day!

Best regards,

QiQi

Hello @qiqi,

 

I cannot reproduce the ?? issue.

 

Concerning the epochs being in SW: if we take the PReLU in your RNet as an example, we can see that it uses float32, but an epoch can only be mapped to HW if the operation is both supported and in int8:

[Attached screenshot: analyzer report excerpt showing the PReLU tensors in float32]

You do get warnings at the beginning of the report telling you that some nodes are not quantized. Could you please first try to quantize your model and see if that helps?
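
If it helps, a quick way to check this outside of the tool is to dump the element types of the tensors feeding the PRelu nodes with the onnx Python package. A minimal sketch (the file name is simply taken from this thread):

import onnx
from onnx import shape_inference

# List the element types of the tensors feeding each PRelu node.
# elem_type codes from onnx.TensorProto: 1 = float32, 2 = uint8, 3 = int8.
model = shape_inference.infer_shapes(onnx.load("RNet.onnx"))
graph = model.graph

dtypes = {vi.name: vi.type.tensor_type.elem_type
          for vi in list(graph.input) + list(graph.output) + list(graph.value_info)}
dtypes.update({init.name: init.data_type for init in graph.initializer})

for node in graph.node:
    if node.op_type == "PRelu":
        print(node.name or "PRelu",
              [(name, dtypes.get(name, "unknown")) for name in node.input])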

 

Have a good day,

Julian

 



 

Hello,

After further testing, I found that the issue with ?? was caused by the --Oalt-sched optimization option. Once I removed this option, the problem disappeared. Do you know what might be causing this issue? Currently, I am using the following optimization options:

--enable-epoch-controller -O3 --all-buffers-info --mvei --cache-maintenance --native-float --enable-virtual-mem-pools --Omax-ca-pipe 4 --Ocache-opt --Os

Do you think further optimization is needed with these settings?

Additionally, we tried quantizing the PReLU operator, but with ONNX-based quantization it seems that it can only be quantized to uint8. After quantization, the analysis reports an error that we have not been able to resolve. Could you provide guidance on how to handle this, or suggest another approach for quantizing PReLU?
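
For context, our quantization attempt looked roughly like the sketch below. The calibration reader here is only a random placeholder (a real run should feed representative images), the float model file name is hypothetical, and we are not sure whether forcing activation_type=QuantType.QInt8 like this is the intended way to obtain signed int8:

import numpy as np
import onnxruntime as ort
from onnxruntime.quantization import (CalibrationDataReader, QuantFormat,
                                      QuantType, quantize_static)


class PlaceholderReader(CalibrationDataReader):
    # Placeholder calibration data; a real run should feed representative inputs.
    def __init__(self, model_path, n_samples=32):
        session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
        inp = session.get_inputs()[0]
        shape = [d if isinstance(d, int) else 1 for d in inp.shape]  # resolve dynamic dims to 1
        self._it = iter({inp.name: np.random.rand(*shape).astype(np.float32)}
                        for _ in range(n_samples))

    def get_next(self):
        return next(self._it, None)


quantize_static(
    model_input="RNet_float.onnx",              # hypothetical name for the float model
    model_output="RNet_int8.onnx",
    calibration_data_reader=PlaceholderReader("RNet_float.onnx"),
    quant_format=QuantFormat.QDQ,               # QDQ graph (QuantizeLinear/DequantizeLinear nodes)
    activation_type=QuantType.QInt8,            # explicitly requesting signed int8 activations
    weight_type=QuantType.QInt8,
)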

Thank you for your assistance!

Best regards,
QIQI

Hello @qiqi,

 

So, the PReLU ending up in software is a bug in the CLI front end; the operator itself is supported by the ATON compiler.

The bug is fixed and will be part of the next version (2.2), planned for early to mid July.

 

Concerning the ?? bug, I opened an internal ticket, and I will update you.

Until I know more, I would suggest either not using the option that causes the issue, or running validation on target with and without the option to compare the results and make sure they are correct.

 

Have a good day,

Julian



Hello Julian,

I encountered a new issue while using the ST Edge AI optimization options. If you have some time, could you please take a look and help me with it? Your assistance would mean a lot to me. The issue is posted at the following link:

https://community.st.com/t5/edge-ai/issue-with-input-data-type-inputs-ch-position-output-data-type/m-p/817020#M5024

Thank you so much for your help!