AI Core inconsistent inference results
2024-06-15 3:00 AM - last edited on 2024-06-26 6:46 AM by Julian E.
Hello, I am developing a Dog Activity Recognition app with Cube.AI on an STM32WB55, using AI Core 9.0.0.
So far I have been following https://wiki.st.com/stm32mcu/wiki/AI:How_to_perform_motion_sensing_on_STM32L4_IoTnode . I am using the same sensor and ODR, the same AI input size, and the same model structure, but my own dataset trained in Google Colab.
Model Structure:
Accuracy in Google Colab: 94 %.
After almost always getting the same "wrong" result in STM32CubeIDE, I did the following:
- I performed validation on target in STM32CubeIDE; the accuracy is similar to Google Colab, which is good (acc=95.13%, rmse=0.145657539, mae=0.044287317, l2r=0.261498094, nse=0.905, cos=0.968).
- I tested a couple of inputs manually to compare with the results in Colab, and this is where the problem seems to be: apart from Test1, which is OK, the results differ quite a lot.
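For manual comparisons like this, it can help to check both class-level agreement and the raw numeric gap between the two sides. A minimal NumPy sketch, where the arrays are hypothetical placeholders for your own Colab exports and UART logs from the board:

```python
import numpy as np

# Hypothetical per-class probability outputs for the same test windows,
# one row per window: one set exported from Colab, one logged over UART
# from the board. Replace with your own dumps.
colab_out = np.array([[0.90, 0.05, 0.05],
                      [0.10, 0.80, 0.10],
                      [0.20, 0.20, 0.60]])
board_out = np.array([[0.88, 0.07, 0.05],
                      [0.70, 0.20, 0.10],   # disagrees with Colab
                      [0.15, 0.25, 0.60]])

# Class-level agreement: do both sides pick the same activity?
same_class = np.argmax(colab_out, axis=1) == np.argmax(board_out, axis=1)
print("per-window agreement:", same_class)

# Numeric closeness: small kernel differences should stay tiny for a
# float model; large gaps usually point to a preprocessing mismatch.
max_abs_diff = np.max(np.abs(colab_out - board_out), axis=1)
print("max abs diff per window:", max_abs_diff)
```

Windows where the class flips *and* the numeric gap is large are the ones worth tracing back through the preprocessing on both sides.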
How can this be? The model in Colab is the same as in STM32CubeIDE.
Side question: does AI Core transform Conv1D layers into Conv2D, as seen in the first picture? Could this be the reason?
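On the side question: deployment tools commonly lower Conv1D onto a Conv2D with one spatial dimension of size 1. Mathematically the two are the same operation, so this rewrite alone should not change the results. A small NumPy sketch of the equivalence:

```python
import numpy as np

# A Conv1D with kernel size k over a length-L signal is the same
# computation as a Conv2D with a 1-by-k kernel over a 1-by-L "image",
# so mapping Conv1D onto Conv2D is a lossless rewrite, not an
# approximation.
rng = np.random.default_rng(0)
signal = rng.normal(size=10)          # length-L input
kernel = rng.normal(size=3)           # size-k Conv1D kernel

# Plain 1-D valid cross-correlation (what a Conv1D layer computes).
conv1d = np.array([signal[i:i+3] @ kernel for i in range(10 - 3 + 1)])

# The same weights applied as a 1x3 Conv2D over a 1x10 input.
img = signal.reshape(1, 10)
k2d = kernel.reshape(1, 3)
conv2d = np.array([(img[0:1, j:j+3] * k2d).sum() for j in range(10 - 3 + 1)])

print(np.allclose(conv1d, conv2d))    # → True: identical results
```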
Thank you very much!
Labels: ST Edge AI Core, STM32CubeAI
2024-07-17 2:02 AM
You can't expect the results to be bit-exact: on one side you have Python code, and on the target you have a different implementation of the kernels in C.
If the accuracy of the validation on target is OK on a significant dataset, then the differences you are seeing probably fall within that ~5% accuracy loss.
That said, a cos of 0.968 on a significant dataset is not that good; we expect values in the range of 0.99 to 1.
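For reference, the cos metric reported by the validation is the cosine similarity between the reference (Python) outputs and the on-target (C-kernel) outputs; 1.0 means the output vectors point in exactly the same direction. A small illustration with hypothetical output vectors:

```python
import numpy as np

def cos_sim(a, b):
    # Cosine similarity between two flattened output tensors:
    # 1.0 means identical direction, lower values mean diverging outputs.
    a, b = np.ravel(a), np.ravel(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical reference (Python) vs on-target (C kernels) outputs.
ref    = np.array([0.05, 0.90, 0.05])
target = np.array([0.06, 0.88, 0.06])
print(round(cos_sim(ref, target), 4))   # very close to 1.0 → healthy

noisy = np.array([0.30, 0.50, 0.20])    # a real mismatch pulls cos well below 0.99
print(round(cos_sim(ref, noisy), 4))
```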
Maybe you can share your model for a closer look at any potential issue.
Regards
In order to give better visibility on the answered topics, please click on 'Accept as Solution' on the reply which solved your issue or answered your question.
2024-07-17 6:32 AM
@fauvarque.daniel wrote: You can't expect the results to be binary the same as on one side you have code in python and on the target you have another implementation of the kernels in C.
I assume this is due to model quantization or some other transformation applied so the model can be deployed to the MCU; is this correct? Is there a way to simulate inference with the same kernels as the target? Is there a way to retrain using those kernels to improve accuracy? To me, such a large difference is unacceptable, and I wonder if there is a way to bring the results closer together.
Click "Accept as Solution" if a reply solved your problem. If no solution was posted please answer with your own.
2024-07-17 7:47 AM
You can verify the accuracy by running the model on the host; this uses the same C kernels, and the results can then be compared with a run in Python.
The important point is to use a relevant dataset when running the validation. You can run the validation with random data, but with quantized networks the results may not be representative.
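One way to make the validation data relevant is to dump real sensor windows and their expected outputs to files and feed those to the tool instead of letting it generate random inputs. A hedged sketch of preparing such files: the shapes, file names, and class count here are assumptions to match a typical motion-sensing model, and the exact option names for passing user data depend on the tool version, so check the help of your installed `stm32ai` CLI.

```python
import numpy as np

# Sketch: dump a representative validation set instead of letting the
# tool generate random inputs. Shapes are hypothetical -- match your
# model: here, 24 time steps x 3 axes per window, 3 one-hot classes.
rng = np.random.default_rng(1)
windows = rng.normal(size=(100, 24, 3)).astype(np.float32)  # use real sensor windows in practice
labels = np.eye(3, dtype=np.float32)[rng.integers(0, 3, size=100)]  # one-hot expected outputs

# X-CUBE-AI validation can take user-supplied data files (exact option
# names depend on the tool version -- check the `validate` help).
np.save("val_inputs.npy", windows)
np.save("val_outputs.npy", labels)
print(windows.shape, labels.shape)
```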
In order to give better visibility on the answered topics, please click on 'Accept as Solution' on the reply which solved your issue or answered your question.
2024-07-17 8:05 AM
Thanks for replying,
the error was that the dataset was segmented differently in Python and in CubeIDE (which was my mistake), so the AI input was, after all, different in Colab and in Cube. That is why it always predicted the same activity. If that hadn't been the case, the performance of AI Core would indeed have been unacceptable.
You are also correct that the overall model does not perform too well; that is to be fixed later.
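This class of bug can be avoided by pinning the segmentation parameters down in one place and mirroring them exactly in the firmware. A hypothetical sketch of fixed-window segmentation (the window and stride values are examples, not values from this thread):

```python
import numpy as np

def segment(samples, window=24, stride=24):
    """Split a (T, axes) sensor stream into fixed-size windows.
    Keep window, stride, and axis order identical to the firmware's
    buffer-filling logic, or train-time and run-time inputs diverge."""
    n = (len(samples) - window) // stride + 1
    return np.stack([samples[i*stride : i*stride + window] for i in range(n)])

# Fake 100-sample, 3-axis accelerometer stream for illustration.
stream = np.arange(100 * 3).reshape(100, 3).astype(np.float32)
wins = segment(stream)
print(wins.shape)   # → (4, 24, 3) with window=stride=24 over 100 samples
```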
2024-07-17 9:32 AM
What do you mean by segmented? Splitting the data set into a training and validation set?
Click "Accept as Solution" if a reply solved your problem. If no solution was posted please answer with your own.
2024-07-17 9:58 AM
The sequence of sensor input data in the model input batch (segment) was different between Colab and Cube, which was not obvious at first. After correcting the input batch format in Colab, I now get the same results in Colab and in Cube. You can also perform "Validation on target" in CubeIDE using one-hot-encoded input data over UART, and this will tell you how much the accuracy differs between your Cube model and your Python model.
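For anyone hitting the same issue: even with identical window parameters, the element order used to fill the flat input buffer can differ (interleaved per-sample vs axis-major), which silently changes the model input while the numbers themselves look right. A small NumPy illustration:

```python
import numpy as np

# The same 4-sample, 3-axis window fed to a flat model input buffer.
win = np.arange(12).reshape(4, 3)    # rows = time steps, cols = x, y, z

interleaved = win.reshape(-1)        # x0 y0 z0 x1 y1 z1 ... (sample-major, C order)
per_axis    = win.T.reshape(-1)      # x0 x1 x2 x3 y0 y1 ... (axis-major)

print(interleaved)                   # → [ 0  1  2 ... 11]
print(per_axis)                      # → [ 0  3  6  9  1  4 ...]
# Same numbers, different order: a model trained on one layout will
# misclassify consistently when the firmware fills the buffer the other way.
print(np.array_equal(interleaved, per_axis))   # → False
```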
