Error analysing LSTM model using X-Cube-AI: INTERNAL ERROR : Unkonwn dimensions : CH

Wangxingkun · 2024-12-19

When I analyze the ONNX model in X-Cube-AI, it reports INTERNAL ERROR : Unkonwn dimensions : CH. The model code is simple, as shown below, with input_size = 1, hidden_size = 1, output_size = 1, and num_layers = 1.
I don't know why this happens, and I would really appreciate any advice or workarounds. Thank you.

import torch
import torch.nn as nn

class LSTMNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers=1):
        super(LSTMNet, self).__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=False)  # input is (seq_len, batch, input_size)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # Read the sizes from the LSTM module instead of relying on global variables
        h0 = torch.zeros(self.lstm.num_layers, x.size(1), self.lstm.hidden_size).to(x.device)
        c0 = torch.zeros(self.lstm.num_layers, x.size(1), self.lstm.hidden_size).to(x.device)
        out, _ = self.lstm(x, (h0, c0))
        out = self.fc(out[-1, :, :])  # use the output of the last time step
        return out

Wangxingkun · 2024-12-19

And this is my complete training code. If you know the reason, please reply. Thank you!

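import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader

class SequenceDataset(Dataset):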
    def __init__(self, sequences, labels):
        self.sequences = sequences
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        return torch.Tensor(self.sequences[idx]), torch.Tensor(self.labels[idx])

class LSTMNet(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers=1):
        super(LSTMNet, self).__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=False)  # input is (seq_len, batch, input_size)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # Read the sizes from the LSTM module instead of relying on global variables
        h0 = torch.zeros(self.lstm.num_layers, x.size(1), self.lstm.hidden_size).to(x.device)
        c0 = torch.zeros(self.lstm.num_layers, x.size(1), self.lstm.hidden_size).to(x.device)
        out, _ = self.lstm(x, (h0, c0))
        out = self.fc(out[-1, :, :])  # use the output of the last time step
        return out

input_size = 1 
hidden_size = 1 
output_size = 1
num_layers = 1
num_epochs = 120
batch_size = 32
learning_rate = 0.001

num_samples = 1000
sequence_length = 15
sequences = torch.rand(num_samples, sequence_length, input_size)
labels = torch.randint(0, 2, (num_samples, 1))

dataset = SequenceDataset(sequences.numpy(), labels.numpy())
train_loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

model = LSTMNet(input_size, hidden_size, output_size, num_layers)
criterion = nn.BCEWithLogitsLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

model.train()
for epoch in range(num_epochs):
    for seqs, lbls in train_loader:
        seqs = seqs.permute(1, 0, 2)  
        outputs = model(seqs)
        loss = criterion(outputs, lbls)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

example_input = torch.rand(sequence_length, 1, input_size) 
torch.onnx.export(model, example_input, 'lstm_model.onnx',
                  input_names=['input'],
                  output_names=['output'],
                  dynamic_axes={'input': {1: 'batch_size', 0: 'sequence_length'},
                                'output': {0: 'batch_size'}})
print('Model exported as lstm_model.onnx')

Julian E. · 2024-12-20

Hello @Wangxingkun,

I am not an expert in PyTorch, so I will need my colleagues for a more complete answer, but they are on vacation. In the meantime, you can take a look at this code, which works and should help you:

import torch
import torch.nn as nn

class LSTM(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_output=1):
        super(LSTM, self).__init__()
        self.num_layers = num_layers
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.num_output = num_output

        self.lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size,
                            num_layers=num_layers, batch_first=False)

        self.fc = nn.Linear(hidden_size, num_output)

    def forward(self, x, device='cuda'):

        ula, (h_out, _) = self.lstm(x)  
        out = self.fc(h_out[-1])  
        return out

# If not 1 : error
# other we can put anything
input_size = 10
hidden_size = 10
num_layers = 1
num_output = 10

model = LSTM(input_size, hidden_size, num_layers, num_output)

model.eval()

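# Intended as batch size 1, sequence length 128, input size 10.
# Note: with batch_first=False, nn.LSTM reads the shape as (seq_len, batch, input_size).
dummy_input = torch.randn(1, 128, input_size)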

onnx_file_path = "simple_lstml.onnx"
torch.onnx.export(model, dummy_input, onnx_file_path, 
                  input_names=['input'], output_names=['output'])

Have a good day,

Julian

In order to give better visibility on the answered topics, please click on 'Accept as Solution' on the reply which solved your issue or answered your question.

Wangxingkun · 2024-12-20

Thank you very much, Julian. I have tried your code, but it still says 'INTERNAL ERROR : Unkonwn dimensions : CH'. It seems that as long as there is an LSTM layer in the model, this error occurs. With only a fully connected layer, there is no such error. But in my project the LSTM layer is necessary, and I don't know how to fix it. It makes me a little frustrated.

Julian E. · 2024-12-20

Hello @Wangxingkun ,

What version of ST Edge AI are you using?

I tried it and I do not get errors...

This is the version I have (the latest, I believe).

Have a good day,

Julian


Wangxingkun · 2024-12-20

Hello Julian,

Thank you very much!! I tried your zip again, and when I switched the version to 10.0.0, this error no longer occurred. From what I see now, this was just a minor version issue, but you still patiently helped me clarify my confusion and analyze possible problems. I am truly grateful to you.

Have a good day :) ,

Wang

wypeng · 2024-12-26

Dear Julian,

Thank you for your assistance and guidance. I would like to seek further advice regarding a challenge I encountered. When executing the provided model on my local environment, the predictions align well with expectations. However, when deploying the same model to an STM32 embedded board, the results differ.
Below is a simplified version of the PyTorch code and model for reference:

import torch
import torch.nn as nn
from torch.utils.data import DataLoader

import numpy as np
from sklearn.preprocessing import MinMaxScaler

class LSTM(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_output=1):
        super(LSTM, self).__init__()
        self.num_layers = num_layers
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.num_output = num_output

        self.lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size,
                            num_layers=num_layers, batch_first=False)

        self.fc = nn.Linear(hidden_size, num_output)

    def forward(self, x, device='cuda'):

        ula, (h_out, _) = self.lstm(x)  
        out = self.fc(h_out[:, -1, :])  
        return out

input_size = 1
hidden_size = 64
num_layers = 1
num_output = 1

model = LSTM(input_size, hidden_size, num_layers, num_output)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.MSELoss()

seq_length = 50  
num_samples = 1000  
X, y = generate_sine_wave(seq_length, num_samples)
scaler = MinMaxScaler(feature_range=(0, 1))
X_scaled = scaler.fit_transform(X)
X_scaled = X_scaled.reshape((X_scaled.shape[0], X_scaled.shape[1], 1))
y = y.reshape(-1, 1)

# train
X_train = torch.tensor(X_scaled, dtype=torch.float32)  
y_train = torch.tensor(y, dtype=torch.float32)
train_loader = DataLoader(list(zip(X_train, y_train)), batch_size=1, shuffle=True)

epochs = 50
for epoch in range(epochs):
    model.train()
    for i, (X_batch, y_batch) in enumerate(train_loader):
        optimizer.zero_grad()
        output = model(X_batch)
        loss = criterion(output, y_batch)
        loss.backward()
        optimizer.step()
    print(f'Epoch: {epoch+1:2d}, loss: {loss.item()}')

X_batch = torch.randn(1, seq_length, input_size)
onnx_file_path = "torch_lstm.onnx"
torch.onnx.export(model, X_batch, onnx_file_path,
                  input_names=['input'], output_names=['output'])
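One thing that may be worth checking in the code above, offered as an observation rather than a confirmed cause: the LSTM is created with batch_first=False, and in that mode PyTorch documents the input layout as (seq_len, batch, input_size). The small helper below is hypothetical (it is not part of PyTorch or X-CUBE-AI) and only names the axes of a shape the way nn.LSTM would read them, so a tensor such as torch.randn(1, seq_length, input_size) can be sanity-checked before export.

```python
def interpret_lstm_input(shape, batch_first=False):
    """Name the axes of a 3-D nn.LSTM input shape as PyTorch reads them.

    Hypothetical helper for illustration only; it does not call PyTorch.
    """
    if len(shape) != 3:
        raise ValueError("expected a 3-D shape such as (a, b, c)")
    if batch_first:
        batch, seq_len, input_size = shape
    else:
        seq_len, batch, input_size = shape
    return {"seq_len": seq_len, "batch": batch, "input_size": input_size}


# Shape used for the export above, with seq_length = 50 and input_size = 1
print(interpret_lstm_input((1, 50, 1), batch_first=False))
print(interpret_lstm_input((1, 50, 1), batch_first=True))
```

With batch_first=False the shape (1, 50, 1) is read as one time step and a batch of 50; with batch_first=True it is read as a batch of one with 50 time steps.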

Currently, I have confirmed that models built with Keras (without LSTM layers) produce consistent predictions between the local environment and the STM32. Below is a simplified version of the Keras code and model for reference:

from keras.models import Model
from keras.layers import Input, LSTM, Dense

input_layer = Input(shape=(seq_length, 1), name="input_layer")
lstm_1 = LSTM(32, activation='relu', return_sequences=True, name="lstm_1")(input_layer)
lstm_2 = LSTM(32, activation='relu', name="lstm_2")(lstm_1)
output_layer = Dense(units=1, activation='linear', name="output_layer")(lstm_2)
model = Model(inputs=input_layer, outputs=output_layer)

model.summary()
model.compile(optimizer='adam', loss='mse') #mean_squared_error
model.fit(X_scaled, y, epochs=30, batch_size=1, validation_split=0.2, 
          shuffle=True, verbose=1)

model.save('./keras_lstm.h5')

Below is a simplified test set for inference:

static float X_test_2d[50][1] = {

      {0.5908357501029968f},
      {0.6132590174674988f},
      {0.6401912569999695f},
      {0.6743371486663818f},
      {0.7207258343696594f},
      {0.7869139313697815f},
      {0.8616802096366882f},
      {0.9366850852966309f},
      {0.9935588240623474f},
      {0.9785177111625671f},
      {0.9057679772377014f},
      {0.82420414686203f},
      {0.7471164464950562f},
      {0.6801365613937378f},
      {0.6331720352172852f},
      {0.598699688911438f},
      {0.5715680122375488f},
      {0.549017608165741f},
      {0.5294104218482971f},
      {0.5116827487945557f},
      {0.4950788617134094f},
      {0.4790057837963104f},
      {0.4629416763782501f},
      {0.44636598229408264f},
      {0.42868947982788086f},
      {0.4091642498970032f},
      {0.3867409825325012f},
      {0.3598087728023529f},
      {0.32566285133361816f},
      {0.2792741358280182f},
      {0.2130860686302185f},
      {0.13831977546215057f},
      {0.06331492215394974f},
      {0.006441186182200909f},
      {0.021482301875948906f},
      {0.09423204511404037f},
      {0.17579583823680878f},
      {0.25288352370262146f},
      {0.3198634684085846f},
      {0.36682793498039246f},
      {0.401300311088562f},
      {0.42843198776245117f},
      {0.45098239183425903f},
      {0.4705895781517029f},
      {0.48831725120544434f},
      {0.5049211382865906f},
      {0.520994246006012f},
      {0.5370582938194275f},
      {0.5536340475082397f},
      {0.5713105201721191f}

};

static float y_test[1][1] = {
    {0.49757889651355536f}
};

When building models with PyTorch and deploying them to an STM32 board, I noticed that the parameter size differs between the local environment and the STM32 after deployment, even with the same operations. With Keras, however, the parameter size remains consistent between the two environments.
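One possible contributor to such a mismatch, stated as an assumption rather than a confirmed diagnosis: PyTorch's nn.LSTM stores two bias vectors per layer (bias_ih and bias_hh), whereas a Keras LSTM layer stores a single bias. A tool that merges the two biases would therefore report fewer parameters than PyTorch does. Whether X-CUBE-AI does this is not stated in this thread. The function names below are illustrative; the sketch only does the arithmetic for the layer sizes used above.

```python
def pytorch_lstm_params(input_size, hidden_size):
    # weight_ih (4h x i) + weight_hh (4h x h) + bias_ih (4h) + bias_hh (4h)
    return 4 * hidden_size * (input_size + hidden_size + 2)


def keras_lstm_params(input_size, units):
    # kernel (i x 4u) + recurrent_kernel (u x 4u) + a single bias (4u)
    return 4 * units * (input_size + units + 1)


def dense_params(in_features, out_features):
    # weight matrix plus one bias per output
    return in_features * out_features + out_features


# PyTorch model above: LSTM(input_size=1, hidden_size=64) followed by Linear(64, 1)
torch_total = pytorch_lstm_params(1, 64) + dense_params(64, 1)
# The same layers counted with a single LSTM bias vector
merged_total = keras_lstm_params(1, 64) + dense_params(64, 1)

print(torch_total)                 # 17217
print(merged_total)                # 16961
print(torch_total - merged_total)  # 256, i.e. one bias vector of size 4 * 64
```

If the difference observed on the board equals 4 * hidden_size per LSTM layer, the second bias vector is a likely explanation; any other difference would need a different one.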

Could you kindly provide guidance on what I should pay attention to in order to resolve this issue?

Thank you for your time and support.

Best regards,

Wendy

Julian E. · 2025-01-06

Hello @wypeng ,

I think the answer to your question is on pages 37 and 38 of this document:

https://www.st.com/resource/en/user_manual/dm00570145-getting-started-with-x-cube-ai-expansion-package-for-artificial-intelligence-ai-stmicroelectronics.pdf

It says that there are two causes of differences:

The kernels are implemented differently in Python and in C, which can lead to differences in results.
If I am not mistaken, you can also observe a difference between validation on desktop and validation on target, because one runs C code generated for x86 and the other runs C code generated for Arm.

In some cases this leads to differences, because the operations are not exactly the same.

So, to validate your model, first make sure that you get good accuracy in Python, then deploy it and run new tests to see the accuracy you get on the target.

Tests in Python alone are not enough.

If you see a significant accuracy drop using X-CUBE-AI, then it may be an issue we have to work on. But seeing some difference is "normal".
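As a minimal sketch of how "different but normal" could be checked in practice, the snippet below compares a desktop output with an on-target output under an absolute tolerance. The function names, the tolerance of 1e-3, and the on-target value are all assumptions made for illustration; only the reference value is taken from the y_test array posted above.

```python
def max_abs_error(reference, candidate):
    """Largest absolute element-wise difference between two output vectors."""
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same length")
    return max(abs(r - c) for r, c in zip(reference, candidate))


def outputs_match(reference, candidate, tolerance=1e-3):
    """True when every element differs by at most `tolerance` (an assumed threshold)."""
    return max_abs_error(reference, candidate) <= tolerance


desktop_output = [0.49757889651355536]  # y_test value from the post above
target_output = [0.4979]                # placeholder, NOT a measured STM32 result

print(max_abs_error(desktop_output, target_output))
print(outputs_match(desktop_output, target_output))
```

A suitable tolerance depends on the model and on whether it is quantized, so the threshold here should be treated as a starting point rather than a recommendation.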

Have a good day,

Julian


wypeng · 2025-01-09

Dear Julian,

Thank you very much for your prompt reply and valuable suggestions. I truly appreciate your insights, and I will definitely try exploring this direction as you recommended.

I look forward to further updates and will keep you posted on my progress. Once again, thank you for your support.

Best regards,
Wendy