Issue Running a GRU/LSTM Model on STM32 with Neural-ART

Dresult
Associate

Hello everyone,

I’m trying to run a model on an STM32 MCU with Neural-ART using the ST Edge AI Developer Cloud.

The final model includes GRU layers, but when I attempt to quantize and then optimize it, the tools report errors. To narrow the problem down, I created a minimal test model, which is as follows:

import torch
import torch.nn as nn

class GRUTEST(nn.Module):
    def __init__(self, hidden_layer_size, n_layers, output_size, dropout, device):
        super(GRUTEST, self).__init__()
        self.device = device
        self.hidden_layer_size = hidden_layer_size
        self.n_layers = n_layers

        # Unidirectional GRU over 256-feature inputs, batch-first layout
        self.rnn = nn.GRU(256, hidden_layer_size, n_layers, batch_first=True,
                          dropout=dropout, bidirectional=False)
        self.fc = nn.Linear(hidden_layer_size, output_size)

    def forward(self, x):
        x, _ = self.rnn(x)        # x: (batch, seq, hidden)
        x = self.fc(x[:, -1, :])  # predict from the last time step
        return x

I then convert this model to ONNX as follows:

torch_model = GRUTEST(
    hidden_layer_size=64,
    n_layers=2,
    output_size=1,
    dropout=0.2,
    device='cpu'
)
torch_model.eval()

torch.onnx.export(
    torch_model,
    torch.randn(1, 1, 256),  # dummy input: (batch, seq_len, features)
    "modelGRU.onnx",
    opset_version=15,
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch_size"}, "output": {0: "batch_size"}},
    export_params=True,              # embed the weights in the graph
    keep_initializers_as_inputs=False
)

I successfully perform per-channel quantization, but as soon as I reach the optimization step (with default optimization options), I get the following error, with both the quantized and the non-quantized versions of the model:

TOOL ERROR: operands could not be broadcast together with shapes (256,) (128,).
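For what it's worth, the quantization step itself can also be reproduced locally. A minimal sketch, assuming onnxruntime's static quantizer stands in for whatever the Developer Cloud runs internally (the RandomCalibrationReader below is a placeholder; real calibration should use representative data):

import numpy as np
from onnxruntime.quantization import CalibrationDataReader, quantize_static

class RandomCalibrationReader(CalibrationDataReader):
    # Placeholder calibration feed: a handful of random input tensors
    def __init__(self, n_samples=8):
        self.samples = iter(
            {"input": np.random.randn(1, 1, 256).astype(np.float32)}
            for _ in range(n_samples)
        )

    def get_next(self):
        return next(self.samples, None)

quantize_static(
    "modelGRU.onnx",
    "modelGRU_int8.onnx",
    RandomCalibrationReader(),
    per_channel=True,  # per-channel weight quantization, as in the Cloud run
)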

If I replace the GRU with an LSTM:

class LSTMTEST(nn.Module):
    def __init__(self, hidden_layer_size, n_layers, output_size, dropout, device):
        super(LSTMTEST, self).__init__()
        self.device = device
        self.hidden_layer_size = hidden_layer_size
        self.n_layers = n_layers

        self.rnn = nn.LSTM(256, hidden_layer_size, n_layers, batch_first=True,
                           dropout=dropout, bidirectional=False)
        self.fc = nn.Linear(hidden_layer_size, output_size)

    def forward(self, x):
        batch_size = x.size(0)
        # Explicit zero initial states, created at runtime from the batch size
        h_0 = torch.zeros(self.n_layers, batch_size, self.hidden_layer_size, device=x.device)
        c_0 = torch.zeros(self.n_layers, batch_size, self.hidden_layer_size, device=x.device)

        x, _ = self.rnn(x, (h_0, c_0))
        x = self.fc(x[:, -1, :])
        return x

Even though I explicitly define h_0 and c_0, I still get the following error:

NOT IMPLEMENTED: Sixth input (initial_h) of LSTM _rnn_LSTM_output_0_forward is not constant or constant propagation was not able to compute it
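One variant worth noting (a sketch only; it assumes a fixed batch size of 1, so the dynamic batch axis would have to be dropped from dynamic_axes): registering the initial states as buffers should make torch.onnx.export store them as constant initializers rather than tensors computed at runtime. LSTMTESTConst below is a hypothetical name for this variant:

class LSTMTESTConst(nn.Module):
    def __init__(self, hidden_layer_size, n_layers, output_size, dropout):
        super().__init__()
        self.rnn = nn.LSTM(256, hidden_layer_size, n_layers, batch_first=True,
                           dropout=dropout, bidirectional=False)
        self.fc = nn.Linear(hidden_layer_size, output_size)
        # Buffers are exported as initializers, i.e. constants in the graph
        self.register_buffer("h_0", torch.zeros(n_layers, 1, hidden_layer_size))
        self.register_buffer("c_0", torch.zeros(n_layers, 1, hidden_layer_size))

    def forward(self, x):
        x, _ = self.rnn(x, (self.h_0, self.c_0))
        return self.fc(x[:, -1, :])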

Would anyone be able to point out what I’m doing wrong?

Thanks in advance!

2 REPLIES
Julian E.
ST Employee

Hello @Dresult,

Neural-ART does not support GRU or LSTM layers. You can find the list of supported operators here:

https://stedgeai-dc.st.com/assets/embedded-docs/stneuralart_operator_support.html 
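A quick way to check a model against that list is to print the operator types present in its ONNX graph, for example:

import onnx

model = onnx.load("modelGRU.onnx")
print(sorted({node.op_type for node in model.graph.node}))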

Have a good day,

Julian


In order to give better visibility on the answered topics, please click on 'Accept as Solution' on the reply which solved your issue or answered your question.

Hello @Julian E. 

Thanks for your answer; I hadn't checked the page dedicated to the NPU. Now it makes sense.

However, I am also trying to deploy the model on STM32 MCUs and MPUs, but the process hangs at the optimization step. I tried with both ST Edge AI Core 2.0 and STM32Cube.AI 9.0. Do GRU and LSTM layers remain unsupported on these targets as well?