Error when exporting PyTorch model "NOT IMPLEMENTED: Order of dimensions of input cannot be interpreted"

JCame.3 · 2023-04-03

I am using the STM32Cube.AI Developer Cloud to convert my ONNX model that I built using PyTorch.

Here is my export code:

import os

import torch

# `model` (a Custom1DCNN instance) and `filename` are defined elsewhere in my script.
input_size = [1, 8, 1000]  # (batch, channels, samples)

x = torch.randn(input_size)

onnx_folder_path = 'onnx_models/'
os.makedirs(onnx_folder_path, exist_ok=True)
onnx_filename = os.path.join(onnx_folder_path, "{}.onnx".format(filename))

torch.onnx.export(model,  # model being run
                  x,  # model input (or a tuple for multiple inputs)
                  onnx_filename,  # where to save the model (can be a file or file-like object)
                  # export_params=True,  # store the trained parameter weights inside the model file
                  opset_version=11,  # the ONNX opset version to export the model to
                  # do_constant_folding=True,  # whether to execute constant folding for optimization
                  input_names=['input_1'],  # the model's input names
                  output_names=['output_1'],  # the model's output names
                  )

And my model code:

import torch.nn as nn
import torch.nn.functional as F


class Custom1DCNN(nn.Module):
    def __init__(self, n_input=128, n_output=7, n_channel=8, pretrained=None):
        super().__init__()
 
        input_0 = n_channel
        input_1 = n_input
        input_2 = n_input // 4
        input_3 = n_input // 8
 
        self.conv1 = nn.Conv1d(input_0, input_1, kernel_size=3)
        self.bn1 = nn.BatchNorm1d(input_1)
 
        self.conv2 = nn.Conv1d(input_1, input_2, kernel_size=3)
        self.bn2 = nn.BatchNorm1d(input_2)
 
        self.conv3 = nn.Conv1d(input_2, input_3, kernel_size=3)
        self.bn3 = nn.BatchNorm1d(input_3)
 
        self.avgpool = nn.AdaptiveAvgPool1d(1)
 
        self.fc1 = nn.Linear(input_3, n_output)
 
        self.activation = nn.ReLU()
 
        if pretrained is not None:
            self.load_pretrained(pretrained)
            self.is_pretrained = True
        else:
            self.is_pretrained = False
 
    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.activation(x)
 
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.activation(x)
 
        x = self.conv3(x)
        x = self.bn3(x)
        x = self.activation(x)
 
        x = self.avgpool(x)
        x = x.permute(0, 2, 1)
        x = self.fc1(x)
        x = x.flatten(1)
        x = F.softmax(x, dim=1)
 
        return x
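
For reference, the tensor shapes through forward() can be traced by hand. The sketch below is plain Python with no PyTorch dependency. It assumes the default Conv1d settings used above (stride 1, no padding, dilation 1) and the [1, 8, 1000] input from the export code; the helper names conv1d_out_len and trace_shapes are illustrative only and not part of any library.

```python
def conv1d_out_len(length, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a 1D convolution (formula documented for nn.Conv1d)."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def trace_shapes(batch=1, n_channel=8, length=1000, n_input=128, n_output=7):
    """Return (stage, shape) pairs following Custom1DCNN.forward()."""
    shapes = [("input", (batch, n_channel, length))]
    channels = [n_input, n_input // 4, n_input // 8]
    for i, c in enumerate(channels, start=1):
        length = conv1d_out_len(length, kernel_size=3)
        # BatchNorm1d and ReLU leave the shape unchanged.
        shapes.append((f"conv{i}+bn{i}+relu", (batch, c, length)))
    shapes.append(("avgpool", (batch, channels[-1], 1)))
    shapes.append(("permute(0, 2, 1)", (batch, 1, channels[-1])))
    shapes.append(("fc1", (batch, 1, n_output)))
    shapes.append(("flatten(1) + softmax", (batch, n_output)))
    return shapes


for stage, shape in trace_shapes():
    print(f"{stage:22s} {shape}")
```

With these assumptions the exported graph takes a rank-3 input (batch, channels, length) and returns a (1, 7) tensor.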

I am getting the following error:

>>> stm32ai validate --model exported_1d_cnn_input_1000.onnx --workspace workspace --output output --allocate-inputs --allocate-outputs --relocatable --compression none --optimization balanced
Neural Network Tools for STM32AI v1.6.0 (STM.ai v7.3.0-RC5)
NOT IMPLEMENTED: Order of dimensions of input cannot be interpreted

I would appreciate guidance because this is blocking my research.

fauvarque.daniel · 2023-10-03

Sorry for the late answer. The problem has been fixed in X-CUBE-AI 8.1, and the model can now be analyzed and validated:

Neural Network Tools for STM32 family v1.7.0 (stm.ai v8.1.0-19520)

Setting validation data...
generating random data, size=10, seed=42, range=(0, 1)
I[1]: (10, 1, 8, 1000)/float32, min/max=[0.000, 1.000], mean/std=[0.500, 0.288], input_1
No output/reference samples are provided
Copying the AI runtime files to the user workspace: C:\Users\fauvarqd\Downloads\stm32ai_ws\inspector_network\workspace

Exec/report summary (validate)
-------------------------------------------------------------------------------------
model file       : C:\Users\fauvarqd\Downloads\exported_1d_cnn_input_1000.onnx
type             : onnx
c_name           : network
compression      : lossless
optimization     : balanced
workspace dir    : C:\Users\fauvarqd\Downloads\stm32ai_ws
output dir       : C:\Users\fauvarqd\Downloads\stm32ai_output
model_fmt        : float
model_name       : exported_1d_cnn_input_1000
model_hash       : 294ebca9dfef0693515b87f952908f4d
params #         : 17,191 items (67.15 KiB)
-------------------------------------------------------------------------------------
input 1/1        : 'input_1' (domain:user/)
                 : 8000 items, 31.25 KiB, ai_float, float, (1,1,8,1000)
output 1/1       : 'output_1' (domain:user/)
                 : 7 items, 28 B, ai_float, float, (1,7)
macc             : 17,031,315
weights (ro)     : 68,764 B (67.15 KiB) (1 segment)
activations (rw) : 512,160 B (500.16 KiB) (1 segment)
ram (total)      : 544,188 B (531.43 KiB) = 512,160 + 32,000 + 28
-------------------------------------------------------------------------------------
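
The figures in this summary can be cross-checked with a few lines of arithmetic. The sketch below assumes float32 (4-byte) items and the default model arguments (n_channel=8, n_input=128, n_output=7). It also assumes the BatchNorm1d layers were folded into the preceding convolutions during conversion; that is an inference from the parameter count matching, not something the report states. The activations size is copied from the report rather than derived.

```python
BYTES_PER_FLOAT32 = 4


def conv1d_params(c_in, c_out, kernel_size):
    """Kernel weights plus one bias per output channel."""
    return c_in * c_out * kernel_size + c_out


def linear_params(n_in, n_out):
    """Weight matrix plus one bias per output."""
    return n_in * n_out + n_out


# Layer sizes from Custom1DCNN with its default arguments (assumed).
params = (
    conv1d_params(8, 128, 3)
    + conv1d_params(128, 32, 3)
    + conv1d_params(32, 16, 3)
    + linear_params(16, 7)
)
weights_bytes = params * BYTES_PER_FLOAT32
input_bytes = 1 * 8 * 1000 * BYTES_PER_FLOAT32
output_bytes = 7 * BYTES_PER_FLOAT32
activations_bytes = 512_160  # copied from the report, not derived here
ram_total = activations_bytes + input_bytes + output_bytes

print(params, weights_bytes, input_bytes, output_bytes, ram_total)
# 17191 68764 32000 28 544188
```

Under these assumptions the numbers agree with the "params #", "weights (ro)" and "ram (total)" lines.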

Running the STM AI c-model (AI RUNNER)...(name=network, mode=x86)

X86 shared lib (C:\Users\fauvarqd\Downloads\stm32ai_ws\inspector_network\workspace\lib\libai_network.dll) ['network']

Summary "network" - ['network']
----------------------------------------------------------------------------------------------
inputs/ouputs : 1/1
input_1 : input_1, (1,1,8,1000), float32, 32,000 bytes, user
output_1 : output_1, (1,1,1,7), float32, 28 bytes, user
n_nodes : 13
compile_datetime : Oct 3 2023 11:29:26
activations : 512160
weights : 68764
macc : 17031315
----------------------------------------------------------------------------------------------
runtime : STM.AI(/) 8.1.0 (Tools 8.1.0) -
capabilities : IO_ONLY, PER_LAYER, PER_LAYER_WITH_DATA
device : AMD64 Intel64 Family 6 Model 142 Stepping 12, GenuineIntel (Windows)

NOTE: duration and exec time per layer is just an indication. They are dependent of the HOST-machine work-load.

STM.AI Profiling results v1.2 - network
---------------------------------------------------------------
nb sample(s) : 10
duration : 35.863ms by sample (34.324/36.488/0.565)
macc : 17031315
---------------------------------------------------------------
HOST duration : 0.379s (total)
---------------------------------------------------------------

Inference time per node
--------------------------------------------------------------------------
c_id  m_id  type                dur (ms)      %  name
--------------------------------------------------------------------------
0     1     Transpose (0x10a)      0.066   0.2%  ai_node_0
1     1     Transpose (0x10a)      0.062   0.2%  ai_node_1
2     2     Conv2D (0x103)         7.543  21.0%  ai_node_2
3     3     NL (0x107)             0.623   1.7%  ai_node_3
4     4     Conv2D (0x103)        24.180  67.4%  ai_node_4
5     5     NL (0x107)             0.103   0.3%  ai_node_5
6     6     Conv2D (0x103)         3.215   9.0%  ai_node_6
7     7     NL (0x107)             0.033   0.1%  ai_node_7
8     8     Pool (0x10b)           0.023   0.1%  ai_node_8
9     10    Dense (0x104)          0.004   0.0%  ai_node_9
10    11    Eltwise (0x113)        0.002   0.0%  ai_node_10
11    12    Transpose (0x10a)      0.001   0.0%  ai_node_11
12    13    NL (0x107)             0.004   0.0%  ai_node_12
--------------------------------------------------------------------------
total                             35.860
--------------------------------------------------------------------------

Statistic per tensor
-----------------------------------------------------------------------------
tensor  shape/type             min    max    mean   std    name
-----------------------------------------------------------------------------
I.0     (1,1,8,1000)/float32   0.000  1.000  0.500  0.288  input_1
O.0     (1,1,1,7)/float32      0.118  0.168  0.143  0.020  output_1
-----------------------------------------------------------------------------

Running the ONNX model...

Saving validation data...
output directory: C:\Users\fauvarqd\Downloads\stm32ai_output
creating C:\Users\fauvarqd\Downloads\stm32ai_output\network_val_io.npz
m_outputs_1: (10, 1, 1, 7)/float64, min/max=[0.118, 0.168], mean/std=[0.143, 0.020], output_1
c_outputs_1: (10, 1, 1, 7)/float32, min/max=[0.118, 0.168], mean/std=[0.143, 0.020], output_1

Computing the metrics...

Cross accuracy report #1 (reference vs C-model)
----------------------------------------------------------------------------------------------------
notes: - data type is different: r/float64 instead p/float32
       - the output of the reference model is used as ground truth/reference value
       - 10 samples (7 items per sample)

acc=100.00%, rmse=0.000021865, mae=0.000015804, l2r=0.000151639, nse=100.00%, cos=100.00%

7 classes (10 samples)
-------------------------------------------
C0 0 . . . . . .
C1 . 0 . . . . .
C2 . . 10 . . . .
C3 . . . 0 . . .
C4 . . . . 0 . .
C5 . . . . . 0 .
C6 . . . . . . 0

Evaluation report (summary)
--------------------------------------------------------------------------------------------------------------------------------------------------
Output      acc      rmse       mae        l2r        mean       std        nse        cos        tensor
--------------------------------------------------------------------------------------------------------------------------------------------------
X-cross #1  100.00%  0.0000219  0.0000158  0.0001516  0.0000000  0.0000220  0.9999988  1.0000000  output_1, ai_float, (1,7), m_id=[13]
--------------------------------------------------------------------------------------------------------------------------------------------------

acc : Classification accuracy (all classes)
rmse : Root Mean Squared Error
mae : Mean Absolute Error
l2r : L2 relative error
nse : Nash-Sutcliffe efficiency criteria
cos : COsine Similarity

Creating txt report file C:\Users\fauvarqd\Downloads\stm32ai_output\network_validate_report.txt
elapsed time (validate): 6.563s
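
The error metrics listed in the legend can be recomputed from the reference and C-model outputs. The sketch below uses common textbook definitions on plain Python lists. Whether the tool uses exactly these formulas is an assumption, and the sample values are made up for illustration; they are not taken from network_val_io.npz.

```python
import math


def rmse(ref, pred):
    """Root mean squared error."""
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(ref, pred)) / len(ref))


def mae(ref, pred):
    """Mean absolute error."""
    return sum(abs(r - p) for r, p in zip(ref, pred)) / len(ref)


def l2r(ref, pred):
    """L2 norm of the error divided by the L2 norm of the reference."""
    num = math.sqrt(sum((r - p) ** 2 for r, p in zip(ref, pred)))
    den = math.sqrt(sum(r ** 2 for r in ref))
    return num / den


def nse(ref, pred):
    """Nash-Sutcliffe efficiency: 1 - squared error / variance of reference."""
    mean_ref = sum(ref) / len(ref)
    num = sum((r - p) ** 2 for r, p in zip(ref, pred))
    den = sum((r - mean_ref) ** 2 for r in ref)
    return 1.0 - num / den


def cosine_similarity(ref, pred):
    """Dot product divided by the product of the two L2 norms."""
    dot = sum(r * p for r, p in zip(ref, pred))
    norm_ref = math.sqrt(sum(r * r for r in ref))
    norm_pred = math.sqrt(sum(p * p for p in pred))
    return dot / (norm_ref * norm_pred)


# Hypothetical flattened outputs (reference model vs. C-model).
ref = [0.10, 0.20, 0.30, 0.40]
pred = [0.10, 0.20, 0.30, 0.41]

for name, fn in [("rmse", rmse), ("mae", mae), ("l2r", l2r),
                 ("nse", nse), ("cos", cosine_similarity)]:
    print(f"{name}: {fn(ref, pred):.7f}")
```

These definitions are at least consistent with the reported values: an rmse of 0.0000219 against a reference with mean 0.143 and std 0.020 gives an l2r of about 0.00015 and an nse of about 0.9999988.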
