2023-04-03 04:37 AM
I am using the STM32Cube.AI Developer Cloud to convert my ONNX model that I built using PyTorch.
Here is my export code:
import os
import torch
# `model` (a Custom1DCNN instance) and `filename` are defined earlier in the script (not shown)
input_size = [1, 8, 1000]  # (batch, channels, samples)
x = torch.randn(input_size)
onnx_folder_path = 'onnx_models/'
if not os.path.isdir(onnx_folder_path):
    os.mkdir(onnx_folder_path)
onnx_filename = "{}{}.onnx".format(onnx_folder_path, filename)
torch.onnx.export(model,                       # model being run
                  x,                           # model input (or a tuple for multiple inputs)
                  onnx_filename,               # where to save the model (can be a file or file-like object)
                  # export_params=True,        # store the trained parameter weights inside the model file
                  opset_version=11,            # the ONNX version to export the model to
                  # do_constant_folding=True,  # whether to execute constant folding for optimization
                  input_names=['input_1'],     # the model's input names
                  output_names=['output_1'],   # the model's output names
                  )
And my model code:
import torch.nn as nn
import torch.nn.functional as F
class Custom1DCNN(nn.Module):
    def __init__(self, n_input=128, n_output=7, n_channel=8, pretrained=None):
        super().__init__()
        # Channel widths of the three convolution stages
        input_0 = n_channel
        input_1 = n_input
        input_2 = n_input // 4
        input_3 = n_input // 8
        self.conv1 = nn.Conv1d(input_0, input_1, kernel_size=3)
        self.bn1 = nn.BatchNorm1d(input_1)
        self.conv2 = nn.Conv1d(input_1, input_2, kernel_size=3)
        self.bn2 = nn.BatchNorm1d(input_2)
        self.conv3 = nn.Conv1d(input_2, input_3, kernel_size=3)
        self.bn3 = nn.BatchNorm1d(input_3)
        self.avgpool = nn.AdaptiveAvgPool1d(1)
        self.fc1 = nn.Linear(input_3, n_output)
        self.activation = nn.ReLU()
        if pretrained is not None:
            # load_pretrained() is a helper method of this class (not shown here)
            self.load_pretrained(pretrained)
            self.is_pretrained = True
        else:
            self.is_pretrained = False
    def forward(self, x):
        # x: (batch, n_channel, samples)
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.activation(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.activation(x)
        x = self.conv3(x)
        x = self.bn3(x)
        x = self.activation(x)
        x = self.avgpool(x)       # (batch, input_3, 1)
        x = x.permute(0, 2, 1)    # (batch, 1, input_3)
        x = self.fc1(x)           # (batch, 1, n_output)
        x = x.flatten(1)          # (batch, n_output)
        x = F.softmax(x, dim=1)
        return x
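For anyone following along, here is a rough sketch of how the tensor shape changes through the forward pass above for the (1, 8, 1000) export input. It uses plain Python rather than PyTorch, and the helper names conv1d_out_len and trace_shapes are made up for illustration. The length formula is the standard one for nn.Conv1d with no padding, stride 1, and dilation 1, which is what the constructor calls above use by default.

```python
def conv1d_out_len(length, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a 1-D convolution (same formula as nn.Conv1d)."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def trace_shapes(batch=1, n_channel=8, length=1000, n_input=128, n_output=7):
    """Return (stage, shape) pairs for the forward pass of Custom1DCNN."""
    shapes = [("input", (batch, n_channel, length))]
    channels = [n_input, n_input // 4, n_input // 8]
    for i, out_ch in enumerate(channels, start=1):
        # Each kernel_size=3 convolution without padding shortens the signal by 2
        length = conv1d_out_len(length, kernel_size=3)
        # BatchNorm1d and ReLU leave the shape unchanged
        shapes.append((f"conv{i}+bn{i}+relu", (batch, out_ch, length)))
    shapes.append(("avgpool", (batch, channels[-1], 1)))
    shapes.append(("permute(0, 2, 1)", (batch, 1, channels[-1])))
    shapes.append(("fc1", (batch, 1, n_output)))
    shapes.append(("flatten(1) + softmax", (batch, n_output)))
    return shapes


for stage, shape in trace_shapes():
    print(f"{stage:24s} {shape}")
```

The permute before fc1 is what moves the 16 pooled channels into the last dimension so that nn.Linear can act on them, and flatten(1) then drops the leftover singleton dimension.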
I am getting the following error:
>>> stm32ai validate --model exported_1d_cnn_input_1000.onnx --workspace workspace --output output --allocate-inputs --allocate-outputs --relocatable --compression none --optimization balanced
Neural Network Tools for STM32AI v1.6.0 (STM.ai v7.3.0-RC5)
NOT IMPLEMENTED: Order of dimensions of input cannot be interpreted
I would appreciate guidance because this is blocking my research.
2023-10-03 02:30 AM
Sorry for the late answer. The problem has been fixed in X-CUBE-AI 8.1, and the model can now be analyzed and validated:
Neural Network Tools for STM32 family v1.7.0 (stm.ai v8.1.0-19520)
Setting validation data...
generating random data, size=10, seed=42, range=(0, 1)
I[1]: (10, 1, 8, 1000)/float32, min/max=[0.000, 1.000], mean/std=[0.500, 0.288], input_1
No output/reference samples are provided
Copying the AI runtime files to the user workspace: C:\Users\fauvarqd\Downloads\stm32ai_ws\inspector_network\workspace
Exec/report summary (validate)
-------------------------------------------------------------------------------------
model file : C:\Users\fauvarqd\Downloads\exported_1d_cnn_input_1000.onnx
type : onnx
c_name : network
compression : lossless
optimization : balanced
workspace dir : C:\Users\fauvarqd\Downloads\stm32ai_ws
output dir : C:\Users\fauvarqd\Downloads\stm32ai_output
model_fmt : float
model_name : exported_1d_cnn_input_1000
model_hash : 294ebca9dfef0693515b87f952908f4d
params # : 17,191 items (67.15 KiB)
-------------------------------------------------------------------------------------
input 1/1 : 'input_1' (domain:user/)
: 8000 items, 31.25 KiB, ai_float, float, (1,1,8,1000)
output 1/1 : 'output_1' (domain:user/)
: 7 items, 28 B, ai_float, float, (1,7)
macc : 17,031,315
weights (ro) : 68,764 B (67.15 KiB) (1 segment)
activations (rw) : 512,160 B (500.16 KiB) (1 segment)
ram (total) : 544,188 B (531.43 KiB) = 512,160 + 32,000 + 28
-------------------------------------------------------------------------------------
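The parameter and RAM figures in the summary above can be cross-checked by hand from the model definition in the question. The sketch below is only that arithmetic. The helper names are illustrative, and it assumes 4-byte float32 values and that the BatchNorm parameters are not counted separately (for example because they are folded into the convolutions). That last point is an inference from the numbers lining up, not something the report states. The activations figure is copied from the report, not derived.

```python
BYTES_PER_FLOAT32 = 4


def conv1d_params(in_ch, out_ch, kernel_size):
    """Weights plus one bias per output channel."""
    return in_ch * out_ch * kernel_size + out_ch


def linear_params(in_features, out_features):
    """Weights plus one bias per output feature."""
    return in_features * out_features + out_features


# Layer sizes from Custom1DCNN with n_channel=8, n_input=128, n_output=7
params = (
    conv1d_params(8, 128, 3)     # conv1
    + conv1d_params(128, 32, 3)  # conv2
    + conv1d_params(32, 16, 3)   # conv3
    + linear_params(16, 7)       # fc1
)
weights_bytes = params * BYTES_PER_FLOAT32

input_bytes = 1 * 8 * 1000 * BYTES_PER_FLOAT32  # 'input_1'
output_bytes = 7 * BYTES_PER_FLOAT32            # 'output_1'
activations_bytes = 512_160                     # copied from the report, not derived
ram_total = activations_bytes + input_bytes + output_bytes

print(f"params       : {params:,} items")
print(f"weights (ro) : {weights_bytes:,} B ({weights_bytes / 1024:.2f} KiB)")
print(f"ram (total)  : {ram_total:,} B ({ram_total / 1024:.2f} KiB)")
```

Under these assumptions the totals come out to 17,191 parameters, 68,764 B of weights, and 544,188 B of RAM, the same values the report lists.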
Running the STM AI c-model (AI RUNNER)...(name=network, mode=x86)
X86 shared lib (C:\Users\fauvarqd\Downloads\stm32ai_ws\inspector_network\workspace\lib\libai_network.dll) ['network']
Summary "network" - ['network']
----------------------------------------------------------------------------------------------
inputs/ouputs : 1/1
input_1 : input_1, (1,1,8,1000), float32, 32,000 bytes, user
output_1 : output_1, (1,1,1,7), float32, 28 bytes, user
n_nodes : 13
compile_datetime : Oct 3 2023 11:29:26
activations : 512160
weights : 68764
macc : 17031315
----------------------------------------------------------------------------------------------
runtime : STM.AI(/) 8.1.0 (Tools 8.1.0) -
capabilities : IO_ONLY, PER_LAYER, PER_LAYER_WITH_DATA
device : AMD64 Intel64 Family 6 Model 142 Stepping 12, GenuineIntel (Windows)
NOTE: duration and exec time per layer is just an indication. They are dependent of the HOST-machine work-load.
STM.AI Profiling results v1.2 - network
---------------------------------------------------------------
nb sample(s) : 10
duration : 35.863ms by sample (34.324/36.488/0.565)
macc : 17031315
---------------------------------------------------------------
HOST duration : 0.379s (total)
---------------------------------------------------------------
Inference time per node
--------------------------------------------------------------------------
c_id m_id type dur (ms) % name
--------------------------------------------------------------------------
0 1 Transpose (0x10a) 0.066 0.2% ai_node_0
1 1 Transpose (0x10a) 0.062 0.2% ai_node_1
2 2 Conv2D (0x103) 7.543 21.0% ai_node_2
3 3 NL (0x107) 0.623 1.7% ai_node_3
4 4 Conv2D (0x103) 24.180 67.4% ai_node_4
5 5 NL (0x107) 0.103 0.3% ai_node_5
6 6 Conv2D (0x103) 3.215 9.0% ai_node_6
7 7 NL (0x107) 0.033 0.1% ai_node_7
8 8 Pool (0x10b) 0.023 0.1% ai_node_8
9 10 Dense (0x104) 0.004 0.0% ai_node_9
10 11 Eltwise (0x113) 0.002 0.0% ai_node_10
11 12 Transpose (0x10a) 0.001 0.0% ai_node_11
12 13 NL (0x107) 0.004 0.0% ai_node_12
--------------------------------------------------------------------------
total 35.860
--------------------------------------------------------------------------
Statistic per tensor
-----------------------------------------------------------------------------
tensor shape/type min max mean std name
-----------------------------------------------------------------------------
I.0 (1,1,8,1000)/float32 0.000 1.000 0.500 0.288 input_1
O.0 (1,1,1,7)/float32 0.118 0.168 0.143 0.020 output_1
-----------------------------------------------------------------------------
Running the ONNX model...
Saving validation data...
output directory: C:\Users\fauvarqd\Downloads\stm32ai_output
creating C:\Users\fauvarqd\Downloads\stm32ai_output\network_val_io.npz
m_outputs_1: (10, 1, 1, 7)/float64, min/max=[0.118, 0.168], mean/std=[0.143, 0.020], output_1
c_outputs_1: (10, 1, 1, 7)/float32, min/max=[0.118, 0.168], mean/std=[0.143, 0.020], output_1
Computing the metrics...
Cross accuracy report #1 (reference vs C-model)
----------------------------------------------------------------------------------------------------
notes: - data type is different: r/float64 instead p/float32
- the output of the reference model is used as ground truth/reference value
- 10 samples (7 items per sample)
acc=100.00%, rmse=0.000021865, mae=0.000015804, l2r=0.000151639, nse=100.00%, cos=100.00%
7 classes (10 samples)
-------------------------------------------
C0 0 . . . . . .
C1 . 0 . . . . .
C2 . . 10 . . . .
C3 . . . 0 . . .
C4 . . . . 0 . .
C5 . . . . . 0 .
C6 . . . . . . 0
Evaluation report (summary)
--------------------------------------------------------------------------------------------------------------------------------------------------
Output acc rmse mae l2r mean std nse cos tensor
--------------------------------------------------------------------------------------------------------------------------------------------------
X-cross #1 100.00% 0.0000219 0.0000158 0.0001516 0.0000000 0.0000220 0.9999988 1.0000000 output_1, ai_float, (1,7), m_id=[13]
--------------------------------------------------------------------------------------------------------------------------------------------------
acc : Classification accuracy (all classes)
rmse : Root Mean Squared Error
mae : Mean Absolute Error
l2r : L2 relative error
nse : Nash-Sutcliffe efficiency criteria
cos : COsine Similarity
Creating txt report file C:\Users\fauvarqd\Downloads\stm32ai_output\network_validate_report.txt
elapsed time (validate): 6.563s
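In case it helps when reading the report, the metrics listed in the legend (acc, rmse, mae, l2r, nse, cos) can be sketched in a few lines of dependency-free Python. These are the textbook definitions; the exact formulas used by the tool are an assumption on my part, and the function name validation_metrics is made up for illustration. Here `ref` would be the reference (ONNX) outputs and `pred` the C-model outputs, each a list of per-sample score lists.

```python
import math


def validation_metrics(ref, pred):
    """Compare reference and predicted outputs (lists of per-sample score lists)."""
    r = [v for sample in ref for v in sample]
    p = [v for sample in pred for v in sample]
    n = len(r)
    diff = [a - b for a, b in zip(r, p)]
    sq_err = sum(d * d for d in diff)
    mean_r = sum(r) / n
    norm_r = math.sqrt(sum(a * a for a in r))
    norm_p = math.sqrt(sum(b * b for b in p))
    # Classification accuracy: fraction of samples whose argmax agrees
    same_class = sum(
        1 for rs, ps in zip(ref, pred)
        if rs.index(max(rs)) == ps.index(max(ps))
    )
    return {
        "acc": same_class / len(ref),
        "rmse": math.sqrt(sq_err / n),                # root mean squared error
        "mae": sum(abs(d) for d in diff) / n,         # mean absolute error
        "l2r": math.sqrt(sq_err) / norm_r,            # L2 relative error
        "nse": 1.0 - sq_err / sum((a - mean_r) ** 2 for a in r),  # Nash-Sutcliffe
        "cos": sum(a * b for a, b in zip(r, p)) / (norm_r * norm_p),
    }


# Tiny made-up example: two samples, three classes
ref = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]
pred = [[0.2, 0.6, 0.2], [0.6, 0.3, 0.1]]
for name, value in validation_metrics(ref, pred).items():
    print(f"{name:4s} = {value:.6f}")
```

With the rounded mean/std and rmse values printed in the report, these definitions give l2r and nse figures close to the ones the tool shows, which suggests the definitions are at least compatible. To run this on the real data you would load the m_outputs_1 and c_outputs_1 arrays from network_val_io.npz and reshape them to (samples, classes) first.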