2025-10-10 11:19 PM
2025-10-21 1:24 AM
Hello @fanronghua0123456,
To run inference with an .nb model, you can look at this article:
https://wiki.st.com/stm32mpu/wiki/How_to_run_inference_using_the_STAI_MPU_Python_API
In your case, you will need to edit the code to print the outputs the way you expect, since by default it is written for image classification.
The last 6 lines are:
top_k = results.argsort()[-5:][::-1]
labels = load_labels(args.label_file)
for i in top_k:
    if output_tensor_dtype == np.uint8:
        print('{:08.6f}: {}'.format(float(results[i] / 255.0), labels[i]))
    else:
        print('{:08.6f}: {}'.format(float(results[i]), labels[i]))
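If your model is not a classifier, a simple first step is to replace those lines with something that just prints the raw network output (a minimal sketch, using the same stai_model object as in the wiki example):
output_data = stai_model.get_output(index=0)
results = np.squeeze(output_data)
# Print the raw output shape and its largest value to see what the model actually returns
print("Output shape:", results.shape)
print("Max output value:", float(np.max(results)))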
Have a good day,
Julian
2025-10-21 5:03 AM
Thanks for your reply.
I ran it according to your method and encountered the following error. Could you tell me what caused this and how it should be fixed?
It looks like the error occurs at stai_model.set_input(0, input_data).
root@ATK-DLMP257:/opt/ui/src/apps/resource/x-linux-ai/object-detection# python3 testpy.py -m /home/root/best_integer_quant.nb -i /home/root/hander1_1.jpg -l /home/root/best.txt
Loading dynamically: /usr/lib/libstai_mpu_ovx.so.6
[OVX]: Loading nbg model
**Input node: 0 -Input_name: -Input_dims:4 - input_type:float16 -Input_shape:(1, 640, 640, 3)
**Output node: 0 -Output_name: -Output_dims:3 - Output_type:float16 -Output_shape:(1, 11, 8400)
640
640
----1
Segmentation fault (core dumped)
My testpy.py script is as follows:
from stai_mpu import stai_mpu_network
from numpy.typing import NDArray
from typing import Any, List
from pathlib import Path
from PIL import Image
from argparse import ArgumentParser
from timeit import default_timer as timer
import cv2 as cv
import numpy as np
import time
def load_labels(filename):
    with open(filename, 'r') as f:
        return [line.strip() for line in f.readlines()]
if __name__ == '__main__':
    parser = ArgumentParser()
    parser.add_argument('-i','--image', help='image to be classified.')
    parser.add_argument('-m','--model_file',help='model to be executed.')
    parser.add_argument('-l','--label_file', help='name of labels file.')
    parser.add_argument('--input_mean', default=127.5, help='input_mean')
    parser.add_argument('--input_std', default=127.5,help='input stddev')
    args = parser.parse_args()
    stai_model = stai_mpu_network(model_path=args.model_file, use_hw_acceleration=True)
    # Read input tensor information
    num_inputs = stai_model.get_num_inputs()
    input_tensor_infos = stai_model.get_input_infos()
    for i in range(0, num_inputs):
        input_tensor_shape = input_tensor_infos[i].get_shape()
        input_tensor_name = input_tensor_infos[i].get_name()
        input_tensor_rank = input_tensor_infos[i].get_rank()
        input_tensor_dtype = input_tensor_infos[i].get_dtype()
        print("**Input node: {} -Input_name:{} -Input_dims:{} - input_type:{} -Input_shape:{}".format(i, input_tensor_name,
                                                                                                      input_tensor_rank,
                                                                                                      input_tensor_dtype,
                                                                                                      input_tensor_shape))
        if input_tensor_infos[i].get_qtype() == "staticAffine":
            # Reading the input scale and zero point variables
            input_tensor_scale = input_tensor_infos[i].get_scale()
            input_tensor_zp = input_tensor_infos[i].get_zero_point()
        if input_tensor_infos[i].get_qtype() == "dynamicFixedPoint":
            # Reading the dynamic fixed point position
            input_tensor_dfp_pos = input_tensor_infos[i].get_fixed_point_pos()
    # Read output tensor information
    num_outputs = stai_model.get_num_outputs()
    output_tensor_infos = stai_model.get_output_infos()
    for i in range(0, num_outputs):
        output_tensor_shape = output_tensor_infos[i].get_shape()
        output_tensor_name = output_tensor_infos[i].get_name()
        output_tensor_rank = output_tensor_infos[i].get_rank()
        output_tensor_dtype = output_tensor_infos[i].get_dtype()
        print("**Output node: {} -Output_name:{} -Output_dims:{} - Output_type:{} -Output_shape:{}".format(i, output_tensor_name,
                                                                                                           output_tensor_rank,
                                                                                                           output_tensor_dtype,
                                                                                                           output_tensor_shape))
        if output_tensor_infos[i].get_qtype() == "staticAffine":
            # Reading the output scale and zero point variables
            output_tensor_scale = output_tensor_infos[i].get_scale()
            output_tensor_zp = output_tensor_infos[i].get_zero_point()
        if output_tensor_infos[i].get_qtype() == "dynamicFixedPoint":
            # Reading the dynamic fixed point position
            output_tensor_dfp_pos = output_tensor_infos[i].get_fixed_point_pos()
    # Reading input image
    input_width = input_tensor_shape[1]
    print(input_width)
    input_height = input_tensor_shape[2]
    print(input_height)
    input_image = Image.open(args.image).resize((input_width,input_height))
    input_data = np.expand_dims(input_image, axis=0)
    if input_tensor_dtype == np.float32:
        input_data = (np.float32(input_data) - args.input_mean) /args.input_std
    print("----1")
    stai_model.set_input(0, input_data)
    print("----2")
    start = timer()
    stai_model.run()
    end = timer()
    print("Inference time: ", (end - start) *1000, "ms")
    output_data = stai_model.get_output(index=0)
    results = np.squeeze(output_data)
    top_k = results.argsort()[-5:][::-1]
    labels = load_labels(args.label_file)
    for i in top_k:
        if output_tensor_dtype == np.uint8:
            print('{:08.6f}: {}'.format(float(results[i] / 255.0), labels[i]))
        else:
            print('{:08.6f}: {}'.format(float(results[i]), labels[i]))
Thank you very much for your help!
2025-10-21 6:19 AM
Can you please add this to your code, after the resize, to make sure the shape and type of the input are correct:
---
img_array_after = np.array(input_data)
print("Dtype after resize: ", img_array_after.dtype)
print("Shape after resize: ", img_array_after.shape)
---
stai_model.set_input(0, input_data)
Your model input is float16, so your data must also be in float16, which means these lines can be the source of the error:
if input_tensor_dtype == np.float32:
    input_data = (np.float32(input_data) - args.input_mean) /args.input_std
We only convert the input to float32 if the input dtype is float32. But in your case it is float16, so we still need a conversion to float16, something like:
if input_tensor_dtype == np.float16:
    print("input data float 16")
    input_data = (np.float16(input_data) - args.input_mean) /args.input_std
Have a good day,
Julian
2025-10-21 5:30 PM - edited 2025-10-21 6:29 PM
Thanks for your reply.
Using your method, I added the following code.
# Reading input image
input_width = input_tensor_shape[1]
print(input_width)
input_height = input_tensor_shape[2]
print(input_height)
input_image = Image.open(args.image).resize((input_width,input_height))
input_data = np.expand_dims(input_image, axis=0)
if input_tensor_dtype == np.float32:
    input_data = (np.float32(input_data) - args.input_mean) /args.input_std
print("----1")
img_array_after = np.array(input_data)
print("Dtype after resize: ", img_array_after.dtype)
print("Shape after resize: ", img_array_after.shape)
print("----1test")
if input_tensor_dtype == np.float32:
    print("float32")
if input_tensor_dtype == np.float16:
    print("float16")
    input_data = (np.float16(input_data) - args.input_mean) /args.input_std
print("----2test")
stai_model.set_input(0, input_data)
print("----2")
start = timer()
stai_model.run()
end = timer()
It looks like my input data is of uint8 type after the resize. Do I need to convert it?
root@ATK-DLMP257:/opt/ui/src/apps/resource/x-linux-ai/object-detection# python3 testpy.py -m /home/root/best_integer_quant.nb -i /home/root/hander1_1.jpg -l /home/root/best.txt
Loading dynamically: /usr/lib/libstai_mpu_ovx.so.6
[OVX]: Loading nbg model
**Input node: 0 -Input_name: -Input_dims:4 - input_type:float16 -Input_shape:(1, 640, 640, 3)
**Output node: 0 -Output_name: -Output_dims:3 - Output_type:float16 -Output_shape:(1, 11, 8400)
640
640
----1
Dtype after resize: uint8
Shape after resize: (1, 640, 640, 3)
----1test
float16
----2test
Segmentation fault (core dumped)
root@ATK-DLMP257:/opt/ui/src/apps/resource/x-linux-ai/object-detection#
Best regards
Charles Fan
2025-10-27 6:17 AM - edited 2025-10-27 6:17 AM
Our expert took a look and ran tests with your zip. It seems that you need to convert the input to float32 even if the model input is float16, because float16 input is not well supported by the stedgeai core.
So something like this:
# Reading input image
input_width = input_tensor_shape[1]
input_height = input_tensor_shape[2]
input_image = Image.open(args.image).resize((input_width,input_height))
input_data = np.expand_dims(input_image, axis=0)
if input_tensor_dtype == np.float32:
    print("input data float")
    input_data = (np.float32(input_data) - args.input_mean) /args.input_std
if input_tensor_dtype == np.float16:
    print("input data float")
    input_data = (np.float32(input_data) - args.input_mean) /args.input_std
img_array_after = np.array(input_data)
print("Dtype after resize: ", img_array_after.dtype)
print("Shape after resize: ", img_array_after.shape)
stai_model.set_input(0, input_data)
start = timer()
stai_model.run()
end = timer()
And this is the output that you should get:
root@stm32mp2-e3-c3-c9:~/test_mobilenet_alexis/yolo# python3 test.py -i hander1_1.jpg -m best_integer_quant.nb -l best.txt
Loading dynamically: /usr/lib/libstai_mpu_ovx.so.6
[OVX]: Loading nbg model
**Input node: 0 -Input_name: -Input_dims:4 - input_type:float16 -Input_shape:(1, 640, 640, 3)
**Output node: 0 -Output_name: -Output_dims:3 - Output_type:float16 -Output_shape:(1, 11, 8400)
input data float
Dtype after resize: float32
Shape after resize: (1, 640, 640, 3)
Inference time: 108.06444752961397 ms
Have a good day,
Julian
2025-10-28 11:22 PM - edited 2025-10-29 2:00 AM
Hi, thanks for your reply.
I was able to run inference using your code, but I still do not get any usable results. My model output is 4 (box position) + 7 (class confidence) values per candidate, but I could not find any item with a confidence greater than 0 (a quick check of the raw .nb output is sketched after the snippet below), which matches the result I got when verifying with C++ code.
However, I can run inference with the best_integer_quant.tflite file on Ubuntu without any problems.
The attachment is the inference result.
from ultralytics import YOLO
tflite_model = YOLO("/home/alientek/best_saved_model/best_integer_quant.tflite")
results = tflite_model("/home/alientek/hander4_Left_379.jpg")
plotted_img = results[0].plot()
from PIL import Image
im = Image.fromarray(plotted_img)
im.show();
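As a quick sanity check on the .nb side, the maximum class confidence can be inspected with something like this sketch (assuming the (1, 11, 8400) output layout shown earlier, rows 0-3 being box coordinates and rows 4-10 the 7 class scores):
output = np.squeeze(stai_model.get_output(index=0))  # -> (11, 8400)
class_scores = output[4:, :]                         # 7 class confidences per candidate box
print("Highest class confidence in .nb output:", float(class_scores.max()))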
So, could there be an error in the conversion from best_integer_quant.tflite to best_integer_quant.nb? I used this command:
./stedgeai generate -m /home/alientek/best_saved_model/best_integer_quant.tflite --target stm32mp25
Thanks for your help.
Charles Fan
2025-10-29 8:25 AM
Could you share your tflite models in a .zip?
Have a good day,
Julian
2025-10-29 5:22 PM
Hi, Thanks for your reply!
Of course, no problem!
I first use the following script to convert the .pt file into a TFLite model. Attached are the files before and after the conversion and the output log.
from onnxruntime.quantization import quantize_dynamic, QuantType, quantize_static
import onnx
from ultralytics import YOLO
from onnxruntime.quantization import quantize_static, QuantType, CalibrationMethod, CalibrationDataReader
if __name__ == '__main__':
    model = YOLO('/home/alientek/best.pt')  # replace with your model path
    # 1. First export the model
    model.export(format='tflite', imgsz=640, int8=True)
    # 2. Load and check the model
    # onnx_model = onnx.load('runs/detect/train12/weights/best.onnx')
    # onnx.checker.check_model(onnx_model)
    # 3. Perform dynamic quantization
    # quantize_dynamic(
    #     'runs/detect/train12/weights/best.onnx',
    #     'runs/detect/train12/weights/best_int8.onnx',
    #     weight_type=QuantType.QUInt8,
    #     # optimize_model=True
    # )
    # quantize_static(
    #     'runs/detect/train6/weights/yolo11n.onnx',
    #     'runs/detect/train6/weights/yolo11n_int8.onnx',
    #     weight_type=QuantType.QUInt8,
    #     # optimize_model=True
    # )
    print("INT8 quantization complete!")
Thanks for your help!
Charles Fan
2025-10-30 3:53 AM
Using the saved_model.pb and this quantization script:
import tensorflow as tf
import numpy as np
def representative_dataset():
    for _ in range(10):
        data = np.random.rand(1, 640, 640, 3)
        yield [data.astype(np.float32)]
# Convert the model
converter = tf.lite.TFLiteConverter.from_saved_model("./saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8 # or tf.int8
converter.inference_output_type = tf.float32 # or tf.int8
converter.representative_dataset = representative_dataset
converter._experimental_disable_per_channel = True
tflite_model = converter.convert()
# Save the model
with open("model.tflite", 'wb') as f:
    f.write(tflite_model)
then converting it to model.nb with the ST Edge AI tool (stedgeai generate --target stm32mp25),
and testing on the board with this script:
from stai_mpu import stai_mpu_network
from numpy.typing import NDArray
from typing import Any, List
from pathlib import Path
from PIL import Image
from argparse import ArgumentParser
from timeit import default_timer as timer
import cv2 as cv
import numpy as np
import time
def intersection(rect1, rect2):
    """
    This method returns the intersection area of two rectangles
    """
    rect1_x1,rect1_y1,rect1_x2,rect1_y2 = rect1[:4]
    rect2_x1,rect2_y1,rect2_x2,rect2_y2 = rect2[:4]
    x1 = max(rect1_x1,rect2_x1)
    y1 = max(rect1_y1,rect2_y1)
    x2 = min(rect1_x2,rect2_x2)
    y2 = min(rect1_y2,rect2_y2)
    # Clamp to 0 so non-overlapping boxes do not produce a spurious positive area
    return max(0, x2-x1)*max(0, y2-y1)
def union(rect1,rect2):
    """
    This method returns the union area of two rectangles
    """
    rect1_x1,rect1_y1,rect1_x2,rect1_y2 = rect1[:4]
    rect2_x1,rect2_y1,rect2_x2,rect2_y2 = rect2[:4]
    rect1_area = (rect1_x2-rect1_x1)*(rect1_y2-rect1_y1)
    rect2_area = (rect2_x2-rect2_x1)*(rect2_y2-rect2_y1)
    return rect1_area + rect2_area - intersection(rect1,rect2)
def iou(rect1,rect2):
    """
    This method computes the IoU of two rectangles
    """
    return intersection(rect1,rect2)/union(rect1,rect2)
def get_results(stai_mpu_model, threshold, iou_threshold):
    # Lists to hold respective values while unwrapping.
    base_objects_list = []
    final_dets = []
    # Output rows (0-3: box coordinates, 4-10: class confidences, 7 classes here)
    output = stai_mpu_model.get_output(index=0)
    output = np.transpose(np.squeeze(output))
    #output = np.squeeze(output)
    print("output shape: ", output.shape)
    # Split output -> [:, 0:4]: box coordinates, [:, 4:]: class confidences
    confidence_level = output[:, 4:] # Shape: (8400, 7)
    print("confidence shape: ", confidence_level.shape)
    print(np.max(confidence_level, axis=0))
    print(np.max(confidence_level, axis=1))
    indices = np.where(confidence_level > threshold)[0]
    print(indices)
    filtered_output = output[indices]
    print(filtered_output.shape)
    for i in range(filtered_output.shape[0]):
        x_center, y_center, width, height = filtered_output[i][:4]
        left = (x_center - width/2)
        top = (y_center - height/2)
        right = (x_center + width/2)
        bottom = (y_center + height/2)
        score = np.max(filtered_output[i][4:]) # filtered_output[i][4]
        class_id = 0
        base_objects_list.append([left, top, right, bottom, score, class_id])
    # Do NMS
    base_objects_list.sort(key=lambda x: x[4], reverse=True)
    while len(base_objects_list)>0:
        final_dets.append(base_objects_list[0])
        base_objects_list = [objects for objects in base_objects_list if iou(objects,base_objects_list[0]) < iou_threshold]
    return final_dets
def load_labels(filename):
    with open(filename, 'r') as f:
        return [line.strip() for line in f.readlines()]
if __name__ == '__main__':
    parser = ArgumentParser()
    parser.add_argument('-i','--image', help='image to be classified.')
    parser.add_argument('-m','--model_file',help='model to be executed.')
    parser.add_argument('-l','--label_file', help='name of labels file.')
    parser.add_argument('--input_mean', default=127.5, help='input_mean')
    parser.add_argument('--input_std', default=127.5,help='input stddev')
    args = parser.parse_args()
    stai_model = stai_mpu_network(model_path=args.model_file, use_hw_acceleration=True)
    # Read input tensor information
    num_inputs = stai_model.get_num_inputs()
    input_tensor_infos = stai_model.get_input_infos()
    for i in range(0, num_inputs):
        input_tensor_shape = input_tensor_infos[i].get_shape()
        input_tensor_name = input_tensor_infos[i].get_name()
        input_tensor_rank = input_tensor_infos[i].get_rank()
        input_tensor_dtype = input_tensor_infos[i].get_dtype()
        print("**Input node: {} -Input_name:{} -Input_dims:{} - input_type:{} -Input_shape:{}".format(i, input_tensor_name,
                                                                                                      input_tensor_rank,
                                                                                                      input_tensor_dtype,
                                                                                                      input_tensor_shape))
        if input_tensor_infos[i].get_qtype() == "staticAffine":
            # Reading the input scale and zero point variables
            input_tensor_scale = input_tensor_infos[i].get_scale()
            input_tensor_zp = input_tensor_infos[i].get_zero_point()
        if input_tensor_infos[i].get_qtype() == "dynamicFixedPoint":
            # Reading the dynamic fixed point position
            input_tensor_dfp_pos = input_tensor_infos[i].get_fixed_point_pos()
    # Read output tensor information
    num_outputs = stai_model.get_num_outputs()
    output_tensor_infos = stai_model.get_output_infos()
    for i in range(0, num_outputs):
        output_tensor_shape = output_tensor_infos[i].get_shape()
        output_tensor_name = output_tensor_infos[i].get_name()
        output_tensor_rank = output_tensor_infos[i].get_rank()
        output_tensor_dtype = output_tensor_infos[i].get_dtype()
        print("**Output node: {} -Output_name:{} -Output_dims:{} - Output_type:{} -Output_shape:{}".format(i, output_tensor_name,
                                                                                                           output_tensor_rank,
                                                                                                           output_tensor_dtype,
                                                                                                           output_tensor_shape))
        if output_tensor_infos[i].get_qtype() == "staticAffine":
            # Reading the output scale and zero point variables
            output_tensor_scale = output_tensor_infos[i].get_scale()
            output_tensor_zp = output_tensor_infos[i].get_zero_point()
        if output_tensor_infos[i].get_qtype() == "dynamicFixedPoint":
            # Reading the dynamic fixed point position
            output_tensor_dfp_pos = output_tensor_infos[i].get_fixed_point_pos()
    # Reading input image
    input_width = input_tensor_shape[1]
    print(input_width)
    input_height = input_tensor_shape[2]
    print(input_height)
    input_image = Image.open(args.image).resize((input_width,input_height))
    input_data = np.expand_dims(input_image, axis=0)
    if input_tensor_dtype == np.float32:
        input_data = (np.float32(input_data) - args.input_mean) /args.input_std
    print("----1")
    img_array_after = np.array(input_data)
    print("Dtype after resize: ", img_array_after.dtype)
    print("Shape after resize: ", img_array_after.shape)
    print("----1test")
    if input_tensor_dtype == np.float32:
        print("float32")
    if input_tensor_dtype == np.float16:
        print("float16")
        #input_data = (np.float32(input_data) - args.input_mean) /args.input_std
        input_data = np.float32(input_data)
    print("----2test")
    stai_model.set_input(0, input_data)
    print("----2")
    start = timer()
    stai_model.run()
    end = timer()
    print("Inference time: ", (end - start) *1000, "ms")
    final_dets = get_results(stai_model, 0.5, 0.5)
    print(final_dets)
We get this output:
Loading dynamically: /usr/lib/libstai_mpu_ovx.so.6
[OVX]: Loading nbg model
**Input node: 0 -Input_name: -Input_dims:4 - input_type:uint8 -Input_shape:(1, 640, 640, 3)
**Output node: 0 -Output_name: -Output_dims:3 - Output_type:float16 -Output_shape:(1, 11, 8400)
640
640
----1
Dtype after resize: uint8
Shape after resize: (1, 640, 640, 3)
----1test
----2test
----2
Inference time: 118.66035591810942 ms
output shape: (8400, 11)
confidence shape: (8400, 7)
[0.75878906 0.01785278 0.54003906 0. 0. 0.0044632
0. ]
[0. 0. 0. ... 0. 0. 0.]
[8250 8251 8252 8270 8272 8291]
(6, 11)
[[0.3995361328125, 0.265625, 0.7967529296875, 1.001953125, 0.75878906, 0]]
Have a good day,
Julian
2025-10-31 12:56 AM
Thanks for your help! It's running.