
Issue Converting tf.matmul Model to .tflite for STM32MP257F-EV1 NPU Acceleration

Justin_wu
Associate II

Hello everyone,

I am trying to convert a tf.matmul operation into a .tflite model and deploy it on the STM32MP257F-EV1 NPU for acceleration. My inference inputs are (1, 1, 384) and (1, 51865, 384). Below is my code:

import tensorflow as tf
import numpy as np

hidden_states = tf.keras.Input(shape=(1, 384), dtype=tf.float32)
output_embeddings = tf.keras.Input(shape=(51865, 384), dtype=tf.float32)
output = tf.matmul(hidden_states, output_embeddings, transpose_b=True)

# Create a new standalone model
matmul_model = tf.keras.Model(inputs=[hidden_states, output_embeddings], outputs=output)
matmul_model.summary()

# Define the inputs
hidden_states = tf.random.normal((1, 1, 384), dtype=tf.float32) * 3
output_embeddings = tf.random.normal((1, 51865, 384), dtype=tf.float32) * 3

# Call the model with separate arguments
output = matmul_model([hidden_states, output_embeddings])
print(output.shape)

matmul_model.save("matmul_saved_model", save_format="tf")

# Load the saved encoder model
matmul_model = tf.keras.models.load_model("matmul_saved_model")

# Function to generate representative dataset
def representative_data_gen():
    for _ in range(10):  # 10 samples for calibration
        hidden_states = np.random.normal(size=(1, 1, 384)).astype(np.float32)
        output_embeddings = np.random.normal(size=(1, 51865, 384)).astype(np.float32)
        yield [hidden_states, output_embeddings]

# Function to convert and quantize models to .tflite
def convert_and_quantize_to_tflite(model_path, output_tflite_path, representative_data_gen):
    model = tf.keras.models.load_model(model_path)
    converter = tf.lite.TFLiteConverter.from_keras_model(model)

    # Enable post-training quantization
    converter.optimizations = [tf.lite.Optimize.DEFAULT]

    # Provide the representative dataset for proper scaling
    converter.representative_dataset = representative_data_gen
    converter._experimental_disable_per_channel = True
    converter._experimental_new_quantizer = False

    # Ensure we use 8-bit asymmetric quantization
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

    # Keep input and output in float32 for compatibility
    converter.inference_input_type = tf.float32
    converter.inference_output_type = tf.float32

    # Convert and save the TFLite model
    tflite_model = converter.convert()
    with open(output_tflite_path, "wb") as f:
        f.write(tflite_model)

    print(f"{output_tflite_path} saved successfully.")

# Convert the model with quantization
convert_and_quantize_to_tflite("matmul_saved_model", "matmul_int8.tflite", representative_data_gen)
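
For reference, the input signature of the converted file can be checked with the TFLite interpreter (just a quick sanity-check sketch, separate from the deployment flow):

import tensorflow as tf

# Load the quantized model and list its inputs; I expect two tensors,
# (1, 1, 384) and (1, 51865, 384), both float32 at the interface
interpreter = tf.lite.Interpreter(model_path="matmul_int8.tflite")
interpreter.allocate_tensors()
for detail in interpreter.get_input_details():
    print(detail["name"], detail["shape"], detail["dtype"])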

Following the ST Edge AI tool guide, I used the command:
./stedgeai generate --target stm32mp25 -m matmul_int8.tflite --input-data-type float32 --output-data-type float32

However, I encountered the following error:
ST Edge AI Core v2.0.0-20049
PASS: 0%| | 0/2 [00:00<?, ?it/s]E 17:17:28 Acuity need 2 input files, but got 1

INTERNAL ERROR: ('Acuity need 2 input files, but got 1', None)

Does anyone know what might be causing this issue?
Is it possible to deploy a model with multiple inputs on the NPU?
Or am I missing something in my conversion process?

Any insights or suggestions would be greatly appreciated!
Thank you in advance for your help!





Julian E.
ST Employee

Hello @Justin_wu,

Sorry for the late answer; I had major problems with my PC, so it took me quite some time to test your issue.

 

It seems that in your code:

hidden_states = tf.keras.Input(shape=(1, 384), dtype=tf.float32)
output_embeddings = tf.keras.Input(shape=(51865, 384), dtype=tf.float32)
output = tf.matmul(hidden_states, output_embeddings, transpose_b=True)

the transpose_b=True argument is not supported.

 

If you transpose 'output_embeddings' instead and remove 'transpose_b', it works:

hidden_states = tf.keras.Input(shape=(1, 384), dtype=tf.float32)
output_embeddings = tf.keras.Input(shape=(384, 51865), dtype=tf.float32)
output = tf.matmul(hidden_states, output_embeddings)
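
As a side note, this means the embeddings must be laid out as (384, 51865) before they are fed to the model. One way to do that is to transpose them once on the host side (a minimal sketch using NumPy with a randomly generated placeholder array):

import numpy as np

# Placeholder embeddings in the original (1, 51865, 384) layout
output_embeddings = np.random.normal(size=(1, 51865, 384)).astype(np.float32)

# Swap the last two axes once, outside the graph, so the model itself
# no longer needs transpose_b
output_embeddings_t = np.transpose(output_embeddings, (0, 2, 1))  # -> (1, 384, 51865)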

 

Have a good day,

Julian


In order to give better visibility on the answered topics, please click on 'Accept as Solution' on the reply which solved your issue or answered your question.

Hello @Justin_wu,

 

Also, we found a bug thanks to your code: please avoid naming any of your inputs "output".

It should be fixed, but for now, make sure not to use that name.

 

Have a good day,

Julian


In order to give better visibility on the answered topics, please click on 'Accept as Solution' on the reply which solved your issue or answered your question.

Hello Julian E.,

I really appreciate your reply and clarification! I followed your suggestion and changed my code to the following:

hidden_states = tf.keras.layers.Input(shape=(1, 384), dtype=tf.float32)
output_embeddings = tf.keras.layers.Input(shape=(384, 51865), dtype=tf.float32)
result = tf.matmul(hidden_states, output_embeddings)
matmul_model = tf.keras.Model(inputs=[hidden_states, output_embeddings], outputs=result)

# Define the dummy inputs
hidden_states = tf.random.normal((1, 1, 384), dtype=tf.float32)
output_embeddings = tf.random.normal((1, 384, 51865), dtype=tf.float32)
result = matmul_model([hidden_states, output_embeddings])
matmul_model.save("matmul_saved_model", save_format="tf")

def representative_data_gen():
    for _ in range(10):  # 10 samples for calibration
        hidden_states = np.random.normal(size=(1, 1, 384)).astype(np.float32)
        output_embeddings = np.random.normal(size=(1, 384, 51865)).astype(np.float32)
        yield [hidden_states, output_embeddings]

# Function to convert and quantize models to .tflite
def convert_and_quantize_to_tflite(model_path, output_tflite_path, representative_data_gen):
    model = tf.keras.models.load_model(model_path)
    converter = tf.lite.TFLiteConverter.from_keras_model(model)

    # Enable post-training quantization
    converter.optimizations = [tf.lite.Optimize.DEFAULT]

    # Provide the representative dataset for proper scaling
    converter.representative_dataset = representative_data_gen
    converter._experimental_disable_per_channel = True
    converter._experimental_new_quantizer = False

    # Ensure we use 8-bit asymmetric quantization
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

    # Keep input and output in float32 for compatibility
    converter.inference_input_type = tf.float32
    converter.inference_output_type = tf.float32

    # Convert and save the TFLite model
    tflite_model = converter.convert()
    with open(output_tflite_path, "wb") as f:
        f.write(tflite_model)
    
    print(f"{output_tflite_path} saved successfully.")

# Convert the model with quantization
convert_and_quantize_to_tflite("matmul_saved_model", "matmul_int8.tflite", representative_data_gen)
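
To make sure the problem is not the .tflite file itself, this is roughly how I would sanity-check it with the TFLite interpreter before calling the ST tool (a rough sketch; get_input_details() may list the inputs in a different order than they were declared, so matching by shape is safer):

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="matmul_int8.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

hidden_states = np.random.normal(size=(1, 1, 384)).astype(np.float32)
output_embeddings = np.random.normal(size=(1, 384, 51865)).astype(np.float32)

# Match each array to the input tensor with the same shape,
# since the interpreter may list the two inputs in either order
for detail in input_details:
    if tuple(detail["shape"]) == hidden_states.shape:
        interpreter.set_tensor(detail["index"], hidden_states)
    else:
        interpreter.set_tensor(detail["index"], output_embeddings)

interpreter.invoke()
result = interpreter.get_tensor(output_details[0]["index"])
print(result.shape)  # expected: (1, 1, 51865)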

 

But I still get the same error when I try to convert the .tflite model to .nb format with:

$ ./stedgeai generate --target stm32mp25 -m ./matmul_int8.tflite --input-data-type float32 --output-data-type float32

ST Edge AI Core v2.0.0-20049
PASS:   0%|          | 0/2 [00:00<?, ?it/s]E 14:07:53 Acuity need 2 input files, but got 1

INTERNAL ERROR: ('Acuity need 2 input files, but got 1', None)

 

So I wonder whether the way I generate the .tflite model is wrong, or whether I am using the wrong command with the ST Edge AI tool to convert to .nb format? Thanks a lot!

 

Best regards,

Justin