2025-03-20 3:16 AM
Hello everyone,
I am trying to convert a tf.matmul operation into a .tflite model and deploy it on the STM32MP257F-EV1 NPU for acceleration. My inference inputs have shapes (1, 1, 384) and (1, 51865, 384). Below is my code:
import tensorflow as tf
import numpy as np
# Keras Input shapes exclude the batch dimension
hidden_states = tf.keras.Input(shape=(1, 384), dtype=tf.float32)
output_embeddings = tf.keras.Input(shape=(51865, 384), dtype=tf.float32)
# (1, 1, 384) x (1, 384, 51865) -> (1, 1, 51865)
output = tf.matmul(hidden_states, output_embeddings, transpose_b=True)
# Create a new standalone model
matmul_model = tf.keras.Model(inputs=[hidden_states, output_embeddings], outputs=output)
matmul_model.summary()
# Define sample inputs
hidden_states = tf.random.normal((1, 1, 384), dtype=tf.float32) * 3
output_embeddings = tf.random.normal((1, 51865, 384), dtype=tf.float32) * 3
# Call the model with both inputs as a sanity check
output = matmul_model([hidden_states, output_embeddings])
print(output.shape)
matmul_model.save("matmul_saved_model", save_format="tf")
# Load the saved model back
matmul_model = tf.keras.models.load_model("matmul_saved_model")
# Function to generate representative dataset
def representative_data_gen():
    for _ in range(10):  # 10 samples for calibration
        hidden_states = np.random.normal(size=(1, 1, 384)).astype(np.float32)
        output_embeddings = np.random.normal(size=(1, 51865, 384)).astype(np.float32)
        yield [hidden_states, output_embeddings]
# Function to convert and quantize models to .tflite
def convert_and_quantize_to_tflite(model_path, output_tflite_path, representative_data_gen):
    model = tf.keras.models.load_model(model_path)
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    # Enable post-training quantization
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    # Provide the representative dataset for proper scaling
    converter.representative_dataset = representative_data_gen
    converter._experimental_disable_per_channel = True
    converter._experimental_new_quantizer = False
    # Ensure we use 8-bit asymmetric quantization
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    # Keep input and output in float32 for compatibility
    converter.inference_input_type = tf.float32
    converter.inference_output_type = tf.float32
    # Convert and save the TFLite model
    tflite_model = converter.convert()
    with open(output_tflite_path, "wb") as f:
        f.write(tflite_model)
    print(f"{output_tflite_path} saved successfully.")
# Convert the model with quantization
convert_and_quantize_to_tflite("matmul_saved_model", "matmul_int8.tflite", representative_data_gen)
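To check whether the .tflite file itself still exposes both inputs, a small sketch like the following (just the standard TFLite interpreter, not part of the ST tooling) can list the input tensors and run one inference:

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="matmul_int8.tflite")
interpreter.allocate_tensors()

# List the input tensors -- I expect two float32 inputs here
for detail in interpreter.get_input_details():
    print(detail["name"], detail["shape"], detail["dtype"])

# Feed random data of each input's declared shape; the input order in the
# .tflite file may not match the Keras order, so use the reported shapes
for detail in interpreter.get_input_details():
    shape = list(detail["shape"])
    interpreter.set_tensor(detail["index"],
                           np.random.normal(size=shape).astype(np.float32))

interpreter.invoke()
out = interpreter.get_output_details()[0]
print(interpreter.get_tensor(out["index"]).shape)  # expecting (1, 1, 51865)

If that runs cleanly, the problem would seem to be on the stedgeai side rather than in the conversion itself.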
Following the ST Edge AI tool guide, I used the command:
./stedgeai generate --target stm32mp25 -m matmul_int8.tflite --input-data-type float32 --output-data-type float32
However, I encountered the following error:
ST Edge AI Core v2.0.0-20049
PASS: 0%| | 0/2 [00:00<?, ?it/s]E 17:17:28 Acuity need 2 input files, but got 1
INTERNAL ERROR: ('Acuity need 2 input files, but got 1', None)
Does anyone know what might be causing this issue?
Is it possible to deploy a model with multiple inputs on the NPU?
Or am I missing something in my conversion process?
Any insights or suggestions would be greatly appreciated!
Thank you in advance for your help!