Issue Converting tf.matmul Model to .tflite for STM32MP257F-EV1 NPU Acceleration

Justin_wu
Associate II

Hello everyone,

I am trying to convert a tf.matmul operation into a .tflite model and deploy it on the STM32MP257F-EV1 NPU for acceleration. My inference inputs have shapes (1, 1, 384) and (1, 51865, 384). Below is my code:

import tensorflow as tf
import numpy as np

hidden_states = tf.keras.Input(shape=(1, 384), dtype=tf.float32)
output_embeddings = tf.keras.Input(shape=(51865, 384), dtype=tf.float32)
output = tf.matmul(hidden_states, output_embeddings, transpose_b=True)

# Create a new standalone model
matmul_model = tf.keras.Model(inputs=[hidden_states, output_embeddings], outputs=output)
matmul_model.summary()

# Define the inputs
hidden_states = tf.random.normal((1, 1, 384), dtype=tf.float32) * 3
output_embeddings = tf.random.normal((1, 51865, 384), dtype=tf.float32) * 3

# Call the model with separate arguments
output = matmul_model([hidden_states, output_embeddings])
print(output.shape)

matmul_model.save("matmul_saved_model", save_format="tf")

# Load the saved encoder model
matmul_model = tf.keras.models.load_model("matmul_saved_model")

# Function to generate representative dataset
def representative_data_gen():
    for _ in range(10):  # 10 samples for calibration
        hidden_states = np.random.normal(size=(1, 1, 384)).astype(np.float32)
        output_embeddings = np.random.normal(size=(1, 51865, 384)).astype(np.float32)
        yield [hidden_states, output_embeddings]

# Function to convert and quantize models to .tflite
def convert_and_quantize_to_tflite(model_path, output_tflite_path, representative_data_gen):
    model = tf.keras.models.load_model(model_path)
    converter = tf.lite.TFLiteConverter.from_keras_model(model)

    # Enable post-training quantization
    converter.optimizations = [tf.lite.Optimize.DEFAULT]

    # Provide the representative dataset for proper scaling
    converter.representative_dataset = representative_data_gen
    converter._experimental_disable_per_channel = True
    converter._experimental_new_quantizer = False

    # Ensure we use 8-bit asymmetric quantization
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

    # Keep input and output in float32 for compatibility
    converter.inference_input_type = tf.float32
    converter.inference_output_type = tf.float32

    # Convert and save the TFLite model
    tflite_model = converter.convert()
    with open(output_tflite_path, "wb") as f:
        f.write(tflite_model)

    print(f"{output_tflite_path} saved successfully.")

# Convert the model with quantization
convert_and_quantize_to_tflite("matmul_saved_model", "matmul_int8.tflite", representative_data_gen)
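As a sanity check, the converted file's inputs can be inspected with the standard tf.lite.Interpreter API (a minimal sketch, assuming the matmul_int8.tflite produced by the script above); it should list both input tensors:

import tensorflow as tf

# Minimal sketch: inspect the converted model's I/O details to confirm
# it still exposes two input tensors before handing it to the ST Edge AI tool.
interpreter = tf.lite.Interpreter(model_path="matmul_int8.tflite")
interpreter.allocate_tensors()

for detail in interpreter.get_input_details():
    print("input:", detail["name"], detail["shape"], detail["dtype"])
for detail in interpreter.get_output_details():
    print("output:", detail["name"], detail["shape"], detail["dtype"])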

Following the ST Edge AI tool guide, I used the command:
./stedgeai generate --target stm32mp25 -m matmul_int8.tflite --input-data-type float32 --output-data-type float32

However, I encountered the following error:
ST Edge AI Core v2.0.0-20049
PASS: 0%| | 0/2 [00:00<?, ?it/s]E 17:17:28 Acuity need 2 input files, but got 1

INTERNAL ERROR: ('Acuity need 2 input files, but got 1', None)

Does anyone know what might be causing this issue?
Is it possible to deploy a model with multiple inputs on the NPU?
Or am I missing something in my conversion process?

Any insights or suggestions would be greatly appreciated!
Thank you in advance for your help!
