
Issue Converting tf.matmul Model to .tflite for STM32MP257F-EV1 NPU Acceleration

Justin_wu
Associate II

Hello everyone,

I am trying to convert a tf.matmul operation into a .tflite model and deploy it on the STM32MP257F-EV1 NPU for acceleration. My inference inputs are (1, 1, 384) and (1, 51865, 384). Below is my code:

import tensorflow as tf
import numpy as np

hidden_states = tf.keras.Input(shape=(1, 384), dtype=tf.float32)
output_embeddings = tf.keras.Input(shape=(51865, 384), dtype=tf.float32)
output = tf.matmul(hidden_states, output_embeddings, transpose_b=True)

# Create a new standalone model
matmul_model = tf.keras.Model(inputs=[hidden_states, output_embeddings], outputs=output)
matmul_model.summary()

# Define the inputs
hidden_states = tf.random.normal((1, 1, 384), dtype=tf.float32) * 3
output_embeddings = tf.random.normal((1, 51865, 384), dtype=tf.float32) * 3

# Call the model with separate arguments
output = matmul_model([hidden_states, output_embeddings])
print(output.shape)

matmul_model.save("matmul_saved_model", save_format="tf")

# Load the saved encoder model
matmul_model = tf.keras.models.load_model("matmul_saved_model")

# Function to generate representative dataset
def representative_data_gen():
    for _ in range(10):  # 10 samples for calibration
        hidden_states = np.random.normal(size=(1, 1, 384)).astype(np.float32)
        output_embeddings = np.random.normal(size=(1, 51865, 384)).astype(np.float32)
        yield [hidden_states, output_embeddings]

# Function to convert and quantize models to .tflite
def convert_and_quantize_to_tflite(model_path, output_tflite_path, representative_data_gen):
    model = tf.keras.models.load_model(model_path)
    converter = tf.lite.TFLiteConverter.from_keras_model(model)

    # Enable post-training quantization
    converter.optimizations = [tf.lite.Optimize.DEFAULT]

    # Provide the representative dataset for proper scaling
    converter.representative_dataset = representative_data_gen
    converter._experimental_disable_per_channel = True
    converter._experimental_new_quantizer = False

    # Ensure we use 8-bit asymmetric quantization
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

    # Keep input and output in float32 for compatibility
    converter.inference_input_type = tf.float32
    converter.inference_output_type = tf.float32

    # Convert and save the TFLite model
    tflite_model = converter.convert()
    with open(output_tflite_path, "wb") as f:
        f.write(tflite_model)

    print(f"{output_tflite_path} saved successfully.")

# Convert the model with quantization
convert_and_quantize_to_tflite("matmul_saved_model", "matmul_int8.tflite", representative_data_gen)
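
For reference, the input signature of the converted file can be checked with the TFLite interpreter (just a quick sanity-check sketch, separate from the deployment flow):

import tensorflow as tf

# Load the quantized model and list its inputs; I expect two tensors,
# (1, 1, 384) and (1, 51865, 384), both float32 at the interface
interpreter = tf.lite.Interpreter(model_path="matmul_int8.tflite")
interpreter.allocate_tensors()
for detail in interpreter.get_input_details():
    print(detail["name"], detail["shape"], detail["dtype"])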

Following the ST Edge AI tool guide, I used the command:
./stedgeai generate --target stm32mp25 -m matmul_int8.tflite --input-data-type float32 --output-data-type float32

However, I encountered the following error:
ST Edge AI Core v2.0.0-20049
PASS: 0%| | 0/2 [00:00<?, ?it/s]E 17:17:28 Acuity need 2 input files, but got 1

INTERNAL ERROR: ('Acuity need 2 input files, but got 1', None)

Does anyone know what might be causing this issue?
Is it possible to deploy a model with multiple inputs on the NPU?
Or am I missing something in my conversion process?

Any insights or suggestions would be greatly appreciated!
Thank you in advance for your help!





Julian E.
ST Employee

Hello @Justin_wu,

Sorry for the late answer; I had major problems with my PC, so it took me quite some time to test your issue.

 

It seems that in your code:

hidden_states = tf.keras.Input(shape=(1, 384), dtype=tf.float32)
output_embeddings = tf.keras.Input(shape=(51865, 384), dtype=tf.float32)
output = tf.matmul(hidden_states, output_embeddings, transpose_b=True)

the transpose_b=True argument is not supported.

 

If you transpose 'output_embeddings' instead and remove 'transpose_b', it works:

hidden_states = tf.keras.Input(shape=(1, 384), dtype=tf.float32)
output_embeddings = tf.keras.Input(shape=(384, 51865), dtype=tf.float32)
output = tf.matmul(hidden_states, output_embeddings)
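
As a side note, this means the embeddings must be laid out as (384, 51865) before they are fed to the model. One way to do that is to transpose them once on the host side (a minimal sketch using NumPy with a randomly generated placeholder array):

import numpy as np

# Placeholder embeddings in the original (1, 51865, 384) layout
output_embeddings = np.random.normal(size=(1, 51865, 384)).astype(np.float32)

# Swap the last two axes once, outside the graph, so the model itself
# no longer needs transpose_b
output_embeddings_t = np.transpose(output_embeddings, (0, 2, 1))  # -> (1, 384, 51865)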

 

Have a good day,

Julian


In order to give better visibility on the answered topics, please click on 'Accept as Solution' on the reply which solved your issue or answered your question.

Hello @Justin_wu,

 

Also, we found a bug thanks to your code: please avoid naming any of your inputs "output".

It should be fixed, but for now, make sure not to use that name.

 

Have a good day,

Julian


In order to give better visibility on the answered topics, please click on 'Accept as Solution' on the reply which solved your issue or answered your question.

Hello Julian E.,

I really appreciate your reply and clarification! I followed your suggestion and changed my code to the following:

hidden_states = tf.keras.layers.Input(shape=(1, 384), dtype=tf.float32)
output_embeddings = tf.keras.layers.Input(shape=(384, 51865), dtype=tf.float32)
result = tf.matmul(hidden_states, output_embeddings)
matmul_model = tf.keras.Model(inputs=[hidden_states, output_embeddings], outputs=result)

# Define the dummy inputs
hidden_states = tf.random.normal((1, 1, 384), dtype=tf.float32)
output_embeddings = tf.random.normal((1, 384, 51865), dtype=tf.float32)
result = matmul_model([hidden_states, output_embeddings])
matmul_model.save("matmul_saved_model", save_format="tf")

def representative_data_gen():
    for _ in range(10):  # 10 samples for calibration
        hidden_states = np.random.normal(size=(1, 1, 384)).astype(np.float32)
        output_embeddings = np.random.normal(size=(1, 384, 51865)).astype(np.float32)
        yield [hidden_states, output_embeddings]

# Function to convert and quantize models to .tflite
def convert_and_quantize_to_tflite(model_path, output_tflite_path, representative_data_gen):
    model = tf.keras.models.load_model(model_path)
    converter = tf.lite.TFLiteConverter.from_keras_model(model)

    # Enable post-training quantization
    converter.optimizations = [tf.lite.Optimize.DEFAULT]

    # Provide the representative dataset for proper scaling
    converter.representative_dataset = representative_data_gen
    converter._experimental_disable_per_channel = True
    converter._experimental_new_quantizer = False

    # Ensure we use 8-bit asymmetric quantization
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

    # Keep input and output in float32 for compatibility
    converter.inference_input_type = tf.float32
    converter.inference_output_type = tf.float32

    # Convert and save the TFLite model
    tflite_model = converter.convert()
    with open(output_tflite_path, "wb") as f:
        f.write(tflite_model)
    
    print(f"{output_tflite_path} saved successfully.")

# Convert the model with quantization
convert_and_quantize_to_tflite("matmul_saved_model", "matmul_int8.tflite", representative_data_gen)
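
To make sure the problem is not the .tflite file itself, this is roughly how I would sanity-check it with the TFLite interpreter before calling the ST tool (a rough sketch; get_input_details() may list the inputs in a different order than they were declared, so matching by shape is safer):

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="matmul_int8.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

hidden_states = np.random.normal(size=(1, 1, 384)).astype(np.float32)
output_embeddings = np.random.normal(size=(1, 384, 51865)).astype(np.float32)

# Match each array to the input tensor with the same shape,
# since the interpreter may list the two inputs in either order
for detail in input_details:
    if tuple(detail["shape"]) == hidden_states.shape:
        interpreter.set_tensor(detail["index"], hidden_states)
    else:
        interpreter.set_tensor(detail["index"], output_embeddings)

interpreter.invoke()
result = interpreter.get_tensor(output_details[0]["index"])
print(result.shape)  # expected: (1, 1, 51865)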

 

But I still get the same error when I try to convert the .tflite model to .nb format with:

$ ./stedgeai generate --target stm32mp25 -m ./matmul_int8.tflite --input-data-type float32 --output-data-type float32

ST Edge AI Core v2.0.0-20049
PASS:   0%|          | 0/2 [00:00<?, ?it/s]E 14:07:53 Acuity need 2 input files, but got 1

INTERNAL ERROR: ('Acuity need 2 input files, but got 1', None)

 

So I wonder whether the way I generate the .tflite model is wrong, or whether I am using the wrong command with the ST Edge AI tool to convert to .nb format? Thanks a lot!

 

Best regards,

Justin