Error When Converting TensorFlow Whisper Encoder to .nb Using ST Edge AI Tool
2025-02-28 9:39 PM (edited 2025-02-28 9:40 PM)
I am a new user of the STM32MP257F-EV1 board and am not very familiar with the ST Edge AI tool. I am currently trying to extract the TensorFlow Whisper encoder, convert it to an INT8 .tflite model using post-training quantization (PTQ), and then use the ST Edge AI tool to convert it into the .nb format for NPU acceleration.
I followed this procedure to perform the quantization:
import numpy as np
import tensorflow as tf

# `model` is assumed to be the TensorFlow Whisper model loaded beforehand,
# e.g. Hugging Face's TFWhisperModel.from_pretrained(...)
config = model.get_config()
encoder = model.get_encoder()

# Define a Keras model that takes input features and outputs encoder embeddings
input_features = tf.keras.Input(
    shape=(config['num_mel_bins'], 2 * config['max_source_positions']),
    dtype=tf.float32)
encoder_output = encoder(input_features)
encoder_model = tf.keras.Model(inputs=input_features, outputs=encoder_output)
encoder_model.save("whisper_encoder_saved_model", save_format="tf")

def representative_data_gen():
    # 10 random calibration samples shaped like the encoder input
    for _ in range(10):
        data = np.random.normal(size=(1, config['num_mel_bins'],
                                      2 * config['max_source_positions'])).astype(np.float32)
        yield [data]

def convert_and_quantize_to_tflite(model_path, output_tflite_path, representative_data_gen):
    model = tf.keras.models.load_model(model_path)
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_data_gen
    converter._experimental_disable_per_channel = True
    converter._experimental_new_quantizer = False
    # Ensure 8-bit asymmetric quantization
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    # Keep input and output in float32 for compatibility
    converter.inference_input_type = tf.float32
    converter.inference_output_type = tf.float32
    # Convert and save the TFLite model
    tflite_model = converter.convert()
    with open(output_tflite_path, "wb") as f:
        f.write(tflite_model)
    print(f"{output_tflite_path} saved successfully.")

convert_and_quantize_to_tflite("whisper_encoder_saved_model",
                               "whisper_encoder_int8.tflite",
                               representative_data_gen)
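As an aside, the representative dataset above is random noise, so the calibration ranges it yields may not reflect real speech. A minimal sketch of calibrating on actual log-mel features instead, assuming Hugging Face's WhisperFeatureExtractor and a hypothetical list audio_clips of 16 kHz waveforms (neither appears in my script above):

import numpy as np
from transformers import WhisperFeatureExtractor

feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-base")

def representative_data_gen():
    # audio_clips is a hypothetical list of 1-D float32 waveforms sampled at 16 kHz
    for clip in audio_clips[:10]:
        # input_features has shape (1, num_mel_bins, 2 * max_source_positions)
        features = feature_extractor(clip, sampling_rate=16000, return_tensors="np")
        yield [features.input_features.astype(np.float32)]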
After generating the whisper_encoder_int8.tflite model, I used the following command to convert it to .nb format:
However, I encountered the following error:
I am unsure what is causing this error. Could it be an issue with how I generated the .tflite model, maybe because the encoder has 4 layers? Or is there a limitation with the ST Edge AI tool regarding tensor shapes or dimensions?
I would appreciate any insights or guidance on resolving this issue.
Thank you!
- Labels: ST Edge AI Core, X-LINUX-AI
2025-03-04 6:46 AM
Hello @Justin_wu,
At first glance, the issue could be related to shape mismatches or to dynamic shapes that the tool cannot interpret.
It could also be due to post-processing layers being removed by ST Edge AI, leading to errors, or to certain layers of this model not being supported.
Additionally, it seems that the model quantization is done per-channel rather than per-tensor, which can significantly hurt performance by causing the model to run on the GPU instead of the NPU.
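To see which operators the quantized model actually contains (and cross-check them against the operations supported on the target), TensorFlow ships a TFLite analyzer; a minimal sketch, using the filename from your post:

import tensorflow as tf

# Prints each subgraph and the operator used by every node in the .tflite file
tf.lite.experimental.Analyzer.analyze(model_path="whisper_encoder_int8.tflite")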
I will try to replicate the issue to find where it comes from and will update you.
Have a good day,
Julian
In order to give better visibility on the answered topics, please click on 'Accept as Solution' on the reply which solved your issue or answered your question.
2025-03-05 8:46 PM
Hi @Julian E. ,
Thank you very much for your assistance; I truly appreciate your support. I have a quick question regarding the quantization process: could you please clarify how you determined that the quantization is performed per-channel? My understanding is that setting converter._experimental_disable_per_channel = True makes the converter perform per-tensor quantization. Is that correct?
Thank you again for your help.
Best regards,
Wu
2025-03-07 1:26 AM
Hello @Justin_wu,
You are correct, I misread.
Additionally, you can check it in Netron, for example:
- If scale and zero_point are scalars (empty shape []), it's per-tensor quantization.
- If scale and zero_point have a shape like [C], it's per-channel quantization.
For example (Netron screenshot): on the left is per-channel, on the right is per-tensor.
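Besides Netron, the same check can be done programmatically with the TFLite interpreter; a minimal sketch, assuming the filename from earlier in the thread:

import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="whisper_encoder_int8.tflite")
interpreter.allocate_tensors()
for detail in interpreter.get_tensor_details():
    scales = detail["quantization_parameters"]["scales"]
    # More than one scale on a tensor means it is quantized per-channel
    if scales.size > 1:
        print(f"per-channel: {detail['name']} ({scales.size} scales)")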
I plan to replicate your test, but for the moment I am blocked by my proxy...
I will update you if I get anything useful.
Have a good day,
Julian
In order to give better visibility on the answered topics, please click on 'Accept as Solution' on the reply which solved your issue or answered your question.