Cannot optimize and quantize a simple MNIST Keras model on ST Developer Cloud AI

SM19X · 2025-08-16

I get the errors below when I try to quantize my model.

The model and the npz file used for quantization are attached.

Quantization to int8 shows the following errors:

Executing with: {'model': '/tmp/quantization-service/280b5700-3751-4fba-a647-f57a80bf0b73/stm32_mnist.keras', 'data': None, 'input_type': tf.int8, 'output_type': tf.int8, 'optimization': <Optimize.DEFAULT: 'DEFAULT'>, 'output': '/tmp/quantization-service/280b5700-3751-4fba-a647-f57a80bf0b73', 'disable_per_channel': False}
Only h5 file is supported

2025-08-16 21:27:05.333202: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-08-16 21:27:05.333800: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2025-08-16 21:27:05.336349: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2025-08-16 21:27:05.344139: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1755379625.357383 30 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1755379625.361184 30 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-08-16 21:27:05.374756: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.

Quantization of the .h5 model works only when no dataset (npz) is supplied.

When I supply the npz dataset, quantization fails without any error message.

Please help.

What should be the exact contents of the npz file for quantization?

What should be the model format for best compatibility?

Julian E. · 2025-08-18

Hello @SM19X,

The content of the npz file is explained in the dev cloud doc (the (i) icon to the right of the npz import).

The example code with random x data is as follows:

from tensorflow.keras.preprocessing.image import ImageDataGenerator
import numpy as np

nb_test_files = 10

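# Rescale pixel values from [0, 255] to [0, 1]
test_data_gen = ImageDataGenerator(rescale=1. / 255)
# './' must contain one subdirectory of images per class.
# color_mode defaults to 'rgb'; pass color_mode='grayscale' if the model expects one channel.
test_generator = test_data_gen.flow_from_directory('./',
                                                   target_size=(28, 28),
                                                   batch_size=nb_test_files,
                                                   class_mode='categorical')
# Take the first batch: images and their one-hot labels
test_data = next(test_generator)
x_test, y_test = test_data


# Save the inputs and labels to a single npz file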
np.savez("mydata.npz",x_test=x_test,y_test=y_test)
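
If no image directory is at hand, a file with the same two keys can be built directly from arrays. The following is only a sketch: the helper name `make_calibration_npz`, the random placeholder data, and the `(N, 28, 28, 1)` input shape are assumptions for illustration and should be adapted to the model's actual input shape. Real calibration should use representative samples rather than random values.

```python
import numpy as np


def make_calibration_npz(path, num_samples=10, num_classes=10, seed=0):
    """Write placeholder MNIST-shaped data to `path` (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Inputs: float32 in [0, 1), assumed layout (N, 28, 28, 1).
    x_test = rng.random((num_samples, 28, 28, 1), dtype=np.float32)
    # Labels: one-hot float32, matching what class_mode='categorical' yields.
    labels = rng.integers(0, num_classes, size=num_samples)
    y_test = np.eye(num_classes, dtype=np.float32)[labels]
    # Same keys as in the example above: x_test and y_test.
    np.savez(path, x_test=x_test, y_test=y_test)
    return x_test, y_test
```

Calling `make_calibration_npz("mydata.npz")` writes a file with the keys `x_test` and `y_test`.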

I tried opening your npz with np.load and got an error message saying it is empty. Maybe something went wrong.
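
The check described above can be reproduced locally before uploading. This is a minimal sketch; `describe_npz` is a hypothetical helper name, and it only reports what `np.load` finds in the file.

```python
import numpy as np


def describe_npz(path):
    """Return {key: (shape, dtype)} for every array stored in an npz file."""
    with np.load(path) as data:
        summary = {
            key: (data[key].shape, str(data[key].dtype)) for key in data.files
        }
    # An archive without any array is of no use for calibration.
    if not summary:
        raise ValueError(f"{path} contains no arrays")
    return summary
```

For a file written by the example code, the result should list `x_test` and `y_test` with the number of samples as the first dimension of each shape.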

I also could not run the quantization in the dev cloud, even without data.

You can try to regenerate an npz file with x data and y labels and try again in the dev cloud.

Please tell me if this works on your side.

In any case, you can also do it manually.

As of today, the quantization provided by ST does nothing special, so you can reproduce it locally with the TensorFlow Lite converter.

For example, the following code should help you:

import tensorflow as tf
import numpy as np

# 1. Load your Keras model
model = tf.keras.models.load_model("stm32_mnist.h5")

# 2. Load your npz data
data = np.load("mydata.npz")
x_data = data["x_test"]   # replace with the actual key in your npz file
y_data = data["y_test"]   # labels; not used for calibration

# 3. Define representative dataset generator for calibration
def representative_dataset():
    for i in range(len(x_data)):
        # yield one sample at a time, in a batch of size 1
        yield [x_data[i:i+1].astype(np.float32)]

# 4. Convert to TFLite with quantization
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset

# Optionally enforce full integer quantization (if hardware requires it)
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

# 5. Convert model
tflite_quant_model = converter.convert()

# 6. Save to file
with open("model_quant.tflite", "wb") as f:
    f.write(tflite_quant_model)

print("Quantized TFLite model saved as model_quant.tflite")

(the npz used was created with the first piece of code)
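
Because the script above sets `converter.inference_input_type = tf.int8`, the converted model expects int8 inputs rather than floats. TensorFlow Lite maps between the two with `real = scale * (q - zero_point)`. The sketch below shows that arithmetic with NumPy only. The helper names are hypothetical, and the `scale` and `zero_point` in the usage note are illustrative values; the real ones can be read from the `quantization` entry of `tf.lite.Interpreter.get_input_details()` for the converted model.

```python
import numpy as np


def quantize_int8(x, scale, zero_point):
    """Map float values to int8 using q = round(x / scale) + zero_point."""
    q = np.round(np.asarray(x, dtype=np.float32) / scale) + zero_point
    # Saturate to the int8 range before casting.
    return np.clip(q, -128, 127).astype(np.int8)


def dequantize_int8(q, scale, zero_point):
    """Map int8 values back to floats using real = scale * (q - zero_point)."""
    return (np.asarray(q, dtype=np.float32) - zero_point) * scale
```

With inputs scaled to [0, 1], a plausible pair would be `scale = 1 / 255` and `zero_point = -128`, which maps 0.0 to -128 and 1.0 to 127. The same formula applies in reverse to the int8 output tensor, using the values reported by `get_output_details()`.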

Have a good day,

Julian
