ST Edge AI Dev Cloud Quantization not working

BM11
Associate

Hi,

I've been using ST Edge AI Dev Cloud to benchmark my human activity recognition models on some ST platforms, and I would like to do the same for the STM32N6570-DK. However, that board only accepts quantized models, and the quantization option throws an error:

Executing with: {'model': '/tmp/quantization-service/0f29903b-6961-4c97-bde9-83219e06f56f/gmp_wl_24_human_activity_recognition_WISDM.h5', 'data': None, 'input_type': tf.float32, 'output_type': tf.int8, 'optimization': <Optimize.DEFAULT: 'DEFAULT'>, 'output': '/tmp/quantization-service/0f29903b-6961-4c97-bde9-83219e06f56f', 'disable_per_channel': False}
Invalid value for argument `reduction`. Expected one of {'mean', None, 'sum', 'none', 'sum_over_batch_size', 'mean_with_sample_weight'}. Received: reduction=auto

2025-07-31 09:19:58.238638: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-07-31 09:19:58.272098: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
/usr/local/lib/python3.9/site-packages/keras/src/optimizers/base_optimizer.py:86: UserWarning: Argument `decay` is no longer supported and will be ignored.
warnings.warn(

I have tried this with both my custom models and the ones built by ST, but none seem to work.

An important note: quantization was working about two months ago, but then it suddenly stopped.

Is there a fix for this?

Thanks in advance,

Best regards.

 
 
hamitiya
ST Employee

Hello @BM11 

The ST Edge AI Developer Cloud quantization service now uses tensorflow-cpu 2.18.0 (previously 2.15.0) in order to stay aligned with the latest version embedded in ST Edge AI Core.

It seems that some regressions occur depending on the TensorFlow version.

I am investigating the issue.

 

It is possible that we will have to add a new argument to switch to legacy Keras (see What's new in TensorFlow 2.16 — The TensorFlow Blog and https://github.com/tensorflow/tensorflow/issues/72388).
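For context, the `reduction=auto` failure reported above is characteristic of loading a Keras 2 model under Keras 3, which became the default `tf.keras` in TensorFlow 2.16: Keras 3 no longer accepts the old `reduction="auto"` loss setting that Keras 2 serialized into .h5 files. A minimal sketch of the legacy switch, assuming the tf-keras package is installed (the model path is taken from the log above):

```python
# Sketch of the legacy-Keras mechanism (assumes the tf-keras package is
# installed). Since TensorFlow 2.16, `tf.keras` resolves to Keras 3 by
# default, and models saved with Keras 2 semantics (e.g. losses serialized
# with reduction="auto") can fail to load. Setting TF_USE_LEGACY_KERAS=1
# *before* importing TensorFlow routes `tf.keras` back to Keras 2 (tf-keras).
import os
os.environ["TF_USE_LEGACY_KERAS"] = "1"

import tensorflow as tf

# Model path from the log above; any Keras-2-era .h5 model behaves the same.
model = tf.keras.models.load_model("gmp_wl_24_human_activity_recognition_WISDM.h5")
```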

 

 

Best regards,

Yanis


In order to give better visibility on the answered topics, please click on 'Accept as Solution' on the reply which solved your issue or answered your question.

Hello @BM11 

I've deployed an update to ST Edge AI Developer Cloud so that quantization can use either the latest version of TensorFlow or the legacy one, as described in the link I shared with you.

By ticking the "Use Keras Legacy" checkbox, you should be able to quantize your model as you did before. If you export your model with a newer version of Keras, you can try again without it.

Feel free to confirm whether it works now on your end.
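For anyone who wants to reproduce the step locally, the parameters in the "Executing with" line of the original post map onto a standard TFLiteConverter post-training quantization setup. A minimal sketch, assuming a single-input model (the exact Dev Cloud pipeline is not public, and `representative_data` here is a hypothetical calibration generator):

```python
# Minimal local sketch of int8 post-training quantization matching the
# logged parameters (optimization DEFAULT, float32 input, int8 output).
# This is an approximation, not ST's actual quantization service code.
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("gmp_wl_24_human_activity_recognition_WISDM.h5")

def representative_data():
    # With no real samples the service falls back to "fake quantization";
    # random data stands in for a proper calibration set here.
    for _ in range(100):
        yield [np.random.rand(1, *model.input_shape[1:]).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.inference_input_type = tf.float32  # 'input_type' in the log
converter.inference_output_type = tf.int8    # 'output_type' in the log

with open("model_int8.tflite", "wb") as f:
    f.write(converter.convert())
```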

 

Best regards,

Yanis


In order to give better visibility on the answered topics, please click on 'Accept as Solution' on the reply which solved your issue or answered your question.
BM11
Associate

Hello @hamitiya,

I can confirm that the human activity recognition models implemented by ST can be quantized when the "Use Keras Legacy" option is selected, as before.

Unfortunately, my custom models with LSTM and GRU layers are now throwing a different error:

Executing with: {'model': '/tmp/quantization-service/493a3e00-da79-405d-a8c3-077fd43a2a59/gru_20_best_model.h5', 'data': None, 'input_type': tf.float32, 'output_type': tf.int8, 'optimization': <Optimize.DEFAULT: 'DEFAULT'>, 'output': '/tmp/quantization-service/493a3e00-da79-405d-a8c3-077fd43a2a59', 'disable_per_channel': False}
No data specified, enabling fake quantization
Converting original model to TFLite...
Variable constant folding is failed. Please consider using enabling `experimental_enable_resource_variables` flag in the TFLite converter object. For example, converter.experimental_enable_resource_variables = True/app/quantizer/cli.py:53:1: error: 'tf.TensorListReserve' op requires element_shape to be static during TF Lite transformation pass
res = quantize_from_local_file(
^
<unknown>:0: note: loc(fused["StatefulPartitionedCall:", "StatefulPartitionedCall"]): called from
/app/quantizer/cli.py:53:1: error: failed to legalize operation 'tf.TensorListReserve' that was explicitly marked illegal
res = quantize_from_local_file(
^
<unknown>:0: note: loc(fused["StatefulPartitionedCall:", "StatefulPartitionedCall"]): called from
<unknown>:0: error: Lowering tensor list ops is failed. Please consider using Select TF ops and disabling `_experimental_lower_tensor_list_ops` flag in the TFLite converter object. For example, converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS, tf.lite.OpsSet.SELECT_TF_OPS]\n converter._experimental_lower_tensor_list_ops = False


2025-08-01 11:03:15.534537: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-08-01 11:03:15.535052: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2025-08-01 11:03:15.537778: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2025-08-01 11:03:15.545625: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1754046195.558915 8442 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1754046195.562749 8442 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-08-01 11:03:15.576329: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-08-01 11:03:16.905328: E external/local_xla/xla/stream_executor/cuda/cuda_driver.cc:152] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
W0000 00:00:1754046199.828443 8442 tf_tfl_flatbuffer_helpers.cc:365] Ignored output_format.
W0000 00:00:1754046199.828480 8442 tf_tfl_flatbuffer_helpers.cc:368] Ignored drop_control_dependency.
2025-08-01 11:03:19.828976: I tensorflow/cc/saved_model/reader.cc:83] Reading SavedModel from: /tmp/tmp_mgydisa
2025-08-01 11:03:19.833684: I tensorflow/cc/saved_model/reader.cc:52] Reading meta graph with tags { serve }
2025-08-01 11:03:19.833713: I tensorflow/cc/saved_model/reader.cc:147] Reading SavedModel debug info (if present) from: /tmp/tmp_mgydisa
I0000 00:00:1754046199.854580 8442 mlir_graph_optimization_pass.cc:401] MLIR V1 optimization pass is not enabled
2025-08-01 11:03:19.858423: I tensorflow/cc/saved_model/loader.cc:236] Restoring SavedModel bundle.
2025-08-01 11:03:19.937349: I tensorflow/cc/saved_model/loader.cc:220] Running initialization op on SavedModel bundle at path: /tmp/tmp_mgydisa
2025-08-01 11:03:19.961090: I tensorflow/cc/saved_model/loader.cc:466] SavedModel load for tags { serve }; Status: success: OK. Took 132117 microseconds.
2025-08-01 11:03:20.020399: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:268] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
loc(callsite(callsite(fused["TensorListReserve:", callsite("grurnn/gru_1/TensorArrayV2_1@__inference__wrapped_model_911"("/app/quantizer/cli.py":53:1) at callsite("/app/quantizer/quantize.py":63:1 at callsite("/app/quantizer/quantize.py":114:1 at callsite("/app/quantizer/helpers.py":47:1 at callsite("/usr/local/lib/python3.9/site-packages/tensorflow/lite/python/lite.py":1238:1 at callsite("/usr/local/lib/python3.9/site-packages/tensorflow/lite/python/lite.py":1190:1 at callsite("/usr/local/lib/python3.9/site-packages/tensorflow/lite/python/lite.py":1754:1 at callsite("/usr/local/lib/python3.9/site-packages/tensorflow/lite/python/lite.py":1732:1 at callsite("/usr/local/lib/python3.9/site-packages/tensorflow/lite/python/convert_phase.py":205:1 at callsite("/usr/local/lib/python3.9/site-packages/tensorflow/lite/python/lite.py":1655:1 at callsite("/usr/local/lib/python3.9/site-packages/tf_keras/src/engine/functional.py":459:1 at callsite("/usr/local/lib/python3.9/site-packages/tf_keras/src/engine/training.py":4026:1 at callsite("/usr/local/lib/python3.9/site-packages/tf_keras/src/engine/base_layer.py":3457:1 at callsite("/usr/local/lib/python3.9/site-packages/tf_keras/src/saving/legacy/saved_model/base_serialization.py":61:1 at callsite("/usr/local/lib/python3.9/site-packages/tf_keras/src/saving/legacy/saved_model/layer_serialization.py":79:1 at callsite("/usr/local/lib/python3.9/site-packages/tf_keras/src/saving/legacy/saved_model/layer_serialization.py":106:1 at callsite("/usr/local/lib/python3.9/site-packages/tf_keras/src/saving/legacy/saved_model/model_serialization.py":53:1 at callsite("/usr/local/lib/python3.9/site-packages/tf_keras/src/saving/legacy/saved_model/save_impl.py":237:1 at callsite("/usr/local/lib/python3.9/site-packages/tf_keras/src/saving/legacy/saving_utils.py":159:1 at callsite("/usr/local/lib/python3.9/site-packages/tf_keras/src/saving/legacy/saving_utils.py":148:1 at callsite("/usr/local/lib/python3.9/site-packages/tf_keras/src/utils/traceback_utils.py":65:1 at callsite("/usr/local/lib/python3.9/site-packages/tf_keras/src/engine/training.py":588:1 at callsite("/usr/local/lib/python3.9/site-packages/tf_keras/src/utils/traceback_utils.py":65:1 at callsite("/usr/local/lib/python3.9/site-packages/tf_keras/src/engine/base_layer.py":1142:1 at callsite("/usr/local/lib/python3.9/site-packages/tf_keras/src/utils/traceback_utils.py":96:1 at callsite("/usr/local/lib/python3.9/site-packages/tf_keras/src/engine/functional.py":514:1 at callsite("/usr/local/lib/python3.9/site-packages/tf_keras/src/engine/functional.py":671:1 at callsite("/usr/local/lib/python3.9/site-packages/tf_keras/src/layers/rnn/base_rnn.py":557:1 at callsite("/usr/local/lib/python3.9/site-packages/tf_keras/src/utils/traceback_utils.py":65:1 at callsite("/usr/local/lib/python3.9/site-packages/tf_keras/src/engine/base_layer.py":1142:1 at callsite("/usr/local/lib/python3.9/site-packages/tf_keras/src/utils/traceback_utils.py":96:1 at callsite("/usr/local/lib/python3.9/site-packages/tf_keras/src/layers/rnn/gru.py":654:1 at callsite("/usr/local/lib/python3.9/site-packages/tf_keras/src/backend.py":4990:1 at "/usr/local/lib/python3.9/site-packages/tf_keras/src/backend.py":4991:1)))))))))))))))))))))))))))))))))] at fused["StatefulPartitionedCall:", callsite("StatefulPartitionedCall@__inference_signature_wrapper_2448"("/app/quantizer/cli.py":53:1) at callsite("/app/quantizer/quantize.py":63:1 at callsite("/app/quantizer/quantize.py":114:1 at callsite("/app/quantizer/helpers.py":47:1 at 
callsite("/usr/local/lib/python3.9/site-packages/tensorflow/lite/python/lite.py":1238:1 at callsite("/usr/local/lib/python3.9/site-packages/tensorflow/lite/python/lite.py":1190:1 at callsite("/usr/local/lib/python3.9/site-packages/tensorflow/lite/python/lite.py":1754:1 at callsite("/usr/local/lib/python3.9/site-packages/tensorflow/lite/python/lite.py":1732:1 at callsite("/usr/local/lib/python3.9/site-packages/tensorflow/lite/python/convert_phase.py":205:1 at callsite("/usr/local/lib/python3.9/site-packages/tensorflow/lite/python/lite.py":1655:1 at "/usr/local/lib/python3.9/site-packages/tensorflow/python/saved_model/signature_serialization.py":168:1))))))))))]) at fused["StatefulPartitionedCall:", "StatefulPartitionedCall"])): error: 'tf.TensorListReserve' op requires element_shape to be static during TF Lite transformation pass
loc(callsite(callsite(fused["TensorListReserve:", callsite("grurnn/gru_1/TensorArrayV2_1@__inference__wrapped_model_911"("/app/quantizer/cli.py":53:1) at callsite("/app/quantizer/quantize.py":63:1 at callsite("/app/quantizer/quantize.py":114:1 at callsite("/app/quantizer/helpers.py":47:1 at callsite("/usr/local/lib/python3.9/site-packages/tensorflow/lite/python/lite.py":1238:1 at callsite("/usr/local/lib/python3.9/site-packages/tensorflow/lite/python/lite.py":1190:1 at callsite("/usr/local/lib/python3.9/site-packages/tensorflow/lite/python/lite.py":1754:1 at callsite("/usr/local/lib/python3.9/site-packages/tensorflow/lite/python/lite.py":1732:1 at callsite("/usr/local/lib/python3.9/site-packages/tensorflow/lite/python/convert_phase.py":205:1 at callsite("/usr/local/lib/python3.9/site-packages/tensorflow/lite/python/lite.py":1655:1 at callsite("/usr/local/lib/python3.9/site-packages/tf_keras/src/engine/functional.py":459:1 at callsite("/usr/local/lib/python3.9/site-packages/tf_keras/src/engine/training.py":4026:1 at callsite("/usr/local/lib/python3.9/site-packages/tf_keras/src/engine/base_layer.py":3457:1 at callsite("/usr/local/lib/python3.9/site-packages/tf_keras/src/saving/legacy/saved_model/base_serialization.py":61:1 at callsite("/usr/local/lib/python3.9/site-packages/tf_keras/src/saving/legacy/saved_model/layer_serialization.py":79:1 at callsite("/usr/local/lib/python3.9/site-packages/tf_keras/src/saving/legacy/saved_model/layer_serialization.py":106:1 at callsite("/usr/local/lib/python3.9/site-packages/tf_keras/src/saving/legacy/saved_model/model_serialization.py":53:1 at callsite("/usr/local/lib/python3.9/site-packages/tf_keras/src/saving/legacy/saved_model/save_impl.py":237:1 at callsite("/usr/local/lib/python3.9/site-packages/tf_keras/src/saving/legacy/saving_utils.py":159:1 at callsite("/usr/local/lib/python3.9/site-packages/tf_keras/src/saving/legacy/saving_utils.py":148:1 at callsite("/usr/local/lib/python3.9/site-packages/tf_keras/src/utils/traceback_utils.py":65:1 at callsite("/usr/local/lib/python3.9/site-packages/tf_keras/src/engine/training.py":588:1 at callsite("/usr/local/lib/python3.9/site-packages/tf_keras/src/utils/traceback_utils.py":65:1 at callsite("/usr/local/lib/python3.9/site-packages/tf_keras/src/engine/base_layer.py":1142:1 at callsite("/usr/local/lib/python3.9/site-packages/tf_keras/src/utils/traceback_utils.py":96:1 at callsite("/usr/local/lib/python3.9/site-packages/tf_keras/src/engine/functional.py":514:1 at callsite("/usr/local/lib/python3.9/site-packages/tf_keras/src/engine/functional.py":671:1 at callsite("/usr/local/lib/python3.9/site-packages/tf_keras/src/layers/rnn/base_rnn.py":557:1 at callsite("/usr/local/lib/python3.9/site-packages/tf_keras/src/utils/traceback_utils.py":65:1 at callsite("/usr/local/lib/python3.9/site-packages/tf_keras/src/engine/base_layer.py":1142:1 at callsite("/usr/local/lib/python3.9/site-packages/tf_keras/src/utils/traceback_utils.py":96:1 at callsite("/usr/local/lib/python3.9/site-packages/tf_keras/src/layers/rnn/gru.py":654:1 at callsite("/usr/local/lib/python3.9/site-packages/tf_keras/src/backend.py":4990:1 at "/usr/local/lib/python3.9/site-packages/tf_keras/src/backend.py":4991:1)))))))))))))))))))))))))))))))))] at fused["StatefulPartitionedCall:", callsite("StatefulPartitionedCall@__inference_signature_wrapper_2448"("/app/quantizer/cli.py":53:1) at callsite("/app/quantizer/quantize.py":63:1 at callsite("/app/quantizer/quantize.py":114:1 at callsite("/app/quantizer/helpers.py":47:1 at 
callsite("/usr/local/lib/python3.9/site-packages/tensorflow/lite/python/lite.py":1238:1 at callsite("/usr/local/lib/python3.9/site-packages/tensorflow/lite/python/lite.py":1190:1 at callsite("/usr/local/lib/python3.9/site-packages/tensorflow/lite/python/lite.py":1754:1 at callsite("/usr/local/lib/python3.9/site-packages/tensorflow/lite/python/lite.py":1732:1 at callsite("/usr/local/lib/python3.9/site-packages/tensorflow/lite/python/convert_phase.py":205:1 at callsite("/usr/local/lib/python3.9/site-packages/tensorflow/lite/python/lite.py":1655:1 at "/usr/local/lib/python3.9/site-packages/tensorflow/python/saved_model/signature_serialization.py":168:1))))))))))]) at fused["StatefulPartitionedCall:", "StatefulPartitionedCall"])): error: failed to legalize operation 'tf.TensorListReserve' that was explicitly marked illegal
error: Lowering tensor list ops is failed. Please consider using Select TF ops and disabling `_experimental_lower_tensor_list_ops` flag in the TFLite converter object. For example, converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS, tf.lite.OpsSet.SELECT_TF_OPS]\n converter._experimental_lower_tensor_list_ops = False

I am aware that TensorFlow might not have full support for quantization of LSTM and GRU layers, so I am still investigating whether this is an error on my side.
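For reference, the converter error above names one workaround explicitly, and the "requires element_shape to be static" complaint points at a second. A minimal sketch of both, assuming a local TensorFlow install (the model path comes from the log; the input shape is a hypothetical HAR window):

```python
# Two hedged workarounds for the tensor-list errors above, tried locally.
# Whether the result remains deployable on the STM32N6 NPU is a separate
# question.
import tensorflow as tf

model = tf.keras.models.load_model("gru_20_best_model.h5")  # path from the log

converter = tf.lite.TFLiteConverter.from_keras_model(model)

# Workaround 1 (quoted in the error message): keep tensor-list ops as
# Select TF ops instead of lowering them. Note that SELECT_TF_OPS pulls in
# the Flex delegate, which embedded targets generally cannot run.
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,
    tf.lite.OpsSet.SELECT_TF_OPS,
]
converter._experimental_lower_tensor_list_ops = False
tflite_model = converter.convert()

# Workaround 2 (at model-definition time): give the RNN fully static shapes
# so tf.TensorListReserve sees a static element_shape; unroll=True removes
# the tensor-list loop entirely, at the cost of a larger graph.
inputs = tf.keras.Input(shape=(24, 3), batch_size=1)  # hypothetical HAR window
outputs = tf.keras.layers.GRU(20, unroll=True)(inputs)
static_model = tf.keras.Model(inputs, outputs)
```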

Thanks again for a fast response and a quick fix,

Best regards.

@BM11,

 

Yeah... unfortunately, LSTM and GRU are poorly supported for now, and if I remember correctly, they are not supported by the NPU. This means that even if you got no error, the model would not be accelerated by the NPU for now.

 

I noted your need.

 

All I can suggest is to decompose the LSTM layer into an equivalent combination of simple layers, as sketched below.

At some point, an LSTM model will be added to the model zoo (I don't know exactly when), and that model was built by doing exactly what I just described. But it is not that simple...
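To illustrate the decomposition idea, here is a generic sketch of an LSTM unrolled into simple layers, assuming the legacy tf.keras functional API (an illustration with hypothetical dimensions, not ST's model-zoo implementation):

```python
# Generic sketch: an LSTM decomposed into Dense + elementwise ops. One shared
# Dense layer computes all four gates, and the time loop is unrolled at
# graph-build time, so no tensor-list ops are emitted. Assumes legacy
# tf.keras (Keras 2); dimensions are hypothetical.
import tensorflow as tf

TIMESTEPS, FEATURES, UNITS = 24, 3, 20

inputs = tf.keras.Input(shape=(TIMESTEPS, FEATURES), batch_size=1)
gates = tf.keras.layers.Dense(4 * UNITS)  # shared kernel for i, f, g, o

h = tf.zeros((1, UNITS))  # hidden state
c = tf.zeros((1, UNITS))  # cell state
for t in range(TIMESTEPS):
    x_t = inputs[:, t, :]
    z = gates(tf.concat([x_t, h], axis=-1))
    i, f, g, o = tf.split(z, 4, axis=-1)
    c = tf.sigmoid(f) * c + tf.sigmoid(i) * tf.tanh(g)
    h = tf.sigmoid(o) * tf.tanh(c)

model = tf.keras.Model(inputs, h)  # Dense + elementwise ops only
```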

 

Have a good day,

Julian


In order to give better visibility on the answered topics, please click on 'Accept as Solution' on the reply which solved your issue or answered your question.