2021-03-13 06:11 AM
I loaded my MobileNetV3 model into STM32CubeAI, but when I choose a compression ratio of 4 or 8 to reduce its size, the reported RAM and flash requirements stay the same.
I also optimized my model with TensorFlow Lite, which brought it down to ~2 MB, but when I load it into STM32CubeAI it still reports the original size of 6 MB. Why is that?
Please help.
2021-03-17 01:13 AM
Hello,
X-CUBE-AI supports compression only for Dense (fully connected) layers stored in float, and only under certain conditions that limit the drop in accuracy. Since MobileNetV3 consists almost entirely of convolution layers, selecting a compression factor of 4 or 8 has little effect on the reported RAM and flash sizes.
Regarding the TensorFlow optimization, I suppose you used TFLiteConverter with the default optimization option to generate the .tflite file:
import tensorflow as tf

# 'model' is the trained Keras model (e.g. your MobileNetV3)
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # dynamic-range ("weights only") quantization
tflite_model_quant = converter.convert()
In that case, only the weights are quantized (float32 -> int8). This reduces the size of the TFLite file for deployment on a mobile device, but it requires specific operator implementations (hybrid operators) that either "quantize" the activations (the operator inputs/outputs) on the fly or "dequantize" the weights before applying the operation. For an IoT or edge device this is not efficient in terms of processing time and memory usage, so X-CUBE-AI instead "dequantizes" the weights off-line, during code generation. This is why you see the original size again: the expected size reduction is not preserved, and you also lose some accuracy.
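As a quick check (just a sketch building on the snippet above, nothing X-CUBE-AI specific), you can inspect the converted model with the TFLite interpreter and see that only the weight tensors are int8 while the model input/output, and hence the activations, remain float32:

from collections import Counter

# Load the dynamic-range-quantized model produced above directly from memory.
interpreter = tf.lite.Interpreter(model_content=tflite_model_quant)
interpreter.allocate_tensors()

# With weight-only quantization the model interface stays float32...
print("input dtype :", interpreter.get_input_details()[0]["dtype"])
print("output dtype:", interpreter.get_output_details()[0]["dtype"])

# ...while the weight tensors show up as int8 in the tensor list.
print(Counter(str(t["dtype"]) for t in interpreter.get_tensor_details()))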
For X-CUBE-AI, as for other edge AI runtimes, it is recommended to use the TFLiteConverter options that produce a "full integer" quantized model (https://www.tensorflow.org/lite/performance/post_training_quantization):
def representative_data_gen():
    # 'train_images' is a representative set of inputs (e.g. training images)
    # used to calibrate the dynamic range of the activations
    for input_value in tf.data.Dataset.from_tensor_slices(train_images).batch(1).take(100):
        yield [input_value]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8   # or tf.float32, tf.uint8
converter.inference_output_type = tf.int8  # or tf.float32, tf.uint8
tflite_model_quant = converter.convert()
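As a small usage sketch (the file name is only an example), you can then save the fully quantized model and verify that its input/output are now int8 before importing the .tflite file into X-CUBE-AI:

# Example output path; this is the file to select in X-CUBE-AI.
with open("mobilenetv3_int8.tflite", "wb") as f:
    f.write(tflite_model_quant)

# Quick check that the model is now fully integer at its interface.
interpreter = tf.lite.Interpreter(model_content=tflite_model_quant)
print("input dtype :", interpreter.get_input_details()[0]["dtype"])   # expected: int8
print("output dtype:", interpreter.get_output_details()[0]["dtype"])  # expected: int8

With a full-integer model, X-CUBE-AI keeps the int8 weights and activations instead of dequantizing them, so the reported flash size should be close to the size of the .tflite file rather than the original 6 MB.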
br,
Jean-Michel