2022-11-26 11:02 AM
Hi! I am doing some experiments to analyze a neural network model. The model I am using is the HAR model example, which you can find here: https://github.com/ausilianapoli/HAR-CNN-Keras-STM32
My experiments are about quantization.
First I convert the model to a TensorFlow Lite model with:
import tensorflow as tf

model = tf.keras.models.load_model('model.h5')
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
open("float_model.tflite", "wb").write(tflite_model)
Then I create a dynamic range quantization model using
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()
open("dynamic_model.tflite", "wb").write(tflite_model)
I load the two models into STM32 X-Cube-AI and use the "Analyze" function. The results are identical: both models use the same amount of Flash and RAM.
I don't understand this. With dynamic range quantization the weights should be stored as 8-bit integers, so I expected the Flash usage to drop. Can someone help me with this?
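To double-check outside of X-Cube-AI, here is a minimal sketch that compares the two files on disk and inspects the tensor types; the file names are just the ones from the snippets above:

import os
import tensorflow as tf

# With dynamic range quantization the weights are stored as int8, so
# the .tflite file on disk should be roughly 4x smaller than float.
print("float  :", os.path.getsize("float_model.tflite"), "bytes")
print("dynamic:", os.path.getsize("dynamic_model.tflite"), "bytes")

# List the tensor dtypes inside the quantized model: the weight
# tensors should show int8, while activations stay float32.
interpreter = tf.lite.Interpreter(model_path="dynamic_model.tflite")
interpreter.allocate_tensors()
for t in interpreter.get_tensor_details():
    print(t["name"], t["dtype"])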
P.S. I also experimented with the other two quantization modes:
Float16 quantization: STM32 does not support it.
Full integer quantization: both RAM and Flash usage decrease (conversion sketch below).
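For reference, a rough sketch of the full integer conversion I mean; the calibration data here is random and just a placeholder (for a meaningful calibration, a few hundred real windows from the HAR training set should be used), and int8_model.tflite is just a name I picked:

import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model('model.h5')
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# Placeholder calibration data shaped like the model input; replace
# with real samples from the HAR training set.
rep_samples = np.random.rand(100, *model.input_shape[1:]).astype(np.float32)

def representative_dataset():
    for sample in rep_samples:
        yield [np.expand_dims(sample, axis=0)]

converter.representative_dataset = representative_dataset
# Force all ops, including input and output, to int8.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
open("int8_model.tflite", "wb").write(tflite_model)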