2022-11-26 11:02 AM
Hi! I am doing some experiments to analyze a neural network model. The model I am using is the HAR model example, which you can find here: https://github.com/ausilianapoli/HAR-CNN-Keras-STM32
My experiments are about quantization.
First I convert the model to a TensorFlow Lite model with:
import tensorflow as tf

model = tf.keras.models.load_model('model.h5')
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
open("float_model.tflite", "wb").write(tflite_model)
Then I create a dynamic range quantization model using
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()
open("dynamic_model.tflite", "wb").write(tflite_model)
I load the two models into STM32 X-Cube-AI and use the "Analyze" function. The results are identical: both models use the same amount of Flash and RAM.
I don't understand this. With dynamic range quantization the weights should be stored as 8-bit integers, so I expected the Flash usage to drop. Can someone help me with this?
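To double-check outside of X-Cube-AI, here is a minimal sketch that compares the two files on disk and inspects the tensor types; the file names are just the ones from the snippets above:

import os
import tensorflow as tf

# With dynamic range quantization the weights are stored as int8, so
# the .tflite file on disk should be roughly 4x smaller than float.
print("float  :", os.path.getsize("float_model.tflite"), "bytes")
print("dynamic:", os.path.getsize("dynamic_model.tflite"), "bytes")

# List the tensor dtypes inside the quantized model: the weight
# tensors should show int8, while activations stay float32.
interpreter = tf.lite.Interpreter(model_path="dynamic_model.tflite")
interpreter.allocate_tensors()
for t in interpreter.get_tensor_details():
    print(t["name"], t["dtype"])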
P.S. I also experimented with the other two quantization modes:
Float16 quantization: STM32 does not support it.
Full integer quantization: both RAM and Flash usage decrease (conversion sketch below).
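For reference, a rough sketch of the full integer conversion I mean; the calibration data here is random and just a placeholder (for a meaningful calibration, a few hundred real windows from the HAR training set should be used), and int8_model.tflite is just a name I picked:

import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model('model.h5')
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# Placeholder calibration data shaped like the model input; replace
# with real samples from the HAR training set.
rep_samples = np.random.rand(100, *model.input_shape[1:]).astype(np.float32)

def representative_dataset():
    for sample in rep_samples:
        yield [np.expand_dims(sample, axis=0)]

converter.representative_dataset = representative_dataset
# Force all ops, including input and output, to int8.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
open("int8_model.tflite", "wb").write(tflite_model)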