2024-02-06 12:44 AM
Hi,
I have a self-trained image classification model with two classes, quantized with dynamic range quantization (the unquantized model is about 1.5 MB, and about 500 KB after quantization). After benchmarking, this model still takes too much flash and RAM.
Is there any method for reducing the flash and RAM consumption? The model I am using is MobileNet; should I use another model for image classification, or is there another model you would recommend?
Also note that the STM32 developer cloud does not support quantization for my self-trained model, so I quantized the model myself (using dynamic range quantization).
Thank you!
2024-02-06 02:01 AM
Hi,
Floating-point models quantized with "dynamic range quantization" are not efficiently supported; only the "full integer quantization" mode is supported. If you try to deploy a model quantized with dynamic range quantization, the weights will be dequantized (int8 -> float32) offline during generation.
It is recommended to use the "full integer quantization" technique to quantize the floating-point model. Through the STM32 developer cloud, a quantization service with random data can be used to generate a fake-quantized model, allowing you to benchmark the model on a given board and to get an idea of the final inference time and required memory.
Note that this fake-quantized model cannot be used for the real use case; it should later be quantized with real calibration data from the "original" data set (see https://www.tensorflow.org/lite/performance/post_training_quantization).
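To illustrate why the calibration data matters: in full integer quantization, each tensor's scale and zero-point are derived from the min/max range observed on the calibration data, so random data generally produces different parameters than the real data set. A minimal numpy sketch of asymmetric int8 affine quantization (illustrative helpers, not the actual tool's code):

```python
import numpy as np

def int8_params(data):
    """Derive per-tensor affine params (scale, zero_point) from the
    observed range of the calibration data."""
    lo, hi = float(data.min()), float(data.max())
    lo, hi = min(lo, 0.0), max(hi, 0.0)  # representable range must contain 0
    scale = (hi - lo) / 255.0
    zero_point = int(round(-128 - lo / scale))
    return scale, zero_point

def quantize(x, scale, zp):
    return np.clip(np.round(x / scale) + zp, -128, 127).astype(np.int8)

def dequantize(q, scale, zp):
    return scale * (q.astype(np.float32) - zp)

# "Real" calibration data: images normalized to [0, 1[
real = np.random.default_rng(0).random((100, 96, 96))
s_real, zp_real = int8_params(real)

# Random calibration data with a different, wider range
fake = np.random.default_rng(1).normal(0.0, 3.0, (100, 96, 96))
s_fake, zp_fake = int8_params(fake)

# The parameters differ, so a fake-quantized model misrepresents real inputs
print(s_real, zp_real)   # ~1/255, ~-128 for [0, 1[ data
print(s_fake, zp_fake)
```

The fake-quantized model is structurally identical (same int8 tensors, same memory footprint), which is why it is still valid for benchmarking flash/RAM and inference time.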
br,
jm
2024-02-07 05:51 PM
Hi jm:
Thank you so much for your prompt reply! I just tried full integer quantization with the original training dataset. Inference works well with the float32 model, but with the int8 model it becomes weird: the softmax is somehow not applied to the output, and the output does not sum to 1, e.g. output_data [[ 15 241]] (f32 works well; its output is [[0.00201043 0.99798954]]). Below is my code that converts a TFJS model to an int8 quantized model. If you need any additional information, please do not hesitate to reach out; I am looking forward to your response. Thank you in advance!
Best Regards
2024-02-07 05:54 PM
BTW, my TFJS model comes from "Teachable Machine", with a 96x96 greyscale input image for the best fit on microcontrollers.
2024-02-07 10:52 PM
A quick update:
This is shown after conversion:
fully_quantize: 0, inference_type: 6, input_inference_type: 9, output_inference_type: 9
saving the model!
Conversion complete!
I changed the model type from uint8 to int8 and it works better, but it still gives false predictions; float32 gives an accurate prediction every time.
2024-02-08 03:29 AM
Hello,
Thanks for sharing this quick update. Your script to quantize the model seems OK.
I suppose that when you change the model type from uint8 to int8, this is just the inference_input/output_type parameters. If so, this should normally not impact the final results. How do you evaluate the quantized model vs the original model?
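As a sketch of why the uint8 -> int8 switch alone should be neutral: both types describe the same real values through the affine mapping real = scale * (q - zero_point), with the int8 stored values and zero-point simply shifted by -128 relative to uint8 (illustrative numpy, not tied to your specific model):

```python
import numpy as np

def dequantize(q, scale, zero_point):
    """Standard affine mapping: real = scale * (q - zero_point)."""
    return scale * (q.astype(np.float32) - zero_point)

scale = 0.02
q_uint8 = np.array([0, 100, 200, 255], dtype=np.uint8)
zp_uint8 = 128

# The int8 view of the same tensor: stored values and zero-point
# both shift by -128
q_int8 = (q_uint8.astype(np.int16) - 128).astype(np.int8)
zp_int8 = zp_uint8 - 128

print(dequantize(q_uint8, scale, zp_uint8))
print(dequantize(q_int8, scale, zp_int8))  # same real values
```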
Perhaps some tips:
- This is certainly the case here, but it is important that the data used to quantize the model go through the same normalization process (here, normalization between [0, 1[) that was used during training of the original model.
- To facilitate the offline (on PC) evaluation of the quantized model vs the floating-point model, at first we can keep the inference_input/output_type in float, which allows the same pre-processing and post-processing to be used directly to evaluate both models, without additional quantization/dequantization steps.
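For example, the raw [[ 15 241]] output from your earlier post is in fact a quantized softmax. Assuming the typical TFLite uint8 softmax output parameters (scale = 1/256, zero-point = 0; the actual values for your model can be read from interpreter.get_output_details()[0]['quantization']), dequantizing it yields probabilities that sum to 1 (a sketch):

```python
import numpy as np

# Assumed output quantization parameters; in practice read them from
# interpreter.get_output_details()[0]['quantization']
scale, zero_point = 1.0 / 256.0, 0

q_out = np.array([[15, 241]], dtype=np.uint8)
probs = scale * (q_out.astype(np.float32) - zero_point)
print(probs)        # ~[[0.0586, 0.9414]]
print(probs.sum())  # 1.0 -- so the softmax *was* applied, just quantized
```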
br,
jm