A self-trained model takes too much flash

Associate II


I have a self-trained image classification model with two classes, quantized with dynamic range quantization (the unquantized model is about 1.5 MB; it is about 500 KB after quantization). After benchmarking, this model still takes too much flash and RAM.
Is there any method for reducing the flash and RAM consumption? The model I am using is MobileNet; should I use a different model for image classification? Is there any other model you would recommend?

Also note that the STM32 developer cloud does not support quantization for my self-trained model, so I quantized the model myself (using dynamic range quantization).

Thank you!

ST Employee


Floating-point models quantized with "dynamic range quantization" are not efficiently supported. Only the "full integer quantization" mode is supported. If you deploy a model quantized with "dynamic range quantization", the weights are dequantized (int8 -> float32) offline during the generation step.
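As a rough back-of-the-envelope sketch of what this means for flash, the arithmetic below compares weight storage at 4 bytes per weight (float32) and 1 byte per weight (int8). The parameter count is an assumption derived from the 1.5 MB figure in the question; a real model file also holds biases, scales and graph metadata, so actual sizes will differ.

```python
# Rough flash estimate for the weights only (metadata and code are ignored).
BYTES_PER_WEIGHT = {"float32": 4, "int8": 1}


def weight_bytes(num_params: int, dtype: str) -> int:
    """Return the storage needed for num_params weights of the given dtype."""
    return num_params * BYTES_PER_WEIGHT[dtype]


# Assumption: about 1.5 MB of float32 weights, i.e. roughly 375,000 parameters.
num_params = 1_500_000 // 4

float_size = weight_bytes(num_params, "float32")
int8_size = weight_bytes(num_params, "int8")

# With dynamic range quantization the weights are int8 in the .tflite file,
# but they are expanded back to float32 at generation time, so the footprint
# on the target is closer to float_size than to int8_size.
print(f"float32 weights: {float_size / 1000:.0f} kB")
print(f"int8 weights:    {int8_size / 1000:.0f} kB")
```

Under these assumptions, only a model whose weights stay int8 on the target keeps the 4x reduction.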

We recommend using a "full integer quantization" technique to quantize the floating-point model. The STM32 developer cloud provides a quantization service that uses random data to generate a fake quantized model. This lets you benchmark the model on a given board and get an idea of the final inference time and required memory.

Note that this quantized model cannot be used for the real use case because it is fake quantized; it should be quantized later with real calibration data from the "original" dataset (see





Hi jm:

Thank you so much for your prompt reply! I just tried full integer quantization with the original training dataset. Inference works well with the float32 model, but with the int8 model the results look weird: softmax somehow does not seem to be applied to the output, and the norm of the output is not 1. For example, the int8 model gives output_data [[ 15 241]], while the float32 model works well and gives [[0.00201043 0.99798954]]. Below is my code that converts a TFJS model to an int8 quantized model. If you need any additional information, please do not hesitate to reach out. I am looking forward to your response, and thank you in advance!

Best Regards



import sys
import os
import shutil

# example1(float 32 quantization): python3 "/root/modelconverter/convert/224model_tm/model.json"
# example2(int 8 quantization): python3 "/root/modelconverter/convert/self_trained_tiny_model/model.json"
if len(sys.argv) <= 1:
    sys.exit("ERROR: Must include path to input file as argument.\nEX: python {} models/model.json".format(sys.argv[0]))

fileName = sys.argv[1]
if fileName == "-h" or fileName == "--help":
    print("Usage: {} <path/to/input/file.json>\n Output will save to converted/model.tflite".format(sys.argv[0]))
    # Stop here so that "-h" is not passed on to the converter as a file name
    sys.exit(0)

os.system("tensorflowjs_converter --input_format=tfjs_layers_model --output_format=keras {} converted/saved_model.h5".format(fileName))

import tensorflow as tf
import numpy as np
# Load a model using high-level tf.keras.* APIs
model = tf.keras.models.load_model('converted/saved_model.h5')

def load_image_224x3(image_path, size=(224, 224)):
    image = tf.keras.utils.load_img(image_path, target_size=size)
    image = tf.keras.utils.img_to_array(image)
    image /= 255.0
    image = np.expand_dims(image, axis=0)
    return image

def load_image_96x1(image_path, size=(96, 96)):
    image = tf.keras.utils.load_img(image_path, color_mode='grayscale', target_size=size)
    image = tf.keras.utils.img_to_array(image)
    image /= 255.0
    image = np.expand_dims(image, axis=0)
    return image

import random

def representative_dataset_gen():
    class1_images = os.listdir('/root/modelconverter/convert/images_tm/Class 1')
    class2_images = os.listdir('/root/modelconverter/convert/images_tm/Class 2')
    num_calibration_steps = 150
    all_images = (['/root/modelconverter/convert/images_tm/Class 1/' + name for name in class1_images] +
                  ['/root/modelconverter/convert/images_tm/Class 2/' + name for name in class2_images])

    for _ in range(num_calibration_steps):
        image_path = random.choice(all_images)
        image = load_image_96x1(image_path)
        yield [image]

# Convert the model.
converter = tf.lite.TFLiteConverter.from_keras_model(model)

converter.optimizations = [tf.lite.Optimize.DEFAULT] # quantize the model
converter.representative_dataset = representative_dataset_gen
# This ensures that if any ops can't be quantized, the converter throws an error
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
# For full integer quantization, though supported types defaults to int8 only
converter.target_spec.supported_types = [tf.int8]
# These set the input and output tensors to uint8 (added in r2.3)
converter.inference_input_type = tf.uint8 # or tf.int8/tf.float32
converter.inference_output_type = tf.uint8 # or tf.int8/tf.float32

tflite_model = converter.convert()

# Save the model.
with open('../converted_model/int8_model_tiny.tflite', 'wb') as f:
    f.write(tflite_model)

directory = "converted"

print("Conversion complete!")

Associate II

BTW, my TFJS model comes from "Teachable Machine", with a 96x96 greyscale input image for the best fit on microcontrollers.

A quick update: 

This is shown after conversion:
fully_quantize: 0, inference_type: 6, input_inference_type: 9, output_inference_type: 9
saving the model!
Conversion complete!

I changed the model type from uint8 to int8 and it works better, but it still gives false predictions; float32 gives accurate predictions every time.

ST Employee


Thanks for sharing this quick update. The script to quantize the model seems OK.

I suppose that when you changed the model type from uint8 to int8, you only changed the inference_input/output_type parameters. If so, this should normally not affect the final results. How do you evaluate the quantized model against the original model?
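One thing worth checking when switching between uint8 and int8 inputs is that the data fed to the model is quantized with the matching zero point. The sketch below applies the usual affine mapping q = round(x / scale) + zero_point in plain Python. The scale and zero-point values are assumptions for an input calibrated on [0, 1] data; the real ones should be read from the converted model, for example from the 'quantization' entry of interpreter.get_input_details()[0].

```python
def quantize(values, scale, zero_point, qmin, qmax):
    """Map real values to integers: q = round(x / scale) + zero_point, clamped."""
    quantized = []
    for x in values:
        q = round(x / scale) + zero_point
        quantized.append(max(qmin, min(qmax, q)))
    return quantized


# Pixels already normalized to [0, 1], as in the conversion script.
pixels = [0.0, 0.2, 1.0]

# Assumed parameters; read the real ones from the model before use.
scale = 1.0 / 255.0
as_uint8 = quantize(pixels, scale, zero_point=0, qmin=0, qmax=255)
as_int8 = quantize(pixels, scale, zero_point=-128, qmin=-128, qmax=127)

print("uint8 input:", as_uint8)
print("int8 input: ", as_int8)
```

With these assumed parameters the two encodings differ only by an offset of 128, so feeding uint8-style data to an int8 input (or the reverse) would shift every pixel and could explain wrong predictions.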

Perhaps some tips:

- This is certainly already the case here, but the data used to quantize the model must go through the same normalization process (here, normalization to [0,1[) that was used when training the original model.

- To make it easier to evaluate the quantized model against the floating-point model offline (on a PC), you can at first keep the inference_input/output_type as float. The same preprocessing and post-processing can then be used directly for both models, without additional quantization/dequantization steps.
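If the output type is kept as an integer instead, the raw values have to be dequantized before they can be compared with the float model. The sketch below does this in plain Python for the two outputs quoted earlier in the thread. The scale of 1/256 and zero point of 0 are assumptions (they are the values typically used for a uint8 softmax output); the real ones should be read from the 'quantization' entry of interpreter.get_output_details()[0].

```python
def dequantize(q_values, scale, zero_point):
    """Map integers back to real values: x = scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in q_values]


def argmax(values):
    """Return the index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


# Outputs quoted in the thread for the same kind of input.
float_output = [0.00201043, 0.99798954]
uint8_output = [15, 241]

# Assumed quantization parameters of the uint8 output tensor.
probs = dequantize(uint8_output, scale=1.0 / 256.0, zero_point=0)

same_class = argmax(probs) == argmax(float_output)
max_abs_diff = max(abs(p - f) for p, f in zip(probs, float_output))

print("dequantized output:", probs)
print("sum of probabilities:", sum(probs))
print("same predicted class:", same_class)
print("max abs difference:", round(max_abs_diff, 4))
```

Under these assumptions the raw values 15 and 241 correspond to probabilities that sum to 1, so softmax is applied and the output only looks unnormalized because it is still in integer form.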