
MMU disabled while generating NBG from TFLite

CanY
Associate

Hello.

This problem has bothered me a lot.

I generated an NBG model from my TFLite model:
adam@Ubuntu22:~/STM32MPU_workspace/stedgeai/2.0/Utilities/linux$ ./stedgeai generate --model test.tflite --target stm32mp25
ST Edge AI Core v2.0.0-20049
PASS: 0%| | 0/2 [00:00<?, ?it/s]Galcore warning: MMU is disabled!

Model was successfully compiled to NBG: /home/adam/STM32MPU_workspace/stedgeai/2.0/Utilities/linux/stm32ai_output/test.nb
PASS: 100%
elapsed time (generate): 39.985s

I am not sure whether the warning "MMU is disabled" means my model will run with low efficiency. However, I monitored the GPU load during inference and the maximum occupancy was only 2%. I then tried changing the batch size to 32 to improve the GPU load, but the result was:
[ 9143.030372] [galcore]: GPU[0] core0 hang, automatic recovery.
[ 9143.030732] [galcore]: recovery done

When I lowered the batch size to 6 it worked, and 6 seems to be the upper threshold.

I think there may be something wrong with my model or my environment settings. Here are the details:
input_image = tf.keras.Input(shape=(28, 28, 1), name='input_image')
The tf.nn.conv2d, tf.transpose, tf.constant, tf.multiply, tf.add, tf.subtract, tf.square, tf.sqrt, tf.divide, tf.greater, and tf.cast operators were used.
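
For illustration, a toy graph combining these operators in the same style might look like the sketch below (placeholder constants and shapes only, not my actual network):

import tensorflow as tf

# Illustrative toy model only: placeholder constants and shapes, not the real network.
input_image = tf.keras.Input(shape=(28, 28, 1), name='input_image')

# Convolution with a constant kernel (tf.constant + tf.nn.conv2d)
kernel = tf.constant(0.1, shape=(3, 3, 1, 8))
x = tf.nn.conv2d(input_image, kernel, strides=1, padding='SAME')

# Element-wise arithmetic: subtract, multiply, add, square, sqrt, divide
x = tf.subtract(x, tf.constant(0.5))
x = tf.multiply(x, tf.constant(2.0))
denom = tf.sqrt(tf.add(tf.square(x), tf.constant(1e-6)))
x = tf.divide(x, denom)

# Thresholding: tf.greater yields booleans, tf.cast converts them back to float
mask = tf.cast(tf.greater(x, tf.constant(0.0)), tf.float32)
x = tf.multiply(x, mask)

# Layout change with tf.transpose (NHWC -> NCHW)
outputs = tf.transpose(x, perm=[0, 3, 1, 2])

model = tf.keras.Model(inputs=input_image, outputs=outputs)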

How can I solve it?

Thank you for your time and assistance.

Best regards,
CanY

Julian E.
ST Employee

Hello @CanY,

 

Galcore warning: MMU is disabled!

Because you generate the model on a host PC and not directly on the target, an emulation library is used to emulate the MPU environment. The MMU is disabled there, otherwise it would not work on the host PC.

It has no impact on the model that you will use on the MPU, where the MMU is activated.

 

Concerning your other issue, it seems that you are using TensorFlow and not TensorFlow Lite, which suggests that your model is not 8-bit quantized.

 

On STM32MP2x boards, the quantization format supported by the NPU is exclusively 8-bit per-tensor asymmetric. For the best inference performance, please provide a model that follows this recommendation. If the model provided uses another quantization scheme, like per-channel quantization, the generated NBG model will run mainly on the Graphics Processing Unit (GPU) instead of the NPU, increasing the inference time.

source: https://stedgeai-dc.st.com/assets/embedded-docs/stm32mpu_command_line_interface.html
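
As a side note (my suggestion, not from the documentation above): you can check which scheme a .tflite file actually uses by inspecting its tensor quantization parameters with the TFLite interpreter. Per-tensor quantization carries a single scale per tensor, while per-channel quantization carries one scale per output channel. The file name below is a placeholder.

import tensorflow as tf

# Load the quantized model (path is a placeholder)
interpreter = tf.lite.Interpreter(model_path="test.tflite")
interpreter.allocate_tensors()

for detail in interpreter.get_tensor_details():
    scales = detail["quantization_parameters"]["scales"]
    if len(scales) > 1:
        # More than one scale on a tensor indicates per-channel quantization
        print(f"per-channel tensor: {detail['name']} ({len(scales)} scales)")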

 

Have a good day,

Julian



Hello @Julian E. 

Thank you for your helpful reply.

I have some uncertain guesses and would appreciate your help.

1. When the batch size is 1, the GPU load is only 2%; is the reason that my model is small and uses few resources? And when I set the batch size to 32, GPU[0] core0 hangs; is that caused by a graphics memory limitation? (In fact, batch size = 6 gives a GPU load of 6%, while batch size = 7 gives a 100% GPU load and a hang.)

2. I do use TensorFlow and generate an .h5 model, then convert it to a TFLite model. Should I quantize my model first and then use the stedgeai command to generate the NBG? When I use the ST Edge AI Developer Cloud, it quantizes my .h5 model to TFLite.

Thank you for your time and assistance.

Best regards,
CanY

Hello @CanY,

 

The steps are:

  • Define and train a TensorFlow model (.h5)
  • Quantize your model from float32 to int8 (.tflite). You can use the Developer Cloud to do it and download the quantized model (see the sketch after this list for a local alternative).
  • Generate the NBG model with ST Edge AI Core
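
For step 2, if you prefer to quantize locally instead of using the Developer Cloud, a minimal post-training quantization sketch with the standard TensorFlow Lite converter is shown below. The model path, the representative_data generator and the output file name are placeholders; also note that this default int8 flow quantizes convolution weights per-channel, so for the per-tensor scheme recommended for the STM32MP2x NPU the Developer Cloud quantization may still be the simpler option.

import numpy as np
import tensorflow as tf

# Placeholder: your trained Keras model loaded from the .h5 file
model = tf.keras.models.load_model("model.h5")

# Placeholder: yield samples shaped like the real input (1, 28, 28, 1);
# in practice use a few hundred samples from your training data
def representative_data():
    for _ in range(100):
        yield [np.random.rand(1, 28, 28, 1).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
# Force full-integer (int8) quantization of the ops and the model I/O
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
with open("test_quant.tflite", "wb") as f:
    f.write(tflite_model)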

 

For your first point, I am not sure. 

Please try to use an 8-bit quantized model first and see what happens.

 

Also make sure to use a stable OpenSTLinux or X-LINUX-AI version.

 

Have a good day,

Julian

 

 

 

