2025-11-11 11:46 AM
Hi,
I’m benchmarking and deploying a 33 MB half-INT8 model. The benchmark in ST Dev Cloud reported an inference time of about 264 ms. After compiling the same model locally with ST Edge AI Core v2.2.0 and deploying it, I measure 405 ms.
The artifacts produced by the “Generate Code” option in ST Dev Cloud also run at 405 ms once deployed.
I have verified that the weight and activation memory usage in each pool matches what ST Dev Cloud reports (see the logs below).
Could you please confirm whether this difference in inference time is expected, or if there is a known delta that should be addressed?
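To put a number on the gap, here is a minimal sketch of the arithmetic. It uses only the two latencies quoted above; the variable names are mine.

```python
# Latencies quoted in this post, in milliseconds.
cloud_ms = 264.0   # ST Dev Cloud benchmark
local_ms = 405.0   # local build and "Generate Code" artifacts, once deployed

delta_ms = local_ms - cloud_ms
slowdown_pct = 100.0 * delta_ms / cloud_ms

print(f"delta: {delta_ms:.0f} ms, slowdown: {slowdown_pct:.1f} %")
# prints "delta: 141 ms, slowdown: 53.4 %"
```

So the deployed model is roughly 1.5x slower than the cloud benchmark, which seems too large to be measurement noise.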
Logs:
ST Dev Cloud:
>>> stedgeai analyze --model arcface_halfint8.tflite --st-neural-art custom@/tmp/stm32ai_service/7369b431-60e1-4924-9783-5f76cbd6b229/profile-27ac5fbe-1304-4c95-9d4e-44268ad9580f.json --target stm32n6 --optimize.export_hybrid True --name network --workspace workspace --output output
ST Edge AI Core v2.2.0-20266 2adc00962
WARNING: Unsupported keys in the current profile custom are ignored: memory_desc
> memory_desc is not a valid key anymore, use machine_desc instead
>>>> EXECUTING NEURAL ART COMPILER
atonn -i "/tmp/stm32ai_service/7369b431-60e1-4924-9783-5f76cbd6b229/output/arcface_halfint8_OE_3_3_0.onnx" --json-quant-file "/tmp/stm32ai_service/7369b431-60e1-4924-9783-5f76cbd6b229/output/arcface_halfint8_OE_3_3_0_Q.json" -g "network.c" --load-mdesc "/app/stm32ai/Utilities/configs/stm32n6.mdesc" --load-mpool "/app/stm32ai/Utilities/linux/targets/stm32/resources/mpools/stm32n6.mpool" --save-mpool-file "/tmp/stm32ai_service/7369b431-60e1-4924-9783-5f76cbd6b229/workspace/neural_art__network/stm32n6.mpool" --out-dir-prefix "/tmp/stm32ai_service/7369b431-60e1-4924-9783-5f76cbd6b229/workspace/neural_art__network/" --native-float --mvei --cache-maintenance --Ocache-opt --enable-virtual-mem-pools --Os --Oauto-sched --optimization 3 --enable-virtual-mem-pools --Oshuffle-dma --Ocache-opt --cache-maintenance --Oauto-sched --Omax-ca-pipe 4 --output-info-file "c_info.json"
<<<< DONE EXECUTING NEURAL ART COMPILER
Exec/report summary (analyze)
-----------------------------------------------------------------------------------------------------------
model file : arcface_halfint8.tflite
type : tflite
c_name : network
options : allocate-inputs, allocate-outputs
optimization : balanced
target/series : stm32n6npu
workspace dir : workspace
output dir : output
model_fmt : ss/sa per tensor
model_name : arcface_halfint8
model_hash : 0xce8ba0b48ca91dc6b30f9b3d2ba14615
params # : 34,129,728 items (32.57 MiB)
-----------------------------------------------------------------------------------------------------------
input 1/1 : 'Input_34_out_0', f32(1x112x112x3), 147.00 KBytes, activations
output 1/1 : 'Dequantize_352_out_0', f32(1x512), 2.00 KBytes, activations
macc : 0
weights (ro) : 34,217,857 B (32.63 MiB) (1 segment) / -102,301,055(-74.9%) vs float model
activations (rw) : 5,577,856 B (5.32 MiB) (5 segments) *
ram (total) : 5,577,856 B (5.32 MiB) = 5,577,856 + 0 + 0
-----------------------------------------------------------------------------------------------------------
(*) 'input'/'output' buffers are allocated in the activations buffer
Computing AI RT data/code size (target=stm32n6npu)..
Compilation details
---------------------------------------------------------------------------------
Compiler version: 1.1.1-14
Compiler arguments: -i arcface_halfint8_OE_3_3_0.onnx --json-quant-file arcface_halfint8_OE_3_3_0_Q.json -g network.c --load-mdesc stm32n6.mdesc --load-mpool stm32n6.mpool --save-mpool-file stm32n6.mpool --out-dir-prefix neural_art__network/ --native-float --mvei --cache-maintenance --Ocache-opt --enable-virtual-mem-pools --Os --Oauto-sched --optimization 3 --enable-virtual-mem-pools --Oshuffle-dma --Ocache-opt --cache-maintenance --Oauto-sched --Omax-ca-pipe 4 --output-info-file c_info.json
====================================================================================
Memory usage information (input/output buffers are included in activations)
---------------------------------------------------------------------------------
flexMEM [0x34000000 - 0x34000000]: 0 B / 0 B ( 0.00 % used) -- weights: 0 B ( 0.00 % used) activations: 0 B ( 0.00 % used)
cpuRAM1 [0x34064000 - 0x34064000]: 0 B / 0 B ( 0.00 % used) -- weights: 0 B ( 0.00 % used) activations: 0 B ( 0.00 % used)
cpuRAM2 [0x34100000 - 0x34200000]: 1.000 MB / 1.000 MB (100.00 % used) -- weights: 0 B ( 0.00 % used) activations: 1.000 MB (100.00 % used)
npuRAM3 [0x34200000 - 0x34270000]: 448.000 kB / 448.000 kB (100.00 % used) -- weights: 0 B ( 0.00 % used) activations: 448.000 kB (100.00 % used)
npuRAM4 [0x34270000 - 0x342E0000]: 392.000 kB / 448.000 kB ( 87.50 % used) -- weights: 0 B ( 0.00 % used) activations: 392.000 kB ( 87.50 % used)
npuRAM5 [0x342E0000 - 0x34350000]: 447.125 kB / 448.000 kB ( 99.80 % used) -- weights: 0 B ( 0.00 % used) activations: 447.125 kB ( 99.80 % used)
npuRAM6 [0x34350000 - 0x343C0000]: 0 B / 448.000 kB ( 0.00 % used) -- weights: 0 B ( 0.00 % used) activations: 0 B ( 0.00 % used)
octoFlash [0x71000000 - 0x78000000]: 32.633 MB / 112.000 MB ( 29.14 % used) -- weights: 32.633 MB ( 29.14 % used) activations: 0 B ( 0.00 % used)
hyperRAM [0x90000000 - 0x92000000]: 3.062 MB / 32.000 MB ( 9.57 % used) -- weights: 0 B ( 0.00 % used) activations: 3.062 MB ( 9.57 % used)
Total: 37.952 MB -- weights: 32.633 MB activations: 5.319 MB
====================================================================================
Used memory ranges
---------------------------------------------------------------------------------
cpuRAM2 [0x34100000 - 0x34200000]: 0x34100000-0x34200000
npuRAM3 [0x34200000 - 0x34270000]: 0x34200000-0x34270000
npuRAM4 [0x34270000 - 0x342E0000]: 0x34270000-0x342D2000
npuRAM5 [0x342E0000 - 0x34350000]: 0x342E0000-0x3434FC80
octoFlash [0x71000000 - 0x78000000]: 0x71000000-0x730A1F90
hyperRAM [0x90000000 - 0x92000000]: 0x90000000-0x90310000
====================================================================================
Epochs details
---------------------------------------------------------------------------------
Total number of epochs: 147 of which 2 implemented in software
epoch ID HW/SW/EC Operation (SW only)
epoch 1 HW
epoch 2 -SW- ( QuantizeLinear )
epoch 3 HW
epoch 4 HW
epoch 5 HW
epoch 6 HW
epoch 7 HW
epoch 8 HW
epoch 9 HW
epoch 10 HW
epoch 11 HW
epoch 12 HW
epoch 13 HW
epoch 14 HW
epoch 15 HW
epoch 16 HW
epoch 17 HW
epoch 18 HW
epoch 19 HW
epoch 20 HW
epoch 21 HW
epoch 22 HW
epoch 23 HW
epoch 24 HW
epoch 25 HW
epoch 26 HW
epoch 27 HW
epoch 28 HW
epoch 29 HW
epoch 30 HW
epoch 31 HW
epoch 32 HW
epoch 33 HW
epoch 34 HW
epoch 35 HW
epoch 36 HW
epoch 37 HW
epoch 38 HW
epoch 39 HW
epoch 40 HW
epoch 41 HW
epoch 42 HW
epoch 43 HW
epoch 44 HW
epoch 45 HW
epoch 46 HW
epoch 47 HW
epoch 48 HW
epoch 49 HW
epoch 50 HW
epoch 51 HW
epoch 52 HW
epoch 53 HW
epoch 54 HW
epoch 55 HW
epoch 56 HW
epoch 57 HW
epoch 58 HW
epoch 59 HW
epoch 60 HW
epoch 61 HW
epoch 62 HW
epoch 63 HW
epoch 64 HW
epoch 65 HW
epoch 66 HW
epoch 67 HW
epoch 68 HW
epoch 69 HW
epoch 70 HW
epoch 71 HW
epoch 72 HW
epoch 73 HW
epoch 74 HW
epoch 75 HW
epoch 76 HW
epoch 77 HW
epoch 78 HW
epoch 79 HW
epoch 80 HW
epoch 81 HW
epoch 82 HW
epoch 83 HW
epoch 84 HW
epoch 85 HW
epoch 86 HW
epoch 87 HW
epoch 88 HW
epoch 89 HW
epoch 90 HW
epoch 91 HW
epoch 92 HW
epoch 93 HW
epoch 94 HW
epoch 95 HW
epoch 96 HW
epoch 97 HW
epoch 98 HW
epoch 99 HW
epoch 100 HW
epoch 101 HW
epoch 102 HW
epoch 103 HW
epoch 104 HW
epoch 105 HW
epoch 106 HW
epoch 107 HW
epoch 108 HW
epoch 109 HW
epoch 110 HW
epoch 111 HW
epoch 112 HW
epoch 113 HW
epoch 114 HW
epoch 115 HW
epoch 116 HW
epoch 117 HW
epoch 118 HW
epoch 119 HW
epoch 120 HW
epoch 121 HW
epoch 122 HW
epoch 123 HW
epoch 124 HW
epoch 125 HW
epoch 126 HW
epoch 127 HW
epoch 128 HW
epoch 129 HW
epoch 130 HW
epoch 131 HW
epoch 132 HW
epoch 133 HW
epoch 134 HW
epoch 135 HW
epoch 136 HW
epoch 137 HW
epoch 138 HW
epoch 139 HW
epoch 140 HW
epoch 141 HW
epoch 142 HW
epoch 143 HW
epoch 144 HW
epoch 145 HW
epoch 146 HW
epoch 147 -SW- ( DequantizeLinear )
====================================================================================
Requested memory size by section - "stm32n6npu" target
------------------------------- -------- ------------ ------ -----------
module                              text       rodata   data         bss
------------------------------- -------- ------------ ------ -----------
network.o                         22,332      183,633      0           0
NetworkRuntime1020_CM55_GCC.a      3,068            0      0           0
ll_aton_reloc_network.o                0            0      0           0
lib (toolchain)*                     896          624      0           0
ll atonn runtime                   6,990        2,244      0          29
------------------------------- -------- ------------ ------ -----------
RT total**                        33,286      186,501      0          29
------------------------------- -------- ------------ ------ -----------
weights                                0   34,217,857      0           0
activations                            0            0      0   5,577,856
io                                     0            0      0           0
------------------------------- -------- ------------ ------ -----------
TOTAL                             33,286   34,404,358      0   5,577,885
------------------------------- -------- ------------ ------ -----------
* toolchain objects (libm/libgcc*)
** RT AI runtime objects (kernels+infrastructure)
Summary - "stm32n6npu" target
--------------------------------------------------
               FLASH (ro)      %*   RAM (rw)     %
--------------------------------------------------
RT total          219,787    0.6%         29  0.0%
--------------------------------------------------
TOTAL          34,437,644          5,577,885
--------------------------------------------------
* rt/total
Creating txt report file network_analyze_report.txt
elapsed time (analyze): 21.367s
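As a sanity check on the report above, the "Used memory ranges" can be summed and compared with the reported sizes. This is a minimal sketch with the addresses copied by hand from the ST Dev Cloud log. Treating octoFlash as the only weights pool follows the "Memory usage information" table; the dictionary and variable names are mine.

```python
# Used ranges (start, end) copied from the "Used memory ranges" section
# of the ST Dev Cloud log.
used_ranges = {
    "cpuRAM2":   (0x34100000, 0x34200000),
    "npuRAM3":   (0x34200000, 0x34270000),
    "npuRAM4":   (0x34270000, 0x342D2000),
    "npuRAM5":   (0x342E0000, 0x3434FC80),
    "octoFlash": (0x71000000, 0x730A1F90),
    "hyperRAM":  (0x90000000, 0x90310000),
}

used_bytes = {name: end - start for name, (start, end) in used_ranges.items()}

# Per the memory usage table, octoFlash holds only weights and every
# other pool holds only activations.
weights_bytes = used_bytes["octoFlash"]
activation_bytes = sum(size for name, size in used_bytes.items()
                       if name != "octoFlash")

print(activation_bytes)  # 5577856, same as "activations (rw)" in the report
print(weights_bytes)     # 34217872, 15 bytes more than "weights (ro)"
```

The activation total matches the report exactly. The flash range is 15 bytes larger than the reported 34,217,857 B of weights, which I assume is alignment padding.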
Local ST Edge AI Core compilation:
C:\ST\STEdgeAI\2.2\Utilities\windows>stedgeai generate --model ./arcface_halfint8.tflite --target stm32n6 --optimize.export_hybrid True --st-neural-art default@user_neuralart.json
ST Edge AI Core v2.2.0-20266 2adc00962
WARNING: Unsupported keys in the current profile default are ignored: memory_desc
> memory_desc is not a valid key anymore, use machine_desc instead
>>>> EXECUTING NEURAL ART COMPILER
C:/ST/STEdgeAI/2.2/Utilities/windows/atonn.exe -i "C:/ST/STEdgeAI/2.2/Utilities/windows/st_ai_output/arcface_halfint8_OE_3_3_0.onnx" --json-quant-file "C:/ST/STEdgeAI/2.2/Utilities/windows/st_ai_output/arcface_halfint8_OE_3_3_0_Q.json" -g "network.c" --load-mdesc "C:/ST/STEdgeAI/2.2/Utilities/configs/stm32n6.mdesc" --load-mpool "C:/ST/STEdgeAI/2.2/Utilities/windows/targets/stm32/resources/mpools/stm32n6.mpool" --save-mpool-file "C:/ST/STEdgeAI/2.2/Utilities/windows/st_ai_ws/neural_art__network/stm32n6.mpool" --out-dir-prefix "C:/ST/STEdgeAI/2.2/Utilities/windows/st_ai_ws/neural_art__network/" --native-float --mvei --cache-maintenance --Ocache-opt --enable-virtual-mem-pools --Os --Oauto-sched --optimization 3 --enable-virtual-mem-pools --Oshuffle-dma --Ocache-opt --cache-maintenance --Oauto-sched --Omax-ca-pipe 4 --output-info-file "c_info.json"
<<<< DONE EXECUTING NEURAL ART COMPILER
Exec/report summary (generate)
----------------------------------------------------------------------------------------------------
model file : C:\ST\STEdgeAI\2.2\Utilities\windows\arcface_halfint8.tflite
type : tflite
c_name : network
options : allocate-inputs, allocate-outputs
optimization : balanced
target/series : stm32n6npu
workspace dir : C:\ST\STEdgeAI\2.2\Utilities\windows\st_ai_ws
output dir : C:\ST\STEdgeAI\2.2\Utilities\windows\st_ai_output
model_fmt : ss/sa per tensor
model_name : arcface_halfint8
model_hash : 0xce8ba0b48ca91dc6b30f9b3d2ba14615
params # : 34,129,728 items (32.57 MiB)
----------------------------------------------------------------------------------------------------
input 1/1 : 'Input_34_out_0', f32(1x112x112x3), 147.00 KBytes, activations
output 1/1 : 'Dequantize_352_out_0', f32(1x512), 2.00 KBytes, activations
macc : 0
weights (ro) : 34,217,857 B (32.63 MiB) (1 segment) / -102,301,055(-74.9%) vs float model
activations (rw) : 5,577,856 B (5.32 MiB) (5 segments) *
ram (total) : 5,577,856 B (5.32 MiB) = 5,577,856 + 0 + 0
----------------------------------------------------------------------------------------------------
(*) 'input'/'output' buffers are allocated in the activations buffer
Computing AI RT data/code size (target=stm32n6npu)..
-> compiler "gcc:arm-none-eabi-gcc" is not in the PATH
Compilation details
---------------------------------------------------------------------------------
Compiler version: 1.1.1-14
Compiler arguments: -i C:\ST\STEdgeAI\2.2\Utilities\windows\st_ai_output\arcface_halfint8_OE_3_3_0.onnx --json-quant-file C:\ST\STEdgeAI\2.2\Utilities\windows\st_ai_output\arcface_halfint8_OE_3_3_0_Q.json -g network.c --load-mdesc C:\ST\STEdgeAI\2.2\Utilities\configs\stm32n6.mdesc --load-mpool C:\ST\STEdgeAI\2.2\Utilities\windows\targets\stm32\resources\mpools\stm32n6.mpool --save-mpool-file C:\ST\STEdgeAI\2.2\Utilities\windows\st_ai_ws\neural_art__network\stm32n6.mpool --out-dir-prefix C:\ST\STEdgeAI\2.2\Utilities\windows\st_ai_ws\neural_art__network/ --native-float --mvei --cache-maintenance --Ocache-opt --enable-virtual-mem-pools --Os --Oauto-sched --optimization 3 --enable-virtual-mem-pools --Oshuffle-dma --Ocache-opt --cache-maintenance --Oauto-sched --Omax-ca-pipe 4 --output-info-file c_info.json
====================================================================================
Memory usage information (input/output buffers are included in activations)
---------------------------------------------------------------------------------
flexMEM [0x34000000 - 0x34000000]: 0 B / 0 B ( 0.00 % used) -- weights: 0 B ( 0.00 % used) activations: 0 B ( 0.00 % used)
cpuRAM1 [0x34064000 - 0x34064000]: 0 B / 0 B ( 0.00 % used) -- weights: 0 B ( 0.00 % used) activations: 0 B ( 0.00 % used)
cpuRAM2 [0x34100000 - 0x34200000]: 1.000 MB / 1.000 MB (100.00 % used) -- weights: 0 B ( 0.00 % used) activations: 1.000 MB (100.00 % used)
npuRAM3 [0x34200000 - 0x34270000]: 448.000 kB / 448.000 kB (100.00 % used) -- weights: 0 B ( 0.00 % used) activations: 448.000 kB (100.00 % used)
npuRAM4 [0x34270000 - 0x342E0000]: 392.000 kB / 448.000 kB ( 87.50 % used) -- weights: 0 B ( 0.00 % used) activations: 392.000 kB ( 87.50 % used)
npuRAM5 [0x342E0000 - 0x34350000]: 447.125 kB / 448.000 kB ( 99.80 % used) -- weights: 0 B ( 0.00 % used) activations: 447.125 kB ( 99.80 % used)
npuRAM6 [0x34350000 - 0x343C0000]: 0 B / 448.000 kB ( 0.00 % used) -- weights: 0 B ( 0.00 % used) activations: 0 B ( 0.00 % used)
octoFlash [0x70580000 - 0x72780000]: 32.633 MB / 34.000 MB ( 95.98 % used) -- weights: 32.633 MB ( 95.98 % used) activations: 0 B ( 0.00 % used)
hyperRAM [0x90000000 - 0x91000000]: 3.062 MB / 16.000 MB ( 19.14 % used) -- weights: 0 B ( 0.00 % used) activations: 3.062 MB ( 19.14 % used)
Total: 37.952 MB -- weights: 32.633 MB activations: 5.319 MB
====================================================================================
Used memory ranges
---------------------------------------------------------------------------------
cpuRAM2 [0x34100000 - 0x34200000]: 0x34100000-0x34200000
npuRAM3 [0x34200000 - 0x34270000]: 0x34200000-0x34270000
npuRAM4 [0x34270000 - 0x342E0000]: 0x34270000-0x342D2000
npuRAM5 [0x342E0000 - 0x34350000]: 0x342E0000-0x3434FC80
octoFlash [0x70580000 - 0x72780000]: 0x70580000-0x72621F90
hyperRAM [0x90000000 - 0x91000000]: 0x90000000-0x90310000
====================================================================================
Epochs details
---------------------------------------------------------------------------------
Total number of epochs: 147 of which 2 implemented in software
epoch ID HW/SW/EC Operation (SW only)
epoch 1 HW
epoch 2 -SW- ( QuantizeLinear )
epoch 3 HW
epoch 4 HW
epoch 5 HW
epoch 6 HW
epoch 7 HW
epoch 8 HW
epoch 9 HW
epoch 10 HW
epoch 11 HW
epoch 12 HW
epoch 13 HW
epoch 14 HW
epoch 15 HW
epoch 16 HW
epoch 17 HW
epoch 18 HW
epoch 19 HW
epoch 20 HW
epoch 21 HW
epoch 22 HW
epoch 23 HW
epoch 24 HW
epoch 25 HW
epoch 26 HW
epoch 27 HW
epoch 28 HW
epoch 29 HW
epoch 30 HW
epoch 31 HW
epoch 32 HW
epoch 33 HW
epoch 34 HW
epoch 35 HW
epoch 36 HW
epoch 37 HW
epoch 38 HW
epoch 39 HW
epoch 40 HW
epoch 41 HW
epoch 42 HW
epoch 43 HW
epoch 44 HW
epoch 45 HW
epoch 46 HW
epoch 47 HW
epoch 48 HW
epoch 49 HW
epoch 50 HW
epoch 51 HW
epoch 52 HW
epoch 53 HW
epoch 54 HW
epoch 55 HW
epoch 56 HW
epoch 57 HW
epoch 58 HW
epoch 59 HW
epoch 60 HW
epoch 61 HW
epoch 62 HW
epoch 63 HW
epoch 64 HW
epoch 65 HW
epoch 66 HW
epoch 67 HW
epoch 68 HW
epoch 69 HW
epoch 70 HW
epoch 71 HW
epoch 72 HW
epoch 73 HW
epoch 74 HW
epoch 75 HW
epoch 76 HW
epoch 77 HW
epoch 78 HW
epoch 79 HW
epoch 80 HW
epoch 81 HW
epoch 82 HW
epoch 83 HW
epoch 84 HW
epoch 85 HW
epoch 86 HW
epoch 87 HW
epoch 88 HW
epoch 89 HW
epoch 90 HW
epoch 91 HW
epoch 92 HW
epoch 93 HW
epoch 94 HW
epoch 95 HW
epoch 96 HW
epoch 97 HW
epoch 98 HW
epoch 99 HW
epoch 100 HW
epoch 101 HW
epoch 102 HW
epoch 103 HW
epoch 104 HW
epoch 105 HW
epoch 106 HW
epoch 107 HW
epoch 108 HW
epoch 109 HW
epoch 110 HW
epoch 111 HW
epoch 112 HW
epoch 113 HW
epoch 114 HW
epoch 115 HW
epoch 116 HW
epoch 117 HW
epoch 118 HW
epoch 119 HW
epoch 120 HW
epoch 121 HW
epoch 122 HW
epoch 123 HW
epoch 124 HW
epoch 125 HW
epoch 126 HW
epoch 127 HW
epoch 128 HW
epoch 129 HW
epoch 130 HW
epoch 131 HW
epoch 132 HW
epoch 133 HW
epoch 134 HW
epoch 135 HW
epoch 136 HW
epoch 137 HW
epoch 138 HW
epoch 139 HW
epoch 140 HW
epoch 141 HW
epoch 142 HW
epoch 143 HW
epoch 144 HW
epoch 145 HW
epoch 146 HW
epoch 147 -SW- ( DequantizeLinear )
====================================================================================
Generated files (5)
------------------------------------------------------------------------------------
C:\ST\STEdgeAI\2.2\Utilities\windows\st_ai_output\arcface_halfint8_OE_3_3_0.onnx
C:\ST\STEdgeAI\2.2\Utilities\windows\st_ai_output\arcface_halfint8_OE_3_3_0_Q.json
C:\ST\STEdgeAI\2.2\Utilities\windows\st_ai_output\network.c
C:\ST\STEdgeAI\2.2\Utilities\windows\st_ai_output\network_atonbuf.xSPI2.raw
C:\ST\STEdgeAI\2.2\Utilities\windows\st_ai_output\network.h
Creating txt report file C:\ST\STEdgeAI\2.2\Utilities\windows\st_ai_output\network_generate_report.txt
elapsed time (generate): 60.889s
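Comparing the two logs line by line, the internal pools (flexMEM through npuRAM6) are defined identically, but the external pools are not. The sketch below makes that difference explicit; the bounds are copied by hand from the two "Memory usage information" sections, and the helper name `describe` is mine. I cannot tell from the logs alone whether this is related to the latency gap.

```python
# Pool bounds (start, end) copied from the "Memory usage information"
# sections of the two logs. Only the external pools are listed, since
# the internal pools are identical in both logs.
cloud_pools = {
    "octoFlash": (0x71000000, 0x78000000),
    "hyperRAM":  (0x90000000, 0x92000000),
}
local_pools = {
    "octoFlash": (0x70580000, 0x72780000),
    "hyperRAM":  (0x90000000, 0x91000000),
}

MIB = 1024 * 1024


def describe(pools):
    """Map each pool name to (base address, size in MiB)."""
    return {name: (start, (end - start) / MIB)
            for name, (start, end) in pools.items()}


cloud = describe(cloud_pools)
local = describe(local_pools)

for name in cloud:
    if cloud[name] != local[name]:
        c_base, c_size = cloud[name]
        l_base, l_size = local[name]
        print(f"{name}: cloud base=0x{c_base:08X} size={c_size:.0f} MiB, "
              f"local base=0x{l_base:08X} size={l_size:.0f} MiB")
```

This reports octoFlash as 112 MiB at 0x71000000 in the cloud build versus 34 MiB at 0x70580000 locally, and hyperRAM as 32 MiB versus 16 MiB. The amount of memory actually used in each pool is the same in both builds.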