2025-06-18 6:49 AM
Hello,
I'm interested in understanding how much an NPU can accelerate a model. I reviewed the documentation for Neural-ART and found that 38 TFLite operators are supported by the hardware:
ABS, ADD, AVERAGE_POOL_2D, BATCH_MATMUL, CAST, CEIL, CONCATENATION, CONV_2D, DEPTHWISE_CONV_2D, EQUAL, EXPAND_DIMS, FULLY_CONNECTED, HARD_SWISH, LEAKY_RELU, LOGICAL_AND, LOGICAL_NOT, LOGICAL_OR, LOGISTIC, MAX_POOL_2D, MUL, PACK, PAD, PRELU, (RE)QUANTIZE, RELU, RELU6, RESHAPE, RESIZE_NEAREST_NEIGHBOR (with coordinate_transformation_mode=asymmetric and nearest_mode=floor), SPACE_TO_DEPTH (with same input/output quantization), SPLIT, SPLIT_V, STRIDED_SLICE, SQUEEZE, SUB, TANH, TRANSPOSE, TRANSPOSE_CONV, UNPACK.
However, each operator has a different level of computational complexity, so the overall speedup from offloading a model to the NPU depends on its operator mix. For example, a model dominated by CONV_2D layers will likely benefit more from NPU acceleration than one dominated by element-wise ADD operations.
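As a rough way to frame what I'm asking, here is a minimal sketch of how I'm thinking about it: weight each layer by its MAC count and compute the fraction of a model's work that lands on NPU-supported operators. The supported-operator set is taken from the list above; the example layer profile and MAC numbers are placeholders I made up, not real benchmarks.

```python
# Operators the Neural-ART documentation lists as hardware-supported
# (QUANTIZE stands in for the "(RE)QUANTIZE" entry).
NPU_SUPPORTED = {
    "ABS", "ADD", "AVERAGE_POOL_2D", "BATCH_MATMUL", "CAST", "CEIL",
    "CONCATENATION", "CONV_2D", "DEPTHWISE_CONV_2D", "EQUAL", "EXPAND_DIMS",
    "FULLY_CONNECTED", "HARD_SWISH", "LEAKY_RELU", "LOGICAL_AND",
    "LOGICAL_NOT", "LOGICAL_OR", "LOGISTIC", "MAX_POOL_2D", "MUL", "PACK",
    "PAD", "PRELU", "QUANTIZE", "RELU", "RELU6", "RESHAPE",
    "RESIZE_NEAREST_NEIGHBOR", "SPACE_TO_DEPTH", "SPLIT", "SPLIT_V",
    "STRIDED_SLICE", "SQUEEZE", "SUB", "TANH", "TRANSPOSE",
    "TRANSPOSE_CONV", "UNPACK",
}

def npu_compute_fraction(layers):
    """layers: list of (op_name, mac_count) pairs.
    Returns the fraction of total MACs on NPU-supported operators."""
    total = sum(macs for _, macs in layers)
    on_npu = sum(macs for op, macs in layers if op in NPU_SUPPORTED)
    return on_npu / total if total else 0.0

# Placeholder profile: a small CNN dominated by convolutions.
example = [
    ("CONV_2D", 9_000_000),
    ("DEPTHWISE_CONV_2D", 1_200_000),
    ("ADD", 50_000),
    ("SOFTMAX", 10_000),  # not in the supported list -> CPU fallback
]
print(f"{npu_compute_fraction(example):.3f}")  # ~0.999 for this made-up profile
```

This only tells me how much compute *could* be offloaded; to turn it into an actual speedup estimate I would also need per-operator NPU-vs-CPU throughput numbers, which is exactly what I'm looking for.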
Do you have any documentation or benchmarks showing the computational intensity or NPU-vs-CPU acceleration ratio for each of these supported operators?