2025-07-10 11:30 AM - last edited on 2025-07-11 12:50 AM by mƎALLEm
Hi,
With the new 2.2 STEdgeAI update I have run some neural network model validations on the CM55 of the NUCLEO-N657X0-Q board. I have also used the STLINK-V3PWR to measure power consumption during validation. What I wonder is whether there is a detailed explanation of what the validation command does during its runtime, more specifically what happens on the target board.
Attached is a .stpm of an example validation using my custom model with 10 data samples. As can be seen, 10 spikes are recorded, with some activity between them. How should this activity be classified? Is an inference the time from the start of one spike until the next spike, or is an inference only the spike itself, with the time between spikes being some other activity?
Thanks in advance,
Brian.
2025-07-15 6:39 AM
Hello @brianon ,
To better understand the graphs, it might be useful to increase the sampling rate so that shorter patterns become visible.
Most likely, the spikes you are seeing are the inferences being executed on your 10 samples. On the whole, when the NPU is "working", an increase in power consumption is noted. In addition, depending on the current "epoch" of the inference being executed, hardware units not used during that epoch are clock-gated (and consume less), which is why you can see variations within the "spikes" you mention.
The activity between the spikes is most likely UART traffic exchanged with the host machine. For each inference:
- the input tensor(s) are sent from the host to the target,
- the inference runs on the target,
- the output tensor(s) are sent back from the target to the host.
The two transfer steps are rather time-hungry because data transmission is far slower than crunching numbers on the target MCU.
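One way to check this interpretation is to segment the measured trace yourself. Below is a minimal sketch (not part of the STEdgeAI tooling) that splits a list of (time, current) samples, such as data exported from the .stpm trace, into above-threshold "spike" regions and below-threshold gaps; the threshold value and the synthetic trace are assumptions you would replace with your own data.

```python
# Sketch: classify a power trace into "spike" (candidate inference) and
# "gap" (candidate UART activity) regions by simple thresholding.
# Assumption: samples is a list of (t_seconds, current_mA) pairs exported
# from the trace; the 20 mA threshold is a placeholder to tune.

def segment_trace(samples, threshold_mA=20.0):
    """Return (spikes, gaps): lists of (start, end) intervals."""
    spikes, gaps = [], []
    current_kind, start = None, None
    for t, i in samples:
        kind = "spike" if i >= threshold_mA else "gap"
        if kind != current_kind:
            if current_kind == "spike":
                spikes.append((start, t))
            elif current_kind == "gap":
                gaps.append((start, t))
            current_kind, start = kind, t
    # Close the last open interval at the final timestamp.
    end_t = samples[-1][0]
    (spikes if current_kind == "spike" else gaps).append((start, end_t))
    return spikes, gaps

# Synthetic example: two 2-second "inferences" separated by idle gaps.
trace = [(t / 10, 50.0 if (t // 20) % 2 == 1 else 5.0) for t in range(100)]
spikes, gaps = segment_trace(trace)
print(len(spikes), [round(e - s, 1) for s, e in spikes])
```

Comparing the resulting spike count and durations against the report's per-sample timings would show whether each spike really is one inference.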
2025-07-16 9:50 AM
Hi @SlothGrill ,
Thank you for the answer. While this clarifies things a bit, I'm still slightly unsure how to interpret the results. To illustrate, attached is the graph of a 5-layer model instead (still using the STEdgeAI validation command with 10 samples). As you can see, there are 10 clusters of 5 spikes, each spike corresponding to one layer's activity. Also attached is the report produced by the validation.
The report states an average time per sample of 73.88 ms, yet on the graph the first spike alone appears to be about that long, as if the remaining layers were ignored (see A in the picture below). The average time per sample including callbacks, on the other hand, is 1380.10 ms, which seems to match the time from the first spike in the 5-spike series until the end of the noisier activity (see B below). Am I correct in assuming that B is the "DEVICE duration" from the validation, which only includes model-related callbacks, while C is the UART/USB communication? Or does the "DEVICE duration" also include UART/USB communication, making it a combination of B and C? If B is the "DEVICE duration", would it then be more appropriate to count B as a single inference rather than A?
Thanks in advance,
Brian
2025-07-17 5:12 AM - edited 2025-07-17 5:27 AM
Hello !
Sorry, I was mistaken: I thought you were using the Neural-Art accelerator (so please ignore my remarks about the NPU above).
So, in your graph, if you say A is the inference, then I would say that A corresponds to the reported 73.88 ms average time per sample (the pure on-target inference time, without callbacks).
For the 1380.10 ms I have to double-check, but it should be the sum of sending the input tensor(s), running the inference (including sending intermediate data over UART), and sending back the output tensor(s).
Anyway, if you want a bit more detail, you can call checker.py with the --debug option; you will see an output with the UART messages (and a brief description of what is going on).
I will get back to you when I understand more precisely what is going on there.
2025-07-17 9:33 AM
Thank you for the answer, looking forward to hearing more!
This is especially confusing because when I run the validation on the NPU with the same model, the difference between the durations including and excluding callbacks is almost negligible: excluding callbacks the duration was 16.89 ms, while including callbacks it only reached 17.10 ms. Compare this to the CM55 validation, where it went from 73.88 ms to 1380.10 ms.
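The contrast can be made concrete with a quick back-of-the-envelope calculation using the figures quoted above (interpreting the gap as callback/communication overhead is an assumption, pending confirmation):

```python
# Relative callback overhead = (incl. callbacks - excl. callbacks) / excl. callbacks.
# The figures are the averages quoted in this thread.
def callback_overhead(excl_ms, incl_ms):
    return (incl_ms - excl_ms) / excl_ms

cm55_overhead = callback_overhead(73.88, 1380.10)  # CM55 validation
npu_overhead = callback_overhead(16.89, 17.10)     # Neural-Art (NPU) validation
print(f"CM55: {cm55_overhead:.1%} overhead, NPU: {npu_overhead:.1%} overhead")
```

The CM55 run spends well over an order of magnitude more time outside the pure inference than in it, while on the NPU the overhead is only about 1%.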
2025-07-30 9:30 AM
Hi @SlothGrill,
Do you have any updates on what exactly happens during validation and what the difference is with and without callbacks? And why is it so different between the CM55 validation and the NPU validation?
2025-07-31 5:25 AM - edited 2025-07-31 5:28 AM
Hello @brianon ,
sorry for the delay.
So, to recap what was explained above: the spikes correspond to inference activity, and the time between them is dominated by UART transfers.
What is strange in your data is that you say you have made a model with "only 5 layers" (and shared its .stpm plus the validation report). In fact, the validation report shows around 700 operators implemented in software, which is not even close to "5 layers".
Could you please state what your "5 layers" are (it looks like they are not "standard layers", and as such the tool does not generate "only 5 nodes" in C :) )? Would you mind sharing the file you use as input for stedgeai?
From what we can understand by comparing the report and the .stpm: most likely, "B" in your graph contains the inference plus the UART transfers around it, but it is nearly impossible to untangle pure CPU inference time from UART transfer time because the sampling rate of the measurements is far too low (i.e. you are trying to observe events that last 1 ms while sampling one point every 10 ms).
As proposed above, could you try the same experiment with a higher sampling rate (i.e. high enough to resolve the event durations you expect to see)?
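As a quick sanity check, a common rule of thumb is to take several samples per event you want to resolve; the factor of 10 below is an assumption on my part, not an STLINK-V3PWR requirement:

```python
# Rough minimum sampling rate needed to resolve events of a given duration.
# Rule of thumb (an assumption here): about 10 samples per event.
def min_sampling_rate_hz(event_duration_s, samples_per_event=10):
    return samples_per_event / event_duration_s

# Events lasting ~1 ms would need a rate in the kHz range, far above
# the 100 Hz implied by one sample every 10 ms.
rate = min_sampling_rate_hz(1e-3)
print(f"{rate:.0f} Hz")
```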
For the last point: when using Neural-Art, it is possible that the model is better supported by the tool for the NPU than for the CPU. It may then end up with far fewer epochs than the number of nodes you have here. Thus, less time is spent sending data over UART, which is why the durations with and without callbacks are so much closer.
Let us know if it makes sense...
Best regards.