
Deploying Multiple X-CUBE-AI Models on STM32: Input Sharing & API Management Best Practices?

Stone_chan
Associate II

Hello everyone,
I’m currently deploying multiple models on an STM32 using X-CUBE-AI (for example: an MLP first for a coarse “Gas / Not Gas” classification, then a second model for fine-grained gas class identification). I’d like to ask about two main topics:

 

Question 1: If multiple models have the same input shape/size, can they “share the same input buffer”?

  • Both models have identical input dimensions, data type (float32), and layout (e.g. 1x60 vector).

  • I’d like to point both models’ ai_buffer.data to the same float in[60], then call ai_*_run sequentially (never in parallel).

  • Is this approach safe/supported in X-CUBE-AI? Could any model overwrite the input buffer during inference? (Theoretically it shouldn’t, but I’d like to confirm with ST/others.)

 

Question 2: What is the best way to use and manage APIs with multiple models?

When two (or more) models coexist on the same MCU:

A. Keep completely separate weights[] and activations[] buffers (fully independent per model), each with its own create/init/run/destroy.
B. Is it possible/recommended to share the activation memory (only when models run sequentially)? If yes, is this officially supported and safe?

Regarding ai_mnetwork (multi-model manager) and accessing the models’ native I/O descriptors:

  • My understanding: I can use ai_mnetwork_get_private_handle() to obtain each sub-model’s private handle, then call ai_<model>_inputs_get() / ai_<model>_outputs_get() to retrieve the native ai_buffer arrays, avoiding manual shape/format filling.

  • Is this the officially recommended workflow? Or is it better practice to use the report structure and manually fill shapes and formats?

 

 

Summary of What I’d Like to Confirm:

  1. Is input buffer sharing safe and supported when models are executed sequentially?

  2. For outputs with different formats (float, int, logits, one-hot), is it always recommended to rely on *_get_report() or *_outputs_get() instead of hardcoding?

  3. Can activations be shared safely between models if they are strictly serialized? Any official examples or caveats?

  4. Is using ai_mnetwork the recommended way to manage multiple models (vs. manually handling multiple sets of ai_buffer)?

Thanks a lot for any guidance, best practices, or references to official examples/docs!


 

5 REPLIES
Stone_chan
Associate II

Also, how should the second model be used? It would be helpful to have example code.

Hello @Stone_chan,

 

I transmitted your questions internally.

 

In terms of example code, the x-cube-n6-ai-hand-landmarks demo uses two models (hand detection + landmarks), so you can look at the source code of that application: https://www.st.com/en/development-tools/stm32n6-ai.html

 

In the meantime, I also think these documents provide some useful information:

https://stedgeai-dc.st.com/assets/embedded-docs/embedded_client_stai_api.html 

https://stedgeai-dc.st.com/assets/embedded-docs/stneuralart_programming_model.html#multiple-models-support 

 

Have a good day,

Julian

 


In order to give better visibility on the answered topics, please click on 'Accept as Solution' on the reply which solved your issue or answered your question.
jean-michel.d
ST Employee

Hello @Stone_chan,

Is this in the context where the deployed models are mapped onto the ST Neural ART accelerator (STM32N6 target), or on a "classical" Cortex-M core (MCU mode only)? I assume the MCU mode.

Concerning question 1:

- During the execution of a given model, in particular when the input/output buffers are allocated inside the activations buffer, they can be overwritten during inference (an overlay feature that optimizes RAM usage): they are used in place and may be reused to store intermediate results. When no input/output allocation option is used, the input buffers are not reused, but the application has no way to know at what point a buffer is no longer being read.

Concerning question 2:

- If the deployed models are executed sequentially (guaranteed by the application), the activations buffer can be shared safely (the activation buffer is treated as a scratch buffer). Weights (const data) are specific to each deployed model. The same approach can be applied to the input/output buffers.

Regarding ai_mnetwork (the multi-model manager): this module is not really designed/supported for this purpose. It is only used by the aiSystemPerf/aiValidation test applications to support multiple models (sequential mode only).

Jean-Michel

Hi jean-michel.d,

I am now using the ApplicationTemplate mode; the generated code includes the ai_network_entry_t related structures and APIs, and there are no problems with execution.

"this module is not really designed/supported for this purpose" => So how do you recommend running model A before model B?

Hi Julian E.,

Thank you for your reply; I will go and learn more from the relevant information.

Regards,