
Quantized Gemma Model Inference on STM32MP257F-DK Board

ramkumarkoppu
Senior

Hi,

Could you share documentation or examples for running quantized foundation models (e.g., Google Gemma) on the STM32MP257F-DK, first in Python and then in C/C++ using the STM32MP2 NPU? Specifically:

  • Does the STM32MP2 NPU support transformer-based architectures, or is it limited to CNNs (like the STM32N6)?

  • Which inference frameworks are supported for GenAI on this platform? Has ST ported llama.cpp to this NPU?

Sorry, I couldn't find the required information on the STM32 MPU wiki pages.

Thanks!

4 REPLIES
JS_PWS
Associate

Hello

We are currently evaluating hardware options and have the same question. Can somebody from ST answer it here?


Thank you and best regards
Jan

Steven-LIN
Associate III

Hello,

I have the same question. Is it possible to run LLMs on the STM32MP2 series?

Additionally, what is the expected performance/inference efficiency?

Thanks!

VABRI
ST Employee

Hello, 

The NPU architecture of the STM32MP2 series does not support transformer-based models.
LLMs can, however, be run on the CPU.

The frameworks supported by the X-LINUX-AI expansion package are listed on this wiki page:
https://wiki.st.com/stm32mpu/wiki/Category:X-LINUX-AI_expansion_package
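
Not an official ST example, but as a rough sketch of the Python path asked about above: assuming llama.cpp (via its llama-cpp-python binding) has been built for the board's aarch64 Linux, and assuming a 4-bit GGUF export of Gemma is present on the filesystem (the model file name below is hypothetical), CPU-only inference would look roughly like this:

# Hypothetical sketch: CPU-only inference of a quantized Gemma GGUF
# via the llama-cpp-python binding. The model file name is an assumption.
from llama_cpp import Llama

llm = Llama(
    model_path="gemma-2b-it-q4_k_m.gguf",  # assumed 4-bit GGUF on the board
    n_ctx=512,    # keep the context small to fit the board's RAM
    n_threads=2,  # the STM32MP25 has two Cortex-A35 cores
)

out = llm("Explain what an NPU is in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])

Everything here runs on the Cortex-A35 cores; the NPU is not involved.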

BR


An example of an LLM running locally on the STM32MP257F-EV1 (as said above, using the CPU only):
https://www.linkedin.com/posts/danilopietropau_another-great-example-of-llm-on-stm32mp2-activity-7293222309333495809-Qe0d
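
To address the performance question raised earlier, a quick-and-dirty throughput check on the CPU could look like the sketch below (same hypothetical llama-cpp-python setup and model file as above; actual numbers depend heavily on model size, quantization, context length and thread count):

# Rough tokens-per-second measurement on the Cortex-A35 CPU.
# The timing includes prompt evaluation, so treat it as a lower bound.
import time
from llama_cpp import Llama

llm = Llama(model_path="gemma-2b-it-q4_k_m.gguf", n_ctx=512, n_threads=2)

t0 = time.perf_counter()
out = llm("Write a haiku about microprocessors.", max_tokens=64)
dt = time.perf_counter() - t0

generated = out["usage"]["completion_tokens"]
print(f"~{generated / dt:.2f} tokens/s")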

Regards.
