Quantized Gemma Model Inference on STM32MP257F-DK ...

Hi,

Could you share documentation or examples for running quantized foundational models (e.g. Google Gemma) on the STM32MP257F-DK—first in Python, then in C/C++ using the STM32MP2 NPU? Specifically:

Does the STM32MP2 NPU support transformer-based architectures, or is it limited to CNNs (like the STM32N6)?
Which inference frameworks are supported for GenAI on this platform? does this NPU ported by ST for llama.cpp ?

Sorry, couldn't find required info from STM32 MPU wiki pages.

Thanks!

0 REPLIES 0

Quantized Gemma Model Inference on STM32MP257F-DK Board