Quantized Gemma Model Inference on STM32MP257F-DK Board

ramkumarkoppu
Senior

Hi,

Could you share documentation or examples for running quantized foundation models (e.g. Google Gemma) on the STM32MP257F-DK, first in Python and then in C/C++ using the STM32MP2 NPU? Specifically:

  • Does the STM32MP2 NPU support transformer-based architectures, or is it limited to CNNs (like the STM32N6)?

  • Which inference frameworks are supported for GenAI on this platform? Has ST ported llama.cpp to this NPU?
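For reference, by "quantized" I mean standard INT8 weight quantization along the lines of the pure-Python sketch below. This is only an illustration of the scheme, not ST tooling; real toolchains (e.g. llama.cpp's GGUF formats or ST's Edge AI tools) use per-block or per-channel variants:

```python
def quantize_int8(weights):
    """Affine INT8 quantization: map floats to [-128, 127] with one scale/zero-point."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 if all weights are equal
    zero_point = round(-lo / scale) - 128
    q = [max(-128, min(127, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize_int8(q, scale, zero_point):
    """Recover approximate float weights from the INT8 codes."""
    return [(qi - zero_point) * scale for qi in q]

w = [-0.51, 0.0, 0.25, 1.0]
q, s, z = quantize_int8(w)
w_hat = dequantize_int8(q, s, z)
# Reconstruction error is bounded by half a quantization step (scale / 2).
assert all(abs(a - b) <= s / 2 + 1e-9 for a, b in zip(w, w_hat))
```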

Sorry, I couldn't find the relevant information on the STM32 MPU wiki pages.

Thanks!

0 Replies