Hi,
Could you share documentation or examples for running quantized foundation models (e.g., Google Gemma) on the STM32MP257F-DK, first in Python and then in C/C++, using the STM32MP2 NPU? I've included a rough sketch below of what I'm aiming for on the Python side. Specifically:
Does the STM32MP2 NPU support transformer-based architectures, or is it limited to CNNs (like the STM32N6)?
Which inference frameworks are supported for GenAI on this platform? Has ST ported llama.cpp to this NPU?
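To make the question concrete, here is a minimal sketch of the kind of Python inference I'm hoping to run, based on the TensorFlow Lite runtime plus an external NPU delegate, which is my understanding of how X-LINUX-AI exposes the STM32MP2 NPU. The delegate path and the model file name are my assumptions, so please correct anything that's off:

```python
# Sketch of what I'd like to run on the STM32MP257F-DK.
# Assumes tflite_runtime from X-LINUX-AI and the VX (NPU) external
# delegate; the delegate path below is a guess on my part.
import numpy as np
import tflite_runtime.interpreter as tflite

# Load the NPU delegate (path is an assumption -- please correct me)
delegate = tflite.load_delegate("/usr/lib/libvx_delegate.so.2")

# Quantized model exported to .tflite (hypothetical file name)
interpreter = tflite.Interpreter(
    model_path="gemma_quantized.tflite",
    experimental_delegates=[delegate],
)
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Dummy input just to confirm inference actually dispatches to the NPU
dummy = np.zeros(inp["shape"], dtype=inp["dtype"])
interpreter.set_tensor(inp["index"], dummy)
interpreter.invoke()
print(interpreter.get_tensor(out["index"]))
```

If transformer ops fall back to the CPU with this approach (as they would on a CNN-only accelerator), I'd like to know before investing in the C/C++ port.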
Sorry, I couldn't find the required info on the STM32 MPU wiki pages.
Thanks!