NPU and Chiplets: Where ST Could Go and What the MCU Market Really Needs

MKNY · 2025-09-18

Hi everyone,

First off, I want to thank STMicroelectronics for delivering a great product — the STM32N6. This is the first STM32 MCU with a built-in Neural-ART Accelerator (NPU), and it really changes the game for edge AI:

The NPU is a dedicated inference engine running up to ~1 GHz, delivering ~600 GOPS.

It enables on-chip AI processing without the need for external accelerators. Perfect for IoT, embedded systems, and power-sensitive applications.
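For a feel of what those two numbers mean, here is a rough back-of-envelope sketch. It assumes the usual convention that one multiply-accumulate (MAC) counts as 2 operations, and the 300-million-MAC model is purely hypothetical. The result is a best-case bound: real latency will be higher because of memory traffic and imperfect utilization.

```python
# Back-of-envelope peak-throughput arithmetic for the figures quoted above.
NPU_CLOCK_HZ = 1.0e9   # ~1 GHz, as quoted in the post
NPU_PEAK_OPS = 600e9   # ~600 GOPS, as quoted in the post
OPS_PER_MAC = 2        # assumption: one multiply-accumulate counted as 2 ops


def ops_per_cycle(peak_ops: float, clock_hz: float) -> float:
    """Operations the engine must complete per clock to reach its peak rate."""
    return peak_ops / clock_hz


def min_inference_time_ms(model_macs: float, peak_ops: float = NPU_PEAK_OPS) -> float:
    """Lower bound on latency: assumes 100% utilization and no memory stalls."""
    return model_macs * OPS_PER_MAC / peak_ops * 1e3


if __name__ == "__main__":
    print("ops per cycle:", ops_per_cycle(NPU_PEAK_OPS, NPU_CLOCK_HZ))
    # Hypothetical model needing 300 million MACs per inference.
    print("best-case latency (ms):", min_inference_time_ms(300e6))
```

In other words, the quoted peak works out to about 600 operations per clock, and a hypothetical 300-MMAC network could not finish faster than roughly 1 ms per inference.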

Chiplet architectures and a new role for the NPU

Sooner or later, MCUs will move toward chiplet-based designs, where multiple dies are packaged together. In such architectures, the NPU could do more than accelerate AI — it could act as a data router:

With its stream-processing engine, AXI-4 interfaces, and DMA with 2D addressing and scatter-gather, the NPU can flexibly manage data flows.

Imagine a package with two MCUs, each with an NPU: one NPU could handle routing (redirecting interrupts or data streams to a less loaded CPU), while the other NPU remains dedicated to AI tasks.
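To illustrate the routing idea, here is a minimal software sketch of "send each event to the less loaded CPU". Everything in it is hypothetical: the `Core` and `Router` names, the queue-length load metric, and the tie-breaking rule are my own assumptions, not anything the Neural-ART hardware or ST's tools expose. In the scenario above, this decision would be made in hardware rather than in Python.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    """Hypothetical CPU model: load is simply the number of queued events."""
    name: str
    pending: list = field(default_factory=list)

    @property
    def load(self) -> int:
        return len(self.pending)


class Router:
    """Stand-in for the routing role: forward each event to the least-loaded core."""

    def __init__(self, cores):
        if not cores:
            raise ValueError("need at least one core")
        self.cores = list(cores)

    def dispatch(self, event) -> Core:
        # min() returns the first minimum, so ties go to the core listed first.
        target = min(self.cores, key=lambda c: c.load)
        target.pending.append(event)
        return target


if __name__ == "__main__":
    cpu0, cpu1 = Core("cpu0"), Core("cpu1")
    cpu0.pending.extend(["busy"] * 3)  # cpu0 starts with a backlog
    router = Router([cpu0, cpu1])
    for irq in ("uart_rx", "dma_done", "timer"):
        print(irq, "->", router.dispatch(irq).name)
```

A real router would need a better load signal than queue length (interrupt priority, deadlines, cache affinity), but the decision loop stays this simple.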

Advantages:

Better yield and lower scrap rates, since smaller dies yield better and a faulty chiplet can be screened out before assembly instead of scrapping an entire large die.

Flexible configurations — need more AI horsepower? Add another NPU chiplet.

Lower overhead — hardware routing reduces software bottlenecks and buffering.
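The yield argument can be put into numbers with the simple Poisson yield model, Y = exp(-A * D0), where A is die area and D0 is defect density. The sketch below is an illustration under stated assumptions: dies are tested before assembly (known-good die), and packaging yield, die-to-die interface area, and test cost are all ignored. The area and defect-density values are made up for the example.

```python
import math


def poisson_yield(area_cm2: float, defect_density_per_cm2: float) -> float:
    """Fraction of dies with zero defects under the Poisson model."""
    return math.exp(-area_cm2 * defect_density_per_cm2)


def silicon_per_good_system(total_area_cm2: float, n_dies: int,
                            defect_density_per_cm2: float) -> float:
    """Wafer area consumed per working system.

    Assumes the logic is split evenly across n_dies and that bad dies are
    discarded individually before packaging.
    """
    die_area = total_area_cm2 / n_dies
    return n_dies * die_area / poisson_yield(die_area, defect_density_per_cm2)


if __name__ == "__main__":
    AREA = 1.0  # cm^2 of total logic, hypothetical
    D0 = 0.5    # defects per cm^2, hypothetical
    for n in (1, 2, 4):
        cost = silicon_per_good_system(AREA, n, D0)
        print(f"{n} die(s): {cost:.3f} cm^2 of silicon per good system")
```

The more the design is split, the less silicon is wasted per working system. How large the gain is depends strongly on die area and defect density.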

The “blob” issue and why trust matters

The STM32N6 on its own is already a very strong product. What holds it back for many developers is its reliance on closed binary blobs.

I understand the reasoning — security, IP protection. But what kind of “security” are we talking about? In the MCU world, the biggest risk isn’t a hacker breaking in, it’s a developer making mistakes because of missing or incomplete documentation.

Without sufficient docs, developers are forced to guess or rely on unreliable sources — and that often leads to errors far more dangerous than what blobs are meant to prevent.

Developer trust and loyalty are built on openness and clarity. A “we built it, take it as is” approach doesn’t lead to long-term success.

ST also offers MPU lines (STM32MP1/MP2), which pair Cortex-A cores with external DDR. Designs built around them sit closer to single-board computers...

But the single-board-computer approach has its downsides:

External DDR brings complex routing and signal-integrity tuning, higher power consumption, and more points of failure.

More PCB layers, a more expensive BOM, trickier bring-up and initialization.

This makes sense where you actually need an OS, multimedia, or large RAM.

But when the task is control, IoT, real-time response, or low power, the MCU approach is superior:

Everything important is integrated — SRAM, Flash, peripherals.

Simpler schematics, fewer components, fewer board-level mistakes.

Lower power, higher reliability.

ST’s strength is in MCUs. The STM32N6 is a step in the right direction — bringing NPU integration and real on-chip AI to the MCU space.

But the future should be about two things: chiplet architectures for scalability and, most importantly, open, complete documentation that developers can rely on.

At the end of the day, all we want is to buy a good product. And STM32 remains the area where ST has its strongest position.

MKNY · 2025-09-18

Thank you for your understanding.