[X-CUBE-AI] Do the STM32 F/H series (Arm Cortex-M7) have a TPU (tensor processing unit) for hardware acceleration?

Yjang.1
Associate II

Hello.

I want to know whether the STM32 Cortex-M7 series has a TPU (tensor processing unit) for hardware acceleration. As I understand it, a TPU contains a systolic array: a network of processing elements (PEs) arranged in a grid, each of which performs a computation and passes its result on to its neighbours, as illustrated in the attached image. These arrays have a high degree of parallelism and are well suited to parallel computing.
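To make the "compute and pass along" idea concrete, here is a minimal Python sketch of one common arrangement, an output-stationary systolic array computing a matrix product. It is an illustration under stated assumptions, not a description of how any particular TPU is built: the function name `systolic_matmul` and the skewed input schedule are choices made for this example only.

```python
def systolic_matmul(A, B):
    """Simulate an output-stationary systolic array computing C = A x B.

    PE(i, j) keeps the running sum for C[i][j]. Values of A enter at the
    left edge and move one PE to the right per cycle; values of B enter at
    the top edge and move one PE down per cycle. Row i of A and column j
    of B are delayed by i and j cycles so matching operands meet in time.
    """
    n, k_dim, m = len(A), len(B), len(B[0])
    acc = [[0] * m for _ in range(n)]    # accumulator inside each PE
    a_reg = [[0] * m for _ in range(n)]  # operand each PE forwards to the right
    b_reg = [[0] * m for _ in range(n)]  # operand each PE forwards downwards
    cycles = n + m + k_dim - 2           # cycles until the last PE is done

    for t in range(cycles):
        # Visit PEs from bottom-right to top-left so that each PE still
        # sees the value its neighbour held during the previous cycle.
        for i in reversed(range(n)):
            for j in reversed(range(m)):
                if j == 0:
                    k = t - i
                    a_in = A[i][k] if 0 <= k < k_dim else 0
                else:
                    a_in = a_reg[i][j - 1]
                if i == 0:
                    k = t - j
                    b_in = B[k][j] if 0 <= k < k_dim else 0
                else:
                    b_in = b_reg[i - 1][j]
                acc[i][j] += a_in * b_in  # one MACC per PE per cycle
                a_reg[i][j] = a_in
                b_reg[i][j] = b_in
    return acc, cycles


C, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(C, cycles)
```

Every PE performs one MACC per cycle, so an N×N grid does up to N² MACCs per cycle once the pipeline is full. A single CPU core has no such grid, which is the gap the question is about.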

A typical Tensor Processing Unit (TPU) has two systolic arrays of size 128×128, so I think it can reduce the number of cycles per MACC.

However, the X-CUBE-AI documentation says that it is difficult to provide an accurate number of CPU cycles/MACC offline, and gives only a rough estimate of ~6 cycles/MACC for a 32-bit floating-point C model on an Arm Cortex-M7.
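As a rough back-of-the-envelope sketch, the ~6 cycles/MACC figure can be turned into an inference-time estimate and compared with the peak per-cycle throughput of two 128×128 arrays. The MACC count and clock frequency below are hypothetical placeholders, not values from any datasheet or report; substitute the MACC count reported for your own model and your actual core clock.

```python
def estimate_inference(macc_count, cycles_per_macc, clock_hz):
    """Return (total CPU cycles, seconds) for one inference."""
    cycles = macc_count * cycles_per_macc
    return cycles, cycles / clock_hz


# Hypothetical inputs, chosen only for illustration.
MACC_COUNT = 1_000_000    # assumed model complexity in MACCs
CLOCK_HZ = 400_000_000    # assumed core clock (400 MHz)
CYCLES_PER_MACC = 6       # rough float32 estimate quoted above

cycles, seconds = estimate_inference(MACC_COUNT, CYCLES_PER_MACC, CLOCK_HZ)
print(f"{cycles} cycles, about {seconds * 1e3:.1f} ms per inference")

# Peak MACCs per cycle: two 128x128 systolic arrays vs. one CPU core.
systolic_maccs_per_cycle = 2 * 128 * 128
cpu_maccs_per_cycle = 1 / CYCLES_PER_MACC
print(f"systolic peak: {systolic_maccs_per_cycle} MACC/cycle")
print(f"CPU estimate:  {cpu_maccs_per_cycle:.3f} MACC/cycle")
```

The comparison is per clock cycle and at peak utilisation only; it ignores clock-rate differences and memory bandwidth, so it shows the order of magnitude rather than a measured speed-up.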

So I think the STM32 Cortex-M7 is just a CPU, not a hardware accelerator. Is that right?

I would like a clear explanation of this issue.

Best regards.

0693W000008ywRRQAY.jpg 

4 REPLIES

Thank you for the answer. The manuals are vast. If you don't mind, can you tell me where to look?

Doesn't Figure 1 diagram the core and the attached optional units? ITM, DWT, ETM, etc.

0693W000008ywRWQAY.jpg

I see that the STM32 Cortex-M7 core doesn't have a TPU, and that it also has far fewer ALUs and MAC units than a GPU, so it can't be fast at parallel computation. But I can't understand why it takes 6~10 cycles/MACC with 32-bit floating-point data types even though the Cortex-M7 core has a 6-stage pipeline. Why is it so slow?