2021-04-07 06:47 PM
Hello.
I want to know whether the STM32 Cortex-M7 series has a TPU (Tensor Processing Unit) for hardware acceleration. I heard that a TPU has a systolic array, which is a network of processors responsible for performing computations and passing the results across the system. It consists of a large number of processing elements (PEs) arranged in arrays. These arrays have a high degree of parallelism and are well suited to parallel computing.
A typical Tensor Processing Unit (TPU) has two systolic arrays of size 128×128, so I think it can reduce cycles/MACC.
But the X-CUBE-AI documents say it is difficult to provide an accurate off-line number of CPU cycles/MACC; a rough estimate for a 32-bit floating-point C model is ~6 cycles/MACC on the Arm Cortex-M7.
I think the STM32 Cortex-M7 is just a CPU, not a hardware accelerator. Is that right?
I would like a clear explanation of this issue.
Best regards.
2021-04-07 07:37 PM
No
2021-04-08 07:00 PM
Thank you for the answer. The manuals are vast. If you don't mind, can you tell me where to look?
2021-04-08 07:11 PM
Doesn't Figure 1 diagram the core and the attached optional units? ITM, DWT, ETM, etc.
2021-04-09 01:54 AM
I see that the STM32 Cortex-M7 core doesn't have a TPU and also has far fewer ALUs and MAC units than a GPU, so it can't be fast at parallel computation. But I can't understand why it takes 6–10 cycles/MACC with 32-bit floating-point data types, even though the Cortex-M7 core uses a 6-stage pipeline. Why is it so slow?