Appropriate board for real-time sound classification

AlbertG · 2024-09-06

Hello,

I was trying to do sound classification with a STEVAL-STWINBX1. Since using a NN directly on a microcontroller is impossible due to RAM constraints, I decided to compute a basic mel spectrogram (not full MFCC: I skip the DCT and log steps to speed up the process). Even after reducing the sample rate to around 16 kHz, this chip is far too slow: it needs 4+ seconds for every second of sound, which makes it impossible to use for real-time tasks (every step of the process is slow). And that's without the classifier, preprocessing only.
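For reference, the mel step described above (power spectrum reduced to mel bands, with no log and no DCT) can be sketched roughly as below. This is only an illustration under assumptions: the function name `mel_energies` and the `edges` table are hypothetical, the band edges are assumed to be precomputed offline as FFT bin indices, and the caller is assumed to keep every edge inside the power spectrum array.

```c
#include <stddef.h>

/* Hypothetical sketch: apply triangular mel filters to one power spectrum.
 * `edges` holds n_filters + 2 strictly increasing FFT bin indices,
 * assumed to be precomputed offline. Filter m rises from edges[m] to
 * edges[m + 1] and falls back to zero at edges[m + 2].
 * No log and no DCT are applied, matching the simplified pipeline. */
static void mel_energies(const float *power, const size_t *edges,
                         size_t n_filters, float *out)
{
    for (size_t m = 0; m < n_filters; ++m) {
        size_t lo = edges[m], mid = edges[m + 1], hi = edges[m + 2];
        float acc = 0.0f;

        /* Rising slope: weight goes from 0 at lo towards 1 at mid. */
        for (size_t k = lo; k < mid; ++k)
            acc += power[k] * (float)(k - lo) / (float)(mid - lo);

        /* Falling slope: weight goes from 1 at mid towards 0 at hi. */
        for (size_t k = mid; k < hi; ++k)
            acc += power[k] * (float)(hi - k) / (float)(hi - mid);

        out[m] = acc;
    }
}
```

Per frame this costs only a few multiply-adds per FFT bin, so in a pipeline like this the FFT and the windowing are usually the parts worth profiling first.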

I was wondering which chip is capable of doing an STFT fast enough for real-time use.

Thanks in advance,

Albert

P.S.: I am not sure this is the appropriate subforum. I apologize in advance if it is not.

liaifat85 · 2024-09-06

You can consider offloading some processing tasks to a more powerful companion chip, like an FPGA or a co-processor designed for signal processing. Another approach could be to preprocess the audio data on a more capable device (like a Raspberry Pi or a similar SBC) and then send it to the microcontroller for classification.

Chris21 · 2024-09-06

STM32H7 series is based on the 32-bit Arm Cortex®-M7 core, running at up to 600 MHz with up to 1.4 Mbytes of SRAM.

AScha.3 · 2024-09-06

Hi,

What is "sound classification"?

And if you do this on a PC with 8 GB of RAM and 4 to 16 cores at 3 GHz, don't expect to run it on any chip with some 200 KB of RAM and a single core at 200 MHz.

> what chip is capable of doing STFT fast enough to use in real time environments

Any CPU at 200 MHz with some KB of RAM can do this, but expect something more like a 1K FFT in 1 ms (at 16-bit fixed-point precision).

And you have to use routines optimized for the selected CPU.

See for example:

https://github.com/Stenzel/FFT4CM4F
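To make "16-bit fixed-point precision" concrete: on Cortex-M this usually means the Q15 format, where an `int16_t` value v represents v / 32768. A minimal sketch of Q15 conversion and multiplication follows. The helper names are hypothetical and are not taken from CMSIS-DSP or from the linked repository.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only. */

/* Convert a float in [-1, 1) to Q15, rounding to nearest and saturating. */
static int16_t float_to_q15(float x)
{
    float scaled = x * 32768.0f + (x >= 0.0f ? 0.5f : -0.5f);
    if (scaled > 32767.0f)  return INT16_MAX;
    if (scaled < -32768.0f) return INT16_MIN;
    return (int16_t)scaled;
}

/* Multiply two Q15 values: widen to 32 bits, rescale, saturate.
 * Division is used instead of a right shift so the result for negative
 * products does not depend on implementation-defined shift behaviour. */
static int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t p = ((int32_t)a * (int32_t)b) / 32768;
    if (p > INT16_MAX) p = INT16_MAX;   /* only hit by (-1) * (-1) */
    return (int16_t)p;
}
```

For example, `q15_mul(float_to_q15(0.5f), float_to_q15(0.5f))` gives 8192, which is 0.25 in Q15. Working in this format halves the memory per sample compared with `float` and lets optimized routines use the core's 16-bit SIMD instructions where available.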


AlbertG · 2024-09-09

> what is a "sound classification" ?

I grab sound from the onboard microphone, preprocess it, and run a tiny NN to classify what it is hearing into 4 different categories. The NN is very fast and reasonably accurate.

> And if you do this on a PC with 8GB ram and 4 ...16 cores at 3 GHz - dont expect to run it on any chip with some 200KB ram and 200MHz speed on one core.

ST says it is possible to run real-time sound classification on their AI microcontrollers, so at least some of their lineup should be able to do it.

> Any cpu at 200MHz and some KB ram can do this - but its more like an 1K FFT in 1ms (at 16bit fixed point precision).
> And you have to use optimized routines , for the selected cpu.
> see for example :
> https://github.com/Stenzel/FFT4CM4F

OK, so I just shouldn't use the CMSIS-DSP library directly? A 2048-point FFT with a hop size of 512 takes about 8 ms, which is really slow. That's what I was doing. I will check your link, thanks.
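To put that 8 ms in context, here is a small real-time budget calculation. It is a sketch under assumptions: it takes the roughly 16 kHz sample rate mentioned at the start of the thread and treats 8 ms as the cost of one FFT per hop. The function names are hypothetical.

```c
/* Hypothetical helpers: real-time budget for frame-based audio processing. */

/* Time available per frame in ms: a new frame is due every `hop_samples`. */
static double frame_budget_ms(double sample_rate_hz, unsigned hop_samples)
{
    return 1000.0 * (double)hop_samples / sample_rate_hz;
}

/* Fraction of real time consumed by a per-frame cost.
 * Values above 1.0 mean the processing cannot keep up with the input. */
static double realtime_load(double frame_cost_ms, double sample_rate_hz,
                            unsigned hop_samples)
{
    return frame_cost_ms / frame_budget_ms(sample_rate_hz, hop_samples);
}
```

With a hop of 512 samples at 16 kHz, `frame_budget_ms(16000.0, 512)` is 32 ms, so an 8 ms FFT gives `realtime_load(8.0, 16000.0, 512)` of 0.25. Needing 4 seconds per second of audio would correspond to about 128 ms per hop, which, if these assumptions hold, suggests the FFT is not the only bottleneck and the other steps are worth timing separately.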