
How to recognize and classify sounds with the STWINKT1B and Qeexo ML --- MEMS & Sensors "On-Life" series

Eleon BORLINI
ST Employee
This article describes how to build acoustic neural networks by taking advantage of the physical properties of microphone data to extract information relevant to sound classification tasks. It shows how to perform sound recognition with Qeexo AutoML and explains some of the basic concepts of the Qeexo feature stack. The ST portfolio includes microphones and related platforms capable of capturing the information necessary for the variety of classification tasks that can be performed on lightweight devices.

Table of contents
 

  1. Why is it important to perform Sound Recognition?
  2. Sound recognition with STWINKT1B and Qeexo ML: Project Description
  3. Sensor Configuration and Project Implementation
  4. Parameter Configuration
  5. Conclusion
  6. On life!


 

Why is it important to perform Sound Recognition?

Sound Recognition is a technology based on traditional pattern recognition theories and signal analysis methods, widely used in speech recognition, music recognition, and other research areas such as acoustical oceanography. In these fields, microphones are generally regarded as sufficient sensing modalities for machine learning methods: they capture the information necessary for a variety of classification tasks that can be performed on lightweight devices.
 

Sound recognition with STWINKT1B and Qeexo ML: Project Description

The ST portfolio offers several platforms mounting the latest MEMS microphones, such as the digital microphone MP23DB01HP and the analog microphone MP23ABS1, which can also acquire ultrasound signals up to 80 kHz. With this type of sensor, the Qeexo AutoML platform provides a diverse feature stack that takes advantage of the physical properties of microphone data to extract information relevant to classification tasks. It offers a general-purpose, user-friendly interface for engineers who want to perform sound recognition, or any other classification task, on embedded devices. This article shows how to perform sound recognition with Qeexo AutoML and explains some of the basic concepts of its feature stack: the processes discussed here are not specific to sound recognition, but they apply particularly well to it. For this example, we will use the STEVAL-STWINKT1B, a development kit and reference design that simplifies prototyping and testing of advanced industrial IoT applications such as condition monitoring and predictive maintenance. You can check here how to configure it.
 

Sensor Configuration and Project Implementation

To get started, navigate to the training page and select (or upload) the labeled training data that you want to use to build models for your embedded device. In the Sensor Selection page, select the microphone sensor to choose the corresponding collected data, as shown below:
[Figure: Sensor Selection page in Qeexo AutoML]

There is also the possibility to use automatic sensor and feature group selection, if you want to use additional sensor modalities or experiment with feature subgroups. When this option is selected, the tool automatically chooses the sensor and feature groups that make the classes most distinct.
In the Inference Settings page, you can manually set the instance length and the classification interval, or let the tool determine them by selecting Determine Automatically, as shown below.
 

[Figure: Inference Settings page, with instance length and classification interval]
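Conceptually, the instance length is how much audio the model looks at for a single classification, while the classification interval is how often a new classification is produced. The short Python sketch below illustrates the relationship under assumed values (a 16 kHz microphone output data rate and hypothetical window settings); it is not Qeexo AutoML internal code.

# Hypothetical illustration of how instance length and classification
# interval translate into sample counts; not Qeexo AutoML internals.
SAMPLE_RATE_HZ = 16_000           # assumed microphone output data rate
INSTANCE_LENGTH_MS = 500          # audio seen by one classification
CLASSIFICATION_INTERVAL_MS = 250  # how often a new result is produced

samples_per_instance = SAMPLE_RATE_HZ * INSTANCE_LENGTH_MS // 1000
hop_samples = SAMPLE_RATE_HZ * CLASSIFICATION_INTERVAL_MS // 1000

def sliding_instances(audio):
    """Yield overlapping windows of `audio` (a 1-D sequence of samples)."""
    for start in range(0, len(audio) - samples_per_instance + 1, hop_samples):
        yield audio[start:start + samples_per_instance]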

While the process is by design very straightforward, the details of some of the choices may appear ambiguous. We will focus here on some of the feature choices applicable to sound recognition.
 

Parameter Configuration

Fast Fourier Transform (FFT)

Signals in the time domain are difficult for humans and computers alike to use when distinguishing among similar sound sources. One of the most popular ways to transform raw sound data is the Fast Fourier Transform (FFT): given the constraints of embedded devices, it is an attractive choice because it is an efficient frequency decomposition technique. The process is described in the figure below.

[Figure: FFT decomposition of a raw time-domain audio signal]
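As a minimal sketch of this step (plain NumPy, outside of Qeexo AutoML; the 16 kHz sample rate and frame length are illustrative assumptions), the magnitude spectrum of one audio frame can be computed as follows.

import numpy as np

def fft_magnitudes(frame, sample_rate_hz=16_000):
    """Return (frequencies, magnitudes) for one windowed audio frame."""
    windowed = frame * np.hanning(len(frame))   # reduce spectral leakage
    spectrum = np.fft.rfft(windowed)            # one-sided FFT of a real signal
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate_hz)
    return freqs, np.abs(spectrum)

# Example: a 440 Hz tone shows a clear peak near 440 Hz.
t = np.arange(8_000) / 16_000
freqs, mags = fft_magnitudes(np.sin(2 * np.pi * 440 * t))
print(freqs[np.argmax(mags)])   # ~440.0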
For different classes, the signals differ in their magnitudes for a given frequency bin. For example, in the four pictures below, sounds generated with different instruments show different distributions of magnitude among the frequencies 0-800 Hz, with differences present even up to 2000 Hz.
[Figure: FFT magnitude distributions for sounds from four different instruments]

It is common practice for audio AI training methods to take advantage of the increased class separability in this range during model training. The most recent audio AI pipelines, including the Qeexo AutoML tool, don't just use all of the FFT coefficients as inputs when training the model; they aggregate the coefficients to create more sophisticated features. The specific groupings can be hand-picked during the model selection process to accommodate implementation constraints. To select the feature groups, simply check the box(es) in the manual feature selection page, as shown below.
 

[Figure: manual feature group selection page]
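For intuition about what aggregating FFT coefficients can look like, here is a generic sketch (not Qeexo AutoML's actual feature definitions) that summarizes the magnitude spectrum from the previous snippet into a handful of band energies; the band edges are an illustrative choice.

import numpy as np

def band_energies(freqs, mags, edges_hz=(0, 200, 400, 800, 1600, 3200)):
    """Aggregate FFT magnitudes into one energy value per frequency band."""
    features = []
    for lo, hi in zip(edges_hz[:-1], edges_hz[1:]):
        in_band = (freqs >= lo) & (freqs < hi)
        features.append(np.sum(mags[in_band] ** 2))
    return np.array(features)

# Usage with the (freqs, mags) computed in the FFT sketch above:
# feature_vector = band_energies(freqs, mags)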

Mel Frequency Cepstral Coefficients (MFCC)

One of the most widely used feature sets for sound recognition, together with the FFT analysis, is the set of Mel Frequency Cepstral Coefficients (MFCC). The rationale is that humans react differently to distinct ranges of frequencies. As a species, we are more capable of telling the difference between a 50 Hz and a 100 Hz signal than between a 10050 Hz and a 10100 Hz signal. In other words, we are really bad at distinguishing high-pitched sounds (our ears detect sounds "logarithmically"). Therefore, in situations where you want to replicate a task performed by humans, such as voice separation, the differences at low frequencies are the most important, and the informative value of the signal decreases with increasing frequency. The Mel scale comes into play here, assigning more importance to the low-frequency content and less to the high-frequency content. The formula for converting from frequency to Mel score is:
m = 2595 · log10(1 + f / 700), where f is the frequency in Hz and m is the corresponding Mel value.
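A minimal Python version of this conversion (the standard Mel formula above; not Qeexo-specific code) makes the 50 Hz versus 10050 Hz example from the text concrete.

import math

def hz_to_mel(f_hz):
    """Convert a frequency in Hz to the Mel scale."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

# Both gaps are 50 Hz wide, but the low-frequency gap spans far more Mels.
print(hz_to_mel(100) - hz_to_mel(50))        # ~73 Mel
print(hz_to_mel(10100) - hz_to_mel(10050))   # ~5 Mel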

We build a filter bank containing many triangular filters and apply it to our FFT features, rescaling the signal and converting it to the corresponding Mel scale. In the Mel spectrograms shown below, we can see that the Mel spectrograms of different classes present many differences, making them ideal inputs for training a classifier.
 

[Figure: Mel spectrograms for different sound classes]
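The filter-bank step described above can be sketched as follows (generic triangular Mel filters applied to FFT magnitudes, with an assumed 16 kHz sample rate; this is an illustration, not Qeexo AutoML's internal implementation).

import numpy as np

def mel_to_hz(m):
    """Convert a Mel value back to a frequency in Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft_bins, sample_rate_hz=16_000):
    """Build triangular filters spaced evenly on the Mel scale."""
    mel_max = 2595.0 * np.log10(1.0 + (sample_rate_hz / 2) / 700.0)
    hz_points = mel_to_hz(np.linspace(0.0, mel_max, n_filters + 2))
    bins = np.floor(hz_points / (sample_rate_hz / 2) * (n_fft_bins - 1)).astype(int)

    bank = np.zeros((n_filters, n_fft_bins))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            bank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            bank[i - 1, k] = (right - k) / max(right - center, 1)
    return bank

# Applied to the magnitude spectrum of one frame (see the FFT sketch above):
# mel_energies = mel_filterbank(26, len(mags)) @ (mags ** 2)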

Qeexo AutoML also provides features generated from the MFCC coefficients, which can be selected from the manual feature selector shown above. If desired, you can visualize the selected features in a UMAP plot by clicking the Visualize button in the Sensor Selection page and in the Feature Group Selection page.

Based on this discussion, it should be apparent that MFCC features work well for tasks involving human speech. Depending on the task, it may be disadvantageous to include MFCC features if the task does not share similarities with human hearing. However, Qeexo AutoML performs automatic feature reduction when automatic selection is enabled, so this does not need to be an active concern when training models. If the MFCC features are not highly separable for the task, and assuming sufficient data is provided, they will be dropped from the final model during this process.
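For intuition about what "not highly separable" means in practice, a quick offline sanity check can rank features by how much information they carry about the labels (a generic scikit-learn sketch with synthetic data; this is not how Qeexo AutoML decides internally).

import numpy as np
from sklearn.feature_selection import mutual_info_classif

# X: one row per training instance, one column per extracted feature.
# y: class labels. Here both are synthetic, just to show the ranking step.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = rng.integers(0, 2, size=200)
X[:, 0] += y                     # make feature 0 artificially informative

scores = mutual_info_classif(X, y, random_state=0)
print(scores.argsort()[::-1])    # features ranked from most to least useful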
 

Conclusion

Qeexo AutoML not only provides model-building functionality, but also presents the details of the trained models. We provide evaluation metrics such as the confusion matrix, by-fold cross-validation, and the ROC curve, and we even support downloading the trained model to test it elsewhere. As mentioned earlier, we support, but are not limited to, the microphone sensor for sound recognition. You are free to select any of the other provided sensors, such as the accelerometer and the gyroscope. If these additional sensors don't improve model performance, they won't be included in the final device library, thanks to the automated sensor selection process.
 

On life!

Finally, in the Model Settings page, you can pick the algorithm(s), choose whether to generate a learning curve and/or perform hyperparameter tuning, and click the Start Training button to start. After the training is finished, a binary file is generated and can be flashed to the device by clicking the Push to Hardware button. Once the process is finished, you can perform live tests on the model that was built, as shown in this video. Enjoy!

 
See partner page --> https://qeexo.com/
See reference article --> https://qeexo.com/sound-recognition-with-qeexo-automl/
 
 
 