
How to import sounds into NanoEdge AI Studio

DmitryR
Senior

Good day,

 

I recently watched a presentation saying that NanoEdge AI Studio can be used for voice recognition. I installed the latest version, but at the data import stage it says it can only import text and CSV files. The sound library I would like to recognize, however, is in MP3 format. I could convert it to WAV, but obviously not to text. Could anyone please tell me how to import already collected data into this software?

 

With best regards,

Dmitry

3 REPLIES
Julian E.
ST Employee

Hi @DmitryR,

 

I don't think you will get good results in NanoEdge, but you could try taking your WAV files, cutting them into small samples, and converting those into CSV. NanoEdge only takes samples that all have the same size and are not very big, roughly 4096 values per sample.
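The cut-and-convert step described above could look roughly like this in Python. This is only a minimal sketch under several assumptions: the input is 16-bit mono PCM WAV, the Studio accepts one signal per CSV line with comma-separated values (check the Studio's input format documentation), and the function names `wav_to_rows` and `write_csv` are hypothetical. The default of 4096 values per line comes from the figure mentioned above.

```python
import csv
import struct
import wave


def wav_to_rows(wav_path, values_per_row=4096):
    """Split a 16-bit mono PCM WAV file into equal-length rows of samples.

    Trailing samples that do not fill a whole row are dropped, so every
    row has exactly values_per_row values.
    """
    with wave.open(wav_path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM audio")
        raw = wav.readframes(wav.getnframes())

    # WAV PCM data is little-endian; "h" is a signed 16-bit integer.
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)

    full_rows = len(samples) // values_per_row
    return [
        list(samples[i * values_per_row:(i + 1) * values_per_row])
        for i in range(full_rows)
    ]


def write_csv(rows, csv_path):
    """Write one signal per line, values separated by commas (assumed layout)."""
    with open(csv_path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
```

For example, `write_csv(wav_to_rows("clip.wav"), "clip.csv")` would turn a hypothetical `clip.wav` into a CSV file with 4096 values per line. MP3 files would first need to be converted to WAV with an external tool.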

You would also need to clean your data for training: data with voice vs. data without.
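One crude way to separate segments with voice from segments without is an energy threshold. The sketch below is an assumption, not a NanoEdge feature: it uses RMS energy as a stand-in for proper voice activity detection, and both the function names and the `threshold` value are hypothetical and would need tuning on real recordings.

```python
import math


def rms(row):
    """Root-mean-square amplitude of one segment of samples."""
    if not row:
        return 0.0
    return math.sqrt(sum(v * v for v in row) / len(row))


def split_by_energy(rows, threshold):
    """Split segments into (loud, quiet) lists by comparing RMS to threshold."""
    loud, quiet = [], []
    for row in rows:
        if rms(row) >= threshold:
            loud.append(row)
        else:
            quiet.append(row)
    return loud, quiet
```

The loud segments are only candidates for "voice"; background noise can also exceed the threshold, so a manual check of the result is still advisable.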

And I would not guarantee the results.

 

I would advise looking at ST Model Zoo and Audio event detection in particular.

GitHub - STMicroelectronics/stm32ai-modelzoo: AI Model Zoo for STM32 devices · GitHub

NanoEdge uses simple machine learning models, but it does AutoML (it finds the model for you). With Model Zoo, the approach is to train a neural network and convert it to C code with ST Edge AI Core.

Audio event detection is not really made for that, but I think you will get better results.

 

The best approach would be to train your own voice recognition model and convert it with ST Edge AI Core (you can use STM32CubeAI Studio for that; it is more user-friendly, as it has an interface and lets you validate the model directly on target in one click).

The difficulty here will not be training a model; there are plenty of tutorials for that on Google, and ChatGPT can help you find the kind of model to use. The main concern will be using a model that is compatible with ST Edge AI Core, so that the C code can be generated. Here is the list of supported layers:

https://stedgeai-dc.st.com/assets/embedded-docs/4.0.0/index.html#supported-layersoperators 

 

Have a good day,

Julian


In order to give better visibility on the answered topics, please click on 'Accept as Solution' on the reply which solved your issue or answered your question.

Dear @Julian E. ,

 

thank you. I already have enough short (1-2 seconds) samples. At a sample rate of 8 kHz, that gives up to 16K values per sample. I have tested it, and Studio can import this much.
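The arithmetic here is 2 s × 8000 values/s = 16000 values for the longest clip. Since the earlier reply notes that all samples must have the same size, clips of 1 to 2 seconds would still need to be brought to a common length. A minimal sketch, assuming zero-padding at the end is acceptable (the helper name `fit_length` is hypothetical):

```python
SAMPLE_RATE_HZ = 8000
MAX_DURATION_S = 2

# Longest clip: 2 s * 8000 values/s = 16000 values.
TARGET_LEN = MAX_DURATION_S * SAMPLE_RATE_HZ


def fit_length(samples, target_len=TARGET_LEN, pad_value=0):
    """Truncate or zero-pad a clip so it has exactly target_len values."""
    samples = list(samples)
    if len(samples) >= target_len:
        return samples[:target_len]
    return samples + [pad_value] * (target_len - len(samples))
```

Padding with zeros adds artificial silence to shorter clips, which may or may not suit the recognition task; cropping every clip to the shortest length is the alternative.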

 

I have checked Model Zoo. It is very rudimentary and lacks even basic functions such as FFT. It is good for a demo but not for real work.

 

Yes, I could probably train my own model, but that is a very time-consuming process. I would like to use the Studio because it offers quick results. The only problem I see is the limited set of data formats it accepts.

 

With best regards,

Dmitry

 

 

Hi @DmitryR,

 

I totally agree: one of the main advantages of NanoEdge is that it helps you create a PoC very quickly, as most things are handled by the tool.

Our current solutions are more focused on vision for Model Zoo and vibration/current for NanoEdge, but we see more and more demand for audio-related use cases. So we are working on it, especially for Model Zoo.

I will share your need for better audio handling in Model Zoo, both in terms of input data and in terms of models.

 

And as you say, using Model Zoo would require doing the preprocessing/postprocessing yourself.
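As an illustration of what such do-it-yourself preprocessing might involve, here is a minimal sketch of a magnitude spectrum for one audio frame. It is not Model Zoo code: it uses a naive O(N²) DFT written with the Python standard library only, and the function name `magnitude_spectrum` is hypothetical. A real pipeline would use an optimized FFT routine instead.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Return DFT magnitudes for bins 0..N/2 of a real-valued frame.

    Naive O(N^2) discrete Fourier transform, for illustration only.
    """
    n = len(frame)
    if n == 0:
        raise ValueError("frame must not be empty")
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(frame)
        )
        spectrum.append(abs(acc))
    return spectrum
```

Bin `k` corresponds to the frequency `k * sample_rate / N`, so with 8 kHz audio and a 64-value frame each bin is 125 Hz wide.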

 

Thank you for your comments,

Have a good day

Julian

 

 

