2026-03-16 4:48 AM
Good day,
I recently watched a presentation saying that NanoEdge AI Studio can be used for voice recognition. I installed the latest version, but at the data import stage it says it can only import text and CSV files. The sound library I would like to recognize, however, is in MP3 format. I could convert it to WAV, but obviously not to text. Could anyone please tell me how to import already collected data into this software?
With best regards,
Dmitry
2026-03-16 6:36 AM
Hi @DmitryR,
I don't think you will get good results in NanoEdge. You could try cutting your WAV files into small samples and converting them to CSV. NanoEdge only takes samples of the same size, and not very big ones: about 4096 values per sample.
You would also need to clean your data for training: data with voice vs. data without.
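A minimal sketch of the cutting and CSV conversion, assuming mono 16-bit PCM WAV input and assuming the import expects one fixed-length signal per CSV line (check the Studio's data format guidance for your project type). The function name `wav_to_csv_rows` and the default chunk size of 4096 are only illustrative:

```python
import array
import csv
import sys
import wave


def wav_to_csv_rows(wav_path, csv_path, chunk_size=4096):
    """Split a mono 16-bit PCM WAV file into fixed-size CSV rows.

    Each row holds chunk_size consecutive sample values. A trailing
    chunk shorter than chunk_size is dropped so all rows have the
    same length. Returns the number of rows written.
    """
    with wave.open(wav_path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM audio")
        frames = wav.readframes(wav.getnframes())

    # Interpret the raw bytes as signed 16-bit integers.
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    rows = 0
    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        # Step through the signal in non-overlapping windows.
        for start in range(0, len(samples) - chunk_size + 1, chunk_size):
            writer.writerow(samples[start:start + chunk_size].tolist())
            rows += 1
    return rows
```

Calling `wav_to_csv_rows("clip.wav", "clip.csv")` would write one line per 4096-value window. MP3 files would first have to be converted to WAV with an external tool, and the voice/no-voice cleaning still has to be done separately.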
And I would not guarantee the results.
I would advise looking at ST Model Zoo and Audio event detection in particular.
GitHub - STMicroelectronics/stm32ai-modelzoo: AI Model Zoo for STM32 devices · GitHub
NanoEdge uses simple machine learning models, but it does AutoML (it finds the model for you). With Model Zoo, the approach is to train a neural network and convert it to C code with the ST Edge AI Core.
Audio event detection is not really made for that, but I think you will get better results.
The best approach would be to train your own voice recognition model and convert it with ST Edge AI Core again (you can use STM32CubeAI Studio for that; it is more user-friendly as it has an interface, and you can easily validate the model directly on target in one click).
The difficulty here will not be training a model; there are plenty of tutorials for that on Google, and ChatGPT can help you find the kind of model to use. The main concern will be using a model compatible with our ST Edge AI Core, so that the C code can be generated. Here is the list of supported layers:
https://stedgeai-dc.st.com/assets/embedded-docs/4.0.0/index.html#supported-layersoperators
Have a good day,
Julian
2026-03-16 7:21 AM
Dear @Julian E. ,
thank you. I already have enough short (1-2 second) recordings. At a sample rate of 8 kHz, that gives up to 16K values per recording. I have tested it, and Studio can import this much.
I have checked Model Zoo. It is very rudimentary and lacks even basic functions such as FFT. Good for a demo but not for real work.
Yes, I could probably train my own model, but that is a very time-consuming process. I would like to use the Studio because it offers quick results. The only problem I see is the limited set of data formats it accepts.
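As a quick check of that arithmetic, here is a small sketch. The helper names are hypothetical, and the 4096 figure is only the approximate per-sample size mentioned earlier in the thread:

```python
def values_per_clip(duration_s, sample_rate_hz):
    """Number of raw audio values in a clip of the given duration."""
    return int(round(duration_s * sample_rate_hz))


def split_into_windows(total_values, window_size):
    """Return (full windows, leftover values) for non-overlapping windows."""
    return divmod(total_values, window_size)


one_second = values_per_clip(1, 8000)   # 8000 values
two_seconds = values_per_clip(2, 8000)  # 16000 values
# How a 2-second clip would divide into 4096-value windows.
windows, leftover = split_into_windows(two_seconds, 4096)
```

So a 2-second clip at 8 kHz holds 16000 values, which would split into 3 full windows of 4096 with 3712 values left over if shorter buffers were needed.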
With best regards,
Dmitry
2026-03-16 7:44 AM
Hi @DmitryR,
I totally agree. One of the main advantages of NanoEdge is that it helps create a proof of concept very quickly, as most things are handled by the tool.
Our current solutions are more focused on vision for Model Zoo and vibration/current for NanoEdge, but we see more and more demand for audio-related use cases. So we are working on it, especially for Model Zoo.
I will pass on your need for better audio handling in Model Zoo, in terms of both input data and models.
And as you say, using model zoo would require doing the preprocessing/postprocessing by yourself.
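To illustrate the kind of preprocessing that would have to be written by hand, here is a minimal pure-Python sketch of a magnitude spectrum using a recursive radix-2 FFT. It assumes the frame length is a power of two; the function names are hypothetical and this is not code from Model Zoo. On a real target an optimized DSP library would normally be used instead:

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT for power-of-two lengths."""
    n = len(x)
    if n == 0:
        raise ValueError("input must not be empty")
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor applied to the odd-indexed half.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def magnitude_spectrum(frame):
    """Magnitudes of the non-negative frequency bins of a real frame."""
    spectrum = fft(frame)
    return [abs(c) for c in spectrum[: len(frame) // 2 + 1]]


# Example: a pure tone that completes 4 cycles in a 64-value frame.
frame = [math.sin(2 * math.pi * 4 * i / 64) for i in range(64)]
mags = magnitude_spectrum(frame)
peak_bin = max(range(len(mags)), key=mags.__getitem__)
```

For this test tone the energy concentrates in bin 4, which is the sort of feature a classifier would be fed instead of raw audio values.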
Thank you for your comments,
Have a good day
Julian