2021-09-12 02:22 PM
I'm running your acoustic_scene_classification.ipynb, available in FP-AI-SENSING1, the Middleware ASC (acoustic scene classification):
https://www.st.com/en/embedded-software/fp-ai-sensing1.html
When I try to import and convert the data, I get the following error:
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-9-87cd88dbb12e> in <module>
1 dataset_dir = './Dataset'
2 meta_path = path = os.path.join(dataset_dir, 'TrainSet.txt')
----> 3 fileset = np.loadtxt(meta_path, dtype=str)
4
5 # 3 classes : 0 indoor, 1 outdoor, 2 in-vehicle
C:\ST\Anaconda3\lib\site-packages\numpy\lib\npyio.py in loadtxt(fname, dtype, comments, delimiter, converters, skiprows, usecols, unpack, ndmin, encoding, max_rows, like)
1063 raise ValueError("Wrong number of columns at line %d"
1064 % line_num)
-> 1065
1066 # Convert each value according to its column and store
1067 items = [conv(val) for (conv, val) in zip(converters, vals)]
C:\ST\Anaconda3\lib\site-packages\numpy\lib\_datasource.py in open(path, mode, destpath, encoding, newline)
192 """
193
--> 194 ds = DataSource(destpath)
195 return ds.open(path, mode, encoding=encoding, newline=newline)
196
C:\ST\Anaconda3\lib\site-packages\numpy\lib\_datasource.py in open(self, path, mode, encoding, newline)
529 _fname, ext = self._splitzipext(found)
530 if ext == 'bz2':
--> 531 mode.replace("+", "")
532 return _file_openers[ext](found, mode=mode,
533 encoding=encoding, newline=newline)
OSError: ./Dataset\TrainSet.txt not found.
TrainSet.txt is inside the Dataset folder, as you specified.
Is there a fix for this issue?
Thanks
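For what it's worth, a minimal way to reproduce and check the path handling before calling np.loadtxt (the TrainSet.txt contents below are made up for illustration, not the real metadata):

```python
import os
import numpy as np

# Recreate a tiny TrainSet.txt so this snippet is self-contained;
# in the real notebook the file ships with the dataset.
dataset_dir = './Dataset'
os.makedirs(dataset_dir, exist_ok=True)
meta_path = os.path.join(dataset_dir, 'TrainSet.txt')
with open(meta_path, 'w') as f:
    f.write('audio/a.wav 0\naudio/b.wav 1\n')

# os.path.join picks the right separator on every OS, and an explicit
# existence check gives a clearer message than loadtxt's OSError.
assert os.path.isfile(meta_path), f'{meta_path} is missing'
fileset = np.loadtxt(meta_path, dtype=str)
print(fileset.shape)  # (2, 2): one row per (file, label) pair
```

If the assert fires, the Dataset folder is not where the notebook expects it (it must sit next to the .ipynb in the working directory).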
2021-09-23 12:32 PM
Could you please comment on this statement in the ASC pack: "Slice Data into frames.
Each frame will contain 16,896 samples (32 * 512 + 512) to create a 32-column spectrogram with n_fft=1024 and hop_length=512."
You are using 30-second audio clips, sampled at 16 kHz, in your file. Why did you choose this particular shape for each frame?
Thanks,
M
2021-09-24 01:23 AM
The main reason is that this framing was the best trade-off between model accuracy and RAM footprint on the target platform. Increasing the number of columns (to cover longer time slices) leads to bigger spectrograms -> bigger input to the NN -> more trainable parameters -> bigger RAM footprint.
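As an aside, the arithmetic behind the 16,896-sample figure can be sketched like this (assuming center=False framing, i.e. no edge padding, which is one reading of the pack's formula):

```python
import numpy as np

fs = 16000
n_fft, hop_length, n_cols = 1024, 512, 32

# With no padding (center=False), N samples yield
# (N - n_fft) // hop_length + 1 STFT columns, so 32 columns need:
n_samples = (n_cols - 1) * hop_length + n_fft
print(n_samples)        # 16896, i.e. 32*512 + 512
print(n_samples / fs)   # 1.056 s of audio per spectrogram

# Sanity check: frame a dummy signal and count the windows.
x = np.zeros(n_samples)
frames = np.lib.stride_tricks.sliding_window_view(x, n_fft)[::hop_length]
print(frames.shape)     # (32, 1024)
```

So each spectrogram covers roughly one second of audio, and a 30-second clip yields about 28 non-overlapping frames.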
2021-09-24 02:01 AM
If I work with 5-second waveforms, should I decrease the number of columns, e.g. from your 32 to 10?
2021-09-24 02:14 AM
This can be an option to minimize the input shape of the CNN,
but you would also need to retrain the provided model with your own data set.
Let me know your findings!
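To put numbers on the trade-off, a back-of-the-envelope comparison of 32 vs 10 columns (the 30 mel-band count is a placeholder assumption, not taken from the pack):

```python
fs, n_fft, hop_length = 16000, 1024, 512
n_mels = 30  # placeholder; substitute the pack's actual mel-band count

for n_cols in (32, 10):
    # Samples needed per frame with center=False framing
    n_samples = (n_cols - 1) * hop_length + n_fft
    n_inputs = n_mels * n_cols  # elements fed to the CNN input layer
    print(f'{n_cols} cols -> {n_samples} samples '
          f'({n_samples / fs:.3f} s), {n_inputs} NN inputs')
```

With 10 columns each frame covers only ~0.35 s of audio, and the input tensor shrinks roughly 3x, which is where the RAM saving comes from.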
2021-10-28 05:13 AM
Well, I left the number of columns at 32 in my model, with a dataset of 74 sounds, each 5 seconds long. Each mel-spectrogram covers 1.024 s this way. I now have my TFLite file, and the inference runs on my CPU too.
I have a NUCLEO-F401RE board coupled with an X-NUCLEO-CCA02M2 and the audio streaming example from MEMSMIC1.
Is there a template for classifying streaming audio with the NN?
Thank you !
2021-11-02 06:31 AM
Hello,
What type of template are you referring to? The FP already provides a NN that classifies streaming audio data.
Of course this is not a template per se, but you can use it to derive your own example with your own data set and your own NN or ML model.
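As a hedged sketch of the glue logic involved (the class name and chunk sizes are illustrative, not from the FP, and the FP may use overlapping frames): a buffer that turns streamed audio callbacks into fixed-size frames for the classifier:

```python
import numpy as np

FRAME_LEN = 16896  # samples per spectrogram frame (32 * 512 + 512)

class FrameAccumulator:
    """Collect streamed PCM chunks and emit fixed-size frames."""
    def __init__(self, frame_len=FRAME_LEN):
        self.frame_len = frame_len
        self.buf = np.empty(0, dtype=np.int16)

    def push(self, chunk):
        # Append the new chunk, then pop every complete frame.
        self.buf = np.concatenate([self.buf, np.asarray(chunk, np.int16)])
        frames = []
        while self.buf.size >= self.frame_len:
            frames.append(self.buf[:self.frame_len])
            self.buf = self.buf[self.frame_len:]
        return frames  # each frame then goes to preprocessing + NN

# Simulate a mic driver delivering 1 ms chunks (16 samples at 16 kHz)
acc = FrameAccumulator()
frames = []
for _ in range(1100):
    frames += acc.push(np.zeros(16, dtype=np.int16))
print(len(frames), frames[0].shape)  # one 16896-sample frame so far
```

On the MCU the same pattern would live in the audio-capture callback, with each completed frame converted to a mel-spectrogram and passed to the network.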
best regards
L
2021-11-28 03:24 PM
Did you ever prepare a guideline like this one, https://wiki.st.com/stm32mcu/wiki/AI:How_to_perform_motion_sensing_on_STM32L4_IoTnode, but for ASC?
I have my own two-class NN, a Keras model, and the B-L475E-IOT01 board.
Thanks
M