2019-09-10 02:40 AM
Hello,
I am working on a project where I will deploy a neural network trained with Keras onto an STM32F746-DISCOVERY board using X-CUBE-AI. The goal is to train the network to recognize audio samples converted into spectrograms. This means that on the microcontroller I need to convert the audio input into spectrogram images, and then feed those into the neural network for recognition.
Does anyone have good sources or similar projects on either creating spectrograms on an STM32 microcontroller, or image recognition on an STM32 MCU using X-CUBE-AI?
Thank you!
2019-12-03 06:54 AM
Hello @Gerardo Trotta ,
There is no direct link between the RAM size reported by X-CUBE-AI and the application stack size. However, you can use the aiSystemPerformance application in X-CUBE-AI to evaluate the stack size required for NN inference. See section 9.3, "Embedded C-model run-time performance", in UM2526: the run-time performance report includes the 'used stack'.
Of course, your project's overall stack requirement will be greater than the NN inference stack size alone, so I would recommend doing a stack size analysis of your whole project.
For the tables: if memory is not an issue for you (the H7 has more memory than the L4), feel free to use the runtime table generation functions in ST_AI_AudioPreprocessing. For example:
#include "feature_extraction.h"
#define SAMPLE_RATE 16000U /* Input signal sampling rate */
#define FFT_LEN 2048U /* Number of FFT points. Must be greater or equal to FRAME_LEN */
#define NUM_FRAMES 14U /* Number of columns in spectrogram */
#define FRAME_LEN FFT_LEN /* Window/frame length. Frames shorter than FFT_LEN are zero-padded to FFT_LEN. */
#define HOP_LEN 1024U /* Number of samples between the starts of successive frames (hop size). */
#define NUM_MELS 128U /* Number of mel bands */
float32_t pInBuffer[FRAME_LEN]; /* 8.0 KB */
float32_t pOutColBuffer[NUM_MELS]; /* 0.5 KB */
float32_t pOutMelSpectrogram[NUM_MELS * NUM_FRAMES]; /* 7.0 KB */
float32_t pSpectrScratchBuffer[FFT_LEN]; /* 8.0 KB */
float32_t pWindowFuncBuffer[FFT_LEN]; /* 8.0 KB */
uint32_t pMelFilterStartIndices[NUM_MELS]; /* 0.5 KB */
uint32_t pMelFilterStopIndices[NUM_MELS]; /* 0.5 KB */
float32_t pMelFilterCoefs[2020]; /* 7.9 KB */ /* Size given by S_MelFilter.CoefficientsLength */
/* Allocate buffers and structures */
arm_rfft_fast_instance_f32 S_Rfft; /* 24 B */
MelFilterTypeDef S_MelFilter; /* 48 B */
SpectrogramTypeDef S_Spectr; /* 28 B */
MelSpectrogramTypeDef S_MelSpectr; /* 8 B */
LogMelSpectrogramTypeDef S_LogMelSpectr; /* 16 B */
/*
* Python equivalent:
* librosa.feature.melspectrogram(y=y, sr=16000, n_mels=128, hop_length=1024, center=False)
*/
void Preprocessing_Init(void)
{
  /* Init window function */
  if (Window_Init(pWindowFuncBuffer, FRAME_LEN, WINDOW_HANN) != 0)
  {
    printf("Init error\n");
    exit(1);
  }

  /* Init RFFT */
  arm_rfft_fast_init_f32(&S_Rfft, FFT_LEN);

  /* Init Mel filterbank */
  S_MelFilter.pStartIndices = pMelFilterStartIndices;
  S_MelFilter.pStopIndices  = pMelFilterStopIndices;
  S_MelFilter.pCoefficients = pMelFilterCoefs;
  S_MelFilter.NumMels       = NUM_MELS;
  S_MelFilter.FFTLen        = FFT_LEN;
  S_MelFilter.SampRate      = SAMPLE_RATE;
  S_MelFilter.FMin          = 0.0f;
  S_MelFilter.FMax          = S_MelFilter.SampRate / 2.0f;
  S_MelFilter.Formula       = MEL_SLANEY;
  S_MelFilter.Normalize     = 1;
  S_MelFilter.Mel2F         = 1;
  MelFilterbank_Init(&S_MelFilter);

  /* Init Spectrogram */
  S_Spectr.pRfft    = &S_Rfft;
  S_Spectr.Type     = SPECTRUM_TYPE_POWER;
  S_Spectr.pWindow  = pWindowFuncBuffer;
  S_Spectr.SampRate = SAMPLE_RATE;
  S_Spectr.FrameLen = FRAME_LEN;
  S_Spectr.FFTLen   = FFT_LEN;
  S_Spectr.pScratch = pSpectrScratchBuffer;

  /* Init MelSpectrogram */
  S_MelSpectr.SpectrogramConf = &S_Spectr;
  S_MelSpectr.MelFilter       = &S_MelFilter;
}
void AudioPreprocessing_Run(int16_t *pInSignal)
{
  /* Create mel spectrogram, one column per frame */
  for (uint32_t frame_index = 0; frame_index < NUM_FRAMES; frame_index++)
  {
    buf_to_float_normed(pInSignal + (frame_index * HOP_LEN), pInBuffer, FRAME_LEN);
    MelSpectrogramColumn(&S_MelSpectr, pInBuffer, pOutColBuffer);

    /* Reshape column into pOutMelSpectrogram (layout: NUM_MELS x NUM_FRAMES) */
    for (uint32_t i = 0; i < NUM_MELS; i++)
    {
      pOutMelSpectrogram[i * NUM_FRAMES + frame_index] = pOutColBuffer[i];
    }
  }
}
Regards,
Guillaume
2019-12-06 07:37 AM
Hello @Gln .
Very clear. Thank you.
I'm debugging now and have noticed a weird situation in aiConvertInputFloat_2_Int8:
if bufferPtr->meta_info is NULL, how do we transform and scale from float32 to ai_i8?
int aiConvertInputFloat_2_Int8(const char *nn_name, const int idx,
                               ai_float *In_f32, ai_i8 *Out_int8)
{
  if (AI_HANDLE_NULL == net_ctx[idx].handle)
  {
    return -1;
  }

  ai_buffer *bufferPtr = &(net_ctx[idx].report.inputs[0]);
  ai_buffer_format format = bufferPtr->format;
  int size = AI_BUFFER_SIZE(bufferPtr);
  ai_float scale;
  int zero_point;

  /* Reject anything that is not a signed 8-bit quantized buffer */
  if (AI_BUFFER_FMT_TYPE_Q != AI_BUFFER_FMT_GET_TYPE(format) ||
      !AI_BUFFER_FMT_GET_SIGN(format) ||
      8 != AI_BUFFER_FMT_GET_BITS(format))
  {
    return -1;
  }

  if (AI_BUFFER_META_INFO_INTQ(bufferPtr->meta_info))
  {
    scale = AI_BUFFER_META_INFO_INTQ_GET_SCALE(bufferPtr->meta_info, 0);
    if (scale != 0.0F)
    {
      scale = 1.0F / scale;
    }
    else
    {
      return -1;
    }
    zero_point = AI_BUFFER_META_INFO_INTQ_GET_ZEROPOINT(bufferPtr->meta_info, 0);
  }
  else
  {
    return -1; /* No meta_info: quantization parameters are unavailable */
  }

  for (int i = 0; i < size; i++)
  {
    Out_int8[i] = __SSAT((int32_t) roundf((float)zero_point + In_f32[i] * scale), 8);
  }
  return 0;
}