Spectrogram recognition with X-Cube AI on STM32F746

ALiss · 2019-09-10

Hello,

I am working on a project in which I will deploy a neural network (trained with Keras) on an STM32F746-DISCOVERY board using X-Cube AI. The goal is for the network to recognize audio samples that have been converted into spectrograms. On the microcontroller, I would therefore need to convert the audio input into spectrogram images and then feed those to the neural network for recognition.

Does anyone know of good sources or similar projects covering either spectrogram generation on an STM32 microcontroller or image recognition on an STM32 MCU using X-Cube AI?

Thank you!

Gln · 2019-12-03

Hello @Gerardo Trotta ,

There is no direct link between the RAM size reported by X-CUBE-AI and the application stack size. However, you can use the aiSystemPerformance application in X-CUBE-AI to evaluate the stack size required for NN inference: see section 9.3, "Embedded C-model run-time performance", in UM2526. The run-time performance report includes the 'used stack'.

But of course, your project stack size requirement will be greater than the NN inference stack size. I would recommend doing a stack size analysis in your project.
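
One common way to do such an analysis is stack painting: fill the stack region with a known marker before the code under test runs, then count how many marker words survive. The sketch below is only an illustration of that idea. The function names and the marker value are made up for this example, and the real region bounds would come from your linker script (not shown). On a target, only the part of the stack below the current stack pointer may be painted.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL_PATTERN 0xDEADBEEFU /* arbitrary marker value (assumption) */

/* Fill a stack region with the marker. 'lowest' is the lowest address of the
 * region; call this before the code under test runs. */
void stack_paint(uint32_t *lowest, size_t num_words)
{
  for (size_t i = 0; i < num_words; i++)
  {
    lowest[i] = STACK_FILL_PATTERN;
  }
}

/* Count the words that still hold the marker, scanning up from the lowest
 * address. Cortex-M stacks grow downward, so untouched words sit at the
 * low end of the region. */
size_t stack_unused_words(const uint32_t *lowest, size_t num_words)
{
  size_t unused = 0;
  while ((unused < num_words) && (lowest[unused] == STACK_FILL_PATTERN))
  {
    unused++;
  }
  return unused;
}

/* Peak stack usage in bytes since the last call to stack_paint(). */
size_t stack_peak_bytes(const uint32_t *lowest, size_t num_words)
{
  return (num_words - stack_unused_words(lowest, num_words)) * sizeof(uint32_t);
}
```

Running the whole application (audio capture, preprocessing and inference) between the paint and the measurement gives a figure that can be compared with the 'used stack' value reported for inference alone.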

For the tables, if memory is not an issue for you (the H7 has more memory than the L4), feel free to use the run-time table generation functions in ST_AI_AudioPreprocessing. For example:

#include <stdio.h>  /* printf */
#include <stdlib.h> /* exit */
#include "feature_extraction.h"
 
#define SAMPLE_RATE 16000U /* Input signal sampling rate */
#define FFT_LEN      2048U /* Number of FFT points. Must be greater than or equal to FRAME_LEN */
#define NUM_FRAMES     14U /* Number of columns in spectrogram */
#define FRAME_LEN  FFT_LEN /* Window length; frames are zero-padded up to FFT_LEN */
#define HOP_LEN      1024U /* Number of samples between the starts of successive frames */
#define NUM_MELS      128U /* Number of mel bands */
 
float32_t pInBuffer[FRAME_LEN];                      /* 8.0 KB */
float32_t pOutColBuffer[NUM_MELS];                   /* 0.5 KB */
float32_t pOutMelSpectrogram[NUM_MELS * NUM_FRAMES]; /* 7.0 KB */
float32_t pSpectrScratchBuffer[FFT_LEN];             /* 8.0 KB */
float32_t pWindowFuncBuffer[FFT_LEN];                /* 8.0 KB */
uint32_t  pMelFilterStartIndices[NUM_MELS];          /* 0.5 KB */
uint32_t  pMelFilterStopIndices[NUM_MELS];           /* 0.5 KB */
float32_t pMelFilterCoefs[2020];                     /* 7.9 KB */ /* Size given by S_MelFilter.CoefficientsLength */
 
/* Allocate buffers and structures */
arm_rfft_fast_instance_f32 S_Rfft;          /* 24 B */
MelFilterTypeDef           S_MelFilter;     /* 48 B */
SpectrogramTypeDef         S_Spectr;        /* 28 B */
MelSpectrogramTypeDef      S_MelSpectr;     /*  8 B */
LogMelSpectrogramTypeDef   S_LogMelSpectr;  /* 16 B */
 
/*
 * Python equivalent:
 * librosa.feature.melspectrogram(y=y, sr=16000, n_mels=128, hop_length=1024, center=False)
 */
 
 
void Preprocessing_Init(void)
{
  /* Init window function */
  if (Window_Init(pWindowFuncBuffer, FRAME_LEN, WINDOW_HANN) != 0)
  {
    printf("Init error\n");
    exit(1);
  }
 
  /* Init RFFT */
  arm_rfft_fast_init_f32(&S_Rfft, FFT_LEN);
 
  /* Init Mel filter */
  S_MelFilter.pStartIndices = pMelFilterStartIndices;
  S_MelFilter.pStopIndices  = pMelFilterStopIndices;
  S_MelFilter.pCoefficients = pMelFilterCoefs;
  S_MelFilter.NumMels   = NUM_MELS;
  S_MelFilter.FFTLen    = FFT_LEN;
  S_MelFilter.SampRate  = SAMPLE_RATE;
  S_MelFilter.FMin      = 0.0;
  S_MelFilter.FMax      = S_MelFilter.SampRate / 2.0;
  S_MelFilter.Formula   = MEL_SLANEY;
  S_MelFilter.Normalize = 1;
  S_MelFilter.Mel2F     = 1;
  MelFilterbank_Init(&S_MelFilter);
 
  /* Init Spectrogram */
  S_Spectr.pRfft    = &S_Rfft;
  S_Spectr.Type     = SPECTRUM_TYPE_POWER;
  S_Spectr.pWindow  = pWindowFuncBuffer;
  S_Spectr.SampRate = SAMPLE_RATE;
  S_Spectr.FrameLen = FRAME_LEN;
  S_Spectr.FFTLen   = FFT_LEN;
  S_Spectr.pScratch = pSpectrScratchBuffer;
 
  /* Init MelSpectrogram */
  S_MelSpectr.SpectrogramConf = &S_Spectr;
  S_MelSpectr.MelFilter       = &S_MelFilter;
 
}
 
void AudioPreprocessing_Run(int16_t *pInSignal)
{
  /* Create melspectrogram */
  for (uint32_t frame_index = 0; frame_index < NUM_FRAMES; frame_index++)
  {
    buf_to_float_normed(pInSignal + (frame_index * HOP_LEN), pInBuffer, FRAME_LEN);
    MelSpectrogramColumn(&S_MelSpectr, pInBuffer, pOutColBuffer);
    /* Reshape col into pOutMelSpectrogram */
    for (uint32_t i = 0; i < NUM_MELS; i++)
    {
      pOutMelSpectrogram[i * NUM_FRAMES + frame_index] = pOutColBuffer[i];
    }
  }
}
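
As a side note on buffer sizing: the loop above reads FRAME_LEN samples starting at frame_index * HOP_LEN, so the buffer passed as pInSignal has to hold at least (NUM_FRAMES - 1) * HOP_LEN + FRAME_LEN samples. The small sketch below just works through that arithmetic. The helper names are hypothetical and are not part of ST_AI_AudioPreprocessing.

```c
#include <stdint.h>

/* Minimum number of input samples needed to produce 'num_frames' columns
 * when each frame is 'frame_len' samples long and successive frames start
 * 'hop_len' samples apart. */
uint32_t required_input_samples(uint32_t num_frames, uint32_t frame_len,
                                uint32_t hop_len)
{
  if (num_frames == 0U)
  {
    return 0U;
  }
  return (num_frames - 1U) * hop_len + frame_len;
}

/* Duration of 'num_samples' samples in milliseconds (rounded down). */
uint32_t samples_to_ms(uint32_t num_samples, uint32_t sample_rate_hz)
{
  return (uint32_t)(((uint64_t)num_samples * 1000U) / sample_rate_hz);
}
```

With the values used here (14 frames, FRAME_LEN 2048, HOP_LEN 1024), this gives 15360 samples, which is 960 ms of audio at 16 kHz, or 30 KB as int16_t.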

Regards,

Guillaume

Gerardo Trotta · 2019-12-06

Hello @Gln .

Very clear. Thank you.

I'm debugging now and have noticed a strange situation. In aiConvertInputFloat_2_Int8, if bufferPtr->meta_info is NULL, how do we transform and scale from float32 to ai_i8?

int aiConvertInputFloat_2_Int8(const char *nn_name, const int idx,
                               ai_float *In_f32, ai_i8 *Out_int8)
{
  if( AI_HANDLE_NULL == net_ctx[idx].handle)
  {
      return -1;
  }
  ai_buffer * bufferPtr   = &(net_ctx[idx].report.inputs[0]);
  ai_buffer_format format = bufferPtr->format;
  int size  = AI_BUFFER_SIZE(bufferPtr);
  ai_float scale ;
  int zero_point ;
 
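  /* Reject any input that is not a signed 8-bit quantized buffer.
   * The checks are OR-ed so that a single mismatch is enough to fail. */
  if (AI_BUFFER_FMT_TYPE_Q != AI_BUFFER_FMT_GET_TYPE(format) ||
    ! AI_BUFFER_FMT_GET_SIGN(format) ||
    8 != AI_BUFFER_FMT_GET_BITS(format))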
  {
      return -1;
  }
  if (AI_BUFFER_META_INFO_INTQ(bufferPtr->meta_info)) {
      scale = AI_BUFFER_META_INFO_INTQ_GET_SCALE(bufferPtr->meta_info, 0);
      if (scale != 0.0F)
      {
         scale= 1.0F/scale ;
      }
      else
      {
        return -1;
      }
      zero_point = AI_BUFFER_META_INFO_INTQ_GET_ZEROPOINT(bufferPtr->meta_info, 0);
  } else {
      return -1;
  }
 
  for (int i = 0; i < size ; i++)
  {
    Out_int8[i] = __SSAT((int32_t) roundf((float)zero_point + In_f32[i]*scale), 8);
  }
  return 0;
}
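
For reference, the function as quoted returns -1 when no quantization metadata is available, so in that case no conversion takes place. When the metadata is present, the conversion it applies is q = saturate(round(zero_point + x / scale)). The sketch below reproduces that formula in portable C, with scale and zero_point passed in explicitly instead of being read from meta_info. The function names are made up for this illustration, and the saturation and rounding are written by hand in place of __SSAT and roundf.

```c
#include <stddef.h>
#include <stdint.h>

/* Quantize one float to int8: q = saturate(round(zero_point + x / scale)).
 * 'scale' must be non-zero. */
int8_t quantize_one(float x, float scale, int zero_point)
{
  float v = (float)zero_point + x / scale;

  /* Saturate to the int8 range before converting to an integer */
  if (v >= 127.0f)
  {
    return 127;
  }
  if (v <= -128.0f)
  {
    return -128;
  }
  /* Round half away from zero, as roundf() does */
  return (int8_t)((v >= 0.0f) ? (v + 0.5f) : (v - 0.5f));
}

/* Quantize a buffer of 'n' floats. Returns -1 if 'scale' is zero, mirroring
 * the error path of the quoted function, and 0 otherwise. */
int quantize_f32_to_i8(const float *in, int8_t *out, size_t n,
                       float scale, int zero_point)
{
  if (scale == 0.0f)
  {
    return -1;
  }
  for (size_t i = 0; i < n; i++)
  {
    out[i] = quantize_one(in[i], scale, zero_point);
  }
  return 0;
}
```

The quoted code multiplies by 1/scale instead of dividing, which is equivalent up to floating-point rounding.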