2025-03-26 11:53 PM - last edited on 2025-03-27 1:35 AM by mƎALLEm
Hello gurus,
I am in desperate need of your help. I am trying to extract audio from an INMP441 microphone using a Nucleo STM32F411RE and store it as a WAV file on an SD card. According to the datasheet, the microphone outputs signed 24-bit PCM audio, MSB first, inside a 32-bit frame.
To handle this, I have configured the I²S "Data and Frame Format" to "32 Bits Data on 32 Bits Frame" and am attempting to extract the 24-bit data from each 32-bit frame and convert it to little-endian byte order. However, I am facing issues with the extracted audio:
While I can recognize my voice's timbre, the amplitude is extremely low, and the waveform appears very weak.
When I previously tested "16 Bits Data on 32 Bits Frame", my voice sounded unnaturally deep, almost like a male voice.
I suspect there may be an issue with my bit extraction or size allocation. Here's a brief summary of my setup:
I declared an int32_t variable to store the raw I²S data and used the following function to record audio:
However, I am concerned that this might be incorrect, since the function prototype expects a uint16_t * pointer. Given that I'm handling int32_t data, I'm unsure of the correct way to manage this cast.
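For reference, here is how that cast is usually handled. HAL_I2S_Receive() takes a uint16_t * regardless of the configured data format, and with a 24-/32-bit format the Size argument counts 32-bit samples rather than half-words, so passing a cast int32_t buffer is what the driver expects. A minimal sketch (hi2s1, rx_buf, BUF_SAMPLES, and record_block are placeholder names):

#include "stm32f4xx_hal.h"

extern I2S_HandleTypeDef hi2s1;   /* placeholder: your CubeMX-generated handle */

#define BUF_SAMPLES 512
static int32_t rx_buf[BUF_SAMPLES];

void record_block(void)
{
    /* The uint16_t* cast is expected by the driver; with a 24-/32-bit
     * data format, Size counts 32-bit samples, not half-words or bytes. */
    HAL_StatusTypeDef st = HAL_I2S_Receive(&hi2s1,
                                           (uint16_t *)rx_buf,
                                           BUF_SAMPLES,
                                           HAL_MAX_DELAY);
    if (st == HAL_OK) {
        write2wave_file((uint8_t *)rx_buf, sizeof(rx_buf));
    }
}

One caveat worth checking in your HAL version: for 24-/32-bit formats the F4 HAL reads the I²S data register 16 bits at a time, high half-word first, so each sample can land in memory with its two half-words swapped when the buffer is read back as int32_t. If that is the case, recombine before shifting, e.g. int32_t s = (int32_t)(((uint32_t)p16[2*i] << 16) | p16[2*i + 1]); with p16 being the same buffer viewed as uint16_t *. Half-word-swapped samples typically come out very quiet and/or distorted, which would match your symptom.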
For extracting the 24-bit data, I performed a bit shift and swapped the bytes to store each sample in little-endian format, but I'm not sure I've done it correctly, because the extracted audio is so quiet.
I have attached screenshots and code snippets for better context. I would greatly appreciate any insights on how to debug this issue.
Thank you so much!
void write2wave_file(uint8_t *data, uint16_t data_size) {
    uint32_t temp_number;
    myprintf("Writing...\n");

    /* Write the WAV header once, on the first call. */
    static int first_time = 0;
    if (first_time == 0) {
        fres = f_write(&fil, (void *)wav_file_header, sizeof(wav_file_header), (UINT *)&temp_number);
        if (fres != FR_OK) {
            myprintf("Header write error: %d\n", fres);
            f_close(&fil);
            return;
        }
        first_time = 1;
    }

    int sample_read = data_size / sizeof(int32_t); // 32-bit container for 24-bit data
    for (int i = 0; i < sample_read; i += 2) {     // step 2: skip the unused channel slot
        /* Arithmetic shift keeps the sign; the 24-bit sample sits in the
         * top 24 bits of the 32-bit frame. */
        int32_t raw_data = ((int32_t *)data)[i] >> 8;

        /* WAV data is little-endian: least significant byte goes first. */
        uint8_t swapped_data[3];
        swapped_data[0] = (uint8_t)(raw_data & 0xFF);         // LSB
        swapped_data[1] = (uint8_t)((raw_data >> 8) & 0xFF);
        swapped_data[2] = (uint8_t)((raw_data >> 16) & 0xFF); // MSB

        int retry = 0;
        while (retry < 3) {
            fres = f_write(&fil, swapped_data, 3, (UINT *)&temp_number); // 3 bytes per sample
            if (fres == FR_OK) break;
            myprintf("Write error %d, retry %d...\n", fres, retry + 1);
            HAL_Delay(10);
            retry++;
        }
        if (fres != FR_OK) {
            myprintf("Write failed after retries: %d\n", fres);
            f_close(&fil);
            return;
        }
    }

    wav_file_size += (sample_read / 2) * 3; // bytes actually written this call
    myprintf("how much: %lu\n", (unsigned long)wav_file_size);

    // Periodically flush to prevent corruption
    static int write_count = 0;
    write_count++;
    if (write_count % 10 == 0) {
        f_sync(&fil);
    }
}
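One thing the snippet above does not show: wav_file_size is accumulated but never written back, and the RIFF/data chunk sizes in the header must be patched before the file is closed, or players may see a zero-length file. A minimal sketch of that finalize step, assuming the standard 44-byte canonical WAV header (ChunkSize at byte offset 4, Subchunk2Size at offset 40); finalize_wave_file is a hypothetical name:

/* Hypothetical finalize step, assuming the standard 44-byte canonical
 * WAV header: ChunkSize sits at byte offset 4, Subchunk2Size at 40. */
void finalize_wave_file(void)
{
    UINT bw;
    uint32_t chunk_size = 36U + wav_file_size;  /* RIFF chunk size */
    uint32_t data_size  = wav_file_size;        /* data chunk size */

    f_lseek(&fil, 4);
    f_write(&fil, &chunk_size, 4, &bw);  /* Cortex-M is little-endian, so */
    f_lseek(&fil, 40);                   /* the uint32_t can be written as-is */
    f_write(&fil, &data_size, 4, &bw);
    f_close(&fil);
}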
2025-04-15 8:37 PM
So sorry, the forum said I ran out of private messages, so I couldn't reply to you via DM; I hope you can see my response here.
The reason I chose mono channels is that, based on my research, separate mono microphones provide better flexibility and accuracy for sound source localization, particularly when working with microphone arrays. Techniques like GCC-PHAT or MUSIC require synchronized input from multiple mics positioned at known distances. Stereo setups are mainly optimized for audio playback or basic two-channel recording, and are not ideal for spatial processing beyond those two channels. I’ve also noticed that many published studies and open-source projects rely on mono-channel configurations for similar reasons.
Thank you so much for your thoughtful questions — I really appreciate them!
I am currently storing my data as int16_t samples, and every processing block length stays within 512 samples, for example an FFT_LENGTH of 512 with 50% overlap.
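For context, this is the kind of 50%-overlap buffering I mean (a minimal sketch; push_hop and process_block are placeholder names):

#include <stdint.h>
#include <string.h>

#define FFT_LENGTH 512
#define HOP (FFT_LENGTH / 2)          /* 50% overlap */

void process_block(const int16_t *frame, int len);  /* placeholder for the FFT stage */

static int16_t frame_buf[FFT_LENGTH];

/* Call whenever HOP fresh samples arrive: keep the newest half of the
 * previous frame, append the new half, then process the full frame. */
void push_hop(const int16_t *new_samples)
{
    memmove(frame_buf, frame_buf + HOP, HOP * sizeof(int16_t));
    memcpy(frame_buf + HOP, new_samples, HOP * sizeof(int16_t));
    process_block(frame_buf, FFT_LENGTH);
}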
However, to be completely honest, I’m still quite new to hardware-level optimization, especially when it comes to RAM allocation, synchronization, and system resource planning on STM32. I haven’t yet set up a fully systematic approach to memory allocation for this project, so your questions are pushing me to think more carefully about it.
If you don’t mind, I’d love it if you could point me toward the most important questions or principles I should be asking myself when working with STM32 hardware for real-time audio applications. That way, I can start doing more structured research and make better decisions moving forward. Thank you again for asking me in such a constructive way!
2025-04-16 5:51 AM
Hi,
I was just reading this, about a U5 series CPU:
>there seems to be an issue with processing delay: for NN=128 at 8 kHz, it takes almost 16-17 ms to process 16 ms of audio (128 samples), so an MCU with more processing power is needed.
And you also have an H7 board available?
You should first check how much time your processing actually needs, and whether you need it in real time, continuously or not.
(You could measure the time with the DWT cycle counter, or, if it is in the ms range, with SysTick.)
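E.g. a minimal DWT sketch, plain CMSIS (on a Cortex-M7 core like the H7 you may also need to unlock the unit first with DWT->LAR = 0xC5ACCE55;):

#include <stdint.h>
#include "stm32f4xx.h"   /* or the CMSIS device header for your part */

/* Enable the cycle counter once at startup. */
void dwt_init(void)
{
    CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;  /* enable trace block */
    DWT->CYCCNT = 0U;
    DWT->CTRL  |= DWT_CTRL_CYCCNTENA_Msk;            /* start cycle counter */
}

/* Time one processing block in microseconds. */
uint32_t time_processing_us(void)
{
    uint32_t t0 = DWT->CYCCNT;
    /* ... run your FFT / processing here ... */
    return (DWT->CYCCNT - t0) / (SystemCoreClock / 1000000U);
}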
+
Again: the two channels of an I2S link can carry stereo, but they can also just be two independent mono mics.
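If you go that way, splitting the interleaved buffer is trivial; a sketch, assuming alternating left/right 32-bit slots (deinterleave is a placeholder name):

#include <stddef.h>
#include <stdint.h>

/* Split an interleaved I2S buffer (L, R, L, R, ...) into two mono
 * streams, one per microphone; 24-bit data sits in the top bits. */
void deinterleave(const int32_t *interleaved, size_t frames,
                  int32_t *mic_left, int32_t *mic_right)
{
    for (size_t n = 0; n < frames; n++) {
        mic_left[n]  = interleaved[2U * n]      >> 8;
        mic_right[n] = interleaved[2U * n + 1U] >> 8;
    }
}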
2025-04-17 4:49 AM
Thank you so much! I am having a look at it. Yes, I have an STM32H755ZI-Q board. I am trying to configure the SAI and I2S on it, but so far there are lots of bugs! I can't even receive character data properly through USART3 :( all I get is continuous whitespace or "?" characters, even though I configured everything just like I did on the old STM32F4.