2025-03-20 7:47 AM
Hi there,
This is my first post here, and also my first time working with STM32 processors.
A few months ago I decided to start a new project for real-time audio processing with DSP,
and the STM32H7 series seemed like a good candidate for what I want to do.
So I bought a development board with an STM32H743, a couple of flash chips and a few CS4272 codecs.
I wired everything up and started writing the firmware.
I used a few YouTube videos as a guide and soon had the base code working.
The problem is that I may have some buffer synchronization issues, because I get distorted sound.
Here are some details about the project. The core clock is set to 480 MHz. The CS4272 is configured as standalone in slave mode and connected over I2S. The sample rate is 48 kHz and the data format is 24-bit samples in a 32-bit frame. I have verified that all clocks are correct. The levels of the audio signal are within spec. I use a circular DMA buffer, stored as an array of words.
At first I tried adding some reverberation to the audio signal, and that worked better than I expected. The problem appears when I try to do an IIR convolution on the signal. This is where I start to hear the distorted sound. I have tried reducing the buffer lengths and also the IIR buffer, but nothing really changes.
Can someone help me troubleshoot this? I am using STM32CubeIDE with an ST-LINK debugger.
Regards.
2025-03-20 9:12 AM
There is no specific problem description (meaning: no source code), so only general advice is possible:
The H7 is pretty powerful, but given everything people want from audio DSPs these days, I'd say a dedicated audio DSP with lots of "hardware accelerators" might be better suited to the job.
2025-03-20 11:36 AM
So I wrote a reply and it never appeared.
Anyway I will send the code again.
#define BUFFER_SIZE 8
#define FILTER_TAP_NUM 256
#define SAMPLING_FREQUENCY_HZ 48000.0f
__attribute__((aligned(32))) int32_t adcData[BUFFER_SIZE];
__attribute__((aligned(32))) int32_t dacData[BUFFER_SIZE];
static volatile __attribute__((aligned(32))) int32_t *inBufPtr;
static volatile __attribute__((aligned(32))) int32_t *outBufPtr;
static float firdata [FILTER_TAP_NUM];
static int firptr [FILTER_TAP_NUM];
static int fir_w_ptr = 0;
float Calc_FIR (float inSample) {
    float inSampleF = inSample;
    float outdata = 0;
    for (int i = 0; i < FILTER_TAP_NUM; i++) {
        outdata += (firdata[i]*cabinetIR[firptr[i]]); // cabinetIR is the FFT
        firptr[i]++;
    }
    firdata[fir_w_ptr] = inSampleF;
    firptr[fir_w_ptr] = 0;
    fir_w_ptr++;
    if (fir_w_ptr == FILTER_TAP_NUM) fir_w_ptr=0;
    return outdata;
}
void Process_HalfBuffer() {
    // Input samples
    static float leftIn = 0.0f;
    static float leftProcessed = 0.0f;
    // Loop through half of audio buffer (double buffering), convert int->float, apply processing, convert float->int, set output buffers
    for (uint16_t i = 0; i < (BUFFER_SIZE/2); i += 2) {
        /*
         * Convert current input samples (24-bits) to floats (two I2S data lines, two channels per data line)
         */
        // Extract 24-bits via bit mask
        inBufPtr[i] &= 0xFFFFFF;
        inBufPtr[i + 1] &= 0xFFFFFF;
        // Check if number is negative (sign bit)
        if (inBufPtr[i] & 0x800000) {
            inBufPtr[i] |= ~0xFFFFFF;
        }
        if (inBufPtr[i + 1] & 0x800000) {
            inBufPtr[i + 1] |= ~0xFFFFFF;
        }
        // Normalise to float (-1.0, +1.0)
        leftIn = (float) inBufPtr[i] / (float) (0x7FFFFF);
        /*
         * Apply processing
         */
        //leftProcessed = leftIn; // Passthru
        //leftProcessed = (1.0f - wet) * leftIn + wet * Do_Reverb(leftIn); // Reverb
        //x = *DWT_CYCCNT;
        leftProcessed = Calc_FIR(leftIn);
        //y = *DWT_CYCCNT;
        //cycles = y - x;
        leftProcessed *= 1.5f; // Volume
        // Ensure output samples are within [-1.0,+1.0] range
        if (leftProcessed < -1.0f) {
            leftProcessed = -1.0f;
        } else if (leftProcessed > 1.0f) {
            leftProcessed = 1.0f;
        }
        // Scale to 24-bit signed integer and set output buffer
        outBufPtr[i] = (int32_t)(leftProcessed * 0x7FFFFF);
    }
    dataReadyFlag = 0;
}
void HAL_I2SEx_TxRxHalfCpltCallback(I2S_HandleTypeDef *hi2s)
{
    inBufPtr = &(adcData[0]);
    outBufPtr = &(dacData[0]);
    //Process_HalfBuffer();
    dataReadyFlag = 1;
}
void HAL_I2SEx_TxRxCpltCallback(I2S_HandleTypeDef *hi2s)
{
    inBufPtr = &(adcData[BUFFER_SIZE/2]);
    outBufPtr = &(dacData[BUFFER_SIZE/2]);
    //Process_HalfBuffer();
    dataReadyFlag = 1;
}
int main(void)
{
    /* USER CODE BEGIN 1 */
    /* USER CODE END 1 */
    /* MPU Configuration--------------------------------------------------------*/
    MPU_Config();
    /* MCU Configuration--------------------------------------------------------*/
    /* Reset of all peripherals, Initializes the Flash interface and the Systick. */
    HAL_Init();
    /* USER CODE BEGIN Init */
    /* USER CODE END Init */
    /* Configure the system clock */
    SystemClock_Config();
    /* USER CODE BEGIN SysInit */
    /* USER CODE END SysInit */
    /* Initialize all configured peripherals */
    MX_GPIO_Init();
    MX_DMA_Init();
    MX_SPI4_Init();
    MX_I2S3_Init();
    /* USER CODE BEGIN 2 */
    /* USER CODE END 2 */
    /* Infinite loop */
    /* USER CODE BEGIN WHILE */
    CS4272_Init();
    HAL_I2SEx_TransmitReceive_DMA(&hi2s3, (uint16_t *)&dacData[0], (uint16_t *)&adcData[0], BUFFER_SIZE);
    while (1)
    {
        if(dataReadyFlag) {
            Process_HalfBuffer();
        }
    }
    /* USER CODE END 3 */
}
And here is a screenshot of the waveform.
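For comparison, here is a minimal sketch of the same direct-form FIR written with a single circular history buffer, so the per-tap firptr index array and its increments are not needed. The names fir_f32_t, fir_f32_init and fir_f32_step are hypothetical and not part of the project. It assumes coeffs[0] is the first sample of the impulse response, and unlike Calc_FIR above it includes the newest input sample in the current output. It has not been timed on the H7.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical filter state; the caller owns both arrays. */
typedef struct {
    const float *coeffs;   /* impulse response, coeffs[0] weights the newest sample */
    float       *history;  /* circular buffer holding the last num_taps inputs */
    size_t       num_taps;
    size_t       write_pos; /* slot the next input sample is stored in */
} fir_f32_t;

void fir_f32_init(fir_f32_t *f, const float *coeffs, float *history, size_t num_taps)
{
    assert(f != NULL && coeffs != NULL && history != NULL && num_taps > 0);
    f->coeffs = coeffs;
    f->history = history;
    f->num_taps = num_taps;
    f->write_pos = 0;
    for (size_t i = 0; i < num_taps; i++) {
        history[i] = 0.0f;          /* start from silence */
    }
}

/* Process one input sample and return one output sample. */
float fir_f32_step(fir_f32_t *f, float in)
{
    f->history[f->write_pos] = in;  /* overwrite the oldest sample */

    float acc = 0.0f;
    size_t idx = f->write_pos;      /* start at the newest sample */
    for (size_t k = 0; k < f->num_taps; k++) {
        acc += f->coeffs[k] * f->history[idx];
        idx = (idx == 0) ? f->num_taps - 1 : idx - 1;  /* walk back in time */
    }

    f->write_pos = (f->write_pos + 1 == f->num_taps) ? 0 : f->write_pos + 1;
    return acc;
}
```

Feeding a unit impulse into a filter initialised with the coefficients {0.5, 0.25, 0.125} returns 0.5, 0.25, 0.125 and then 0, which is a quick way to check the indexing before loading the real impulse response.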
2025-03-20 11:53 AM
So I added a few lines to measure the cycles taken by the calculation, using DWT_CYCCNT.
I changed BUFFER_SIZE to 256 and kept FILTER_TAP_NUM at 256, and the cycle count for the Calc_FIR() function is ~45700.
And this is the resulting waveform.
If I lower the BUFFER_SIZE to 8 then this is the resulting waveform. Cycles are ~50000.
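These cycle counts can be compared against the time available per sample. The sketch below assumes the 480 MHz core clock and 48 kHz sample rate from the first post, and that Calc_FIR() runs once per stereo frame (only the left channel is filtered). The helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* CPU cycles available between two consecutive sample frames. */
uint32_t cycles_per_sample(uint32_t cpu_hz, uint32_t sample_rate_hz)
{
    assert(sample_rate_hz != 0u);
    return cpu_hz / sample_rate_hz;
}

/* Processing load in percent; 100 means the budget is exactly used up. */
uint32_t load_percent(uint32_t cycles_used, uint32_t cycles_budget)
{
    assert(cycles_budget != 0u);
    return (uint32_t)(((uint64_t)cycles_used * 100u) / cycles_budget);
}
```

With these numbers the budget is 480000000 / 48000 = 10000 cycles per sample, so ~45700 cycles is about 457 % of the available time and ~50000 cycles is 500 %. Under those assumptions the filter cannot keep up with the DMA at either buffer size, which would explain why changing BUFFER_SIZE does not remove the distortion.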
2025-03-20 12:19 PM
Hi,
1. What's your optimizer setting? (-O2 is fine, I always use it.)
2. Never use float in real-time calculations. What do you want to gain from float or double? Your DAC really only converts 16 or 18 bits anyway, so do everything as int16_t until it's working perfectly. Then maybe try getting the last 10 dB of S/N out of it, if the calculation is fast enough.
3. You cannot expect "DSP" from a non-DSP CPU, just "close to" it, and only when using INT calculations done in CCM RAM.
So try these basic changes and report back...
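To illustrate point 2, here is a minimal sketch of one FIR step done entirely in integers, with samples and coefficients in Q15 format (int16_t, where 32767 is roughly +1.0). The function name and parameters are hypothetical, the coefficients would have to be converted from the float impulse response first, and this has not been benchmarked on the H7.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * One step of a direct-form FIR in Q15.
 * history is a circular buffer of num_taps samples, *write_pos is the slot
 * the next input goes into; both are owned and zero-initialised by the caller.
 */
int16_t fir_q15_step(const int16_t *coeffs, int16_t *history,
                     size_t num_taps, size_t *write_pos, int16_t in)
{
    assert(coeffs != NULL && history != NULL && write_pos != NULL);
    assert(num_taps > 0 && *write_pos < num_taps);

    history[*write_pos] = in;           /* overwrite the oldest sample */

    int64_t acc = 0;                    /* wide accumulator, sums Q30 products */
    size_t idx = *write_pos;            /* start at the newest sample */
    for (size_t k = 0; k < num_taps; k++) {
        acc += (int32_t)coeffs[k] * (int32_t)history[idx];
        idx = (idx == 0) ? num_taps - 1 : idx - 1;
    }
    *write_pos = (*write_pos + 1 == num_taps) ? 0 : *write_pos + 1;

    /* Q30 -> Q15 with rounding; assumes >> on a negative value is an
       arithmetic shift, as it is with GCC and Clang on ARM. */
    acc = (acc + (1 << 14)) >> 15;

    /* Saturate instead of wrapping around on overflow. */
    if (acc > INT16_MAX) acc = INT16_MAX;
    if (acc < INT16_MIN) acc = INT16_MIN;
    return (int16_t)acc;
}
```

The 64-bit accumulator is a deliberately cautious choice so that 256 taps cannot overflow; a 32-bit accumulator with scaled coefficients may be faster but needs more care.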
2025-03-20 1:20 PM
Based on your point 3, that means I have chosen the wrong processor model for what I want to do! Maybe the STM32F411 is a better candidate, because I just saw that it supports DSP while the STM32H743 doesn't support DSP (I just noticed it!)
2025-03-20 1:50 PM
That's not correct:
Also, the H7's CPU can operate at a higher clock rate.
2025-03-20 2:16 PM - edited 2025-03-20 2:46 PM
No, the H7 at 400 or 600 MHz (H7S3) will be much faster than an F411.
The difference from a "real" DSP is ...
https://en.wikipedia.org/wiki/Digital_signal_processor
...and just look at what's in the "CPU" (-> SoC: Snapdragon 636) of my 6-year-old $160 mobile phone. For the audio processing alone there is a "small" DSP: a Hexagon 680, running at 500 MHz, with a full 4 cores;
these could run a Linux system on their own, without the other main CPUs on the chip.
Just read a little about this "small" DSP,
...then you'll know the difference between a "CPU with some DSP instructions" and a "DSP with VLIW instructions and many (!) ALUs on each of its cores"....
The more recent versions -> Hexagon 698 DSP, capable of 15 trillion operations per second (TOPS).
Compare this to a fast CPU like the H7, doing 500 million OPS .... its speed is about 0.003 % of the Hexagon 698's.
And it's just the "helping co-processor" for audio etc., for the 8 ARM main cores at 2 GHz or so.
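The percentage above can be checked with a short calculation. The sketch takes the two throughput figures from this post at face value (500 million OPS for the H7, 15 TOPS for the Hexagon 698); the two numbers may not count the same kind of operation, so treat the result as an order-of-magnitude comparison only. The function name is made up for illustration.

```c
#include <assert.h>

/* Throughput of device A expressed as a percentage of device B. */
double throughput_percent(double ops_a, double ops_b)
{
    assert(ops_b > 0.0);
    return 100.0 * ops_a / ops_b;
}
```

Calling throughput_percent(500e6, 15e12) gives roughly 0.0033, matching the ~0.003 % quoted above.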