How to fix audio stuttering issues in this implementation?

ankes · ‎2025-09-26

Hi all,

So, after almost half a year of arduous blood, sweat and tears I have managed to scrape together "something that works". It consists of a custom schematic and PCB powered by an STM32F042C6T6 and an Si4705 radio tuner (from Silicon Labs, now Skyworks).

The STM32 uses its internal clock circuitry, configured as depicted here. The CRS sync source is not shown, but it is set to USB, so the chip runs completely without any external crystals. The USB 2.0 interface is full-speed, since the STM32 only supports full-speed:

The Si4705 currently uses a precise, external 32.768 kHz crystal oscillator (SG-3030CM from Seiko) that only requires a bypass capacitor. This crystal oscillator is not connected to the STM32 in any way.

The tuner chip has a prescaler and precise REFCLK adjustment so it can, if required, support a frequency range from 31130 Hz up to 140.89 MHz. Currently these settings are at their default values because of the crystal oscillator.

The STM32 commands the Si4705 via I2C, without DMA, and this communication works just fine. The relevant settings are shown below:

The STM32 also uses I2S and DMA to transfer data, and its settings are as shown here:

There are also several other settings that are related, and which can all be adjusted:

The Si4705 has a programmable sampling rate and bit resolution for I2S, and they are currently set to 44117 Hz (as per "Real Audio Frequency" from the STM32's I2S) and 16 bits (per channel)
The DMA buffer in STM32 is 352 bytes long
The I2S DMA transfer is initiated with value of 176 "16-bit data lengths" (whatever that means in reality)
There is a single EP1 IN that uses double-buffering, and the two PMA areas involved are both 176 bytes each
At half-complete and full-complete DMA interrupt the STM32 transfers either 176 bytes from the start of the DMA buffer, or 176 bytes from the half-point of the buffer
The USB Audio 1.0's Audio Type I Format is 2 channels, sub-frame size of 2, bit resolution of 16 and frequency of 44.1 kHz
The USB Audio's Audio Streaming Endpoint uses 176 bytes as max packet size, and interval of 1
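
The packet-size arithmetic behind these settings can be sketched in C. This is only an illustration under the assumptions in the list above (16-bit stereo, one packet per 1 ms full-speed frame); the helper names are made up for this example and do not come from the repository.

```c
#include <assert.h>
#include <stdint.h>

#define CHANNELS         2u  /* stereo, as in the Type I format above */
#define BYTES_PER_SAMPLE 2u  /* 16-bit resolution */

/* Whole stereo samples that fit into one 1 ms USB frame. */
static uint32_t samples_per_frame(uint32_t sample_rate_hz)
{
    return sample_rate_hz / 1000u;
}

/* Samples per second that remain if every packet carries only the whole part. */
static uint32_t leftover_samples_per_second(uint32_t sample_rate_hz)
{
    return sample_rate_hz % 1000u;
}

/* Size in bytes of a packet carrying the given number of stereo samples. */
static uint32_t packet_bytes(uint32_t stereo_samples)
{
    assert(stereo_samples > 0u);
    return stereo_samples * CHANNELS * BYTES_PER_SAMPLE;
}
```

For 44117 Hz this gives 44 whole samples (176 bytes) per frame with 117 samples per second left over, which a fixed 176-byte packet cannot carry. At exactly 48000 Hz the remainder is zero.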

With these in place, I have managed to obtain "almost acceptable" audio from the tuner device, recorded it, and attached both an MP3 and a WAV sample here.

However, I am not exactly sure which of all the various settings I should try to adjust in order to overcome the clear skipping/stuttering that the audio has. I have tried fiddling with all the above settings but no matter what I try I only seem to get worse results. Clearly I do not fully understand how all these various settings work together to produce the end result.

The code and schematics are available in https://github.com/anttikes/usb-fm-radio

If I look at the Wireshark output I see an almost-clear recurring phenomenon: the device first sends approximately 22 frames with 1760 packet data length (10 microframes), then several frames with 1056, 880 or 1584 data lengths, before resuming with 1760 ones. Even these "shorter" packets claim to have 10 microframes in them. This repeats around every 22-24 frames but sometimes there are up to 30 "clean" frames before the shorter ones occur again.

This looks like a bus saturation issue to me, with the host side unable to keep up. What kind of things could I do to try to solve this? I know I am not far off from the result.


AScha.3 · ‎2025-09-27

Hi,

I think you have a synchronization problem, because: who is the master, giving the clock rate for the I2S? The STM CPU's internal PLL clock.

But as this is USB device, the master clock for the USB is coming from the PC!

So you get a big problem: you can't change the PC to device timing, but you have to adjust the I2S sampling rate dynamically to fit it to the average data rate that is coming from the PC.

Something like resampling is needed.


ankes · ‎2025-09-27

Thanks @AScha.3,

To clarify, this is a radio tuner setup. The radio chip is tuned to an FM station, receives the audio transmission, digitizes it internally, and provides that data via I2S to the STM32. The tuner chip can be configured with either 8-, 16-, 20- or 24-bit resolution and a programmable sampling frequency between 32000 and 48000 samples per second. The I2S on the STM32 only supports 16-, 24- and 32-bit resolutions, although I am aware that I could do some re-sampling during the I2S interrupt routines if I wanted to e.g. use the 8-bit resolution on the tuner chip.

As far as I know, USB Audio 1.0 doesn't seem to provide any sort of "rate feedback" data from the audio source to the host PC. This means that even if I were to e.g. split the I2S WS signal into a timer on the STM32 (in addition to the tuner chip itself), and this way keep accurate track of the "real sampling frequency" then there's no way for me to report this to the host.
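
For context, one commonly described alternative to explicit feedback on an isochronous IN endpoint is to let the number of samples per packet vary, so that the average packet size follows the real sampling frequency. The sketch below is an assumption-laden illustration of that idea, not code from this project: the type and function names are hypothetical, and it assumes one packet per 1 ms frame and an endpoint max packet size large enough for the bigger packet (180 bytes for 45 stereo 16-bit samples).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: spreads a sample rate over 1 ms packets. */
typedef struct {
    uint32_t rate_hz;    /* nominal or measured sample rate */
    uint32_t remainder;  /* carried-over fraction, in 1/1000 sample units */
} packet_sizer_t;

static void packet_sizer_init(packet_sizer_t *ps, uint32_t rate_hz)
{
    assert(ps != 0);
    ps->rate_hz = rate_hz;
    ps->remainder = 0u;
}

/* Returns how many stereo samples the next 1 ms packet should carry. */
static uint32_t packet_sizer_next(packet_sizer_t *ps)
{
    assert(ps != 0);
    uint32_t total = ps->rate_hz + ps->remainder;
    ps->remainder = total % 1000u;  /* keep the fraction for later packets */
    return total / 1000u;           /* whole samples for this packet */
}
```

With a rate of 44100 Hz this yields nine packets of 44 samples followed by one of 45, so that exactly 44100 samples are sent over 1000 frames.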

I am also considering an approach where I would use an HSE crystal on the STM32, enable the MCO (or a timer), and use that to provide a reference clock for the tuner chip. This would prevent any "clock drift" between the STM32 and the SG-3030CM but I do not know if this would even help with the problem.

The tuner chip also supports analog output so I could, if I wanted to, create a new PCB where the analog audio traces go to the STM32 and its ADC is then used to digitize the data. This would entirely remove any clock drift between the chips.

However, my first task at hand is to identify the root cause of the problem. Why is the host receiving smaller-than-defined packets every 24-30 frames in Wireshark? I have tried switching USB ports and using a different cable but this did not affect the outcome.

ankes · ‎2025-09-27

I have managed to progress further on this matter.

Initially I had the DMA buffer sized so that it was able to hold 2 ms worth of audio data (352 bytes), and then at the HT and TC interrupts I would send the first half, and then the second half. The USB Audio endpoint wMaxPacketSize was set accordingly to 176 bytes, with a bInterval value of 0x01.

I then made an adjustment so that the DMA buffer is now 4 ms long (705 bytes) and at each interrupt I again transfer half. The USB Audio endpoint wMaxPacketSize was sized up accordingly to 352 bytes and bInterval raised to 0x02.

This reduced the stutter to almost imperceptible, as shown in the two attached samples.

In Wireshark, the packet data length now stays nearly constant at 1760, with an occasional blip down to 1408 (which is then audible as a stutter). I must be getting close. If only I could better understand what the system is doing as a whole...

AScha.3 · ‎2025-09-27

Phew, first of all: can you set the radio chip to do 48k/16b stereo?

This is the standard for the Windows and Linux mixers and works fine. You have to send a packet every 1 ms with 48 samples per channel, so 48 x 2 x 2 bytes per block, and that matches 48k 16-bit stereo perfectly.

The remaining problem is how to sync the radio chip to this. AFAIK the simplest way is: you take the 48k rate as set from the radio, but write the 96 words to the USB send buffer at the rate the USB is requesting them, no matter whether you really have 48 samples ready, or one fewer, or one too many.

So this is the poor man's resampling: just double one sample to fill the buffer, or cut one and throw it away. The next samples will then match the USB speed, until the drift of the clocks needs correction again.
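
The duplicate-or-drop idea described above could look roughly like the following C sketch. It is a minimal illustration under stated assumptions, not the project's code: the function name and buffer layout are hypothetical, samples are 16-bit stereo (4 bytes each), and at least one sample is assumed to be available.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STEREO_SAMPLE_BYTES 4u  /* 2 channels x 16 bits */

/*
 * Copy exactly `want` stereo samples into `out`.
 * `in` holds the `have` samples that arrived from I2S since the last packet.
 * Too few: the last available sample is repeated to fill the packet.
 * Too many: the surplus samples are simply not copied (dropped).
 */
static void fill_packet(uint8_t *out, uint32_t want,
                        const uint8_t *in, uint32_t have)
{
    assert(out != 0 && in != 0 && want > 0u && have > 0u);

    uint32_t n = (have < want) ? have : want;
    memcpy(out, in, n * STEREO_SAMPLE_BYTES);

    /* Pad with copies of the last received sample when running short. */
    for (uint32_t i = n; i < want; i++) {
        memcpy(out + i * STEREO_SAMPLE_BYTES,
               in + (have - 1u) * STEREO_SAMPLE_BYTES,
               STEREO_SAMPLE_BYTES);
    }
}
```

With `want` fixed at 48, every packet stays at 192 bytes regardless of whether 47, 48 or 49 samples arrived, at the cost of an occasional repeated or missing sample.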


ankes · ‎2025-09-28

I have set the tuner to 48k / 16b stereo but this did not fully resolve the issue. The approach is not optimal either as the "real audio frequency" in STM32CubeIDE is now 48387 Hz, and the tuner chip has a max value of 48000.

When I look at Wireshark, things have now turned around, and I mainly get isochronous packets with a data length of 1920. This seems correct as (48000 * 2 * 2) / 1000 yields 192, and the USB driver on the host computer uses 10 ms buffers.

Occasionally, however, I see a solitary packet with data length of 2304. I interpret this so that the device is now sending too much data compared to what the host is expecting. This extra size is also audible as a small blip in the audio stream.
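
As a rough plausibility check, the surplus can be estimated with a few lines of C. This sketch assumes the I2S really runs at the 48387 Hz that the tool reports while USB packets are sized for 48000 Hz; the function name is hypothetical and the calculation ignores any buffering in the firmware.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Milliseconds (rounded down) until the faster source has produced
 * `surplus_samples` more stereo samples than the nominal rate consumes.
 */
static uint32_t ms_until_surplus(uint32_t actual_hz, uint32_t nominal_hz,
                                 uint32_t surplus_samples)
{
    assert(actual_hz > nominal_hz);
    return (surplus_samples * 1000u) / (actual_hz - nominal_hz);
}
```

For example, `ms_until_surplus(48387, 48000, 48)` evaluates to 124, meaning that one full extra packet's worth of samples (48 samples, 192 bytes) builds up roughly every 124 ms under these assumptions.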

I'll try fiddling with various buffering strategies to see how they work.

I have also managed to push "work so far" into GitHub, so help yourself and visit https://github.com/anttikes/usb-fm-radio to see the ugly details.

AScha.3 · ‎2025-09-28

483xx is too far away from 48000.

Try to get a better clock tree setting.

Maybe you have to use an HSE crystal, 12.288 MHz, typical for 48k.

Or a more complex CPU with more adjustment in the clock tree.


ankes · ‎2025-09-28

I'll try fiddling with the clock settings, too, thanks for the tip.

To use an HSE I'd need to design and manufacture a new board version as the current design is (in hindsight, maybe foolishly) using PF0 and PF1 for the I2C bus.

Note that my ultimate, maybe-impossible goal is to allow the end user to select the sampling frequency from the host side, and use the SAMPLING_FREQ SET_CUR instruction to convey the selection to the STM32. I am not sure if this is even possible to do, and even if it is I'll probably have to do a lot of clock math in order to fully understand what design or configuration adjustments I will need to make it happen.

I am also looking at the USB Audio 2.0 specification. Rumor has it that this spec has better support for the concept of a "clock" inside the audio function.

AScha.3 · ‎2025-09-28

On this F042 you cannot do much; it's a $2 CPU, so what do you expect?

There isn't even a more complex PLL here. See my settings on an H743 for generating 44.1 kHz from an 8 MHz HSE:

Giving: very close to 44.1 kHz.

48k can also be quite good:

But you still have to do some re-sync to the PC's "48 kHz" USB clock.

That's because the crystal in the PC and any crystal you use here will drift: about 0.001% typically, but NOT zero.

So it can never work "perfectly", unless you can adjust the sample rate with a (stepless!) PLL, use a dedicated chip that can do it, or use another USB mode (UAC2 might work, in hi-speed mode).

But AFAIK no STM CPU can do this, even the most complex, or there are no working drivers.

+

>allow the end user to select the sampling frequency

That's no problem, if you accept that it will never really work; just showing the "value" will work. Wish vs. reality.

There are XMOS chips that do what you want, used in high-end USB sound devices.

Look for them if you want it to work without "clicks" or distortion. https://www.xmos.com/usb-multichannel-audio/

But I think nothing at the $2 level can do the complex task here.

+

If you want to try 48k I2S on this tiny chip, try using an HSE crystal at 12.288 MHz, the HSI 48M for the USB, and PLL x3 (~37 MHz core clock). Check this clock tree setting in Cube and see: will Cube then report a 48k rate on I2S with the 48k setting? If OK, use some thin wires and your soldering skills to move I2C to other pins, and fit a 12.288 MHz HSE crystal with 2 x 12 pF caps (or something similar) as the HSE.

+

It seems the PIC32MZ can do I2S -> USB, also in hi-speed.

https://www.mouser.de/new/microchip/microchip-pic32mz-mcus/?srsltid=AfmBOooU4qHckqTSwNwb1oO794CJV9xTWvV5lKTToaB1QnA3EYQfvpBe


ankes · ‎2025-10-03

After some creative tinkering with the clock settings shown below

I managed to obtain an exact 48 kHz "real audio frequency" for the I2S. This has reduced stuttering quite a bit but I still hear occasional blips here and there.

I believe the only way to get even cleaner sound is to go lower-level and ditch HAL. I am aware that the "real audio frequency" might not be exactly what I get, and this may contribute to the overall scheme of things but I am running low on flash & RAM when using HAL like this so I'd need to do it anyway.

However, it seems the "low-level USB" and "HAL PCD" layers in the STM32F0 HAL driver are intertwined in a way which isn't obvious at first. For example, the "HAL_PCD_EP_Transmit" function makes a call to "USB_EPStartXfer" which makes perfect sense (HAL -> LL -> Registers) but then inside "USB_EPStartXfer" there are calls to things like "PCD_GET_ENDPOINT" and "PCD_CLEAR_BULK_EP_DBUF" which clearly reside in the "HAL side of things", despite the fact that they are just macros defined in a header file, and not "source code", per se. I would've put them into "stm32f0xx_ll_usb.h" and named them differently.

But that's a topic for a different question I guess.