USB Audio and I2S MCLK? Independent Word Clocks!

Torsten Jaekel · ‎2017-10-30

Posted on October 30, 2017 at 23:15

I am working (again) on USB Audio on STM boards. It works: I get the audio from the PC to the board (or vice versa), so far OK. The FW works, the example/demo code is helpful, and there are no issues from a SW coding point of view.

My issue is: How to deal with two independent audio clocks?

The DAC on the board gets its MCLK (and with it the Word Clock) from the MCU (or from something else). USB audio comes in as packets: every 1 ms there is a new audio packet, e.g. 192 bytes (48 stereo samples) at a 48 kHz sample rate with 16-bit stereo. But this timing is generated by the PC: the host generates the 1 ms period independently of the MCU board (and its I2S/DAC).
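The packet size quoted above follows directly from the stream format. A minimal sketch of the arithmetic (the function name is just for illustration, and it assumes plain PCM with one packet per 1 ms frame):

```python
def iso_packet_bytes(sample_rate_hz, channels, bits_per_sample, frame_ms=1):
    """Bytes of PCM audio carried per USB frame at the nominal rate."""
    # Integer division: rates such as 44.1 kHz do not divide evenly into
    # 1 ms frames, so real streams vary the packet size from frame to frame.
    samples_per_frame = sample_rate_hz * frame_ms // 1000
    return samples_per_frame * channels * (bits_per_sample // 8)

print(iso_packet_bytes(48_000, 2, 16))
```

For 48 kHz, 16-bit stereo this gives 48 sample pairs of 4 bytes each, i.e. the 192 bytes mentioned above.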

So, in the end, USB and the DAC on the board run on different, completely independent, unsynchronized clocks.

It results in the following:

it is a bit tricky to synchronize USB with the audio DAC and its DMA: it depends on when USB kicks in and when a packet with audio frames arrives
the two are completely unrelated: a USB frame can become ready at the same moment a DMA HALF_FULL or FULL event happens, and if we are not careful there can be a race condition (think also of a preemptive RTOS or nested interrupts). OK, this 'USB kicks in at a random time, never in sync with the DAC DMA' problem is possible to handle and solve.
but the biggest issue is the clock itself: the two clocks are never identical, they drift apart over time!
It means: USB in can be a bit faster than the DAC plays out the audio (or vice versa). So, periodically you get an 'overrun': more audio samples arrive than the DAC can play (due to the clock setting).
This drift has two causes: the PC might not be accurate in terms of the nominal Word Clock (a bit off, or jittering a bit), and the clock generation of the MCU I2S PLL is not 'accurate' either; it can even differ from board to board. Or the MCU MCLK has jitter (unknown if and how much).

OK, you can try to tweak the PLL configuration, but still: the two clocks are never really in sync, never both exactly 48.0000xxx kHz, and there is no way to avoid getting more audio samples than are played via the DAC (or vice versa).
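To put numbers on the drift, here is a rough sketch of how long it takes until the faster clock is one whole 1 ms frame (48 sample pairs) ahead. The function names and the ppm values are illustrative assumptions, not measurements from any board:

```python
FRAME_SAMPLES = 48          # stereo sample pairs per 1 ms USB frame at 48 kHz
SAMPLE_RATE_HZ = 48_000

def seconds_per_slipped_frame(mismatch_ppm):
    """Time until the faster clock has produced one whole extra frame."""
    surplus_samples_per_second = SAMPLE_RATE_HZ * mismatch_ppm * 1e-6
    return FRAME_SAMPLES / surplus_samples_per_second

def mismatch_ppm_from_interval(seconds_between_slips):
    """Inverse: clock mismatch implied by an observed slip interval."""
    return FRAME_SAMPLES / (SAMPLE_RATE_HZ * seconds_between_slips) * 1e6

for ppm in (10, 50, 100):
    print(ppm, "ppm ->", round(seconds_per_slipped_frame(ppm), 1), "s per frame")
```

Read the other way round: one lost frame about every 10 seconds corresponds to a mismatch of roughly 100 ppm between the two clocks, and going 5 minutes without a slip would need the clocks to agree to within about 3.3 ppm.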

There are only two options to solve the problem.

What would be your suggestion? How would you deal with two independent Audio Word Clocks?

OK, if USB is a bit too fast and a new USB frame (e.g. with 48 stereo samples) does not fit: drop it. This is my current approach and it seems to work acceptably: on regular music I do not hear an effect.
But it is actually not really nice: if you analyze the LineOut audio and expect a perfect sine wave sent from the host PC, it will have some 'phase jumps'. For me, a piece of audio is missing approx. every 10 seconds.
OK, let's trim the PLL: reconfigure the I2S PLL in the MCU (if the MCU provides the MCLK for the DAC), hoping it is possible to increment/decrement a divider of the I2S PLL without stalling the clock or generating clock glitches or phase jumps.
My concerns are these:

The PLL can only be set with integer dividers, which is not as fine grained as a PLL that takes fractional dividers. OK, it might result in a larger frequency change (an audio pitch shift in the end), but potentially still small enough not to be noticed by the human ear.
My thinking is: assuming it works, set the PLL slower or faster depending on the relation between the USB input buffers and the DAC output buffers. It will add jitter (for sure):
I do not lose audio samples anymore, but the audio clock, the MCLK, and therefore the output signal will see small changes in frequency. In the simple case: jumping between two MCLK frequencies, always toggling around the nominal Word Clock but never running at the 'correct nominal' clock. Let's assume that instead of an accurate 48 kHz it now toggles between 47.8 and 48.2 kHz (with the question of how often/fast).
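As a rough feel for the example numbers above (47.8 and 48.2 kHz), the sketch below computes the pitch offset of each setting in cents and the share of time the faster setting would be needed so that the average rate matches the host. The 100 ppm host offset is an assumption for illustration only:

```python
import math

def pitch_offset_cents(actual_hz, nominal_hz=48_000.0):
    """Pitch deviation in cents (100 cents = one semitone)."""
    return 1200.0 * math.log2(actual_hz / nominal_hz)

def high_fraction(target_hz, low_hz, high_hz):
    """Fraction of time to spend at high_hz so the average equals target_hz."""
    return (target_hz - low_hz) / (high_hz - low_hz)

low, high = 47_800.0, 48_200.0
host_rate = 48_000.0 * (1 + 100e-6)   # assumed: host runs 100 ppm fast
print(round(pitch_offset_cents(low), 1), "cents at", low)
print(round(pitch_offset_cents(high), 1), "cents at", high)
print(round(high_fraction(host_rate, low, high), 3), "of the time at", high)
```

With these assumed values each setting is about 7 cents away from nominal, and the clock would spend close to half of the time at each frequency. Whether that is audible depends mostly on how fast the toggling happens, which this arithmetic does not answer.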

My fear is: this additional jitter - will it be audible? Could the audio sound start 'breathing'?

(eliminating jitter from USB is not the issue here, fine, with large enough buffers (and delay) - we can assume USB in is jitter free)

What would be your option and opinion?

a) drop a small frame (or 'merge' with the very latest, in case USB is a bit faster)?
b) adjust the PLL and create an 'adaptive clock recovery'? (a feedback from USB 'clock' to MCLK generation for DAC)
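Option b) can be pictured as a small control loop that trims the DAC clock from the buffer fill level. The sketch below is only a simulation under stated assumptions: a clock source that can be trimmed in 1 ppm steps without glitches (hypothetical, not the behaviour of the MCU I2S PLL), a proportional-only controller, and invented names and gains:

```python
def simulate_clock_recovery(host_ppm, seconds, kp_ppm_per_sample=2.0,
                            target_fill=480.0, trim_step_ppm=1.0):
    """Proportional loop: trim the DAC clock from the buffer fill error.

    Returns (final_fill, final_trim_ppm, min_fill, max_fill); fill values
    are in sample pairs, the trim is in ppm relative to nominal 48 kHz.
    """
    nominal_per_ms = 48.0                 # sample pairs per 1 ms at 48 kHz
    fill = target_fill                    # start at the target level
    trim_ppm = 0.0
    min_fill = max_fill = fill
    for _ in range(int(seconds * 1000)):  # one iteration per USB frame
        fill += nominal_per_ms * (1 + host_ppm * 1e-6)   # host writes
        fill -= nominal_per_ms * (1 + trim_ppm * 1e-6)   # DAC reads
        error = fill - target_fill
        # Quantise to the (hypothetical) smallest trim step of the clock.
        trim_ppm = round(kp_ppm_per_sample * error / trim_step_ppm) * trim_step_ppm
        min_fill = min(min_fill, fill)
        max_fill = max(max_fill, fill)
    return fill, trim_ppm, min_fill, max_fill

fill, trim, lo, hi = simulate_clock_recovery(host_ppm=100, seconds=120)
print("final fill:", round(fill, 2), "trim ppm:", trim, "range:", round(lo, 2), round(hi, 2))
```

In this simulation the trim settles near the host offset and the buffer level stays bounded, at the cost of a constant fill offset (a proportional-only loop never returns exactly to the target). It says nothing about whether a real PLL can be retuned without clicks.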

From a HW point of view, the actually correct solution would be this IMO:

the USB should be the master clock: recover a clock from the incoming isochronous (1 kHz) USB frame timing and use it to generate the MCLK for the DAC (e.g. via a PLL chip)
use a real PLL which also allows fractional dividers, so that the tuning of the MCLK is much more fine grained and the jitter much smaller: changes happen less often, very smoothly and not so drastically (as with the MCU I2S PLL with integer dividers and the uncertainty of how the clock will behave on the MCU)
or: use a real USB audio receiver chip as a USB-to-I2S bridge and connect the DAC to this chip via I2S (the SAI peripherals). Instead of the MCLK coming from the MCU, the I2S clocks then come from an external clock source, here the USB bridge.

Conclusion:

As on many of the DISCOVERY boards, where the I2S clock is generated by the MCU: even though MCU and board allow implementing USB Audio, the issue remains that the audio Word Clocks are not in sync. It is not really possible to sync both in a way that the audio is free of artefacts over a longer period of time (my goal is at least 5 minutes free of any lost audio frame, no buffer over/underruns during 5 minutes of sound). We have to cope with discarding some samples (or filling a gap), or we have to add 'artificial' jitter. Keeping the Word Clocks 'in sync' for 5 minutes would need a very accurate clock configuration and clock stability, which is nearly impossible to achieve (with MCUs).

Please, tell me your thoughts and how you have solved such an issue. Thank you.

Many regards

Torsten

S.Ma · ‎2017-10-30

Posted on October 31, 2017 at 05:05

I am no expert in audio (more on displays), so my noob thinking is to take the crappiest situation, which is the CD player.

The rotation of the CD is far from perfect with respect to a USB clock, and bumps can lose the audio track for a short time. What they do is use a simple FIFO (here maybe some milliseconds deep to reduce lag, to be calculated)?

Otherwise, as a possible patch, a more complex way would be to drop samples during silence intervals (not audible).

If the USB device is composite and not purely audio, would the USB clock not be so much of a reference?

Thoughts?

Torsten Jaekel · ‎2017-11-02

Posted on November 02, 2017 at 23:40

Sure, a CD player will have jitter (variation of rotation speed). Compensating jitter is 'easy', as you have mentioned: just use a FIFO with clock in and clock out, just large enough to compensate the largest variation (OK, it adds delay as a drawback).

But what would you do if the rotation speed is far off, e.g. if you 'hit the brake' on the CD (touch it with your fingers, like slowing down an old vinyl record, as the disc jockeys do)? In this case, fewer audio samples are provided (and the sound pitch goes down). This issue cannot be resolved. Even having feedback, like a PLL that tries to trim the speed as needed for the USB clock and to bring both 'in sync' (speed up again), does not make it 'perfect'. The feedback control might 'jitter' around some corner speeds/clocks.

Never mind: the only approach I could imagine, 'trim the I2S PLL inside the MCU', does not work, unfortunately. It generates even more distortion (clicks). I guess that when you touch a PLL setting, you have to stop the clock, or it stops by itself and re-syncs. So stopping the I2S clock (Word Clock), or having phase jumps, glitches or dropouts on the clock signal, makes it even worse.

The only solution for now: drop a USB audio frame if USB is faster than the player (DAC).
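The drop policy itself is small. A minimal sketch as a bounded FIFO of whole frames (the class and method names are made up for illustration; real firmware would do this on a ring buffer shared with the DAC DMA):

```python
from collections import deque

class FrameFifo:
    """Bounded FIFO of USB audio frames; a frame that does not fit is dropped."""

    def __init__(self, capacity_frames):
        self.frames = deque()
        self.capacity = capacity_frames
        self.dropped = 0

    def push(self, frame):
        """Called when a USB frame arrives. Returns False if it was dropped."""
        if len(self.frames) >= self.capacity:
            self.dropped += 1      # buffer full: discard the newest frame
            return False
        self.frames.append(frame)
        return True

    def pop(self):
        """Called when the DAC side needs data. Returns None on underrun."""
        return self.frames.popleft() if self.frames else None

fifo = FrameFifo(capacity_frames=2)
results = [fifo.push(bytes(192)) for _ in range(3)]
print(results, fifo.dropped)
```

Here the third frame arrives while both slots are still occupied, so it is rejected and counted. Dropping the newest frame rather than the oldest is a design choice; either way one frame of audio goes missing.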

BTW: the dropping cannot just be done when the audio is silent. It has to be done when the internal buffers are full (when it happens). Otherwise, yes, it sounds smart: 'drop frames only during silent periods'. But it results in: a) you have to buffer so many audio frames (assume several seconds to find a stretch of silence) that they do not fit into memory, b) you have to decode and analyze the audio just to find where the 'gap of silence' is. That makes it tough and needs a lot of processing power. And whether we will really find such a 'gap' in all kinds of songs is questionable; it depends on the sound itself (sometimes good, sometimes ugly).

Instead of dropping an audio frame, I could imagine overwriting the previous one with the current one (which drops the previous one), but 'merging' the sample words: you mix both together. This is also ugly because it can generate other frequencies and phase jumps in the audio signal. But it could avoid clicks, and perhaps we cannot hear the very short, time-limited frequency change, artificial noise and pitch shift.
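One simple reading of 'merging' is to average the two frames sample by sample. This is only a sketch of that idea (the function name is hypothetical, and a crossfade would be another way to do it):

```python
def merge_frames(previous, current):
    """Average two equally long lists of signed 16-bit samples."""
    if len(previous) != len(current):
        raise ValueError("frames must have the same length")
    # The sum of two int16 values needs 17 bits, so halve it before storing.
    # Floor division keeps the result inside the int16 range.
    return [(a + b) // 2 for a, b in zip(previous, current)]

print(merge_frames([100, -100, 32767], [300, -300, 32767]))
```

Averaging cannot overflow and avoids a hard cut, but the two frames are 1 ms apart in time, so for anything other than very low frequencies the result is a short stretch of altered signal rather than a clean join.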

It remains: with two independent clock domains (one is USB or the CD player, the other is the DAC) and no Master Word Clock, we have to live with audio distortion due to clock drift: 'what is too much to fit into the output pipeline must be discarded' - 'full is full'.

USB:

Sure, USB can be composite (which means: two or more different devices on one USB link, e.g. Audio + UART or Audio + Memory Device + Mouse etc.).

USB Audio uses isochronous transfer (on consumer PCs). It means: the host generates a clock (e.g. 1 kHz from a 48 MHz USB Master Clock) and sends a USB frame containing audio every 1 ms (this periodic transmission is the clock; there is no real clock signal in USB). This is how common USB Audio works.

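There are also other USB modes, e.g. Asynchronous: here a return channel, the feedback, tells the host: 'hey, you are too fast, throttle down a bit'. (In Adaptive mode it is the other way around: the device adapts its own clock to the data rate coming from the host.) Find more here: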

http://www.hifi-advice.com/blog/audiophile-insights/digital-info/usb-audio-synchronous-asynchronous/

Just: these modes need support in the audio device (USB device) as well as on the host, potentially a special driver for the PC.
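For the feedback variant, the device reports how many samples per frame it actually consumes, measured against its own audio clock. As far as I know, a full-speed USB Audio Class 1.0 feedback endpoint carries this as a 3-byte value in 10.14 fixed-point format; treat that format, and the helper name below, as assumptions to check against the USB specification:

```python
def feedback_10_14(samples_per_frame):
    """Encode samples per 1 ms frame as a 3-byte little-endian 10.14 value."""
    value = round(samples_per_frame * (1 << 14))
    if not 0 <= value < (1 << 24):
        raise ValueError("rate does not fit the 10.14 format")
    return value.to_bytes(3, "little")

print(feedback_10_14(48.0).hex())      # exactly nominal 48 kHz
print(feedback_10_14(47.9952).hex())   # assumed: DAC clock 100 ppm slow
```

A host that honours the feedback then varies the number of samples per packet (e.g. occasionally 47 instead of 48) so that on average it sends exactly what the DAC plays, and nothing has to be dropped.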

Therefore, the audio professionals (recording and mastering studios ...) always use a Word Clock Master: a special, accurate clock generator whose clock is fed to all other devices, which are then in sync with the Master Clock. But for simple, consumer, USB based audio systems there is no such approach (and a separate clock distribution via wires would not work).

BTW: in network based audio, e.g. DANTE, AES67, it is possible to provide this Master Word Clock to all the devices.

The bigger the clock difference or drift (which even depends on temperature), the worse the audio (more dropped or missing audio frames). Therefore, audiophile people sometimes think: 'I need an atomic clock' (on both sides, to do it right). Sure, if the host PC played with an atomic clock reference, and the DAC decoded and played with its own, also atomic, clock reference, the audio dropouts would happen much more seldom, maybe just once a day or maybe once a year (depending on the atomic clock stability and accuracy).

But it will still happen, for sure. Even atoms are not so accurate and there is this quantum theory, Heisenberg Uncertainty Principle: nothing can be really identical and in sync (over a longer or infinite period of time).

Again: the only potential solution is still that one side has to adjust its clock, and there has to be a feedback mechanism to tell or figure out when and how to trim. Unfortunately, trimming the I2S PLL inside STM MCUs does not seem to work seamlessly (it would need a quite challenging clock design inside the silicon, which would make it expensive, and it is only needed when audio is actually used).