2017-10-30 03:15 PM
I am working (again) on USB Audio on STM boards. It works: I get audio from the PC to the board (or vice versa), so far OK. The firmware works, the example/demo code is helpful, and there are no issues from a software coding point of view.
My issue is: how do I deal with two independent audio clocks?
The on-board DAC gets its clocks (MCLK and the Word Clock) from the MCU (or from something else). USB audio comes in: e.g. every 1 ms a new audio packet arrives, e.g. 192 bytes (48 stereo samples) at a 48 kHz sample rate, 16-bit stereo. But this clock is generated by the PC. The host generates this 1 ms period independently of the MCU board (and its I2S/DAC).
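The packet arithmetic above, as a small C sketch. The function name is hypothetical, and it assumes one packet per 1 ms full-speed frame and interleaved PCM:

```c
#include <stdint.h>

/* Bytes in one USB audio packet, assuming one packet per 1 ms full-speed
 * frame and interleaved PCM. Integer division: only exact for sample
 * rates that are a multiple of 1000 Hz. */
static uint32_t usb_audio_bytes_per_ms(uint32_t sample_rate_hz,
                                       uint32_t channels,
                                       uint32_t bytes_per_sample)
{
    uint32_t samples_per_ms = sample_rate_hz / 1000u; /* 48000 Hz -> 48 */
    return samples_per_ms * channels * bytes_per_sample; /* 48 * 2 * 2 = 192 */
}
```

For 44.1 kHz the integer division hides a detail: 44.1 samples per ms is not a whole number, so on average the host sends nine 44-sample packets and one 45-sample packet per 10 ms.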
So, in the end, USB and the on-board DAC run on different, completely independent, unsynchronized clocks. As a result, the buffer between them will sooner or later overflow or run empty.
OK, you can try to tweak the PLL configuration, but still: the two clocks are never really in sync, never identical down to 48.0000xxx kHz, and there is no way to avoid getting more audio samples than the DAC plays (or vice versa).
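To put numbers on the drift, here is a sketch that computes how fast the FIFO level walks away and how long a given headroom lasts. The 100 ppm in the example is only an illustrative mismatch, not a measured value, and the names are hypothetical:

```c
#include <math.h>

/* Net sample frames gained (positive) or lost (negative) per second when
 * the USB side delivers nominal_hz * (1 + ppm_offset / 1e6) and the DAC
 * consumes exactly nominal_hz. */
static double drift_samples_per_second(double nominal_hz, double ppm_offset)
{
    return nominal_hz * ppm_offset / 1e6;
}

/* Seconds until `headroom_samples` of spare FIFO space (or spare fill)
 * is used up by that drift; INFINITY if the clocks match exactly. */
static double seconds_until_overrun(double nominal_hz, double ppm_offset,
                                    double headroom_samples)
{
    double drift = fabs(drift_samples_per_second(nominal_hz, ppm_offset));
    return drift > 0.0 ? headroom_samples / drift : INFINITY;
}
```

With 100 ppm at 48 kHz the drift is 4.8 sample frames per second, so one packet (48 frames) of headroom lasts 10 s. Turned around, surviving 5 minutes on 48 frames of headroom would need the two clocks to agree within about 3.3 ppm.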
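There are only two options to solve the problem: discard (or insert) samples when the buffer runs full (or empty), or keep trimming the DAC clock to follow the USB rate, which adds 'artificial' jitter.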
What would be your suggestion? How would you deal with two independent audio Word Clocks? My fear is this additional jitter: will it be audible? Could the audio start 'breathing'?
(Eliminating jitter from USB is not the issue here: with large enough buffers (and delay) we can assume the USB input is jitter-free.) What would be your option and opinion?
From a HW point of view, the actual correct solution would be this IMO:
Conclusion:
On many of the DISCOVERY boards, the I2S clock is generated by the MCU. Even though the MCU and board allow implementing USB Audio, the issue remains that the audio Word Clocks are not in sync. It is not possible to really sync both in a way that keeps the audio free of artefacts over a longer period of time (my goal: at least 5 minutes without any lost audio frame, i.e. no buffer over/underruns during 5 minutes of sound). We have to cope with discarding some samples (or filling a gap), or we have to add 'artificial' jitter. Keeping the Word Clocks 'in sync' for 5 minutes would need a very accurate clock configuration and clock stability, which is nearly impossible to achieve with MCUs.
Please tell me your thoughts and how you have solved such an issue. Thank you.
Many regards
Torsten
2017-10-30 09:05 PM
I am no expert in audio (more in displays), so my noob thinking is to take the crappiest situation, which is the CD player.
The rotation of a CD is far from perfect relative to a USB clock, and bumps can lose the audio track for a short time. What they do is use a simple FIFO (here maybe a few milliseconds deep to limit the lag; to be calculated)?
Otherwise, as a possible patch, a more complex way would be to drop samples during silent intervals (where it is not audible).
If the USB device is composite and not pure audio, would the USB clock still be much of a reference?
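For the 'drop during silence' idea, the detection part could look like this minimal sketch, assuming 16-bit PCM and a hypothetical amplitude threshold. It does nothing about the harder problem of whether a silent block arrives before the buffer is full:

```c
#include <stdint.h>

/* Returns 1 if every sample in the (interleaved) block stays within
 * +/- threshold, i.e. the block is a candidate for dropping or repeating.
 * Comparison is done in int32_t so INT16_MIN needs no special case. */
static int block_is_silent(const int16_t *s, uint32_t n_samples, int16_t threshold)
{
    for (uint32_t i = 0; i < n_samples; i++) {
        int32_t v = s[i];
        if (v > threshold || v < -(int32_t)threshold)
            return 0;
    }
    return 1;
}
```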
Thoughts?
2017-11-02 04:40 PM
Sure, a CD player will have jitter (variation in rotation speed). Compensating jitter is 'easy', as you mentioned: just use a FIFO with a clock in and a clock out, just large enough to compensate the largest variation (OK, with added delay as the drawback).
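Sizing such a FIFO, as a sketch: assuming it is pre-filled to half its depth so it can absorb early and late delivery equally, the depth covers twice the worst-case jitter and the prefill is the added delay. The struct and function names are hypothetical. Note that this only absorbs jitter; a steady rate difference still walks the fill level into one end eventually.

```c
#include <stdint.h>

typedef struct {
    uint32_t depth_frames;   /* total FIFO depth in sample frames */
    uint32_t prefill_frames; /* frames buffered before playback starts */
    uint32_t latency_us;     /* delay added by the prefill */
} fifo_plan;

static fifo_plan plan_fifo(uint32_t sample_rate_hz, uint32_t max_jitter_us)
{
    fifo_plan p;
    /* frames needed to cover the jitter in one direction, rounded up */
    uint64_t one_side = ((uint64_t)sample_rate_hz * max_jitter_us + 999999u) / 1000000u;
    p.prefill_frames = (uint32_t)one_side;
    p.depth_frames = (uint32_t)(2u * one_side); /* room for early and late data */
    p.latency_us = (uint32_t)((uint64_t)p.prefill_frames * 1000000u / sample_rate_hz);
    return p;
}
```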
But what would you do if the rotation speed is far off, e.g. you 'hit the brakes' on the CD (touch it with your fingers, like slowing down an old vinyl record the way disc jockeys do)? In this case, fewer audio samples are provided (and the pitch goes down). This issue cannot be resolved. Even a feedback loop, like a PLL that tries to trim the speed to the USB clock and bring both 'in sync' (speed up again), does not make it 'perfect': the feedback control might 'jitter' around some corner speeds/clocks.
Never mind: the only approach I could imagine, 'trim the I2S PLL inside the MCU', does not work, unfortunately. It generates even more distortion (clicks). I guess that when changing a PLL setting, you have to stop the clock, or it stops and re-syncs by itself. So stopping the I2S clock (Word Clock), or having phase jumps, glitches or dropouts on the clock signal, makes it even worse.
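The feedback idea (trim one side based on how full the buffer is) boils down to a control loop like this PI sketch. The gains, the ppm clamp and the names are hypothetical and would need tuning; how the correction is applied is left open, since trimming the I2S PLL on the fly is exactly what caused clicks here.

```c
typedef struct {
    double kp;        /* ppm per frame of fill error */
    double ki;        /* ppm per frame-second of accumulated error */
    double integral;  /* accumulated error, frame-seconds */
    double limit_ppm; /* clamp on the correction */
} rate_pi;

/* Returns a rate correction in ppm. Positive: FIFO above target, so the
 * consumer (DAC) should speed up, or equivalently the producer slow down. */
static double rate_pi_update(rate_pi *c, double fill_frames,
                             double target_frames, double dt_s)
{
    double err = fill_frames - target_frames;
    double prev = c->integral;
    c->integral += err * dt_s;
    double out = c->kp * err + c->ki * c->integral;
    if (out > c->limit_ppm) {          /* clamp and undo the integration */
        out = c->limit_ppm;            /* step (simple anti-windup)      */
        c->integral = prev;
    } else if (out < -c->limit_ppm) {
        out = -c->limit_ppm;
        c->integral = prev;
    }
    return out;
}
```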
The only solution for now: drop a USB audio frame if USB is faster than the player (DAC).
BTW: the dropping cannot be done only when the audio is silent. It has to be done when the internal buffers are full (whenever that happens). Otherwise, yes, you are smart, but 'drop frames only during silent periods' means that a) you have to buffer so many audio frames (assume several seconds, to find a part of silence) that they do not fit into memory, and b) you have to decode and analyze the audio just to find where the 'gap of silence' is. That can make it tough and needs a lot of processing power. And whether we will really find such a 'gap' in all kinds of songs is questionable; it depends on the sound itself (sometimes good, sometimes ugly).
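'Drop a packet when full, fill with silence when empty' as a minimal ring-buffer sketch. The depth, the names and the silence fill are assumptions, and it ignores the concurrency between the USB interrupt and the I2S DMA callback, which a real implementation has to guard (for example by disabling interrupts around the count update).

```c
#include <stdint.h>

#define FIFO_FRAMES 512u  /* hypothetical depth in sample frames */
#define FIFO_CH     2u    /* stereo, interleaved L/R */

typedef struct {
    int16_t  data[FIFO_FRAMES * FIFO_CH];
    uint32_t head;            /* next frame index to write */
    uint32_t tail;            /* next frame index to read */
    uint32_t count;           /* frames currently stored */
    uint32_t dropped_packets; /* USB packets discarded because the FIFO was full */
    uint32_t underrun_frames; /* DAC frames replaced by silence because it was empty */
} audio_fifo;

/* USB OUT side: store the whole packet, or drop it if it does not fit. */
static int fifo_write_packet(audio_fifo *f, const int16_t *pkt, uint32_t frames)
{
    if (FIFO_FRAMES - f->count < frames) {
        f->dropped_packets++;          /* "full is full" */
        return 0;
    }
    for (uint32_t i = 0; i < frames; i++) {
        for (uint32_t c = 0; c < FIFO_CH; c++)
            f->data[f->head * FIFO_CH + c] = pkt[i * FIFO_CH + c];
        f->head = (f->head + 1u) % FIFO_FRAMES;
    }
    f->count += frames;
    return 1;
}

/* I2S/DAC side: fetch frames, filling any gap with silence. */
static void fifo_read(audio_fifo *f, int16_t *out, uint32_t frames)
{
    for (uint32_t i = 0; i < frames; i++) {
        for (uint32_t c = 0; c < FIFO_CH; c++)
            out[i * FIFO_CH + c] = (f->count > 0u) ? f->data[f->tail * FIFO_CH + c] : 0;
        if (f->count > 0u) {
            f->tail = (f->tail + 1u) % FIFO_FRAMES;
            f->count--;
        } else {
            f->underrun_frames++;
        }
    }
}
```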
Instead of dropping an audio frame, I could imagine overwriting the previous one with the current one (which drops the previous one), but 'merging' the sample words: you mix both together. But that is also ugly, because it can generate other frequencies, phase jumps ... in the audio signal. It could avoid clicks, and the very short, time-limited frequency change, artificial noise and pitch shift might be inaudible.
What remains: with two independent clock domains (USB or the CD player on one side, the DAC on the other), there is no Master Word Clock, and we have to live with audio distortions due to clock drift: 'what is too much to fit into the output pipeline must be discarded', 'full is full'.
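The 'merge instead of drop' idea could be done as a linear crossfade: two consecutive blocks become one, fading from the first into the second, so one block's worth of samples disappears without a hard cut. This is a sketch under that assumption (names hypothetical), and it does not remove the frequency/phase artefacts mentioned above.

```c
#include <stdint.h>

/* Collapse two consecutive blocks a and b (n frames each, ch interleaved
 * channels) into one n-frame block that fades linearly from a into b.
 * Output starts at a[0] and ends at b[n-1], so both joins stay continuous,
 * and n frames are removed from the stream. Requires n >= 2. */
static void merge_frames(const int16_t *a, const int16_t *b, int16_t *out,
                         uint32_t n, uint32_t ch)
{
    int64_t span = (int64_t)n - 1;          /* weights wa + wb == span */
    for (uint32_t i = 0; i < n; i++) {
        int64_t wb = (int64_t)i;            /* weight of b: 0 .. n-1 */
        int64_t wa = span - wb;             /* weight of a: n-1 .. 0 */
        for (uint32_t c = 0; c < ch; c++) {
            int64_t s = wa * a[i * ch + c] + wb * b[i * ch + c];
            out[i * ch + c] = (int16_t)(s / span); /* convex mix stays in int16 range */
        }
    }
}
```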
USB:
Sure, USB can be composite (which means: two or more different devices on one USB link, e.g. Audio + UART, or Audio + Mass Storage + Mouse etc.). USB Audio uses isochronous transfers (on consumer PCs): the host generates a clock (e.g. 1 kHz derived from a 48 MHz USB master clock) and sends a USB frame containing audio every 1 ms (this periodic transmission is the clock; there is no real clock signal in USB). This is how common USB Audio works.
There are also other USB synchronization modes. In Asynchronous mode, a return channel, the feedback endpoint, tells the host: 'hey, you are too fast, throttle down a bit'. In Adaptive mode, the device instead adapts its own clock to the rate of the incoming data. Find more here: http://www.hifi-advice.com/blog/audiophile-insights/digital-info/usb-audio-synchronous-asynchronous/
However, these modes need support in the audio device (USB device) as well as on the host, potentially with a special driver on the PC.
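What the 'throttle down' message looks like in Asynchronous mode, as I understand the isochronous feedback description in the USB 2.0 specification: for a full-speed device the feedback value is the number of sample frames per 1 ms frame in unsigned 10.14 fixed point, sent as 3 bytes. High speed uses a 16.16 format per microframe instead (not shown). Deriving the value from a count of DAC samples over several SOF periods is an assumption, and the names are hypothetical.

```c
#include <stdint.h>

/* Feedback value Ff for a full-speed asynchronous endpoint: sample frames
 * per 1 ms USB frame in unsigned 10.14 fixed point, derived from how many
 * DAC sample frames were consumed over `sof_frames` SOF periods. */
static uint32_t feedback_10_14(uint32_t samples_consumed, uint32_t sof_frames)
{
    return (uint32_t)(((uint64_t)samples_consumed << 14) / sof_frames);
}

/* Pack Ff into the 3-byte, little-endian form sent on the feedback endpoint. */
static void feedback_pack(uint32_t ff, uint8_t out[3])
{
    out[0] = (uint8_t)(ff & 0xFFu);
    out[1] = (uint8_t)((ff >> 8) & 0xFFu);
    out[2] = (uint8_t)((ff >> 16) & 0xFFu);
}
```

48 kHz gives 48 << 14 = 0x0C0000. If the DAC actually consumes 48001 frames per second, the value becomes 0x0C0010, telling the host to send slightly more.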
Therefore, audio professionals (recording and mastering studios ...) always use a Word Clock Master, a special, accurate clock generator, and feed its clock to all other devices, which are then in sync with the Master Clock. But for simple, consumer, USB-based audio systems there is no such approach (and a separate clock distribution via wires would not work there).
BTW: in network-based audio, e.g. Dante or AES67, it is possible to provide this Master Word Clock to all devices. The bigger the clock difference or drift (which even depends on temperature), the worse the audio (more dropped or missing audio frames). Therefore, audiophile people sometimes think: 'I need an atomic clock' (on both sides, to do it right). Sure, if the host PC played with an atomic clock reference, and the DAC decoded and played with its own, also atomic, clock reference, the audio dropouts would happen much more seldom, maybe just once a day, or maybe once a year (depending on the atomic clocks' stability and accuracy).
But it will still happen, for sure. Even atomic clocks are not perfectly accurate, and there is quantum theory, the Heisenberg Uncertainty Principle: nothing can be really identical and in sync (over a longer or infinite period of time).
Again: the only potential solution is still that one side has to adjust its clock, and there has to be a feedback mechanism to tell or figure out when and how to trim. Unfortunately, trimming the I2S PLL inside STM MCUs seamlessly does not seem to work (that would need a rather challenging clock design inside the silicon, which would make it expensive, and it is only needed when audio is actually used).