
What About CD-Audio Needs Improvement?
If you've been out of the CD-Audio-bashing loop until now, we definitely appreciate your curiosity in getting this far into the discussion. To the careful listener, CD-quality audio is not on a par with a live performance or with the quality of sound offered by state-of-the-art recording media -- and while this has been the center of an ongoing discussion in audiophile circles since the mid-80's, many in the mainstream audio industry use terms like "CD-quality sound" as if this were the pinnacle of audio perfection. For recordings mastered originally in 16-bit / 44.1 kHz digital audio, the format places very strict limitations on the mastering process, due to its limited dynamic range, and requires severe filtering near the top of the audible frequency spectrum to avoid worse problems (chiefly aliasing). And while a carefully mastered CD recording can sound quite good, and is reasonably suited to a number of listening environments, the effect of lessening digital noise and relaxing the filters in your audio system can open up new dimensions in your perception and enjoyment of music.
Anyone who has listened to audio at lower-than-CD rates, such as with 8-bit PCM samples or lower sampling rates, has heard what happens when these noises are brought more dramatically into the audio band. Still, some may argue that the noise inherent in the typical 16-bit D/A (digital-to-analog) oversample/filter process is at a low enough level to be imperceptible. Certainly this digital format's limitations were originally chosen to put noise just beyond the fringes of our perceptibility. The problem here is that digitally-created noise tends to make itself more obvious than more natural types of sound distortion. Analog distortions tend to be mostly harmonic to the signal itself, and thus hard to detect subjectively, but digital distortions are made more obvious by their anharmonic relationship with the original signal. That is, digital distortions are related to a combination of the incoming signal, the bit depth (quantization threshold) and the sampling rate, and as two of these three are unrelated to the original signal, their effect stands out. These digital distortions steal energy from the music, and distract the listener from cues which the brain uses to localize sound -- reducing the clarity of the music and the depth of the soundstage in which the music is presented.
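To put rough numbers on how much closer the noise comes at lower bit depths, the following minimal sketch quantizes a full-scale sine wave without dither and measures the ratio of signal power to error power. The function names, the 997 Hz test tone and the rounding scheme are illustrative assumptions, not taken from any particular converter.

```python
import math


def quantize(x: float, bits: int) -> float:
    """Round x in [-1.0, 1.0] to the nearest level of a signed N-bit grid (no dither)."""
    scale = 2 ** (bits - 1) - 1  # assumption: symmetric grid, e.g. +/-127 for 8 bits
    return round(x * scale) / scale


def snr_db(bits: int, freq_hz: float = 997.0, rate_hz: float = 44100.0,
           n_samples: int = 44100) -> float:
    """Signal-to-quantization-error ratio, in dB, for a full-scale sine wave."""
    signal_power = 0.0
    error_power = 0.0
    for i in range(n_samples):
        x = math.sin(2 * math.pi * freq_hz * i / rate_hz)
        err = quantize(x, bits) - x
        signal_power += x * x
        error_power += err * err
    return 10 * math.log10(signal_power / error_power)


for bits in (8, 12, 16):
    print(f"{bits:2d}-bit: signal is about {snr_db(bits):.1f} dB above the quantization error")
```

Each added bit should buy roughly 6 dB, so the error in an 8-bit recording sits nearly 50 dB closer to the music than it does at 16 bits. Note that this only measures the level of the error, not how harmonically related it is to the signal.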
The most visible result of this can be seen on a plot of a low-level digital signal, or on the output of an oscilloscope. The distortions introduced by the digitization process add a sawtooth edge to the original signal, adding a high-frequency buzzing or fuzz to a pure tone. This noise is often not loud enough in relation to the main signal to make itself obvious, but instead results in an overall perception of "harshness" or simple "unnaturalness" in the sound, or in a listener's general fatigue after prolonged listening. It also results in a perceived overemphasis on the existing noise in the original signal, and a shift in the natural harmonic balance of instruments. This latter component is particularly noticeable with instruments such as bells, which naturally have a sound rich in high-frequency harmonics.
The subjective effect of these distortions, which is made more prominent in stereo listening, is that they make it harder for the listener to accurately discern a sound's location. This makes it more difficult for a listener to perceive the correct distance to the sound and to distinguish between the originator of the sound and its reflections within the recording space. For example, when listening to a recording of acoustic instruments made in a concert hall, the listener should be able to clearly distinguish the location of the sources of the sounds, both left to right and front to back, and to get a clear audible image of the space around and between the instruments and in the hall. While this realistic soundstage is approached with higher bitrate digital audio, the subtle audio cues which create the feeling of space get blurred in recordings made at lower bitrates. This makes it harder for the listener to tell the location of an instrument apart from the location of its echoes and of the other instruments in the hall. The net effect is that instruments appear to be collapsed in space, as if the musicians were sitting on top of each other on a very small stage, rather than distinct and playing in a large room.
Audio Signal Recovery with DORIS
The DORIS algorithms can be thought of as an improvement on traditional oversampling and resolution enhancement methods, at least in the broadest terms: they are a way to turn a signal of one bit depth and sampling rate into one of a higher bit depth and/or sampling rate, thus "higher resolution." The focus here is on the conversion from CD rates (16-bit stereo, 44.1 kHz) to those of the current high-resolution audio standards (24-bit / 96 kHz or better), though the process itself is extensible to any digital format.
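For a sense of the bookkeeping involved in such a conversion, here is a small sketch of the arithmetic for going from 16-bit / 44.1 kHz to 24-bit / 96 kHz. It is only the naive baseline, not DORIS: the helper names are hypothetical, and widening a sample by shifting adds empty low-order bits rather than any new information.

```python
from fractions import Fraction


def rate_ratio(src_hz: int, dst_hz: int) -> Fraction:
    """Exact ratio of output samples to input samples for a rate conversion."""
    return Fraction(dst_hz, src_hz)


def widen_sample(sample: int, src_bits: int, dst_bits: int) -> int:
    """Rescale a signed integer sample to a longer word length.

    The new low-order bits are simply zero; no detail is recovered.
    """
    return sample << (dst_bits - src_bits)


print(rate_ratio(44100, 96000))      # 320/147
print(widen_sample(1, 16, 24))       # 256
print(widen_sample(-32768, 16, 24))  # -8388608
```

The awkward 320/147 ratio means almost no output sample lines up with an input sample, so nearly every 96 kHz value has to be computed rather than copied. How those values are computed is where conversion methods differ.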
It is important to stress at this point that the effects of DORIS processing are limited to extracting the musical signal cleanly from a digital recording. No attempt is made to otherwise "improve upon" the recording with changes to equalization, synthesized ambience or other effects. It also cannot compensate for mastering problems, such as those with the original recording equipment, microphone placement, background noises or the performance being recorded. Most of those sorts of choices are best left to the artists and recording engineers themselves.
Traditional oversampling and resolution enhancement algorithms are variants on the same theme: a higher sampling rate and bit resolution are chosen, new samples at the higher rate are calculated between the original samples, usually by drawing lines between the original samples, and then the result is run through a variety of filters to reject noise outside the audio band. Consider the test signal shown at right: this is the result of digitally sampling a sine wave near the limits of a given medium's bit depth. For those familiar with common CD audio tests, this is often seen as an undithered 16-bit sampling of a sine wave at -90.31 dBFS, that is, a sine wave of peak amplitude 1 in this system.
The top trace is the raw signal, before oversampling and filtering. Note the "stair-step" effect created by the quantization limits of the sampling: since the digital medium does not have enough bits of resolution to represent the signal in more detail, the sine wave is represented exclusively as an oscillation of the values -1, 0 and 1.
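A test signal of this kind is easy to reproduce. The sketch below assumes 16-bit signed PCM with full scale at 2**15, a 1 kHz tone and a 44.1 kHz sampling rate; the tone frequency and the function name are illustrative choices, not details of the figure.

```python
import math

FULL_SCALE = 32768  # assumption: 16-bit signed PCM, full scale = 2**15


def undithered_low_level_sine(n_samples: int, freq_hz: float = 1000.0,
                              rate_hz: float = 44100.0,
                              peak_lsb: float = 1.0) -> list[int]:
    """Sample a sine wave whose peak is peak_lsb quantization steps, rounding with no dither."""
    return [
        round(peak_lsb * math.sin(2 * math.pi * freq_hz * i / rate_hz))
        for i in range(n_samples)
    ]


samples = undithered_low_level_sine(441)  # ten cycles of a 1 kHz tone
level_dbfs = 20 * math.log10(1.0 / FULL_SCALE)

print(sorted(set(samples)))   # [-1, 0, 1]
print(round(level_dbfs, 2))   # -90.31
print(samples[:24])           # the stair-step pattern at the start of the tone
```

Only three distinct values survive, which is the stair-step seen in the top trace. A peak of one quantization step works out to about -90.3 dB relative to full scale.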
If a typical digital audio processor were fed this signal, it would connect these existing sample points at a higher sampling rate and pass the result through a low-pass filter, resulting in the middle trace. Those who are familiar with tests of digital-to-analog converters may recognize that sort of image: if that signal were to appear on the analog output of a D/A converter in a review, the reviewer might say that it had produced a "perfect sine wave". This is because it shows that there is very little analog noise introduced by the D/A converter itself, and it is faithfully reproducing the filtered digital signal. The mathematicians among us may cringe, however, to call that a "perfect sine wave". Although it may contain very little noise above the audio band, the difference between that data and a true sine wave is noise within the audio band. That noise was generated by the loss of resolution when that signal was first converted to the digital domain.
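The gap between the smoothed output and a true sine wave can be measured directly. The following is a rough sketch of the traditional approach described above, under loud assumptions: linear interpolation stands in for the oversampler, a simple moving average stands in for a real reconstruction filter, and the 1 kHz tone, 4x factor and window width are arbitrary illustrative values.

```python
import math


def linear_oversample(samples: list[float], factor: int) -> list[float]:
    """Draw straight lines between neighbouring samples, emitting `factor` points per interval."""
    out = []
    for a, b in zip(samples, samples[1:]):
        for k in range(factor):
            out.append(a + (b - a) * k / factor)
    out.append(samples[-1])
    return out


def moving_average(samples: list[float], width: int) -> list[float]:
    """Crude low-pass filter: centered moving average (shorter window at the edges)."""
    half = width // 2
    out = []
    for i in range(len(samples)):
        window = samples[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


RATE_HZ = 44100.0
FREQ_HZ = 1000.0
FACTOR = 4

# Undithered sine wave with a peak of one quantization step: only -1, 0 and 1 survive.
coarse = [round(math.sin(2 * math.pi * FREQ_HZ * i / RATE_HZ)) for i in range(441)]

smooth = moving_average(linear_oversample(coarse, FACTOR), width=9)

# The sine wave that was originally sampled, evaluated at the oversampled rate.
ideal = [math.sin(2 * math.pi * FREQ_HZ * i / (RATE_HZ * FACTOR))
         for i in range(len(smooth))]

residual_rms = math.sqrt(
    sum((s - t) ** 2 for s, t in zip(smooth, ideal)) / len(ideal)
)
print(f"residual RMS: {residual_rms:.3f} quantization steps")
```

The output looks smooth, yet the residual stays at a sizeable fraction of a quantization step. Smoothing removes the sharp edges of the stair-step, but it cannot restore the amplitude detail that was rounded away when the signal was digitized.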
The third trace shows the digital signal after processing with DORIS. The DORIS approach to solving this problem is to analyze the incoming signal to identify portions of the signal which have been introduced by the digitization process itself. Keeping only the portion of the signal which is not related to the noise of digitization, DORIS can then recreate this signal at any bit-depth and sampling rate.
The subjective result of DORIS-processed audio is a sound which is more musical, has a lower noise floor, and preserves the clarity and subtle audio cues with which to better perceive the music and the space around it.