speakers for 24/96 audio


Is it correct to assume that 24/96 audio would be indistinguishable from CD quality when listened to through speakers with a -3dB point at 20kHz and a rapid high-frequency roll-off?

Or more precisely, that the only benefit comes from the shift from 16 to 24 bits, not the increased sample rate, as the higher-frequency content is filtered out anyhow?

Related to this, what advice would you have for a sub-$5k speaker set with good high-frequency capabilities for 24/96 audio?

thanks!
mizuno

Showing 10 responses by almarg

Irvrobinson 6-30-11:
I still haven't heard a piano recording superior to the ancient Telarc CD of Malcolm Frager playing Chopin.
If you can find it, try Wilson Audio WCD-9129, Chopin's Piano Sonata No. 3 in B Minor, Op. 58 (and other shorter works), performed by Hyperion Knight. The best reproduction of solo piano in my experience, and it's on a 1991 redbook cd!

Best regards,
-- Al
BTW, I should have added to my previous post that I am in agreement with all of the technical points Kijanki has made, which I think have been very well presented. An additional point which I don't think has been mentioned is that brickwall anti-aliasing filters introduce some degree of ripple into the passband frequency response characteristics, as I understand it.

The audible significance of all of the effects that have been mentioned, though, is perhaps unanswerable in a definitive manner, given the extent to which those effects tend to be overshadowed by variability in recording engineering and quality.

Best regards,
-- Al
07-01-11: Shadorne
Of course, in a studio the signals are manipulated - this creates the need for even greater dynamic range (24 bit or 144 dB) - not that they will necessarily have better S/N but they may want to boost some sounds by 20 dB or so and may apply digital filters (the accuracy of said filters improves significantly if you have more bits)
Excellent point!
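
As a rough check on the 144 dB figure quoted above, the ideal dynamic range of an N-bit word can be worked out from the ratio of full scale to one quantization step. This is only a sketch of the textbook arithmetic; the function name is made up for illustration and real converters fall short of these numbers.

```python
import math

def dynamic_range_db(bits: int) -> float:
    """Ratio of full scale to one quantization step, expressed in dB."""
    return 20 * math.log10(2 ** bits)

for bits in (16, 24):
    print(f"{bits}-bit: {dynamic_range_db(bits):.1f} dB")
```

Each bit is worth about 6 dB, so a 20 dB boost applied in the studio uses up a little over 3 bits of headroom.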
06-29-11: Kijanki
... The Nyquist-Shannon theorem requires an infinite number of terms (samples). Fixing it with sin(x)/x works poorly for short bursts around 1/2 of the sampling frequency. The sound of instruments producing continuous sound (like flute) might not be affected, but anything with transients will sound wrong (piano, percussion instruments, etc.).

06-30-11: Kijanki
The closer you get to the Nyquist frequency, the more samples you need to properly reconstruct the original waveform - not possible to do for short high frequency sounds.

07-01-11: Shadorne
Not so. The waveform is perfectly reconstructed. The mathematics are quite rigorous. The main issues with digital are

1. Anti alias filtering (higher frequencies must be eliminated prior to ADC or they can fold in)
2. Jitter

Both of the above add spurious non musical signals. Both can be managed.
In theory Kijanki is correct. An infinitely long series of samples is required for the mathematics to work out perfectly. The consequences of that will be most significant for spectral components that are transient and that approach the Nyquist frequency (i.e., half the sample rate).

The extent to which that may be audibly significant on most recordings is probably conjectural. The Wilson Audio cd I referenced, among many others, leads me to believe that in general it is not a major factor as a practical matter.

Shadorne is of course correct, IMO, in emphasizing the significance of anti-alias filtering and jitter.
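
The point above about needing an infinitely long series of samples can be sketched numerically. The example below reconstructs a 20kHz sine sampled at 44.1kHz using the sin(x)/x (Whittaker-Shannon) sum, but with only a limited number of samples on each side of the point being reconstructed. The tone frequency, the window sizes and the function names are all assumptions chosen for illustration; it shows only how slowly the truncated sum converges near the Nyquist frequency, and says nothing about audibility.

```python
import math

FS = 44_100.0      # sample rate in Hz
F0 = 20_000.0      # test tone in Hz, close to the 22.05 kHz Nyquist frequency

def sinc(v: float) -> float:
    """Normalized sinc, sin(pi*v)/(pi*v)."""
    return 1.0 if v == 0 else math.sin(math.pi * v) / (math.pi * v)

def truncated_reconstruction(u: float, k: int) -> float:
    """Whittaker-Shannon sum at time u (in sample periods), using only samples n = -k..k."""
    w = 2 * math.pi * F0 / FS
    return sum(math.sin(w * n) * sinc(u - n) for n in range(-k, k + 1))

def worst_error(k: int, points: int = 20) -> float:
    """Largest deviation from the true sine on a grid between samples 0 and 1."""
    w = 2 * math.pi * F0 / FS
    grid = [i / points for i in range(points + 1)]
    return max(abs(truncated_reconstruction(u, k) - math.sin(w * u)) for u in grid)

for k in (4, 40, 400):
    print(k, round(worst_error(k), 4))
```

With only a handful of samples on each side the error is a sizeable fraction of full scale, and it shrinks as more samples are included. A brief burst near the Nyquist frequency is the case where few samples are available.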

Best regards,
-- Al
Thanks, Kijanki. The one thing I would question in your comment is the word "huge." I'm sure that a suitably chosen test waveform comprising a very short burst of high frequency energy, and put through a 44.1kHz a/d + d/a, can result in an error that will appear huge when viewed on an appropriate time scale. But as the saying goes the proof is in the pudding, and I've felt amazed at times at how good SOME cd's that contain a lot of transient high frequency energy can sound.

Best regards,
-- Al
Kijanki & Shadorne, you're both basically right but you're referring to different things.

Shadorne is alluding to the fact that a low pass reconstruction filter will smooth out the steps and restore an essentially perfect sine wave, if the original analog input was a sine wave at a frequency slightly less than the Nyquist frequency (or lower). Of course, the filter itself may have significant side effects, but that is another subject.

Kijanki was alluding to the fact that if the analog input is a brief transient lasting for a limited number of samples and having spectral components approaching the Nyquist frequency, then the mathematics won't work out ideally no matter how ideal the reconstruction process is. Which is correct, although as I said earlier whether or not that may be audibly significant with worst case material (e.g., high frequency percussion) is probably a matter of conjecture. Admittedly, the video does not directly relate to Kijanki's point.

As far as the relation between low sample rates and quantization noise is concerned, while lower sample rates would obviously result in coarser steps in the sampled (unreconstructed) waveform, I think that Irv is basically correct to the extent that the reconstruction process can be accomplished ideally. However, given the possible effects on high frequency transients that we've been discussing, that may result from having a limited number of samples, and given the non-idealities of real-world filters, I suppose there could be some second-order relation between sample rate and quantization noise. It's been a long time since I took the relevant courses. :-)

Best regards,
-- Al
Shadorne, I am in essential agreement with your last post. As I said earlier:
An infinitely long series of samples is required for the mathematics to work out perfectly. The consequences of that will be most significant for spectral components that are transient and that approach the Nyquist frequency (i.e., half the sample rate).

The extent to which that may be audibly significant on most recordings is probably conjectural. The Wilson Audio cd I referenced, among many others, leads me to believe that in general it is not a major factor as a practical matter.
I particularly second your statement that:
... the graphical representation of waveforms and the "digital staircase" form one of the biggest and most enduring audiophile myths that analog is inherently better than digital.
BTW, the following excerpts from the technical notes accompanying the Wilson Audio cd I referenced (which I indicated provides the best reproduction of solo piano in my experience) may be of general interest:
The recorded perspective of the piano in this recording is close, as though the 9' Hamburg Steinway is being played for you in your living room. Of course the actual recording was not made in a living room! Instead, the great room of Lucasfilm's Skywalker Ranch, with its incredibly low noise floor and fully adjustable acoustics, was used.... A pair of Sennheiser MKH-20 omni microphones were employed ... amplified by two superb pure class-A microphone preamps custom-built for Wilson Audio by John Curl. MIT cable carried the balanced line level signal to Wilson Audio's Ultramaster 30 ips analog recorder. Subsequent digital master tapes were made through the Pygmy A/D converter on a Panasonic SV-3700.
Best regards,
-- Al
Irv, keep in mind that it is generally accepted that a signal can be perceived at levels that are significantly below the level of random broadband noise that may accompany the signal. 15 dB or more below, iirc. So amplifier noise floor is not really a "floor" below which everything is insignificant.

Also, quantization noise is significantly correlated with the signal, at low signal levels, and is therefore perceived as distortion rather than noise. Dithering will minimize that effect, but it has its limitations and my understanding is that it is often not properly applied.

That said, I think we are all in agreement that the main usefulness of 24 bits is in the creation of the recording.
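
The remark above about quantization error being correlated with a low-level signal, and dither reducing that effect, can be illustrated with a toy quantizer. This is a minimal sketch under stated assumptions: one LSB is taken as 1.0, the test signal is a sine peaking at 0.4 LSB, and the dither is triangular (TPDF), built from two uniform random draws. None of these choices come from the thread.

```python
import math
import random

def quantize(x: float) -> int:
    """Round to the nearest integer step (1.0 = one LSB)."""
    return math.floor(x + 0.5)

def tpdf_dither(rng: random.Random) -> float:
    """Triangular-PDF dither spanning +/-1 LSB (sum of two uniform draws)."""
    return rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)

# A sine whose peak is only 0.4 LSB: too small to cross a quantizer threshold.
signal = [0.4 * math.sin(2 * math.pi * n / 32) for n in range(32)]

# Without dither every sample rounds to zero, so the error is exactly -signal.
undithered = [quantize(x) for x in signal]

rng = random.Random(0)   # fixed seed so the sketch is repeatable
trials = 5000
# With dither, the average of many quantized values follows the input.
dithered_mean = [
    sum(quantize(x + tpdf_dither(rng)) for _ in range(trials)) / trials
    for x in signal
]

print("undithered:", undithered[:8])
print("dithered mean:", [round(v, 2) for v in dithered_mean[:8]])
```

Without dither the signal simply vanishes, which is the extreme case of error that tracks the signal. With dither the signal survives on average, at the cost of added random noise.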

Regards,
-- Al
Hi Bryon,

Interesting question, and an interesting paper, which I read through. It strikes me as very intelligently and knowledgeably written, and I see no obvious flaws in the details he presents. And intuitively it does strike me as plausible that our ability to resolve timing-related parameters might be somewhat better than what would be suggested by the bandwidth limitations of our hearing mechanisms.

However, looking at his paper from a broader perspective I have several problems with it:

1) He has apparently established that listeners can reliably detect the difference between a single arrival of a specific waveform, and two arrivals of that waveform that are separated by a very small number of microseconds. I have difficulty envisioning a logical connection between that finding, though, and the need for hi rez sample rates. There may very well be one, but I don’t see it.

2) By his logic a large electrostatic or other planar speaker should hardly be able to work in a reasonable manner, much less be able to provide good reproduction of high speed transients, due to the widely differing path lengths from different parts of the panel to the listener’s ears. Yet clean, accurate, subjectively "fast" transient response, as well as overall coherence, are major strengths of electrostatic speakers. The reasons are fairly obvious – very light moving mass, that can start and stop quickly and follow the input waveform accurately; no crossover, or at most a crossover at low frequencies in the case of electrostatic/dynamic hybrids; freedom from cone breakup, resonances, cabinet effects, etc. So it would seem that the multiple arrival time issue he appears to have established as being detectable under certain idealized conditions can’t be said on the basis of his paper to have much if any audible significance in typical listening situations.

3) More generally, it seems to me that there are so many theoretical, practical, recording-dependent, and equipment-dependent variables that would have to be reckoned with and controlled in any attempt to make a meaningful comparison involving hi rez vs. redbook sample rates, that reaching a definitive conclusion about the degree to which this particular factor may be audibly significant under real-world listening conditions is probably not possible.

All best regards,

-- Al
Hi Bryon,

Your question about the audibility of jitter that is on a time scale far shorter than the temporal resolution of our hearing is a good one. The answer is that we are not hearing the nanoseconds or picoseconds of timing error itself. What we are hearing are the spectral components corresponding to the FLUCTUATION in timing among different clock periods (actually, among different clock half-periods, since both the positive-going and negative-going edges of S/PDIF and AES/EBU signals are utilized), and their interaction with the spectral components of the audio.

For example, assume that the worst case jitter for a particular setup amounts to +/- 1 ns. The amount of mistiming for any given clock period will fluctuate within that maximum possible 1 ns of error, with the fluctuations occurring at frequencies that range throughout the audible spectrum (and higher). That is all referred to as the "jitter spectrum," which will consist of very low level broadband noise (corresponding to random fluctuation) plus larger discrete spectral components corresponding to specific contributors to the jitter.

Think of it as timing that varies within that +/- 1 ns or so range of error, but which varies SLOWLY, at audible rates.

All of those constituents of the jitter spectrum will in turn intermodulate with the audio data, resulting in spurious spectral components at frequencies equal to the sums of and the differences between the frequencies of the spectral components of the audio and the jitter.
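
The sum-and-difference behavior described above can be sketched with a single tone and a single jitter component. Everything specific here is an assumption for illustration: a 10kHz tone, a 48kHz sample rate, and a timing error that swings +/- 1 ns at a 1kHz rate. The sketch models jitter simply as an error in the instant at which each sample is taken, and measures the result with a single-bin DFT.

```python
import cmath
import math

FS = 48_000.0        # sample rate, Hz
N = 4_800            # 0.1 s of samples, so every frequency below sits on an exact 10 Hz bin
F_AUDIO = 10_000.0   # audio tone, Hz
F_JITTER = 1_000.0   # rate at which the timing error fluctuates, Hz
PEAK_JITTER = 1e-9   # +/- 1 ns of timing error

def jittered_tone() -> list:
    """Sample the tone at instants that wander sinusoidally by +/- PEAK_JITTER."""
    out = []
    for n in range(N):
        t = n / FS
        t_err = PEAK_JITTER * math.sin(2 * math.pi * F_JITTER * t)
        out.append(math.sin(2 * math.pi * F_AUDIO * (t + t_err)))
    return out

def amplitude_at(x: list, freq: float) -> float:
    """Amplitude of the component at `freq`, from a single-bin DFT."""
    acc = sum(v * cmath.exp(-2j * math.pi * freq * n / FS) for n, v in enumerate(x))
    return 2 * abs(acc) / len(x)

x = jittered_tone()
for f in (F_AUDIO - F_JITTER, F_AUDIO, F_AUDIO + F_JITTER, 12_500.0):
    print(f"{f:7.0f} Hz: {amplitude_at(x, f):.3e}")
```

Spurious components appear at 9kHz and 11kHz (the difference and the sum), each at roughly pi x 10kHz x 1 ns of the tone's amplitude, or around -90 dB, while an unrelated frequency such as 12.5kHz stays essentially empty. The sideband level scales with both the amount of jitter and the audio frequency.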

If you haven't seen it, you'll find a lot of the material in this paper to be of interest (interspersed with some really heavy-going theoretical stuff, which can be skimmed over without missing out on the basic points):

http://www.scalatech.co.uk/papers/aes93.pdf

Malcolm Hawksford, btw, is a distinguished British academician who has researched and written extensively on audiophile-related matters.

One interesting point he makes is that the jitter spectrum itself, apart from the intermodulation that will occur between it and the audio, will typically include spectral components that are not only at audible frequencies, but that are highly correlated with the audio! He also addresses at some length the question of how much jitter may be audible.

So to answer your last question first, no, I don't think that the audibility of jitter on a nanosecond or picosecond scale has a relation to the plausibility of Kunchur's claim.

As far as point no. 1 in my previous post is concerned, yes I think that the quote you provided about closely spaced peaks being merged together does seem to provide a logical connection between his experimental results and a rationale for hi rez sample rates. It hadn't occurred to me to look at it that way. So that point would seem to be answered.

Best regards,
-- Al
What is the principal advantage of higher sampling rates, if it is not better temporal resolution?
Yes, as Shadorne indicated the principal advantage is that it dramatically relaxes the rolloff requirements for anti-aliasing filters (in the recording process) and reconstruction filters (in the playback process). Or it makes it possible to avoid the use of techniques that have been used to relax those requirements, which have their own tradeoffs (e.g., oversampling + noise shaping).

It should be kept in mind that not only will 44.1kHz sampling be unable to capture signal frequencies at or above 22.05kHz, but the a/d converter used in the recording process must not be exposed to those frequency components. Otherwise "aliasing" will occur, resulting in those ultrasonic frequencies appearing in the digital data as audible frequencies.
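
As a small illustration of the aliasing just described, the frequency at which an unfiltered tone ends up after sampling can be computed by folding it around half the sample rate. The helper name below is hypothetical, and the sketch assumes an ideal sampler with no input filter at all.

```python
def alias_frequency(f_hz: float, fs_hz: float) -> float:
    """Frequency at which a tone at f_hz shows up after sampling at fs_hz with no input filter."""
    folded = f_hz % fs_hz
    return fs_hz - folded if folded > fs_hz / 2 else folded

for f in (10_000, 25_000, 30_000):
    print(f, "->", alias_frequency(f, 44_100))
```

A 10kHz tone is unaffected, but 25kHz and 30kHz tones sampled at 44.1kHz land at 19.1kHz and 14.1kHz respectively, both inside the audible band.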

Therefore an a/d converter that doesn't use oversampling or other special techniques must be preceded by a low pass filter that is flat to 20kHz, but has rolled off to the point of inaudibility in about 1/7th of an octave, at 22.05kHz. That is an EXTREMELY sharp rolloff, and, besides being expensive to manufacture, that kind of filter can have the sonic effects Kijanki described above in his post of 6/27, and the effect described in my second post of 6/30.

In contrast, 96kHz sampling would make it possible to allow more than a full octave for the same rolloff to occur (at 48kHz rather than 22.05kHz).
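
The room available for the rolloff in each case can be checked with one line of arithmetic, since the width of a band in octaves is the base-2 logarithm of its frequency ratio. The function name is just for illustration, and 20kHz is assumed as the top of the passband as in the text above.

```python
import math

def octaves_between(f_low_hz: float, f_high_hz: float) -> float:
    """Width of the band from f_low_hz to f_high_hz, in octaves."""
    return math.log2(f_high_hz / f_low_hz)

print(round(octaves_between(20_000, 22_050), 2))   # 44.1kHz sampling
print(round(octaves_between(20_000, 48_000), 2))   # 96kHz sampling
```

The 20kHz to 22.05kHz band comes out at roughly 0.14 octave, against about 1.26 octaves from 20kHz to 48kHz, so the filter has roughly nine times as much room (in octaves) to do its work.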

Similar considerations apply to the playback process, with respect to the "reconstruction filter," which refers to a low pass filter used to eliminate the stepped character of the d/a converter device's output.

Best regards,
-- Al