If you don't have a wide sweet spot, are you really an audiophile?


Hi, it’s me, professional audio troll. I’ve been thinking about something as my new home listening room comes together:

The glory of having a wide sweet spot.

We focus far too much on the dentist chair type of listener experience. A sound which is truly superb only in one location. Then we try to optimize everything exactly in that virtual shoebox we keep our heads in. How many of us look for and optimize our listening experience to have a wide sweet spot instead?

I am reminded of listening to the Magico S1 Mk II speakers. While not flawless, one thing they do exceptionally well is, in a good room, provide a very good, stable stereo image across almost any reasonable listening location. Revels also do this. There’s no sudden feeling of the image clicking when you are exactly equidistant from the two speakers. The image is good and very stable. Even directly in front of one speaker you can still get a sense of what is in the center and on the opposite side. You don’t really notice a loss of focus when off axis like you can in so many setups.

Compare and contrast this with the opposite extreme, Sanders' ESLs, which are OK off axis, but when you are sitting in the right spot you suddenly feel like you are wearing headphones. The situation is very binary. You are either in the sweet spot or you are not.

From now on I’m declaring that I’m going all-in on wide-sweet-spot listening. Being able to relax on one side of the couch or another, or meander around the house while enjoying great-sounding music, is a luxury we should all attempt to recreate.
erik_squires
I don't agree with that mijostyn.  Imaging comes from both volume cues (predominant in most multi-channel studio recordings by far), and timing from a proper stereo microphone setup which is rather uncommon.  This is a long post, but all relevant.


With good dispersion and non-symmetric toe-in, you can get reasonably accurate volume cues over a wider range. That provides two significant mechanisms for localization: 1) relative volume level and 2) frequency-dependent head shading.


What you can't compensate for is timing, but there are two issues: a) was timing even captured, and b) can timing be conveyed with speakers in a traditional two-channel audio setup, given both the extreme accuracy needed in head placement and the inability to prevent sound from one speaker reaching the opposite ear?

0.1" of head mispositioning = 1.6 degrees of timing inaccuracy
0.5" = 8.2 degrees
2" = 32.7 degrees
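For anyone who wants to play with the relationship behind figures like these, here is a minimal sketch. The post does not say what frequency or geometry the numbers assume, so the frequencies, the speed of sound, and the idea that the path-length difference equals the head offset are all my assumptions, and the function name is made up for illustration. It is not meant to reproduce the numbers above exactly, only to show that the phase error grows linearly with the offset and with frequency.

```python
# Assumed speed of sound: ~343 m/s, expressed in inches per second.
SPEED_OF_SOUND_IN_PER_S = 13504.0


def phase_shift_degrees(path_difference_in, frequency_hz):
    """Phase shift in degrees caused by a path-length difference at one frequency."""
    wavelength_in = SPEED_OF_SOUND_IN_PER_S / frequency_hz
    return 360.0 * path_difference_in / wavelength_in


if __name__ == "__main__":
    # Offsets from the post; frequencies are illustrative assumptions.
    for offset_in in (0.1, 0.5, 2.0):
        for frequency_hz in (500.0, 1000.0):
            shift = phase_shift_degrees(offset_in, frequency_hz)
            print(f'{offset_in}" at {frequency_hz:.0f} Hz -> {shift:.1f} degrees')
```

Doubling either the offset or the frequency doubles the phase error, which is why small head movements matter more as you go up in frequency.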


So let's say you are sitting 10 feet back from the center line of your speakers at 60 degrees. A 2 degree toe-in difference only represents about 3.8 degrees of image movement, and the movement will be true for all sounds, i.e. the image shifts left or right. If the toe-in is symmetric, 3.8 degrees represents moving your head left-right about 3". At a 5 degree toe-in difference, you are looking at a 10 degree offset, and about 7" of side-to-side head movement (14 inches total range). You just moved from the best seats to pretty good seats.
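To put some geometry behind that seating example, here is a rough sketch for a listener 10 feet back with the speakers spanning 60 degrees. It treats the speakers as ideal point sources with inverse-distance level falloff and ignores directivity and toe-in entirely, so it says nothing about the toe-in figures above; it only shows how much arrival time and level change from distance alone when the head moves sideways. The function names and the speed of sound are assumptions of mine.

```python
import math

SPEED_OF_SOUND_FT_PER_S = 1125.0  # assumed, roughly 343 m/s


def speaker_positions(distance_ft, included_angle_deg):
    """Left/right speaker coordinates for a centered listener at (0, distance_ft)."""
    half_angle = math.radians(included_angle_deg / 2.0)
    x = distance_ft * math.tan(half_angle)
    return (-x, 0.0), (x, 0.0)


def arrival_differences(offset_in, distance_ft=10.0, included_angle_deg=60.0):
    """Return (delay_ms, level_db) of the left speaker relative to the right.

    offset_in is the sideways head movement in inches (positive = to the right).
    Positive delay_ms means the left speaker arrives later; positive level_db
    means the right speaker is louder (inverse-distance law only).
    """
    left, right = speaker_positions(distance_ft, included_angle_deg)
    head = (offset_in / 12.0, distance_ft)
    d_left = math.dist(head, left)
    d_right = math.dist(head, right)
    delay_ms = (d_left - d_right) / SPEED_OF_SOUND_FT_PER_S * 1000.0
    level_db = 20.0 * math.log10(d_left / d_right)
    return delay_ms, level_db


if __name__ == "__main__":
    for offset_in in (0.0, 3.0, 7.0):
        delay_ms, level_db = arrival_differences(offset_in)
        print(f'{offset_in}" off center: {delay_ms:.3f} ms, {level_db:.2f} dB')
```

Under these assumptions the level change from distance alone is a small fraction of a dB for a few inches of movement, while the arrival-time change is a sizable fraction of a millisecond.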

Of course much of this is all literally fuzzy anyway. When you have your speakers at 60 degrees, head shading to both ears creates an improper center image. You may have recorded timing information, but because you have no cross-talk cancellation, you have a secondary timing event about 0.2 milliseconds later confusing the brain on whether that is the event, an echo, etc. The singer (continuous tones) is properly placed, but perhaps a bit fuzzy due to the aforementioned issues of shadowing for volume, and the drum hit off to the side gets confused by the false secondary timing event.
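For a sense of scale on that secondary (crosstalk) arrival, here is a sketch using Woodworth's spherical-head approximation for the extra travel time around the head to the far ear. The head radius and speed of sound are assumed typical values, not anything measured, and a real head is not a sphere, so treat the result as an order-of-magnitude estimate.

```python
import math


def woodworth_itd_ms(azimuth_deg, head_radius_m=0.0875, speed_of_sound_m_s=343.0):
    """Interaural time difference (ms) for a distant source, spherical-head model.

    Woodworth's approximation: ITD = (a / c) * (theta + sin(theta)),
    valid for azimuths between 0 and 90 degrees.
    """
    theta = math.radians(azimuth_deg)
    itd_s = head_radius_m / speed_of_sound_m_s * (theta + math.sin(theta))
    return itd_s * 1000.0


if __name__ == "__main__":
    # Speakers spanning 60 degrees sit 30 degrees either side of straight ahead.
    print(f"{woodworth_itd_ms(30.0):.2f} ms")
```

With speakers 30 degrees off center, the far ear hears each speaker a fraction of a millisecond after the near ear does under this model.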

Oh, so it is easy ... ya no. There is one other huge issue in capturing timing difference in stereo microphones. You are now playing back the same signal delayed in time between two speakers. Guess what that does when it hits the head?  Filtering!  Comb filtering effects will be evident and significant as the fixed timing delay reinforces and cancels depending on the frequency.   Oh, but it gets even better ... I mean worse. Where timing only contributed spatial cues at <1,500 Hz, those new comb filtering effects you generated are now across the frequency range. You think you widened the stereo image, but really you created an auditory illusion of space that is not representative of the timing recorded.  The timing becomes a level difference perception.  *** Note that now, head accuracy becomes far less critical ***
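The comb filtering described above can be sketched for the idealized case of one signal summed with an equal-level copy of itself delayed by a fixed time. That equal-level assumption is mine and is a simplification (the head shadows one arrival, so real notches are shallower), and the 0.25 ms delay is just an example value. In this idealized case the notches fall at odd multiples of 1/(2 × delay), spread right across the audio band.

```python
import math


def comb_gain_db(frequency_hz, delay_ms):
    """Level (dB, relative to a single arrival) of a signal plus an equal-level delayed copy."""
    magnitude = abs(2.0 * math.cos(math.pi * frequency_hz * delay_ms / 1000.0))
    return 20.0 * math.log10(max(magnitude, 1e-12))  # clamp to avoid log(0) at a notch


def notch_frequencies(delay_ms, max_hz=20000.0):
    """Frequencies (Hz) up to max_hz where the two arrivals cancel."""
    delay_s = delay_ms / 1000.0
    notches = []
    k = 0
    while (2 * k + 1) / (2.0 * delay_s) <= max_hz:
        notches.append((2 * k + 1) / (2.0 * delay_s))
        k += 1
    return notches


if __name__ == "__main__":
    delay_ms = 0.25  # example delay, an assumption for illustration
    print([round(f) for f in notch_frequencies(delay_ms)])
    for frequency_hz in (500.0, 1000.0, 2000.0, 4000.0):
        print(f"{frequency_hz:.0f} Hz: {comb_gain_db(frequency_hz, delay_ms):.1f} dB")
```

A shorter delay pushes the first notch higher in frequency, and a longer delay packs more notches into the audible range.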


And just to be clear, stereo speakers attempting to reproduce timing can't place the image outside the speakers (see crosstalk above).  Of note also, timing only really works at <1,500Hz, and predominantly <1,000Hz.  So to all those "phase" "phase" "phase" people, less posting, more learning, and for those buying or making speakers, keep the crossover out of the 200-1500Hz range if you can.

So what can be done?
- Signal processing akin to noise cancellation, but in this case, to reduce cross-talk
- Headphones with signal processing to replicate the body functions (head shading, reflections, etc) that are lost without an audio field.

Audio2design, thank you for your post. You are mostly right. It is all about timing and volume. You are also probably right about certain situations.
The vast majority of recording is done with multi-miking, not with stereo microphones. Then it becomes all about volume differentials between the channels, to where the sound was mixed. Now the timing event becomes paramount, and that can happen only when your head is equidistant from speakers that are properly balanced (volume), unless you prefer to go the ambisonic route. Your central nervous system was designed to work with head shading. It increases the volume differential between the ears, allowing more accurate location of the threat. Timing also changes. In order to produce an accurate image you have to be equidistant from speakers that are balanced correctly, and both speakers have to have the exact same frequency response curve. Very few systems meet all these criteria, so most do not image as well as is theoretically possible. Yes, the way the recording was done influences all of this.
With a good system one can sit comfortably in a chair and enjoy an accurate image. If you move side to side enough you will hear the center image melt. With line source speakers you can move all the way to a side wall and the instruments mixed to the other side will still be loud and clear coming from that side, as if you were at a concert, but the center image will be vague. With point source speakers the volume drops off much more acutely with distance, so the center image shifts entirely to the side you are on, including instruments in the center channel but mixed a little to the opposite side.

I use line source ESLs which have been digitally corrected and produce identical frequency response curves. I frequently have to adjust the balance by a few dB with different records to improve the focus, something you would never notice in most systems because the image specificity is just not there. Volume and timing have to match up!
As you would expect, some recordings produce better images than others. Mono records cannot be listened to from the listening position.
It sounds like you are listening through a crack in a door, weird. I sit off center when I listen to mono. Everything opens up.
I have listened to corrected point source speakers, particularly a friend's Watt/Puppy JL Audio subwoofer system, and dead on center it produces a beautiful miniature image. Move off center and it falls apart, as you would expect.
It is sort of the exact opposite of what the OP says: the more noticeable the sweet spot, the better the system. If you cannot differentiate the exact center from two feet over, your system is not imaging. Some people may be happier this way. Ignorance is bliss.
 
Your central nervous system was designed to work with head shading. It increases the volume differential between the ears, allowing more accurate location of the threat. Timing also changes. In order to produce an accurate image you have to be equidistant from speakers that are balanced correctly, and both speakers have to have the exact same frequency response curve. Very few systems meet all these criteria, so most do not image as well as is theoretically possible. Yes, the way the recording was done influences all of this.


I think we are predominantly in agreement and acoustic cross-talk cancellation is an area of both academic and professional research for me.  Of note, the speakers I PM'ed you about have some ability to correct frequency response both direct and reflected.

The volume differential from head shading is critical at >1,500Hz, but in practice you can achieve it over a limited range that is wider than a single perfect sweet spot, if that is your goal. It is conditional on speaker dispersion; otherwise, when you correct, you will create as many problems as you solve.

W.R.T. timing, whether timing in recordings is accurately portrayed in a stereo speaker setup is still debated in the current literature, and the argument is leaning towards the view that the timing information perceived is not what was captured. The reasons are the ones I illustrated above, the biggest being crosstalk and filtering due to reinforcement and cancellation from the same sound having different arrival times. This is best illustrated by comparing timing panning using speakers and headphones, both with narrow band (<=1000Hz) and wider band signals. There are lots of trade-offs too: going wider on the speakers can improve extraction of timing detail, but screws up other location aspects and hurts the center image. Go narrower and you get a more accurate center image. The reality is 2 channel via speakers is imperfect. Signal processing will get us closer to reality, but it is an uphill commercial struggle and has its technical issues. More speakers just increase cross-talk issues, but more speakers working under concepts of ambisonics have the potential to move us forward.



Audio2design, thank you for your post. You are mostly right. It is all about timing and volume. You are also probably right about certain situations.
The vast majority of recording is done with multi-miking, not with stereo microphones. Then it becomes all about volume differentials between the channels, to where the sound was mixed. Now the timing event becomes paramount, and that can happen only when your head is equidistant from speakers that are properly balanced (volume), unless you prefer to go the ambisonic route. Your central nervous system was designed to work with head shading. It increases the volume differential between the ears, allowing more accurate location of the threat. Timing also changes. In order to produce an accurate image you have to be equidistant from speakers that are balanced correctly, and both speakers have to have the exact same frequency response curve. Very few systems meet all these criteria, so most do not image as well as is theoretically possible. Yes, the way the recording was done influences all of this.
Half truth....

The missing half is in acoustic science and is called the law of the first wavefront, which relates to the different possible timing thresholds of direct and reflected waves and their interpretation by the ears..

Imaging is not first a fact of digital recording tech, but of acoustics first...

I created my own mechanical equalizer for balancing the timing of the different waves without a microphone... It worked so well that my imaging, which I call depth imaging, fills the room... My measuring standard is the range of the human voice and its timbre as perceived by the ears... not a set of very narrow test frequencies for a very precise location of the head using a mic...


So imaging is FIRST: timing + the law of the first wavefront..
After that you can speak of timing + volume...

Missing this point is a complete reversal and misunderstanding of the phenomenon...

Acoustic neurophysiology comes FIRST, recording engineering second, for the explanation....