Audio2design, thank you for your post. You are mostly right. It is all about timing and volume. You are also probably right about certain situations. Half truth...
The vast majority of recording is done with multi-miking, not with stereo microphones. Then it becomes all about the volume differentials between the channels, which set where the sound was placed in the mix. Now the timing event becomes paramount, and that can happen only when your head is equidistant from speakers that are properly balanced (volume), unless you prefer to go the ambisonic route. Your central nervous system was designed to work with head shadowing. It increases the volume differential between the ears, allowing more accurate location of the threat. Timing also changes. In order to produce an accurate image you have to be equidistant from the speakers, they have to be balanced correctly, and both speakers have to have the exact same frequency response curve. Very few systems meet all these criteria, so most do not image as well as is theoretically possible. Yes, the way the recording was done influences all of this.
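To put a rough number on the "equidistant from the speakers" condition, here is a minimal sketch of how far apart in time the two speakers arrive when the head moves off the center line. The speaker positions, the listening distance, and the speed of sound are all assumptions chosen for illustration, not measurements of any real room.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for dry air at about 20 °C


def arrival_time_difference_ms(listener, left=(-1.0, 0.0), right=(1.0, 0.0)):
    """Return (left arrival time - right arrival time) in milliseconds.

    All positions are (x, y) coordinates in metres. The default speaker
    positions (2 m apart) are hypothetical.
    """
    d_left = math.dist(listener, left)    # straight-line path from left speaker
    d_right = math.dist(listener, right)  # straight-line path from right speaker
    return (d_left - d_right) / SPEED_OF_SOUND_M_S * 1000.0


# Listener centered, 2 m back from the speaker line
centered = arrival_time_difference_ms((0.0, 2.0))
# Same distance back, but 30 cm toward the right speaker
off_axis = arrival_time_difference_ms((0.3, 2.0))

print(f"centered: {centered:.3f} ms, 30 cm to the right: {off_axis:.3f} ms")
```

With these assumed positions, a shift of 30 cm already makes the left speaker arrive most of a millisecond late, which is on the same order as the largest timing difference the head itself produces between the two ears.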
The missing half lies in acoustic science and is called the law of the first wavefront. It concerns the different possible timing thresholds between direct and reflected waves and their interpretation by the ears.
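As a rough illustration of those timing thresholds, here is a minimal sketch that turns a path-length difference into a delay and sorts it into one of three perceptual zones. The limits of 1 ms and 30 ms are assumed ballpark values often quoted for speech-like signals; the real thresholds shift with the kind of sound and the level of the reflection, so they are left as parameters. The function names are hypothetical.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for dry air at about 20 °C


def reflection_delay_ms(direct_path_m, reflected_path_m):
    """Delay of a reflection relative to the direct sound, in milliseconds."""
    return (reflected_path_m - direct_path_m) / SPEED_OF_SOUND_M_S * 1000.0


def classify_reflection(delay_ms, summing_limit_ms=1.0, echo_threshold_ms=30.0):
    """Sort a reflection delay into a rough perceptual zone.

    Both limits are assumptions (approximate values for speech-like
    sounds), not fixed constants of hearing.
    """
    if delay_ms < 0:
        raise ValueError("a reflection cannot arrive before the direct sound")
    if delay_ms < summing_limit_ms:
        # Both arrivals merge into one image located between the two sources
        return "summing localization"
    if delay_ms < echo_threshold_ms:
        # The first wavefront sets the perceived direction; the reflection
        # is fused with it and mostly adds loudness and spaciousness
        return "fused with direct sound (precedence)"
    return "separate echo"


# Hypothetical room: direct path 3 m, path via a side wall 5 m
delay = reflection_delay_ms(3.0, 5.0)
print(f"{delay:.2f} ms -> {classify_reflection(delay)}")
```

In this assumed example the side-wall reflection arrives a few milliseconds late, so it falls in the zone where the first wavefront decides the direction of the image.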
Imaging is not first a fact of digital recording technology, but of acoustics first...
I created my own mechanical equalizer for balancing the timing of the different waves, without a microphone... It worked so well that my imaging, which I call depth imaging, fills the room... My measuring standard is the range of the human voice and its timbre as perceived by the ears... not a set of very narrow test frequencies, measured with a mic for one very minute location of the head...
So imaging is FIRST: timing + the law of the first wavefront...
After that you can speak of timing + volume...
Missing this point is a complete reversal and misunderstanding of the phenomenon...
Acoustic neurophysiology comes FIRST, recording engineering second, in the explanation...