The way the brain locates the source of a sound is complex and depends on comparing what arrives at each ear. The brain considers the timing difference between the sound reaching the two ears, the amplitude difference, and the spectral difference (the change in the balance of frequencies) between sound striking the near side of the head directly and sound reaching the opposite ear by diffracting around the head. A sound coming from the left speaker is placed there by comparing these differences in volume, timing, and spectral balance.
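To give a sense of how small the timing cue is, here is a minimal Python sketch using Woodworth's spherical-head approximation for interaural time difference, ITD = (r/c)(θ + sin θ), where θ is the source angle off center. The head radius (8.75 cm) and speed of sound (343 m/s) are assumed typical values, and the model covers only the timing cue, not the level or spectral cues.

```python
import math

HEAD_RADIUS_M = 0.0875   # assumed average head radius, in meters
SPEED_OF_SOUND = 343.0   # meters per second in air at roughly 20 degrees C

def woodworth_itd(azimuth_deg):
    """Interaural time difference in seconds for a distant source.

    azimuth_deg: 0 = straight ahead, 90 = directly to one side.
    Uses Woodworth's spherical-head approximation, valid for 0 to 90 degrees.
    """
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND) * (theta + math.sin(theta))

for az in (0, 30, 90):
    print(f"{az:>2} deg: {woodworth_itd(az) * 1e6:.0f} microseconds")
```

Under these assumptions, a speaker 30° off center (a standard stereo triangle) reaches the far ear about 0.26 ms after the near ear, and a source directly to one side about 0.66 ms after.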
Some mixing algorithms create false cues that fool the brain into locating sounds well outside the speakers, and sometimes behind the listener. This is a fairly complex operation that includes sending out-of-phase signals from one speaker to cancel the sound from the other, achieving the desired manipulation of timing, volume, and spectral balance. A commercial product called QSound was used on a number of releases with remarkably wide soundstages. My favorite example is Roger Waters’ “Amused to Death.”
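The out-of-phase cancellation idea can be illustrated with a toy crosstalk canceller. This is a minimal sketch under a deliberately idealized, hypothetical model, not QSound's (proprietary) processing: each ear is assumed to hear its own speaker directly and the opposite speaker attenuated by a gain `G` and delayed by `D` samples. The values of `G` and `D`, and the function names, are illustrative assumptions.

```python
import numpy as np

# Hypothetical idealized head model: each ear hears its own speaker directly
# (gain 1, no delay) plus the opposite speaker attenuated by G and delayed by D samples.
G = 0.7  # assumed crosstalk gain
D = 3    # assumed crosstalk delay in samples (must be >= 1)

def crosstalk_cancel(left, right, g=G, d=D):
    """Recursive canceller: each speaker feed subtracts the other feed's expected leakage."""
    n = len(left)
    out_l = np.zeros(n)
    out_r = np.zeros(n)
    for i in range(n):
        prev_r = out_r[i - d] if i >= d else 0.0
        prev_l = out_l[i - d] if i >= d else 0.0
        # An inverted (out-of-phase) copy cancels what the other speaker leaks to this ear
        out_l[i] = left[i] - g * prev_r
        out_r[i] = right[i] - g * prev_l
    return out_l, out_r

def ear_signals(out_l, out_r, g=G, d=D):
    """Simulate what reaches each ear under the same idealized model."""
    delayed_l = np.concatenate([np.zeros(d), out_l[:-d]])
    delayed_r = np.concatenate([np.zeros(d), out_r[:-d]])
    return out_l + g * delayed_r, out_r + g * delayed_l

# A click intended for the left ear only
left = np.zeros(32)
left[0] = 1.0
right = np.zeros(32)

feed_l, feed_r = crosstalk_cancel(left, right)
ear_l, ear_r = ear_signals(feed_l, feed_r)
print(feed_r[:10])
print(np.allclose(ear_l, left), np.allclose(ear_r, 0.0))
```

In this model the right speaker emits an inverted click three samples after the left speaker's click, the left speaker then emits a smaller correction, and so on, so the right ear hears nothing. Real heads and rooms make the leakage frequency-dependent, which is one reason such effects work best at a narrow listening position.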
Occasionally I hear a sound well outside the speakers, or even behind me, for a brief moment, like the whipcrack mentioned above. I suspect these are cases of such false cues being generated accidentally by the interaction between room reflections and the recording.

