Electrical/mechanical representation of instruments and space

Help, I'm stuck at the juncture of physics, mechanics, electricity, psycho-acoustics, and the magic of music.

I understand that the distinctive sound of a note played by an instrument consists of a fundamental frequency plus a particular combination of overtones in varying amplitudes, and the combination can be graphed as a particular, nuanced two-dimensional waveform shape. Then you add a second instrument playing, say, a third above the note of the other instrument, and its unique waveform shape represents that instrument's sound. When I'm in the room with both instruments, I hear two instruments because my ear (rather two ears, separated by the width of my head) can discern that there are two sound sources.

But let's think about recording those sounds with a single microphone. The microphone's diaphragm moves and converts changes in air pressure to an electrical signal. The microphone is hearing a single set of air pressure changes, consisting of a single, combined wave from both instruments. And the air pressure changes occur in two domains, frequency and amplitude (sure, it's a very complicated interaction, but still capable of being graphed in two dimensions). Now we record the sound, converting it to electrical energy, stored in some analog or digital format. Next, we play it back, converting the stored information to electrical and then mechanical energy, manipulating the air pressure in my listening room (let's play it in mono from a single full-range speaker for simplicity).

How can a single waveform, emanating from a single point source, convey the sound of two instruments, maybe even in a convincing 3D space? The speaker conveys amplitude and frequency only, right? So, what is it about amplitude or frequency that carries spatial information for two instruments/sound sources? And of course, that is the simplest example I can design. How does a single mechanical system, transmitting only variations in amplitude and frequency, convey an entire orchestra and choir as separate sound sources, each with its unique tonal character? And then add to that the waveforms of reflected sounds that create a sense of space and position for each of the many sound sources?

Still thinking about this.  Let me give another example. 

As with a synthesizer, you could combine a series of pure, sinusoidal tones in a particular combination of fundamental and overtones with varying amplitudes and fine-tune it until it sounds like a flute. Why do we hear the sound of a flute instead of a bunch of sine waves at varying frequencies and amplitudes? Why do we hear the whole instead of the components?

To reverse the example, is there a single whole sound that is the sound of the components of an orchestra, if you get my meaning?  Why do we hear the single sound as the sound of many instruments?
Hi 77jovian,

First, kudos on your thoughtful question.

IMO, though, the answer is fairly simple. When we listen to an orchestra, or some other combination of instruments and/or vocalists, what our hearing mechanisms are in fact hearing is a combination of sine waves and broadband sounds ("broadband sounds" being a combination of a vast number of sine waves), both of which of course vary widely from instant to instant in terms of their amplitudes, timings, and phase relationships.

So to the extent that the recording and reproduction chains are accurate, what is reproduced by the speakers corresponds to those combinations of sine waves and broadband sounds, and our hearing mechanisms respond similarly to how they would respond when listening to live music.

Best regards,
-- Al
I think it’s an excellent question and actually a question that has a whole lot to do with the questions I’ve been asking on another thread: What is the audio signal in the system prior to the point where the speakers produce the acoustic waveform of the entire orchestra? AND how do better speaker cables, better power cords, better fuses, and vibration isolation affect the “audio signal,” whatever it is?
The microphone acts no differently than your eardrum or a speaker cone. How a single cone produces overtones is simple. Say you are listening to a 20 Hz tone. The speaker cone moves back and forth 20 times a second. For a 100 Hz tone it’s a hundred times a second.

What about the two tones played at the same time to produce a different sound? As the cone moves back and forth 20 times per second, it also moves back and forth 100 times per second. (Wave your hand back and forth with your arm still. Then move your arm while you are waving your hand. Then walk forward while waving your hand and moving your arm. The air displaced is a pressure representation of the combined motion).

The combination of pressure waves creates one wave at any point in time and over a certain time period it contains all the other waves (overtones) creating a particular sound. Add them all together as a function of time and there is your orchestra in your living room.
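
To put numbers on that, here is a minimal Python sketch (an illustration only, not a model of any real driver). It sums the 20 Hz and 100 Hz tones into one cone-position signal, then checks that both tones can still be measured in the single combined waveform. The sample rate, the amplitudes, and the function names are assumptions chosen for the example.

```python
import math

SAMPLE_RATE = 1000  # samples per second; an assumed value for this sketch

def tone(freq_hz, amplitude, n_samples=SAMPLE_RATE):
    """One sine tone as a list of samples (one second by default)."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
            for n in range(n_samples)]

def mix(*signals):
    """Add signals sample by sample: the single motion the cone makes."""
    return [sum(samples) for samples in zip(*signals)]

def amplitude_at(signal, freq_hz):
    """Amplitude of one frequency inside a signal (a single DFT bin)."""
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * freq_hz * k / SAMPLE_RATE)
             for k, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq_hz * k / SAMPLE_RATE)
             for k, s in enumerate(signal))
    return 2 * math.hypot(re, im) / n

cone = mix(tone(20, 1.0), tone(100, 0.5))  # one waveform, two tones

print(round(amplitude_at(cone, 20), 3))   # the 20 Hz tone is still in there
print(round(amplitude_at(cone, 100), 3))  # so is the 100 Hz tone
print(round(amplitude_at(cone, 60), 3))   # nothing was played at 60 Hz
```

The cone only ever has one position at any instant, yet each tone can be recovered from the sum with its original amplitude. That is the sense in which one waveform "contains" the others.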
A few things to consider here:

  1. You get placement information for instruments via two mechanisms, volume and timing. If the right side is louder for a particular instrument, then that is where you perceive the instrument to be. That is the mechanism for placing continuous sounds. The other mechanism is arrival time (the difference between the left and right ear), which is used for transient sounds. A clap will arrive at one ear slightly before the other. Your auditory processing system can time that to pretty fine resolution and give you an idea of where it came from. Some posit that arrival time also plays a part in placing continuous sounds.
  2. The electromechanical system only stores and transmits sound waves (pressure variation). It does not transmit instruments, etc. It is your brain, knowing what a grouping of sounds mean, that is able to extract instruments and place them.
A single speaker will not, on its own, convey any sense of space, but a room may create those cues (accurate or not), and your brain doesn't like an information vacuum, so it will try to map what it hears onto what it knows.
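
The arrival-time cue in point 1 can be put into rough numbers. This is a minimal Python sketch, assuming a simple straight-line path-difference model, ears 0.18 m apart, and sound travelling at 343 m/s. All three are assumptions for illustration; real heads are better described by measured head-related transfer functions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in room-temperature air (assumed)
EAR_SPACING = 0.18      # m between the ears (assumed typical value)

def arrival_time_difference(angle_deg):
    """Extra time sound takes to reach the far ear, in seconds.

    angle_deg is the source direction: 0 = straight ahead,
    90 = fully to one side. Uses the simple path-difference
    model: spacing * sin(angle) / speed of sound.
    """
    return EAR_SPACING * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND

for angle in (0, 30, 90):
    microseconds = arrival_time_difference(angle) * 1e6
    print(f"{angle:>2} degrees: {microseconds:.0f} microseconds")
```

Under these assumptions, even a source fully to one side produces a difference of only about half a millisecond, which gives an idea of how fine the timing resolution of hearing has to be.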

I think you meant the frequency and time domains. Amplitude is part of either one.
77jovian commits an all too common flaw in logic which since no one studied logic no one caught. Except me, of course.

Hint: "two ears"-
When I'm in the room with both instruments, I hear two instruments because my ear (rather two ears, separated by the width of my head)

Two ears. Got it?

Then, inexplicably:
But let's think about recording those sounds with a single microphone.

Wait- what?!?! 

Need I say more? Really?

Yeah, yeah. I could answer the one mic question too. But fix the first one first, okay?
Um, first, some instruments don't have a lot of energy in the fundamental.

But otherwise, you may be very interested in Head Related Transfer Functions.

@77jovian You may find the following writeup to be instructive. (Coincidentally, btw, it also uses the example of a flute for illustrative purposes, as you had done):


Note particularly the figure in the section entitled "Spectra and Harmonics," which depicts the spectrum of a note being played by a flute.

To provide context, a continuous pure sine wave at a single frequency (which is something that cannot be generated by a musical instrument) would appear on this graph as a single very thin vertical line, at a point on the horizontal axis corresponding to the frequency of the sine wave.

The left-most vertical line in the graph (at 400 Hz) represents the "fundamental frequency" of the note being played by the flute. The vertical lines to its right represent the harmonics. The raggedy stuff at lower levels represents the broadband components I referred to earlier. Note this statement in the writeup:

... the spectrum is a continuous, non-zero line, so there is acoustic power at virtually all frequencies. In the case of the flute, this is the breathy or windy sound that is an important part of the characteristic sound of the instrument. In these examples, this broad band component in the spectrum is much weaker than the harmonic components. We shall concentrate below on the harmonic components, but the broad band components are important, too.

Now if a second instrument were playing at the same time, the combined spectrum of the two sounds at a given instant would look like what is shown in the figure for the flute, plus a number of additional vertical lines corresponding to the fundamental and harmonics of the second instrument, with an additional broadband component that is generated by the second instrument summed in. ("Summed" in this case means more than simply adding the heights of the lines, since timing and phase angles are involved; perhaps "combined" would be a better choice of words.) And since our hearing mechanisms can interpret that complex spectrum as coming from two different instruments when we hear them in person, they will do the same in our listening room, to the extent that the information is captured, preserved, and reproduced accurately in the recording and playback processes.
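
To illustrate why phase matters when two components are combined, here is a small Python sketch. It treats two components at the same frequency as phasors (complex numbers) and adds them. The function name and the example values are hypothetical, chosen only for illustration.

```python
import cmath
import math

def combined_amplitude(amp_a, amp_b, phase_diff_deg):
    """Amplitude of two same-frequency sine components added together.

    Each component is a phasor: amplitude * e^(j * phase). The pressures
    add linearly, but the resulting amplitude depends on the phase angle
    between them.
    """
    a = cmath.rect(amp_a, 0.0)
    b = cmath.rect(amp_b, math.radians(phase_diff_deg))
    return abs(a + b)

for phase in (0, 90, 180):
    print(phase, round(combined_amplitude(1.0, 1.0, phase), 3))
```

In phase, the two amplitudes simply add; at 90 degrees they combine to about 1.41 rather than 2; at 180 degrees they cancel.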

Best regards,
-- Al

So, I’ll ask again, how is the audio signal in cables and electronics affected by external forces such as RF and vibration as well as by better cables? And what IS the audio signal? Anybody! Is it electrons? Photons? Current? Voltage? An electromagnetic wave? Something else? That’s really what the OP is talking about. Don’t be shy!

As per Geoff’s questions:

Sometimes I think of a given system hardware as a ‘doorway’ that the signal will try to go through unscathed, but is compromised along the way by 2 things at least: noise and distortion. Sometimes I wonder what it might be like if we could, say, just flip a switch and reduce All noise that could actually affect the signal, regardless of source, by an infinite amount and listen to the result. I imagine everybody, if they could do it, would be like blown away at not only the quality of reproduction, but also struck I think by how Everything sounds the Same (no more obvious differences anymore between brands, price ranges, tubes and ss, digital and analog, wiring, fuses, directionality of same, etc, etc)...I’m thinking it might all sound amazing and all of it sound overwhelmingly similar in doing so...far more like the real thing and all that.

I just know I can’t prove it, lol. But, the thought does keep coming back to me on occasion.

Q: Can mega-expensive speaker wires that have been successfully implemented into a system, for example, be thought of as simply the result of a happy, ‘random accident’ of the interplay between system, wires and (most importantly here) *noise*? IOW, could the idea of eliminating all noise mean the elimination of any need by anyone for pricey wires at all? AND, can these same cables then be thought of as justified only in systems **that are dominated by noise** (currently all systems)?

AFAIC, you’re asking the right questions, I just don’t have the right answers.

Roberttcan 10-26-2019
I don't think that is what the OP is talking about at all. Not even close.

I could be wrong, but I am pretty sure the op is asking how just one amplitude varying signal in the time domain can represent a whole orchestra and all its instruments.

The answer to that is it doesn't convey the whole orchestra, the brain extracts all the instruments out of the signal based on pattern recognition.


I too am pretty certain that Geoff's questions have nothing to do with what the OP is asking.

-- Al

I could be wrong, but I don’t think that is what the OP is talking about at all. Not even close.

I could be wrong, but I am pretty sure the op is asking how just one amplitude varying signal in the time domain can represent a whole orchestra and all its instruments.

The answer to that is it doesn’t convey the whole orchestra, the brain extracts all the instruments out of the signal based on pattern recognition.

>>>>That is patently absurd. Yes, you’re right, you could be wrong.
The poor guy asked how we're able to reproduce the sound of an original performance so well that we're able to hear not only the original performers in their original locations but the room acoustics as well. That was the question, right?

Why then are we off onto electrons and RFI?

The chief offender loves to misquote Richard Feynman, who once actually did say not being able to explain something in simple terms means you really don't understand it yourself.

Of course before answering any question it always helps to know just what the question is. So how about it? Maybe rethink your question, figure out what exactly it is you really want to know, and then ask it. Soon. Before the loon gets us off into morphic fields and space travel.
No, actually that’s not what Richard Feynman famously said. That’s not even close. What he said was,

“If I could explain it to the average person they wouldn’t have given me a Nobel prize.”

I hate to judge before all the facts are in but it certainly looks like millercarbon is the average person he was talking about. 🤡

The next step in this is the issue of a single full-range drive unit (per channel) vs. two, three, or more drivers. The explanations given by almarg and gs5556 point to why it's difficult to design a fully satisfactory single unit, though multiple drivers have a different set of downsides, which go beyond the purview of this thread.

Talking of purviews, gk is clearly trying to hijack yet another thread for his own inscrutable purposes.

Thank you for the very helpful comments. 

Here's what I distill from the comments. The graph of a waveform with axes of frequency and amplitude can describe a sound, but it's the changes in the waveform over time that provide meaning to the sound. So it is the envelope of a note over time, including perhaps a percussive attack, and its changing frequencies and their amplitudes as the note decays, that provide clues to what instrument is making the sound. And the listener's brain recognizes the "flute-iness" of the sound over time, or whatever instrument is playing.
OK, I get that.  But I'm not sure that adequately explains how we discern multiple instruments from a single changing waveform or the position of the instruments in 3D space.  I'll have to think more about that.
I acknowledge those of you who have pointed out the role of two ears spaced apart from each other.  But I think there must be more to it.  After all, there are guys like David Pack, a wonderful musician and record producer who has been totally deaf in one ear for the last 40 years.  His work is not devoid of spatial information.  Nor is a single, full-range speaker incapable of all spatial presentation.

Apologies to those of you who think this is a simple-minded discussion.  I find it deeply profound and an astonishingly complex interaction of physical and cognitive phenomena, worthy of reflection.  Well, there are those who stand at the rim of the Grand Canyon and think, "hey, it's a water-carved canyon, so what?"
This discussion reminds me how awesome is technology that is capable of reproducing this extraordinarily intricate physical process to a degree that sublime enjoyment is possible. 

Conversely, it's equally awesome to realize that the brain is capable of perceiving and appreciating this magic from the cheapest, flimsiest nickel-sized speaker in a cheap cell phone. Maybe some people who are satisfied with the sound from their cell phone just have vivid pattern-recognition skills.

Finally, as to the ad hominem posts, you should be ashamed.

If one were to take someone from the hinterlands who’d never heard anything but natural sounds and play them the finest HiFi in the world, they would have no idea whatsoever that the sound was anything but some chromy illuminated beast.

If they then heard the same piece in another location with different hardware but same distortions, they would be astounded that two such beasts had the same song.

As usual, Mr Kait gets it wrong. What Mr. Feynman actually said is “Hell, if I could explain it to the average person, it wouldn’t have been worth the Nobel prize.”

Oh, and Kait's questions are so far off the mark that one has to wonder about everything he ever writes.
That is what I said, Mr. Eels. Try to calm down. 🤪 Ever heard of Valium?

But getting back to the question posed by the OP, what the speakers produce is a function of the electronics, cabling, power cord, fuse, room treatments - everything. So obviously the ability to produce the full orchestra with all the details including the venue acoustic information in a coherent audio waveform without the usual distortion and noise is a huge challenge. I know what some of you are thinking - What noise and distortion? 😳
I know that’s what you’re thinking. You’re wrong. As usual these days, if you don’t mind my saying so too much, Mr. Bluster. 🤡
Getting back to the actual issue at hand, the microphone is as "miraculous" as the driver (or more so): it registers all the different characteristics of all the instruments playing at one moment, and sends that down the chain. Ultimately, the speaker decodes that. If the capability of the microphone is understood, that surely sheds light, in reverse, on the driver.
Microphones suffer from the same problems as loudspeakers:
Non-linear response, resonances, temperature sensitivity, aging, power handling, etc.

The microphone attempts to create an electrical analog of pressure changes on the diaphragm just as the loudspeaker attempts to create a pressure analog of the electrical signal on its terminals.
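
As a rough illustration of what non-linear response does to that analog, here is a Python sketch of a hypothetical transducer whose output is the input plus a small squared term. The transfer curve `x + 0.1 * x * x`, the 50 Hz test tone, and the sample rate are all assumptions made for the example, not measurements of any real microphone or loudspeaker.

```python
import math

SAMPLE_RATE = 1000  # samples per second (assumed)

def nonlinear_transducer(x, k=0.1):
    """Hypothetical transducer: ideal output x plus a small squared error term."""
    return x + k * x * x

def amplitude_at(signal, freq_hz):
    """Amplitude of one frequency inside a signal (a single DFT bin)."""
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / SAMPLE_RATE)
             for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
             for i, s in enumerate(signal))
    return 2 * math.hypot(re, im) / n

# One second of a pure 50 Hz tone, then the same tone through the transducer.
pure_tone = [math.sin(2 * math.pi * 50 * n / SAMPLE_RATE)
             for n in range(SAMPLE_RATE)]
output = [nonlinear_transducer(x) for x in pure_tone]

print(round(amplitude_at(output, 50), 3))   # the tone that went in
print(round(amplitude_at(output, 100), 3))  # a second harmonic that did not
```

A perfectly linear device would put nothing at 100 Hz. The squared term adds a component at twice the input frequency, which is one simple form of harmonic distortion.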