Basic measurements are only a benchmark, an objective standard, but how something SOUNDS is purely subjective and has to take into account intangibles like combined elements in the system, the room acoustics, speaker placement, and the listener, right?
Yes and no. In most cases, audio gear is designed so that its performance does not depend on what it is connected to; otherwise you would never be able to assemble a system. Or trust any reviewer whatsoever, right? For example, a source device such as a DAC has a low output impedance of, say, 100 to 200 ohms. The pre-amp's input impedance is then at least 10 times that. This way you get essentially full voltage transfer, which is what we want.
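To put rough numbers on the voltage transfer point, here is a minimal sketch that treats the source and the input as a simple resistive voltage divider. The function name and the load values are just illustrative assumptions, not figures for any specific product.

```python
import math

def voltage_transfer_db(source_ohms: float, load_ohms: float) -> float:
    """Level at the load relative to the unloaded source voltage, in dB.

    Assumes purely resistive source and load impedances (a simplification).
    """
    ratio = load_ohms / (source_ohms + load_ohms)
    return 20 * math.log10(ratio)

# Hypothetical 100 ohm source driving progressively higher input impedances.
for load in (1_000, 2_000, 10_000, 47_000):
    loss = voltage_transfer_db(100, load)
    print(f"100 ohm source into {load} ohm input: {loss:.2f} dB")
```

At the bare 10X ratio the loss is already under 1 dB, and it shrinks quickly as the input impedance goes up.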
On the other hand, there are tube amps with high output impedance, which then interacts with the speaker's impedance, and that impedance varies with frequency. This shifts the tonality of the system, not because of anything good but because of poor design. A solid state amp will have an output impedance well below 1 ohm, which all but eliminates this effect. You can detect these effects using either listening tests (if the person is properly trained and the difference is large enough) or measurements.
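The same divider math shows why amplifier output impedance matters. The sketch below is only an illustration: the 4 to 30 ohm speaker impedance swing and the 3 ohm and 0.05 ohm amplifier figures are made-up example values, and it uses impedance magnitude only, ignoring phase.

```python
import math

def level_db(amp_out_ohms: float, speaker_ohms: float) -> float:
    """Level delivered to the speaker relative to an ideal zero-ohm amp, in dB."""
    return 20 * math.log10(speaker_ohms / (amp_out_ohms + speaker_ohms))

def response_swing_db(amp_out_ohms: float, speaker_impedances: list[float]) -> float:
    """Spread between the highest and lowest level across the impedance curve."""
    levels = [level_db(amp_out_ohms, z) for z in speaker_impedances]
    return max(levels) - min(levels)

# Hypothetical speaker whose impedance magnitude ranges from 4 to 30 ohms.
speaker_curve = [4.0, 6.0, 8.0, 16.0, 30.0]

for name, z_out in (("tube amp, 3 ohm", 3.0), ("solid state, 0.05 ohm", 0.05)):
    swing = response_swing_db(z_out, speaker_curve)
    print(f"{name}: frequency response varies by {swing:.2f} dB")
```

With these assumed numbers the high output impedance case varies by several dB across the band, while the low output impedance case stays within about a tenth of a dB.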
In the vast majority of cases though, the modular nature of audio allows us to test and evaluate a component independently. Measurements are much more powerful in this regard because audiophiles as a group are terrible at detecting non-linear artifacts. But even for things like speakers, where distortions are apparent, a) we are mostly alike when it comes to preferences and b) listeners without professional training, including reviewers and dealers, are terrible at providing consistent and proper feedback. Please see this formal study:
Indeed, the subjective data from audio reviewers is so bad that you need 10 times as many of them to equal properly and formally trained speaker listeners! That is the problem with subjective remarks from audiophiles or the audiophile press: they are so unreliable that they are not worth paying attention to. The same study, by the way, shows that listener preference is similar among a dozen different listener classes:
See how the ranking of each speaker (different colors) did not change regardless of who was listening to it (X axis). The green speaker, for example, was bad no matter who was evaluating it, in controlled tests that is.
If you want a more detailed explanation of that, I have done a video on it: