He also knew that a new piece of equipment might sound spectacular at the outset, only to become fatiguing after a few hours or even days, no matter how "good" the measured data were.
This fatigue aspect is an issue with audio. Not just with audio, but I digress.
This is where describing measurements as good, bad, or anything else is incorrect. It is data.
What can be read into the data matters.
The characteristics of amplification which contribute to fatigue may be measured and therefore predicted.
You mention THD amongst other things - yes, and some aspects of it are pleasing, while others are grating to the brain. (And some serve to mask certain issues in the recording process, but that’s another topic.)
This is perhaps one reason why measurements are preferred over blind testing.
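To make the THD point concrete, here is a rough sketch of how one THD figure can hide very different harmonic make-ups. The amplitudes are made-up numbers for illustration, not measurements of any real amplifier, and `thd_percent` is just a helper name I picked. It uses the usual definition: RMS sum of the harmonics relative to the fundamental.

```python
import math

def thd_percent(fundamental, harmonics):
    """THD in percent: RMS sum of harmonic amplitudes over the fundamental."""
    return 100.0 * math.sqrt(sum(h * h for h in harmonics)) / fundamental

# Hypothetical amplitudes relative to a fundamental of 1.0.
# Each list holds the 2nd, 3rd, 4th, 5th, 6th and 7th harmonics in order.
low_order = [0.01, 0.0, 0.0, 0.0, 0.0, 0.0]       # everything in the 2nd harmonic
high_order = [0.0, 0.0, 0.0, 0.0, 0.006, 0.008]   # everything in the 6th and 7th

print(thd_percent(1.0, low_order))
print(thd_percent(1.0, high_order))
```

Both profiles come out at about 1% THD, yet the distortion sits in completely different places in the spectrum. Which one a given listener finds pleasing or grating is a separate question the single number cannot answer; you have to look at the harmonic breakdown.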
As for blind testing, however messy it is even at the best of times, I would lower the threshold to exclude the enjoyment or pleasing factor. "Does it sound different?" is a more realistic objective.
Measurements provided by Amir indicate to me which bits of gear I may or may not enjoy owning. Others have different preferences. The data in itself is neither good nor bad - it is information.
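For what it's worth, the "does it sound different?" question is the one that can actually be scored. A minimal sketch, assuming an ABX-style test where each trial is a right-or-wrong guess and pure guessing gets 50%: the function below (my own name, `abx_p_value`) gives the chance of scoring at least that well by luck alone.

```python
from math import comb

def abx_p_value(correct, trials):
    """Probability of at least `correct` hits in `trials` by guessing (p = 0.5)."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials

# Hypothetical session: 12 correct out of 16 trials.
print(abx_p_value(12, 16))
```

For 12 out of 16 this comes to roughly 0.038, so guessing alone would rarely do that well. It says nothing about whether the difference is pleasant, only that one was probably heard.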