I have 2 big head-scratchers with your assessments.
1) Your premise is that hearing is unreliable, but you relied on hearing in a test to reach your final conclusion.
2) Again, "hearing is unreliable," and yet the changes heard usually follow quite a strict set of guidelines. I can list these guidelines again, but I’ve done so a couple of times already. See #1-7 above.
For example, why does a Class A amplifier sound best after 120 minutes, but not a Class D?
You are making a critical mistake of detective work: following only the clues that support your theory instead of changing the theory to fit the evidence.

