A few thoughts that help explain this for me.
Part of what's going on is that repeated listening gradually shifts what the recording is for you. Mechanical reproduction doesn't just copy an original—it creates a new object with its own presence. An audiophile who has heard the 1955 Gould Goldbergs hundreds of times isn't consulting a document of a performance; that recording, sonic grain and all, is the musical thing they know.
This makes sense perceptually. You can't actually hear "the interpretation" apart from the sonic medium delivering it. Tempo, phrasing, dynamics: all of that reaches you through microphone placement, room acoustics, and mastering decisions. A warm, rolled-off analog recording doesn't just make a Bach cello suite sound different; it shapes which expressive qualities you detect in the first place. The sound isn't a window onto the music; it's constitutive of what you hear.
The exceptions (a horrid performance in great sound, a legendary performance in poor sound) are telling. At the extremes, one dimension dominates and the other recedes. But most recordings occupy a wide middle range of competent-to-excellent performances in competent-to-excellent sound, and there the two dimensions genuinely interact.
Add to this what happens with serious comparative listening: once you've heard twenty versions of the same concerto, you start noticing that your preferences track sonics as much as interpretation. At that point, it's hard to avoid the conclusion that sonic character is doing real evaluative work, not just providing transparent access to some purer aesthetic object hiding behind it.

