The FFT is a mathematical transformation from the time domain to the frequency domain. Our ears respond to frequencies over time. The details of how they do that are a whole different story. chervokas seems to have a good handle on it.
Not to derail the thread, but one of the cool things about our hearing that's different from a Fourier transform is this: yes, we process the separate frequency components of the complex wave separately, but we process them at the same time that we process the timing. At least up to about 5 kHz, the nerve impulses generated by the movement of the stereocilia are phase-locked to the input wave. So our hearing is simultaneously using two kinds of information: which location on the basilar membrane (which hair cells, which neurons) is being activated, and the phase-locked timing of the neural spike pattern of that activity.
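For anyone who wants to poke at the "frequencies over time" point numerically, here is a rough sketch. It says nothing about how the ear works. It only shows that the magnitude spectrum of a whole clip does not record when each frequency occurred. The sample rate, tone frequencies, and helper names (`dft_magnitudes`, `tone`, `peak_bins`) are made up for illustration, and it uses a plain, slow DFT so it runs with only the standard library (an FFT computes the same values, just faster).

```python
import cmath
import math

SAMPLE_RATE = 1000  # Hz (illustrative value)
HALF = 100          # samples per tone, i.e. 0.1 s each

def dft_magnitudes(samples):
    """Magnitude of every DFT bin, computed the naive O(N^2) way."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(samples)))
        for k in range(n)
    ]

def tone(freq_hz, count):
    """A sine wave of `count` samples at `freq_hz`."""
    return [math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE)
            for t in range(count)]

def peak_bins(mags):
    """Indices of the two strongest bins below the Nyquist frequency."""
    half = range(len(mags) // 2)
    return sorted(sorted(half, key=lambda k: mags[k], reverse=True)[:2])

# 100 Hz followed by 200 Hz, and the same clip played backwards
low_then_high = tone(100, HALF) + tone(200, HALF)
high_then_low = low_then_high[::-1]

mags_a = dft_magnitudes(low_then_high)
mags_b = dft_magnitudes(high_then_low)

# Largest difference between the two magnitude spectra (rounding error only)
max_diff = max(abs(a - b) for a, b in zip(mags_a, mags_b))

bin_width_hz = SAMPLE_RATE / len(low_then_high)
print([k * bin_width_hz for k in peak_bins(mags_a)])
print([k * bin_width_hz for k in peak_bins(mags_b)])
print(max_diff)
```

Both clips peak at the same two frequencies and their magnitude spectra match to within rounding error, even though the tones arrive in the opposite order. In the transform the ordering survives only in the phase of each bin, whereas the post above describes hearing as using place and timing information together.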

