The first musical instrument is our gesturing body...
Our body gestures on the scale of the limbs and on the scale of the throat and mouth...
The two gestures are synchronized, so, as frogman said, rhythm is fundamental...
Rhythm is not merely something flowing in physical time but something creating its own time dimension...
The first musical instrument is not a physical object but body parts synchronising in something which is neither speech as we know it now nor singing as we know it now, taken separately, but the two as one...
Two feet and legs can synchronise with a bone stick being struck, etc.
Speech and music were conjoined twins, never naturally separated but artificially separated by specialization...
That is why the poetic register makes us conscious of the deep roots of language in music...
The prose register is only the tip of the language iceberg...
Methodologically, Saussure advocated the maxim of the arbitrariness of the sign, but in his study of onomatopoeia he guessed that sounds in language are also motivated by meaning...
Language is far less well understood than our science thinks it is...
The greatest linguist since Panini is not even translated into English, by the way: Gustave Guillaume, whose works run to nearly 30 volumes, with more being edited right now... (I studied his work 35 years ago)
In the same way, acoustics is a deep science whose revolution is ongoing right now...
But all this is off topic here...