All you have to do is capture the audio at the listening position and embed that recording in the video.
Are you suggesting that you capture the audio separately and then dub it onto the video? I thought I’d ask because I tried recording a video of my system and the playback sounded like absolute trash, even when played through the very same system. I just used the stock photo app on my Pixel. Apologies if this is an inane question.

