output_sound.mp4 is the solution for the webcam video part of TASK 3.
The given video was first processed with the boosting monocular depth network, which was very time-consuming (about 1 hour for every 5 seconds of video). I therefore had to interrupt the run and compile only a 12-second video file, which still took more than 2 hours. Later, a faster (and, as expected, lower-quality) video was obtained with MiDaS by splitting the video into images and processing those.
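For reference, the split-into-images and merge-back steps around the MiDaS run can be sketched as below. This is only a minimal sketch, not the exact commands used here: it assumes ffmpeg is installed, and the function names, the file and folder names (`input.mp4`, `frames`, `depth`, `out.mp4`) and the 30 fps default are all hypothetical placeholders. The depth estimation itself is left out; it would run on the images in `frames` and write its results to `depth`.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def split_command(video: str, frames_dir: str) -> list[str]:
    """Build an ffmpeg command that writes each frame of `video` as a numbered PNG."""
    # %06d gives zero-padded names (000001.png, 000002.png, ...) so frames sort correctly.
    pattern = str(PurePosixPath(frames_dir) / "%06d.png")
    return ["ffmpeg", "-i", video, pattern]


def merge_command(depth_dir: str, output: str, fps: float = 30.0) -> list[str]:
    """Build an ffmpeg command that joins numbered PNGs back into an MP4."""
    # The fps value is an assumption; it should match the frame rate of the source video.
    pattern = str(PurePosixPath(depth_dir) / "%06d.png")
    return [
        "ffmpeg",
        "-framerate", str(fps),   # input frame rate of the image sequence
        "-i", pattern,
        "-c:v", "libx264",        # H.264 encoding
        "-pix_fmt", "yuv420p",    # pixel format that most players accept
        output,
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`: first the split command, then the depth model over the frames, then the merge command.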
The 12-second video is named Given_video_high_quality_mapped_12sec.mp4.
The full video is named Given_Video_outputMerged.mp4.
UPDATE: The full video exceeded GitHub's size limits, so please view it using this link: https://drive.google.com/file/d/1bG-9uoWxKGVj72ApuHcXcFvl_L73cYfw/view?usp=sharing Sorry for the inconvenience!