Show-Segmentation

GSoC 2020 project with Red Hen Lab. The goal is to create an algorithm that automatically splits videos into named and dated shows by identifying anchor and show names. Anchors are identified using MSCeleb, and the IMDb dataset (discussed in the proposal) is used to map anchor names to show names. The NewsScape videos are split into smaller segments, one for each TV show, annotated with show title, channel, time, and date. Currently, the most time-consuming step in the program is going through the video frame by frame and extracting faces; multi-threading could speed this up. More information about the project statement can be found here.

Mentors: Sasi Kiran, Francis Steen

Blog detailing the research and working can be found at edoates84.github.io

Usage

Clone the repo to your machine

git clone https://github.com/EdOates84/Show-Segmentation-2020.git

Install the required Python packages using either of these commands:

pip install numpy pandas matplotlib opencv-python scikit-learn face_recognition wikipedia

pip install -r requirements.txt

Download the anchors-encodings pickle and place it in this location.

Show-Segmentation-2020/final_celeb_detection/final_pickles/anchors-with-TV-encodings.pickle

Download the IMDb datasets and place them in these locations.

Show-Segmentation-2020/IMDB_Datasets/name.basics.tsv
Show-Segmentation-2020/IMDB_Datasets/title.basics.tsv
Show-Segmentation-2020/IMDB_Datasets/title.principals.tsv
Show-Segmentation-2020/IMDB_Datasets/name.akas.tsv
Show-Segmentation-2020/IMDB_Datasets/name.crew.tsv
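
Before running anything, it can help to confirm that the downloaded files are where the program expects them. The following is a minimal sketch, not part of the repository: the helper name missing_files is hypothetical, and the relative paths are copied from the locations listed above.

```python
from pathlib import Path

# Relative paths copied from the setup steps above.
REQUIRED_FILES = [
    "final_celeb_detection/final_pickles/anchors-with-TV-encodings.pickle",
    "IMDB_Datasets/name.basics.tsv",
    "IMDB_Datasets/title.basics.tsv",
    "IMDB_Datasets/title.principals.tsv",
    "IMDB_Datasets/name.akas.tsv",
    "IMDB_Datasets/name.crew.tsv",
]


def missing_files(repo_root):
    """Return the required files that are not present under repo_root."""
    root = Path(repo_root)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # Assumes the script is run from the directory that contains the clone.
    missing = missing_files("Show-Segmentation-2020")
    for rel in missing:
        print("missing:", rel)
    if not missing:
        print("all data files are in place")
```

Run it from the directory that contains the cloned repository; any path it prints still needs to be downloaded.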

Navigate to Show-Segmentation-2020/final_usable_code/

cd Show-Segmentation-2020/final_usable_code

segment_video.py takes three inputs: the path to the input video, the path to the output location, and an optional --verbose flag.

python3 segment_video.py path/to/input/video.mp4 path/to/store/output --verbose
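
For readers who want to wrap or imitate this interface, the sketch below shows one way such a command line could be parsed with argparse. It is an illustration only: the actual argument handling in segment_video.py may differ, and the names build_parser, input_video and output_dir are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser with two positional paths and an optional --verbose flag."""
    parser = argparse.ArgumentParser(
        description="Split one video into per-show segments."
    )
    parser.add_argument("input_video", help="path to the input video (.mp4)")
    parser.add_argument("output_dir", help="directory where the output is stored")
    parser.add_argument(
        "--verbose", action="store_true", help="print progress statements"
    )
    return parser


# Parse the same arguments as the command shown above.
args = build_parser().parse_args(
    ["path/to/input/video.mp4", "path/to/store/output", "--verbose"]
)
print(args.input_video, args.output_dir, args.verbose)
```

With store_true, leaving out --verbose gives False, which matches the flag being optional.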

Make sure that the input video's name follows the format of Red Hen Lab's Rosenthal dataset. Here is an example:

1980-06-03_0000_US_00020088_V0_U2_M9_EG1_DB.mp4
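
As a rough illustration of how the date and time annotations could be read from such a name, the sketch below splits the example on underscores. It assumes that the first three fields are the date, a four-digit HHMM time, and a country code; the remaining fields are kept as opaque tokens because their meaning is not described here. The function name parse_video_name is hypothetical and is not taken from the repository.

```python
from datetime import datetime
from pathlib import Path


def parse_video_name(path):
    """Split a Rosenthal-style file name into start time, country and extra fields."""
    stem = Path(path).stem  # drop the directory and the .mp4 extension
    date_part, time_part, country, *rest = stem.split("_")
    # Assumption: the second field is a 24-hour HHMM timestamp.
    start = datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H%M")
    return {"start": start, "country": country, "extra": rest}


info = parse_video_name("1980-06-03_0000_US_00020088_V0_U2_M9_EG1_DB.mp4")
print(info["start"].isoformat(), info["country"], info["extra"])
```

A name that does not follow the format raises ValueError, which makes a convenient early check before starting a long segmentation run.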

Singularity Usage

Setup

This is for those using the singularity image (segmentation_production.simg) on the CWRU HPC Cluster.

Connect to the CWRU VPN.
Log in to the cluster using your CWRU ID and your credentials. Example:

ssh [email protected]

Navigate to the project's location on the cluster.

cd /mnt/rds/redhen/gallina/Singularity/Show_Segmentation_2020/final_usable_code

Request a computing node using

srun --mem=16gb --pty /bin/bash

Load singularity 2.5.1 to your environment using

module load singularity

segment_video.py is intended for testing and segment_Rosenthal.py for final production. After setup, read the Testing section or the Production section as needed.

Testing

segment_video.py is made to work on a single video file. It takes 3 inputs (in this order)

path/to/input/video.mp4
path/to/output/directory (where the output will be stored)
--verbose (an optional flag which will make the program print progress statements like 'done extracting faces', 'done clustering faces' etc.)

The main command is of the form

singularity exec -B /mnt ../show_segmentation_2020.img python3 segment_video.py {INPUT_VIDEO_PATH} {OUTPUT_PATH} {--verbose}

The Rosenthal dataset is located at /mnt/rds/redhen/gallina/Rosenthal; any video file from it can be used as input.
An example command for the file 1998-01/1998-01-01/1998-01-01_0000_US_00019495_V3_VHS50_MB20_H17_WR.mp4 is

singularity exec -B /mnt ../show_segmentation_2020.img python3 segment_video.py /mnt/rds/redhen/gallina/Rosenthal/1998/1998-01/1998-01-01/1998-01-01_0000_US_00019495_V3_VHS50_MB20_H17_WR.mp4 /mnt/path/to/output/directory --verbose

Production

segment_Rosenthal.py works recursively on all the video files in /mnt/rds/redhen/gallina/Rosenthal/ and stores the outputs in /mnt/rds/redhen/gallina/RosenthalSplit/
The --verbose flag mentioned earlier is set to False by default for production.
Run the script using

singularity exec -B /mnt ../show_segmentation_2020.img python3 segment_Rosenthal.py
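
To give a feel for what working recursively over the dataset involves, here is a minimal sketch that finds every .mp4 file under an input root and pairs it with an output directory. Mirroring the input directory layout under the output root is an assumption made for illustration; segment_Rosenthal.py may organise its outputs differently, and the function name plan_outputs is hypothetical.

```python
from pathlib import Path


def plan_outputs(input_root, output_root):
    """Pair every .mp4 under input_root with a mirrored directory under output_root."""
    input_root, output_root = Path(input_root), Path(output_root)
    plan = []
    # rglob descends into all subdirectories; sorting keeps the order stable.
    for video in sorted(input_root.rglob("*.mp4")):
        relative_dir = video.parent.relative_to(input_root)
        plan.append((video, output_root / relative_dir))
    return plan


if __name__ == "__main__":
    # Directory names taken from the paragraph above.
    for video, out_dir in plan_outputs(
        "/mnt/rds/redhen/gallina/Rosenthal",
        "/mnt/rds/redhen/gallina/RosenthalSplit",
    ):
        print(video, "->", out_dir)
```

Printing the plan first is a cheap way to check which files a production run would touch before committing compute time.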

Please raise an issue if you run into any errors.

Future work

If possible, replace the current celeb detection method with Azure’s Computer Vision service.
Currently the most time-consuming process in the program is going frame by frame and extracting faces. This could be sped up using multi-threading or other means.
Explore the clustering process and look for ways to speed it up.
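
The multi-threading idea for face extraction can be sketched with Python's standard concurrent.futures module. This is an illustration under stated assumptions, not the project's code: map_frames is a hypothetical helper, and fake_extract_faces is a stand-in worker used so the example runs without OpenCV or face_recognition installed.

```python
from concurrent.futures import ThreadPoolExecutor


def map_frames(frames, extract_faces, max_workers=4):
    """Apply extract_faces to every frame concurrently, keeping frame order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input frames.
        return list(pool.map(extract_faces, frames))


def fake_extract_faces(frame):
    """Stand-in for a real face detector: returns a dummy value per frame."""
    return len(frame)


results = map_frames(["frame-a", "frame-bb", "frame-ccc"], fake_extract_faces)
print(results)
```

Whether threads give a real speedup depends on whether the underlying face detector releases Python's global interpreter lock during its heavy computation. If it does not, ProcessPoolExecutor from the same module offers a similar interface and is worth measuring as an alternative.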
