This repository explains how to build a search engine for keywords in a video using google speech api and Elasticsearch

javierdejuan/keyword-video-search-engine


Search engine for words in a video (Python)

This repository explains how to build a search engine for keywords spoken in a video, using the Google Speech API and Elasticsearch.

Requirements

You will need Google API credentials and an Elasticsearch server (version higher than 7.4) running either remotely or locally.

Process

The process is split into two scripts written in Python 3:

  • transcribe.py: This module downloads a video from YouTube, converts it to the correct format, stores it in a Google bucket, and then performs the transcription. It returns a JSON file with the words and the time at which each one occurs in the video.
  • bulkinjectionES.py: This module takes that result (the JSON file) and injects it into an Elasticsearch server.
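As a rough illustration of the injection step, the sketch below turns a list of transcribed words into bulk-index actions, one document per word. It is not taken from bulkinjectionES.py: the input shape (a list of dicts with "word" and "start" keys), the function name build_bulk_actions, and the index name "video-words" are all assumptions made for the example. The document fields match the index mapping used in this README.

```python
from datetime import datetime, timezone


def build_bulk_actions(words, index_name, indexed_at=None):
    """Turn transcribed words into bulk-index actions, one document per word.

    `words` is assumed to be a list of dicts such as
    {"word": "search", "start": 1.2}; the real JSON layout produced by
    transcribe.py may differ.
    """
    if indexed_at is None:
        indexed_at = datetime.now(timezone.utc)
    stamp = indexed_at.isoformat()

    actions = []
    for entry in words:
        actions.append({
            "_index": index_name,
            "_source": {
                "word": entry["word"],            # text to search on
                "start": float(entry["start"]),   # offset in seconds within the video
                "timestamp": stamp,               # when the document was indexed
            },
        })
    return actions


# Hypothetical sample data and index name, for illustration only.
sample = [{"word": "search", "start": 1.2}, {"word": "engine", "start": 1.7}]
actions = build_bulk_actions(
    sample, "video-words", datetime(2020, 1, 1, tzinfo=timezone.utc)
)
```

A list of this shape is what the bulk helper of the Elasticsearch Python client (elasticsearch.helpers.bulk) accepts as its actions argument, so the result could be sent to the server in a single call.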

Then you can enable a "Google-like" search with completion. Elasticsearch provides this through its search_as_you_type field type, so the index is modelled as follows:

    mapping = {
        "mappings": {
            "properties": {
                "start": {
                    "type": "float"
                },
                "timestamp": {
                    "type": "date"
                },
                "word": {
                    "type": "search_as_you_type",
                    "max_shingle_size": 3
                }
            }
        }
    }
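To show how this mapping can be queried, here is a minimal sketch of a search body for the "word" field. It follows the pattern Elasticsearch documents for search_as_you_type fields: a multi_match query of type bool_prefix over the root field and its generated shingle subfields (._2gram and ._3gram, which exist because max_shingle_size is 3). The function name build_search_query and the default size are assumptions for the example, not part of the repository.

```python
def build_search_query(text, size=10):
    """Build a search-as-you-type query body for the "word" field.

    The last term of `text` is matched as a prefix, which is what gives
    the completion behaviour while the user is still typing.
    """
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": text,
                "type": "bool_prefix",
                "fields": ["word", "word._2gram", "word._3gram"],
            }
        },
    }


# Example: the user has typed "sea" so far.
query = build_search_query("sea")
```

The resulting dict would be sent as the body of a search request against the index. Each hit then carries the "start" offset, which tells you where in the video the word occurs.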

Python Modules needed

I spent a considerable amount of time downloading and setting up the correct modules. Google suggests building an environment in which to install the Speech API modules. You will find a requirements.txt file in the repo, which may help you install everything you need to run the above scripts.
