This repository explains how to build a keyword search engine for a video using the Google Speech API and Elasticsearch.
You need Google API credentials and an Elasticsearch server (version higher than 7.4) running either remotely or locally.
The process is split into 2 scripts written in Python 3:
- transcribe.py: This module downloads a video from YouTube, converts it to the correct format, stores it in a Google bucket and then performs the transcription, returning a JSON file with the words and their time of occurrence within the video.
- bulkinjectionES.py: This module takes the result (the JSON file) and injects it into an Elasticsearch server.
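
The injection step can be sketched roughly as follows. This is a minimal illustration and not the actual content of bulkinjectionES.py: the input format (a list of objects with "word" and "start" keys), the index name "video-words" and the function name build_bulk_actions are all assumptions made for the example. The field names match the index mapping shown further down.

```python
from datetime import datetime, timezone


def build_bulk_actions(words, index_name="video-words"):
    """Yield one bulk action per transcribed word.

    `words` is assumed to be a list of dicts with "word" and "start" keys;
    the real JSON produced by transcribe.py may be shaped differently.
    """
    # One ingestion timestamp shared by all documents of this run.
    now = datetime.now(timezone.utc).isoformat()
    for entry in words:
        yield {
            "_index": index_name,  # hypothetical index name
            "_source": {
                "word": entry["word"],
                "start": float(entry["start"]),  # seconds into the video
                "timestamp": now,
            },
        }


# Hypothetical sample data standing in for the transcription output.
sample = [{"word": "hello", "start": 0.4}, {"word": "world", "start": "0.9"}]
actions = list(build_bulk_actions(sample))
```

With the official Python client installed, a list like this can be handed to elasticsearch.helpers.bulk(client, actions) to index all words in one request.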
Then, you can enable a "Google-like" search with completion. This relies on Elasticsearch's search_as_you_type field type, with the index modelled as follows:

    mapping = {
        "mappings": {
            "properties": {
                "start": {
                    "type": "float"
                },
                "timestamp": {
                    "type": "date"
                },
                "word": {
                    "type": "search_as_you_type",
                    "max_shingle_size": 3
                }
            }
        }
    }

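
To show how this mapping might be queried, here is a minimal sketch of a request body for completion-style search. It follows the pattern the Elasticsearch documentation gives for search_as_you_type fields: a multi_match query of type bool_prefix over the root field and its shingle subfields. The function name build_completion_query and the default size are assumptions for illustration; the repo may query the index differently.

```python
def build_completion_query(text, size=10):
    """Build a search body matching `text` as a prefix of the indexed words."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": text,
                # bool_prefix treats the last term of the input as a prefix.
                "type": "bool_prefix",
                # With max_shingle_size 3, Elasticsearch creates the
                # ._2gram and ._3gram subfields automatically.
                "fields": ["word", "word._2gram", "word._3gram"],
            }
        },
    }


body = build_completion_query("ela")
```

The resulting dictionary can be sent as the body of a search request against the index. Each hit then carries the "start" value, which gives the position of the word in the video.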
I spent a considerable amount of time downloading and setting up the correct modules. Google suggests building a virtual environment in order to install the Speech API modules. The repo includes a requirements.txt file, which may help you install everything you need to run the above scripts.