Textube

This project is a web application that allows users to perform searches within media files (audio or video) transcriptions. The search functionality supports fuzzy matching, stemming, and exact phrase matching.

Features

Extract transcriptions from YouTube videos or uploaded media files.
Store transcriptions and their timestamps in Elasticsearch.
Perform advanced text search with:
- Fuzzy matching
- Stemming and stop word removal
- Exact phrase matching
Highlight matching words in search results.

Technologies Used

Backend: Flask
Search Engine: Elasticsearch
Frontend: HTML, CSS, JavaScript
Transcription: Whisper

Installation and Setup

Prerequisites

Ensure you have the following installed:

Python 3.10
Elasticsearch
Flask

Steps to Run the Project

Clone the repository:

git clone https://github.com/your-username/transcription-search-app.git
cd transcription-search-app

Install dependencies:
```
pip install -r requirements.txt
```
Start Elasticsearch (if not running already).
Run the Flask application:
```
python app.py
```
Open your browser and navigate to http://localhost:5000.

API Endpoints

`POST /process`

Description: Processes a video link or uploaded file and indexes its transcription in Elasticsearch.
Parameters:
- videoLink (string): YouTube video URL (optional)
- mp4Upload (file): Uploaded media file (optional)
- sentence (string): Sentence to search within transcriptions

`GET /search`

Description: Searches for a sentence in the indexed transcriptions.
Parameters:
- query (string): The search term

Elasticsearch Indexing

The project uses a custom Elasticsearch index with analyzers:

simple_analyzer: Lowercase only.
custom_analyzer: Lowercase, stopword removal, and stemming.

Future Improvements

Migrate the frontend to React for a better user experience.
Implement a real-time transcription feature.
Enhance search results with word-level timestamps.

License

This project is licensed under the MIT License.

Author: Youxise

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
static		static
templates		templates
.gitattributes		.gitattributes
README.md		README.md
app.py		app.py
search.py		search.py
transcription.py		transcription.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Textube

Features

Technologies Used

Installation and Setup

Prerequisites

Steps to Run the Project

API Endpoints

`POST /process`

`GET /search`

Elasticsearch Indexing

Future Improvements

License

About

Releases

Packages

Languages

Youxise/Textube

Folders and files

Latest commit

History

Repository files navigation

Textube

Features

Technologies Used

Installation and Setup

Prerequisites

Steps to Run the Project

API Endpoints

POST /process

GET /search

Elasticsearch Indexing

Future Improvements

License

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

`POST /process`

`GET /search`

Packages