This repository contains scripts for extracting keyframes from video files, extracting features using a Vision Transformer (ViT) model, and utilizing a Long Short-Term Memory (LSTM) network for classification.
The key_frame_extraction.py script extracts keyframes from video files. Keyframes are sampled from the video: if the video has fewer frames than required, frames are duplicated; otherwise exactly n keyframes are extracted.
- Set the video_path variable in the script to the path of your video file.
- Run the script:
  python key_frame_extraction.py
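
The sampling rule described above can be sketched as follows. This is a minimal illustration, not the code in key_frame_extraction.py: the function names are hypothetical, and evenly spaced sampling is an assumption, since the description only says that frames are duplicated for short videos and that exactly n keyframes are returned otherwise.

```python
def select_keyframe_indices(total_frames: int, n: int) -> list:
    """Return exactly n frame indices for a video with total_frames frames."""
    if total_frames <= 0 or n <= 0:
        raise ValueError("total_frames and n must be positive")
    # Evenly spaced positions (assumed strategy). When total_frames < n,
    # some indices repeat, which duplicates frames.
    return [i * total_frames // n for i in range(n)]


def extract_keyframes(video_path: str, n: int) -> list:
    """Read the selected frames from a video as BGR arrays (hypothetical helper)."""
    import cv2  # opencv-python; imported lazily so the index helper has no dependencies

    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        raise IOError(f"Could not open video: {video_path}")
    try:
        total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        frames = []
        for idx in select_keyframe_indices(total, n):
            cap.set(cv2.CAP_PROP_POS_FRAMES, idx)  # seek to the chosen frame
            ok, frame = cap.read()
            if not ok:
                raise IOError(f"Could not read frame {idx}")
            frames.append(frame)
        return frames
    finally:
        cap.release()
```

For a 3-frame video and n = 6, the helper selects indices 0, 0, 1, 1, 2, 2, so every frame appears twice.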
The image_feature_extraction_with_ViT.py script extracts features from image frames using a pre-trained Vision Transformer (ViT) model. The script uses the timm library to create the model.
- Set the path variable in the script to the path of your image file.
- Adjust the image_size variable as needed.
- Run the script:
  python image_feature_extraction_with_ViT.py
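
A rough sketch of feature extraction with timm is shown below. The model name vit_base_patch16_224, the function names, and the preprocessing steps are assumptions for illustration; the script may use a different checkpoint or transform. The sketch also assumes torch, torchvision, and Pillow are installed alongside timm.

```python
def patch_count(image_size: int, patch_size: int = 16) -> int:
    """Number of patches a ViT sees for a square image (hypothetical helper)."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    return (image_size // patch_size) ** 2


def extract_features(path: str, image_size: int = 224,
                     model_name: str = "vit_base_patch16_224"):
    """Return one pooled feature vector for the image at `path`."""
    import timm
    import torch
    from PIL import Image
    from timm.data import resolve_data_config
    from torchvision import transforms

    patch_count(image_size)  # fail early on sizes the patch grid cannot tile

    # num_classes=0 drops the classification head, so the model returns
    # pooled features instead of class logits.
    model = timm.create_model(model_name, pretrained=True, num_classes=0)
    model.eval()

    # Reuse the normalization statistics the checkpoint was trained with.
    # image_size should match the input size the chosen model expects
    # (224 for the assumed checkpoint).
    config = resolve_data_config({}, model=model)
    preprocess = transforms.Compose([
        transforms.Resize((image_size, image_size)),
        transforms.ToTensor(),
        transforms.Normalize(mean=config["mean"], std=config["std"]),
    ])

    image = Image.open(path).convert("RGB")
    batch = preprocess(image).unsqueeze(0)  # shape: (1, 3, H, W)
    with torch.no_grad():
        features = model(batch)  # shape: (1, feature_dim)
    return features.squeeze(0).numpy()
```

With a 224-pixel image and 16-pixel patches, the model processes a 14 x 14 grid of 196 patches.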
The lstm.py script uses an LSTM network to classify videos based on the features extracted from their keyframes. It loads features from CSV files, preprocesses the data, builds and trains an LSTM model, evaluates its performance, and saves the model for future use.
- Ensure CSV files with extracted features are available in the specified folder_path.
- Run the script:
  python lstm.py
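
The pipeline described above might look roughly like the sketch below. The CSV layout is an assumption (one file per video, one row per keyframe, numeric columns only, the same shape in every file), and the function names, layer size, and optimizer are illustrative choices rather than the settings used in lstm.py.

```python
def encode_labels(labels):
    """Map class names to integer ids; returns (ids, sorted class names)."""
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}
    return [index[name] for name in labels], classes


def load_sequences(folder_path: str):
    """Stack per-video CSV files into an array of shape (videos, frames, features)."""
    import glob
    import os

    import numpy as np
    import pandas as pd

    paths = sorted(glob.glob(os.path.join(folder_path, "*.csv")))
    # Assumption: every CSV has the same number of rows and columns.
    data = np.stack([pd.read_csv(p).to_numpy(dtype="float32") for p in paths])
    return data, paths


def build_model(n_frames: int, n_features: int, n_classes: int):
    """A small LSTM classifier over keyframe feature sequences."""
    from keras import layers, models

    model = models.Sequential([
        layers.Input(shape=(n_frames, n_features)),
        layers.LSTM(64),  # hidden size chosen arbitrarily for the sketch
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Training, evaluation, and saving would then use the standard Keras calls model.fit, model.evaluate, and model.save, with the integer ids from encode_labels as targets.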
- Python
- Libraries: numpy, pandas, keras, opencv-python, scikit-learn, matplotlib, seaborn, timm
You can install the required libraries using pip:
pip install keras opencv-python numpy matplotlib seaborn pandas scikit-learn timm