A prototype implementation of VOCAL-UDF, which is a self-enhancing video data management system that empowers users to flexibly issue and answer compositional queries, even when the modules necessary to answer those queries are unavailable. See the technical report for more details.
The project uses conda
to manage dependencies. To install conda, follow the instructions here.
# Clone the repository
git clone https://github.com/uwdb/VOCAL-UDF.git
cd VOCAL-UDF
# Create a conda environment (called vocal-udf) and install dependencies
conda env create -f environment.yml --name vocal-udf
conda activate vocal-udf
python -m pip install -e .
To use OpenAI models, follow the instructions here to create and export an API key.
# Export the API key as an environment variable
export OPENAI_API_KEY="your_api_key_here"
Run the following command to export the project root directory:
export PROJECT_ROOT="$(pwd)"
Download the CLIP model (optionally, the Llava model) from HuggingFace and save them to data/models/
by running the following command:
# Uncomment the corresponding lines in data/models/save_model.py to download the models you need
python data/models/save_model.py
- Download the CLEVRER dataset from here. Place the videos in
data/clevrer/
and unzip.
# Download the dataset using command line
cd data/clevrer
curl -L -O http://data.csail.mit.edu/clevrer/videos/train/video_train.zip
unzip video_train.zip
- Extract the frames from the videos using the following command. This will create a
video_frames
directory indata/clevrer/
.
cd data/clevrer
python extract_frames.py
- Prepare the database. Download the processed annotations from here and place them in
duckdb_dir/
. - Create relations and load data into the database.
cd duckdb_dir
python load_clevrer.py
- Extract the features from the frames using the following command. This will create a
features/clevrer_three_clips
directory induckdb_dir
.
cd featurestore
# Extract attribute features (about 16GB)
python extract_clevrer.py --method "attribute"
# Extract relationship features (about 113GB)
python extract_clevrer.py --method "relationship"
- Obtain the CityFlow-NL dataset from here (i.e., 2023 Track 2; For more information, see here). Place the videos in
data/cityflow/
. Next, runpython extract_vdo_frms.py
to extract the frames from the videos. The file structure should look like this:
data/cityflow/data/
├── extract_vdo_frms.py
├── test-queries.json
├── train-tracks.json
├── test-tracks.json
├── train/
│ ├── S01/
│ │ ├── c001/
│ │ │ ├── calibration.txt
│ │ │ ├── det/
│ │ │ ├── gt/
│ │ │ ├── img1/
│ │ │ ├── mtsc/
│ │ │ ├── roi.jpg
│ │ │ ├── segm/
│ │ │ └── vdo.avi
│ │ ├── c002/...
│ │ ├── ...
│ ├── S03/...
│ ├── S04/...
└── validation/...
- Prepare the database. Download the processed annotations from here and place them in
duckdb_dir/
. - Create relations and load data into the database.
cd duckdb_dir
python load_cityflow.py
- Extract the features from the frames using the following command. This will create a
features/cityflow_three_clips
directory induckdb_dir
.
cd featurestore
# Extract attribute features
python extract_cityflow.py --method "attribute"
# Extract relationship features
python extract_cityflow.py --method "relationship"
- Download the Charades dataset (scaled to 480p) from here. Place the videos in
data/charades/
and unzip.
# Download the dataset using command line
cd data/charades
curl -L -O https://ai2-public-datasets.s3-us-west-2.amazonaws.com/charades/Charades_v1_480.zip
unzip Charades_v1_480.zip
- Download
frame_list.txt
from Action Genome annotations (here). Place the annotations indata/charades/
. - Extract the frames from the videos using the following command. This will create a
frames
directory indata/charades/
.
cd data/charades
python dump_frames.py
- Prepare the database. Download the processed annotations from here and place them in
duckdb_dir/
. - Create relations and load data into the database.
cd duckdb_dir
python load_charades.py
- Extract the features from the frames using the following command. This will create a
features/charades_five_clips
directory induckdb_dir
.
cd featurestore
# Extract relationship features (about 15GB; takes around 2 hours). Charades has no attribute features
python extract_charades.py --include_text_features
We provide an example of how to use VOCAL-UDF to process a query with three missing UDFs on the CLEVRER dataset.
- Generate UDFs
python experiments/async_main.py \
--num_missing_udfs 3 \
--run_id 0 \
--query_id 0 \
--dataset "clevrer" \
--query_filename "3_new_udfs_labels" \
--budget 20 \
--n_selection_samples 500 \
--num_interpretations 10 \
--allow_kwargs_in_udf \
--program_with_pixels \
--num_parameter_search 5 \
--num_workers 8 \
--save_labeled_data \
--n_train_distill 100 \
--selection_strategy "both" \
--llm_method "gpt" \
--is_async \
--openai_model_name "gpt-4o"
- Execute query with new UDFs
python experiments/run_query_executor.py \
--num_missing_udfs 3 \
--run_id 0 \
--query_id 0 \
--dataset "clevrer" \
--query_filename "3_new_udfs_labels" \
--budget 20 \
--n_selection_samples 500 \
--num_interpretations 10 \
--allow_kwargs_in_udf \
--program_with_pixels \
--num_parameter_search 5 \
--num_workers 8 \
--n_train_distill 100 \
--selection_strategy "both" \
--pred_batch_size 4096 \
--dali_batch_size 1 \
--llm_method "gpt"
The experiment scripts are located in the scripts/experiments
directory. See experiments/README.md
for more details.
We provide a Command-Line Interface (CLI) to process your own queries over the CLEVRER, CityFlow-NL, and Charades datasets.
- Run the following command to specify the dataset (
--dataset
) and hyperparameters:
python experiments/cli.py \
--dataset cityflow \
--allow_kwargs_in_udf \
--num_parameter_search 5 \
--budget 20 \
--n_selection_samples 500 \
--num_interpretations 10 \
--num_workers 8 \
--n_train_distill 500 \
--selection_strategy both \
--llm_method gpt \
--is_async \
--openai_model_name gpt-4o
- In the terminal, you will be prompted to enter your query in natural language and then a list of UDFs that are available to answer the query. Some good examples to start with:
Dataset | Query | List of available UDFs (indices) |
---|---|---|
CLEVRER | A cyan-colored object o1 is in front of a cylinder o2, then o1 moves to be behind and close to o2. | 1, 14, 19 |
CityFlow-NL | A red car is initially in front of another car, then drives to be behind and near the car | 2, 9 |
-
Then, VOCAL-UDF will start to process the query and generate the missing UDFs when needed. You will see the progress in the terminal. During the UDF selection stage, you will be prompted to provide labels for system-selected frames.
-
Once the query is processed, VOCAL-UDF will output all matching vids in the terminal. For CityFlow-NL and Charades datasets, to map the vid to the corresponding video file, please refer to the
duckdb_dir/{dataset}_metadata.csv
file.
TODO