This repository hosts the code for PooDLe (ICLR 2025), a framework for self-supervised learning from videos.
PooDLe: Pooled and dense self-supervised learning from naturalistic videos
Alex N. Wang*, Christopher Hoang*, Yuwen Xiong, Yann LeCun, Mengye Ren
International Conference on Learning Representations 2025
arXiv preprint (arXiv:2408.11208)
| model | resolution | epochs | data | download | config |
|---|---|---|---|---|---|
| PooDLe | 512x1024 | 100 | BDD100K | full checkpoint | configs |
| PooDLe | 512x1024 | 10 | Walking Tours Venice | full checkpoint | configs |
| PooDLe | 512x1024 | 20 | Walking Tours All | full checkpoint | configs |
| FlowE | 512x1024 | 100 | BDD100K | full checkpoint | configs |
.
├── configs # directory in which all experiment '.yaml' configs are stored
├── src # the package
│ ├── train.py # main training loop for poodle
│ ├── train_uflow.py # main training loop for unsupervised flow model
│ ├── datasets # datasets, data loaders
│ ├── models # model definitions
│ ├── routines # additional training routines
│ └── utils # shared utilities
└── main.py # entrypoint for launching PooDLe pre-training locally or on a SLURM cluster
Config files: Note that all experiment parameters are specified in config files (as opposed to command-line arguments). See the configs/ directory for example config files.
main.py is an entrypoint script for launching experiments with submitit and hydra. The actual implementation is in src/train.py, which parses the experiment config file and runs PooDLe pre-training.
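For cluster runs, a minimal sketch using Hydra's submitit SLURM launcher plugin is shown below. The launcher name and resource keys are generic plugin options, not settings confirmed by this repo's configs, so check configs/ for how the SLURM launch is actually wired up:

```bash
# Sketch: SLURM launch via the hydra-submitit-launcher plugin (multirun mode).
# Resource keys are illustrative; adapt them to your cluster and this repo's configs.
python main.py -m exp=poodle_ablation name='poodle-ablation-bdd100k' \
    hydra/launcher=submitit_slurm \
    hydra.launcher.nodes=1 \
    hydra.launcher.gpus_per_node=8 \
    hydra.launcher.timeout_min=1440
```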
Here is an example of how to run ablation-sized PooDLe pre-training on a local 2-GPU machine with the config configs/exp/poodle_ablation.yaml:
export CUDA_VISIBLE_DEVICES=0,1
torchrun --standalone --nnodes=1 --nproc-per-node=2 main.py \
exp=poodle_ablation \
name='poodle-ablation-bdd100k'
Note: this example is for illustration only. To reproduce our results, run the full PooDLe config with an effective batch size of 128.
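For instance, one way to reach an effective batch size of 128 is a single 8-GPU node at 16 samples per GPU. In this sketch, exp=poodle and batch_size are placeholders for the actual full-config name and per-GPU batch key in configs/:

```bash
# Sketch: effective batch size 128 = 8 GPUs x 16 samples per GPU.
# 'exp=poodle' and 'batch_size' are placeholders; use the real names from configs/.
torchrun --standalone --nnodes=1 --nproc-per-node=8 main.py \
    exp=poodle \
    name='poodle-full-bdd100k' \
    batch_size=16
```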
We use MMSegmentation to evaluate on semantic segmentation and MMDetection to evaluate on object detection.
Their tools work out of the box for UperNet and ResNet encoder-only evaluations. A custom model file must be used when running linear evaluations with the SDM, since its architecture differs.
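As an illustration, a UperNet evaluation might point a stock MMSegmentation config at a PooDLe checkpoint through --cfg-options. The config path below is from the MMSegmentation model zoo, the init_cfg override is the generic MM-style pretrained-backbone hook, and the checkpoint path is a placeholder:

```bash
# Sketch: fine-tune UperNet in MMSegmentation from a PooDLe-pretrained ResNet.
# The config file is a stock MMSegmentation one; the checkpoint path is a placeholder.
python tools/train.py configs/upernet/upernet_r50_512x1024_80k_cityscapes.py \
    --cfg-options model.backbone.init_cfg.type=Pretrained \
                  model.backbone.init_cfg.checkpoint=PATH-TO-POODLE-CKPT
```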
We train a UFlow-based model with a PWC backbone for our ablation experiments. The model is first trained on KITTI data using the command:
python main.py exp=uflow_pwc name='uflow_pwc-kitti'
Then we further train it on BDD:
python main.py exp=uflow_pwc_bdd name='uflow_pwc-kitti-bdd' \
occ_start_epochs=0 selfsup_start_epochs=0 selfsup_warmup_epochs=1 \
warmup_epochs=20 \
lr_scheduler=cosine \
resume='"PATH-TO-YOUR-UFLOW-CKPT"'
Finally, you can train PooDLe with this flow model using the following command:
torchrun --standalone --nnodes=1 --nproc-per-node=2 main.py \
exp=poodle_ablation \
name='poodle_uflow-ablation-bdd100k' \
model=poodle_uflow \
+model_configs.flow_model_checkpoint_uflow='"PATH-TO-YOUR-UFLOW-CKPT"'
- Python 3.10 (or newer)
- PyTorch 2.2.0
- torchvision 0.17.1 (built from source, for the video_reader backend; see the build sketch after this list)
- ffmpeg 5.1.2 (from conda-forge, for video_reader)
- spatial-correlation-sampler (built from source; only needed for the unsupervised flow model)
- Other dependencies: decord, ffprobe-python, flow-vis, hydra-core, kornia, numpy, scipy, timm==0.3.2, wandb
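A minimal environment sketch for the video_reader backend, assuming a conda environment and the standard torchvision source build (version pins follow the list above):

```bash
# Sketch: build torchvision from source so the video_reader backend is compiled in.
# Assumes a conda env; ffmpeg must be visible at build time.
conda install -c conda-forge ffmpeg=5.1.2
pip install torch==2.2.0
git clone --branch v0.17.1 https://github.com/pytorch/vision.git
cd vision
python setup.py install
```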
Note: importing timm==0.3.2 will raise an import error; see here for a fix.
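If the error is the usual torch._six ImportError, a commonly used patch (popularized by the MAE codebase) swaps the import in timm's helpers.py. A sketch, assuming GNU sed and that exact import line:

```bash
# Locate timm's helpers.py without importing timm (importing is what fails),
# then swap the torch._six import for collections.abc.
HELPERS="$(python -c 'import importlib.util, os; print(os.path.join(os.path.dirname(importlib.util.find_spec("timm").origin), "models", "layers", "helpers.py"))')"
sed -i 's/from torch\._six import container_abcs/import collections.abc as container_abcs/' "$HELPERS"
```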
See the LICENSE file for details about the license under which this code is made available.
If you find this repository useful in your research, please consider giving a star ⭐ and a citation:
@inproceedings{wang_hoang:2025:poodle,
title={PooDLe: Pooled and dense self-supervised learning from naturalistic videos},
author={Alex N. Wang and Christopher Hoang and Yuwen Xiong and Yann LeCun and Mengye Ren},
booktitle={International Conference on Learning Representations},
year={2025}
}