This repository hosts the code for PooDLe (ICLR 2025), a framework for self-supervised learning from videos.
PooDLe: Pooled and dense self-supervised learning from naturalistic videos
Alex N. Wang*, Christopher Hoang*, Yuwen Xiong, Yann LeCun, Mengye Ren
International Conference on Learning Representations 2025
arXiv preprint (arXiv:2408.11208)
| model | resolution | epochs | data | download | config |
|---|---|---|---|---|---|
| PooDLe | 512x1024 | 100 | BDD100K | full checkpoint | configs |
| PooDLe | 512x1024 | 10 | Walking Tours Venice | full checkpoint | configs |
| PooDLe | 512x1024 | 20 | Walking Tours All | full checkpoint | configs |
| FlowE | 512x1024 | 100 | BDD100K | full checkpoint | configs |
.
├── configs # directory in which all experiment '.yaml' configs are stored
├── src # the package
│ ├── train.py # main training loop for poodle
│ ├── train_uflow.py # main training loop for unsupervised flow model
│ ├── datasets # datasets, data loaders
│ ├── models # model definitions
│ ├── routines # additional training routines
│ └── utils # shared utilities
└── main.py # entrypoint for launching PooDLe pre-training locally or on a SLURM cluster
Config files: Note that all experiment parameters are specified in config files (as opposed to command-line arguments). See the configs/ directory for example config files.
main.py is an entrypoint script for launching experiments with submitit and hydra. The actual implementation is in src/train.py, which parses the experiment config file and runs PooDLe pre-training.
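For cluster runs, a minimal sketch using Hydra's submitit SLURM launcher plugin is shown below. The launcher name and resource keys are generic plugin options, not settings confirmed by this repo's configs, so check configs/ for how the SLURM launch is actually wired up:

```bash
# Sketch: SLURM launch via the hydra-submitit-launcher plugin (multirun mode).
# Resource keys are illustrative; adapt them to your cluster and this repo's configs.
python main.py -m exp=poodle_ablation name='poodle-ablation-bdd100k' \
    hydra/launcher=submitit_slurm \
    hydra.launcher.nodes=1 \
    hydra.launcher.gpus_per_node=8 \
    hydra.launcher.timeout_min=1440
```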
Here is an example of how to run ablation-sized PooDLe pre-training on a local 2-GPU machine with the config configs/exp/poodle_ablation.yaml:
export CUDA_VISIBLE_DEVICES=0,1
torchrun --standalone --nnodes=1 --nproc-per-node=2 main.py \
exp=poodle_ablation \
name='poodle-ablation-bdd100k'
Note: this example is for illustration only. To reproduce our results, run the full PooDLe config with an effective batch size of 128.
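For instance, one way to reach an effective batch size of 128 is a single 8-GPU node at 16 samples per GPU. In this sketch, exp=poodle and batch_size are placeholders for the actual full-config name and per-GPU batch key in configs/:

```bash
# Sketch: effective batch size 128 = 8 GPUs x 16 samples per GPU.
# 'exp=poodle' and 'batch_size' are placeholders; use the real names from configs/.
torchrun --standalone --nnodes=1 --nproc-per-node=8 main.py \
    exp=poodle \
    name='poodle-full-bdd100k' \
    batch_size=16
```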
We use MMSegmentation to evaluate on semantic segmentation and MMDetection to evaluate on object detection.
Their tools work out of the box for UperNet and ResNet encoder-only evaluations. A custom model file must be used when running linear evaluations with the SDM, since its architecture differs.
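As an illustration, a UperNet evaluation might point a stock MMSegmentation config at a PooDLe checkpoint through --cfg-options. The config path below is from the MMSegmentation model zoo, the init_cfg override is the generic MM-style pretrained-backbone hook, and the checkpoint path is a placeholder:

```bash
# Sketch: fine-tune UperNet in MMSegmentation from a PooDLe-pretrained ResNet.
# The config file is a stock MMSegmentation one; the checkpoint path is a placeholder.
python tools/train.py configs/upernet/upernet_r50_512x1024_80k_cityscapes.py \
    --cfg-options model.backbone.init_cfg.type=Pretrained \
                  model.backbone.init_cfg.checkpoint=PATH-TO-POODLE-CKPT
```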
We train a UFlow-based model with a PWC backbone for our ablation experiments. The model is first trained on KITTI data using the command:
python main.py exp=uflow_pwc name='uflow_pwc-kitti'
Then we further train it on BDD:
python main.py exp=uflow_pwc_bdd name='uflow_pwc-kitti-bdd' \
occ_start_epochs=0 selfsup_start_epochs=0 selfsup_warmup_epochs=1 \
warmup_epochs=20 \
lr_scheduler=cosine \
resume='"PATH-TO-YOUR-UFLOW-CKPT"'
Finally, you can train PooDLe with this flow model using the following command:
torchrun --standalone --nnodes=1 --nproc-per-node=2 main.py \
exp=poodle_ablation \
name='poodle_uflow-ablation-bdd100k' \
model=poodle_uflow \
+model_configs.flow_model_checkpoint_uflow='"PATH-TO-YOUR-UFLOW-CKPT"'
- Python 3.10 (or newer)
- PyTorch 2.2.0
- torchvision 0.17.1 (built from source, for the video_reader backend; see the build sketch after this list)
- ffmpeg 5.1.2 (from conda-forge, for video_reader)
- spatial-correlation-sampler (built from source; only needed for the unsupervised flow model)
- Other dependencies: decord, ffprobe-python, flow-vis, hydra-core, kornia, numpy, scipy, timm==0.3.2, wandb
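A minimal environment sketch for the video_reader backend, assuming a conda environment and the standard torchvision source build (version pins follow the list above):

```bash
# Sketch: build torchvision from source so the video_reader backend is compiled in.
# Assumes a conda env; ffmpeg must be visible at build time.
conda install -c conda-forge ffmpeg=5.1.2
pip install torch==2.2.0
git clone --branch v0.17.1 https://github.com/pytorch/vision.git
cd vision
python setup.py install
```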
Note: importing timm==0.3.2 will raise an import error; see here for a fix.
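If the error is the usual torch._six ImportError, a commonly used patch (popularized by the MAE codebase) swaps the import in timm's helpers.py. A sketch, assuming GNU sed and that exact import line:

```bash
# Locate timm's helpers.py without importing timm (importing is what fails),
# then swap the torch._six import for collections.abc.
HELPERS="$(python -c 'import importlib.util, os; print(os.path.join(os.path.dirname(importlib.util.find_spec("timm").origin), "models", "layers", "helpers.py"))')"
sed -i 's/from torch\._six import container_abcs/import collections.abc as container_abcs/' "$HELPERS"
```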
See the LICENSE file for details about the license under which this code is made available.
If you find this repository useful in your research, please consider giving a star ⭐ and a citation:
@inproceedings{wang_hoang:2025:poodle,
title={PooDLe: Pooled and dense self-supervised learning from naturalistic videos},
author={Alex N. Wang and Christopher Hoang and Yuwen Xiong and Yann LeCun and Mengye Ren},
booktitle={International Conference on Learning Representations},
year={2025}
}