
PooDLe: Pooled and dense self-supervised learning from naturalistic videos

This repository hosts the code implementing PooDLe (ICLR 2025), a framework for self-supervised learning from videos.

PooDLe: Pooled and dense self-supervised learning from naturalistic videos
Alex N. Wang*, Christopher Hoang*, Yuwen Xiong, Yann LeCun, Mengye Ren
International Conference on Learning Representations 2025
arXiv preprint (arXiv:2408.11208)

Pretrained models

model  | resolution | epochs | data                 | download
------ | ---------- | ------ | -------------------- | ------------------------
PooDLe | 512x1024   | 100    | BDD100K              | full checkpoint, configs
PooDLe | 512x1024   | 10     | Walking Tours Venice | full checkpoint, configs
PooDLe | 512x1024   | 20     | Walking Tours All    | full checkpoint, configs
FlowE  | 512x1024   | 100    | BDD100K              | full checkpoint, configs

Code Structure

.
├── configs                   # directory in which all experiment '.yaml' configs are stored
├── src                       # the package
│   ├── train.py              #   main training loop for PooDLe
│   ├── train_uflow.py        #   main training loop for unsupervised flow model
│   ├── datasets              #   datasets, data loaders
│   ├── models                #   model definitions
│   ├── routines              #   additional training routines
│   └── utils                 #   shared utilities
└── main.py                   # entrypoint for launching PooDLe pre-training locally or on a SLURM cluster

Config files: Note that all experiment parameters are specified in config files (as opposed to command-line arguments). See the configs/ directory for example config files.

Launching PooDLe pre-training

main.py is an entrypoint script for launching experiments with submitit and hydra. The actual implementation is in src/train.py, which parses the experiment config file and runs PooDLe pre-training.
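
Since hydra handles the configuration and submitit handles job submission, the same entrypoint can in principle target a SLURM cluster. The sketch below assumes the hydra-submitit-launcher plugin and placeholder partition/resource values; depending on how main.py wires submitit, the exact flags may differ, so check main.py and your cluster settings:

# Sketch only: assumes the hydra-submitit-launcher plugin is installed
# (hydra/launcher=submitit_slurm) and uses placeholder SLURM resources.
python main.py -m exp=poodle_ablation name='poodle-ablation-bdd100k' \
    hydra/launcher=submitit_slurm \
    hydra.launcher.partition=YOUR-PARTITION \
    hydra.launcher.nodes=1 \
    hydra.launcher.gpus_per_node=4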

Training

Here is an example of how to run ablation-sized PooDLe pre-training on a local 2-GPU machine with the config configs/exp/poodle_ablation.yaml:

export CUDA_VISIBLE_DEVICES=0,1
torchrun --standalone --nnodes=1 --nproc-per-node=2 main.py \
    exp=poodle_ablation \
    name='poodle-ablation-bdd100k'

Note: This example is for illustrative purposes only. To reproduce our results, run the full PooDLe config with an effective batch size of 128.
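
As a rough sketch of what a full run might look like (the config name poodle_full and the 8-GPU split here are assumptions; check configs/ for the actual full config and its per-GPU batch size):

# Sketch only: 'poodle_full' is a placeholder config name; with 8 GPUs the
# per-GPU batch size should multiply out to an effective batch size of 128.
torchrun --standalone --nnodes=1 --nproc-per-node=8 main.py \
    exp=poodle_full \
    name='poodle-full-bdd100k'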

Evaluation

We use MMSegmentation to evaluate on semantic segmentation and MMDetection to evaluate on object detection.

Their tools work out of the box for UperNet and ResNet encoder-only evaluations. A custom model file must be used when running linear evaluations with the SDM, since the architecture changes.
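
As an illustration, a UperNet evaluation with MMSegmentation's standard distributed scripts might look like the following, run from the MMSegmentation repo root. The config and checkpoint paths are placeholders, and the --eval flag applies to the MMSegmentation 0.x series (1.x reads the metric from the config):

# Sketch only: placeholder config and checkpoint paths; 2 GPUs.
bash tools/dist_train.sh configs/upernet/YOUR-UPERNET-CONFIG.py 2
bash tools/dist_test.sh configs/upernet/YOUR-UPERNET-CONFIG.py PATH-TO-YOUR-CKPT.pth 2 --eval mIoU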


Pretraining with unsupervised flow model

We train a UFlow-based model with a PWC backbone for our ablation experiments. The model is first trained on KITTI data using the command:

python main.py exp=uflow_pwc name='uflow_pwc-kitti'

Then we further train it on BDD100K:

python main.py exp=uflow_pwc_bdd name='uflow_pwc-kitti-bdd' \
    occ_start_epochs=0 selfsup_start_epochs=0 selfsup_warmup_epochs=1 \
    warmup_epochs=20 \
    lr_scheduler=cosine \
    resume='"PATH-TO-YOUR-UFLOW-CKPT"'

Finally, you can train PooDLe with this flow model using the following command:

torchrun --standalone --nnodes=1 --nproc-per-node=2 main.py \
  exp=poodle_ablation \
  name='poodle_uflow-ablation-bdd100k' \
  model=poodle_uflow \
  +model_configs.flow_model_checkpoint_uflow='"PATH-TO-YOUR-UFLOW-CKPT"'

Requirements

  • Python 3.10 (or newer)
  • PyTorch 2.2.0
  • torchvision 0.17.1 (built from source, for video_reader)
  • ffmpeg 5.1.2 (from conda-forge, for video_reader)
  • spatial-correlation-sampler (built from source, only needed for the unsupervised flow model)
  • Other dependencies: decord, ffprobe-python, flow-vis, hydra-core, kornia, numpy, scipy, timm==0.3.2, wandb
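
A minimal environment sketch following the list above (the from-source builds are the steps most likely to need adjustment for your CUDA setup):

# Sketch only: adjust CUDA/compiler settings to your system.
conda create -y -n poodle python=3.10
conda activate poodle
conda install -y -c conda-forge ffmpeg=5.1.2
pip install torch==2.2.0
# torchvision is built from source so the video_reader backend links against ffmpeg:
git clone --branch v0.17.1 https://github.com/pytorch/vision.git
cd vision && python setup.py install && cd ..
# compiles a CUDA extension at install time; only needed for the flow model:
pip install spatial-correlation-sampler
pip install decord ffprobe-python flow-vis hydra-core kornia numpy scipy timm==0.3.2 wandb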

Importing this version of timm will raise an import error; see here for a fix.
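
If the error in question is the common torch._six incompatibility (timm 0.3.2 imports container_abcs from torch._six, which was removed in recent PyTorch), the usual patch rewrites one import in the installed package. Which error the linked fix addresses is an assumption here:

# Sketch only: patches timm 0.3.2's torch._six import in place, assuming
# that is the import error being hit (GNU sed; on macOS use sed -i '').
SITE=$(python -c "import site; print(site.getsitepackages()[0])")
sed -i 's/from torch._six import container_abcs/import collections.abc as container_abcs/' \
    "$SITE/timm/models/layers/helpers.py"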


License

See the LICENSE file for details about the license under which this code is made available.

Citation

If you find this repository useful in your research, please consider giving a star ⭐ and a citation:

@inproceedings{wang_hoang:2025:poodle,
  title={PooDLe: Pooled and dense self-supervised learning from naturalistic videos},
  author={Alex N. Wang and Chris Hoang and Yuwen Xiong and Yann LeCun and Mengye Ren},
  booktitle={International Conference on Learning Representations},
  year={2025}
}
