# Universal Visual Decomposer: <br>Long-Horizon Manipulation Made Easy

<div align="center">

[[Website]](https://zcczhang.github.io/UVD/)
[[arXiv]](https://arxiv.org/abs/2310.08581)
[[PDF]](https://zcczhang.github.io/UVD/assets/pdf/full_paper.pdf)
[[Installation]](#installation)
[[Usage]](#usage)
[[BibTex]](#citation)
______________________________________________________________________

</div>

# Installation

- Follow the [instructions](https://github.com/openai/mujoco-py#install-mujoco) for installing `mujoco-py`, and install the following apt packages if using Ubuntu:
```commandline
sudo apt install -y libosmesa6-dev libgl1-mesa-glx libglfw3 patchelf
```
- Create a conda env with Python 3.9:
```commandline
conda create -n uvd python=3.9 -y && conda activate uvd
```
- Install any standalone visual foundation models you need from their repos separately *before* setting up UVD, to avoid dependency conflicts, e.g.:
<details><summary>
<a href="https://github.com/facebookresearch/vip">VIP</a>
</summary>
<p>

```commandline
git clone https://github.com/facebookresearch/vip.git
cd vip && pip install -e .
python -c "from vip import load_vip; vip = load_vip()"
```

</p>
</details>

<details><summary>
<a href="https://github.com/facebookresearch/r3m">R3M</a>
</summary>
<p>

```commandline
git clone https://github.com/facebookresearch/r3m.git
cd r3m && pip install -e .
python -c "from r3m import load_r3m; r3m = load_r3m('resnet50')"
```

</p>
</details>

<details><summary>
<a href="https://github.com/penn-pal-lab/LIV">LIV (& CLIP)</a>
</summary>
<p>

```commandline
git clone https://github.com/penn-pal-lab/LIV.git
cd LIV && pip install -e . && cd liv/models/clip && pip install -e .
python -c "from liv import load_liv; liv = load_liv()"
```

</p>
</details>

<details><summary>
<a href="https://github.com/facebookresearch/eai-vc">VC1</a>
</summary>
<p>

```commandline
git clone https://github.com/facebookresearch/eai-vc.git
cd eai-vc && pip install -e vc_models
```

</p>
</details>

<details><summary>
<a href="https://github.com/facebookresearch/dinov2">DINOv2</a> and <a href="https://pytorch.org/vision/main/models/generated/torchvision.models.resnet50.html">ResNet</a> pretrained on ImageNet-1k are loaded directly via <a href="https://pytorch.org/hub/">torch hub</a> and <a href="https://pytorch.org/vision/main/models/generated/torchvision.models.resnet50.html">torchvision</a>.
</summary></details>

- Under *this* UVD repo directory, install the remaining dependencies:
```commandline
pip install -e .
```

# Usage

We provide a simple API for decomposing RGB videos:

```python
import torch
import uvd

# (N sub-goals, *video frame shape)
subgoals = uvd.get_uvd_subgoals(
    "xxx.mp4",  # video filename or (L, *video frame shape) video numpy array
    preprocessor_name="vip",  # Literal["vip", "r3m", "liv", "clip", "vc1", "dinov2"]
    device="cuda" if torch.cuda.is_available() else "cpu",  # device for the frozen preprocessor
    return_indices=False,  # set True to return only the list of subgoal timesteps
)
```

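With `return_indices=True`, the returned timesteps can be used to slice the original video into per-subgoal segments, e.g. for goal-conditioned policy learning. Below is a minimal sketch with a dummy video array and hypothetical indices (real indices come from `uvd.get_uvd_subgoals`):

```python
import numpy as np

# Dummy stand-in for a decoded video: 100 RGB frames.
video = np.zeros((100, 224, 224, 3), dtype=np.uint8)

# Hypothetical subgoal timesteps, as returned with return_indices=True.
subgoal_indices = [24, 57, 99]

# One subgoal frame per detected milestone.
subgoal_frames = video[subgoal_indices]  # shape (3, 224, 224, 3)

# Contiguous per-subgoal segments covering the whole video.
starts = [0] + [i + 1 for i in subgoal_indices[:-1]]
segments = [video[s : e + 1] for s, e in zip(starts, subgoal_indices)]
print(subgoal_frames.shape, [len(seg) for seg in segments])
# -> (3, 224, 224, 3) [25, 33, 42]
```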
Alternatively, run
```commandline
python demo.py
```
to host a Gradio demo locally with different choices of visual representations.

## Simulation Data

We post-process the data released by the original [Relay-Policy-Learning](https://github.com/google-research/relay-policy-learning/tree/master) repo, keeping only the successful trajectories and adapting the control and observation spaces used in our paper:
```commandline
python datasets/data_gen.py raw_data_path=/PATH/TO/RAW_DATA
```
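Conceptually, the success-filtering step looks like the following; a minimal sketch, assuming each raw trajectory is a dict with a boolean `success` flag (the field names here are hypothetical; the actual schema is defined in `datasets/data_gen.py`):

```python
import numpy as np

def filter_successful(trajectories):
    """Keep only trajectories that completed the task (hypothetical schema)."""
    return [t for t in trajectories if t["success"]]

# Toy raw data: two successful trajectories, one failed.
raw = [
    {"success": True, "obs": np.zeros((50, 9)), "action": np.zeros((50, 8))},
    {"success": False, "obs": np.zeros((12, 9)), "action": np.zeros((12, 8))},
    {"success": True, "obs": np.zeros((80, 9)), "action": np.zeros((80, 8))},
]

kept = filter_successful(raw)
print(len(kept))  # -> 2
```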
Also consider changing `Builder = LinuxCPUExtensionBuilder` to `Builder = LinuxGPUExtensionBuilder` in `PATH/TO/CONDA/envs/uvd/lib/python3.9/site-packages/mujoco_py/builder.py` to enable (multi-)GPU acceleration.

## Runtime Benchmark

Since UVD is meant to be an off-the-shelf method that applies to *any* existing policy learning framework, across BC and RL, we provide minimal scripts under the `./scripts` directory for benchmarking its runtime and showing that the decomposition overhead is negligible:
```commandline
python scripts/benchmark_decomp.py /PATH/TO/VIDEO
```
Pass `--preprocessor_name` to use other preprocessors (default `vip`) and `--n` for the number of repeated iterations (default `100`).

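At its core, such a benchmark is a timing loop over repeated decompositions; a minimal sketch, where `decompose` is a cheap stand-in for calling `uvd.get_uvd_subgoals` on preloaded frames:

```python
import time
import numpy as np

def decompose(frames):
    # Cheap stand-in for uvd.get_uvd_subgoals(frames, ...).
    return [len(frames) - 1]

frames = np.zeros((100, 224, 224, 3), dtype=np.uint8)

n = 100  # number of repeated iterations, mirroring the --n flag
start = time.perf_counter()
for _ in range(n):
    decompose(frames)
elapsed = time.perf_counter() - start
print(f"avg per call: {elapsed / n * 1e3:.3f} ms")
```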
For inference or rollouts, we benchmark the runtime by
```commandline
python scripts/benchmark_inference.py
```
Pass `--policy` to choose an MLP or causal GPT policy; `--preprocessor_name` to use other preprocessors (default `vip`); `--use_uvd` as a boolean flag for using UVD versus no decomposition (i.e., conditioning on the final goal only); and `--n` for the number of repeated iterations (default `100`). The default episode horizon is set to 300. We found that running in a terminal is almost 2 s slower per episode than running directly from a Python IDE (e.g. PyCharm, under the script directory, run as a script rather than a module), but the general trend holds: including UVD introduces negligible extra runtime.

# Citation
If you find this project useful in your research, please consider citing:

```bibtex
@misc{zhang2023universal,
  title  = {Universal Visual Decomposer: Long-Horizon Manipulation Made Easy},
  author = {Zichen Zhang and Yunshuang Li and Osbert Bastani and Abhishek Gupta and Dinesh Jayaraman and Yecheng Jason Ma and Luca Weihs},
  year   = {2023},
  eprint = {arXiv:2310.08581},
}
```