Commit 0907206

Update pretrain.md
1 parent 4c2107a commit 0907206

1 file changed: docs/pretrain.md (+5 −5 lines)
```diff
@@ -1,6 +1,6 @@
 # Pipeline of Pre-Training RDT
 
-Firstly, you need to install the prerequisites for RDT (see *Installation* in [README](../README.md)). Then, you can install the prerequisites for TensorFlow Dataset (in another Conda environment).
+Firstly, you need to install the prerequisites for RDT (see [README](../README.md#installation)). Then, you can install the prerequisites for TensorFlow Dataset (in another Conda environment).
 
 ## Installation for TensorFlow Dataset
 
```
````diff
@@ -73,12 +73,12 @@ We introduce how to download each of our pre-training datasets. If you plan to p
 Before everything, let's link the dataset directory on your disk to a subfolder of this repo:
 
 ```bash
-ln -s /path/to/dataset /path/to/project/robotics-diffusion-transformer/data/datasets
+ln -s /path/to/dataset /path/to/repo/RoboticsDiffusionTransformer/data/datasets
 ```
 
 ### Open X-Embodiment
 
-Specify the correct path to the `gsutil` in your Conda in L72 in [this file](../data/openx_embod/download.sh).
+Specify the correct path to the `gsutil` in your Conda in [this file](../data/openx_embod/download.sh#L72).
 
 Run the following commands to download our selected datasets for the Open X-Embodiment:
 
````
```diff
@@ -154,7 +154,7 @@ Add the control frequency of your dataset.
 
 ##### 2. `data/preprocess_scripts/my_pretrain_dataset.py`
 
-If your dataset can be loaded by `tfds.builder_from_directory()`, then you only need to download it into the folder of Open X-Embodiment `data/datasets/openx_embod` and implement the function of `process_step()`. You may need to specify the tfds loading path in L78 (see [this file](../data/vla_dataset.py)). We refer to `data/preprocess_scripts/droid.py` for an example.
+If your dataset can be loaded by `tfds.builder_from_directory()`, then you only need to download it into the folder of Open X-Embodiment `data/datasets/openx_embod` and implement the function of `process_step()`. You may need to specify the tfds loading path in L78 (see [this file](../data/vla_dataset.py#L78)). We refer to `data/preprocess_scripts/droid.py` for an example.
 
 If not, you need to first convert it into TFRecords and then implement both `load_dataset()` and `process_step()`. We refer to `data/agilex/hdf5totfrecords.py` and `data/preprocess_scripts/agilex.py` for examples.
 
```
```diff
@@ -247,7 +247,7 @@ We employ a producer-consumer framework with TensorFlow Dataset for fast data lo
 
 [This file](../configs/base.yaml) includes configurations relevant to model architecture (including number of heads, hidden dimension, and so on) and data processing. You may need to modify `buf_path` (L22) to your real buffer path. This buffer is used as a disk shuffling buffer for data loading.
 
-Configurations relevant to training are passed through *Command Line Arguments*. Use `python main.py -h` to see the descriptions. We provide an example pre-training script in [this file](../pretrain.sh) (`pretrain.sh`). You may need to modify some of the parameters in this file, such as `OUTPUT_DIR`, `CUTLASS_PATH`, and `WANDB_PROJECT`.
+Configurations relevant to training are passed through *Command Line Arguments*. Use `python main.py -h` to see the descriptions. We provide an example pre-training script in [this file](../pretrain.sh) (`pretrain.sh`). You may need to modify some of the parameters in this file, such as `CUTLASS_PATH` and `WANDB_PROJECT`.
 
 You may need to modify the list of pre-training datasets in [this file](../configs/pretrain_datasets.json) and their corresponding sampling weights in [this file](../configs/pretrain_sample_weights.json). If you want to fine-tune RDT through this pipeline, you may need to remove abundant datasets in the list.
 
```
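The two JSON configs mentioned in this hunk pair a dataset list with per-dataset sampling weights. A minimal illustrative shape, assuming the weights file keys entries by dataset name (the names below are placeholders drawn from the examples in this document, not the full pre-training list):

`configs/pretrain_datasets.json`:

```json
["droid", "my_pretrain_dataset"]
```

`configs/pretrain_sample_weights.json`:

```json
{"droid": 1.0, "my_pretrain_dataset": 1.0}
```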