Adding scene alignment & normalization across datasets #10

Open: wants to merge 18 commits into `main`
69 changes: 68 additions & 1 deletion DATA.md
@@ -10,6 +10,8 @@ We list the available data used in the current version of CrossOver in the table
| ------------ | ----------------------------- | ----------------------------------- | -------------------------- | -------------------------- |
| ScanNet | `[point, rgb, cad, referral]` | `[point, rgb, floorplan, referral]` | ❌ | ✅ |
| 3RScan | `[point, rgb, referral]` | `[point, rgb, referral]` | ✅ | ✅ |
| ARKitScenes | `[point, rgb, referral]` | `[point, rgb, referral]` | ❌ | ✅ |
| MultiScan | `[point, rgb, referral]` | `[point, rgb, referral]` | ❌ | ✅ |


We detail data download and release instructions for preprocessing with scripts for ScanNet + 3RScan.
@@ -110,4 +112,69 @@ Scan3R/
| │ ├── objectsDataMultimodal.pt -> object data combined from data1D.pt + data2D.pt + data3D.pt (for easier loading)
| │ └── sel_cams_on_mesh.png (visualisation of the cameras selected for computing RGB features per scan)
| └── ...
```
### MultiScan

#### Running preprocessing scripts
Adjust the path parameters of `MultiScan` in the config files under `configs/preprocess`. Run the following (after changing the `--config-path` in the bash file):

```bash
$ bash scripts/preprocess/process_multiscan.sh
```

Our script for the MultiScan dataset performs the following additional processing:

- Projects the 3D instance segmentation into each 2D frame and stores the result as `gt-projection-seg.pt` for each scan.
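
The projection step can be pictured with a standard pinhole camera model. The sketch below is not the repository's implementation: `project_instances`, the `(fx, fy, cx, cy)` intrinsics tuple, the 4x4 row-major world-to-camera matrix, and the use of `0` as the "no label" value are all assumptions made for illustration. It projects labelled 3D points into one frame and keeps the nearest point per pixel, which yields a framewise 2D instance map.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def project_instances(
    points: Sequence[Point],
    instance_ids: Sequence[int],
    world_to_cam: Sequence[Sequence[float]],  # 4x4 row-major extrinsics (assumed layout)
    intrinsics: Tuple[float, float, float, float],  # fx, fy, cx, cy (assumed order)
    width: int,
    height: int,
) -> List[List[int]]:
    """Return a height x width map of instance IDs (0 = no point projected)."""
    fx, fy, cx, cy = intrinsics
    seg = [[0] * width for _ in range(height)]
    depth = [[float("inf")] * width for _ in range(height)]

    for (x, y, z), inst in zip(points, instance_ids):
        # Transform the point from world to camera coordinates.
        xc, yc, zc = (
            row[0] * x + row[1] * y + row[2] * z + row[3]
            for row in world_to_cam[:3]
        )
        if zc <= 0:  # point lies behind the camera
            continue
        # Perspective projection onto the image plane.
        u = int(round(fx * xc / zc + cx))
        v = int(round(fy * yc / zc + cy))
        if 0 <= u < width and 0 <= v < height and zc < depth[v][u]:
            depth[v][u] = zc  # z-buffer: keep the closest point per pixel
            seg[v][u] = inst
    return seg
```

Running this once per selected camera frame and stacking the results would give the kind of per-frame segmentation that `gt-projection-seg.pt` is described as holding; the actual file layout is not specified here.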

After preprocessing, the data structure should look like the following:

```
MultiScan/
├── objects_chunked/ (object data chunked into hdf5 format for instance baseline training)
| ├── train_objects.h5
| └── val_objects.h5
├── scans/
| ├── scene_00000_00/
| │ ├── gt-projection-seg.pt -> 3D-to-2D projected data consisting of framewise 2D instance segmentation
| │ ├── data1D.pt -> all 1D data + encoded (object referrals + BLIP features)
| │ ├── data2D.pt -> all 2D data + encoded (RGB + DinoV2 features)
| │ ├── data2D_all_images.pt (RGB features of every image of every scan)
| │ ├── data3D.pt -> all 3D data + encoded (Point Cloud + I2PMAE features - object only)
| │ ├── object_id_to_label_id_map.pt -> mapping from instance ID to NYU40 label
| │ ├── objectsDataMultimodal.pt -> object data combined from data1D.pt + data2D.pt + data3D.pt (for easier loading)
| │ └── sel_cams_on_mesh.png (visualisation of the cameras selected for computing RGB features per scan)
| └── ...
```
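
A quick way to confirm that preprocessing finished for every scan is to compare each scan folder against the listing above. The following is a minimal sketch, assuming the processed data sits under `<process_dir>/scans/<scan_id>/`; `find_incomplete_scans` and `EXPECTED_FILES` are hypothetical names, not part of the repository.

```python
from pathlib import Path
from typing import Dict, List

# File names taken from the directory listing above.
EXPECTED_FILES = (
    "gt-projection-seg.pt",
    "data1D.pt",
    "data2D.pt",
    "data2D_all_images.pt",
    "data3D.pt",
    "object_id_to_label_id_map.pt",
    "objectsDataMultimodal.pt",
    "sel_cams_on_mesh.png",
)


def find_incomplete_scans(process_dir: str) -> Dict[str, List[str]]:
    """Map each scan ID under <process_dir>/scans to the files it is missing."""
    scans_dir = Path(process_dir) / "scans"
    incomplete: Dict[str, List[str]] = {}
    if not scans_dir.is_dir():
        return incomplete
    for scan_dir in sorted(p for p in scans_dir.iterdir() if p.is_dir()):
        missing = [name for name in EXPECTED_FILES if not (scan_dir / name).is_file()]
        if missing:
            incomplete[scan_dir.name] = missing
    return incomplete
```

An empty result means every scan folder contains all eight files. The same check should apply to any dataset whose processed layout matches this listing.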

### ARKitScenes

#### Running preprocessing scripts
Adjust the path parameters of `ARKitScenes` in the config files under `configs/preprocess`. Run the following (after changing the `--config-path` in the bash file):

```bash
$ bash scripts/preprocess/process_arkit.sh
```

Our script for the ARKitScenes dataset performs the following additional processing:

- Projects the 3D instance segmentation into each 2D frame and stores the result as `gt-projection-seg.pt` for each scan.

After preprocessing, the data structure should look like the following:

```
ARKitScenes/
├── objects_chunked/ (object data chunked into hdf5 format for instance baseline training)
| ├── train_objects.h5
| └── val_objects.h5
├── scans/
| ├── 40753679/
| │ ├── gt-projection-seg.pt -> 3D-to-2D projected data consisting of framewise 2D instance segmentation
| │ ├── data1D.pt -> all 1D data + encoded (object referrals + BLIP features)
| │ ├── data2D.pt -> all 2D data + encoded (RGB + DinoV2 features)
| │ ├── data2D_all_images.pt (RGB features of every image of every scan)
| │ ├── data3D.pt -> all 3D data + encoded (Point Cloud + I2PMAE features - object only)
| │ ├── object_id_to_label_id_map.pt -> mapping from instance ID to NYU40 label
| │ ├── objectsDataMultimodal.pt -> object data combined from data1D.pt + data2D.pt + data3D.pt (for easier loading)
| │ └── sel_cams_on_mesh.png (visualisation of the cameras selected for computing RGB features per scan)
| └── ...
```
5 changes: 4 additions & 1 deletion README.md
@@ -118,6 +118,9 @@ See [DATA.MD](DATA.md) for detailed instructions on data download, preparation a
| ------------ | ----------------------------- | ----------------------------------- | -------------------------- | -------------------------- |
| ScanNet | `[point, rgb, cad, referral]` | `[point, rgb, floorplan, referral]` | ❌ | ✅ |
| 3RScan | `[point, rgb, referral]` | `[point, rgb, referral]` | ✅ | ✅ |
| ARKitScenes | `[point, rgb, referral]` | `[point, rgb, referral]` | ❌ | ✅ |
| MultiScan | `[point, rgb, referral]` | `[point, rgb, referral]` | ❌ | ✅ |


> To run our demo, you only need to download generated embedding data; no need for any data preprocessing.

@@ -134,7 +137,7 @@ Various configurable parameters:
- `--database_path`: Path to the precomputed embeddings of the database scenes downloaded earlier (e.g. `./release_data/embed_scannet.pt`).
- `--query_modality`: Modality of the query scene. Options: `point`, `rgb`, `floorplan`, `referral`.
- `--database_modality`: Modality used for retrieval. Same options as above.
- `--ckpt`: Path to the pre-trained scene crossover model checkpoint (details [here](#checkpoints)), example_path: `./checkpoints/scene_crossover_scannet+scan3r.pth/`).
- `--ckpt`: Path to the pre-trained scene crossover model checkpoint (details [here](#checkpoints)), e.g. `./checkpoints/scene_crossover_scannet+scan3r.pth`.
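
To make the flag semantics concrete, here is a hedged sketch of how such a command line could be declared with `argparse`. It is not the demo script from this repository: `build_parser` and `parse_args` are hypothetical names, only the flags visible in this list are included (earlier flags are cut off by the diff), and marking every flag as required is an assumption.

```python
import argparse
from typing import List, Optional

# Modality options as listed for --query_modality.
MODALITIES = ("point", "rgb", "floorplan", "referral")


def build_parser() -> argparse.ArgumentParser:
    """Declare the retrieval flags described in the list above (illustrative only)."""
    parser = argparse.ArgumentParser(description="Hypothetical scene retrieval demo CLI")
    parser.add_argument("--database_path", required=True,
                        help="Path to precomputed database embeddings")
    parser.add_argument("--query_modality", required=True, choices=MODALITIES,
                        help="Modality of the query scene")
    parser.add_argument("--database_modality", required=True, choices=MODALITIES,
                        help="Modality used for retrieval")
    parser.add_argument("--ckpt", required=True,
                        help="Path to the pre-trained scene crossover checkpoint")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # argv=None makes argparse read sys.argv[1:], as a real script would.
    return build_parser().parse_args(argv)
```

Restricting both modality flags with `choices` makes an unsupported value fail at parse time instead of during model loading.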

For embedding and pre-trained model download, refer to [generated embedding data](DATA.md#generated-embedding-data) and [checkpoints](#checkpoints) sections.

2 changes: 1 addition & 1 deletion TRAIN.md
@@ -21,7 +21,7 @@ $ bash scripts/train/train_instance_crossover.sh
```

#### Train Scene Retrieval Pipeline
Adjust path/configuration parameters in `configs/train/train_scene_crossover.yaml`. You can also add your customised dataset or choose to train on Scannet & 3RScan or either. Run the following:
Adjust path/configuration parameters in `configs/train/train_scene_crossover.yaml`. You can also add your own dataset, or train on any combination of ScanNet, 3RScan, MultiScan, and ARKitScenes. Run the following:

```bash
$ bash scripts/train/train_scene_crossover.sh
23 changes: 21 additions & 2 deletions configs/evaluation/eval_instance.yaml
@@ -43,14 +43,33 @@ data :
max_object_len : 150
voxel_size : 0.02

ARKitScenes:
base_dir : /media/sayan/Expansion/data/datasets/ARKitScenes
process_dir : ${data.process_dir}/ARKitScenes/
processor3D : ARKitScenes3DProcessor
processor2D : ARKitScenes2DProcessor
processor1D : ARKitScenes1DProcessor
avail_modalities : ['point', 'cad', 'rgb', 'referral']
max_object_len : 150
voxel_size : 0.02

MultiScan:
base_dir : /media/sayan/Expansion/data/datasets/MultiScan
process_dir : ${data.process_dir}/MultiScan
processor3D : MultiScan3DProcessor
processor2D : MultiScan2DProcessor
processor1D : MultiScan1DProcessor
avail_modalities : ['point', 'cad', 'rgb', 'referral']
max_object_len : 150
voxel_size : 0.02

task:
name : InferenceObjectRetrieval
InferenceObjectRetrieval:
val : [Scannet]
modalities : ['rgb', 'point', 'cad', 'referral']
scene_modalities : ['rgb', 'point', 'referral', 'floorplan']
ckpt_path : /drive/dumps/multimodal-spaces/runs/release_runs/instance_crossover_scannet+scan3r.pth

ckpt_path : /drive/dumps/multimodal-spaces/runs/release_runs/instance_crossover_scannet+scan3r+multiscan.pth

inference_module: ObjectRetrieval

21 changes: 20 additions & 1 deletion configs/evaluation/eval_scene.yaml
@@ -43,13 +43,32 @@ data :
max_object_len : 150
voxel_size : 0.02

ARKitScenes:
base_dir : /media/sayan/Expansion/data/datasets/ARKitScenes
process_dir : ${data.process_dir}/ARKitScenes/
processor3D : ARKitScenes3DProcessor
processor2D : ARKitScenes2DProcessor
processor1D : ARKitScenes1DProcessor
max_object_len : 150
voxel_size : 0.02
avail_modalities : ['point', 'cad', 'rgb', 'referral']
MultiScan:
base_dir : /media/sayan/Expansion/data/datasets/MultiScan
process_dir : ${data.process_dir}/MultiScan
processor3D : MultiScan3DProcessor
processor2D : MultiScan2DProcessor
processor1D : MultiScan1DProcessor
avail_modalities : ['point', 'cad', 'rgb', 'referral']
max_object_len : 150
voxel_size : 0.02

task:
name : InferenceSceneRetrieval
InferenceSceneRetrieval:
val : [Scannet]
modalities : ['rgb', 'point', 'cad', 'referral']
scene_modalities : ['rgb', 'point', 'referral', 'floorplan'] #, 'point']
ckpt_path : /drive/dumps/multimodal-spaces/runs/release_runs/scene_crossover_scannet+scan3r.pth
ckpt_path : /drive/dumps/multimodal-spaces/runs/release_runs/scene_crossover_scannet+scan3r+multiscan.pth

inference_module: SceneRetrieval
model:
15 changes: 15 additions & 0 deletions configs/preprocess/process_1d.yaml
@@ -25,6 +25,21 @@ data:
label_filename : labels.instances.align.annotated.v2.ply
skip_frames : 1

ARKitScenes:
base_dir : /media/sayan/Expansion/data/datasets/ARKitScenes
process_dir : ${data.process_dir}/ARKitScenes/
processor3D : ARKitScenes3DProcessor
processor2D : ARKitScenes2DProcessor
processor1D : ARKitScenes1DProcessor
skip_frames : 1
MultiScan:
base_dir : /media/sayan/Expansion/data/datasets/MultiScan
process_dir : ${data.process_dir}/MultiScan
processor3D : MultiScan3DProcessor
processor2D : MultiScan2DProcessor
processor1D : MultiScan1DProcessor
skip_frames : 1

Shapenet:
base_dir : /drive/datasets/Shapenet/ShapeNetCore.v2/

17 changes: 16 additions & 1 deletion configs/preprocess/process_2d.yaml
@@ -27,6 +27,21 @@ data:
label_filename : labels.instances.align.annotated.v2.ply
skip_frames : 1

ARKitScenes:
base_dir : /media/sayan/Expansion/data/datasets/ARKitScenes
process_dir : ${data.process_dir}/ARKitScenes/
processor3D : ARKitScenes3DProcessor
processor2D : ARKitScenes2DProcessor
processor1D : ARKitScenes1DProcessor
skip_frames : 1
MultiScan:
base_dir : /media/sayan/Expansion/data/datasets/MultiScan
process_dir : ${data.process_dir}/MultiScan
processor3D : MultiScan3DProcessor
processor2D : MultiScan2DProcessor
processor1D : MultiScan1DProcessor
skip_frames : 1

modality_info:
1D :
feature_extractor:
@@ -60,4 +75,4 @@ task:
name : Preprocess
Preprocess :
modality : '2D'
splits : ['val']
splits : ['train', 'val']
14 changes: 14 additions & 0 deletions configs/preprocess/process_3d.yaml
@@ -24,6 +24,20 @@ data:
processor1D : Scan3R1DProcessor
label_filename : labels.instances.align.annotated.v2.ply

ARKitScenes:
base_dir : /media/sayan/Expansion/data/datasets/ARKitScenes
process_dir : ${data.process_dir}/ARKitScenes/
processor3D : ARKitScenes3DProcessor
processor2D : ARKitScenes2DProcessor
processor1D : ARKitScenes1DProcessor
MultiScan:
base_dir : /media/sayan/Expansion/data/datasets/MultiScan
process_dir : ${data.process_dir}/MultiScan
processor3D : MultiScan3DProcessor
processor2D : MultiScan2DProcessor
processor1D : MultiScan1DProcessor
skip_frames : 1

modality_info:
1D :
feature_extractor:
18 changes: 18 additions & 0 deletions configs/preprocess/process_multimodal.yaml
@@ -28,6 +28,24 @@ data:
skip_frames : 1
avail_modalities : ['point', 'rgb', 'referral']

ARKitScenes:
base_dir : /media/sayan/Expansion/data/datasets/ARKitScenes
process_dir : ${data.process_dir}/ARKitScenes/
chunked_dir : ${data.process_dir}/ARKitScenes/objects_chunked
processor3D : ARKitScenes3DProcessor
processor2D : ARKitScenes2DProcessor
processor1D : ARKitScenes1DProcessor
avail_modalities : ['point', 'rgb', 'referral']

MultiScan:
base_dir : /media/sayan/Expansion/data/datasets/MultiScan
process_dir : ${data.process_dir}/MultiScan/
chunked_dir : ${data.process_dir}/MultiScan/objects_chunked
processor3D : MultiScan3DProcessor
processor2D : MultiScan2DProcessor
processor1D : MultiScan1DProcessor
avail_modalities : ['point', 'rgb', 'referral']

modality_info:
1D :
feature_extractor:
21 changes: 21 additions & 0 deletions configs/train/train_instance_baseline.yaml
@@ -44,6 +44,27 @@ data :
max_object_len : 150
voxel_size : 0.02

ARKitScenes:
base_dir : /media/sayan/Expansion/data/datasets/ARKitScenes
process_dir : ${data.process_dir}/ARKitScenes/
chunked_dir : ${data.process_dir}/ARKitScenes/objects_chunked
processor3D : ARKitScenes3DProcessor
processor2D : ARKitScenes2DProcessor
processor1D : ARKitScenes1DProcessor
avail_modalities : ['point', 'rgb', 'referral']
max_object_len : 150
voxel_size : 0.02
MultiScan:
base_dir : /media/sayan/Expansion/data/datasets/MultiScan
process_dir : ${data.process_dir}/MultiScan/
chunked_dir : ${data.process_dir}/MultiScan/objects_chunked
processor3D : MultiScan3DProcessor
processor2D : MultiScan2DProcessor
processor1D : MultiScan1DProcessor
avail_modalities : ['point', 'rgb', 'referral']
max_object_len : 150
voxel_size : 0.02

task:
name : ObjectLevelGrounding
ObjectLevelGrounding :
25 changes: 23 additions & 2 deletions configs/train/train_instance_crossover.yaml
@@ -44,12 +44,33 @@ data :
max_object_len : 150
voxel_size : 0.02

ARKitScenes:
base_dir : /media/sayan/Expansion/data/datasets/ARKitScenes
process_dir : ${data.process_dir}/ARKitScenes/
chunked_dir : ${data.process_dir}/ARKitScenes/objects_chunked
processor3D : ARKitScenes3DProcessor
processor2D : ARKitScenes2DProcessor
processor1D : ARKitScenes1DProcessor
avail_modalities : ['point', 'cad', 'rgb', 'referral']
max_object_len : 150
voxel_size : 0.02
MultiScan:
base_dir : /media/sayan/Expansion/data/datasets/MultiScan
process_dir : ${data.process_dir}/MultiScan/
chunked_dir : ${data.process_dir}/MultiScan/objects_chunked
processor3D : MultiScan3DProcessor
processor2D : MultiScan2DProcessor
processor1D : MultiScan1DProcessor
avail_modalities : ['point', 'cad', 'rgb', 'referral']
max_object_len : 150
voxel_size : 0.02

task:
name : SceneLevelGrounding
SceneLevelGrounding :
modalities : ['rgb', 'point', 'cad', 'referral']
train : [Scannet, Scan3R]
val : [Scannet, Scan3R]
train : [Scannet, Scan3R, MultiScan, ARKitScenes]
val : [Scannet, Scan3R, MultiScan, ARKitScenes]

trainer: GroundingTrainer

29 changes: 25 additions & 4 deletions configs/train/train_scene_crossover.yaml
@@ -44,14 +44,35 @@ data :
max_object_len : 150
voxel_size : 0.02

ARKitScenes:
base_dir : /media/sayan/Expansion/data/datasets/ARKitScenes
process_dir : ${data.process_dir}/ARKitScenes/
chunked_dir : ${data.process_dir}/ARKitScenes/objects_chunked
processor3D : ARKitScenes3DProcessor
processor2D : ARKitScenes2DProcessor
processor1D : ARKitScenes1DProcessor
avail_modalities : ['point', 'cad', 'rgb', 'referral']
max_object_len : 150
voxel_size : 0.02
MultiScan:
base_dir : /media/sayan/Expansion/data/datasets/MultiScan
process_dir : ${data.process_dir}/MultiScan/
chunked_dir : ${data.process_dir}/MultiScan/objects_chunked
processor3D : MultiScan3DProcessor
processor2D : MultiScan2DProcessor
processor1D : MultiScan1DProcessor
avail_modalities : ['point', 'cad', 'rgb', 'referral']
max_object_len : 150
voxel_size : 0.02

task:
name : UnifiedTrain
UnifiedTrain :
modalities : ['rgb', 'point', 'cad', 'referral']
scene_modalities : ['rgb', 'point', 'floorplan', 'referral']
train : [Scannet, Scan3R, MultiScan]
val : [Scannet, Scan3R, MultiScan]
object_enc_ckpt : /drive/dumps/multimodal-spaces/runs/release_runs/instance_crossover_scannet+scan3r+multiscan.pth
train : [Scannet, Scan3R, MultiScan, ARKitScenes]
val : [Scannet, Scan3R, MultiScan, ARKitScenes]
object_enc_ckpt : /drive/dumps/multimodal-spaces/runs/release_runs/instance_crossover_scannet+scan3r+multiscan+arkitscenes.pth

trainer: UnifiedTrainer
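
The names listed under `train` and `val` appear to refer to the dataset blocks under `data`. Assuming that is how they are resolved, a small check like the one below can catch a missing block before a long run starts. The nested-dict layout mirrors the YAML in this file; `missing_datasets` is a hypothetical helper, not part of the repository.

```python
from typing import Any, Dict, List


def missing_datasets(cfg: Dict[str, Any]) -> List[str]:
    """Return dataset names used by the active task but absent from cfg['data']."""
    task_name = cfg["task"]["name"]
    task_cfg = cfg["task"][task_name]
    requested = list(task_cfg.get("train", [])) + list(task_cfg.get("val", []))
    available = cfg.get("data", {})
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return [name for name in dict.fromkeys(requested) if name not in available]
```

With a config loaded into a plain dictionary, an empty return value means every dataset named in the split lists has a matching entry.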

@@ -78,7 +99,7 @@ model:
base_modality : 'rgb'

dataloader:
batch_size : 16
batch_size : 32
num_workers : 6

eval:
4 changes: 3 additions & 1 deletion data/datasets/__init__.py
@@ -1,2 +1,4 @@
from .scannet import *
from .scan3r import *
from .scan3r import *
from .arkit import *
from .multiscan import *