Update README.md for webpage
cvachha authored Dec 14, 2023
1 parent 1aa200d commit 1cf97b8
Showing 1 changed file with 22 additions and 79 deletions.
# Instruct-GS2GS: Editing 3D Scenes with Instructions
Cyrus Vachha and Ayaan Haque

![teaser](webpage_imgs/frederik_montage_v3.mp4)

![teaser](imgs/in2n_teaser.png)

# Installation

## 1. Install Nerfstudio dependencies

Instruct-GS2GS is built on Nerfstudio and therefore has the same dependency requirements. Specifically, [PyTorch](https://pytorch.org/) and [tinycudann](https://github.com/NVlabs/tiny-cuda-nn) are required.

Follow the instructions [at this link](https://docs.nerf.studio/quickstart/installation.html) to create the environment and install dependencies. Only follow the commands up to tinycudann. After the dependencies have been installed, return here.

## 2. Installing Instruct-GS2GS

Once you have finished installing dependencies, you can install Instruct-GS2GS using the following command:
```bash
pip install git+https://github.com/cvachha/instruct-gs2gs
```

_Optional_: If you would like to work with the code directly, clone then install the repo:
```bash
git clone https://github.com/cvachha/instruct-gs2gs.git
cd instruct-gs2gs
pip install --upgrade pip setuptools
pip install -e .
```

## 3. Checking the install

The following command should include `igs2gs` as one of the options:
```bash
ns-train -h
```

# Using Instruct-GS2GS

![teaser](imgs/in2n_pipeline.png)

To edit a GS, you must first train a regular `gaussian-splatting` scene using your data. To process your custom data, please refer to [this](https://docs.nerf.studio/quickstart/custom_dataset.html) documentation.

Once you have your custom data, you can train your initial GS with the following command:

```bash
ns-train gaussian-splatting --data {PROCESSED_DATA_DIR}
```

For more details on training a GS, see [Nerfstudio documentation](https://docs.nerf.studio/quickstart/first_nerf.html).

Once you have fully trained your scene, the checkpoints will be saved to the `outputs` directory. Copy the path to the `nerfstudio_models` folder.

To start training for editing the GS, run the following command:

```bash
ns-train igs2gs --data {PROCESSED_DATA_DIR} --load-dir {outputs/.../nerfstudio_models} --pipeline.prompt {"prompt"} --pipeline.guidance-scale 7.5 --pipeline.image-guidance-scale 1.5
```

The `{PROCESSED_DATA_DIR}` must be the same path as used in training the original GS. Using the CLI commands, you can choose the prompt and the guidance scales used for InstructPix2Pix.
We propose a method for editing 3D Gaussian Splatting scenes with text instructions. Our work is based largely on Instruct-NeRF2NeRF, which proposes an iterative dataset update method to make consistent 3D edits to Neural Radiance Fields given a text instruction. We propose a modified technique to adapt this editing scheme to 3D Gaussian Splatting scenes. We demonstrate comparable results to Instruct-NeRF2NeRF and show that our method can perform realistic global text edits on large real-world scenes and individual subjects.

After the GS is trained, you can render the GS using the standard Nerfstudio workflow, found [here](https://docs.nerf.studio/quickstart/viewer_quickstart.html).
## Introduction
Recent advances in photo-realistic novel 3D representations such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have instigated a multitude of works exploring 3D generation, neural 3D reconstruction, and practical applications for these representations. Editing novel 3D representations like NeRF or 3DGS remains a challenge, and traditional 3D tools are generally incompatible with them. Instruct-NeRF2NeRF describes a method to semantically edit NeRFs with text instructions: it uses a 2D diffusion model, InstructPix2Pix, to iteratively edit the training dataset and update the NeRF simultaneously. Recently, 3DGS has gained popularity as a representation, but the Instruct-NeRF2NeRF algorithm cannot be naively applied to it. While NeRFs offer detailed 3D reconstructions, 3DGS has the primary advantage of real-time rendering speeds, making it a more suitable choice for integration with game engines, web compatibility, and virtual reality.

## Training Notes
In this paper, we propose Instruct-GS2GS, a method to edit 3D Gaussian Splatting scenes and objects with global text instructions. Our method performs edits on a pre-captured 3DGS scene in a 3D-consistent manner, similar to Instruct-NeRF2NeRF, while also having much faster training and inference speeds. Our method adapts the iterative dataset update approach from Instruct-NeRF2NeRF to work effectively for 3DGS. Our method is implemented in Nerfstudio, allowing users to perform edits quickly and view them in real time.

***Important***
Please note that training the GS on images with resolution larger than 512 will likely cause InstructPix2Pix to throw OOM errors. Moreover, InstructPix2Pix seems to perform significantly worse on higher-resolution images. We suggest training at a resolution of around 512 (max dimension): append `nerfstudio-data --downscale-factor {2,4,6,8}` to both your `gaussian-splatting` and `igs2gs` `ns-train` commands. Alternatively, you can downscale your dataset yourself and update your `transforms.json` file (scale down `w`, `h`, `fl_x`, `fl_y`, `cx`, `cy`), or use a smaller image scale provided by Nerfstudio.
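
If you downscale the dataset yourself, the intrinsics in `transforms.json` need the same scaling. A minimal sketch of that adjustment, assuming the intrinsics live at the top level of a standard Nerfstudio `transforms.json` (resizing the image files themselves is done separately):

```python
# Sketch: scale the top-level intrinsics of a Nerfstudio transforms.json by a
# downscale factor; the resized image files themselves are produced separately.
import json

factor = 4  # e.g. 2, 4, 6, or 8

with open("transforms.json") as f:
    meta = json.load(f)

for key in ("fl_x", "fl_y", "cx", "cy"):
    meta[key] = meta[key] / factor
for key in ("w", "h"):
    meta[key] = int(meta[key] // factor)

with open("transforms.json", "w") as f:
    json.dump(meta, f, indent=4)
```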
## Method

We recommend capturing data using images from Polycam, as smaller datasets work better and faster with our method.
Our method takes in a dataset of camera poses and training images, a trained 3DGS scene, and a user-specified text-prompt instruction, e.g. "make him a marble statue". Instruct-GS2GS constructs the edited GS scene guided by the text prompt by applying a 2D text- and image-conditioned diffusion model, in this case InstructPix2Pix, to all training images over the course of training. We perform these edits with an iterative dataset update scheme in which every training image is individually updated by the diffusion model, sweeping sequentially through the full dataset every 2.5k training iterations. This process gives the GS a holistic edit while maintaining 3D consistency.
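
The sketch below illustrates this schedule only; it uses the off-the-shelf `diffusers` InstructPix2Pix pipeline in place of the repo's custom `ip2p.py` wrapper, uses placeholder rendering/training helpers for the gsplat model, and omits the conditioning on the original image described further below.

```python
# Simplified sketch of the iterative dataset update schedule. Placeholders:
# render_view / train_splat_steps stand in for the gsplat model, and the
# off-the-shelf diffusers InstructPix2Pix pipeline stands in for the repo's
# custom ip2p.py wrapper (which also conditions on the original image).
import torch
from PIL import Image
from diffusers import StableDiffusionInstructPix2PixPipeline

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

def render_view(cam_idx: int) -> Image.Image:
    """Placeholder: render the current 3DGS scene from training camera cam_idx."""
    return Image.new("RGB", (512, 512))

def train_splat_steps(images: list[Image.Image], n_steps: int) -> None:
    """Placeholder: optimize the 3DGS scene on `images` for n_steps iterations."""

dataset = [Image.new("RGB", (512, 512)) for _ in range(100)]  # training images
prompt = "make him a marble statue"
num_rounds = 6  # e.g. six 2.5k-iteration rounds (~15k iterations total)

for _round in range(num_rounds):
    for i in range(len(dataset)):                # update every image individually
        render = render_view(i)                  # current view of the scene
        dataset[i] = pipe(                       # write the edit back into the dataset
            prompt, image=render,
            guidance_scale=7.5, image_guidance_scale=1.5,
            num_inference_steps=20,
        ).images[0]
    train_splat_steps(dataset, n_steps=2_500)    # then train the GS for 2.5k iterations
```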

If you have multiple GPUs, training can be sped up by placing InstructPix2Pix on a separate GPU. To do so, add `--pipeline.ip2p-device cuda:{device-number}` to your training command.
![pipeline](webpage_imgs/igs2gs_pipeline.png)

Our method uses ~16K rays and LPIPS, but not all GPUs have enough memory to run this configuration. As a result, we have provided two alternative configurations which use less memory, but be aware that these configurations lead to decreased performance. The differences are the precision used for InstructPix2Pix and whether LPIPS is used (which requires 4x more rays). The details of each config are provided in the table below.
Our process is similar to Instruct-NeRF2NeRF: for a given training camera view, we set the original training image as the conditioning image and the noisy input as the current render from that camera combined with some randomly selected noise, and we receive an edited image that respects the text conditioning. With this method we are able to propagate the edits to the GS scene, and conditioning InstructPix2Pix on the original unedited training image keeps the edits grounded.

| Method | Description | Memory | Quality |
| ---------------------------------------------------------------------------------------------------- | -------------- | ----------------------------------------------------------------- | ----------------------- |
| `igs2gs` | Full model, used in paper | ~15GB | Best |
### Implementation
We use Nerfstudio's gsplat library for our underlying Gaussian Splatting model. We adapt similar parameters for the diffusion model from Instruct-NeRF2NeRF. Among these are the values for $[t_\text{min}, t_\text{max}] = [0.70, 0.98]$, which define the amount of noise added (and therefore the amount of signal retained from the original images). We vary the classifier-free guidance scales per edit and scene, using values in the ranges $s_I \in (1.2, 1.5)$ and $s_T \in (7.5, 12.5)$. We edit the entire dataset and then train the scene for 2.5k iterations. For GS training, we use L1 and LPIPS losses. We train our method for a maximum of 30k iterations (starting from a GS scene trained for 20k iterations); in practice, however, we stop training once the edit has converged. In many cases, the optimal training length is a subjective decision: a user may prefer more subtle or more extreme edits that are best found at different stages of training.
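
For reference, $s_I$ and $s_T$ are the two classifier-free guidance scales from InstructPix2Pix, weighting the image conditioning $c_I$ (the unedited training view) and the text conditioning $c_T$ in the combined score estimate:

$$
\tilde{e}_\theta(z_t, c_I, c_T) = e_\theta(z_t, \varnothing, \varnothing)
+ s_I \big( e_\theta(z_t, c_I, \varnothing) - e_\theta(z_t, \varnothing, \varnothing) \big)
+ s_T \big( e_\theta(z_t, c_I, c_T) - e_\theta(z_t, c_I, \varnothing) \big)
$$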

Currently, we set the max number of iterations for `igs2gs` training to 15k iterations. Most often, the edit will look good after ~10k iterations. If you would like to train for longer, just reload your last `igs2gs` checkpoint and continue training, or change `--max-num-iterations 30000`.
# Results
Our qualitative results are shown in the teaser video above and in the results below. For each edit, we show multiple views to illustrate the 3D consistency. On the portrait capture in the first video, we are able to perform the same edits as Instruct-NeRF2NeRF, as well as new edits like "turn him into a Lego Man." In certain cases, the results look more 3D-consistent and higher quality, and we provide a comparison below. However, the Gaussian Splatting representation makes it challenging to add entirely new geometry. These edits also extend to subjects other than people, like changing a bear statue into a real polar bear, panda, and grizzly bear. We are able to edit large-scale scenes just like Instruct-NeRF2NeRF, while maintaining the same level of 3D consistency.

## Tips
![comparison](webpage_imgs/comparison_igs2gs.png)

If your edit isn't working as you desire, it is likely because InstructPix2Pix struggles with your images and prompt. We recommend taking one of your training views and trying to edit it in 2D first with InstructPix2Pix, which can be done at [this](https://huggingface.co/spaces/timbrooks/instruct-pix2pix) HuggingFace space. More tips on getting a good edit can be found [here](https://github.com/timothybrooks/instruct-pix2pix#tips).
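
The same sanity check can also be run locally with the `diffusers` InstructPix2Pix pipeline; in the sketch below, the image path and prompt are placeholders:

```python
# Quick 2D check of a prompt on a single training view using the diffusers
# InstructPix2Pix pipeline (image path and prompt are placeholders).
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from diffusers.utils import load_image

pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

image = load_image("data/my_scene/images/frame_00001.jpg").resize((512, 512))
edited = pipe(
    "make him a marble statue",
    image=image,
    num_inference_steps=20,
    guidance_scale=7.5,        # text guidance scale
    image_guidance_scale=1.5,  # image guidance scale
).images[0]
edited.save("edited_preview.png")
```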
Most importantly, we find that our method outputs a reasonable result in around 13 min while Instruct-NeRF2NeRF takes approximately 50 min on the same scene.

# Extending Instruct-GS2GS
![timelapse](webpage_imgs/igs2gs_timelapse.mp4)

### Issues
Please open Github issues for any installation/usage problems you run into. We've tried to support as broad a range of GPUs as possible, but it might be necessary to provide even more low-footprint versions. Please contribute with any changes to improve memory usage!

### Code structure
To build off Instruct-GS2GS, we provide explanations of the core code components.
Below we show results on real-world environments:

`igs2gs_datamanager.py`: This file is almost identical to the `base_datamanager.py` in Nerfstudio. The main difference is that the entire dataset tensor is pre-computed in the `setup_train` method as opposed to being sampled in the `next_train` method each time.
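
Schematically, the caching pattern looks like this (a simplified sketch, not the actual Nerfstudio classes):

```python
# Schematic of the pattern: cache the whole dataset as one tensor in
# setup_train so edited images can be written back in place, rather than
# lazily sampling images in next_train.
import torch

class CachedImageDatamanager:
    def __init__(self, images: list[torch.Tensor]):
        self.images = images            # list of (H, W, 3) image tensors
        self.image_batch = None

    def setup_train(self) -> None:
        # pre-compute the full dataset tensor once: (N, H, W, 3)
        self.image_batch = torch.stack(self.images, dim=0)

    def next_train(self, step: int) -> torch.Tensor:
        # return a view into the cached tensor; edits written into
        # image_batch are seen by all later iterations
        return self.image_batch[step % self.image_batch.shape[0]]
```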
![timelapse](webpage_imgs/campanile_montage_v2.mp4)

`igs2gs_pipeline.py`: This file builds on the pipeline module in Nerfstudio. The `get_train_loss_dict` method samples images and places edited images back into the dataset.
![timelapse](webpage_imgs/egypt_montage_v1.mp4)

`ip2p.py`: This file houses the InstructPix2Pix model (using the `diffusers` implementation). The `edit_image` method is where an image is denoised using the diffusion model, and a variety of helper methods are contained in this file as well.
![timelapse](webpage_imgs/bear_montage_v1.mp4)

`igs2gs.py`: We overwrite the `get_loss_dict` method to use L1 and LPIPS losses.
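
A minimal sketch of that objective, using the standalone `lpips` package instead of the exact Nerfstudio implementation:

```python
# Minimal sketch of an L1 + LPIPS objective (uses the standalone `lpips`
# package; igs2gs.py itself relies on Nerfstudio's implementations).
import torch
import lpips

lpips_fn = lpips.LPIPS(net="vgg")  # perceptual similarity network

def get_loss_dict(rendered: torch.Tensor, target: torch.Tensor) -> dict:
    """rendered, target: (N, 3, H, W) tensors in [0, 1]."""
    l1 = torch.abs(rendered - target).mean()
    perceptual = lpips_fn(rendered * 2 - 1, target * 2 - 1).mean()  # LPIPS expects [-1, 1]
    return {"l1_loss": l1, "lpips_loss": perceptual}
```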
![timelapse](webpage_imgs/platform_montage_v1.mp4)

# Citation
Cyrus Vachha and Ayaan Haque
