Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include observation.environment_state with keypoints in PushT dataset #303

Merged
merged 20 commits into from
Jul 9, 2024

Conversation

alexander-soare
Copy link
Contributor

@alexander-soare alexander-soare commented Jul 4, 2024

What this does

  • Adds "observation.environment_state" to the data keys. For PushT this refers to the keypoints of the T.
  • Adds an option to create a dataset with only the keypoints.

TODO before merging:

  • Run dataset backwards compatibility locally. Won't do, as this is failing on main. I at least did my best to update the test artifacts.
  • Merge Add keypoints mode, and fix order of operations in add_tee gym-pusht#13
  • Merge Fix generation of dataset test artifact #306
  • Run python lerobot/scripts/push_dataset_to_hub.py --raw-dir data/pusht_raw/pusht --repo-id lerobot/pusht_keypoints --raw-format pusht_zarr --video 0 --push-to-hub 1 with the option keypoints_instead_of_image=True manually set in pusht_zarr_format.py::from_raw_to_lerobot_format to upload a version of the dataset with keypoints instead of the image.

How to checkout & try? (for the reviewer)

I have already run python lerobot/scripts/push_dataset_to_hub.py --raw-dir ${PATH_TO_RAW_PUSHT_DATA} --repo-id alexandersoare/pusht --raw-format pusht_zarr --video 1 --push-to-hub 1, with two variants (with images, and with keypoints).

You can run this script to visualize the keypoints (check your /tmp/pusht_keypoints_frames folder after it runs)

from pathlib import Path
from shutil import rmtree

import cv2
import numpy as np

from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

DIR = Path("/tmp/pusht_keypoints_frames")
if DIR.exists():
    rmtree(DIR)
DIR.mkdir()

n_steps = 300
size = 96

dataset_image = LeRobotDataset(repo_id="alexandersoare/pusht")
dataset_kp = LeRobotDataset(repo_id="alexandersoare/pusht_keypoints")

for i, (item_image, item_kp) in enumerate(zip(dataset_image, dataset_kp, strict=True)):
    img = (item_image["observation.image"].permute(1, 2, 0).numpy() * 255).astype(np.uint8)
    img = np.ascontiguousarray(img)
    for kp in item_kp["observation.environment_state"].reshape(8, 2):
        kp = np.round(kp / 512 * img.shape[0]).numpy().astype(int)
        cv2.circle(img, tuple(kp), radius=2, color=(0, 0, 255), thickness=-1)
    cv2.putText(
        img,
        str(item_image["next.reward"].item()),
        org=[0, 5],
        fontFace=cv2.FONT_HERSHEY_COMPLEX,
        fontScale=0.5,
        color=(255, 0, 0),
    )
    cv2.imwrite(str(DIR / f"{i}.jpg"), img)
    if i == 1200:
        break

Comment on lines 242 to 244
# Manually change this to True to not include images at all (but don't merge with True). Also make sure to
# use video = 0 in the `push_dataset_to_hub.py` script.
keypoints_only = False
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, I had the idea to add a **kwargs style interface in push_dataset_to_hub.py but it added too much complication for this one option. I think we can leave this manual for now and consider dataset specific kwargs later.

@alexander-soare alexander-soare requested a review from Cadene July 4, 2024 13:54
Copy link
Collaborator

@Cadene Cadene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides CODEBASE_VERSIOn, LGTM

Let's update the CODEBASE_VERSION, and update all datasets with something like

from huggingface_hub import create_branch
create_branch(f"lerobot/{dataset_id}", repo_type="dataset", branch="v1.5", revision"v1.4")
create_branch(f"lerobot/{dataset_id}", repo_type="dataset", branch="main", revision"v1.5", exist_ok=True)

lerobot/common/datasets/lerobot_dataset.py Show resolved Hide resolved
Comment on lines 216 to 218
features["observation.environment_state"] = Sequence(
length=data_dict["observation.environment_state"].shape[1], feature=Value(dtype="float32", id=None)
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think?

Suggested change
features["observation.environment_state"] = Sequence(
length=data_dict["observation.environment_state"].shape[1], feature=Value(dtype="float32", id=None)
)
if keypoints_only:
features["observation.environment_state"] = Sequence(
length=data_dict["observation.environment_state"].shape[1], feature=Value(dtype="float32", id=None)
)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been handled now because I've made image/keypoints mutually exclusive.

lerobot/__init__.py Outdated Show resolved Hide resolved
lerobot/scripts/push_dataset_to_hub.py Outdated Show resolved Hide resolved
lerobot/scripts/push_dataset_to_hub.py Outdated Show resolved Hide resolved
Comment on lines 55 to 59
from lerobot import available_datasets

for repo_id in available_datasets:
try:
create_branch(repo_id, repo_type="dataset", branch="v1.5", revision="v1.4")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before updating we should have little logic that check if there isn't a dataset with this new CODEBASE_VERSION already.

In this case, at least two people are updating an existing dataset and they should synchronize, so we should print a message that says to contact a core maintainer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used a more explicit way to check first.

Comment on lines 60 to 63
except HfHubHTTPError:
# Note, this should only be the case for the datasets you have updated. If you see any others, please
# reach out to the core LeRobot team.
print(f"Found existing branch for {repo_id}")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove this. If there is an exception/error, we should exit.

We should also print the repo_id that have been updated without issue, so that if something is wrong, the user can know when to resume in repo_id list.

Suggested change
except HfHubHTTPError:
# Note, this should only be the case for the datasets you have updated. If you see any others, please
# reach out to the core LeRobot team.
print(f"Found existing branch for {repo_id}")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've exited early and printed out the repos that were successfully updated.

lerobot/scripts/push_dataset_to_hub.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@Cadene Cadene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@alexander-soare alexander-soare merged commit a4d77b9 into huggingface:main Jul 9, 2024
5 checks passed
@alexander-soare alexander-soare deleted the add_pusht_keypoints branch July 9, 2024 16:44
amandip7 pushed a commit to amandip7/lerobot that referenced this pull request Oct 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants