Include `observation.environment_state` with keypoints in PushT dataset #303

alexander-soare · 2024-07-04T13:41:10Z

What this does

Adds `observation.environment_state` to the data keys. For PushT, this holds the keypoints of the T-shaped block.
Adds an option to create a dataset with only the keypoints.
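A minimal sketch of the keypoint layout. It assumes, as the `reshape(8, 2)` in the visualization script implies, that `observation.environment_state` is a flat float32 vector of 16 values holding 8 (x, y) keypoints of the T. Its length is what `data_dict["observation.environment_state"].shape[1]` gives the `Sequence` feature. The helper names are hypothetical and not part of LeRobot.

```python
import numpy as np

NUM_KEYPOINTS = 8  # assumption: 8 (x, y) keypoints per frame, per the reshape(8, 2) in the PR's script


def flatten_keypoints(keypoints: np.ndarray) -> np.ndarray:
    """Hypothetical helper: (8, 2) keypoints -> flat float32 vector [x0, y0, x1, y1, ...]."""
    assert keypoints.shape == (NUM_KEYPOINTS, 2)
    return keypoints.astype(np.float32).reshape(-1)


def unflatten_keypoints(state: np.ndarray) -> np.ndarray:
    """Hypothetical helper: flat environment state of length 16 -> (8, 2) array of (x, y) rows."""
    return np.asarray(state).reshape(NUM_KEYPOINTS, 2)


kps = np.arange(16, dtype=np.float32).reshape(NUM_KEYPOINTS, 2)
state = flatten_keypoints(kps)
print(state.shape)  # (16,)
```

Row-major flattening keeps each keypoint's x and y adjacent, so a plain `reshape(8, 2)` recovers the pairs.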

TODO before merging:

Run the dataset backward-compatibility test locally. Won't do, as this test is already failing on main; I did, however, do my best to update the test artifacts.
Merge gym-pusht#13 ("Add keypoints mode, and fix order of operations in add_tee").
Merge #306 ("Fix generation of dataset test artifact").
Run `python lerobot/scripts/push_dataset_to_hub.py --raw-dir data/pusht_raw/pusht --repo-id lerobot/pusht_keypoints --raw-format pusht_zarr --video 0 --push-to-hub 1` with the option `keypoints_instead_of_image=True` manually set in `pusht_zarr_format.py::from_raw_to_lerobot_format`, to upload a version of the dataset with keypoints instead of images.

How to checkout & try? (for the reviewer)

I have already run `python lerobot/scripts/push_dataset_to_hub.py --raw-dir ${PATH_TO_RAW_PUSHT_DATA} --repo-id alexandersoare/pusht --raw-format pusht_zarr --video 1 --push-to-hub 1` for two variants (with images and with keypoints).

You can run this script to visualize the keypoints (check your `/tmp/pusht_keypoints_frames` folder after it runs):

from pathlib import Path
from shutil import rmtree

import cv2
import numpy as np

from lerobot.common.datasets.lerobot_dataset import LeRobotDataset

DIR = Path("/tmp/pusht_keypoints_frames")
if DIR.exists():
    rmtree(DIR)
DIR.mkdir()

max_frame_index = 1200  # stop after writing frame index 1200
keypoint_range = 512  # keypoints are rescaled from PushT's 512x512 coordinate frame

dataset_image = LeRobotDataset(repo_id="alexandersoare/pusht")
dataset_kp = LeRobotDataset(repo_id="alexandersoare/pusht_keypoints")

# `strict=True` (Python 3.10+) checks that both datasets have the same number of frames.
for i, (item_image, item_kp) in enumerate(zip(dataset_image, dataset_kp, strict=True)):
    # (C, H, W) float image in [0, 1] -> (H, W, C) uint8; flip RGB -> BGR because OpenCV expects BGR.
    img = (item_image["observation.image"].permute(1, 2, 0).numpy() * 255).astype(np.uint8)
    img = np.ascontiguousarray(img[..., ::-1])
    # The environment state is a flat vector of 8 (x, y) keypoints.
    for kp in item_kp["observation.environment_state"].reshape(8, 2).numpy():
        x, y = np.round(kp / keypoint_range * img.shape[0]).astype(int)
        cv2.circle(img, (int(x), int(y)), radius=2, color=(0, 0, 255), thickness=-1)  # red dots
    # Overlay the reward for this frame.
    cv2.putText(
        img,
        str(item_image["next.reward"].item()),
        org=(0, 5),
        fontFace=cv2.FONT_HERSHEY_COMPLEX,
        fontScale=0.5,
        color=(255, 0, 0),  # blue in BGR
    )
    cv2.imwrite(str(DIR / f"{i}.jpg"), img)
    if i == max_frame_index:
        break
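The keypoint-to-pixel mapping in the script is a plain rescale: divide by 512 (the coordinate range the script assumes for PushT) and multiply by the image height, which presumes a square image. A standalone sketch with a hypothetical `keypoints_to_pixels` helper, so the mapping can be checked without downloading the datasets:

```python
import numpy as np


def keypoints_to_pixels(state: np.ndarray, image_size: int, source_size: float = 512.0) -> np.ndarray:
    """Map flat keypoints (x0, y0, x1, y1, ...) in [0, source_size] to integer pixel coordinates.

    Hypothetical helper mirroring the visualization script; assumes a square image of side `image_size`.
    """
    kps = np.asarray(state, dtype=np.float64).reshape(-1, 2)
    return np.round(kps / source_size * image_size).astype(int)


state = np.array([0.0, 0.0, 256.0, 256.0, 512.0, 512.0, 128.0, 384.0])
print(keypoints_to_pixels(state, image_size=96))
```

Note that a keypoint at exactly 512 maps to pixel 96, one past the last valid index of a 96-pixel image; `cv2.circle` clips such points rather than failing.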

lerobot/common/datasets/lerobot_dataset.py

alexander-soare · 2024-07-04T13:50:46Z

lerobot/common/datasets/push_dataset_to_hub/pusht_zarr_format.py

+    # Manually change this to True to not include images at all (but don't merge with True). Also make sure to
+    # use video = 0 in the `push_dataset_to_hub.py` script.
+    keypoints_only = False


FYI, I had the idea of adding a `**kwargs`-style interface to `push_dataset_to_hub.py`, but it added too much complication for this one option. I think we can leave this as a manual toggle for now and consider dataset-specific kwargs later.

Cadene

~~Besides CODEBASE_VERSION, LGTM~~

Let's bump the CODEBASE_VERSION and update all datasets with something like:

from huggingface_hub import create_branch
create_branch(f"lerobot/{dataset_id}", repo_type="dataset", branch="v1.5", revision="v1.4")
create_branch(f"lerobot/{dataset_id}", repo_type="dataset", branch="main", revision="v1.5", exist_ok=True)

lerobot/common/datasets/lerobot_dataset.py

Cadene · 2024-07-04T13:59:01Z

lerobot/common/datasets/push_dataset_to_hub/pusht_zarr_format.py

+    features["observation.environment_state"] = Sequence(
+        length=data_dict["observation.environment_state"].shape[1], feature=Value(dtype="float32", id=None)
+    )


What do you think?

Suggested change

-    features["observation.environment_state"] = Sequence(
-        length=data_dict["observation.environment_state"].shape[1], feature=Value(dtype="float32", id=None)
-    )
+    if keypoints_only:
+        features["observation.environment_state"] = Sequence(
+            length=data_dict["observation.environment_state"].shape[1], feature=Value(dtype="float32", id=None)
+        )

This is now handled, since I've made the image and keypoints options mutually exclusive.
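A minimal sketch of what "mutually exclusive" means for the observation features: exactly one of the two keys is declared. Plain dicts stand in for the `datasets` `Sequence`/`Value`/image features, and the function name and dict layout are hypothetical, not the actual code in `pusht_zarr_format.py`.

```python
def select_observation_features(keypoints_only: bool, env_state_dim: int) -> dict:
    """Hypothetical: return the observation feature spec for exactly one of the two dataset variants."""
    if keypoints_only:
        # Keypoints variant: a fixed-length float32 vector,
        # analogous to Sequence(length=env_state_dim, feature=Value(dtype="float32")).
        return {
            "observation.environment_state": {"kind": "sequence", "length": env_state_dim, "dtype": "float32"}
        }
    # Image variant: the image feature only, no environment state.
    return {"observation.image": {"kind": "image"}}


print(sorted(select_observation_features(True, 16)))   # ['observation.environment_state']
print(sorted(select_observation_features(False, 16)))  # ['observation.image']
```

Branching on a single flag makes it impossible to build a feature spec that declares both keys, or neither.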

lerobot/__init__.py

lerobot/scripts/push_dataset_to_hub.py

Cadene · 2024-07-04T16:03:41Z

lerobot/scripts/push_dataset_to_hub.py

+from lerobot import available_datasets
+
+for repo_id in available_datasets:
+    try:
+        create_branch(repo_id, repo_type="dataset", branch="v1.5", revision="v1.4")


Before updating, we should have a little logic that checks whether a dataset with this new CODEBASE_VERSION already exists.

If one does, at least two people are updating the same dataset and they should synchronize, so we should print a message telling the user to contact a core maintainer.

I used a more explicit way to check first.
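A minimal sketch of such an up-front check, not the code that was merged. The branch lister is injected so the logic can be exercised offline. Against the real Hub, `list_branches` could wrap `huggingface_hub.list_repo_refs`, e.g. `lambda r: [b.name for b in list_repo_refs(r, repo_type="dataset").branches]`. The function names are hypothetical.

```python
from collections.abc import Callable, Iterable


def find_repos_with_branch(
    repo_ids: Iterable[str],
    branch: str,
    list_branches: Callable[[str], list[str]],
) -> list[str]:
    """Hypothetical: return the repo_ids that already have a branch named `branch`."""
    return [repo_id for repo_id in repo_ids if branch in list_branches(repo_id)]


def check_version_not_taken(
    repo_ids: Iterable[str],
    codebase_version: str,
    list_branches: Callable[[str], list[str]],
) -> None:
    """Hypothetical: refuse to continue if any dataset already has the new version branch."""
    taken = find_repos_with_branch(repo_ids, codebase_version, list_branches)
    if taken:
        raise RuntimeError(
            f"Branch {codebase_version!r} already exists for {taken}. Someone else may be updating "
            "these datasets; please contact a core LeRobot maintainer before continuing."
        )
```

Running the check for all repos before creating any branch means a conflict is reported before anything on the Hub has been modified.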

Cadene · 2024-07-04T16:05:39Z

lerobot/scripts/push_dataset_to_hub.py

+    except HfHubHTTPError:
+        # Note, this should only be the case for the datasets you have updated. If you see any others, please
+        # reach out to the core LeRobot team.
+        print(f"Found existing branch for {repo_id}")


I would remove this. If there is an exception/error, we should exit.

We should also print the repo_ids that were updated without issue so that, if something goes wrong, the user knows where to resume in the repo_id list.

Suggested change

-    except HfHubHTTPError:
-        # Note, this should only be the case for the datasets you have updated. If you see any others, please
-        # reach out to the core LeRobot team.
-        print(f"Found existing branch for {repo_id}")

I've exited early and printed out the repos that were successfully updated.
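A minimal sketch of the "exit early and report progress" behaviour. The branch-creating call is injected, and could be e.g. `lambda r: create_branch(r, repo_type="dataset", branch="v1.5", revision="v1.4")` from `huggingface_hub`. The function name is hypothetical. Unlike the snippet under review, which caught `HfHubHTTPError`, this sketch catches any exception and re-raises it after printing.

```python
from collections.abc import Callable, Iterable


def create_branches_or_stop(repo_ids: Iterable[str], create: Callable[[str], None]) -> list[str]:
    """Hypothetical: create the new-version branch for each repo, stopping at the first failure."""
    done: list[str] = []
    for repo_id in repo_ids:
        try:
            create(repo_id)
        except Exception:
            # Tell the user where to resume, then propagate the error so the script exits.
            print(f"Failed on {repo_id}. Successfully updated so far: {done}")
            raise
        done.append(repo_id)
        print(f"Updated {repo_id}")
    return done
```

Re-raising instead of swallowing the error matches the review feedback: the run stops at the first problem, and the printed list tells the user which repo_ids are already done.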

lerobot/scripts/push_dataset_to_hub.py

Cadene

LGTM

ready for review

5da448e

alexander-soare commented Jul 4, 2024

lerobot/common/datasets/lerobot_dataset.py

alexander-soare commented Jul 4, 2024

alexander-soare requested a review from Cadene July 4, 2024 13:54

Cadene reviewed Jul 4, 2024

alexander-soare added 5 commits July 4, 2024 15:23

revert dataset version and make image/keypoints mutually exclusive

6b46e6a

add pusht_keypoints to available datasets

750fed1

bump dataset version to 1.5

4acee2f

add instructions for updating a dataset

66be2af

add pusht_keypoints to test artifacts

0f9c983

Cadene reviewed Jul 4, 2024

alexander-soare and others added 14 commits July 4, 2024 17:12

Update lerobot/scripts/push_dataset_to_hub.py

d70e80b

Co-authored-by: Remi <[email protected]>

Update lerobot/scripts/push_dataset_to_hub.py

0c0cbbe

Co-authored-by: Remi <[email protected]>

update docs and update test data artifacts

b225b4a

update example script

7c78bfa

add b/c test artifact

f5e9090

ready for review

6f2b695

Merge remote-tracking branch 'origin/fix_dataset_test_artifact_genera…

7eba0cc

…tion' into add_pusht_keypoints

Merge remote-tracking branch 'upstream/main' into fix_dataset_test_ar…

8d94bb3

…tifact_generation

fix e2e

2c10195

improve logic

de03105

Merge remote-tracking branch 'origin/fix_dataset_test_artifact_genera…

862748a

…tion' into add_pusht_keypoints

udpate poetry

d2f6720

Merge remote-tracking branch 'upstream/main' into add_pusht_keypoints

350eab7

update poetry.lock

7438148

alexander-soare force-pushed the add_pusht_keypoints branch from 984b1a4 to 7438148 July 5, 2024 10:19

Cadene approved these changes Jul 8, 2024

alexander-soare merged commit a4d77b9 into huggingface:main Jul 9, 2024
5 checks passed

alexander-soare deleted the add_pusht_keypoints branch July 9, 2024 16:44

amandip7 pushed a commit to amandip7/lerobot that referenced this pull request Oct 10, 2024

Include observation.environment_state with keypoints in PushT datas…

66b9f0f

…et (huggingface#303) Co-authored-by: Remi <[email protected]>
