-
Notifications
You must be signed in to change notification settings - Fork 842
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Include observation.environment_state
with keypoints in PushT dataset
#303
Include observation.environment_state
with keypoints in PushT dataset
#303
Conversation
# Manually change this to True to not include images at all (but don't merge with True). Also make sure to | ||
# use video = 0 in the `push_dataset_to_hub.py` script. | ||
keypoints_only = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, I had the idea to add a **kwargs style interface in push_dataset_to_hub.py
but it added too much complication for this one option. I think we can leave this manual for now and consider dataset specific kwargs later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides CODEBASE_VERSIOn, LGTM
Let's update the CODEBASE_VERSION, and update all datasets with something like
from huggingface_hub import create_branch
create_branch(f"lerobot/{dataset_id}", repo_type="dataset", branch="v1.5", revision"v1.4")
create_branch(f"lerobot/{dataset_id}", repo_type="dataset", branch="main", revision"v1.5", exist_ok=True)
features["observation.environment_state"] = Sequence( | ||
length=data_dict["observation.environment_state"].shape[1], feature=Value(dtype="float32", id=None) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think?
features["observation.environment_state"] = Sequence( | |
length=data_dict["observation.environment_state"].shape[1], feature=Value(dtype="float32", id=None) | |
) | |
if keypoints_only: | |
features["observation.environment_state"] = Sequence( | |
length=data_dict["observation.environment_state"].shape[1], feature=Value(dtype="float32", id=None) | |
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has been handled now because I've made image/keypoints mutually exclusive.
from lerobot import available_datasets | ||
|
||
for repo_id in available_datasets: | ||
try: | ||
create_branch(repo_id, repo_type="dataset", branch="v1.5", revision="v1.4") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before updating we should have little logic that check if there isn't a dataset with this new CODEBASE_VERSION already.
In this case, at least two people are updating an existing dataset and they should synchronize, so we should print a message that says to contact a core maintainer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used a more explicit way to check first.
except HfHubHTTPError: | ||
# Note, this should only be the case for the datasets you have updated. If you see any others, please | ||
# reach out to the core LeRobot team. | ||
print(f"Found existing branch for {repo_id}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would remove this. If there is an exception/error, we should exit.
We should also print the repo_id that have been updated without issue, so that if something is wrong, the user can know when to resume in repo_id list.
except HfHubHTTPError: | |
# Note, this should only be the case for the datasets you have updated. If you see any others, please | |
# reach out to the core LeRobot team. | |
print(f"Found existing branch for {repo_id}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've exited early and printed out the repos that were successfully updated.
Co-authored-by: Remi <[email protected]>
Co-authored-by: Remi <[email protected]>
…tion' into add_pusht_keypoints
…tifact_generation
…tion' into add_pusht_keypoints
984b1a4
to
7438148
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…et (huggingface#303) Co-authored-by: Remi <[email protected]>
What this does
TODO before merging:
add_tee
gym-pusht#13python lerobot/scripts/push_dataset_to_hub.py --raw-dir data/pusht_raw/pusht --repo-id lerobot/pusht_keypoints --raw-format pusht_zarr --video 0 --push-to-hub 1
with the optionkeypoints_instead_of_image=True
manually set inpusht_zarr_format.py::from_raw_to_lerobot_format
to upload a version of the dataset with keypoints instead of the image.How to checkout & try? (for the reviewer)
I have already run
python lerobot/scripts/push_dataset_to_hub.py --raw-dir ${PATH_TO_RAW_PUSHT_DATA} --repo-id alexandersoare/pusht --raw-format pusht_zarr --video 1 --push-to-hub 1
, with two variants (with images, and with keypoints).You can run this script to visualize the keypoints (check your /tmp/pusht_keypoints_frames folder after it runs)