Scalar torch Tensors not recognized by (NoHeader)TensorSerializer -> Serialized as Pickle #424

Closed
enrico-stauss opened this issue Nov 28, 2024 · 1 comment · Fixed by #431
Labels
bug (Something isn't working), help wanted (Extra attention is needed)

Comments

@enrico-stauss
Contributor

🐛 Bug

I noticed that some scalar-valued torch tensors in my dataset are being inferred as the Pickle data type. A quick investigation revealed that neither the TensorSerializer nor the NoHeaderTensorSerializer can serialize scalar-valued tensors: their can_serialize checks require len(item.shape) > 1 and len(item.shape) == 1 respectively, while a scalar tensor has an empty shape, so len(item.shape) == 0.
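To make the gap concrete, here is a minimal sketch that applies the two shape conditions quoted above to plain shape tuples. It deliberately avoids importing torch or litdata so it runs anywhere; the function names are hypothetical stand-ins and not the library's code, and only the shape part of each check is modelled.

```python
# Hypothetical stand-ins for the shape checks described in this issue.
# Shapes are plain tuples, mirroring what tuple(tensor.shape) would give.

def header_check(shape):
    # Condition reported for TensorSerializer: more than one dimension.
    return len(shape) > 1


def no_header_check(shape):
    # Condition reported for NoHeaderTensorSerializer: exactly one dimension.
    return len(shape) == 1


def is_covered(shape):
    # A tensor avoids the pickle fallback only if some check accepts it.
    return header_check(shape) or no_header_check(shape)


shapes = {"scalar": (), "vector": (8,), "matrix": (4, 8)}
coverage = {name: is_covered(shape) for name, shape in shapes.items()}
print(coverage)  # {'scalar': False, 'vector': True, 'matrix': True}
```

The scalar case is the only one that neither check accepts, which matches the pickle fallback observed here.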

This is a quick fix and I'm happy to submit a PR. Am I correct to assume that a scalar valued tensor would best be serialized by the TensorSerializer? I think I recall that the NoHeaderTensorSerializer was designed specifically for LLM training use-cases?
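One possible shape of the fix, assuming scalars should indeed go to the serializer that stores a header: relax its condition from "more than one dimension" to "anything that is not exactly one-dimensional". The sketch below works on plain shape tuples, and the function name and returned labels are hypothetical, chosen only for illustration. It is not the change that was eventually merged.

```python
# Sketch of a relaxed dispatch rule. Names and labels are illustrative only.

def pick_tensor_serializer(shape):
    """Return a label for the serializer that would accept this shape."""
    if len(shape) == 1:
        # 1-D tensors keep going to the header-less serializer.
        return "no_header_tensor"
    # Everything else, including 0-D scalars, stores its shape in a header.
    return "tensor"


choices = [pick_tensor_serializer(s) for s in [(), (8,), (4, 8)]]
print(choices)  # ['tensor', 'no_header_tensor', 'tensor']
```

With this rule every shape is claimed by exactly one of the two serializers, so no plain tensor would fall through to pickle because of its dimensionality.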

To Reproduce

Execute the following snippet and you'll see a console printout indicating that the ['pickle'] data format is inferred.

Code sample
import torch
from litdata import optimize


def fn(i):
    return torch.tensor(i, dtype=torch.float32)


if __name__ == "__main__":
    optimize(
        fn=fn,
        inputs=list(range(1000)),
        output_dir="litdata/test",
        chunk_bytes=f"{1}Mb",
    )

Expected behavior

Additional context

Environment detail
  • PyTorch Version (e.g., 1.0):
  • OS (e.g., Linux):
  • How you installed PyTorch (conda, pip, source):
  • Build command you used (if compiling from source):
  • Python version:
  • CUDA/cuDNN version:
  • GPU models and configuration:
  • Any other relevant information:
enrico-stauss added the bug and help wanted labels on Nov 28, 2024
@tchaton
Collaborator

tchaton commented Nov 28, 2024

Hey @enrico-stauss. Good catch. Please, submit a PR :)
