🐛 Bug
I noticed that some scalar-valued torch tensors in my dataset are being inferred as the Pickle data type. A quick investigation revealed that neither the `TensorSerializer` nor the `NoHeaderTensorSerializer` `can_serialize` scalar-valued tensors, due to the conditions `and len(item.shape) == 1` and `and len(item.shape) > 1`, respectively.

This is a quick fix and I'm happy to submit a PR. Am I correct to assume that a scalar-valued tensor would best be serialized by the `TensorSerializer`? I think I recall that the `NoHeaderTensorSerializer` was designed specifically for LLM training use cases?
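To illustrate why a scalar slips through both checks, here is a minimal sketch in plain Python. It only models the two shape conditions quoted above. `FakeTensor`, `rank_one_check`, `rank_many_check`, and `proposed_check` are hypothetical names introduced for illustration, not the library's actual code, and the `!= 1` condition is just one possible fix, not a confirmed one.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class FakeTensor:
    """Hypothetical stand-in for torch.Tensor; only the shape matters here."""
    shape: Tuple[int, ...]


def rank_one_check(item: FakeTensor) -> bool:
    # Mirrors the quoted condition `len(item.shape) == 1`.
    return len(item.shape) == 1


def rank_many_check(item: FakeTensor) -> bool:
    # Mirrors the quoted condition `len(item.shape) > 1`.
    return len(item.shape) > 1


def proposed_check(item: FakeTensor) -> bool:
    # Assumption: one possible fix is to accept every rank except 1,
    # so that rank-0 (scalar) tensors are covered as well.
    return len(item.shape) != 1


scalar = FakeTensor(shape=())      # a scalar tensor has an empty shape
vector = FakeTensor(shape=(3,))
matrix = FakeTensor(shape=(2, 3))

for name, item in [("scalar", scalar), ("vector", vector), ("matrix", matrix)]:
    handled = rank_one_check(item) or rank_many_check(item)
    print(name, "handled by current checks:", handled,
          "| handled with proposed check:",
          rank_one_check(item) or proposed_check(item))
```

With an empty shape, `len(item.shape)` is 0, so both quoted conditions are false and the item would fall through to whatever serializer comes next (Pickle, in my case).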
To Reproduce
Execute the following snippet and you'll see a console printout indicating that the `['pickle']` data format is inferred.
Code sample
Expected behavior
Additional context
Environment detail
(`conda`, `pip`, source):