LanceDatasink for integration with ray doesn't support None #3308
Here's the error. It appears the pyarrow -> arrow-rs conversion is failing for null arrays:
A workaround for the above script (not sure whether it works on your real data) is to supply the schema when you create the table, so that you never get null arrays. Change
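Thanks for the fast response! I am reading data from Parquet, performing transformations, and assigning `None` to some fields:

```python
import ray

def f(row):
    # Keep 'a' from the input row; 'b' and 'c' are always None here.
    return {
        'a': row['a'],
        'b': None,
        'c': None,
    }

# `input_file` and `datasink` (a LanceDatasink) are defined elsewhere.
ds = ray.data.read_parquet(input_file)
ds = ds.map(f)
ds.write_datasink(datasink)
```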
Passing `None` to nullable fields crashes Ray jobs. The same thing happens for other data types: for example, if the data type is `pa.string()`, passing `None` still crashes the Ray job. Using `pa.null()` instead works for primitive data types, but doesn't work for `pa.list_(pa.string())`.