Pii Modifier should work with DocumentDataset on cudf #418
Labels: enhancement (New feature or request)
Is your feature request related to a problem? Please describe.
(Not urgent, since we have to spill to host memory anyway, but we might benefit from faster I/O and dataset filtering, e.g. in #417.)
I noticed an oddity in the PII examples, scripts, and docs: PII modification doesn't work when the dataset is read with DocumentDataset.read_*(backend="cudf").
All of the examples, scripts, and docs read the dataset using dask (pandas), yet pass device='gpu' to the Modifier.
Describe the solution you'd like
The code works with DocumentDataset('cudf')
I think we might just need to_pyarrow().tolist() when the series is a cudf type.
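A rough sketch of what that conversion could look like, not tested against the actual modifier code. The helper name `series_to_list` is hypothetical, and the check relies on the assumption that a cudf Series exposes `to_arrow()` while a pandas Series does not. As far as I know, cudf spells the method `to_arrow()` rather than `to_pyarrow()`, and `to_pylist()` is the pyarrow call that returns a Python list, so the sketch uses those names.

```python
def series_to_list(series):
    """Return the values of a pandas-like or cudf-like Series as a Python list.

    Hypothetical helper for illustration only; it is not part of the library.
    """
    # Assumption: only cudf Series expose to_arrow(); pandas Series do not.
    if hasattr(series, "to_arrow"):
        # Copy the device data to host as an Arrow array, then build a list.
        return series.to_arrow().to_pylist()
    # Host-side (pandas) Series can be converted directly.
    return series.tolist()
```

The modifier could then call this once per batch before handing the texts to the PII engine, so the same code path would serve both backends.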