Transformations #68

davidbuniat (Maintainer) · 2020-10-12T05:15:31Z

How should we design the transformation of one dataset into another?

We should finalize the transformation API for v1.0.

edogrigqv2 · 2020-10-12T05:51:48Z

Three types of transformation:

Generator (@hub.generator): one input sample, many output samples. The resulting dataset supports only iteration (no getitem or setitem). ds.store writes it wherever we want.
Transformation (@hub.transform): one input sample, one output sample. getitem works; there is no setitem. ds.store works.
Apply (@hub.apply): the input sample is modified in place, so the output sample is the input sample. If a copy is needed, we can copy the dataset beforehand.

@hub.apply(...)
def my_apply(sample):
    sample["image"][5] = ndvi(sample["image"][0:3])

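from copy import deepcopy

@hub.transform(dtype=...)
def my_transform(sample):
    # Deep copy so that writing into res["image"] does not also modify sample.
    res = deepcopy(sample)
    res["image"][5] = ndvi(sample["image"][0:3])
    return res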

@hub.generator(dtype=...)
def my_generator(sample):
    yield ...
    yield ...
    yield ...
    yield ...

ds2 = my_generator(ds1)
for i in ds2:
    pass
ds2.store("s3://...")
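To make the three contracts concrete, here is a minimal pure-Python sketch. It does not use hub at all: `generator`, `transform`, `apply`, and `MappedView` are hypothetical stand-ins written only for illustration, plain lists of dicts stand in for datasets, and `store` is left out.

```python
from copy import deepcopy


def generator(fn):
    """One input sample -> many output samples; the result can only be iterated."""
    def wrapper(ds):
        return (out for sample in ds for out in fn(sample))
    return wrapper


class MappedView:
    """Lazy one-to-one view: supports len, getitem, and iter, but no setitem."""

    def __init__(self, ds, fn):
        self._ds, self._fn = ds, fn

    def __len__(self):
        return len(self._ds)

    def __getitem__(self, index):
        # The function runs only when a sample is requested.
        return self._fn(self._ds[index])

    def __iter__(self):
        return (self._fn(sample) for sample in self._ds)


def transform(fn):
    """One input sample -> one output sample, computed on access."""
    def wrapper(ds):
        return MappedView(ds, fn)
    return wrapper


def apply(fn):
    """Mutate every sample in place and return the same dataset."""
    def wrapper(ds):
        for sample in ds:
            fn(sample)
        return ds
    return wrapper


ds1 = [{"x": [1, 2]}, {"x": [3, 4]}]


@generator
def split(sample):
    for value in sample["x"]:
        yield {"x": value}


@transform
def doubled(sample):
    res = deepcopy(sample)
    res["x"] = [2 * v for v in res["x"]]
    return res


@apply
def negate_first(sample):
    sample["x"][0] = -sample["x"][0]
```

In this sketch `split(ds1)` can only be looped over, `doubled(ds1)[1]` works without touching `ds1`, and `negate_first(ds1)` changes `ds1` itself, which is why a pre-copy is needed when the original must be kept.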


davidbuniat (Maintainer, Author) · 2020-10-14T18:22:16Z

Feedback we received from a customer today: they need to specify where each step (generator, transform, or apply) is deployed, including whether it runs on GPU or CPU.
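One possible way to express per-step placement is a device tag on each step, with consecutive steps on the same device grouped into a stage. This is only a sketch: `Step`, its `device` field, and `plan` are hypothetical names, and nothing here actually dispatches work to a GPU.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Callable


@dataclass
class Step:
    name: str
    fn: Callable
    device: str = "cpu"  # hypothetical placement tag: "cpu" or "gpu"


def plan(steps):
    """Group consecutive steps that share a device into one stage."""
    return [
        (device, [step.name for step in group])
        for device, group in groupby(steps, key=lambda step: step.device)
    ]


steps = [
    Step("decode", lambda s: s, device="cpu"),
    Step("resize", lambda s: s, device="cpu"),
    Step("embed", lambda s: s, device="gpu"),
]
stages = plan(steps)
```

Grouping by device keeps data on one device for as long as possible, so a scheduler would only need to move samples at stage boundaries.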

They also asked how to run a model on the data and connect the preprocessing pipeline to it. For example, if normalization is a preprocessing step when generating the dataset and training the model, the same normalization must also be applied at inference time.
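A minimal sketch of that requirement: define the preprocessing once as a pipeline object and call the same object when building the training data and when serving predictions. `Pipeline`, `normalize`, and `toy_model` are hypothetical names used only for illustration.

```python
from functools import partial


def normalize(sample, mean, std):
    """Scale every value in the sample with fixed statistics."""
    return [(value - mean) / std for value in sample]


class Pipeline:
    """Ordered list of one-to-one preprocessing steps."""

    def __init__(self, steps):
        self.steps = list(steps)

    def __call__(self, sample):
        for step in self.steps:
            sample = step(sample)
        return sample


# The statistics are fixed once and reused on both sides.
preprocess = Pipeline([partial(normalize, mean=1.0, std=2.0)])

# Training side: preprocess the raw dataset.
raw = [[0.0, 2.0], [1.0, 3.0]]
train_set = [preprocess(sample) for sample in raw]


def toy_model(sample):
    # Placeholder for a trained model.
    return sum(sample)


# Inference side: the same pipeline object runs before the model.
prediction = toy_model(preprocess([1.0, 5.0]))
```

Because training and inference share one `preprocess` object, the normalization parameters cannot drift apart between the two paths.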

Finally, our insight about chunk-based storage needs to be used for processing; otherwise this is no different from Ray or Dask processes.
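As a rough illustration of what using the chunk layout could mean, the sketch below schedules work per storage chunk instead of per sample, so each worker reads one chunk in a single slice. `chunk_ranges` and `process_by_chunk` are hypothetical helpers, a Python list stands in for chunked storage, and the chunk size is assumed to be known from the storage layer.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_ranges(num_samples, chunk_size):
    """Return (start, stop) index pairs, one per storage chunk."""
    return [
        (start, min(start + chunk_size, num_samples))
        for start in range(0, num_samples, chunk_size)
    ]


def process_by_chunk(samples, chunk_size, fn, workers=2):
    """Apply fn to every sample, handing each worker one whole chunk at a time."""

    def work(bounds):
        start, stop = bounds
        chunk = samples[start:stop]  # one read per chunk
        return [fn(sample) for sample in chunk]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map preserves input order, so the output order matches the dataset.
        parts = pool.map(work, chunk_ranges(len(samples), chunk_size))
        return [result for part in parts for result in part]


squares = process_by_chunk(list(range(5)), chunk_size=2, fn=lambda x: x * x)
```

The point of aligning tasks to chunks is that no chunk is fetched by more than one worker, which is the part a generic task scheduler would not know about.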
