Dask Bag examples #46

mrocklin · 2018-09-26T18:10:12Z

We currently lack Dask Bag examples in this repository. Two come to mind:

- Read JSON data and do some groupby aggregation with both Bag.groupby and Bag.foldby
- Read text data and do a basic word count
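As a rough sketch of the first example, assuming a small in-memory bag of already-parsed records (the name and amount fields are made up for illustration), the two aggregation styles might compare like this. Bag.groupby shuffles all records so that each key's full group is materialized, while Bag.foldby reduces per key inside each partition and then combines the partial results, which usually moves far less data.

```python
import dask.bag as db

# Hypothetical, already-parsed JSON records standing in for real data.
records = [
    {"name": "Alice", "amount": 100},
    {"name": "Bob", "amount": 200},
    {"name": "Alice", "amount": 50},
    {"name": "Carol", "amount": 75},
    {"name": "Bob", "amount": 25},
]
b = db.from_sequence(records, npartitions=2)

# Bag.groupby: shuffle so each key's records end up together as
# (key, [records]) pairs, then reduce each group with an ordinary map.
totals_groupby = b.groupby(lambda r: r["name"]).map(
    lambda kv: (kv[0], sum(r["amount"] for r in kv[1]))
)


# Bag.foldby: fold records into a running total per key inside each
# partition (binop), then merge the per-partition totals (combine).
def add_amount(total, record):
    return total + record["amount"]


def add_totals(total_a, total_b):
    return total_a + total_b


totals_foldby = b.foldby("name", add_amount, 0, add_totals, 0)

# The synchronous scheduler keeps this small demo single-threaded.
print(sorted(totals_groupby.compute(scheduler="sync")))
# [('Alice', 150), ('Bob', 225), ('Carol', 75)]
print(sorted(totals_foldby.compute(scheduler="sync")))
# [('Alice', 150), ('Bob', 225), ('Carol', 75)]
```

For a plain reduction like this, foldby is the cheaper choice; groupby is the more general tool when each whole group is needed.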

For the JSON data, it might make sense to add a dataset generation tool for nested record data, similar to dask.datasets.timeseries, and then use it to write JSON data to disk, similar to how we generate CSV data in http://examples.dask.org/dataframes/01-data-access.html#Create-artificial-dataset.
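A minimal sketch of such a generator, assuming a hypothetical make_record function (not part of dask.datasets) with made-up fields: each record is derived from its index, so partitions can be built independently and reproducibly, then serialized as one JSON object per line and written with Bag.to_textfiles.

```python
import json
import os
import random
import tempfile

import dask
import dask.bag as db


def make_record(i):
    """Hypothetical generator for one nested record (not a dask API)."""
    rng = random.Random(i)  # seeding by index makes the output reproducible
    return {
        "id": i,
        "name": rng.choice(["Alice", "Bob", "Carol", "Dan", "Edith"]),
        "address": {"city": rng.choice(["Paris", "Tokyo", "Lima"])},
        "transactions": [
            {"amount": rng.randint(1, 100)} for _ in range(rng.randint(1, 3))
        ],
    }


outdir = tempfile.mkdtemp()

# Records are built lazily, partition by partition, from a bag of indices.
records = db.from_sequence(range(1000), npartitions=4).map(make_record)

# One JSON object per line; "*" in the path is replaced by the partition index.
with dask.config.set(scheduler="sync"):
    records.map(json.dumps).to_textfiles(os.path.join(outdir, "*.json"))

print(sorted(os.listdir(outdir)))
```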

We would then read the JSON data back and do some minimal processing.
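The reading side could look roughly like this, assuming JSON-lines files with a nested address and a list of transactions; to stay self-contained, the sketch first writes two tiny hypothetical files. Bag.read_text yields one string per line, json.loads turns each into a dict, and filter, map, pluck, and frequencies cover the minimal processing.

```python
import json
import os
import tempfile

import dask.bag as db

# Write two tiny JSON-lines files (hypothetical records) so the sketch runs alone.
datadir = tempfile.mkdtemp()
parts = [
    [
        {"name": "Alice", "address": {"city": "Paris"},
         "transactions": [{"amount": 10}, {"amount": 5}]},
        {"name": "Bob", "address": {"city": "Tokyo"},
         "transactions": [{"amount": 7}]},
    ],
    [
        {"name": "Carol", "address": {"city": "Paris"},
         "transactions": [{"amount": 1}]},
    ],
]
for i, part in enumerate(parts):
    with open(os.path.join(datadir, f"{i}.json"), "w") as f:
        f.write("\n".join(json.dumps(r) for r in part) + "\n")

# One string per line across all matching files, parsed into dicts.
people = db.read_text(os.path.join(datadir, "*.json")).map(json.loads)

# Minimal processing: total spend by people living in Paris ...
paris_spend = (
    people.filter(lambda r: r["address"]["city"] == "Paris")
    .map(lambda r: sum(t["amount"] for t in r["transactions"]))
    .sum()
)
# ... and how many people live in each city.
city_counts = people.pluck("address").pluck("city").frequencies()

print(paris_spend.compute(scheduler="sync"))          # 16
print(sorted(city_counts.compute(scheduler="sync")))  # [('Paris', 2), ('Tokyo', 1)]
```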

For the text data, I wonder if there is an online dataset we can download. I suspect that the complete works of Shakespeare are available somewhere. We might do something simple like read, split, and count word frequencies. Or we might go further afterwards by bringing in NLTK to stem words, remove stopwords, etc.
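A possible word count, assuming a local text file (here a short Hamlet excerpt written to a temporary directory rather than a downloaded corpus): split lines into words, normalize case and punctuation, drop stopwords, and count with Bag.frequencies. The STOPWORDS set is a tiny hand-written stand-in; NLTK's stopword list or a stemmer could be swapped in at the filter or map step.

```python
import os
import string
import tempfile

import dask.bag as db

# A short public-domain excerpt stands in for a downloaded corpus.
text = """To be, or not to be, that is the question:
Whether 'tis nobler in the mind to suffer
The slings and arrows of outrageous fortune,
Or to take arms against a sea of troubles
"""
path = os.path.join(tempfile.mkdtemp(), "hamlet.txt")
with open(path, "w") as f:
    f.write(text)

# Tiny hand-written stopword list (an assumption, not NLTK's list).
STOPWORDS = {"a", "and", "in", "is", "of", "the", "to"}

words = (
    db.read_text(path)
    .map(str.lower)
    .map(str.split)  # each line becomes a list of tokens
    .flatten()       # the lists are flattened into one bag of tokens
    .map(lambda w: w.strip(string.punctuation))
    .filter(lambda w: w and w not in STOPWORDS)
)

counts = words.frequencies()                  # bag of (word, count) pairs
top = counts.topk(2, key=lambda kv: kv[1])    # two most frequent words

print(top.compute(scheduler="sync"))
```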

mrocklin added the "good first issue" label (Good for newcomers) on Sep 26, 2018

jacobtomlinson added the "help wanted" label (Extra attention is needed) on Oct 14, 2021