It could be useful to specify how exactly data will move through GenomeDK.
Maybe something like:
- Sent to GenomeDK as a file of some format
  - data.xyz (cannot leave GenomeDK)
- [shear] Transformed to a data frame
  - variable (cannot leave GenomeDK)
- [sprout] Properties extracted and edited
  - properties.py, datapackage.json (committed and synced to GitHub)
- [sprout] Data frame saved as batch Parquet file
  - batch.parquet (cannot leave GenomeDK)
- [sprout] Batch Parquet files merged
  - data.parquet (cannot leave GenomeDK)
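
To make the flow easier to discuss, the steps above could also be written down as a small data structure. This is only a rough sketch: the `Stage` class, the `FLOW` tuple, and both helper functions are hypothetical names made up for illustration, not part of shear or sprout. It just records each step, what it produces, and whether that output may leave GenomeDK.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stage:
    """One step in the proposed data flow (hypothetical, for illustration only)."""

    tool: Optional[str]          # "shear", "sprout", or None for the initial transfer
    action: str                  # what happens in this step
    artifacts: tuple             # names of what the step produces
    may_leave_genomedk: bool     # True only if the output can be synced to GitHub


# The five steps from the list above, in order.
FLOW = (
    Stage(None, "Sent to GenomeDK as a file of some format", ("data.xyz",), False),
    Stage("shear", "Transformed to a data frame", ("variable",), False),
    Stage("sprout", "Properties extracted and edited",
          ("properties.py", "datapackage.json"), True),
    Stage("sprout", "Data frame saved as batch Parquet file", ("batch.parquet",), False),
    Stage("sprout", "Batch Parquet files merged", ("data.parquet",), False),
)


def syncable_artifacts(flow):
    """Return the artifacts that may be committed and synced to GitHub."""
    return [name for stage in flow if stage.may_leave_genomedk for name in stage.artifacts]


def restricted_artifacts(flow):
    """Return the artifacts that must stay on GenomeDK."""
    return [name for stage in flow if not stage.may_leave_genomedk for name in stage.artifacts]


if __name__ == "__main__":
    print("Synced to GitHub:", syncable_artifacts(FLOW))
    print("Stays on GenomeDK:", restricted_artifacts(FLOW))
```

Writing it this way makes the key constraint explicit: only the properties files are meant to leave GenomeDK, while everything derived from the data itself stays put.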