v4.6.0
Added
- Support for community datasets on GCS.
- [API] tfds.builder_from_directory and tfds.builder_from_directories, see
https://www.tensorflow.org/datasets/external_tfrecord#directly_from_folder.
- [API] Dash ("-") support in split names.
- [API] file_format argument to download_and_prepare method, allowing users
to specify an alternative file format to store prepared data (e.g. "riegeli").
- [API] file_format to DatasetInfo string representation.
- [API] Expose the return value of Beam pipelines. This allows users to
read the Beam metrics.
- [API] Expose Feature tf_example_spec to public.
- [API] doc kwarg on Features, to describe a feature.
- [Documentation] Features description is shown on TFDS Catalog.
- [Documentation] More metadata about HuggingFace datasets in TFDS catalog.
- [Performance] Parallel load of metadata files.
- [Testing] TFDS tests are now run using GitHub Actions, with miscellaneous
improvements such as caching and sharding.
- [Testing] Improvements to MockFs.
- New datasets.
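
As a rough illustration of two of the API additions above, the sketch below loads an already-prepared dataset straight from its folder and prepares a dataset in an alternative file format. The helper names `load_from_folder` and `prepare_with_format` are hypothetical, and the imports sit inside the functions only so the snippet can be pasted into a module where TFDS is not installed; it assumes tensorflow-datasets 4.6.0 or later at call time.

```python
def load_from_folder(builder_dir, split="train"):
    """Read a prepared dataset directly from the folder holding its files."""
    import tensorflow_datasets as tfds  # assumed: tensorflow-datasets >= 4.6.0

    # builder_dir is the versioned folder containing dataset_info.json
    # and the data shards.
    builder = tfds.builder_from_directory(builder_dir)
    print(builder.info)  # the string representation now includes file_format
    return builder.as_dataset(split=split)


def prepare_with_format(name, file_format="riegeli"):
    """Download and prepare a dataset, storing it in the given file format."""
    import tensorflow_datasets as tfds  # assumed: tensorflow-datasets >= 4.6.0

    builder = tfds.builder(name)
    builder.download_and_prepare(file_format=file_format)
    return builder
```

For example, `load_from_folder("/path/to/my_dataset/1.0.0")` (a placeholder path) would return a `tf.data.Dataset` for the `train` split without importing the code that originally generated the dataset.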
Changed
- [API] num_shards is now optional in the shard name.
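
To make the change above concrete, here is a small standalone sketch of parsing a shard file name where the trailing shard count may be absent. It is not TFDS's own parser: the assumed layout `<dataset>-<split>.<file_format>-<shard>[-of-<num_shards>]`, the regular expression, and the names `ShardName` and `parse_shard_name` are illustrative only.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout: <dataset>-<split>.<file_format>-<shard>[-of-<num_shards>]
_SHARD_RE = re.compile(
    r"^(?P<dataset>\w+)-(?P<split>[\w-]+)\.(?P<file_format>\w+)"
    r"-(?P<shard>\d{5})(?:-of-(?P<num_shards>\d{5}))?$"
)


class ShardName(NamedTuple):
    dataset: str
    split: str
    file_format: str
    shard_index: int
    num_shards: Optional[int]  # None when the name omits "-of-NNNNN"


def parse_shard_name(filename: str) -> ShardName:
    """Parse a shard file name, tolerating a missing shard count."""
    match = _SHARD_RE.match(filename)
    if match is None:
        raise ValueError(f"Not a recognized shard name: {filename!r}")
    num_shards = match.group("num_shards")
    return ShardName(
        dataset=match.group("dataset"),
        split=match.group("split"),
        file_format=match.group("file_format"),
        shard_index=int(match.group("shard")),
        num_shards=int(num_shards) if num_shards is not None else None,
    )
```

Under these assumptions, both `mnist-train.tfrecord-00000-of-00001` and `mnist-train.tfrecord-00000` parse, with `num_shards` set to `None` in the second case.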
Removed
- TFDS pathlib API, migrated to a self-contained etils.epath (see
https://github.com/google/etils).
Fixed
- Various datasets.
- Dataset builders that are defined ad hoc (e.g. in Colab).
- Better DatasetNotFoundError messages.
- Don't set deterministic on a global level but locally in interleave, so it
only applies to interleave and not all transformations.
- Google Drive downloader.
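
The determinism fix above can be pictured with plain `tf.data`. The sketch below is not the TFDS reader itself: the helper name `read_shards` and the `cycle_length` value are hypothetical. It sets `deterministic` on the `interleave` call only, instead of through dataset-wide options.

```python
def read_shards(filenames, deterministic=True):
    """Interleave TFRecord shards, scoping determinism to interleave only."""
    import tensorflow as tf  # assumed: a TensorFlow 2.x release with tf.data.AUTOTUNE

    files = tf.data.Dataset.from_tensor_slices(filenames)
    # Passing `deterministic` here affects this interleave alone. Setting
    # tf.data.Options().deterministic and applying it with with_options()
    # would instead apply to every transformation in the pipeline.
    return files.interleave(
        tf.data.TFRecordDataset,
        cycle_length=16,  # illustrative value, not taken from TFDS
        num_parallel_calls=tf.data.AUTOTUNE,
        deterministic=deterministic,
    )
```

With `deterministic=False`, records from different shards may arrive in a varying order, while later transformations such as `map` keep their own default behavior.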
As always, thank you to all contributors!