Releases · tensorflow/datasets
v2.0.0
- This is the last version of TFDS that will support Python 2. Going forward, we'll only support and test against Python 3.
- The default versions of all datasets are now using the S3 slicing API. See the guide for details.
- The previous split API is still available, but is deprecated. If you wrote `DatasetBuilder`s outside the TFDS repository, please make sure they do not use `experiments={tfds.core.Experiment.S3: False}`. This will be removed in the next version, along with the `num_shards` kwarg of `SplitGenerator`.
- Several new datasets. Thanks to all the contributors!
- API changes and new features:
  - `shuffle_files` defaults to False, so that dataset iteration is deterministic by default. You can customize the reading pipeline, including shuffling and interleaving, through the new `read_config` parameter in `tfds.load` (a sketch follows this list).
  - The `urls` kwarg was renamed to `homepage` in `DatasetInfo`.
  - Support for nested `tfds.features.Sequence` and `tf.RaggedTensor` (a feature-spec sketch follows this list).
  - Custom `FeatureConnector`s can override the `decode_batch_example` method for efficient decoding when wrapped inside a `tfds.features.Sequence(my_connector)`.
  - Declaring a dataset in Colab won't register it, which allows re-running the cell without having to change the name.
  - Beam datasets can use a `tfds.core.BeamMetadataDict` to store additional metadata computed as part of the Beam pipeline.
  - Beam datasets' `_split_generators` accepts an additional `pipeline` kwarg to define a pipeline shared between all splits.
- Various other bug fixes and performance improvements. Thank you for all the reports and fixes!
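As a minimal, hedged sketch of the reading changes above (not taken from the release itself): the snippet assumes the `mnist` dataset and that `tfds.ReadConfig` and its `shuffle_seed`/`interleave_cycle_length` fields are exposed as in current TFDS releases.

```python
import tensorflow_datasets as tfds

# Deterministic by default: shuffle_files is now False unless requested.
ds_train = tfds.load('mnist', split='train')

# S3 slicing API: read only the first 80% of the train split.
ds_subset = tfds.load('mnist', split='train[:80%]')

# Customize the reading pipeline (shuffling, interleaving) via read_config.
read_config = tfds.ReadConfig(
    shuffle_seed=42,              # reproducible file shuffling
    interleave_cycle_length=16,   # number of files read in parallel
)
ds_shuffled = tfds.load(
    'mnist',
    split='train',
    shuffle_files=True,
    read_config=read_config,
)
```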
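The nested `tfds.features.Sequence` support can be illustrated with a small, hypothetical feature spec (the `document_id` and `sentences` names are made up for this sketch); nested sequences decode to a `tf.RaggedTensor`.

```python
import tensorflow as tf
import tensorflow_datasets as tfds

# Hypothetical spec: each example holds a variable number of sentences,
# each sentence being a variable-length list of token ids.
# The nested Sequence decodes to a tf.RaggedTensor.
features = tfds.features.FeaturesDict({
    'document_id': tf.string,
    'sentences': tfds.features.Sequence(tfds.features.Sequence(tf.int64)),
})
print(features)
```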
v1.3.0
Bug fixes and performance improvements.
v1.2.0
Features
- Add a `shuffle_files` argument to the `tfds.load` function. The semantics are the same as in the `builder.as_dataset` function, which for now means that by default, files will be shuffled for the `TRAIN` split and not for other splits. The default behaviour will change to always be False in the next release (see the sketch after this list).
- Most datasets now support the new S3 API (documentation)
- Support for uint16 PNG images
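A short sketch of the new argument, assuming the `mnist` dataset; it simply makes file-level shuffling explicit instead of relying on the split-dependent default described above.

```python
import tensorflow_datasets as tfds

# Explicitly control file-level shuffling instead of relying on the default
# (shuffled for the TRAIN split, not shuffled for other splits).
ds_train = tfds.load('mnist', split='train', shuffle_files=True)
ds_test = tfds.load('mnist', split='test', shuffle_files=False)
```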
Misc
- Fixed a crash while shuffling on Windows
- Various documentation improvements
New datasets
- AFLW2000-3D
- Amazon_US_Reviews
- binarized_mnist
- BinaryAlphaDigits
- Caltech Birds 2010
- Coil100
- DeepWeeds
- Food101
- MIT Scene Parse 150
- RockYou leaked password
- Stanford Dogs
- Stanford Online Products
- Visual Domain Decathlon
v1.1.0
Features
- Add an `in_memory` option to cache small datasets in RAM.
- Better sharding, shuffling and sub-splits.
- It is now possible to add arbitrary metadata to `tfds.core.DatasetInfo`, which will be stored/restored with the dataset. See `tfds.core.Metadata`.
- Better proxy support, and the possibility to add a certificate.
- Add a `decoders` kwarg to override the default feature decoding (guide); see the sketch after this list.
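A minimal sketch of the `decoders` kwarg, assuming an image dataset such as `mnist`; `tfds.decode.SkipDecoding` bypasses the default image decoding so the raw encoded bytes are returned instead.

```python
import tensorflow_datasets as tfds

# Skip the built-in image decoding and get the raw encoded bytes, e.g. to
# apply a custom tf.io.decode_* call later in the input pipeline.
ds = tfds.load(
    'mnist',
    split='train',
    decoders={'image': tfds.decode.SkipDecoding()},
)
```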
New datasets
More datasets added:
- downsampled_imagenet
- patch_camelyon
- coco 2017 (with and without panoptic annotations)
- uc_merced
- trivia_qa
- super_glue
- so2sat
- snli
- resisc45
- pet_finder
- mnist_corrupted
- kitti
- eurosat
- definite_pronoun_resolution
- curated_breast_imaging_ddsm
- clevr
- bigearthnet
v1.0.2
- Add Apache Beam support
- Add direct GCS access for MNIST (with `tfds.load('mnist', try_gcs=True)`); see the sketch at the end of this list.
- More datasets added
- Option to turn off the tqdm bar (`tfds.disable_progress_bar()`)
- Sub-splits no longer depend on the number of shards (#292)
- Various bug fixes
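A short sketch combining two of the items above (using `mnist` as the example dataset); `try_gcs=True` reads a prepared copy directly from the public GCS bucket when one is available.

```python
import tensorflow_datasets as tfds

# Turn off the tqdm progress bars (useful in notebooks or log files).
tfds.disable_progress_bar()

# Read MNIST directly from the public GCS bucket instead of downloading
# and preparing it locally, when a prepared copy is available there.
ds = tfds.load('mnist', try_gcs=True)
```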
Thanks to all external contributors for raising issues, their feedback and their pull requests.