This repository has been archived by the owner on Aug 3, 2021. It is now read-only.
Releases: NVIDIA/OpenSeq2Seq
OpenSeq2Seq 18.12
- Jasper speech recognition model and documentation
- Audio classification model for Speech Commands dataset and documentation
- Improved documentation (LARC, distributed training, WaveNet)
- Minor enhancements (speech recognition data layer, custom decoder, TRT)
- Various bug fixes
OpenSeq2Seq 18.11
- Significantly improved Speech recognition (WER 4.32%)
- BPE support for Speech recognition
- WaveNet
- Enhanced cuDNN RNN support for Language Modeling
- Improvements in distributed text2text (thanks to @vsuthichai)
- Ability to freeze some layers (thanks to @ka-bu)
- Various bug fixes
OpenSeq2Seq v18.10
What's new
- Improved and updated models with checkpoints (see the documentation)
- New models:
  - Transformer Big (for NMT)
  - Sentiment Analysis (based on universal LM)
  - Joint CTC Attention based ASR
- Improved scalability
- Speech Synthesis audio samples (in the documentation)
- Support for CUDA10, Horovod 0.14
OpenSeq2Seq v18.09
What's new
- Improved and updated models with checkpoints (see the documentation)
- Dropped Python2 support
- Switched to TensorFlow 1.10
- Added TensorRT support for fast inference
- Refactored and updated documentation: https://nvidia.github.io/OpenSeq2Seq
- Switched versioning to month-based labels
OpenSeq2Seq v0.5
New Modality
- text2speech: spectrogram synthesis from text
New Models
- Tacotron 2-like model for text2speech (English)
Various improvements
- in ConvS2S model for translation
- in Wav2Letter-like model for speech2text
- in DeepSpeech2-like model for speech2text
- Bug fixes
Other
- TensorFlow version increased to 1.9
OpenSeq2Seq v0.4
New models:
- ConvS2S model for translation.
- Wav2Letter model for speech recognition.
- CIFAR-10 dataset support.
- CNNEncoder that can be used to construct (almost) arbitrary CNN models. AlexNet and cifar10-nv are integrated on top of it.
New features:
- Support for "iter_size" (accumulating gradients for "iter_size" steps before applying an update).
- Added "objects" benchmarking to evaluation and inference modes.
- cuDNN compatible cells support for GNMT.
- 8-padding for transformer.
- Improved config overwriting by train/eval/infer params (nested dicts are now updated incrementally instead of being replaced as a whole).
- Audio normalization before preprocessing for speech2text models.
- More summaries/parameters for different models.
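The "iter_size" item above describes gradient accumulation. The following is a minimal pure-Python sketch of that idea, not OpenSeq2Seq's implementation: the function names, the toy model y = weight * x, and the choice to average (rather than sum) the accumulated gradients are all assumptions made for illustration.

```python
def accumulate_and_step(weight, micro_batches, grad_fn, lr, iter_size):
    """Apply one SGD update per `iter_size` micro-batches (hypothetical helper).

    Gradients are accumulated without touching `weight`; the update uses their
    average (an assumption). A trailing group with fewer than `iter_size`
    micro-batches is never applied.
    """
    accumulated = 0.0
    updates = 0
    for step, batch in enumerate(micro_batches, start=1):
        accumulated += grad_fn(weight, batch)  # accumulate only, no update yet
        if step % iter_size == 0:
            weight -= lr * accumulated / iter_size
            accumulated = 0.0
            updates += 1
    return weight, updates


def mse_grad(weight, batch):
    """Gradient of the mean squared error for the toy model y = weight * x."""
    return sum(2.0 * (weight * x - y) * x for x, y in batch) / len(batch)


# Eight micro-batches with one (x, y) pair each, generated from y = 3x.
batches = [[(float(x), 3.0 * x)] for x in range(1, 9)]
final_weight, num_updates = accumulate_and_step(
    0.0, batches, mse_grad, lr=0.01, iter_size=4
)
```

With eight micro-batches and iter_size=4, the weight is updated twice, which mimics a batch four times larger without holding it in memory at once.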
Bug fixes:
- Regularization in mixed precision mode (loss scaling was not applied to the regularization term, which effectively disabled the regularizer).
- Overwriting bool values from command line.
- Multi-GPU evaluation in towers mode.
- Multi-GPU inference for speech2text.
- "reflect" padding changed to use zeros for audio preprocessing.
- Unicode support for Python 2.
Important config/API changes:
- Unified static/dynamic loss scaling into a single parameter.
- Made RNN cells accept arbitrary parameters.
- Exposed the training step to the maybe_print_logs and evaluate functions.
Other changes:
- Improved unit tests and documentation.
OpenSeq2Seq v0.3
New models:
- Added ResNet model and ImageNet data layer.
- Improved DeepSpeech-2 models and reached 4.59% WER.
- Added Transformer model.
New features:
- Implemented evaluation in Horovod mode.
- Added mixed precision support for Horovod mode.
- Fixed evaluation in multi-GPU mode.
- All string/numerical config parameters can now be overridden from the command line (keys of nested dicts are separated with "/").
- Moved start_experiment.sh functionality to run.py (--enable_logs parameter). It now also logs the exact command-line arguments used to invoke the script.
- Added new benchmarking functionality: models can now also report the number of objects (e.g., tokens or images) per second.
- Added more summaries/parameters for different models.
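The "/"-separated override of nested config parameters mentioned above can be pictured with a short sketch. This is not the run.py code: `override_config` and the config keys shown are hypothetical names chosen for illustration, and command-line parsing is left out.

```python
import copy


def override_config(config, overrides):
    """Return a copy of `config` with "/"-separated keys applied (hypothetical helper)."""
    result = copy.deepcopy(config)  # leave the caller's config untouched
    for path, value in overrides.items():
        *parents, leaf = path.split("/")
        node = result
        for key in parents:
            # Walk down into nested dicts, creating them if they are missing.
            node = node.setdefault(key, {})
        node[leaf] = value
    return result


# Illustrative config; these keys are assumptions, not the project's schema.
base = {"batch_size": 32, "optimizer_params": {"momentum": 0.9, "epsilon": 1e-8}}
updated = override_config(
    base, {"optimizer_params/momentum": 0.95, "batch_size": 64}
)
```

Only the addressed leaf changes: `optimizer_params/epsilon` keeps its value because the nested dict is updated in place rather than replaced.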
API changes:
- Replaced Seq2Seq class with EncoderDecoderModel to support arbitrary models that can be expressed in encoder-decoder-loss paradigm.
- Changed data layer API to only work with tf.data (dropped placeholders support).
- Hid Horovod/non-Horovod differences from users (no need to handle them when creating new models or data layers).
Other changes:
- Improved unit tests and documentation.
OpenSeq2Seq v0.2
- Massive API changes
- Add mixed precision training support
- Add speech-to-text models support
- Improved documentation