Small improvements and a few bug fixes.
Improvement:
- Added a cache_dir option to the from_pretrained() function to select a specific path for downloading and caching the pre-trained model weights. Useful for distributed training (see readme; fixes issue #44).
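A minimal sketch of how the option might be used in distributed training. It assumes each process knows its local rank and gets its own cache subdirectory, so that concurrent downloads do not write to the same path. The helper name per_process_cache_dir, the distributed_<rank> naming, and the convention that a local_rank of -1 means "not distributed" are all assumptions for illustration, not part of the library.

```python
import os


def per_process_cache_dir(base_dir: str, local_rank: int) -> str:
    """Return a cache path unique to one training process (hypothetical helper).

    Assumes a local_rank of -1 means "not distributed", in which case the
    shared base directory is returned unchanged.
    """
    if local_rank < 0:
        return base_dir
    # One subdirectory per process, so parallel downloads use different paths.
    return os.path.join(base_dir, "distributed_{}".format(local_rank))


# The resulting path would then be passed as the cache_dir option, e.g.:
#   model = BertModel.from_pretrained("bert-base-uncased",
#                                     cache_dir=per_process_cache_dir(base, rank))
```

The helper itself has no side effects; from_pretrained() is left to create and fill the directory.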
Bug fixes in model training and tokenizer loading:
- Fixed error in CrossEntropyLoss reshaping (issue #55).
- Fixed unicode error in vocabulary loading (issue #52).
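Unicode errors when reading a vocabulary usually come from opening the file with the platform's default encoding. Below is a minimal sketch, not the library's own code, that reads the file explicitly as UTF-8. It assumes a vocabulary format of one token per line, where a token's id is its line number.

```python
import collections


def load_vocab(vocab_file):
    """Map each token to its line index, reading the file as UTF-8 (sketch)."""
    vocab = collections.OrderedDict()
    # An explicit encoding avoids depending on the locale's default codec,
    # which may fail on non-ASCII tokens.
    with open(vocab_file, "r", encoding="utf-8") as reader:
        for index, line in enumerate(reader):
            token = line.rstrip("\n")
            vocab[token] = index
    return vocab
```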
Bug fixes in examples:
- Fixed weight decay in examples: bias and layer norm weights were previously also decayed because of an erroneous check in the training loop.
- Fixed an fp16 error in examples caused by the gradient norm being None (issue #43).
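A minimal sketch of excluding bias and layer norm weights from weight decay by splitting parameters into two optimizer groups. The exemption uses a substring match on each parameter's dotted name. The patterns below are assumptions, because the actual names depend on how a model names its layer norm parameters (some implementations call them gamma and beta). The group key weight_decay follows torch.optim conventions and may differ for other optimizers. The function name is hypothetical.

```python
def group_parameters_for_weight_decay(named_parameters, weight_decay=0.01,
                                      no_decay=("bias", "LayerNorm.weight")):
    """Split (name, param) pairs into decayed and non-decayed groups (sketch).

    A parameter is exempt when any no_decay pattern occurs as a substring of
    its dotted name. By contrast, a membership test such as
    `name not in no_decay` compares whole names against the pattern list, is
    true for every dotted name, and so would decay every parameter.
    """
    named_parameters = list(named_parameters)
    decay = [p for n, p in named_parameters if not any(nd in n for nd in no_decay)]
    exempt = [p for n, p in named_parameters if any(nd in n for nd in no_decay)]
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": exempt, "weight_decay": 0.0},
    ]
```

With a PyTorch model, the pairs would come from model.named_parameters(), and the returned list could be passed to an optimizer that accepts per-group options.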
Updated readme and docstrings