Small improvements and a few bug fixes.

thomwolf released this 26 Nov 09:57

ce37b8e

Improvement:

Added a cache_dir option to the from_pretrained() function for selecting the directory where the pre-trained model weights are downloaded and cached. This is useful for distributed training (see the readme). Fixes issue #44.
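
As a rough illustration of the distributed-training use case, each process can be given its own cache directory so that processes do not write to the same files. The sketch below is only an assumption about how this might be wired up: the helper names per_process_cache_dir and load_model, the "distributed_<rank>" directory naming, and the package and class names (pytorch_pretrained_bert, BertModel) are not stated in these notes. Only from_pretrained() and cache_dir come from the release itself.

```python
from pathlib import Path


def per_process_cache_dir(base_dir, local_rank):
    """Return a cache directory unique to one training process.

    Hypothetical helper: the "distributed_<rank>" naming is an
    illustrative choice, not something these release notes prescribe.
    """
    return Path(base_dir) / "distributed_{}".format(local_rank)


def load_model(model_name, base_dir, local_rank):
    """Load pre-trained weights using a per-process cache directory."""
    # Imported lazily so the path helper above works without the package.
    # Assumption: the package is pytorch_pretrained_bert and the model
    # class is BertModel; adjust to the class you actually use.
    from pytorch_pretrained_bert import BertModel

    cache_dir = per_process_cache_dir(base_dir, local_rank)
    return BertModel.from_pretrained(model_name, cache_dir=str(cache_dir))
```

A process with local rank 1 would then call something like load_model("bert-base-uncased", "/tmp/bert_cache", 1), which downloads the weights on first use.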

Bug fixes in model training and tokenizer loading:

Fixed error in CrossEntropyLoss reshaping (issue #55).
Fixed unicode error in vocabulary loading (issue #52).

Bug fixes in examples:

Fixed weight decay in examples (previously, bias and layer norm weights were also decayed because of an erroneous check in the training loop).
Fixed the fp16 "grad norm is None" error in examples (issue #43).
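
For the weight decay fix, the intent is that bias and layer norm parameters go into a group with no decay. A minimal sketch of such a grouping follows, written in plain Python so it runs without any deep learning library. The function name group_parameters, the parameter names, and the "bias", "gamma", "beta" markers are assumptions made for illustration. The notes do not say what the erroneous check was; one way such a check can go wrong is testing for an exact name match, which a full dotted parameter name never satisfies, whereas the substring test below does.

```python
def group_parameters(named_params,
                     no_decay=("bias", "gamma", "beta"),
                     weight_decay=0.01):
    """Split (name, param) pairs into decayed and non-decayed groups.

    A parameter is excluded from decay when any marker in `no_decay`
    occurs as a substring of its full name.
    """
    def is_no_decay(name):
        return any(marker in name for marker in no_decay)

    decayed = [p for n, p in named_params if not is_no_decay(n)]
    not_decayed = [p for n, p in named_params if is_no_decay(n)]
    return [
        {"params": decayed, "weight_decay": weight_decay},
        {"params": not_decayed, "weight_decay": 0.0},
    ]


# Hypothetical parameter names; strings stand in for the tensors.
named_params = [
    ("encoder.layer.0.output.dense.weight", "W"),
    ("encoder.layer.0.output.dense.bias", "b"),
    ("encoder.layer.0.output.LayerNorm.gamma", "g"),
    ("encoder.layer.0.output.LayerNorm.beta", "beta"),
]
groups = group_parameters(named_params)
```

With real tensors in place of the strings, a list of dictionaries like this can be passed to an optimizer that accepts per-group options, provided the key name matches what that optimizer expects.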

Updated readme and docstrings
