
Commit 9f301c1

Commit message: "Version used by prototype."

1 parent: 0d5b565

114 files changed (+4761, -4504036 lines)

.gitignore

Lines changed: 3 additions & 11 deletions

```diff
@@ -34,17 +34,6 @@ wheels/
 .installed.cfg
 *.egg
 
-# Logs
-logs/
-
-# Testes
-test/
-train/
-eduamf/
-data/
-models/
-embeddings/
-
 # PyInstaller
 # Usually these files are written by a python script from a template
 # before PyInstaller builds the exe, so as to inject date/other infos into it.
@@ -106,6 +95,9 @@ env/
 venv/
 ENV/
 
+# My changes
+eduamf/
+
 # Spyder project settings
 .spyderproject
 .spyproject
```

.travis.yml

Lines changed: 5 additions & 5 deletions

```diff
@@ -17,7 +17,7 @@ before_install:
 # Useful for debugging any issues with conda
 - conda info -a
 # freeze the supported pytorch version for consistency
-- conda create -q -n test-environment python=$TRAVIS_PYTHON_VERSION pytorch=0.4.0 -c soumith
+- conda create -q -n test-environment python=$TRAVIS_PYTHON_VERSION pytorch=0.4.1 cuda92 -c pytorch
 - source activate test-environment
 # use requirements.txt for dependencies
 - pip install -r requirements.txt
@@ -32,15 +32,15 @@ install:
 script:
 - wget -O /tmp/im2text.tgz http://lstm.seas.harvard.edu/latex/im2text_small.tgz; tar zxf /tmp/im2text.tgz -C /tmp/; head /tmp/im2text/src-train.txt > /tmp/im2text/src-train-head.txt; head /tmp/im2text/tgt-train.txt > /tmp/im2text/tgt-train-head.txt; head /tmp/im2text/src-val.txt > /tmp/im2text/src-val-head.txt; head /tmp/im2text/tgt-val.txt > /tmp/im2text/tgt-val-head.txt
 - wget -O /tmp/speech.tgz http://lstm.seas.harvard.edu/latex/speech.tgz; tar zxf /tmp/speech.tgz -C /tmp/; head /tmp/speech/src-train.txt > /tmp/speech/src-train-head.txt; head /tmp/speech/tgt-train.txt > /tmp/speech/tgt-train-head.txt; head /tmp/speech/src-val.txt > /tmp/speech/src-val-head.txt; head /tmp/speech/tgt-val.txt > /tmp/speech/tgt-val-head.txt
-- wget -O /tmp/test_model_speech.pt http://lstm.seas.harvard.edu/latex/test_model_speech.pt
+- wget -O /tmp/test_model_speech.pt http://lstm.seas.harvard.edu/latex/model_step_2760.pt
 - wget -O /tmp/test_model_im2text.pt http://lstm.seas.harvard.edu/latex/test_model_im2text.pt
 - python -m unittest discover
 # test nmt preprocessing
 - python preprocess.py -train_src data/src-train.txt -train_tgt data/tgt-train.txt -valid_src data/src-val.txt -valid_tgt data/tgt-val.txt -save_data /tmp/data -src_vocab_size 1000 -tgt_vocab_size 1000 && rm -rf /tmp/data*.pt
 # test im2text preprocessing
-- python preprocess.py -data_type img -src_dir /tmp/im2text/images -train_src /tmp/im2text/src-train.txt -train_tgt /tmp/im2text/tgt-train.txt -valid_src /tmp/im2text/src-val.txt -valid_tgt /tmp/im2text/tgt-val.txt -save_data /tmp/im2text/data && rm -rf /tmp/im2text/data*.pt
+- python preprocess.py -data_type img -shard_size 3 -src_dir /tmp/im2text/images -train_src /tmp/im2text/src-train.txt -train_tgt /tmp/im2text/tgt-train.txt -valid_src /tmp/im2text/src-val.txt -valid_tgt /tmp/im2text/tgt-val.txt -save_data /tmp/im2text/data && rm -rf /tmp/im2text/data*.pt
 # test speech2text preprocessing
-- python preprocess.py -data_type audio -src_dir /tmp/speech/an4_dataset -train_src /tmp/speech/src-train.txt -train_tgt /tmp/speech/tgt-train.txt -valid_src /tmp/speech/src-val.txt -valid_tgt /tmp/speech/tgt-val.txt -save_data /tmp/speech/data && rm -rf /tmp/speech/data*.pt
+- python preprocess.py -data_type audio -shard_size 300 -src_dir /tmp/speech/an4_dataset -train_src /tmp/speech/src-train.txt -train_tgt /tmp/speech/tgt-train.txt -valid_src /tmp/speech/src-val.txt -valid_tgt /tmp/speech/tgt-val.txt -save_data /tmp/speech/data && rm -rf /tmp/speech/data*.pt
 # test nmt translation
 - head data/src-test.txt > /tmp/src-test.txt; python translate.py -model onmt/tests/test_model.pt -src /tmp/src-test.txt -verbose
 # test im2text translation
@@ -50,7 +50,7 @@ script:
 # test nmt preprocessing and training
 - head data/src-val.txt > /tmp/src-val.txt; head data/tgt-val.txt > /tmp/tgt-val.txt; python preprocess.py -train_src /tmp/src-val.txt -train_tgt /tmp/tgt-val.txt -valid_src /tmp/src-val.txt -valid_tgt /tmp/tgt-val.txt -save_data /tmp/q -src_vocab_size 1000 -tgt_vocab_size 1000; python train.py -data /tmp/q -rnn_size 2 -batch_size 10 -word_vec_size 5 -report_every 5 -rnn_size 10 -train_steps 10 && rm -rf /tmp/q*.pt
 # test nmt preprocessing w/ sharding and training w/copy
-- head data/src-val.txt > /tmp/src-val.txt; head data/tgt-val.txt > /tmp/tgt-val.txt; python preprocess.py -train_src /tmp/src-val.txt -train_tgt /tmp/tgt-val.txt -valid_src /tmp/src-val.txt -valid_tgt /tmp/tgt-val.txt -max_shard_size 1 -dynamic_dict -save_data /tmp/q -src_vocab_size 1000 -tgt_vocab_size 1000; python train.py -data /tmp/q -rnn_size 2 -batch_size 10 -word_vec_size 5 -report_every 5 -rnn_size 10 -copy_attn -train_steps 10 && rm -rf /tmp/q*.pt
+- head data/src-val.txt > /tmp/src-val.txt; head data/tgt-val.txt > /tmp/tgt-val.txt; python preprocess.py -train_src /tmp/src-val.txt -train_tgt /tmp/tgt-val.txt -valid_src /tmp/src-val.txt -valid_tgt /tmp/tgt-val.txt -shard_size 1 -dynamic_dict -save_data /tmp/q -src_vocab_size 1000 -tgt_vocab_size 1000; python train.py -data /tmp/q -rnn_size 2 -batch_size 10 -word_vec_size 5 -report_every 5 -rnn_size 10 -copy_attn -train_steps 10 && rm -rf /tmp/q*.pt
 
 # test im2text preprocessing and training
 - head /tmp/im2text/src-val.txt > /tmp/im2text/src-val-head.txt; head /tmp/im2text/tgt-val.txt > /tmp/im2text/tgt-val-head.txt; python preprocess.py -data_type img -src_dir /tmp/im2text/images -train_src /tmp/im2text/src-val-head.txt -train_tgt /tmp/im2text/tgt-val-head.txt -valid_src /tmp/im2text/src-val-head.txt -valid_tgt /tmp/im2text/tgt-val-head.txt -save_data /tmp/im2text/q; python train.py -model_type img -data /tmp/im2text/q -rnn_size 2 -batch_size 10 -word_vec_size 5 -report_every 5 -rnn_size 10 -train_steps 10 && rm -rf /tmp/im2text/q*.pt
```
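These CI changes swap the old `-max_shard_size` flag (measured in bytes) for `-shard_size` (measured in examples). As a rough illustration of what example-count sharding does during preprocessing, here is a minimal sketch; `shard` is a hypothetical helper for this note, not OpenNMT-py's actual implementation:

```python
def shard(examples, shard_size):
    """Split a sequence of examples into consecutive shards.

    Each shard holds at most shard_size examples, so every shard can be
    serialized on its own (e.g. data.train.0.pt, data.train.1.pt) rather
    than keeping the whole corpus in memory at once.
    """
    for start in range(0, len(examples), shard_size):
        yield examples[start:start + shard_size]


# 7 examples with shard_size=3 split into shards of 3, 3, and 1 examples.
sentences = ["sentence %d" % i for i in range(7)]
print([len(s) for s in shard(sentences, 3)])
```

Counting in examples rather than bytes makes shard boundaries predictable regardless of sentence length, which is why the tiny CI corpora above can use values like 3 or 300.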

CHANGELOG.md

Lines changed: 38 additions & 1 deletion

```diff
@@ -3,10 +3,47 @@
 
 
 ## [Unreleased]
+### Fixes and improvements
+
+## [0.8.2](https://github.com/OpenNMT/OpenNMT-py/tree/0.8.2) (2019-02-16)
+* Update documentation and Library example
+* Revamp args
+* Bug fixes, save moving average in FP32
+* Allow FP32 inference for FP16 models
+
+## [0.8.1](https://github.com/OpenNMT/OpenNMT-py/tree/0.8.1) (2019-02-12)
+* Update documentation
+* Random sampling scores fixes
+* Bug fixes
+
+## [0.8.0](https://github.com/OpenNMT/OpenNMT-py/tree/0.8.0) (2019-02-09)
+* Many fixes and code cleaning thanks @flauted, @guillaumekln
+* Datasets code refactor (thanks @flauted); you need to re-preprocess datasets
 
 ### New features
+* FP16 support: experimental, using Apex; checkpoints may break in a future version
+* Continuous exponential moving average (thanks @francoishernandez, and Marian)
+* Relative position encoding (thanks @francoishernandez, and Google T2T)
+* Deprecate the old beam search; the fast batched beam search supports all options
 
-### Fixes and improvements
+
+## [0.7.2](https://github.com/OpenNMT/OpenNMT-py/tree/0.7.2) (2019-01-31)
+* Many fixes and code cleaning thanks @bpopeters, @flauted, @guillaumekln
+
+### New features
+* Multilevel fields for better handling of text feature embeddings
+
+
+## [0.7.1](https://github.com/OpenNMT/OpenNMT-py/tree/0.7.1) (2019-01-24)
+* Many fixes and code refactoring thanks @bpopeters, @flauted, @guillaumekln
+
+### New features
+* Random sampling thanks @daphnei
+* Enable sharding for huge files at translation
+
+## [0.7.0](https://github.com/OpenNMT/OpenNMT-py/tree/0.7.0) (2019-01-02)
+* Many fixes and code refactoring thanks @benopeters
+* Migrated to PyTorch 1.0
 
 ## [0.6.0](https://github.com/OpenNMT/OpenNMT-py/tree/0.6.0) (2018-11-28)
 * Many fixes and code improvements
```

CONTRIBUTING.md

Lines changed: 82 additions & 5 deletions

````diff
@@ -1,11 +1,88 @@
+# Contributors
+
 OpenNMT-py is a community developed project and we love developer contributions.
 
+## Guidelines
 Before sending a PR, please do this checklist first:
 
-- Please run `onmt/tests/pull_request_chk.sh` and fix any errors. When adding new functionality, also add tests to this script. Included checks:
-    1. flake8 and pep8-naming check for coding style;
+- Please run `tools/pull_request_chk.sh` and fix any errors. When adding new functionality, also add tests to this script. Included checks:
+    1. flake8 check for coding style;
     2. unittest;
     3. continuous integration tests listed in `.travis.yml`.
-- When adding/modifying a class constructor, please use the same argument naming style as its superclass in pytorch.
-- If your change is based on a paper, please include a clear comment and reference in the code.
-- If your function takes/returns tensor arguments, please include assertions to document the sizes. See `GlobalAttention.py` for examples.
+- When adding/modifying a class constructor, please use the same argument naming style as its superclass in PyTorch.
+- If your change is based on a paper, please include a clear comment and reference in the code (more on that below).
+
+### Docstrings
+Above all, try to follow the Google docstring format
+([Napoleon example](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html),
+[Google styleguide](http://google.github.io/styleguide/pyguide.html)).
+This makes it easy to include your contributions in the Sphinx documentation. And, do feel free
+to autodoc your contributions in the API ``.rst`` files in the `docs/source` folder! If you do, check that
+your additions look right.
+
+```bash
+cd docs
+# install some dependencies if necessary:
+# recommonmark, sphinx_rtd_theme, sphinxcontrib-bibtex
+make html
+firefox build/html/main.html  # or your browser of choice
+```
+
+Some particular advice:
+- Try to follow the Python 3 [``typing`` module](https://docs.python.org/3/library/typing.html) conventions when documenting types.
+    - Exception: use "or" instead of unions for more readability.
+- For external types, use the full "import name". Common abbreviations (e.g. ``np``) are acceptable.
+  For ``torch.Tensor`` types, the ``torch.`` is optional.
+- Please don't use tics like `` (`str`) `` or rst directives like `` (:obj:`str`) ``. Napoleon handles types
+  very well without additional help, so avoid the clutter.
+- [Google docstrings don't support multiple returns](https://stackoverflow.com/questions/29221551/can-sphinx-napoleon-document-function-returning-multiple-arguments).
+  For multiple returns, the following works well with Sphinx and is still very readable.
+  ```python
+  def foo(a, b):
+      """This is my docstring.
+
+      Args:
+          a (object): Something.
+          b (class): Another thing.
+
+      Returns:
+          (object, class):
+
+          * a: Something or rather with a long
+            description that spills over.
+          * b: And another thing.
+      """
+
+      return a, b
+  ```
+- When citing a paper, avoid directly linking in the docstring! Add a BibTeX entry to `docs/source/refs.bib`.
+  E.g., to cite "Attention Is All You Need", visit [arXiv](https://arxiv.org/abs/1706.03762), choose the
+  [bibtex](https://dblp.uni-trier.de/rec/bibtex/journals/corr/VaswaniSPUJGKP17) link, search `docs/source/refs.bib`
+  using `CTRL-F` for `DBLP:journals/corr/VaswaniSPUJGKP17`, and if you do not find it then copy-paste the
+  citation into `refs.bib`. Then, in your docstring, use ``:cite:`DBLP:journals/corr/VaswaniSPUJGKP17` ``.
+    - However, a link is better than nothing.
+- Please document tensor shapes. Prefer the format
+  ``(a, b, c)``. This style is easy to read, allows using ``x`` for multiplication, and is common
+  (PyTorch uses a few variations on the parentheses format, AllenNLP uses exactly this format, Fairseq uses
+  the parentheses format with single ticks).
+    - Again, a different style is better than no shape documentation.
+- Please avoid unnecessary space characters, try to capitalize, and try to punctuate.
+
+  For multi-line docstrings, add a blank line after the closing ``"""``.
+  Don't use a blank line before the closing quotes.
+
+  ``""" not this """`` ``"""This."""``
+
+  ```python
+  """
+  Not this.
+  """
+  ```
+  ```python
+  """This."""
+  ```
+
+  This note is the least important. Focus on content first, but remember that consistent docs look good.
+- Be sensible about the first line. Generally, one stand-alone summary line (per the Google guidelines) is good.
+  Sometimes, it's better to cut directly to the args or an extended description. It's always acceptable to have a
+  "trailing" citation.
````
README.md

Lines changed: 4 additions & 5 deletions

````diff
@@ -34,7 +34,7 @@ All dependencies can be installed via:
 pip install -r requirements.txt
 ```
 
-Note that we currently only support PyTorch 0.4.1
+Note that we currently only support PyTorch 1.0.0
 
 ## Features
 
@@ -50,11 +50,9 @@ Note that we currently only support PyTorch 0.4.1
 - ["Attention is all you need"](http://opennmt.net/OpenNMT-py/FAQ.html#how-do-i-use-the-transformer-model)
 - [Multi-GPU](http://opennmt.net/OpenNMT-py/FAQ.html##do-you-support-multi-gpu)
 - Inference time loss functions.
-
-Beta Features (committed):
-- Structured attention
 - [Conv2Conv convolution model]
 - SRU "RNNs faster than CNN" paper
+- FP16 training (mixed-precision with Apex)
 
 ## Quickstart
 
@@ -122,7 +120,7 @@ Click this button to open a Workspace on [FloydHub](https://www.floydhub.com/?ut
 
 ## Pretrained embeddings (e.g. GloVe)
 
-Go to tutorial: [How to use GloVe pre-trained embeddings in OpenNMT-py](http://forum.opennmt.net/t/how-to-use-glove-pre-trained-embeddings-in-opennmt-py/1011)
+Please see the FAQ: [How to use GloVe pre-trained embeddings in OpenNMT-py](http://opennmt.net/OpenNMT-py/FAQ.html#how-do-i-use-pretrained-embeddings-e-g-glove)
 
 ## Pretrained Models
 
@@ -145,6 +143,7 @@ Major contributors are:
 [Paul Tardy](https://github.com/pltrdy) (Ubiqus / Lium)
 [François Hernandez](https://github.com/francoishernandez) (Ubiqus)
 [Jianyu Zhan](http://github.com/jianyuzhan) (Shanghai)
+[Dylan Flaute](http://github.com/flauted) (University of Dayton)
 and more !
 
 OpenNMT-py belongs to the OpenNMT project along with OpenNMT-Lua and OpenNMT-tf.
````
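The README change above redirects GloVe users from the forum tutorial to the FAQ. Independent of OpenNMT-py's own tooling, the GloVe text format itself is simple: one token per line followed by its whitespace-separated vector components. A minimal reader can be sketched as follows; `load_glove` is a hypothetical helper for illustration, not part of the project:

```python
def load_glove(path):
    """Parse a GloVe-format text file into a dict of token -> vector.

    Each line holds a token followed by space-separated floats, e.g.
    ``the 0.418 0.24968 -0.41242 ...``. Hypothetical helper; OpenNMT-py
    ships its own embedding conversion tooling (see the linked FAQ).
    """
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split(" ")
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors
```

In practice the parsed vectors would then be matched against the preprocessed vocabulary to build the embedding matrix, which is exactly what the FAQ entry walks through.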

data/README.md

Lines changed: 0 additions & 7 deletions
This file was deleted.
