# ODE Transformer: An Ordinary Differential Equation-Inspired Model for Sequence Generation

This is the code repository for the ACL 2022 paper "ODE Transformer: An Ordinary Differential Equation-Inspired Model for Sequence Generation", which redesigns the Transformer architecture from the ODE perspective by using high-order ODE solvers to enhance the residual connections.

This code is based on Fairseq v0.6.2.
## Requirements and Installation

- PyTorch version >= 1.2.0
- Python version >= 3.6
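A minimal installation sketch, assuming this repository follows the standard Fairseq v0.6.2 setup (run from the repository root after installing a matching PyTorch build):

```
# install dependencies, then build Fairseq in development mode
pip install -r requirements.txt
python setup.py build develop
```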

## Prepare Data

### For Machine Translation

#### 1. Download [WMT14' En-De](https://drive.google.com/uc?export=download&id=0B_bZck-ksdkpM25jRUN2X2UxMm8) and [WMT14' En-Fr](https://github.com/pytorch/fairseq/blob/master/examples/translation/prepare-wmt14en2fr.sh)

#### 2. Preprocess the dataset
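A preprocessing sketch, assuming Fairseq v0.6.2's `preprocess.py` and BPE-tokenized files named `train/valid/test.{en,de}` (the file prefixes and the `data-bin/wmt-en2de` destination are placeholders chosen to match the evaluation commands below):

```
# binarize the tokenized data with a joined source/target dictionary
python3 preprocess.py --source-lang en --target-lang de \
  --trainpref train --validpref valid --testpref test \
  --destdir data-bin/wmt-en2de \
  --joined-dictionary --workers 8
```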

### For Abstractive Summarization Task

#### 1. Download the [CNN dataset](https://drive.google.com/uc?export=download&id=0BwmD_VLjROrfTHk4NFg2SndKcjQ) and [Daily Mail dataset](https://drive.google.com/uc?export=download&id=0BwmD_VLjROrfM1BxdkxVaTY2bWs)

#### 2. Generate the binary dataset ```data-bin/cnndm```

```bash preprocess_cnndaily_bin.sh path/to/cnndm_raw_data```

### For Grammatical Error Correction Task

#### 1. Download the [FCE v2.1 dataset](https://www.cl.cam.ac.uk/research/nl/bea2019st/data/fce_v2.1.bea19.tar.gz), [Lang-8 Corpus of Learner English dataset](https://docs.google.com/forms/d/e/1FAIpQLSflRX3h5QYxegivjHN7SJ194OxZ4XN_7Rt0cNpR2YbmNV-7Ag/viewform), [NUCLE dataset](https://sterling8.d2.comp.nus.edu.sg/nucle_download/nucle.php), and [W&I+LOCNESS v2.1 dataset](https://www.cl.cam.ac.uk/research/nl/bea2019st/data/wi+locness_v2.1.bea19.tar.gz)

#### 2. Get the CoNLL-14 test set

```bash prepare_conll14_test_data.sh```

#### 3. Preprocess the dataset

```bash preprocess_gec.sh```

#### 4. Generate the binary dataset ```data-bin/BEA```

```bash preprocess_gec_bin.sh```

## Train

### For WMT'14 En-De Task

#### Train an RK2-block (learnable $\gamma_i$) model (6-layer Big model)
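For reference, a sketch of the update computed by an RK2 block with learnable coefficients, based on the paper's description (here $F$ is the layer function, $y_t$ the input of block $t$, and $\gamma_1, \gamma_2$ are learnable scalars):

$$F_1 = F(y_t, \theta_t), \qquad F_2 = F(y_t + F_1, \theta_t), \qquad y_{t+1} = y_t + \gamma_1 F_1 + \gamma_2 F_2$$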

```bash train_wmt_en_de.sh```

```
python3 -u train.py data-bin/$data_dir \
  --distributed-world-size 8 -s src -t tgt \
  --arch transformer_ode_t2t_wmt_en_de_big \
  --share-all-embeddings \
  --optimizer adam --clip-norm 0.0 \
  --adam-betas '(0.9, 0.997)' \
  --lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates 16000 \
  --lr 0.002 --min-lr 1e-09 \
  --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
  --max-tokens 4096 \
  --update-freq 4 \
  --max-epoch 20 \
  --dropout 0.3 --attention-dropout 0.1 --relu-dropout 0.1 \
  --no-progress-bar \
  --log-interval 100 \
  --ddp-backend no_c10d \
  --seed 1 \
  --save-dir $save_dir \
  --keep-last-epochs 10
```

### For WMT'14 En-Fr Task

#### Train an RK2-block (learnable $\gamma_i$) model

```bash train_wmt_en_fr.sh```

```
python3 -u train.py data-bin/$data_dir \
  --distributed-world-size 8 -s src -t tgt \
  --arch transformer_ode_t2t_wmt_en_de_big \
  --share-all-embeddings \
  --optimizer adam --clip-norm 0.0 \
  --adam-betas '(0.9, 0.997)' \
  --lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates 16000 \
  --lr 0.002 --min-lr 1e-09 \
  --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
  --max-tokens 4096 \
  --update-freq 8 \
  --max-epoch 20 \
  --dropout 0.1 --attention-dropout 0.1 --relu-dropout 0.1 \
  --no-progress-bar \
  --log-interval 100 \
  --ddp-backend no_c10d \
  --seed 1 \
  --save-dir $save_dir \
  --keep-last-epochs 10
```

### For Abstractive Summarization Task

#### Train an RK2-block (learnable $\gamma_i$) model

```bash train_cnn_daily.sh```

```
python3 -u train.py data-bin/$data_dir \
  --distributed-world-size 8 -s src -t tgt \
  --arch transformer_ode_t2t_wmt_en_de \
  --share-all-embeddings \
  --optimizer adam --clip-norm 0.0 \
  --adam-betas '(0.9, 0.997)' \
  --lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates 8000 \
  --lr 0.002 --min-lr 1e-09 \
  --weight-decay 0.0001 \
  --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
  --max-tokens 4096 \
  --update-freq 4 \
  --max-epoch 20 \
  --dropout 0.1 --attention-dropout 0.1 --relu-dropout 0.1 \
  --truncate-source --skip-invalid-size-inputs-valid-test --max-source-positions 500 \
  --no-progress-bar \
  --log-interval 100 \
  --ddp-backend no_c10d \
  --seed 1 \
  --save-dir $save_dir \
  --keep-last-epochs 10
```

### For Grammatical Error Correction Task

#### Train an RK2-block (learnable $\gamma_i$) model

```bash train_gec.sh```

```
python3 -u train.py data-bin/$data_dir \
  --distributed-world-size 8 -s src -t tgt \
  --arch transformer_ode_t2t_wmt_en_de \
  --share-all-embeddings \
  --optimizer adam --clip-norm 0.0 \
  --adam-betas '(0.9, 0.98)' \
  --lr-scheduler inverse_sqrt --warmup-init-lr 1e-07 --warmup-updates 4000 \
  --lr 0.0015 --min-lr 1e-09 \
  --weight-decay 0.0001 \
  --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
  --max-tokens 4096 \
  --update-freq 2 \
  --max-epoch 55 \
  --dropout 0.2 --attention-dropout 0.1 --relu-dropout 0.1 \
  --no-progress-bar \
  --log-interval 100 \
  --ddp-backend no_c10d \
  --seed 1 \
  --save-dir $save_dir \
  --keep-last-epochs 10 \
  --tensorboard-logdir $save_dir
```

## Evaluation

### For WMT'14 En-De Task

We measure performance with multi-bleu and sacreBLEU.

```
python3 generate.py \
data-bin/wmt-en2de \
--path $model_dir/$checkpoint \
--gen-subset test \
--batch-size 64 \
--beam 4 \
--lenpen 0.6 \
--output hypo.txt \
--quiet \
--remove-bpe
```
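A possible scoring recipe (a sketch: `hypo.txt` comes from the command above, `ref.txt` and `hypo.detok.txt` are placeholders for the tokenized reference and the detokenized hypotheses, and `multi-bleu.perl` is the Moses script):

```
# tokenized BLEU with Moses' multi-bleu.perl
perl multi-bleu.perl ref.txt < hypo.txt

# detokenized BLEU against the official WMT14 reference
sacrebleu -t wmt14 -l en-de < hypo.detok.txt
```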

### For WMT'14 En-Fr Task

We measure performance with multi-bleu and sacreBLEU.

```
python3 generate.py \
data-bin/wmt-en2fr \
--path $model_dir/$checkpoint \
--gen-subset test \
--batch-size 64 \
--beam 4 \
--lenpen 0.6 \
--output hypo.txt \
--quiet \
--remove-bpe
```

### For Abstractive Summarization Task

We use pyrouge as the scoring script.

```
python3 generate.py \
data-bin/$data_dir \
--path $model_dir/$checkpoint \
--gen-subset test \
--truncate-source \
--batch-size 32 \
--lenpen 2.0 \
--min-len 55 \
--max-len-b 140 \
--max-source-positions 500 \
--beam 4 \
--no-repeat-ngram-size 3 \
--remove-bpe

python3 get_rouge.py --decodes_filename $model_dir/hypo.sorted.tok --targets_filename cnndm.test.target.tok
```
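`generate.py` prints hypotheses out of test-set order; a common recipe to recover the sorted, tokenized hypotheses (a sketch, assuming the generation log was redirected to a placeholder file `gen.out` and the default `H-<id>` output format):

```
# keep hypothesis lines, sort them by sentence id, and keep only the text column
grep ^H- gen.out | sed 's/^H-//' | sort -n | cut -f3 > $model_dir/hypo.sorted.tok
```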

### For Grammatical Error Correction Task

We use m2scorer as the scoring script.

```
python3 generate.py \
data-bin/$data_dir \
--path $model_dir/$checkpoint \
--gen-subset test \
--batch-size 64 \
--beam 4 \
--lenpen 2.0 \
--output hypo.txt \
--quiet \
--remove-bpe

path/to/m2scorer path/to/model_output path/to/conll14st-test.m2
```

## Results

### Machine Translation

| Model                            | Layer | En-De | En-Fr |
| -------------------------------- | ----- | ----- | ----- |
| Residual-block (baseline)        | 6-6   | 29.21 | 42.89 |
| RK2-block (learnable $\gamma_i$) | 6-6   | 30.53 | 43.59 |
| Residual-block (baseline)        | 12-6  | 29.91 | 43.22 |
| RK2-block (learnable $\gamma_i$) | 12-6  | 30.76 | 44.11 |

### Abstractive Summarization Task

| Model                            | RG-1  | RG-2  | RG-L  |
| -------------------------------- | ----- | ----- | ----- |
| Residual-block                   | 40.47 | 17.73 | 37.29 |
| RK2-block (learnable $\gamma_i$) | 41.58 | 18.57 | 38.41 |
| RK4-block                        | 41.83 | 18.84 | 38.68 |

### Grammatical Error Correction Task

| Model                            | Prec. | Recall | $F_{0.5}$ |
| -------------------------------- | ----- | ------ | --------- |
| Residual-block                   | 67.97 | 32.17  | 55.61     |
| RK2-block (learnable $\gamma_i$) | 68.21 | 35.30  | 57.49     |
| RK4-block                        | 66.20 | 38.13  | 57.71     |