Commit

+ readme

Yondijr committed Jan 11, 2021
1 parent b076347 commit a4e1f25
Showing 3 changed files with 25 additions and 18 deletions.
35 changes: 21 additions & 14 deletions README.md
@@ -1,24 +1,24 @@
# On-grammar-improvements-of-GPT-2-generation

This repository includes the complete code for the paper XXXX. The purpose of this repository is to make the conducted experiments reproducible and give additional insights.

<img src="placeholder.png" alt="drawing" width="200"/>

## Getting Started

This repository includes the following things:

- Documentation of the Dataset building process
- Finetuning, Grammar Correction, and Generation scripts that were used during this research project
- Documentation of the complete evaluation process
- A mountain of generated samples that was used for evaluation
- Documentation of the model combination evaluation
- Documentation of example generation for our paper


## Built on

* [HuggingFace Transformers](https://github.com/huggingface/transformers)
* [language-check](https://pypi.org/project/language-check/)

### Installation

@@ -39,20 +39,27 @@ You should be good to go.
- The model checkpoints are needed to retrace everything. If needed, they are available by contacting Gideon-Stein.


## Finetuning scripts usage:
```
python run_lm_finetuning_frozen_3.py --output_dir=model_save/test --model_type=gpt2 --model_name_or_path=gpt2 --do_train --train_data_file=train.txt
```

## Generation scripts usage:
```
python transgenerator_translation.py --model_path=../trained_models/test/pytorch_model.bin --text_path ../build_data/test.txt --n_data 1000 --save_path test.p
python run_generation_edited.py --model_name_or_path=model_save/test/pytorch_model.bin --save_name test
```
Parameters can be added and changed according to the script.


## Grammar correction scripts usage:
```
python grammar_parser_json.py --path data/small-117M.train.jsonl --save_replace True --name test
python grammar_parser_txt.py --path base.txt --save_replace True --name test
```
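As the `grammar_parser_txt.py` hunk further down this commit shows, the correction scripts append GPT-2's `<|endoftext|>` token to each corrected sentence and pickle the result. A minimal stdlib-only sketch of that output format (the sentence and file name are illustrative, not from the repository):

```python
import pickle

# Hypothetical corrected sentences; the real script produces these via language-check.
autocorrect = ["This is a corrected sentence."]

# Append GPT-2's end-of-text token, mirroring the script's post-processing.
autocorrect = [s + "<|endoftext|>" for s in autocorrect]

# Persist in the same pickle format the script uses (file name is illustrative).
with open("EOS_corrected_v2_test.p", "wb") as f:
    pickle.dump(autocorrect, f)

# Reload to verify the round-trip.
with open("EOS_corrected_v2_test.p", "rb") as f:
    restored = pickle.load(f)
```

The `<|endoftext|>` delimiter lets the finetuning script later split the pickled corpus back into individual training documents.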


## Authors
6 changes: 3 additions & 3 deletions scripts/grammar_parser_txt.py
@@ -52,7 +52,7 @@ def main():
parser = argparse.ArgumentParser()
parser.add_argument("--path", default="test.txt", type=str)
parser.add_argument("--save_replace", default=False, type=bool)
parser.add_argument("--name", default="None", type=str)
args = parser.parse_args()

corpus = load(args.path)
@@ -64,8 +64,8 @@ def main():
autocorrect[x] = autocorrect[x] +"<|endoftext|>"


pickle.dump(autocorrect,open("EOS_corrected_v2_" + args.name + ".p","wb"))
pickle.dump(stats,open("EOS_stats_v2_" + args.name + ".p","wb"))
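One caveat with the argument definitions above: `argparse` with `type=bool` does not parse the string `"False"` as `False`, because any non-empty string is truthy in Python. A small stdlib-only demonstration of the pitfall as it applies to the `--save_replace` flag:

```python
import argparse

parser = argparse.ArgumentParser()
# Same pattern as in grammar_parser_txt.py: a bool-typed flag.
parser.add_argument("--save_replace", default=False, type=bool)

# Passing the literal string "False" still yields True,
# because bool("False") is True for any non-empty string.
args = parser.parse_args(["--save_replace", "False"])
print(args.save_replace)  # True

# Omitting the flag is the only way to get the False default here.
default_args = parser.parse_args([])
print(default_args.save_replace)  # False
```

In practice this means `--save_replace True` and `--save_replace False` behave identically on the command line; only leaving the flag off keeps the default.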



2 changes: 1 addition & 1 deletion scripts/run_lm_finetuning_frozen_3.py
@@ -397,7 +397,7 @@ def main():
parser.add_argument("--warmup_steps", default=0, type=int,
help="Linear warmup over warmup_steps.")

parser.add_argument('--logging_steps', type=int, default=5000,
help="Log every X updates steps.")
parser.add_argument('--save_steps', type=int, default=50,
help="Save checkpoint every X updates steps.")
