Commit

initial push
Yondijr committed Jan 11, 2021
1 parent 63cdb36 commit ca356e4
Showing 187 changed files with 70,343 additions and 0 deletions.
741 changes: 741 additions & 0 deletions Build_datasets.ipynb

Large diffs are not rendered by default.

476 changes: 476 additions & 0 deletions Combine both methods.ipynb

Large diffs are not rendered by default.

645 changes: 645 additions & 0 deletions Evaluate Models on Lambada.ipynb

Large diffs are not rendered by default.

1,486 changes: 1,486 additions & 0 deletions Evaluate_grammar.ipynb

Large diffs are not rendered by default.

374 changes: 374 additions & 0 deletions Evaluate_translation.ipynb

Large diffs are not rendered by default.

140 changes: 140 additions & 0 deletions Get examples.ipynb
@@ -0,0 +1,140 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Get corrected samples from the translation model\n",
"- Examples of mistakes that the translation model makes when translating\n",
"- Usage: OBJ includes all relevant data\n",
"- .upgrade_example(Rule, n) gives you examples of a specific rule applied successfully\n",
"- .copy_example(Rule, n) gives you examples of copied mistakes\n",
"- .get_mistake_types() gives a complete list of all mistake types in the translations and in the originals\n",
"\n",
"\n",
"Results: Everything works reasonably well, although with some small mistakes."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [],
"source": [
"from transformers import GPT2Tokenizer\n",
"from utility import *"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"23 were deleted since they had more than99 mistakes\n",
"42004 sentences had no grammar mistakes.\n"
]
}
],
"source": [
"OBJ = filter_examples()"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"TRANSLATIONS:dict_keys(['EN_UNPAIRED_BRACKETS', 'MORFOLOGIK_RULE_EN_US', 'CD_NN', 'HE_VERB_AGR', 'A_PLURAL', 'SENTENCE_FRAGMENT', 'ENGLISH_WORD_REPEAT_BEGINNING_RULE', 'MANY_NN', 'I_LOWERCASE', 'ITS_JJ_NNSNN', 'I_AM', 'ALLY_ALLAY', 'NO_SPACE_CLOSING_QUOTE', 'UPPERCASE_SENTENCE_START', 'COMMA_PARENTHESIS_WHITESPACE', 'ENGLISH_WORD_REPEAT_RULE', 'SENTENCE_WHITESPACE', 'A_INFINITVE', 'THIS_NNS', 'EN_A_VS_AN', 'A_LOT_OF_NN', 'NON3PRS_VERB', 'I_A', 'THE_SUPERLATIVE', 'IT_SELF', 'GENERAL_XX', 'PROGRESSIVE_VERBS', 'AS_ADJ_AS', 'POSSESSIVE_APOSTROPHE', 'IT_VBZ', 'FEWER_LESS'])\n",
"Original:dict_keys(['UPPERCASE_SENTENCE_START', 'EN_UNPAIRED_BRACKETS', 'EN_QUOTES', 'MORFOLOGIK_RULE_EN_US', 'COMMA_PARENTHESIS_WHITESPACE', 'CD_NN', 'WHITESPACE_RULE', 'DOUBLE_PUNCTUATION', 'HE_VERB_AGR', 'A_PLURAL', 'SENTENCE_FRAGMENT', 'ENGLISH_WORD_REPEAT_BEGINNING_RULE', 'SENTENCE_WHITESPACE', 'CANT', 'EN_CONTRACTION_SPELLING', 'I_LOWERCASE', 'ENGLISH_WORD_REPEAT_RULE', 'ITS_JJ_NNSNN', 'PRP_PAST_PART', 'AM_I', 'I_AM', 'EN_A_VS_AN', 'SO_AS_TO', 'EN_COMPOUNDS', 'IT_IS', 'ADVISE_VBG', 'SENT_START_CONJUNCTIVE_LINKING_ADVERB_COMMA', 'COMP_THAN', 'A_INFINITVE', 'NON3PRS_VERB', 'THIS_NNS', 'PHRASE_REPETITION', 'I_A', 'THE_SUPERLATIVE', 'MANY_NN', 'IT_SELF', 'ALL_OF_THE', 'GENERAL_XX', 'PROGRESSIVE_VERBS', 'AS_ADJ_AS', 'TRY_AND', 'DT_PRP', 'POSSESSIVE_APOSTROPHE', 'IT_VBZ', 'ONES', 'DT_DT', 'WHETHER', 'SAY_TELL', 'FEWER_LESS', 'ABOUT_ITS_NN', 'ONE_OF_THE_ONLY', 'MUCH_COUNTABLE', 'THESE_ONES'])\n"
]
}
],
"source": [
"OBJ.get_mistake_types()"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Mistake type: MORFOLOGIK_RULE_EN_US\n",
"Original:think about one night being on a vacation, the readers might realize this is quite \"almost \"sonics\", and yet there's enough time!\n",
"Translation: Think about one night being on a vacation, the readers might realize this is quite “almost “sonic”, and yet there's enough time!<|endoftext|>\n",
"1\n"
]
}
],
"source": [
"OBJ.upgrade_example('MORFOLOGIK_RULE_EN_US',1)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Mistake type: A_PLURAL\n",
"Original:\"Others emerge in other ways as part of more elaborate hacking scrambles or investigation A Plays -- cyber information, the way a whole hacker crew or company sort of knows for certain, because, you know, they suggest they probably need something to provide it to the government.\"\n",
"\n",
"The botnet technology is also highly invasive and unpredictable, even by U.S.\n",
"Translation: “Others emerge in other ways as part of more elaborate hacking scrambles or investigation A Plays -- cuber information, the way a whole hacker crew or company sort of knows for certain, because, you know, they suggest they probably need something to provide it to the government.”\n",
"\n",
"The bonnet technology is also highly invasive and unpredictable, even by U.S.<|endoftext|>\n",
"Correct:“Others emerge in other ways as part of more elaborate hacking scrambles or investigation A play -- cuber information, the way a whole hacker crew or company sort of knows for certain, because, you know, they suggest they probably need something to provide it to the government.”\n",
"\n",
"The bonnet technology is also highly invasive and unpredictable, even by U.S.\n",
"53\n"
]
}
],
"source": [
"OBJ.copy_example('A_PLURAL',1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.9"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
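The notebook above only calls into `utility`, whose source is not shown in this diff. As a rough illustration of the idea, the sketch below groups sentence pairs by rule ID into "upgraded" (the rule fires on the original but not on the translation) and "copied" (the rule fires on both). The names `Pair` and `ExampleStore` are hypothetical, and the rule sets are assumed to be precomputed by a grammar checker; this is not the repository's implementation, and unlike the notebook's methods it returns values instead of printing them.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Pair:
    """One original sentence, the model output, and the rule IDs flagged in each."""
    original: str
    translation: str
    original_rules: set
    translation_rules: set


class ExampleStore:
    """Hypothetical stand-in for OBJ; an assumption, not the code in utility.py."""

    def __init__(self, pairs):
        self.upgraded = defaultdict(list)  # mistake present in original, gone in translation
        self.copied = defaultdict(list)    # mistake present in both
        self.original_types = set()
        self.translation_types = set()
        for pair in pairs:
            self.original_types |= pair.original_rules
            self.translation_types |= pair.translation_rules
            for rule in pair.original_rules:
                if rule in pair.translation_rules:
                    self.copied[rule].append(pair)
                else:
                    self.upgraded[rule].append(pair)

    def get_mistake_types(self):
        """Return (translation rule IDs, original rule IDs), each sorted."""
        return sorted(self.translation_types), sorted(self.original_types)

    def upgrade_example(self, rule, n):
        """Return up to n pairs where the model removed the given mistake."""
        return self.upgraded[rule][:n]

    def copy_example(self, rule, n):
        """Return up to n pairs where the model kept the given mistake."""
        return self.copied[rule][:n]
```

Keeping the two groups in separate dictionaries keyed by rule ID makes both lookups a single index operation, which is convenient when browsing examples interactively.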
49 changes: 49 additions & 0 deletions examples.txt
@@ -0,0 +1,49 @@
Mistake type: ENGLISH_WORD_REPEAT_RULE
Original:Videos show squads of senior SWAT team members often breaking at any moment to chase chase down the active shooter.

Translation: Videos show squads of senior SWAT team members often breaking at any moment to chase down the active shooter.<|endoftext|>
128

Mistake type: UPPERCASE_SENTENCE_START
Original:is this what Revival has talked about first?

Translation: Is this what Revival has talked about first?<|endoftext|>

Original:citizen, has welcomed trial for Benghazi burn inmate Dacwan Heqyar, the aging executioner, worried the government-set model of direct punishment for terrorists could dissuade most of his followers from further involvement and make him reluctant to use force against others.
Translation: Citizen, has welcomed trial for Benghazi burn inmate Nathan Hekmatyar, the aging executioner, worried the government-set model of direct punishment for terrorists could dissuade most of his followers from further involvement and make him reluctant to use force against others.<|endoftext|>

Mistake type: EN_A_VS_AN
Original:

Gabhal, an strong contender.
Translation:

Gabriel, a strong contender.<|endoftext|>
339

Original:Could it be that he was moving forward with a boycott of the clerical structures that are acting as the main instrument in an economic hashing out timeframe or 2022 seeing for Sringla temporary signedand a envoy within the next few days (all while calming down the imperialist Moynihanlander who is planning to draw the IBWI Chief Su, the Mother Church.) If Lee Sukhaile was issuing Sools Janata listsheis [2st Class Orders of the U.N., or Sinn Liturgy Tagn., which is the Tidanga Priesthood mentioned by ACLU.
Translation: Could it be that he was moving forward with a boycott of the clerical structures that are acting as the main instrument in an economic hashing out time frame or 2022 seeing for Single temporary signed and an envoy within the next few days (all while calming down the imperialist Moynihanlander who is planning to draw the IBWI Chief So, the Mother Church.) If Lee Sukhaile was issuing Tools Jana ta list shears [2st Class Orders of the U.N., or Sign Liturgy Tag., which is the Tidal Priesthood mentioned by ACLU.<|endoftext|>
214

Mistake type: SENT_START_CONJUNCTIVE_LINKING_ADVERB_COMMA /ALL upgraded
Original:Thus more growth is experienced in human females with a delivery 7–15 days later as compared to male sheep's acceptance and this is thought to be only seen in sheep four weeks post KDR since then.
Translation: Thus, more growth is experienced in human females with a delivery 7–15 days later than compared to male sheep's acceptance and this is thought to be only seen in sheep four weeks post KDR since then.<|endoftext|>
296
Original:Also because females are more likely to care for other members of their group, sterile (Percent Prem Length Head bearing) male sheep are at higher risk for poor health and impairment, and also likely to have better amenities [ 19 ].

Translation: Also, because females are more likely to care for other members of their group, sterile (Percent Poem Length Head bearing) male sheep are at higher risk for poor health and impairment, and also likely to have better amenities [19].<|endoftext|>

Mistake type: CD_NN
Original:We recommend 10 ring for long-term comfort and SAFE fit.
Translation: We recommend 10 rings for long-term comfort and SAFE fit.<|endoftext|>

BAD EXAMPLES:

Mistake type: MUCH_COUNTABLE /ALL upgraded
Original:How much does it cost for $20 or $35 for one year?
Translation: ' How many does it cost for $20 or $35 for one year?<|endoftext|>'

Mistake type: SAY_TELL /ALL upgraded
Original:McGill, who says her family did not have a law enforcement incident report as of 5 p.m., is dubious Francis publicly expressed remorse.

' McGill, who tells her family did not have a law enforcement incident report as of 5 p.m., is dubious Francis publicly expressed remorse.<|endoftext|>'
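The records in examples.txt follow a loose pattern: a `Mistake type:` line, then one or more `Original:` / `Translation:` pairs, with each translation ending in `<|endoftext|>`. A minimal sketch of a reader for that pattern might look like the following. The function name `parse_examples` is hypothetical, and the layout is inferred only from the entries shown above; records that lack a `Translation:` prefix, such as those under BAD EXAMPLES, are skipped.

```python
END = "<|endoftext|>"


def parse_examples(text):
    """Return (rule, original, translation) tuples from examples.txt-style text."""
    records = []
    rule, original, buf, mode = None, None, [], None
    for line in text.splitlines():
        if line.startswith("Mistake type:"):
            # Drop trailing annotations such as "/ALL upgraded".
            rule = line[len("Mistake type:"):].split("/")[0].strip()
            mode = None
        elif line.startswith("Original:"):
            buf = [line[len("Original:"):]]
            mode = "original"
        elif line.startswith("Translation:"):
            original = "\n".join(buf).strip()
            buf = [line[len("Translation:"):]]
            mode = "translation"
        elif mode is not None:
            # Continuation line of a multi-line original or translation.
            buf.append(line)
        if mode == "translation" and END in buf[-1]:
            translation = "\n".join(buf).split(END)[0].strip()
            records.append((rule, original, translation))
            mode = None  # ignore trailing lines (e.g. a bare number) until the next marker
    return records
```

Treating the end-of-text token as the record terminator means the bare numbers that follow some translations are ignored without the parser needing to know what they mean.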
Binary file not shown.
Binary file added generated/3layer/train/3Layer_dream_1.p
Binary file not shown.
Binary file added generated/3layer/train/3Layer_dream_2.p
Binary file not shown.
Binary file added generated/3layer/train/3Layer_dream_3.p
Binary file not shown.
Binary file added generated/3layer/train/3Layer_dream_4.p
Binary file not shown.
Binary file added generated/6layer/train/6_layer_1.p
Binary file not shown.
Binary file added generated/6layer/train/6_layer_10.p
Binary file not shown.
Binary file added generated/6layer/train/6_layer_11.p
Binary file not shown.
Binary file added generated/6layer/train/6_layer_12.p
Binary file not shown.
Binary file added generated/6layer/train/6_layer_13.p
Binary file not shown.
Binary file added generated/6layer/train/6_layer_14.p
Binary file not shown.
Binary file added generated/6layer/train/6_layer_15.p
Binary file not shown.
Binary file added generated/6layer/train/6_layer_16.p
Binary file not shown.
Binary file added generated/6layer/train/6_layer_17.p
Binary file not shown.
Binary file added generated/6layer/train/6_layer_18.p
Binary file not shown.
Binary file added generated/6layer/train/6_layer_19.p
Binary file not shown.
Binary file added generated/6layer/train/6_layer_2.p
Binary file not shown.
Binary file added generated/6layer/train/6_layer_20.p
Binary file not shown.
Binary file added generated/6layer/train/6_layer_3.p
Binary file not shown.
Binary file added generated/6layer/train/6_layer_4.p
Binary file not shown.
Binary file added generated/6layer/train/6_layer_5.p
Binary file not shown.
Binary file added generated/6layer/train/6_layer_6.p
Binary file not shown.
Binary file added generated/6layer/train/6_layer_7.p
Binary file not shown.
Binary file added generated/6layer/train/6_layer_8.p
Binary file not shown.
Binary file added generated/6layer/train/6_layer_9.p
Binary file not shown.