-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME.txt
143 lines (81 loc) · 3.04 KB
/
README.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
#
# Juliane Bruck 8297746
#
ASSIGNMENT #2
==============
BEFORE RUNING:
make sure you have python 3.11.5 installed as well as the following dependencies:
pip install evaluate
pip uninstall urllib3
pip install urllib3
pip install transformers
pip install --upgrade transformers
(you need version 4.39.1 of transformers library see data config function in model_bert.py)
# Required packages
datasets>=2.12.0
pandas>=1.5.3
scikit-learn
transformers>=4.38.2
torch>=2.1.2
accelerate>=0.28.0
# Additional packages
numpy>=1.24.2
scipy>=1.11.3
==========================
This assignment contains 5 python files:
model_bert.py
model_gpt.py
svm_model.py
format_checker.py
scorer.py
No need to worry about the format_checker.py file as it is being called by the scorer.py file.
In order to run the code properly, you sequentially run the following files:
SVM Model
--------
At the command prompt, type:
> python ./model_svm.py
Uses the following data:
dev_data = load_data(
'.//data//SubtaskA//medium//subtaskA_dev_monolingual.jsonl')
train_data = load_data(
'.//data//SubtaskA//medium//subtaskA_train_monolingual.jsonl')
test_data = load_data('.//data//SubtaskA//medium//subtaskA_monolingual.jsonl')
output:
Results_model_svm.jsonl
For results, at the command prompt, type:
> python scorer.py --gold_file_path .\data\SubtaskA\subtaskA_monolingual_gold.jsonl --pred_file_path .\Results_model_svm.jsonl
INFO : Prediction file format is correct
INFO : macro-F1=0.60538 micro-F1=0.64474 accuracy=0.64474
BERT Model
-----------
At the command prompt, type:
> python ./model_bert.py
Uses the following data:
dev_data = load_data(
'.//data//SubtaskA//medium//subtaskA_dev_monolingual.jsonl')
train_data = load_data(
'.//data//SubtaskA//medium//subtaskA_train_monolingual.jsonl')
test_data = load_data('.//data//SubtaskA//medium//subtaskA_monolingual.jsonl')
output:
Results_model_bert.jsonl
For results, at the command prompt, type:
> python scorer.py --gold_file_path .\data\SubtaskA\subtaskA_monolingual_gold.jsonl --pred_file_path .\Results_model_bert.jsonl
>>
INFO : Prediction file format is correct
INFO : macro-F1=0.74996 micro-F1=0.75177 accuracy=0.75177
GPT-2 Model
-----------
> python ./model_gpt.py
Uses the following data:
dev_data = load_data(
'.//data//SubtaskA//medium//subtaskA_dev_monolingual.jsonl')
train_data = load_data(
'.//data//SubtaskA//medium//subtaskA_train_monolingual.jsonl')
test_data = load_data('.//data//SubtaskA//medium//subtaskA_monolingual.jsonl')
output:
Results_model_gpt.jsonl
For results, at the command prompt, type:
> python scorer.py --gold_file_path .\data\SubtaskA\subtaskA_monolingual_gold.jsonl --pred_file_path .\Results_model_gpt.jsonl
>>
INFO : Prediction file format is correct
INFO : macro-F1=0.36384 micro-F1=0.40789 accuracy=0.40789