The following instructions reproduce the results reported in this paper.
The necessary packages can be installed as an Anaconda environment from the environment.yml
file.
First, change DATA_PATH in config.py to the directory where you would like to store your data.
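For reference, here is a minimal sketch of what config.py might contain; only DATA_PATH is confirmed above, and the derived paths (JSON_DIR, KS_DIR, KS_OUTPUT) are assumptions based on names referenced later in these instructions:

import os

DATA_PATH = "/path/to/your/data"                      # change to your storage directory
JSON_DIR = os.path.join(DATA_PATH, "json")            # assumed layout
KS_DIR = os.path.join(DATA_PATH, "knowledge_stream")  # assumed layout
KS_OUTPUT = os.path.join(KS_DIR, "output")            # assumed layout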
To download the Google Relation Extraction Corpus (GREC):
python get_data.py
To build a corpus of claims, consult the documentation for the Google FactCheck API and add your data to JSON_DIR as JSON files in the same format as the GREC.
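As a hedged illustration of that format (field names follow the public GREC release, but verify against the files downloaded above), a record could be written like this:

import json

record = {
    "pred": "/people/person/place_of_birth",  # GREC-style predicate
    "sub": "/m/0xxxx",                        # subject entity id (placeholder)
    "obj": "/m/0yyyy",                        # object entity id (placeholder)
    "evidences": [{"url": "https://example.com/article",
                   "snippet": "text supporting the claim"}],
    "judgments": [{"rater": "1", "judgment": "yes"}],
}

with open("my_claims.json", "w") as f:  # place the file under config.JSON_DIR
    json.dump([record], f, indent=2)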
To parse the snippets, first acquire a FRED API key and paste it into a text file at DATA_PATH/api_keys/fred_key_lmtd. Then:
python generate_fred_rdfs.py
This should produce a directory of RDF files, one for each JSON file.
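To sanity-check one of the generated RDF files, something along these lines should work (rdflib is an assumed dependency here, and the filename is hypothetical):

import rdflib

g = rdflib.Graph()
g.parse("example.rdf")  # any file from the generated RDF directory
print(len(g), "triples")
for s, p, o in list(g)[:5]:  # print a few triples
    print(s, p, o)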
The following script identifies the nodes that correspond to the subject and object of each snippet and stores them in a dataframe, to be used later when calculating shortest paths.
python identify_terminal_nodes.py
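The schema is not documented here, but the stored dataframe presumably maps each snippet to its two terminal nodes, roughly as follows (all column names and values are hypothetical):

import pandas as pd

terminals = pd.DataFrame([{
    "snippet_id": "grec-0001",
    "subject_node": "fred:Barack_Obama",  # hypothetical FRED node ids
    "object_node": "fred:Honolulu",
}])
print(terminals.head())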
Next, build the corpus graph:
python build_corpus_graph.py
Be sure to note the corpus graph file tag, i.e., corpus_graph-<tag>.pkl; it must be passed to the next script.
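To verify the build, the pickle can be loaded back (that the object is a networkx graph is an assumption):

import pickle

tag = "example"  # substitute your corpus graph file tag
with open(f"corpus_graph-{tag}.pkl", "rb") as f:
    G = pickle.load(f)
print(type(G))  # if networkx, G.number_of_nodes() etc. are available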
Then generate embeddings for the nodes of the corpus graph:
python generate_node_embeddings.py -tag <corpus_graph_file_tag>
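The embedding method is not specified in these instructions; one common choice for this kind of graph is node2vec, sketched below with the node2vec package purely as an illustration, not as the script's actual implementation:

import networkx as nx
from node2vec import Node2Vec

G = nx.karate_club_graph()  # stand-in for the corpus graph
n2v = Node2Vec(G, dimensions=64, walk_length=30, num_walks=50, workers=2)
model = n2v.fit(window=10, min_count=1)
vector = model.wv["0"]  # node ids are stored as strings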
To build the features for the relation classification training, run:
python build_shortest_path_df.py
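A hedged sketch of the shortest-path feature idea: take the path between the subject and object terminal nodes identified earlier and aggregate the node embeddings along it (the mean aggregation shown is an assumption):

import networkx as nx
import numpy as np

G = nx.path_graph(5)                      # stand-in for the corpus graph
emb = {n: np.random.rand(64) for n in G}  # stand-in for the learned embeddings
path = nx.shortest_path(G, source=0, target=4)  # subject to object terminal nodes
feature = np.mean([emb[n] for n in path], axis=0)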
Training is not currently implemented with cross-validation, so a train/test split must be generated first (adding cross-validation training is a TODO):
python test_train_splits.py
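In sklearn terms, the split step presumably looks something like this (a sketch under assumptions, not the actual implementation):

import pickle
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(100, 64)       # stand-in for features from build_shortest_path_df.py
y = np.random.randint(0, 2, 100)  # stand-in for relation labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

exp_tag = "example"  # hypothetical experiment tag
with open(f"X_train-{exp_tag}.pkl", "wb") as f:
    pickle.dump(X_train, f)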
The experiment tag must be provided to the training script. This is the tag attached to the split files, i.e., X_train-<exp_tag>.pkl:
python train.py --model-name "dnn_wide" -itag <exp_tag>
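"dnn_wide" presumably selects a wide feed-forward network; a minimal Keras sketch of that idea follows, with the input dimension, layer width, and class count all being guesses:

from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(64,)),                     # feature dimension is a placeholder
    keras.layers.Dense(1024, activation="relu"),  # the "wide" hidden layer
    keras.layers.Dense(5, activation="softmax"),  # one output per relation class
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])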
To test the model on the selected ClaimReview claims, run the code in classify_claimreview.ipynb, or export it to a Python script. Be sure to change the input filenames at the top of the notebook.
To fact-check the relevant claims from ClaimReview, first install Knowledge Stream, then download and extract its data to config.KS_DIR.
Run the code in prep_claims.ipynb, or export and run it as a Python script. Again, be sure to check the filenames used throughout the script.
If you are using my Knowledge Stream fork for Python 3, you will be prompted for the data directory when running kstream; enter <config.KS_DIR>/data. Then run:
kstream -m stream -d datasets/claimreview/claims.csv -o <config.KS_OUTPUT>
kstream -m klinker -d datasets/claimreview/claims.csv -o <config.KS_OUTPUT>
kstream -m relklinker -d datasets/claimreview/claims.csv -o <config.KS_OUTPUT>
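The exact schema of claims.csv should be checked against the sample datasets shipped with Knowledge Stream; roughly, each row encodes a (subject, predicate, object) triple of graph ids. A hedged sketch of producing one, with entirely hypothetical column names and ids:

import pandas as pd

claims = pd.DataFrame([
    {"sid": 12345, "pid": 678, "oid": 91011},  # hypothetical DBpedia node/relation ids
])
claims.to_csv("claims.csv", index=False)  # placed under datasets/claimreview/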
Finally, run the code in evaluate_kl.ipynb, or export and run it as a Python script. Again, be sure to check the filenames throughout the script.
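The evaluation presumably compares the scores emitted by the three kstream runs against the ClaimReview verdicts; a minimal sketch of one such comparison using ROC AUC (all values are stand-ins):

from sklearn.metrics import roc_auc_score

y_true = [1, 0, 1, 1, 0]            # stand-in claim verdicts
scores = [0.9, 0.2, 0.7, 0.4, 0.1]  # stand-in scores from one kstream run
print(roc_auc_score(y_true, scores))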