This repository contains code for the paper Reverse-engineering NLI: A study of the meta-inferential properties of Natural Language Inference.
By meta-inferential, we mean properties like the transitivity of entailment:
Given SNLI examples
$(P_1, H_1, E)$ and $(P_2, H_2, E)$, both with label $E$ for entailment: if $P_2 = H_1$, then we can conclude that $(P_1, H_2, E)$ is also a valid example.
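The transitivity pattern above can be sketched in a few lines of Python. The function name and tuple representation here are illustrative, not part of this repository's API:

```python
def compose_entailments(ex1, ex2):
    """If (P1, H1, E) and (P2, H2, E) overlap with P2 == H1,
    transitivity of entailment licenses the new example (P1, H2, E)."""
    p1, h1, l1 = ex1
    p2, h2, l2 = ex2
    if l1 == l2 == "entailment" and p2 == h1:
        return (p1, h2, "entailment")
    return None  # the pattern does not apply

ex1 = ("A man is playing a guitar.", "A man is playing an instrument.", "entailment")
ex2 = ("A man is playing an instrument.", "A man is making music.", "entailment")
print(compose_entailments(ex1, ex2))
# composes into ("A man is playing a guitar.", "A man is making music.", "entailment")
```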
More generally, we are interested in what can be inferred from NLI examples with any overlap between their premises and hypotheses.
One reason this is interesting is that the valid meta-inferential patterns depend on how the NLI labels of entailment, contradiction and neutral
are interpreted.
By observing which meta-inferential patterns models trained on a particular NLI dataset validate,
we can reverse-engineer which reading of inference labels the model learned from the data.
We exploit three sources of overlapping sentences to test meta-inferential patterns for models trained on SNLI:
- Each SNLI premise is used to construct up to three different examples (one for each label). Given examples $(P, H_1, L_1)$ and $(P, H_2, L_2)$, we can consider how the model deals with $(H_1, H_2)$ and $(H_2, H_1)$.
- We also use LLMs to generate new examples. Given $(P, H, L)$, we ask the model to generate an example $(H, H', L')$ for each label and test model predictions for $(P, H')$ and $(H', P)$.
- Given an example $(P, H, L)$, we can get model predictions for $(H, P)$.
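The first and third schemes can be sketched as simple enumerations over SNLI-style `(premise, hypothesis, label)` triples. This is a hypothetical illustration of the pairing logic, not the repository's exact code:

```python
from collections import defaultdict
from itertools import permutations

def shared_premise_pairs(examples):
    """Scheme 1: hypotheses that share a premise yield test items
    (H1, H2) and (H2, H1), carrying along their original labels."""
    by_premise = defaultdict(list)
    for p, h, l in examples:
        by_premise[p].append((h, l))
    for hyps in by_premise.values():
        # all ordered pairs of distinct hypotheses for this premise
        for (h1, l1), (h2, l2) in permutations(hyps, 2):
            yield (h1, h2, l1, l2)

def reversal_items(examples):
    """Scheme 3: each (P, H, L) yields the reversed test item (H, P)."""
    for p, h, l in examples:
        yield (h, p, l)
```

What label the model *should* predict for each derived pair depends on the reading of the NLI labels, which is exactly what the paper's meta-inferential analysis probes.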
... more details in the paper!
The main data used is SNLI, which can be downloaded here. The code expects to find it at `data/snli_1.0`.
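For reference, SNLI ships as JSONL files with the standard field names `sentence1`, `sentence2`, and `gold_label`; a minimal loader sketch (function names are illustrative, not the repository's API) might look like:

```python
import json
import os

def parse_snli_line(line):
    """Parse one SNLI JSONL record into a (premise, hypothesis, label) triple.
    Records with gold_label "-" (no annotator consensus) are dropped."""
    ex = json.loads(line)
    if ex["gold_label"] == "-":
        return None
    return (ex["sentence1"], ex["sentence2"], ex["gold_label"])

def load_snli_split(data_dir="data/snli_1.0", split="train"):
    path = os.path.join(data_dir, f"snli_1.0_{split}.jsonl")
    with open(path) as f:
        return [ex for ex in map(parse_snli_line, f) if ex is not None]
```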
- `data.py` - data loader for SNLI and extended SNLI datasets
- `find_prompt_examples.py` - searches SNLI for examples with "perfect" agreement among re-annotators and saves them for prompting the LLM
- `generate_nli_items.py` - uses an LLM to generate new items based on SNLI hypotheses
- `infer_nli_items.py` - creates a dataset of new items according to the schemes described above
We test a vanilla BERT model with a 3-way classifier on the pooler output, and RoBERTa + Self-Explaining, a recent state-of-the-art model on SNLI.
The self-explaining code is adapted from the paper repository found here.
- `nli.py` - train the NLI models
- `evaluate.py` - produce model predictions for the SNLI, generated, and inferred test sets
- `analysis.py` - create the tables found in the paper