This repository contains my scripts and some visualizations for my bachelor thesis "Medical concept PROBLEM: Polarity, Modality and Temporal Relations".
PROBLEM is considered one of the key medical concepts because it plays a vital role in retrieving medical information about a patient. In my thesis, I conducted several experiments to extract three aspects of this concept (polarity, modality, and temporal relations), based on the guidelines and annotated data from the i2b2 2010 (Uzuner et al., 2011) and i2b2 2012 (Sun et al., 2013) datasets.
The thesis aims to show the effects of different techniques on clinical data. These include Bi-RNN-based models (Bi-GRU, Bi-LSTM), contextual embeddings (BERT, FLAIR), embeddings pre-trained on domain-specific data (ClinicalBERT, HunFLAIR), and fine-tuning (Lin et al., 2018).
The experiments were conducted as follows:
- Bi-GRU models were used for the Polarity and Modality tasks.
- A Bi-LSTM model was used for the Temporal Relations task on entity pairs within a single sentence (sentence boundaries were respected).
- A "universal model", obtained by fine-tuning different pre-trained BERT models, was used for the Temporal Relations task on data that also includes cross-sentence cases (that is, relations between entities anywhere in the text, where they exist).
In addition, to examine the importance of entity markup (how entities are marked when they are fed into a pre-trained language model), XML and non-XML markers were compared, following Lin et al. (2019).
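To make the idea of entity markup concrete, here is a minimal sketch of how two entity spans could be wrapped in marker tokens before the text is passed to a language model. The marker strings, the function name `mark_entities`, and the token-index span format are assumptions for illustration only; the scripts in this repository may mark entities differently.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end is exclusive
Markers = Dict[str, Tuple[str, str]]

# Hypothetical marker sets, chosen only for illustration.
XML_MARKERS: Markers = {"e1": ("<e1>", "</e1>"), "e2": ("<e2>", "</e2>")}
NON_XML_MARKERS: Markers = {"e1": ("es", "ee"), "e2": ("ts", "te")}


def mark_entities(tokens: List[str], first: Span, second: Span,
                  markers: Markers) -> List[str]:
    """Return a new token list with markers around two spans.

    Assumes both spans are non-empty and do not overlap.
    """
    opens = {first[0]: markers["e1"][0], second[0]: markers["e2"][0]}
    closes = {first[1]: markers["e1"][1], second[1]: markers["e2"][1]}
    marked: List[str] = []
    for index, token in enumerate(tokens):
        # Close a span that ended just before this token, then open a new one.
        if index in closes:
            marked.append(closes[index])
        if index in opens:
            marked.append(opens[index])
        marked.append(token)
    # A span may end at the very last token of the text.
    if len(tokens) in closes:
        marked.append(closes[len(tokens)])
    return marked


tokens = "the patient denied chest pain after surgery".split()
xml_text = " ".join(mark_entities(tokens, (3, 5), (6, 7), XML_MARKERS))
plain_text = " ".join(mark_entities(tokens, (3, 5), (6, 7), NON_XML_MARKERS))
```

With XML-style markers, the special tokens are usually easy to tell apart from ordinary words; with non-XML markers they look like regular tokens, which is one reason the two variants may behave differently after tokenization.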
Each PROBLEM entity carries two attributes: polarity (NEG, POS) and modality (FACTUAL, NONFACTUAL).
All temporal relations (BEFORE, AFTER, OVERLAP) for each pair <PROBLEM, OTHER EVENT> are reported, both within a sentence and across sentences.
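The label scheme above could be represented roughly as follows. This is only a sketch: the class names, the `sentence_index` field, and the `pair_scope` helper are hypothetical and are not taken from the repository code.

```python
from dataclasses import dataclass
from enum import Enum


class Polarity(Enum):
    NEG = "NEG"
    POS = "POS"


class Modality(Enum):
    FACTUAL = "FACTUAL"
    NONFACTUAL = "NONFACTUAL"


class TemporalRelation(Enum):
    BEFORE = "BEFORE"
    AFTER = "AFTER"
    OVERLAP = "OVERLAP"


@dataclass(frozen=True)
class Problem:
    text: str
    sentence_index: int  # hypothetical: index of the containing sentence
    polarity: Polarity
    modality: Modality


@dataclass(frozen=True)
class OtherEvent:
    text: str
    sentence_index: int


def pair_scope(problem: Problem, other: OtherEvent) -> str:
    """Say whether a <PROBLEM, OTHER EVENT> pair is within or across sentences."""
    if problem.sentence_index == other.sentence_index:
        return "intra"
    return "inter"


problem = Problem("chest pain", 0, Polarity.NEG, Modality.FACTUAL)
same_sentence = OtherEvent("admission", 0)
later_sentence = OtherEvent("surgery", 2)
```

Here `pair_scope(problem, same_sentence)` yields `"intra"` and `pair_scope(problem, later_sentence)` yields `"inter"`, mirroring the distinction between the within-sentence and cross-sentence experiments.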
.
├── scripts - scripts to run all experiments
│   ├── __init__.py
│   ├── eval
│   │   ├── __init__.py
│   │   └── plotters.py
│   ├── main.py
│   ├── nn
│   │   ├── __init__.py
│   │   ├── config.py
│   │   ├── metrics.py
│   │   ├── models.py
│   │   └── processors.py
│   ├── prep
│   │   ├── __init__.py
│   │   ├── elements.py
│   │   ├── embeddings.py
│   │   └── reformatter.py
│   ├── requirements.txt
│   └── utils
│       ├── __init__.py
│       ├── commanders.py
│       ├── loaders.py
│       └── savers.py
└── visualization - some interesting visualizations from analyzing the models in the Temporal Relations task
- Clone the repo and change into the scripts directory.
- Install the dependencies listed in requirements.txt:
    pip install -r requirements.txt
- Command Line Execution:
  Run the script main.py with the desired command:
    python main.py <command> [<args>]
  Replace <command> with one of the available subcommands (prep, mod, or plot). Also, provide any required or optional arguments specific to the chosen subcommand.
- Available Commands:
  - prep: Run preprocessing.
  - mod: Run the model.
  - plot: Run the plotter.
- Subcommand-specific Arguments:
  For each subcommand, there are specific arguments you need to provide.
  - Preprocessing (prep):
      python main.py prep <input_file/directory> -o <output_folder> --dtype <train/test> --compress
    For more options, run python main.py prep -h.
  - Modeling (mod):
      python main.py mod <train_file> <test_file> --labels 0 1 2 --train_prop 0.9 --model_path bert-base-uncased
    For more options, run python main.py mod -h.
  - Plotting (plot):
      python main.py plot <inferred_json_file> --output <output_folder> --ptype cf_matrix --noerror --labels 0 1 2
    For more options, run python main.py plot -h.
- Review Help Information:
  If you're unsure about the available commands and their arguments, you can review the help information by running the script with the -h or --help option:
    python main.py -h
  This prints the available commands and their respective arguments.
- Configurations:
  The script relies on configurations (PrepConfig, ModelConfig, EmbeddingConfig, PlotConfig) to define the settings for each phase. These configurations are created from the provided command-line arguments.
- Check Results:
  Depending on the command, the script may generate outputs such as preprocessed data, trained models, or plots. Check the specified output folders for the results.
Remember to replace placeholder values such as <input_file/directory>, <output_folder>, <train_file>, <test_file>, and others with your actual file paths or values.
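The command-line layout described above (one script, three subcommands, and a configuration object built from the parsed arguments) can be sketched with the standard library as follows. This is not the code in main.py or nn/config.py: the fields of `ModelConfig`, the default values, and the helper names `build_parser` and `model_config_from_args` are assumptions based only on the example commands shown in this README.

```python
import argparse
from dataclasses import dataclass
from typing import List


@dataclass
class ModelConfig:
    # Hypothetical fields; the real ModelConfig may hold different settings.
    train_file: str
    test_file: str
    labels: List[int]
    train_prop: float
    model_path: str


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="main.py")
    subparsers = parser.add_subparsers(dest="command", required=True)

    prep = subparsers.add_parser("prep", help="Run preprocessing.")
    prep.add_argument("input", help="input file or directory")
    prep.add_argument("-o", dest="output", help="output folder")
    prep.add_argument("--dtype", choices=["train", "test"])
    prep.add_argument("--compress", action="store_true")

    mod = subparsers.add_parser("mod", help="Run the model.")
    mod.add_argument("train_file")
    mod.add_argument("test_file")
    # Defaults below are assumed, copied from the example command.
    mod.add_argument("--labels", nargs="+", type=int, default=[0, 1, 2])
    mod.add_argument("--train_prop", type=float, default=0.9)
    mod.add_argument("--model_path", default="bert-base-uncased")

    plot = subparsers.add_parser("plot", help="Run the plotter.")
    plot.add_argument("inferred_json_file")
    plot.add_argument("--output", help="output folder")
    plot.add_argument("--ptype", default="cf_matrix")
    plot.add_argument("--noerror", action="store_true")
    plot.add_argument("--labels", nargs="+", type=int, default=[0, 1, 2])
    return parser


def model_config_from_args(args: argparse.Namespace) -> ModelConfig:
    """Build the configuration for the modeling phase from parsed arguments."""
    return ModelConfig(
        train_file=args.train_file,
        test_file=args.test_file,
        labels=args.labels,
        train_prop=args.train_prop,
        model_path=args.model_path,
    )


example_args = build_parser().parse_args(
    ["mod", "train.json", "test.json", "--labels", "0", "1", "2",
     "--train_prop", "0.9", "--model_path", "bert-base-uncased"]
)
example_config = model_config_from_args(example_args)
```

Keeping the parsed arguments and the configuration object separate means each phase can be called from other Python code with a hand-built configuration, without going through the command line.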
- Chen Lin, Timothy Miller, Dmitriy Dligach, Steven Bethard, and Guergana Savova. A BERT-based Universal Model for Both Within- and Cross-sentence Clinical Temporal Relation Extraction. June 2019.
- Chen Lin, Timothy Miller, Dmitriy Dligach, Farig Sadeque, Steven Bethard, and Guergana Savova. A BERT-based One-Pass Multi-Task Model for Clinical Temporal Relation Extraction. July 2020.