Skip to content

Shaumik-Ashraf/BART-MIMIC-CXR

Repository files navigation

BART-MIMIC-CXR

Use BART word embedder to do the abstractive summarization NLP task on radiology reports from MIMIC-CXR v2.0.0.

Use:

  • clone this repository
  • in this repository, mkdir data
  • from MIMIC-CXR download cxr-study-list.csv.gz and place it in the data folder
  • also from MIMIC-CXR, download mimic-cxr-reports.zip, unzip it, and place it in the data folder
  • run python preprocessing, which generates train.csv, validate.csv, and test.csv in the data folder
  • use the notebook/*.ipynb files to run the various BART models
    • the baseline model creates a summaries.csv file in data/ that contains both ground-truth and generated summaries
    • the fine-tuned and fune-tuned with truncation models create a .txt file, that must be converted to a .csv file by running python merge_txt_csv.py
  • run python rouge_score or python word_count to get the ROUGE scores and word count of the summaries respectively.

Note: You may need to go into the python file to set the proper path or filename, which is all configurable from variables at the top of the file.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published