- xray_enc_dec_train.py is used to train the model. Outputs are written to xray_model_lr1-3.pth by default.
- xray_report_gen.py reads xray_model_lr1-3.pth and generates reports by starting from the start token and greedily picking the word with the highest probability at each step. It creates the files reports_preds_generated.csv and preds_only_generated.csv: reports_preds_generated.csv contains both the ground truth report and the generated report, while preds_only_generated.csv contains only the generated report, for use with the CheXpert Labeler.
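The greedy decoding step above can be sketched as follows. This is a minimal, dependency-free illustration, not the actual xray_report_gen.py code; `next_word_probs` is a hypothetical stand-in for the model's next-word distribution, and the token names are assumptions.

```python
def greedy_generate(next_word_probs, start_token, end_token, max_len=60):
    """Greedy decoding sketch: begin with the start token and repeatedly
    append the highest-probability next word until the end token appears.

    next_word_probs(report) is assumed to return a dict mapping each
    candidate word to its probability given the report so far.
    """
    report = [start_token]
    for _ in range(max_len):
        probs = next_word_probs(report)
        word = max(probs, key=probs.get)  # pick the most probable word
        report.append(word)
        if word == end_token:
            break
    return report
```

In the real script the distribution comes from the trained encoder-decoder; greedy argmax decoding is simple but can produce repetitive text compared to beam search or sampling.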
- compare_labels.py first runs the CheXpert Labeler, then compares the ground truth labels with the labels extracted from the generated reports.
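The comparison step might look like the sketch below: for each condition, count how often the predicted label matches the ground truth. This is an illustrative helper under assumed data shapes (rows as dicts mapping condition name to a CheXpert-style label), not the actual compare_labels.py logic.

```python
def per_label_accuracy(gt, pred, conditions):
    """Per-condition agreement between ground-truth and predicted labels.

    gt/pred: lists of rows (one per report); each row is a dict mapping a
    condition name to a label (e.g. 1.0 positive, 0.0 negative,
    -1.0 uncertain, as CheXpert Labeler emits).
    """
    acc = {}
    for c in conditions:
        matches = sum(g[c] == p[c] for g, p in zip(gt, pred))
        acc[c] = matches / len(gt)  # fraction of reports that agree
    return acc
```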
- dataset.py generates ground truth labels and scaled/normalized images from the given dataframes during training/testing.
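The shape of such a dataset can be sketched as below. The real dataset.py presumably works with pandas dataframes and PyTorch tensors; this dependency-free version only shows the `__len__`/`__getitem__` contract and the pixel scaling to [0, 1].

```python
class XrayDataset:
    """Minimal sketch of a map-style dataset: scales raw 0-255 pixel
    values to [0, 1] and returns (image, label) pairs.

    images: list of 2-D pixel grids (lists of lists of ints 0-255)
    labels: list of corresponding labels
    """

    def __init__(self, images, labels):
        self.images = images
        self.labels = labels

    def __len__(self):
        return len(self.images)

    def __getitem__(self, i):
        # scale each pixel into [0, 1]; real code would also resize
        # and normalize with dataset statistics
        img = [[px / 255.0 for px in row] for row in self.images[i]]
        return img, self.labels[i]
```

A class with this interface can be handed directly to a PyTorch DataLoader.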
- gen_vocab_datasets.py creates the vocabulary/tokens and splits the IU X-ray dataset into train and test sets.
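The two jobs of this script can be sketched as follows. Function names, special tokens, and the `min_freq` cutoff are assumptions for illustration, not the actual gen_vocab_datasets.py API.

```python
import random
from collections import Counter

def build_vocab(reports, min_freq=1):
    """Assign an integer id to each word seen at least min_freq times,
    reserving low ids for special tokens."""
    counts = Counter(w for r in reports for w in r.lower().split())
    vocab = {"<pad>": 0, "<start>": 1, "<end>": 2, "<unk>": 3}
    for w, c in counts.items():
        if c >= min_freq and w not in vocab:
            vocab[w] = len(vocab)
    return vocab

def train_test_split(items, test_frac=0.2, seed=0):
    """Shuffle with a fixed seed, then carve off a test fraction."""
    rng = random.Random(seed)
    items = items[:]            # copy so the caller's list is untouched
    rng.shuffle(items)
    k = int(len(items) * test_frac)
    return items[k:], items[:k]  # (train, test)
```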
- trainer.py handles the training/testing loop: dataloaders, learning rate scheduling, the optimizer, and backpropagation.
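The overall shape of that loop can be shown on a toy problem. This is a deliberately tiny, dependency-free sketch (fitting y = w*x by per-sample gradient descent with exponential learning-rate decay); the real trainer.py uses PyTorch dataloaders, an optimizer, and autograd instead of the hand-written gradient here.

```python
def train(xs, ys, epochs=100, lr=0.1, decay=0.99):
    """Toy training loop: forward pass, squared-error loss, manual
    gradient step, and a simple exponential learning-rate schedule."""
    w = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):          # one "batch" per sample
            pred = w * x                  # forward pass
            grad = 2 * (pred - y) * x     # d/dw of (w*x - y)^2
            w -= lr * grad                # optimizer step
        lr *= decay                       # learning-rate scheduling
    return w
```

The structure (outer epoch loop, inner batch loop, step, then scheduler update) mirrors what trainer.py does with real tensors.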
- utils.py contains seed-setting and sampler code for text generation.
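One typical sampler helper (in the spirit of the minGPT utilities this file draws on) is top-k filtering: keep only the k most probable words and renormalize before sampling. The function below is an assumed sketch over a plain probability dict, not the actual utils.py code.

```python
def top_k_filter(probs, k):
    """Keep the k most probable words and renormalize their probabilities.

    probs: dict mapping word -> probability
    Returns a new dict whose values sum to 1 over the top-k words.
    """
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    return {w: p / total for w, p in top}
```

Sampling from the filtered distribution trades a little greedy accuracy for more varied generated text.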
- mymodel.py defines the model architecture.
- tokenizer.py was used to experiment with BERT's WordPieceTokenizer, but the results were not promising.
We used some starter code from https://github.com/karpathy/minGPT, mostly trainer.py and utils.py, but most of our code, including the model architecture, was written by our team.