This repository contains code for building machine learning models that predict imminent ICU admission and prolonged ICU stay using clinical notes only. The notes are part of the MIMIC-III database. The SQL script for extracting the data from the database can be found [here](https://github.com/sudarshan85/mimic_extraction/tree/master/imminent-adm-prolonged-stay-with-notes).

Our procedure for building the models is as follows:
- Extract the notes from the database with relevant conditions as detailed in the SQL script
- Tokenize the notes with scispaCy
- Convert the tokenized text to numeric features with TF-IDF
- Build logistic regression, random forest, and gradient boosting machine models
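
The steps above can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: it assumes scikit-learn is available, uses a hypothetical regex tokenizer (`simple_tokenize`) as a stand-in for scispaCy, and trains on a handful of synthetic placeholder notes rather than MIMIC-III data. The names `toy_notes`, `toy_labels`, and `build_pipelines` are made up for this example.

```python
import re

from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


def simple_tokenize(text):
    """Hypothetical stand-in tokenizer; the repository uses scispaCy."""
    return re.findall(r"[a-z0-9]+", text.lower())


# Synthetic placeholder notes and labels (assumption: 1 = positive class).
toy_notes = [
    "patient hypotensive and tachycardic, oxygen requirement increasing",
    "patient resting comfortably, vitals stable, tolerating diet",
    "worsening respiratory distress, considering intubation",
    "ambulating in hallway, pain controlled, plan for discharge",
    "altered mental status, lactate rising, started on pressors",
    "stable overnight, no acute events, continue current plan",
]
toy_labels = [1, 0, 1, 0, 1, 0]


def build_pipelines(seed=0):
    """Return one TF-IDF + classifier pipeline per model family."""
    classifiers = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(n_estimators=50, random_state=seed),
        "gradient_boosting": GradientBoostingClassifier(n_estimators=50, random_state=seed),
    }
    return {
        name: Pipeline([
            # A callable analyzer handles tokenization; TF-IDF weights the tokens.
            ("tfidf", TfidfVectorizer(analyzer=simple_tokenize)),
            ("clf", clf),
        ])
        for name, clf in classifiers.items()
    }


# Fit each pipeline on the toy notes and score the first note.
fitted = {name: pipe.fit(toy_notes, toy_labels) for name, pipe in build_pipelines().items()}
probs = {name: float(pipe.predict_proba(toy_notes[:1])[0, 1]) for name, pipe in fitted.items()}
```

In real use the models would be fit on a training split of the extracted notes and evaluated on held-out data; the toy example only shows how the pieces connect.
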

We also build a CNN model using a one-hot count vector of the notes. The code for each model can be found in its corresponding folder.
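
The README does not spell out how the one-hot count vector is constructed. One plausible reading, assumed here, is a vocabulary-sized vector in which each position holds the number of times that token appears in the note, which is the same as summing the one-hot vectors of the note's tokens. The helper names `build_vocab` and `count_vector` are hypothetical, and the CNN itself is not shown.

```python
from collections import Counter


def build_vocab(tokenized_notes):
    """Map each distinct token to an index, in order of first appearance."""
    vocab = {}
    for tokens in tokenized_notes:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def count_vector(tokens, vocab):
    """Sum of one-hot token vectors: position i holds the count of token i."""
    vec = [0] * len(vocab)
    for tok, n in Counter(tokens).items():
        idx = vocab.get(tok)
        if idx is not None:  # tokens outside the vocabulary are ignored
            vec[idx] = n
    return vec


# Synthetic, already-tokenized placeholder notes.
toy_tokens = [
    ["patient", "stable", "stable"],
    ["patient", "intubated"],
]
vocab = build_vocab(toy_tokens)
vectors = [count_vector(tokens, vocab) for tokens in toy_tokens]
# vectors == [[1, 2, 0], [1, 0, 1]]
```
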