Towards Automated Diagnosis with Attentive Multi-Modal Learning Using Electronic Health Records and Chest X-rays
Jointly learning from Electronic Health Records (EHR) and medical images is a promising area of research in deep learning for medical imaging. Using the context available in the EHR together with medical images can lead to more efficient use of data, and recent work has shown that jointly learning from EHR and medical images can indeed improve performance on several tasks. Current methods, however, still depend on clinician input. A fully automated method should rely only on prior patient information and the medical image, without requiring further clinician input.
In this paper we propose an automated multi-modal method that creates a joint feature representation from prior patient information in the EHR and the associated chest X-ray. By fusing the two modalities through attention, this representation leverages the contextual relationship between them. The method is applied to two tasks: diagnosis classification and free-text diagnosis generation. We show the benefit of the multi-modal approach over single-modality approaches on both tasks.
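To make the fusion idea concrete, the sketch below shows one possible attention-based joining of an EHR embedding sequence with image features, followed by a diagnosis classifier. This is a minimal illustration under assumed dimensions and module choices (PyTorch `nn.MultiheadAttention`, a mean-pooled joint representation), not the exact architecture proposed in the paper.

```python
# Minimal sketch (assumption): cross-attention fusion of EHR token embeddings
# and CNN image features, followed by a diagnosis classification head.
# Module names, dimensions, and the use of nn.MultiheadAttention are
# illustrative, not taken from the paper.
import torch
import torch.nn as nn


class AttentiveFusion(nn.Module):
    def __init__(self, ehr_dim=256, img_dim=512, fused_dim=256, num_classes=14):
        super().__init__()
        self.ehr_proj = nn.Linear(ehr_dim, fused_dim)   # project EHR features
        self.img_proj = nn.Linear(img_dim, fused_dim)   # project image features
        # EHR tokens attend over image regions (cross-attention).
        self.cross_attn = nn.MultiheadAttention(fused_dim, num_heads=4, batch_first=True)
        self.classifier = nn.Linear(fused_dim, num_classes)

    def forward(self, ehr_tokens, img_regions):
        # ehr_tokens:  (batch, n_tokens, ehr_dim), e.g. encoded prior-history fields
        # img_regions: (batch, n_regions, img_dim), e.g. flattened CNN feature map
        q = self.ehr_proj(ehr_tokens)
        kv = self.img_proj(img_regions)
        fused, _ = self.cross_attn(query=q, key=kv, value=kv)
        pooled = fused.mean(dim=1)          # joint feature representation
        return self.classifier(pooled)      # diagnosis logits


# Usage with dummy tensors
model = AttentiveFusion()
ehr = torch.randn(2, 10, 256)   # 2 patients, 10 EHR tokens each
img = torch.randn(2, 49, 512)   # 2 X-rays, 7x7 feature map flattened to 49 regions
logits = model(ehr, img)        # shape: (2, 14)
```

The same pooled joint representation could, under this sketch, also condition a text decoder for free-text diagnosis generation instead of feeding a linear classifier.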