Exploring Value of Electronic Health Records in Diseases Predictive Analytics

Ms Qiu Lin
Dr Tan Cheng Yian, Bernard, Shaw Senior Professor, School of Computing

  05 Jul 2019 Friday, 10:30 AM to 12:00 PM

 MR1, COM1-03-19


Disease Predictive Analytics (DPA) is valuable for early diagnosis and for providing better care to patients. In reality, patients often have multiple diseases simultaneously or develop multiple complications over time. It is therefore natural to categorize DPA into predictive analytics for a single disease and for multiple diseases. The increasing adoption of Electronic Health Record (EHR) systems makes diverse, large-scale patient health data significantly more available than before, but the value of EHR in disease predictive analytics remains underexplored. EHR comprises various types of data, and each type provides information about patients' health from a particular perspective. Together, these different and complementary types of information give physicians the whole picture of a patient's disease status, especially for complicated diseases. Going beyond purely EHR-driven analytics, it is also worth exploring how to integrate the value of recently developed EHR with long-accumulated biomedical knowledge in the DPA problem.

The first paper in my doctoral thesis examines the problem of multiple disease predictive analytics. Unlike single disease predictions, multiple disease predictions have to take the relationships between diseases into account; these relationships have been observed through empirical studies and accumulated as biomedical knowledge. The paper presents a multiple disease predictive approach, called Deep Knowledge-augmented Multi-Label Learning (DKML), that integrates an EHR-driven model with biomedical knowledge. In addition, multiple disease predictive analytics faces the challenge of data imbalance, because rare diseases tend to have much less data than common diseases. DKML uses an advanced deep generative model to generate synthetic EHR data, alleviating the imbalance problem through data augmentation.

In the second paper of my thesis, I study the other type of disease predictive analytics: single disease prediction. A single disease can be reflected in multiple modalities for a patient. I propose a multi-modality model that jointly combines multiple different but complementary modalities so as to improve disease predictive performance. I use Alzheimer's disease (AD) as my research case because AD is complicated and is captured in different types of health data, such as MRI images, genotype data, and textual medical notes, where each data type is a different modality.