Towards Facilitating Healthcare Data Analytics: Resolving Challenges in Electronic Medical Records

Ms Zheng Kaiping
Dr Ooi Beng Chin, Distinguished Professor, School of Computing

Tuesday, 23 Apr 2019, 02:00 PM to 03:00 PM

COM2-01-08 Tutorial Room 9


In recent years, the increasing availability of Electronic Medical Records (EMR) has created new opportunities to automate healthcare data analytics. This gradually reduces reliance on traditional manual analytics, which depends on domain expertise, clinical experience, and costly, painstakingly designed experiments. However, several challenges in EMR data and EMR data analytics degrade analytic performance if they are not handled well, leaving a gap between the analytic potential of EMR data and its usability in practice. It is therefore essential to resolve these challenges in both EMR data and EMR data analytics, in order to improve performance and enable healthcare data analytics to yield more medical insights, contributing to better patient management and faster advances in medical research.

In this proposal, we examine four representative challenges in EMR data and EMR data analytics, namely irregularity, bias, lack of reliability, and lack of interpretability, and present our solutions to each of them.

First, we identify the irregularity challenge in EMR data and argue that it should be resolved at the feature level to reduce the loss of time information. We propose an adapted Gated Recurrent Unit (GRU) model that incorporates fine-grained, feature-level time span information, and experimental results show that the proposed model effectively improves EMR data analytic performance.
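The idea of feeding feature-level time spans into a GRU can be illustrated with a minimal NumPy sketch of a single recurrent step. This is not the proposed model: the exponential decay toward a per-feature fallback value, the parameter names (`w_decay`, `b_decay`, `Wz`, and so on), and the choice to pass the raw time spans to the gates are all assumptions made for illustration. The point it shows is that each feature carries its own elapsed time since it was last observed, so a value measured long ago can be discounted more than one measured moments ago.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def decay_inputs(x_last, delta, x_mean, w_decay, b_decay):
    """Per-feature decay (hypothetical form): gamma = exp(-max(0, w*delta + b)).

    x_last : (F,) last observed value of each feature (carried forward)
    delta  : (F,) time elapsed since each feature was last observed
    x_mean : (F,) fallback value per feature, e.g. a training-set mean
    With non-negative w_decay and zero b_decay, delta = 0 leaves x_last unchanged,
    while a long gap pulls the input toward x_mean.
    """
    gamma = np.exp(-np.maximum(0.0, w_decay * delta + b_decay))
    return gamma * x_last + (1.0 - gamma) * x_mean


def time_aware_gru_step(x_last, delta, h_prev, x_mean, p):
    """One GRU step on time-decayed inputs plus the raw per-feature time spans."""
    x_adj = decay_inputs(x_last, delta, x_mean, p["w_decay"], p["b_decay"])
    inp = np.concatenate([x_adj, delta])  # gates also see the time spans directly
    z = sigmoid(p["Wz"] @ inp + p["Uz"] @ h_prev + p["bz"])  # update gate
    r = sigmoid(p["Wr"] @ inp + p["Ur"] @ h_prev + p["br"])  # reset gate
    h_tilde = np.tanh(p["Wh"] @ inp + p["Uh"] @ (r * h_prev) + p["bh"])
    return (1.0 - z) * h_prev + z * h_tilde


def init_params(n_features, n_hidden, seed=0):
    """Small random weights; all parameter names are hypothetical."""
    rng = np.random.default_rng(seed)
    d_in = 2 * n_features  # decayed values + time spans
    p = {"w_decay": np.full(n_features, 0.1), "b_decay": np.zeros(n_features)}
    for g in ("z", "r", "h"):
        p["W" + g] = rng.normal(scale=0.1, size=(n_hidden, d_in))
        p["U" + g] = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
        p["b" + g] = np.zeros(n_hidden)
    return p


# Usage: three standardised features, last observed 0, 2 and 48 hours ago.
params = init_params(n_features=3, n_hidden=4)
x_last = np.array([0.5, 1.2, -0.3])
delta = np.array([0.0, 2.0, 48.0])
x_mean = np.zeros(3)
h = time_aware_gru_step(x_last, delta, np.zeros(4), x_mean, params)
print(h)
```

Because the decay parameters are per feature, a model of this shape could in principle learn that some features stay informative for days while others go stale within hours.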

Second, we investigate the irregularity challenge further and find that irregularity is only a surface phenomenon, whereas bias is the underlying cause. We therefore formalize the bias challenge in EMR data and propose a general method that transforms biased EMR time series into unbiased data, taking into account two characteristics of medical features: Condition Change Rate and Observation Rate. Experimental results demonstrate that the proposed bias-resolving method not only imputes missing data more accurately but also improves the performance of downstream data analytic applications.
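The two feature characteristics can be made concrete with a small sketch. The definitions below are assumptions for illustration only: observation rate is taken as observations per unit time over a window, and condition change rate as the mean absolute change per unit time between consecutive observations. The linear interpolation onto a regular grid is a simple baseline, not the proposed bias-resolving method, and the example values are made up.

```python
import numpy as np


def observation_rate(times, window):
    """Hypothetical definition: number of observations per unit time in the window."""
    return len(times) / float(window)


def condition_change_rate(times, values):
    """Hypothetical definition: mean |change in value| per unit time between
    consecutive observations. `times` must be strictly increasing."""
    t = np.asarray(times, dtype=float)
    v = np.asarray(values, dtype=float)
    if t.size < 2:
        return 0.0
    return float(np.mean(np.abs(np.diff(v)) / np.diff(t)))


def to_regular_grid(times, values, grid):
    """Baseline resampling by linear interpolation (np.interp holds end values constant)."""
    return np.interp(grid, times, values)


# Usage over a 24-hour window (times in hours; values are illustrative only).
hr_t, hr_v = [0, 1, 2, 3], [80, 82, 81, 85]   # heart rate: frequent, fluctuating
cr_t, cr_v = [0, 24], [1.0, 1.6]              # creatinine: sparse, slow-moving
for name, t, v in [("heart rate", hr_t, hr_v), ("creatinine", cr_t, cr_v)]:
    print(name, observation_rate(t, 24), condition_change_rate(t, v))
print(to_regular_grid(cr_t, cr_v, [0, 12, 24]))
```

Under these toy definitions, heart rate scores high on both rates and creatinine low on both, which illustrates why a single resampling rule applied to every feature may not suit both kinds.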

Third, we point out that EMR data analytics is a high-stakes application in which every patient must be treated as equally important. It is therefore preferable for the model to take over the tasks it can predict well and to ask for assistance on the difficult ones. We identify this as the task decomposition problem, for which a partial coverage model with a reject option is one solution. We devise a general two-level approach to optimize task decomposition for healthcare applications, which optimizes the partial coverage model with a reject option by re-weighting the task distribution. Experimental results show that the proposed model substantially outperforms the baselines in prediction performance on easy tasks.
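A partial coverage model with a reject option can be sketched with plain confidence thresholding, a common baseline swapped in here for illustration; it is not the two-level approach described above. The `threshold` value, the `rejected_weight` value, and the `reweight` rule (down-weighting deferred cases for a next training round, as a stand-in for re-weighting the task distribution) are all hypothetical.

```python
import numpy as np


def predict_with_reject(probs, threshold):
    """Labels from class probabilities (N, C); -1 marks a rejected (deferred) case.

    A case is accepted only if its top-class probability reaches `threshold`.
    """
    probs = np.asarray(probs, dtype=float)
    preds = probs.argmax(axis=1)
    return np.where(probs.max(axis=1) >= threshold, preds, -1)


def coverage_and_accuracy(preds, labels):
    """Coverage = fraction accepted; accuracy is measured on accepted cases only."""
    accepted = preds != -1
    coverage = float(accepted.mean())
    if not accepted.any():
        return coverage, float("nan")
    labels = np.asarray(labels)
    return coverage, float((preds[accepted] == labels[accepted]).mean())


def reweight(preds, rejected_weight=0.1):
    """Hypothetical re-weighting: down-weight deferred cases for the next training
    round so the model focuses on tasks it can handle; weights have mean 1."""
    w = np.where(preds != -1, 1.0, rejected_weight)
    return w / w.mean()


# Usage: four patients, binary outcome, defer anything below 0.7 confidence.
probs = [[0.9, 0.1], [0.55, 0.45], [0.2, 0.8], [0.6, 0.4]]
labels = [0, 1, 1, 1]
preds = predict_with_reject(probs, threshold=0.7)
print(preds)  # -1 marks cases deferred for assistance
print(coverage_and_accuracy(preds, labels))
print(reweight(preds))
```

Raising the threshold lowers coverage and typically raises accuracy on the accepted cases; how to trade the two off is exactly what a task decomposition method has to decide.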

Finally, we identify the essential role of interpretability in designing analytic models and argue that feature importance needs to be decomposed into a global level and a local level, to provide general and time-specific explanations respectively. Specifically, we propose to model the global-level and local-level feature importance in two subnetworks; by training both subnetworks jointly, we aim to achieve accurate predictions and derive medically interpretable insights simultaneously. Our future work includes evaluating the effectiveness of this model in terms of both prediction performance and interpretation capability, with doctors assisting in the validation.
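The global/local decomposition can be illustrated with a NumPy forward pass. Everything here is an assumption for illustration: the global subnetwork is reduced to one learnable logit per feature, the local subnetwork to a single linear layer applied at each time step, and the two are combined multiplicatively before a logistic output layer. Joint training (back-propagating one prediction loss through both subnetworks) is omitted.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def two_level_importance(X, global_logits, W_local, b_local):
    """X: (T, F) one patient's time series.

    Returns
      g: (F,)   global importance, shared across time and patients (sums to 1)
      l: (T, F) local importance, one distribution per time step (rows sum to 1)
      a: (T, F) combined importance, normalised over all (t, f) entries
    """
    g = softmax(global_logits)                  # hypothetical global subnetwork
    l = softmax(X @ W_local + b_local, axis=1)  # hypothetical local subnetwork
    a = g * l                                   # broadcast (F,) over (T, F)
    return g, l, a / a.sum()


def predict(X, a, w_out, b_out):
    """Hypothetical logistic output layer on importance-weighted features."""
    pooled = (a * X).sum(axis=0)                # (F,)
    return 1.0 / (1.0 + np.exp(-(pooled @ w_out + b_out)))


# Usage with random parameters: 5 time steps, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
g, l, a = two_level_importance(X, rng.normal(size=3), rng.normal(size=(3, 3)), np.zeros(3))
p = predict(X, a, rng.normal(size=3), 0.0)
print("global:", g.round(3))
print("most important (time step, feature):", np.unravel_index(a.argmax(), a.shape))
print("risk:", round(float(p), 3))
```

Reading `g` gives a general, patient-independent ranking of features, while a row of `l` explains a single time step, mirroring the general versus time-specific explanations described above.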