PH.D DEFENCE - PUBLIC SEMINAR

Developing clinical prediction models

Speaker
Mr. Chan Wei Xin
Advisor
Dr Wong Lim Soon, Kithct Chair Professor, School of Computing


11 Jul 2023 Tuesday, 03:00 PM to 04:30 PM

MR3, COM2-02-26

Abstract:

Clinical prediction models are developed to estimate the absolute risk of clinically important outcomes in patients. These models are often designed to guide clinical decision making. Recently, there has been a deluge of publications on clinical prediction models, driven by the resurgence of interest in artificial intelligence. However, very few of these models are ever deployed in the real world.

In this thesis, we discuss the main obstacles to effective model deployment in healthcare. We identify improper development and evaluation of clinical prediction models as the principal cause of some of these obstacles. Clinical prediction models are particularly susceptible to improper development and evaluation because of the inherent heterogeneities in clinical data. We examine two of the most prevalent of these heterogeneities in detail.

Batch effects are a common heterogeneity in high-dimensional biological data, such as gene expression microarray data. Failure to properly account for batch effects during the development of prediction models often leads to poor generalisation. Very few quantitative batch-effect metrics have been proposed for use in small data sets, and the accuracy of these metrics is reduced when different batches contain different class proportions. We propose recursive variance partitioning (RVP), a novel metric for quantifying batch effects. We show that RVP accurately estimates the proportion of total variance attributable to batch effects over a range of batch-effect magnitudes. RVP performs similarly even in data with severe batch-class imbalance.
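The abstract does not describe how RVP works, so the sketch below does not implement it. It shows a naive one-way variance partition (the share of total variance explained by batch labels alone, pooled over features) to illustrate why batch-class imbalance misleads such metrics. The function name `batch_variance_fraction` and the toy data are hypothetical.

```python
import numpy as np

def batch_variance_fraction(X, batch):
    """Fraction of total variance explained by batch labels alone
    (one-way ANOVA R^2 pooled over all features).

    Hypothetical naive baseline for illustration; this is NOT RVP.
    X: array of shape (n_samples, n_features); batch: length-n_samples labels.
    """
    X = np.asarray(X, dtype=float)
    batch = np.asarray(batch)
    grand_mean = X.mean(axis=0)
    ss_total = ((X - grand_mean) ** 2).sum()
    ss_batch = 0.0
    for b in np.unique(batch):
        Xb = X[batch == b]
        # Between-batch sum of squares: n_b * ||batch mean - grand mean||^2
        ss_batch += Xb.shape[0] * ((Xb.mean(axis=0) - grand_mean) ** 2).sum()
    return ss_batch / ss_total

# Toy data with NO batch effect: each value is 10 * class label.
batch = np.array([0, 0, 0, 0, 1, 1, 1, 1])
balanced = np.array([[0], [0], [10], [10], [0], [0], [10], [10]])    # classes 0,0,1,1 | 0,0,1,1
imbalanced = np.array([[0], [0], [0], [10], [0], [10], [10], [10]])  # classes 0,0,0,1 | 0,1,1,1

print(batch_variance_fraction(balanced, batch))    # 0.0
print(batch_variance_fraction(imbalanced, batch))  # 0.25
```

Neither toy data set contains any batch effect, yet when the class proportions differ between batches the naive metric attributes a quarter of the total variance to batch, because the class difference leaks into the batch means.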

Another common heterogeneity in clinical data arises when the data comprise patients who receive different treatments. This heterogeneity complicates model development and evaluation because different treatments affect patient outcome, which is often the prediction target, to varying degrees. We propose a scoring scheme for evaluating clinical prediction models that incorporates patient treatment information. We use the Malaysia-Singapore acute lymphoblastic leukaemia (ALL) data set as a case study to demonstrate the proposed scoring scheme. Evaluating models in this manner helps to avoid errors that arise from treatment differences.
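The abstract does not specify the proposed scoring scheme. As one generic illustration of a treatment-aware evaluation, the sketch below computes a stratified AUC: a (relapse, remission) pair is compared only when both patients received the same treatment, so outcome differences caused by treatment do not enter the score. The function `stratified_auc`, the label coding and the toy data are hypothetical and are not the thesis's method.

```python
from itertools import product

def stratified_auc(scores, labels, treatment):
    """Within-treatment AUC (hypothetical illustration, not the thesis's scheme).

    scores: predicted relapse risk; labels: 1 = relapse, 0 = remission;
    treatment: treatment group of each patient. Only (relapse, remission)
    pairs that received the same treatment are compared; ties count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    for group in set(treatment):
        idx = [i for i, g in enumerate(treatment) if g == group]
        relapsed = [scores[i] for i in idx if labels[i] == 1]
        remitted = [scores[i] for i in idx if labels[i] == 0]
        for s_pos, s_neg in product(relapsed, remitted):
            comparable += 1
            if s_pos > s_neg:
                concordant += 1.0
            elif s_pos == s_neg:
                concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs within any treatment group")
    return concordant / comparable

scores = [0.9, 0.8, 0.3, 0.2]
labels = [1, 0, 1, 0]
treatment = ["A", "A", "B", "B"]
print(stratified_auc(scores, labels, treatment))    # 1.0
print(stratified_auc(scores, labels, ["all"] * 4))  # 0.75 (ordinary pooled AUC)
```

The model ranks patients perfectly within each treatment arm, but the pooled AUC penalises it for scoring a relapsed arm-B patient below an arm-A patient in remission, a comparison confounded by treatment.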

In this thesis, we develop a subtype-specific prediction model for treatment outcome in ALL patients. Our model incorporates transcriptomic features engineered from patient gene expression profiles (GEPs) at different time-points of treatment. In homogeneous ALL subtypes, it outperforms other methods in distinguishing patients who achieved continuous complete remission from patients who relapsed, and it is designed to be robust to small sample sizes. We also present the biological hypothesis behind our model: a GEP measures the average gene expression of all leukaemic and normal cells in a sample, so patients who are more responsive to treatment will experience a faster decrease in the proportion of leukaemic cells and hence exhibit a greater shift in their GEP. We validate this hypothesis by estimating B-cell abundance in patient samples using various methods.
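The hypothesis above can be written as a two-component mixture. The notation is introduced here for illustration only, and the derivation assumes that the average leukaemic and normal expression profiles stay fixed between time-points, which the abstract does not state.

```latex
% Hypothetical notation: g_t = GEP at time-point t, p_t = proportion of leukaemic cells,
% \ell, \nu = average expression profiles of leukaemic and normal cells (assumed constant).
\begin{align}
  g_t &= p_t\,\ell + (1 - p_t)\,\nu, \\
  g_0 - g_1 &= (p_0 - p_1)\,(\ell - \nu), \\
  \lVert g_0 - g_1 \rVert &= \lvert p_0 - p_1 \rvert \, \lVert \ell - \nu \rVert .
\end{align}
```

Under these assumptions the magnitude of the GEP shift grows linearly with the drop in leukaemic-cell proportion, which is why a faster responder is expected to show a larger shift.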