A Semiparametric Approach for Imputing Missing Value in Predictive Analytics
18 Sep 2017 Monday, 02:00 PM to 03:30 PM
Supervisor: Associate Professor HUANG Ke-Wei
Examiners: Associate Professor GOH Khim Yong; Dr. PHAN Tuan Quang
This study proposes a semiparametric approach for imputing missing values in predictive analytics. Building upon theoretical results in the statistical literature, our approach relaxes two main assumptions imposed by existing methods, namely the ignorable missing data assumption and the distributional assumption across all variables. Similar to the well-known EM (Expectation Maximization) algorithm, the proposed model consists of two iterative steps. First, a semiparametric method is used to capture the relationship between the occurrence of missingness and all the variables. Second, to relax the distributional assumption, a nonparametric model is developed to impute the missing values, with its kernel weights adjusted based on the results of the first step. In preliminary simulations, the proposed approach achieves higher imputation accuracy than benchmark methods such as multivariate imputation by chained equations (MICE), random forest imputation (MissForest), and the classic nonparametric imputation method. In addition, experiments on real-world data show that the proposed model also yields higher prediction accuracy.
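As a rough illustration of the two steps described above, the following Python sketch shows one way a missingness model can feed into kernel weights. It is not the method presented in this talk: the abstract does not specify the estimators, so the sketch substitutes a plain logistic regression for the semiparametric first step, uses a Gaussian-kernel weighted average for the second step, runs a single pass rather than iterating, and models missingness from fully observed covariates only. All function names, the bandwidth, and the clipping threshold are hypothetical choices made for illustration.

```python
import numpy as np

def fit_logistic(X, r, steps=500, lr=0.1):
    """Gradient-ascent logistic regression of the observed indicator r on X.

    Assumption: a parametric stand-in for the semiparametric first step.
    """
    Xb = np.column_stack([np.ones(len(X)), X])  # add intercept column
    beta = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))
        beta += lr * Xb.T @ (r - p) / len(r)    # average log-likelihood gradient
    return beta

def response_prob(X, beta):
    """Estimated probability that the target is observed, given X."""
    Xb = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-Xb @ beta))

def weighted_kernel_impute(X, y, bandwidth=0.5):
    """Impute NaNs in y with a kernel-weighted average of observed y values.

    Each observed case's Gaussian kernel weight is divided by its estimated
    response probability (hypothetical adjustment, clipped for stability).
    """
    observed = ~np.isnan(y)
    beta = fit_logistic(X, observed.astype(float))
    p_obs = np.clip(response_prob(X[observed], beta), 0.05, 1.0)

    y_filled = y.copy()
    for i in np.where(~observed)[0]:
        d2 = np.sum((X[observed] - X[i]) ** 2, axis=1)
        w = np.exp(-0.5 * d2 / bandwidth ** 2) / p_obs
        y_filled[i] = np.sum(w * y[observed]) / np.sum(w)
    return y_filled
```

Calling `weighted_kernel_impute(X, y)` with a fully observed covariate matrix `X` and a target vector `y` containing `NaN` entries returns a copy of `y` with the missing entries filled in. Because each imputed value is a weighted average of observed values, it always lies within the range of the observed data.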