PH.D. DEFENCE - PUBLIC SEMINAR

Machine Learning for Improving Social Science Research Methods

Speaker
Ms. Liu Xuanqi
Advisor
Dr Huang Ke-Wei, Associate Professor, School of Computing


08 Jan 2024 Monday, 03:00 PM to 04:30 PM

MR3, COM2-02-26

Abstract:

Machine learning (ML) has advanced significantly over the past two decades and shows considerable potential for interdisciplinary research. More accurate predictions and estimations offer the prospect of a deeper understanding of social phenomena. Motivated by these advances, this thesis explores how sophisticated ML techniques can improve two social science tasks: causal inference and measurement.

The first study aims to produce better estimates of peer influence by controlling for latent homophily using node embeddings. Because some homophily variables are unobservable and homophily effects may be nonlinear, traditional regression models can suffer from endogeneity when estimating peer effects from observational data. To alleviate these issues, I propose two methods that use node embeddings derived from the network structure to control nonparametrically for latent homophily. The first method draws on the extensive literature on partially linear regression and employs the double machine learning estimator to improve peer effect estimation. The second method estimates peer influence directly while controlling for the homophily effect through a novel neural network model. Experimental results show that the new estimators outperform several popular ad hoc methods documented in the literature in reducing the omitted variable bias that arises from latent homophily variables. A theoretical analysis characterizing the bias and efficiency of these estimators is also provided. This study contributes to the growing stream of research that uses machine learning to overcome statistical challenges in estimating causal effects. Future non-experimental empirical studies may benefit from the proposed methods for better treatment effect estimation in network contexts.
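To illustrate the first method, the following is a minimal sketch of a cross-fitted double machine learning estimator for a partially linear model, y = theta * d + g(x) + noise, where d is peer exposure and x is a node embedding. It is not the thesis's implementation: the k-nearest-neighbour nuisance learner, the one-dimensional stand-in embedding, the synthetic data, and all function names are assumptions made for illustration only.

```python
import random


def knn_predict(train_x, train_t, query_x, k=10):
    """Predict the target at each query point as the mean over its k nearest training points."""
    preds = []
    for q in query_x:
        # Squared Euclidean distance from q to every training embedding.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(q, x)), t)
            for x, t in zip(train_x, train_t)
        )
        nearest = dists[:k]
        preds.append(sum(t for _, t in nearest) / len(nearest))
    return preds


def cross_fit_residuals(x, target, n_folds=5, k=10):
    """Residualize target on x, predicting each fold from the other folds only."""
    n = len(x)
    resid = [0.0] * n
    for fold in range(n_folds):
        test = [i for i in range(n) if i % n_folds == fold]
        train = [i for i in range(n) if i % n_folds != fold]
        preds = knn_predict(
            [x[i] for i in train], [target[i] for i in train], [x[i] for i in test], k
        )
        for i, p in zip(test, preds):
            resid[i] = target[i] - p
    return resid


def dml_plr(y, d, x, n_folds=5, k=10):
    """Partialling-out estimate of theta: regress residualized y on residualized d."""
    ry = cross_fit_residuals(x, y, n_folds, k)
    rd = cross_fit_residuals(x, d, n_folds, k)
    return sum(a * b for a, b in zip(ry, rd)) / sum(b * b for b in rd)


def naive_slope(y, d):
    """OLS slope of y on d with no control for the embedding (for comparison)."""
    my, md = sum(y) / len(y), sum(d) / len(d)
    cov = sum((a - my) * (b - md) for a, b in zip(y, d))
    var = sum((b - md) ** 2 for b in d)
    return cov / var


def simulate(n=400, theta=0.5, seed=0):
    """Synthetic data (assumption): a latent trait z drives both exposure and outcome."""
    rng = random.Random(seed)
    y, d, x = [], [], []
    for _ in range(n):
        z = rng.uniform(-2.0, 2.0)            # stand-in for a node embedding
        peer = z + rng.gauss(0.0, 1.0)        # peer exposure depends on z (homophily)
        outcome = theta * peer + 2.0 * z + z ** 2 + rng.gauss(0.0, 0.5)
        x.append((z,))
        d.append(peer)
        y.append(outcome)
    return y, d, x


if __name__ == "__main__":
    y, d, x = simulate()
    print("naive OLS slope:", naive_slope(y, d))
    print("DML estimate   :", dml_plr(y, d, x))
```

In this toy setting the true peer effect is 0.5. The naive slope absorbs the nonlinear homophily term, while the cross-fitted estimate removes it by residualizing both the outcome and the exposure on the embedding before the final regression.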

The second study constructs novel measures for corporate risk assessment. Its primary objective is to address a critical question: how do firms respond to technological risk in their innovation strategies? This question remains understudied in IS research, primarily because the lack of direct measures of ex-ante technological risk poses a significant challenge to empirical studies. To overcome this challenge, I use text mining to identify and quantify firm-level technological risk from the risk disclosures in 10-K reports. The new measure captures a firm’s awareness and perception of its technological risk exposure, as well as the types of technologies involved. I then investigate the impact of technological risk on future innovation performance, measured by patent counts and citations. The empirical results indicate that firms exposed to technological risk are more likely to improve their innovation performance, as evidenced by increased patenting activity and the development of more impactful patents. Furthermore, firms operating in highly competitive environments adopt more proactive innovation strategies to mitigate technological risk, while firms with a narrow technological knowledge base focus on pursuing high-quality innovation.
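As a rough illustration of how a text-based risk measure can be built, the sketch below scores a risk-disclosure passage by the share of its sentences that mention a technology-risk term. The abstract does not describe the actual text-mining procedure, so the dictionary approach, the term list, and the function names here are all hypothetical.

```python
import re

# Hypothetical seed vocabulary; the thesis does not list the terms it uses.
TECH_RISK_TERMS = (
    "technological change",
    "new technologies",
    "obsolete",
    "obsolescence",
    "cybersecurity",
)


def split_sentences(text):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tech_risk_score(risk_section):
    """Fraction of sentences that contain at least one technology-risk term."""
    sentences = split_sentences(risk_section)
    if not sentences:
        return 0.0
    hits = sum(
        any(term in sentence.lower() for term in TECH_RISK_TERMS)
        for sentence in sentences
    )
    return hits / len(sentences)


if __name__ == "__main__":
    sample = (
        "Our products may become obsolete. We depend on key suppliers. "
        "Rapid technological change could harm us. Interest rates may rise."
    )
    print(tech_risk_score(sample))
```

A score of this kind could then serve as a firm-year regressor alongside patent counts and citations; a real measure would need a validated vocabulary and careful extraction of the risk-factor section from each filing.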