Towards Real-World Generalizability of Misinformation Detection
COM1 Level 2
SR9, COM1-02-09
Abstract:
This thesis investigates generalizable patterns for detecting two prevalent types of misinformation: rumors (unverified statements in circulation) and fake news (articles presented as news reports that contain false statements). Specifically, we aim to identify veracity indicators that are widely applicable across three key aspects: (1) emergent newsworthy events, (2) diverse writing styles, and (3) social graphs from different domains.
First, we address scenarios where a detector is presented with suspicious microblogs arising from new, unseen events. Observing that microblog publishers often post within a consistent veracity class (e.g., “False Rumor”) regardless of the event, we introduce Publisher Style Aggregation (PSA), an event-generalizable rumor detection approach that learns event-invariant publisher characteristics from each publisher's posting records to enhance microblog representations. Experimental results demonstrate PSA's superiority in various settings, including cross-event, cross-dataset, and early rumor detection.
Next, we explore generalizability across writing styles, specifically how to detect fake news articles disguised in the style of reputable media. We propose SheepDog, a style-agnostic framework that combines the task-specific, fine-tuned capabilities of pre-trained language models (LMs) with the versatile strengths of large language models (LLMs) under a multi-task learning paradigm. Integrating style-diverse news reframings and content-centric veracity attributions, SheepDog yields robust predictions resilient to stylistic variations and provides valuable insights for debunking news flagged as fake.
Finally, we identify generalizable structural patterns related to veracity in social contexts represented as graphs, where news dissemination and user engagement patterns vary between datasets and domains. We find that social users often show a consistent preference for either fake or real news. Based on this observation, we introduce a unified social graph formulation that connects news article nodes by shared readership and facilitates effective detection even under severe label sparsity. Additionally, we distill veracity-indicative structural degree patterns applicable to news article graphs across different domains. Guided by these patterns, we further develop a lightweight degree-corrected method that learns a refined social graph structure to suppress edge noise. This method is 7.6 to 34.1 times faster than existing graph structure learning approaches.
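To make the shared-readership idea concrete, the following is a minimal sketch, not the thesis implementation. It assumes engagement data arrives as (user, news article) pairs and links two articles whenever at least one user engaged with both, weighting the edge by the number of such users. The function names `build_news_graph` and `weighted_degrees` and the toy identifiers are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_news_graph(engagements):
    """Build a news-news graph from (user_id, news_id) engagement pairs.

    Assumption: two articles are connected if at least one user engaged
    with both; the edge weight counts how many users they share.
    Returns a dict mapping (news_a, news_b), with news_a < news_b, to weight.
    """
    # Collect the set of articles each user engaged with.
    news_by_user = defaultdict(set)
    for user_id, news_id in engagements:
        news_by_user[user_id].add(news_id)

    # Every pair of articles read by the same user gains one unit of weight.
    edge_weights = defaultdict(int)
    for news_set in news_by_user.values():
        for a, b in combinations(sorted(news_set), 2):
            edge_weights[(a, b)] += 1
    return dict(edge_weights)


def weighted_degrees(edge_weights):
    """Sum incident edge weights per article (a simple degree statistic)."""
    degrees = defaultdict(int)
    for (a, b), weight in edge_weights.items():
        degrees[a] += weight
        degrees[b] += weight
    return dict(degrees)


# Toy engagement log (hypothetical data).
engagements = [
    ("u1", "n1"), ("u1", "n2"),
    ("u2", "n1"), ("u2", "n2"),
    ("u3", "n2"), ("u3", "n3"),
]
graph = build_news_graph(engagements)
degrees = weighted_degrees(graph)
```

In this toy log, articles n1 and n2 share two readers and n2 and n3 share one, so n2 ends up with the highest weighted degree. Degree statistics of this kind are the sort of structural signal the abstract refers to; the degree-corrected structure learning step itself is not sketched here.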