Modeling Information Reliability of Online User Generated Content

Ms Lahari Poddar
Dr Wynne Hsu, Provost's Chair Professor, School of Computing

  13 Nov 2019 Wednesday, 01:30 PM to 03:00 PM

 MR2, COM1-03-28


User Generated Content (UGC) in the form of reviews, numeric ratings, blogs, and posts in forums and on social media is available in overwhelming amounts to help users make informed decisions about various products or services. Although helpful, many of these posts are not accurate and may be biased by an individual's opinion or idiosyncratic experiences. This limits their usability as general, reliable information sources. In contrast to prior work on binary truth discovery, we argue that UGC cannot be judged through such a harsh, universal lens of credibility, because of the fine-grained subjectivity it carries owing to individual preferences. We also believe that the notion of reliability is strongly domain-dependent and that it is crucial to capture domain-specific nuances to model user feedback properly.

In this thesis, we focus on a few widely popular domains where people increasingly rely on UGC, namely e-commerce/services and e-health. In these domains, people can share their feedback on an entity (e.g. products in e-commerce, hotels or restaurants in services, drugs and treatments in health forums) in various forms (such as ratings, reviews, posts, and lists of observed side effects). We hypothesize that such user feedback may be influenced by underlying confounding factors that make one user's experience different from another's, even for the same entity. Faced with such varying opinions about the same entity, a person finds it difficult to judge its quality. For instance, in the context of products, when one looks at conflicting ratings given by users on different aspects of an item, one needs to be aware of the biases that influenced those ratings in order to estimate the true quality of the item. When going through diverse reviews written by strangers, it is important to know whether a particular opinion expressed in a review is prevalent or rare before relying on it completely. For health-related information, a long list of side effects associated with a particular drug, reported by various people with diverse backgrounds, is confusing and intimidating for a person to whom they might not even apply.

We propose a range of data-driven methods to automatically handle such inherent subjectivity in user opinions and to identify the roles of the confounding factors behind the observed UGC footprint. To this end, we devise new frameworks based on probabilistic graphical models as well as neural networks. We have validated our models on practical applications such as (1) quantifying the aspect biases of users to better interpret their observed ratings, (2) retrieving supporting reviews for an individual's opinion to facilitate consensus modeling, and (3) predicting user-specific drug side effects. Experimental evaluation on a number of real-world datasets shows the effectiveness of our models for handling user generated content and sets new benchmarks across domains.