CS SEMINAR

Approximate lifted inference with probabilistic databases

Speaker
Wolfgang Gatterbauer
Assistant Professor
Business Technologies and Computer Science
Carnegie Mellon University

Chaired by
Dr Anthony TUNG Kum Hoe, Professor, School of Computing
atung@comp.nus.edu.sg

02 Oct 2015 Friday, 03:00 PM to 04:30 PM

Executive Classroom, COM2-04-02

Abstract:

Probabilistic inference over large data sets is becoming a central data management problem. Recent large knowledge bases, such as YAGO, NELL, or DeepDive, contain millions to billions of uncertain tuples. Yet probabilistic inference is known to be #P-hard in the size of the database, even for some very simple queries. This talk presents a new approach that ranks answers to hard probabilistic queries in guaranteed polynomial time, using only basic operators of existing database management systems (e.g., no sampling required).
(1) The first part of this talk develops upper and lower bounds for the probability of Boolean functions by treating multiple occurrences of variables as independent and assigning them new individual probabilities. We call this approach dissociation and give an exact characterization of optimal oblivious bounds, i.e., when the new probabilities are chosen independently of the probabilities of all other variables. Our new bounds shed light on the connection between previous relaxation-based and model-based approximations and unify them as concrete choices in a larger design space.
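As a small illustration of dissociation (a sketch, not taken from the talk), the Python snippet below uses the formula (x AND y) OR (x AND z), in which x occurs twice. It replaces the two occurrences by independent variables x1 and x2 and compares probabilities by brute-force enumeration. The formula, the helper name prob, and the particular probability choices are assumptions made for this example: keeping p for both copies happens to give an upper bound here, and choosing p1 = p2 with (1 - p1)(1 - p2) = 1 - p happens to give a lower bound. Both choices ignore the probabilities of y and z, which is the sense of "oblivious" above.

```python
from itertools import product
from math import sqrt


def prob(formula, probs):
    """Exact probability that `formula` is true, by enumerating all worlds.

    `probs` maps each variable name to its marginal probability; variables
    are assumed to be mutually independent.
    """
    names = sorted(probs)
    total = 0.0
    for values in product([False, True], repeat=len(names)):
        world = dict(zip(names, values))
        if formula(world):
            weight = 1.0
            for name in names:
                weight *= probs[name] if world[name] else 1.0 - probs[name]
            total += weight
    return total


def original(w):
    # x occurs twice, so the two conjuncts are correlated.
    return (w["x"] and w["y"]) or (w["x"] and w["z"])


def dissociated(w):
    # Each occurrence of x becomes its own independent variable.
    return (w["x1"] and w["y"]) or (w["x2"] and w["z"])


p, q, r = 0.5, 0.5, 0.5  # illustrative probabilities (assumed)

exact = prob(original, {"x": p, "y": q, "z": r})

# Assumed choice for an upper bound: both copies keep the probability p.
upper = prob(dissociated, {"x1": p, "x2": p, "y": q, "z": r})

# Assumed choice for a lower bound: (1 - p1) * (1 - p2) = 1 - p.
p_low = 1.0 - sqrt(1.0 - p)
lower = prob(dissociated, {"x1": p_low, "x2": p_low, "y": q, "z": r})

print(lower, exact, upper)
```

The dissociated formula has no repeated variables, so its probability could also be computed directly from the independent parts without enumeration; the brute-force helper is used here only to keep the comparison with the exact value transparent.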
(2) The second part then draws the connection to lifted inference and shows how applying this theory allows a standard relational database management system to compute both upper and lower bounds for hard probabilistic queries in guaranteed polynomial time. We give experimental evidence on synthetic TPC-H data that our approach is orders of magnitude faster and also more accurate than currently used sampling-based approaches.
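To make the idea of bounding a query with ordinary database operators concrete, here is a hedged Python sketch using the standard sqlite3 module. It is not the system from the talk: the tables R, S, T, the toy data, and the user-defined aggregate ior (an "independent OR", 1 - prod(1 - p)) are all assumptions for illustration. The Boolean query is q :- R(x), S(x,y), T(y). Two query plans are evaluated, each treating the tuples it aggregates as independent, and both scores are compared against the exact probability obtained by brute force. On this toy instance both plan scores are at least the exact value, and the smaller one matches it.

```python
import sqlite3
from itertools import product


class IndependentOr:
    """SQLite aggregate: probability that at least one independent event holds."""

    def __init__(self):
        self.none_true = 1.0

    def step(self, p):
        self.none_true *= 1.0 - p

    def finalize(self):
        return 1.0 - self.none_true


conn = sqlite3.connect(":memory:")
conn.create_aggregate("ior", 1, IndependentOr)
conn.executescript("""
    CREATE TABLE R(x INTEGER, p REAL);
    CREATE TABLE S(x INTEGER, y INTEGER, p REAL);
    CREATE TABLE T(y INTEGER, p REAL);
    INSERT INTO R VALUES (1, 0.5);
    INSERT INTO S VALUES (1, 1, 0.5), (1, 2, 0.5);
    INSERT INTO T VALUES (1, 0.5), (2, 0.5);
""")

# Plan A: join R with S, project away x per y, join T, project away y.
# The single R tuple is used once per y, as if the uses were independent.
plan_a = conn.execute("""
    SELECT ior(q.p * T.p)
    FROM (SELECT S.y AS y, ior(R.p * S.p) AS p
          FROM R JOIN S ON R.x = S.x
          GROUP BY S.y) AS q
    JOIN T ON T.y = q.y
""").fetchone()[0]

# Plan B: join S with T, project away y per x, join R, project away x.
plan_b = conn.execute("""
    SELECT ior(q.p * R.p)
    FROM (SELECT S.x AS x, ior(S.p * T.p) AS p
          FROM S JOIN T ON S.y = T.y
          GROUP BY S.x) AS q
    JOIN R ON R.x = q.x
""").fetchone()[0]

# Exact probability by enumerating all worlds over the five tuples.
# Every tuple has probability 0.5, so each world has weight 0.5 ** 5.
exact = 0.0
for r1, s11, s12, t1, t2 in product([0, 1], repeat=5):
    if r1 and ((s11 and t1) or (s12 and t2)):
        exact += 0.5 ** 5

best_bound = min(plan_a, plan_b)
print(plan_a, plan_b, exact, best_bound)
```

Plan A overestimates here because the one R tuple contributes to two groups, while Plan B never reuses a tuple on this data and so returns the exact value. Each plan is a fixed sequence of joins and grouped aggregates, so it runs in time polynomial in the size of the tables.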
(Talk based on joint work with Dan Suciu from TODS 2014 and VLDB 2015:
http://arxiv.org/abs/1409.6052, http://arxiv.org/pdf/1412.1069)


Biodata:

Wolfgang Gatterbauer is an Assistant Professor in Business Technologies and Computer Science at CMU. His current research focuses on scalable approaches to inference over large and uncertain data. He received degrees in Mechanical Engineering, Electrical Engineering & Computer Science, and Technology & Policy, and then received his PhD in Computer Science from Vienna University of Technology. Prior to joining CMU, he was a Post-Doc in the Database group at University of Washington. Earlier, he won a Bronze medal at the International Physics Olympiad and worked in the steam turbine development department of ABB Alstom Power and in the German office of McKinsey & Company.