CS SEMINAR

Non-invasive cleaning of uncertain data

Speaker
Professor Sebastian Link
University of Auckland

Chaired by
Dr Sanjay JAIN, Provost's Chair Professor, School of Computing
sanjay@comp.nus.edu.sg

21 Jul 2017 Friday, 02:30 PM to 04:00 PM

SR@LT19

Abstract:

We depart from the classical view of data cleaning that considers data itself dirty and, instead, consider the degrees of uncertainty attributed to data as dirty. Applying possibility theory, tuples are assigned degrees of possibility with which they occur, and constraints are assigned degrees of certainty that say to which tuples they apply. In contrast to classical data cleaning, we do not remove or modify some minimal set of tuples, but marginally reduce their degrees of possibility. This reduction requires us to investigate a new qualitative version of the vertex cover problem. We show that this problem is NP-complete but fixed-parameter tractable in the size of the qualitative vertex cover, and we establish an algorithm for it. Experiments with benchmark and real-world data show that our algorithm is very fast, outperforms the classical algorithm, and that its performance improves with higher numbers of uncertainty degrees.
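
For readers unfamiliar with the term, the following minimal sketch illustrates what "fixed-parameter tractable in the size of the cover" means, using the classical (not the qualitative) vertex cover problem. It is the textbook bounded-search-tree method and is not the algorithm presented in the talk. The function name `vertex_cover`, the tuple labels, and the example conflict graph are assumptions made purely for illustration.

```python
from typing import List, Optional, Set, Tuple

Edge = Tuple[str, str]


def vertex_cover(edges: List[Edge], k: int) -> Optional[Set[str]]:
    """Return a vertex cover with at most k vertices, or None if none exists.

    Classical bounded search tree: any cover must contain an endpoint of
    every edge, so branch on the two endpoints of some uncovered edge.
    The recursion depth is at most k, giving at most 2**k branches.
    """
    if not edges:
        return set()      # nothing left to cover
    if k == 0:
        return None       # edges remain but the budget is spent
    u, v = edges[0]
    for pick in (u, v):
        # Choosing `pick` covers every edge incident to it.
        remaining = [e for e in edges if pick not in e]
        sub = vertex_cover(remaining, k - 1)
        if sub is not None:
            return sub | {pick}
    return None


# Hypothetical conflict graph: an edge joins two tuples that together
# violate some constraint (illustrative data only).
conflicts = [("t1", "t2"), ("t2", "t3"), ("t3", "t4")]
cover = vertex_cover(conflicts, 2)
```

The running time is exponential only in the budget `k` and polynomial in the number of edges, which is why such algorithms can be practical when the required cover is small even though the problem is NP-complete in general.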


Biodata:

Sebastian received his PhD in Information Systems from Massey University in 2005, and is currently a Professor at the University of Auckland. His research interests lie in the application of logic, algebra, combinatorics, and statistics to computer science. He has mainly worked in the areas of database theory, conceptual modeling, and XML. However, he is also very much interested in the motivation of the research area, in particular in the perceived drivers and barriers to concepts in databases and modeling.