Techniques for Efficient Query Reverse Engineering
https://zoom.us/j/94476744722?pwd=aEpZZnlxb214Rmk4S2xmUkdiVkFGZz09
Meeting ID: 944 7674 4722
Password: 917008
Abstract:
Query Reverse Engineering (QRE) is a useful query processing technique that has diverse applications including data analysis, query discovery, query construction and generating query explanations. Given an output table T that is the result of some unknown query on a database D, the goal of QRE is to find a target query Q such that the result of Q on D is equal to T. In this thesis, we address two fundamental challenges of QRE that arise from the efficiency-expressiveness tradeoff.
The first challenge is the issue of search efficiency; i.e., how to efficiently find a target query from the large search space of candidate queries. All the existing solutions for QRE are based on a generate-and-test paradigm that enumerates a space of candidate queries and validates each candidate query Q to determine whether the result of Q on D is T. To address the efficiency of searching for candidate queries, two main strategies have been explored. The first strategy is to design heuristics to efficiently prune the search space and the second strategy is to reduce the search space by restricting the expressiveness of target queries. In this thesis, we propose a novel query-centric approach to address the efficiency challenge that is based on partitioning the database tables into fragments such that each table fragment corresponds to a project-join query. By exploiting the queries associated with the table fragments, our approach is able to optimize both candidate query generation as well as query validation.
The second challenge is the issue of supporting more expressive target queries. Existing research on QRE has focused on different query fragments including project-join queries, select-project-join queries, top-k queries and aggregation queries. In this thesis, we pro- pose efficient techniques to reverse engineer a more expressive fragment of select-project- join queries that allow scalar subqueries involving aggregate functions.