PH.D DEFENCE - PUBLIC SEMINAR

Towards an Integrated Framework for Designing Concurrency Control and Logging for In-memory Databases

Speaker
Mr Yao Chang
Advisor
Dr Ooi Beng Chin, Lee Kong Chian Centennial Professor, School of Computing


28 Nov 2017 Tuesday, 10:00 AM to 11:30 AM

Executive Classroom, COM2-04-02

Abstract:

As memory becomes cheaper and larger, it is gradually replacing disk as the primary storage for database systems. Compared to disk-based systems, data operations in in-memory databases are much more efficient. However, simply replacing the storage layer of a traditional disk-based system with memory does not satisfy real-time performance requirements. While in-memory databases dramatically reduce the disk I/O cost of transaction processing, they still need to flush logs to disk to provide durability, which incurs a substantial number of disk I/Os and accounts for a growing proportion of processing time. Moreover, as data operations are no longer the performance bottleneck, concurrency control protocols designed for disk-based systems usually fail to further exploit computation parallelism in in-memory systems. Thus, there is an urgent need to redesign both concurrency control protocols and logging schemes for in-memory database systems.

First, command logging, which records each transaction's information instead of how data are changed, has been proposed to improve logging efficiency. However, command logging incurs an expensive recovery cost, especially in distributed settings. We show that the only bottleneck of recovery caused by command logging in the distributed setting is the synchronization process that attempts to resolve the data dependencies among transactions. We then propose an adaptive logging approach that combines data logging and command logging. The percentage of data logging versus command logging becomes a tuning knob that trades off transaction processing performance against recovery performance to meet different requirements.
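
The trade-off between the two record types can be illustrated with a minimal Python sketch. It is not the implementation presented in the thesis: the names `AdaptiveLogger`, `data_ratio`, `transfer` and `recover` are hypothetical, and the rule used to pick which transactions get a data record (a simple fixed ratio) is an assumption made only for illustration. A command record stores the procedure name and arguments and must be re-executed on recovery; a data record stores the written values and is simply re-applied.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class CommandRecord:
    txn_id: int
    procedure: str          # name of the stored procedure that was run
    args: Tuple             # its arguments; recovery must re-execute it


@dataclass
class DataRecord:
    txn_id: int
    writes: Dict[str, int]  # key -> value after the transaction


def transfer(db, src, dst, amount):
    """Hypothetical stored procedure: move `amount` from src to dst."""
    writes = {src: db[src] - amount, dst: db[dst] + amount}
    db.update(writes)
    return writes


PROCEDURES = {"transfer": transfer}


class AdaptiveLogger:
    """Logs a fraction `data_ratio` of transactions as data records.

    Assumption: a fixed ratio stands in for whatever selection policy
    the real system uses.
    """

    def __init__(self, data_ratio: float):
        self.data_ratio = data_ratio
        self._credit = 0.0
        self.log = []

    def execute(self, db, txn_id, procedure, args):
        writes = PROCEDURES[procedure](db, *args)
        self._credit += self.data_ratio
        if self._credit >= 1.0:
            self._credit -= 1.0
            self.log.append(DataRecord(txn_id, writes))
        else:
            self.log.append(CommandRecord(txn_id, procedure, args))


def recover(snapshot, log):
    """Rebuild the database from a snapshot by replaying the log in order."""
    db = dict(snapshot)
    for rec in log:
        if isinstance(rec, DataRecord):
            db.update(rec.writes)                     # cheap: apply values
        else:
            PROCEDURES[rec.procedure](db, *rec.args)  # costly: re-execute
    return db


snapshot = {"a": 100, "b": 50, "c": 0}
db = dict(snapshot)
logger = AdaptiveLogger(data_ratio=0.5)
logger.execute(db, 1, "transfer", ("a", "b", 10))
logger.execute(db, 2, "transfer", ("b", "c", 20))
logger.execute(db, 3, "transfer", ("a", "c", 5))
logger.execute(db, 4, "transfer", ("c", "a", 1))
recovered = recover(snapshot, logger.log)
```

In this sketch, raising `data_ratio` towards 1 makes recovery cheaper because fewer transactions are re-executed, at the price of larger log records during normal processing; lowering it towards 0 does the opposite.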

Second, multi-core CPUs and large memories are increasingly becoming the norm in modern computer systems. It is vitally important to exploit this parallelism efficiently. We propose a new concurrency control protocol called DGCC (Dependency Graph based Concurrency Control) that separates contention resolution from transaction execution. DGCC builds dependency graphs for batched transactions before executing them. Using these dependency graphs, contention within the same batch of transactions is resolved before execution. As a result, the execution of the transactions does not need to deal with contention, while remaining fully equivalent to a serialized execution.
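
The idea of resolving contention before execution can be sketched in a few lines of Python. This is a simplified illustration and not the DGCC protocol itself: it assumes each transaction declares its read and write sets up front, it tracks dependencies between whole transactions, and the names `Txn`, `build_graph`, `schedule_layers` and `execute_batch` are hypothetical. Transactions in the same layer share no conflicts, so they could be handed to different worker threads; the sketch runs them sequentially to stay short.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Set


@dataclass
class Txn:
    txn_id: int
    reads: FrozenSet[str]
    writes: FrozenSet[str]
    body: Callable[[Dict[str, int]], None]


def conflicts(a: Txn, b: Txn) -> bool:
    """True if a and b touch a common key and at least one writes it."""
    return bool(a.writes & (b.reads | b.writes)) or bool(b.writes & a.reads)


def build_graph(batch: List[Txn]) -> Dict[int, Set[int]]:
    """deps[j] = positions of earlier transactions that j must wait for."""
    deps: Dict[int, Set[int]] = {j: set() for j in range(len(batch))}
    for j in range(len(batch)):
        for i in range(j):
            if conflicts(batch[i], batch[j]):
                deps[j].add(i)
    return deps


def schedule_layers(deps: Dict[int, Set[int]]) -> List[List[int]]:
    """Group transactions into layers; a layer has no internal conflicts."""
    level: Dict[int, int] = {}
    for j in sorted(deps):  # dependencies always point to earlier positions
        level[j] = 1 + max((level[i] for i in deps[j]), default=-1)
    layers: List[List[int]] = [[] for _ in range(max(level.values(), default=-1) + 1)]
    for j, lv in level.items():
        layers[lv].append(j)
    return layers


def execute_batch(db: Dict[int, int], batch: List[Txn]) -> List[List[int]]:
    layers = schedule_layers(build_graph(batch))
    for layer in layers:
        for idx in layer:       # conflict-free: could run in parallel
            batch[idx].body(db)
    return layers


def inc_a(db): db["a"] += 1
def double_b(db): db["b"] *= 2
def sum_into_c(db): db["c"] = db["a"] + db["b"]
def copy_c_to_a(db): db["a"] = db["c"]


batch = [
    Txn(0, frozenset({"a"}), frozenset({"a"}), inc_a),
    Txn(1, frozenset({"b"}), frozenset({"b"}), double_b),
    Txn(2, frozenset({"a", "b"}), frozenset({"c"}), sum_into_c),
    Txn(3, frozenset({"c"}), frozenset({"a"}), copy_c_to_a),
]
db = {"a": 1, "b": 2, "c": 0}
layers = execute_batch(db, batch)
```

Because every conflicting pair is ordered by its position in the batch, the final state matches running the four transactions one after another in batch order, and no locks or aborts are needed during execution.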

Third, existing approaches treat transaction management and recovery as two separate problems, even though recovery depends on the sequence in which transactions are executed. We propose to treat the transaction management and recovery problems as one. We first propose an efficient Distributed Dependency Graph based Concurrency Control (DistDGCC) protocol for handling transactions spanning multiple nodes, and then propose a novel logging protocol called Dependency Logging that also makes use of dependency graphs for efficient logging and recovery. DistDGCC reduces the average cost of each distributed transaction by processing transactions in batches. It also reduces the effects of thread blocking caused by distributed transactions and consequently improves runtime performance. Further, dependency logging exploits the same data structure used by DistDGCC to reduce the logging overhead, and uses the logical dependency information to improve recovery parallelism.