PH.D DEFENCE - PUBLIC SEMINAR

Blockchains: Novel Data Systems and Beyond

Speaker
Mr Ruan Pingcheng
Advisor
Dr Ooi Beng Chin, Lee Kong Chian Centennial Professor, School of Computing


Tuesday, 08 Feb 2022, 10:00 AM to 11:30 AM

Zoom presentation

Abstract:

The success of Bitcoin has brought tremendous interest to its underlying technology, the blockchain. A blockchain is a decentralized system that brings mutually distrusting parties to a unanimous agreement on a ledger of transaction history, without resorting to trusted third parties. For years, blockchain applications were restricted to cryptocurrencies, serving monetary transfers between anonymous transactors. This situation has now changed radically: smart contracts enable arbitrary state transitions on blockchains. Meanwhile, there is high demand to tap blockchains' potential for use cases where accountability outweighs anonymity. Blockchains have thus shifted away from their original purpose and entered the data-processing domain, where the decades of experience accumulated by the database community are too valuable to overlook. In this thesis, we focus on optimizing blockchains from the perspective of a data system. In particular, we concentrate on permissioned blockchains, which operate among identified parties and lend themselves to a wider variety of applications.

First, we view a blockchain, whether permissioned or permissionless, as a generic distributed system; as such, it shares some similarities with distributed database systems. Existing works that compare blockchains and distributed database systems focus mainly on high-level properties, such as security and throughput, and stop short of showing how the underlying design choices contribute to the overall differences. This work fills that gap. Specifically, we perform a twin study of blockchains and distributed database systems as two types of transactional systems. We propose a taxonomy that illustrates the dichotomy across four dimensions, namely replication, concurrency, storage, and sharding. Within each dimension, we discuss how the design choices are driven by two different goals: security for blockchains and performance for distributed databases. To expose the impact of these design choices on overall performance, we conduct an in-depth performance analysis of two permissioned blockchains, Quorum and Hyperledger Fabric, and two distributed databases, TiDB and etcd. Lastly, we propose a framework for back-of-the-envelope performance forecasts of blockchain-database hybrids.

Second, because a blockchain records every transaction that modifies the global state in a tamper-evident ledger, it captures the entire evolution history of that state. The management of such history, also known as data provenance or lineage, has been studied extensively in database systems. However, querying data history in existing blockchains can only be done by replaying all transactions. This approach works for large-scale offline analysis but is unsuitable for online transaction processing. We therefore present FabricSharp, a fine-grained, secure and efficient provenance system for blockchains. FabricSharp exposes provenance information to smart contracts through simple and elegant interfaces, thereby enabling a new class of blockchain applications whose execution logic depends on provenance information at runtime. FabricSharp captures provenance during contract execution and stores it efficiently in a Merkle tree. It also provides a novel skip list index designed to support efficient provenance query processing. We have implemented FabricSharp on top of Hyperledger Fabric v2.2 and ForkBase, a blockchain-optimized storage system. Our extensive evaluation demonstrates FabricSharp's benefits for this new class of blockchain applications, its efficient query processing, and its small storage overhead.
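The abstract does not spell out the layout of FabricSharp's skip list index. As a rough illustration of why such an index helps with provenance queries like "what was this key's value as of block b?", the Python sketch below keeps a hypothetical per-key version list in which version i also links back to versions i - 2^j for every power of two 2^j dividing i. A lookup can then jump back over many versions at once instead of replaying them one by one. The names `VersionHistory`, `append` and `value_as_of` are invented for this example, and the Merkle-tree storage is not modelled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VersionHistory:
    """Hypothetical per-key history of (block number, value) versions in append order."""
    blocks: List[int] = field(default_factory=list)
    values: List[str] = field(default_factory=list)
    # skips[i] holds back-pointers i - 2**j for every 2**j that divides i (and 2**j <= i),
    # ordered from the nearest (i - 1) to the farthest.
    skips: List[List[int]] = field(default_factory=list)

    def append(self, block: int, value: str) -> None:
        if self.blocks and block < self.blocks[-1]:
            raise ValueError("versions must be appended in block order")
        i = len(self.blocks)
        ptrs: List[int] = []
        step = 1
        while step <= i and i % step == 0:
            ptrs.append(i - step)
            step *= 2
        self.blocks.append(block)
        self.values.append(value)
        self.skips.append(ptrs)

    def value_as_of(self, block: int) -> Optional[str]:
        """Return the latest value written at or before `block`, or None if none exists."""
        if not self.blocks or self.blocks[0] > block:
            return None
        c = len(self.blocks) - 1
        while self.blocks[c] > block:
            # Take the farthest back-pointer that still lands on a version newer than
            # `block`; otherwise step to the adjacent predecessor, which is the answer.
            nxt = c - 1
            for p in reversed(self.skips[c]):
                if self.blocks[p] > block:
                    nxt = p
                    break
            c = nxt
        return self.values[c]
```

Because block numbers are non-decreasing along the list, every jump lands strictly after the target version, so the search never overshoots it.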

Third, to cater for emerging business requirements, a new architecture called execute-order-validate has been proposed in Hyperledger Fabric to support parallel transaction execution and improve blockchain throughput. However, this architecture may produce many invalid transactions when they are serialized. The problem is exacerbated because the block formation rate is inherently limited by factors other than data processing, such as cryptography and consensus. In this work, we propose a novel method that enhances the execute-order-validate architecture by reducing invalid transactions, thereby improving blockchain throughput. Our method is inspired by state-of-the-art optimistic concurrency control techniques in modern database systems. Existing blockchains adopt preventive approaches from databases, which may abort transactions that are in fact serializable; our method is theoretically more fine-grained. Specifically, unserializable transactions are aborted before ordering, and the remaining transactions are guaranteed to be serializable. For evaluation, we implement our method in FabricSharp and compare its performance with carefully chosen baselines. The results demonstrate that FabricSharp achieves substantially higher throughput than the other systems in nearly all experimental scenarios.
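The idea of aborting unserializable transactions before ordering can be illustrated with a dependency graph. The Python sketch below is a simplification, not the thesis's algorithm: it assumes every transaction in a batch was simulated against the same committed snapshot, so the only constraint is that a transaction that read a key must be ordered before any transaction that writes that key. Transactions are admitted in arrival order, and one is aborted only if admitting it would close a cycle; the survivors are then topologically sorted into a serial order. The names `Txn` and `schedule` are hypothetical, and this greedy policy does not necessarily minimize the number of aborts.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Dict, FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class Txn:
    """Hypothetical simulated transaction: its id plus read and write key sets."""
    tid: str
    reads: FrozenSet[str]
    writes: FrozenSet[str]


def _reachable(graph: Dict[str, Set[str]], src: str, dst: str) -> bool:
    """Depth-first search over successor edges: is there a path from src to dst?"""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph[node])
    return False


def schedule(batch: List[Txn]) -> Tuple[List[str], List[str]]:
    """Return (serial order of admitted ids, aborted ids); ids are assumed unique."""
    graph: Dict[str, Set[str]] = {}  # tid -> ids that must come after it
    admitted: List[Txn] = []
    aborted: List[str] = []
    for t in batch:
        # a -> t: a read a key that t writes, so a must precede t.
        preds = {a.tid for a in admitted if a.reads & t.writes}
        # t -> a: t read a key that a writes, so t must precede a.
        succs = {a.tid for a in admitted if t.reads & a.writes}
        # Adding t closes a cycle iff some successor already reaches some predecessor.
        if any(_reachable(graph, s, p) for s in succs for p in preds):
            aborted.append(t.tid)
            continue
        graph[t.tid] = set(succs)
        for p in preds:
            graph[p].add(t.tid)
        admitted.append(t)
    # TopologicalSorter expects node -> predecessors, so invert the edges.
    preds_of: Dict[str, Set[str]] = {tid: set() for tid in graph}
    for u, vs in graph.items():
        for v in vs:
            preds_of[v].add(u)
    order = list(TopologicalSorter(preds_of).static_order())
    return order, aborted
```

For example, in a write-skew pair where T1 reads `a` and writes `b` while T2 reads `b` and writes `a`, no serial order lets both observe the snapshot values, so the later arrival is aborted before it reaches ordering. Conversely, a transaction that arrives later but read a key an earlier one writes is simply placed first in the serial order rather than aborted.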

Last but not least, again borrowing an idea from databases, we extend permissioned blockchains with access-control views. These cater for applications in which access to sensitive information must be restricted, that is, concealed from peers and users who lack the proper access permissions. We present two types of views, irrevocable and revocable, according to whether access to sensitive information cannot or can be revoked. We show how to implement both types of views using cryptographic hash functions and encryption keys. An implementation of the proposed methods and a comparison with cross-chain transactions show that the approach is viable. Experiments with supply chain transactions illustrate the costs incurred by the different types of views, including latency, transaction rate and storage overhead.
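Both view types rest on standard cryptographic primitives. One common building block is a salted hash commitment: only a digest goes onto the shared ledger, and members of a view who receive the record and its salt off-chain can check them against that digest. The Python sketch below uses the standard `hashlib` and `hmac` modules; the in-memory `ledger` and `private_store` dictionaries and the `commit`, `verify` and `submit` helpers are hypothetical. The abstract does not say exactly how hashes and encryption keys are divided between irrevocable and revocable views, so encryption is not shown.

```python
import hashlib
import hmac
import os
from typing import Dict, Tuple

# Hypothetical setup: the shared ledger holds only digests keyed by transaction id,
# while the plaintext record and its salt stay in the owner's private store.
ledger: Dict[str, bytes] = {}
private_store: Dict[str, Tuple[bytes, bytes]] = {}


def commit(record: bytes) -> Tuple[bytes, bytes]:
    """Return (salt, SHA-256(salt || record)); only the digest is published."""
    salt = os.urandom(16)
    return salt, hashlib.sha256(salt + record).digest()


def verify(record: bytes, salt: bytes, digest: bytes) -> bool:
    """A view member checks an off-chain record against the on-chain digest."""
    return hmac.compare_digest(hashlib.sha256(salt + record).digest(), digest)


def submit(tx_id: str, record: bytes) -> None:
    """Publish the digest on the ledger and keep the record and salt private."""
    salt, digest = commit(record)
    ledger[tx_id] = digest
    private_store[tx_id] = (record, salt)
```

The random salt keeps peers outside the view from confirming a guessed low-entropy record, such as a small quantity in a supply-chain entry, by hashing candidate values themselves.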

We have released the source code of FabricSharp for public use. Finally, we conclude with directions for future work.