SCALING DATA STREAM PROCESSING ON MULTICORE ARCHITECTURES
COM2 Level 2
MR3, COM2-02-26
closeAbstract:
Data stream processing systems (DSPSs) enable users to express and run stream applications to continuously process data streams. Witnessing the emergence of modern commodity machines with massively parallel processors, researchers and practitioners find shared-memory multicore architectures an attractive platform for DSPSs. However, fully exploiting the computation power delivered by multicore architectures can be challenging, and scaling DSPSs on modern multicore architectures remains to be notoriously challenging. This is because processing massive amounts of data can confront several performance bottlenecks inherited from different DSPS components, which altogether put a strict constraint on the scalability of the DSPSs on multicore architectures.
In this thesis, we present the evaluation, design and implementation of DSPSs aiming at high-performance stream processing on multicore architectures. First, as modern DSPSs are mainly designed and optimized for scaling-out using a cluster of low-end servers, we present a comprehensive performance study of popular DSPSs (e.g., Apache Storm, Flink) on multicore architectures and try to identify whether the current implementation matches with modern hardware (e.g., non-uniform memory access - NUMA, multicore). Second, our detailed profiling study shows that existing DSPSs severely underutilized the underlying complex hardware micro-architecture and especially show poor scalability due to the unmanaged resource competition and unaware of NUMA effect. We hence present our efforts on a complete revolution in designing next-generation stream processing platform, namely BriskStream, specifically optimized for shared-memory multicore architectures. A novel NUMA-aware execution plan optimization scheme, namely Relative-Location-Aware-Scheduling (RLAS) is proposed to address the NUMA effect for stream computation. Third, DSPS with transactional state management relieves users from managing state consistency by themselves. We introduce TStream, a DSPS with efficient transactional state management. Compared to previous works, it guarantees the same consistency properties while judiciously exploits more parallelism opportunities - both within the processing of each input event and among a (tunable) batch of input events. To confirm the effectiveness of our proposal, we compared TStream against three prior solutions on a four-socket multicore machine. Our extensive experiment evaluations show that TStream yields up to 6.8 times higher throughput comparing with existing approaches with comparable end-to-end processing latency under the same consistency guarantee.