SCALING DATA STREAM PROCESSING ON MULTICORE ARCHITECTURES
24 Jan 2019 Thursday, 10:00 AM to 11:30 AM
COM2 Level 4
Executive Classroom, COM2-04-02
Data stream processing system (DSPS) is a software that allows users to run their streaming applications which continuously process data streams in real-time. Witnessing the emergence of modern commodity machines with massively parallel processors, researchers and practitioners find shared-memory multicore architectures an attractive platform for DSPSs. However, fully exploiting the computation power delivered by multicore architectures can be challenging, and scaling DSPSs on modern multicore architectures remains to be notoriously challenging. This is because processing massive amounts of data can confront several performance bottlenecks inherited from different DSPS components, which altogether put a strict constraint on the scalability of the DSPSs on multicore architectures.
In this thesis proposal, we describe the evaluation, design and implementation of DSPSs aiming at achieving high performance for stream processing on multicore architectures. We also discuss our ongoing work towards a complete thesis. First, as modern DSPSs are mainly designed and optimized for scaling-out using a cluster of low-end servers, we present a comprehensive performance study of popular DSPSs (e.g., Apache Storm, Flink) on multicore architectures and try to identify whether the current implementation matches with modern hardware (e.g., non-uniform memory access (NUMA), multicore). Second, our detailed profiling study shows that existing DSPSs underutilized the underlying complex hardware micro-architecture and especially show poor scalability due to the unmanaged resource competition and unaware of NUMA effect. We hence present our efforts on a complete revolution in designing next-generation stream processing platform, namely BriskStream, specifically optimized for shared-memory multicore architectures. A novel NUMA-aware execution plan optimization scheme, namely Relative-Location-Aware-Scheduling (RLAS) is proposed to address the NUMA effect for stream computation. Third, DSPSs with transactional state management relives users from managing state consistency by themselves. However, scaling stream processing with transactional state management on modern multicores is challenging. On the one hand, DSPSs need to exploit parallelism aggressively to achieve both low latency and high throughput. On the other hand, higher parallelism leads to higher chances of violating transactional state consistency, consequently yielding incorrect results. As a future work, we are designing T-Stream, a highly scalable DSPS with built-in transactional state management. The initial results of both microbenchmark and real use case workloads comparing four existing schemes have confirmed the superiority of our mechanisms and designs.