PH.D. DEFENCE - PUBLIC SEMINAR

Towards Fine-Grained Runtime Optimization for Stream Processing

Speaker
Mr. Mao Yancan
Advisor
Dr Richard Ma Tianbai, Associate Professor, School of Computing


02 May 2024 Thursday, 10:00 AM to 11:30 AM

SR21, COM3 02-60

Abstract:

Stream processing has emerged as a critical component for managing the immense volume, rapid velocity, and heterogeneous nature of data generated by Internet-scale services. To achieve the demanding goals of low latency and high throughput, specialized stream processing engines (SPEs) have been developed to parallelize and manage streaming jobs. However, the conventional approach involves executing these streaming jobs with statically configured execution plans, which often lack the adaptability needed to respond to the dynamic characteristics of incoming data streams. In particular, there is a pressing need for flexible runtime reconfigurations, efficient state migration techniques, and adaptive scheduling strategies for varying stream processing scenarios.

This seminar presents a cohesive narrative focused on fine-grained runtime optimization for stream processing. It encompasses three interconnected works, Trisk, Spacker, and MorphStream, which address the dynamic challenges in runtime reconfiguration, state migration, and transaction scheduling of stream processing, respectively.

Trisk supports efficient and configurable reconfigurations to handle dynamic changes in input workload characteristics during general stream processing. This is achieved by applying fine-grained task-level reconfiguration primitives along three dimensions: resources, workloads, and execution logic. Trisk’s partial pause-and-resume mechanism ensures efficient execution, and its task-centric abstraction exposes easy-to-use programming APIs to enhance usability.

Spacker offers configurable state migration that adapts to dynamic state workload characteristics while a migration is in progress. Its configurable planning strategy enables flexible planning of key-level operations, and its non-disruptive sync-update-resume protocol minimizes how long data processing is blocked during state migration. Together, these allow flexible performance trade-offs and efficient, adaptable state migration.

MorphStream proposes an adaptive scheduling strategy that responds to dynamic transaction workload characteristics in transactional stream processing. It maps the transaction scheduling problem to a graph scheduling problem, allowing suitable scheduling strategies to be selected based on workload characteristics. For efficient and correct scheduling, MorphStream leverages fine-grained dependency identification and resolution, parallel task precedence graph (TPG) construction, and a stateful TPG implementation.