PH.D. DEFENCE - PUBLIC SEMINAR

On Vertical Elasticity in the Cloud: A Framework for Stream Processing

Speaker
Mr. Dou Rengan
Advisor
Dr Richard Ma Tianbai, Associate Professor, School of Computing


19 Feb 2024 Monday, 04:00 PM to 05:30 PM

MR20, COM3-02-59

Abstract:

Cloud computing has become an increasingly popular paradigm for organizations to manage and utilize computing resources. Long-running applications hosted on cloud platforms experience natural workload fluctuations and skew, resulting in variable resource demands. Elastic resource management therefore becomes crucial for cloud platforms, enabling them to adapt to dynamic workload changes by provisioning and de-provisioning resources. Elasticity can be further categorized into horizontal elasticity (adding or removing resource entities) and vertical elasticity (increasing or decreasing the resource capacity of existing entities). Notably, vertical elasticity provides a more rapid means of resource adjustment, which benefits latency-sensitive applications such as stream applications. This thesis therefore delves into vertical elasticity in cloud resource management, with a focus on stream processing. According to the resource isolation granularity and user expectations addressed, the thesis consists of three parts.

Firstly, we study vertical elasticity at the virtual machine level as a means of achieving performance fairness among applications. Much prior work on fair-share allocation in the cloud targets resource fairness, i.e., the relative share of an application is tied to its allocated or used resources. However, users are often neither aware of nor concerned with the resource usage of their applications; they care about whether performance meets their expectations. To this end, we propose a memory manager that aims for performance fairness, where the share of each application is proportional to its performance. To address the main challenge, comparing performance across diverse applications, we propose a universal method for characterizing the performance of arbitrary applications. We then present an online learning solution that implements this universal method, obviating the need for pilot runs. Combined with an adaptive memory allocation algorithm, the manager handles dynamic workloads. We deploy the manager in a cloud environment where the virtual machine serves as the fundamental resource entity, resizing virtual machines dynamically via the memory ballooning mechanism. Our evaluation shows that the manager effectively achieves performance fairness for diverse applications under various shares.
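As a minimal sketch (not the thesis's actual algorithm), performance fairness can be read as equalizing each application's normalized performance per unit of share. The rebalancing rule, step size, and all names below are assumptions for illustration only:

```python
def rebalance(allocs, perfs, shares, total, step=0.1):
    """One illustrative step of a performance-fairness memory rebalancer.

    Fairness here means every application achieves the same performance
    per unit of share; memory is shifted toward applications whose rate
    lags the average, then renormalized to the fixed memory budget.
    """
    # performance achieved per unit of share; equal rates = fair
    rates = {a: perfs[a] / shares[a] for a in allocs}
    avg = sum(rates.values()) / len(rates)
    new = {}
    for a, mem in allocs.items():
        # grow laggards' allocations, shrink those running ahead
        new[a] = mem * (1 + step * (avg - rates[a]) / avg)
    # renormalize so the total memory budget stays fixed
    scale = total / sum(new.values())
    return {a: m * scale for a, m in new.items()}
```

For equal shares, an application performing worse than its peer would receive more memory on the next step, while the total allocation stays constant.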

Secondly, we study vertical elasticity at the container level for latency optimization of stream applications. To enable large-scale data processing, stream processing engines generally create multiple executors that process the workload in parallel. Each executor is essentially a JVM hosted in a container. Skew and variation in the workload can significantly affect latency, and stream applications, which must provide fast or even real-time responses, are sensitive to it. To optimize latency, we propose Emma, an elastic multi-resource manager that considers both CPU and memory. Emma exploits the resource elasticity of lightweight virtualization containers, e.g., Linux containers, to resize executors' resources at runtime. By fully leveraging the resource characteristics of executors, Emma efficiently and adaptively allocates appropriate resource combinations so that each executor has enough processing capacity for its workload. Our evaluation shows that, compared to existing solutions, Emma reduces latency by orders of magnitude on real-world applications.
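On Linux, the kind of runtime container resizing described above is commonly done through the cgroup v2 interface, where `memory.max` caps memory and `cpu.max` holds a "quota period" bandwidth pair in microseconds. The sketch below (the cgroup path, executor name, and helper functions are assumptions, not Emma's implementation) shows how such a resize could be expressed:

```python
from pathlib import Path

CGROOT = Path("/sys/fs/cgroup")  # assumed cgroup v2 unified hierarchy mount

def resize_plan(executor, mem_bytes, cpus, period_us=100_000):
    """Build the cgroup v2 writes that vertically resize one executor's
    container: a memory ceiling plus a CPU bandwidth quota."""
    cg = CGROOT / executor
    return {
        cg / "memory.max": str(mem_bytes),
        # cpu.max holds "<quota> <period>"; quota = cpus * period
        cg / "cpu.max": f"{int(cpus * period_us)} {period_us}",
    }

def apply(plan):
    """Apply the planned writes (requires privileges over the cgroup
    hierarchy; illustrative only)."""
    for path, value in plan.items():
        path.write_text(value)
```

For example, `resize_plan("executor-7", 4 * 2**30, 2)` grants a (hypothetical) executor 4 GiB of memory and two CPUs' worth of bandwidth, and the change takes effect without restarting the JVM inside.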

Thirdly, we delve into vertical elasticity at the slot level, again focusing on latency optimization for streaming applications. A slot is a unit of resource isolation at the task level in some specialized systems, such as Apache Flink; a task, essentially a thread, is the minimum processing unit in a stream application. In this work, we introduce our task-level memory manager, EMM. EMM leverages our quantitative modeling of memory and task-level latency to adaptively allocate optimal memory sizes to individual tasks, ultimately minimizing end-to-end latency. After integrating EMM into Apache Flink, our experiments demonstrate that it significantly reduces end-to-end latency across a spectrum of applications at various scales and configurations, compared with the native Flink setting.
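The flavor of such an allocation problem can be sketched generically: if each task's latency were a known decreasing function of its memory (an assumption here, not EMM's actual model), a greedy marginal-gain allocator approximates the budget split that minimizes total latency when the curves are convex:

```python
def allocate(latency_fns, total_units):
    """Greedy sketch: repeatedly grant one memory unit to the task whose
    latency would drop the most. For convex, decreasing latency curves
    this approaches the split minimizing the sum of task latencies.
    The per-task curves are illustrative assumptions."""
    alloc = {t: 1 for t in latency_fns}  # every task gets a minimum unit
    for _ in range(total_units - len(alloc)):
        best = max(
            latency_fns,
            key=lambda t: latency_fns[t](alloc[t]) - latency_fns[t](alloc[t] + 1),
        )
        alloc[best] += 1
    return alloc
```

Intuitively, a memory-hungry hot task keeps winning units until its marginal latency improvement falls below that of lighter tasks.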

In summary, this thesis offers valuable insights into vertical elasticity in cloud resource management across various resource isolation granularities and diverse objectives.