PH.D. DEFENCE - PUBLIC SEMINAR

Cost Performance of Scaling Applications on the Cloud

Speaker
Mr Rathnayake Mudiyanselage Sunimal Rathnayake
Advisor
Dr Teo Yong Meng, Associate Professor, School of Computing


25 Sep 2020 Friday, 02:00 PM to 03:30 PM

Zoom presentation

Join Zoom Meeting
https://nus-sg.zoom.us/j/97759285451?pwd=S0RwdHE0bGRLWFBuKzhUdmlYLzhEQT09

Meeting ID: 977 5928 5451
Password: 381688

Abstract:

The inherent scaling capability and pay-per-use charging of the cloud have made it the compute infrastructure of choice for a wide range of users, from large corporations to individuals. While resource scaling on the cloud has been well explored, the cost performance of scaling applications on the cloud has received much less attention. Leveraging application scalability on inherently scalable cloud resources opens up new opportunities for cloud consumers to extract the maximum advantage from the cloud. This thesis investigates scaling applications on the cloud and its implications for cost performance.

Motivated by Amdahl's fixed-workload scaling and Gustafson's fixed-time scaling for high-performance computing, we propose fixed-cost scaling for cloud computing and investigate the implications of the proposed fixed-cost law under fixed workload and fixed time. Under fixed-cost application scaling, we address three key issues: the large cloud resource configuration space, scaling the application problem size, and scaling accuracy.
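
For reference, Amdahl's and Gustafson's speedup laws are restated below, followed by one plausible way of writing a fixed-cost constraint; the fixed-cost line is only an illustrative sketch that assumes a per-instance price c, n instances, and execution time T(n), and the thesis's exact formulation may differ.

S_{\mathrm{Amdahl}}(n) = \frac{1}{(1 - p) + p/n}        % fixed-workload speedup, parallel fraction p
S_{\mathrm{Gustafson}}(n) = (1 - p) + p\,n              % fixed-time (scaled) speedup
\text{fixed-cost: maximize work subject to } C(n) = c \, n \, T(n) \le B   % budget B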

To address the challenge posed by the large cloud resource configuration space, we propose a measurement-driven analytical modeling approach for determining cost-time Pareto-optimal cloud resource configurations. Our approach exposes the existence of multiple Pareto-optimal configurations that meet the application's cost budget and time deadline constraints. We investigate the impact of fixed-workload scaling on the cloud and discuss the effect of resource configuration on cost and time performance. Our results show that up to 30% cost savings can be achieved with a Pareto-optimal resource configuration for our example application.
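
As an illustration of the configuration-selection step, the sketch below filters a set of measured or modelled (cost, time) estimates against a budget and deadline and keeps only the Pareto-optimal ones. The configuration names, prices, and times are hypothetical examples, not results from the thesis.

# Sketch: cost-time Pareto-optimal configuration selection (Python).
# All configuration names, costs, and times below are hypothetical.

def feasible(configs, budget, deadline):
    """Keep configurations that meet the cost budget and time deadline."""
    return [c for c in configs if c["cost"] <= budget and c["time"] <= deadline]

def pareto_optimal(configs):
    """Return configurations not dominated in both cost and time."""
    frontier = []
    for c in configs:
        dominated = any(
            o["cost"] <= c["cost"] and o["time"] <= c["time"]
            and (o["cost"] < c["cost"] or o["time"] < c["time"])
            for o in configs
        )
        if not dominated:
            frontier.append(c)
    return frontier

measured = [  # predicted cost ($) and time (h) per candidate configuration
    {"config": "4 x m5.large",  "cost": 1.5, "time": 6.0},
    {"config": "8 x m5.large",  "cost": 2.0, "time": 3.5},
    {"config": "4 x c5.xlarge", "cost": 2.4, "time": 3.0},
    {"config": "8 x c5.xlarge", "cost": 4.4, "time": 2.5},  # dominated by the configuration below
    {"config": "16 x m5.large", "cost": 4.2, "time": 2.2},
]
for c in pareto_optimal(feasible(measured, budget=4.5, deadline=4.0)):
    print(c["config"], c["cost"], c["time"])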

Given a fixed cost budget and a time deadline, we investigate the effect of scaling the problem size of an application. Through a measurement-driven analytical modeling approach, we show that a cost-time-problem size Pareto frontier exists with multiple sweet spots that meet both the cost budget and the time deadline constraints. Among the Pareto-optimal problem sizes, we show that there are opportunities for saving cost and time in exchange for a comparatively small reduction in problem size. For example, the cost and the time deadline could be halved with a one-fifth reduction in the maximum problem size for an n-body simulation. To characterize the cost performance of cloud resources, we introduce a new metric, the Performance Cost Ratio (PCR), and demonstrate its use in efficiently deriving cost-time-efficient cloud resource configurations for executing applications on the cloud.
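
The listing below sketches one way such a metric could be computed and used to rank instance types. PCR is assumed here to be delivered throughput per unit price, and all instance names, throughputs, and prices are hypothetical; the thesis's exact definition may differ.

# Sketch: ranking instance types by an assumed Performance Cost Ratio (PCR).
# PCR is taken here as throughput (work units per hour) divided by price ($/h);
# all numbers are hypothetical illustrations.

instance_types = {
    "m5.large":  {"throughput": 100.0, "price": 0.096},
    "c5.xlarge": {"throughput": 230.0, "price": 0.170},
    "r5.xlarge": {"throughput": 180.0, "price": 0.252},
}

def pcr(spec):
    return spec["throughput"] / spec["price"]

# Higher PCR means more performance per dollar, so high-PCR instance types are
# considered first when assembling a cost-time efficient configuration.
for name, spec in sorted(instance_types.items(), key=lambda kv: pcr(kv[1]), reverse=True):
    print(f"{name}: PCR = {pcr(spec):.0f}")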

In contrast to traditional applications that produce exact results, there is a class of applications that produce approximate results with a notion of accuracy. For such applications, accuracy can be traded off for reduced execution cost and time. With a measurement-driven approach, we investigate the cost-time-accuracy performance of executing applications on the cloud, using Convolutional Neural Networks (CNNs) as an example. In contrast to studies that focus on the CNN training phase, which incurs largely a one-off cost, we focus on improving the cost performance of CNN inference. Our approach determines multiple sweet spots where cost and time can be reduced without a significant reduction in inference accuracy. We show that selecting the right degree of pruning reduces inference cost and time by half with a one-tenth reduction in accuracy, and we expose cost-accuracy and time-accuracy Pareto-optimal configurations spanning a large cost and time range, where selecting the right configuration halves the inference cost and time. To address the challenge of having multiple resource and application configurations that achieve the same accuracy but with different cost and time, we introduce two new metrics, the Cost Accuracy Ratio (CAR) and the Time Accuracy Ratio (TAR), for quantifying the performance of cloud resources with respect to accuracy. Using CAR and TAR to guide our heuristic, we present a polynomial-time algorithm for selecting cloud resources.
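
To make the heuristic concrete, the sketch below assumes CAR and TAR are cost-per-unit-accuracy and time-per-unit-accuracy, and greedily picks, among the candidates that meet an accuracy floor, the one with the best weighted CAR/TAR score. The candidate pruning levels, costs, times, and accuracies are hypothetical, and the thesis's actual definitions and algorithm may differ.

# Sketch: CAR/TAR-guided greedy selection of a (pruning level, instance) pair.
# CAR and TAR are assumed to be cost/accuracy and time/accuracy; all numbers
# below are hypothetical illustrations.

candidates = [
    # (pruning %, instance, inference cost ($), time (s), accuracy)
    (0,  "c5.xlarge", 0.40, 120, 0.92),
    (30, "c5.xlarge", 0.28,  85, 0.91),
    (50, "m5.large",  0.20,  70, 0.88),
    (70, "m5.large",  0.12,  45, 0.80),
]

def car(cost, acc):   # assumed Cost Accuracy Ratio: cost per unit accuracy
    return cost / acc

def tar(time, acc):   # assumed Time Accuracy Ratio: time per unit accuracy
    return time / acc

def select(cands, min_accuracy, w_cost=0.5, w_time=0.5):
    """Polynomial-time greedy pick: among candidates meeting the accuracy
    floor, choose the one with the lowest weighted CAR/TAR score.
    (In practice cost and time would be normalised before combining.)"""
    feasible = [c for c in cands if c[4] >= min_accuracy]
    return min(feasible, key=lambda c: w_cost * car(c[2], c[4]) + w_time * tar(c[3], c[4]))

print(select(candidates, min_accuracy=0.85))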