DOCTORAL SEMINAR

Efficient Sliding-Window Algorithms for Real-Time Data Stream Analytics

Speaker
Mr Wang Yanhao
Advisor
Dr Tan Kian Lee, Tan Sri Runme Shaw Senior Professor, School of Computing


11 Apr 2019 Thursday, 02:00 PM to 03:30 PM

MR1, COM1-03-19

Abstract:

In recent years, data streams have become ubiquitous because many different applications, e.g., social media, communication networks, and sensors, are continuously generating huge volumes of data at an unprecedented rate.

Data streams provide both great opportunities and immense challenges for data analysis. On the one hand, they are important sources of valuable information and knowledge. On the other hand, the volumes and velocities of streams pose new challenges for data analytics. In practice, streaming algorithms have been widely used for data stream analytics. However, streaming algorithms still have some limitations in the time-sensitive scenarios because they cannot capture the highly dynamic and time-evolving patterns of streaming data.

To address the above problems, we adopt the sliding-window model where only recent data in a stream is used for computation to avoid including outdated information and guarantee the timeliness in real-time analysis. Nevertheless, since sliding-window algorithms are generally more challenging than their (append-only) streaming counterparts, efficient sliding-window algorithms for real-time analytics of time-sensitive data streams are still largely unexplored yet.

In this thesis proposal, we investigate the problem of proposing efficient algorithms for real-time data stream analytics over sliding windows. In particular, we focus on two problems that are useful for many real-world applications yet have not been well studied in the sliding window model: (1) we propose a novel model and several algorithms for influence maximization (IM) in data streams, which tracks a set of influential users from a stream of user interactions in a social network; (2) we devise general frameworks for representative subset selection (RSS) that maintains a set of elements based on some measure of representativeness over the sliding window in a data stream. We conduct both rigorous theoretical analysis and extensive experimental studies to demonstrate the effectiveness and efficiency of our proposed algorithms for both problems.