Parallel Graph Processing on CPU-GPU Architectures

Mr Guo Wentian
Dr Tan Kian Lee, Shaw Senior Professor, School of Computing

  03 May 2019 Friday, 10:00 AM to 11:00 AM

 MR6, AS6-05-10


Graph is a useful data representation that has seen applications in various fields, such as social network, semantic web and etc. Despite its great importance, graph processing is still faced with difficult challenges: first, many graph algorithms are compute-intensive by nature. Second, modern graphs are usually large and evolve rapidly. These make the graph processing difficult to achieve scalability and efficiency. In this proposal, we aim to take advantage of the recent advancements in hardware technology and design efficient solutions to accelerate graph computation on the heterogenous CPU-GPU architectures.

First, we study personalized PageRank on dynamic graphs and propose a parallel approach that can be deployed on both multi-core CPUs and GPUs. Different from existing sequential approaches that perform computation on each single graph update, our approach can ingest the graph updates in batch and parallelize the critical process of computation. To speed up the parallel processing, we also devise optimization techniques to accelerate the convergence and reduce synchronization overheads.

Second, we design a novel framework for parallel subgraph enumeration to leverage the distinct characteristics of CPUs and GPUs that are collocated on a single machine. To find all matched subgraphs, i.e., instances, our framework offloads each partition of the data graph into the GPUs for searching the instances within the partition, while the CPUs handle the instances that cross different partitions. Based on this framework, we propose designs for the GPU component and CPU component, as well as a co-processing method that integrates GPUs and CPUs into a unified workflow such that GPUs can undertake the heavy part of the workload.

Finally, we propose our ongoing work for subgraph enumeration. In this work, we exploit the reuse of intermediate result during subgraph enumeration, and invent a new variant of breadth-first search approach, called reusable BFS. Our reusable BFS can be efficiently parallelized on both multi-core CPUs and GPUs. To facilitate reusable BFS, we devise a theoretical framework that assists parallel threads to maintain the result for reuse in a distributed way; we also propose a reuse-aware method of order selection that can optimize both the extent of reuse and the effect of pruning search space.