PH.D. DEFENCE - PUBLIC SEMINAR

Efficient and Secure Machine Learning over Distributed Graphs

Speaker
Mr. Aashish Kolluri
Advisor
Dr Prateek Saxena, Associate Professor, School of Computing


02 Dec 2024 Monday, 01:00 PM to 02:30 PM

MR20, COM3-02-59

Abstract:

Machine learning on graphs is a fundamental problem with far-reaching applications, including personalized recommendations, financial fraud detection, and drug discovery. Practical scenarios often involve large and sensitive graphs, necessitating the partitioning of graph data to increase decentralization. This partitioning ensures that processing is distributed among several servers or client devices, adhering to external constraints such as data residency laws and privacy regulations. Existing state-of-the-art graph learning methods face multiple challenges and struggle in setups where data is stored in a decentralized manner.

In this thesis, we present three innovative solutions tailored to enhance the efficiency and privacy of machine learning on distributed graphs across all levels of decentralization. First, Retexo is introduced as a framework to address the prohibitive costs of training Graph Neural Networks (GNNs) on distributed graphs. By reducing communication overhead, Retexo cuts data costs by up to two orders of magnitude while maintaining the accuracy of state-of-the-art methods. Importantly, Retexo scales gracefully across diverse decentralization levels, from centralized datacenter networks to mobile and edge networks.

The second solution, LPGNet, addresses privacy concerns inherent in distributed graph learning. Traditional graph convolutional networks (GCNs) are vulnerable to link-stealing attacks, which compromise the privacy of sensitive edges. LPGNet introduces a novel neural network architecture designed for training on graphs with privacy-sensitive edges. While providing differential privacy guarantees, LPGNet strikes a delicate balance between privacy and utility, outperforming existing mechanisms in preserving sensitive information and mitigating link-stealing attacks. LPGNet is compatible with Retexo and effective across all decentralization levels.

The thesis further extends privacy-preserving machine learning to unattributed graphs lacking node features. We introduce PrivaCT, a novel approach to compute hierarchical cluster trees using local differential privacy. Concretely, it can enable analytics on federated communication networks while ensuring privacy for users' contacts, even when node features are unavailable. Empirical results demonstrate comparable tree quality with minimal utility loss and reasonable privacy guarantees. Additionally, the application of private hierarchical cluster trees is illustrated through the redesign of social recommendation algorithms for federated social network setups, showcasing significant improvements over non-private baselines. Because PrivaCT is designed to provide local differential privacy, it also works across all levels of decentralization.