PH.D. DEFENCE - PUBLIC SEMINAR

Towards Practical Vertical Federated Learning

Speaker
Mr. Wu Zhaomin
Advisor
Dr He Bingsheng, Professor, School of Computing


03 May 2024 Friday, 03:00 PM to 04:30 PM

SR12, COM3 01-21

Abstract:

Training advanced machine learning models requires ever more high-quality data. However, such data is often sensitive and distributed across multiple parties, and privacy regulations make it difficult to share. Federated learning has emerged as a promising solution, allowing machine learning models to be trained on distributed data while preserving privacy. While horizontal federated learning has attracted considerable attention, vertical federated learning (VFL) - wherein parties share the same instance set but hold distinct features - has received comparatively little.

Several challenges obstruct the practical application of VFL. The first challenge is the lack of a comprehensive benchmark, because privacy concerns limit the pool of real VFL datasets. The second challenge lies in the efficacy of VFL algorithms, particularly when considering record linkage. Traditional linkage methods may not capture key features, and separating the linkage process from the training process can negatively impact performance. The third challenge is the communication efficiency of VFL: substantial communication costs and unstable connections present significant obstacles for existing VFL algorithms. The fourth challenge is that privacy constraints on linkage keys necessitate models and algorithms adapted for learning without direct data linkage.

To address these challenges, we have undertaken several initiatives. Firstly, we have developed VertiBench, a benchmark system designed to pinpoint key factors influencing VFL performance. This system also introduces tailored evaluation metrics and dataset splitting methodologies, enhancing the accuracy and relevance of our analyses. Secondly, we have designed FedSim, a training paradigm that integrates a one-to-many linkage process into training, thereby significantly improving model performance in federated environments, especially those with fuzzy identifiers. Thirdly, we have crafted a one-shot VFL algorithm, FedOnce, specifically targeting challenges related to communication and synchronization in VFL. Lastly, our ongoing work is the design of FeT, a model capable of learning effectively from vertically distributed data without depending on explicit linkage.