CS SEMINAR

Programmable Networks for Distributed Deep Learning: Advances and Perspectives

Speaker
Marco Canini, Professor of Computer Science, Computer, Electrical and Mathematical Sciences & Engineering, King Abdullah University of Science and Technology (KAUST)
Chaired by
Dr Jialin LI, Sung Kah Kay Assistant Professor, School of Computing
lijl@comp.nus.edu.sg

26 Jan 2026 Monday, 11:00 AM to 12:30 PM

SR11, COM3 01-20

Abstract:

Training large deep learning models is challenging due to the high communication overheads that distributed training entails. Embracing the recent technological development of programmable network devices, this talk describes our efforts to rein in distributed deep learning's communication bottlenecks and offers an agenda for future work in this area. We demonstrate that an in-network aggregation primitive can accelerate distributed DL workloads and can be implemented using modern programmable network devices. We discuss various designs for streaming aggregation and in-network data processing that lower memory requirements and exploit sparsity to maximize effective bandwidth use. We also touch on gradient compression methods, which lower communication volume and adapt to dynamic network conditions. Lastly, we consider how to continue our research in light of the enormous costs of training large models at scale, which make it quite hard for researchers to tackle this problem area.

Bio:

Marco asked a swarm of AI agents about the next big thing. They negotiated, escalated, and concluded: “it depends on the infrastructure.” He took the hint. Marco’s research spans distributed systems, large-scale/cloud computing, and computer networking, with an emphasis on programmable networks. His current focus is building better systems support for AI/ML—practical implementations that deploy in the real world. He is a Professor of Computer Science at KAUST. Marco earned his Ph.D. in Computer Science and Engineering from the University of Genoa in 2009, after a visiting year at the University of Cambridge. He was a postdoctoral researcher at EPFL and a senior research scientist at Deutsche Telekom Innovation Labs & TU Berlin. Before joining KAUST, he was an assistant professor at UCLouvain. He has also held positions at Intel, Microsoft, and Google.