CS SEMINAR

I/O Coordination for Better Resource Sharing -- From HPC to AI Storage

Speaker
Xiaosong Ma, Department Chair and Professor of Computer Science, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), UAE
Chaired by
Dr HE Bingsheng, Professor, School of Computing
hebs@comp.nus.edu.sg

17 Nov 2025 Monday, 10:00 AM to 11:30 AM

MR1, COM1-03-19

Abstract:
Despite tremendous improvements in absolute capacity and speed, the storage subsystem remains one of the slower, less predictable, and less scalable components of large-scale applications. In this talk, through a personal journey across parallel and distributed storage systems, I hope to share observations and lessons from past projects. In particular, across the many layers of the storage hierarchy studied, from CPU caches to supercomputer and cloud storage clusters, we repeatedly encounter the same underlying challenge: efficient sharing of I/O resources. Without effective regulation of I/O flows, the storage system can easily become a performance bottleneck while remaining largely underutilized. The discussion of solutions continues into related new (and old) storage problems with today's parallel AI training and inference workloads.

Short bio:
Xiaosong Ma is currently professor and chair of the Computer Science department at Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), UAE. Her research interests are in storage, distributed/cloud computing, and graph systems. Xiaosong has published over 100 research papers, with multiple design/monitoring solutions adopted in industry and large-scale deployments. She has served on the program committees of major conferences such as OSDI, ASPLOS, EuroSys, and FAST. Prior to joining MBZUAI, she worked at Qatar Computing Research Institute (QCRI) and NC State University, where she received both the DOE Early Career Principal Investigator Award and the NSF CAREER Award. Xiaosong received her Ph.D. from the University of Illinois at Urbana-Champaign in 2003 and her B.S. from Peking University in 1997.