COMPUTER SCIENCE RESEARCH WEEK JANUARY 2025
Professor Nate Foster, Department of Computer Science, Cornell University
Dr Gus Xia, Department of Machine Learning, Mohamed bin Zayed University of Artificial Intelligence
Professor Sakriani Sakti, Human-AI Interaction (HAI) Research Laboratory at the Nara Institute of Science and Technology (NAIST) in Japan
COM3 Level 1
Multi-Purpose Hall 1, 2 and 3 [COM3 01-26, 01-27 and 01-28]
These are distinguished talks presented as part of the NUS Computer Science Research Week 2025: https://researchweek.comp.nus.edu.sg/
10:00 – 11:20 High-level Abstractions for Network Programming - Nate Foster
Abstract: Programmable networks have gone from a dream to a reality. Software-defined networking (SDN) architectures provide interfaces for specifying network-wide control algorithms, and emerging hardware platforms are exposing programmability at the forwarding plane level as well. But despite much progress, several fundamental questions remain: What are the right abstractions for writing network programs? How do they differ from the abstractions we use to write ordinary software? Can we reason about programs automatically and implement them efficiently in hardware? This talk will attempt to answer these questions by exploring the design and implementation of high-level abstractions for network programming. I will present NetKAT, a language for programming the forwarding plane based on a surprising connection to regular languages and finite automata, along with several extensions.
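For readers who want a feel for the forwarding-plane abstractions the talk refers to, the snippet below is a minimal, hypothetical Python sketch of NetKAT-style policy combinators: filters, field modifications, parallel composition, and sequential composition, with a policy modeled as a function from a packet to a list of output packets. The encoding and the names test, modify, union, and seq are assumptions chosen for exposition, not code from the talk or from the NetKAT implementation.

```python
# Hypothetical sketch of NetKAT-style combinators (assumed encoding, not the
# official NetKAT implementation). A packet is a dict of header fields; a
# policy maps a packet to a list of output packets (empty list = drop).

def test(field, value):
    """Filter: pass the packet through unchanged iff field == value."""
    return lambda pkt: [pkt] if pkt.get(field) == value else []

def modify(field, value):
    """Modification: set a header field to a constant value."""
    return lambda pkt: [dict(pkt, **{field: value})]

def union(p, q):
    """Parallel composition: apply both policies and collect their outputs."""
    return lambda pkt: p(pkt) + q(pkt)

def seq(p, q):
    """Sequential composition: feed every output of p into q."""
    return lambda pkt: [out for mid in p(pkt) for out in q(mid)]

# Toy policy: send port-80 traffic out port 1, port-22 traffic out port 2,
# and drop everything else.
policy = union(
    seq(test("dst_port", 80), modify("out_port", 1)),
    seq(test("dst_port", 22), modify("out_port", 2)),
)

print(policy({"dst_port": 80}))  # [{'dst_port': 80, 'out_port': 1}]
print(policy({"dst_port": 53}))  # [] (dropped)
```

Sequential and parallel composition correspond to concatenation and union of regular languages, which hints at the connection to finite automata mentioned in the abstract; the Kleene-star iteration operator is omitted here for brevity.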
Bio: Nate Foster is a Professor of Computer Science at Cornell University and a Visiting Researcher at Jane Street. The goal of his research is to develop languages and tools that make it easy for programmers to build secure and reliable systems. He received his PhD in Computer Science from the University of Pennsylvania. His awards include the ACM SIGPLAN Robin Milner Young Researcher Award, the ACM SIGCOMM Rising Star Award, a Sloan Research Fellowship, and an NSF CAREER Award.
13:00 – 14:20 Knowledge Incorporation and Emergence for Music AI - Gus Xia
Abstract: Large language models have demonstrated remarkable capabilities in both symbolic and audio music generation. However, they still fall short in embodying human-like music knowledge, which limits their interpretability and control. In this talk, Gus will explore two approaches to enhance the interpretability of music generative models. The first approach involves directly incorporating hierarchical music structures into the model, leading to state-of-the-art results in whole-song pop music generation. The second approach leverages metaphysical inductive biases to allow human-like music knowledge to "emerge" naturally from the learning process. Pioneering studies in this direction have already given rise to fundamental music concepts like pitch and timbre. Together, these strategies pave the way for more controllable and interpretable music AI systems.
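As a very rough illustration of what incorporating hierarchical structure can mean, the toy sketch below generates a whole-song skeleton top-down: a song form is chosen first, chord progressions are conditioned on the section labels, and notes are conditioned on the chords. All of the section names, chord tables, and pitch choices are invented for this example and are not taken from the speaker's models.

```python
# Toy, hypothetical illustration of hierarchical (top-down) music generation.
# Nothing here comes from the speaker's systems; labels and tables are made up.
import random

SECTIONS = ["intro", "verse", "chorus", "verse", "chorus", "outro"]     # song level
CHORDS = {"intro": ["C", "G"], "verse": ["Am", "F", "C", "G"],
          "chorus": ["C", "G", "Am", "F"], "outro": ["C"]}               # phrase level
SCALE = {"C": [60, 64, 67], "G": [67, 71, 74], "Am": [57, 60, 64], "F": [53, 57, 60]}

def sample_form():
    """Top level: the whole-song form (sequence of sections)."""
    return list(SECTIONS)

def sample_harmony(section):
    """Middle level: a chord progression conditioned on the section label."""
    return CHORDS[section]

def sample_melody(chord, length=4):
    """Bottom level: toy MIDI pitches conditioned on the current chord."""
    return [random.choice(SCALE[chord]) for _ in range(length)]

song = [(section, chord, sample_melody(chord))
        for section in sample_form()
        for chord in sample_harmony(section)]
print(song[:3])
```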
Bio: Dr Gus Xia is an assistant professor of machine learning at MBZUAI, as well as an affiliated faculty member at NYU Shanghai, CILVR at the Center for Data Science, and MARL at Steinhardt. He received his Ph.D. from the Machine Learning Department at Carnegie Mellon University (CMU) in 2016, and he was a Neukom Fellow at Dartmouth from 2016 to 2017. Xia’s research is highly interdisciplinary and lies at the intersection of machine learning, HCI, robotics, and computer music. Some representative works include interactive composition via style transfer, human-computer interactive performances, autonomous dancing robots, and haptic guidance for flute tutoring. Xia is also a professional Di and Xiao (Chinese transverse and vertical flute) player. He plays as a soloist in the NYU Shanghai Jazz Ensemble, the Pitt Carpathian Ensemble, and the Chinese Music Institute of Peking University. In 2022, Xia and his students held a Music AI concert in Dubai.
15:00 – 16:20 Machine Speech Chain: Modeling Human Speech Perception and Production with Auditory Feedback - Sakriani Sakti
Abstract: The development of automatic speech recognition (ASR) and text-to-speech synthesis (TTS) has enabled computers to learn how to listen or speak, imitating the capability of human speech perception and production. However, computers still cannot hear their own voice, as the learning and inference for listening and speaking are carried out separately and independently. Consequently, training ASR and TTS separately in a supervised fashion requires a large amount of paired speech-text data; moreover, the systems have no ability to grasp the situation and overcome problems during inference.
On the other hand, humans learn how to talk by constantly repeating their articulations and listening to the sounds produced. By simultaneously listening and speaking, the speaker can monitor her volume, articulation, and the general comprehensibility of her speech. Therefore, a closed-loop speech chain mechanism with auditory feedback from the speaker’s mouth to her ear is crucial.
In this talk, I will introduce a machine speech chain framework based on deep learning. First, I will describe the training mechanism that learns to listen or speak and to listen while speaking. The framework enables semi-supervised learning in which ASR and TTS can teach each other given unpaired data. Applications of multilingual and multimodal machine speech chains to support low-resource ASR and TTS will also be presented. Finally, I will describe the inference mechanism that enables TTS to dynamically adapt (“listen and speak louder”) in noisy conditions, given the auditory feedback from ASR.
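To make the "teach each other" idea concrete, the following is a minimal sketch of a speech-chain-style training loop under heavy assumptions: toy linear models, fixed-length feature vectors standing in for speech and text, and mean-squared-error losses. It only illustrates the closed loop (speech-only data is transcribed by ASR and reconstructed by TTS; text-only data is synthesized by TTS and recovered by ASR) and is not the speaker's actual architecture.

```python
# Toy sketch of machine speech chain training (assumed models and losses,
# not the speaker's implementation). Requires PyTorch.
import torch
import torch.nn as nn

SPEECH_DIM, TEXT_DIM = 32, 16            # stand-ins for acoustic / text features

asr = nn.Linear(SPEECH_DIM, TEXT_DIM)    # "listen": speech features -> text features
tts = nn.Linear(TEXT_DIM, SPEECH_DIM)    # "speak":  text features -> speech features
opt = torch.optim.Adam(list(asr.parameters()) + list(tts.parameters()), lr=1e-3)
mse = nn.MSELoss()

def paired_step(speech, text):
    """Supervised step on paired speech-text data: train both models directly."""
    return mse(asr(speech), text) + mse(tts(text), speech)

def unpaired_speech_step(speech):
    """Speech-only cycle: ASR transcribes, then TTS must reconstruct the audio."""
    pseudo_text = asr(speech).detach()   # ASR output used as a pseudo-label
    return mse(tts(pseudo_text), speech)

def unpaired_text_step(text):
    """Text-only cycle: TTS synthesizes, then ASR must recover the text."""
    pseudo_speech = tts(text).detach()   # TTS output used as pseudo-input audio
    return mse(asr(pseudo_speech), text)

for _ in range(100):                     # toy loop on random stand-in data
    loss = (paired_step(torch.randn(8, SPEECH_DIM), torch.randn(8, TEXT_DIM))
            + unpaired_speech_step(torch.randn(8, SPEECH_DIM))
            + unpaired_text_step(torch.randn(8, TEXT_DIM)))
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The two unpaired losses are what let each component improve from the other's output on data that has no transcription or no recording, which is the sense in which semi-supervised training with unpaired data becomes possible.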
Bio: Sakriani Sakti is currently the head of the Human-AI Interaction (HAI) Research Laboratory at the Nara Institute of Science and Technology (NAIST) in Japan. She also serves as a full professor at NAIST, an adjunct professor at the Japan Advanced Institute of Science and Technology (JAIST) in Japan, a visiting research scientist at the RIKEN Center for Advanced Intelligence Project (RIKEN AIP) in Japan, and an adjunct professor at the University of Indonesia. A member of JNS, SFN, ASJ, ISCA, IEICE, and IEEE, she currently serves on the IEEE Speech and Language Technical Committee (2021-2026) and as an associate editor for IEEE/ACM TASLP, Frontiers in Language Sciences, and IEICE. Recently, she was appointed as the Oriental-COCOSDA Convener.
Previously, she was actively involved in international collaboration activities such as the Asia-Pacific Telecommunity project (2003-2007) and various S2ST research projects, including A-STAR and U-STAR (2006-2011). She served as a visiting scientific researcher at INRIA Paris-Rocquencourt, France (2015-2016) under the JSPS Strategic Young Researcher Overseas Visits Program for Accelerating Brain Circulation. She also served as the general chair for SLTU 2016, chaired the "Digital Revolution for Under-resourced Languages (DigRevURL)" Workshops at INTERSPEECH 2017 and 2019, and was part of the organizing committee for the Zero Resource Speech Challenge in 2019 and 2020. She played a pivotal role in establishing the ELRA-ISCA Special Interest Group on Under-resourced Languages (SIGUL), where she has been chair since 2021 and organizes the annual SIGUL Workshop. In collaboration with UNESCO and ELRA, she was the general chair of the "Language Technologies for All (LT4All)" Conference in 2019, focusing on "Enabling Linguistic Diversity and Multilingualism Worldwide," and will lead LT4All 2.0 in 2025 under the theme "Advancing Humanism through Language Technologies."