Making LLMs Listen, Speak, and Think: A Journey to End-to-End Speech Language Models
COM3 Basement
MR92, COM3 B1-15

Abstract:
This presentation charts the evolution of Speech Language Models (SLMs) through the lens of our recent research. We begin by revisiting the landscape of encoder-based speech foundation models, then shift our focus to the development of end-to-end systems capable of listening and responding. We describe the process of integrating speech encoders with text-based LLMs, emphasizing critical training strategies to prevent catastrophic forgetting. Next, we address the challenge of enabling LLMs to speak. To overcome the computational bottlenecks of conventional waveform-to-token autoregressive models, we introduce novel speech representation techniques that optimize sequence length and efficiency. The talk concludes with an approach to empowering SLMs with reasoning capabilities, advancing toward models that can "think while speaking."
Bio:
Hung-yi Lee is a professor in the Department of Electrical Engineering at National Taiwan University (NTU), with a joint appointment in the university's Department of Computer Science & Information Engineering. His recent research focuses on developing technology that reduces the need for annotated data in speech processing (including voice conversion and speech recognition) and natural language processing (including abstractive summarization and question answering). He received the Salesforce Research Deep Learning Grant in 2019, the AWS ML Research Award in 2020, the Outstanding Young Engineer Award from The Chinese Institute of Electrical Engineering in 2018, the Young Scholar Innovation Award from the Foundation for the Advancement of Outstanding Scholarship in 2019, the Ta-You Wu Memorial Award from the Ministry of Science and Technology of Taiwan in 2019, and The 59th Ten Outstanding Young Person Award in Science and Technology Research & Development of Taiwan. He is a Fellow of the International Speech Communication Association (ISCA). He runs a YouTube channel teaching deep learning technology in Mandarin, which has more than 350,000 subscribers.

