CS SEMINAR

Plumbing the Breadth and Depth of LLM Evaluation

Speaker

Tim Baldwin, Professor, Mohamed bin Zayed University of Artificial Intelligence

Chaired by

Dr NG Hwee Tou, Provost's Chair Professor, School of Computing

nght@comp.nus.edu.sg

01 Aug 2024 Thursday, 02:00 PM to 03:00 PM

Abstract:

The recent surge in generative large language models (LLMs) has created even greater challenges for NLP evaluation. In this talk, I will present a range of LLM evaluation initiatives that address issues including: LLM capabilities across a broad range of tasks; multilingual and multicultural capabilities; the ability of models to capture different aspects of negation; model calibration; and model safety.

Biodata:

Tim Baldwin is Provost and Professor of Natural Language Processing at Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), in addition to being a Melbourne Laureate Professor in the School of Computing and Information Systems, The University of Melbourne and Chief Scientist of LibrAI.

Tim completed a BSc(CS/Maths) and BA(Linguistics/Japanese) at The University of Melbourne in 1995, and an MEng(CS) and PhD(CS) at the Tokyo Institute of Technology in 1998 and 2001, respectively. He joined MBZUAI at the start of 2022, prior to which he was based at The University of Melbourne for 17 years. His research has been funded by organisations including the Australian Research Council, Google, Microsoft, Xerox, ByteDance, SEEK, NTT, and Fujitsu, and has been featured in MIT Tech Review, Bloomberg, Reuters, CNN, The Economist, Financial Times, IEEE Spectrum, The Times, and ABC News. He is the author of around 500 peer-reviewed publications across diverse topics in natural language processing and AI, with over 25,000 citations and an h-index of 75 (Google Scholar). He is also an ARC Future Fellow and the recipient of a number of awards at top NLP conferences.

COM1 Level 2