CS SEMINAR

TWEO: FP8 Training And Quantization For Dummies

Speaker
Dr Jianxin Wu, Professor, School of Artificial Intelligence at Nanjing University, China
Chaired by
Dr HE Bingsheng, Professor, School of Computing
hebs@comp.nus.edu.sg

09 Jul 2026 Thursday, 04:00 PM to 05:00 PM

MR1, COM1-03-19

Abstract:

Native FP8 support is essential for training large Transformers, but it is severely hindered by extreme activation outliers. Existing solutions rely on either complex mixed-precision engineering or invasive architectural modifications. We fundamentally challenge the conventional wisdom that outliers are data-driven and demonstrate that extreme outliers are a data-independent, mechanically produced artifact of training. In this talk, I will introduce TWEO, a novel, non-invasive solution that effectively prevents extreme outliers, reducing them from 10,000+ to below 20. TWEO is very simple: it enables full-model FP8 pre-training for both LLMs and ViTs and achieves performance comparable to the BF16 baseline while delivering a 36% increase in training throughput. TWEO also enables a new quantization paradigm: hardware-friendly W8A8 per-tensor static quantization of LLMs.
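
As a rough illustration of why the outlier range matters for per-tensor static quantization, the sketch below quantizes the same activations with one fixed scale sized for a range of 20 and another sized for a range of 10,000. It is not TWEO itself and is not taken from the talk: it assumes a symmetric INT8 grid as a simplified stand-in for 8-bit formats, synthetic Gaussian activations, and hypothetical helper names.

```python
import numpy as np

def quantize_per_tensor(x, scale):
    """Symmetric 8-bit quantization using one fixed scale for the whole tensor."""
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8)

def dequantize(q, scale):
    """Map the 8-bit codes back to floating point."""
    return q.astype(np.float32) * scale

def mean_abs_error(x, scale):
    """Average round-trip error introduced by quantizing x with the given scale."""
    return float(np.mean(np.abs(x - dequantize(quantize_per_tensor(x, scale), scale))))

# Synthetic "typical" activations (assumption: unit-variance Gaussian values).
rng = np.random.default_rng(0)
acts = rng.standard_normal(4096).astype(np.float32)

# A static scale must cover the largest magnitude expected at calibration time.
scale_small = 20.0 / 127      # range bounded below 20
scale_large = 10000.0 / 127   # range stretched by an extreme outlier

err_small = mean_abs_error(acts, scale_small)
err_large = mean_abs_error(acts, scale_large)

print(f"mean abs error, range 20:    {err_small:.4f}")
print(f"mean abs error, range 10000: {err_large:.4f}")
```

With the larger scale, every typical activation in this toy setup rounds to zero, so almost all information is lost. With the smaller scale, the same values spread across many quantization levels.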

Bio:

Jianxin Wu received his BS and MS degrees from Nanjing University and his PhD degree from the Georgia Institute of Technology, all in computer science. He is a professor in the School of Artificial Intelligence at Nanjing University and the National Key Laboratory for Novel Software Technology, China. He has served as a program chair for CVPR'24, as a (senior) area chair for NeurIPS, CVPR, ICCV, ECCV, AAAI and IJCAI, and as an associate editor for IEEE T-PAMI. His research interests are computer vision and machine learning.