CS SEMINAR

Optimizing large-scale artificial intelligence systems with persistent memory

Speaker
Dr. Jun Yang, 4Paradigm, Singapore
Chaired by
Dr HE Bingsheng, Professor, School of Computing
hebs@comp.nus.edu.sg

13 Oct 2021 Wednesday, 02:00 PM to 03:00 PM

via Zoom

Abstract:
In this talk, we will share our experience in designing large-scale artificial intelligence (AI) systems for real-world applications, such as antifraud and recommendation systems. We will also show how AI systems can be further enhanced by state-of-the-art non-volatile memory technology (e.g., Intel(R) Optane(TM) Persistent Memory, or PMem). (1) We will first introduce OpenMLDB (https://github.com/4paradigm/OpenMLDB), a distributed in-memory database system designed to efficiently support online feature extraction. We will then explore how PMem can enhance OpenMLDB by reducing cost, shortening tail latency, and enabling fast recovery within a minute. (2) We will also introduce OpenEmbedding (https://github.com/4paradigm/OpenEmbedding), an efficient and scalable parameter server that is specifically optimized for training with high-dimensional sparse features. OpenEmbedding also adopts PMem to reduce total cost while preserving efficiency through cache-aware techniques.


Biodata:
Dr. Jun Yang is currently a system architect in 4Paradigm's Singapore R&D department. Before joining 4Paradigm in Oct 2018, he worked as a research scientist at both DSI and IHPC of A*STAR. He received his Bachelor's degree from Shanghai Jiao Tong University (2003-2007) and his Ph.D. degree from the Hong Kong University of Science & Technology (2007-2013). He has published in prestigious international conferences and journals, including FAST, VLDB, and TC. He has been working on storage system architecture and performance optimization for many years. His current work focuses mainly on adopting new hardware (such as persistent memory and FPGAs) in large-scale storage systems to achieve better performance in a cost-efficient way. He is also a contributor to and reviewer for the open-source Persistent Memory Development Kit (PMDK) community.