Optimizing LLM serving at scale with the open-source project llm-d
Abstract:
Large Language Models have fundamentally redefined the computational landscape, but their deployment and operation present optimization challenges that defy the playbooks developed for traditional cloud workloads. Unlike conventional workloads, LLM serving must contend with extreme variability across multiple dimensions: workload characteristics, diverse SLO requirements, and heterogeneous accelerator landscapes. This multidimensional complexity creates a challenging trilemma among operational costs, energy consumption, and performance requirements.
Academic research has made significant advances in isolated optimization areas; however, these solutions often operate in silos, missing the compounding benefits of coordinated optimization strategies. This keynote presents llm-d, an open-source project pioneered by IBM Research, Red Hat, Google, and other community partners to address this fragmentation and accelerate progress. llm-d provides a holistic, modular, and extensible architecture that harmonizes multiple optimization areas—including distributed KV-caching, auto-scaling, and intelligent scheduling—enabling contextually appropriate combinations of techniques based on workload characteristics and deployment constraints. In this talk, I will take a deep dive into llm-d's optimization areas, share lessons learned from co-leading this open-source initiative, and discuss how the architecture enables rapid experimentation and evolution as the technology landscape shifts, along with practical insights from community adoption and real-world deployments.
Bio:
Dr. Tamar Eilam is an IBM Fellow and Chief Scientist for Sustainable Computing at the IBM T. J. Watson Research Center in New York. She leads research in broad AI sustainability, with a current focus on LLM serving optimization at scale. Dr. Eilam serves as an ex-officio member of the National Academies Roundtable on AI and Climate, and also represents IBM on the Green Software Foundation, where she advises on AI efficiency standards. She is a prolific researcher and a frequent international keynote speaker, with numerous publications and patents spanning cloud computing, DevOps, and sustainability.

