
Multiturn Evals (and RL) for LLMs by Kartikeya Badola
August 8 @ 12:00 pm - 1:00 pm
Title: Multiturn Evals (and RL) for LLMs
Details: 8th August, 12 pm, SIT001
Abstract: LLMs often fail at multi-step tasks requiring memory and strategic planning, a gap not captured by traditional single-turn evals. To address this, we’ve developed a suite of human and automated evals that stress-test Gemini on these capabilities. This talk will cover the motivation and design behind these evals, discuss the latest results, and touch on some early promising experiments that use multiturn RL methods to address some of these losses.
Bio: Kartikeya Badola is a Software Engineer at Google DeepMind in London, where he works with the Gemini evals and Gemini thinking teams. Prior to this, he was with Google Research in India, working on multilingual semantic parsing. Kartikeya is a graduate of IIT Delhi, where he worked with Prof. Mausam and Prof. Parag Singla on Distantly Supervised Relation Extraction.