
The Case for Decentralised Scheduling in Modern Data Centres
March 25 @ 12:00 pm - 5:00 pm
Abstract: Modern data centres serve as the backbone for executing diverse workloads. Growing demand for resources has led to high traffic volumes, requiring clusters to operate at high utilisation. In this talk, I will examine how current data centre schedulers, which map workload tasks to resources, perform under such challenging conditions. I will discuss how centralised schedulers struggle to scale under high load because they generate significant network traffic by continuously transferring up-to-date node data. Conversely, distributed schedulers scale well but lack a global view of the cluster, leading to suboptimal task allocations. As a result, existing schedulers impose up to three times longer wait times on tail tasks, that is, the tasks of a job that finish last, which increases task and job completion times.
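To make the tail-task effect concrete: a job finishes only when its last task does, so a long wait on a single tail task stretches the whole job. The Python sketch below is purely illustrative. The `Task` class, the `job_completion_time` function, and all wait and run times are hypothetical, not measurements from this work.

```python
from dataclasses import dataclass


@dataclass
class Task:
    wait: float  # seconds spent queued before starting (hypothetical)
    run: float   # seconds of execution once started (hypothetical)

    @property
    def finish(self) -> float:
        # Finish time, measured from job submission.
        return self.wait + self.run


def job_completion_time(tasks: list[Task]) -> float:
    # A job completes only when its last (tail) task finishes.
    return max(t.finish for t in tasks)


# Hypothetical job with three tasks of equal run time.
job = [Task(wait=1, run=10), Task(wait=2, run=10), Task(wait=4, run=10)]
print(job_completion_time(job))  # 14

# Triple the wait time of the tail task only.
slow_tail = job[:2] + [Task(wait=12, run=10)]
print(job_completion_time(slow_tail))  # 22
```

The other two tasks are unaffected, yet in this toy example the job takes roughly 57% longer, which is why tail-task wait times matter for job completion time.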
I will then introduce our work on decentralised scheduling, focusing on performance, scalability, and load balancing. Decentralised schedulers have been under-explored because of their design complexity. However, we demonstrate that Murmuration, our job-aware decentralised scheduler, achieves high performance despite its simple approach based on approximate load information. It does so by reducing scheduler-node communication overhead while still balancing load across nodes. Prototype evaluations show that Murmuration reduces task wait times under both normal and high cluster loads, improving median job completion times by 25% compared with the default Kubernetes centralised scheduler. Simulations further show that it outperforms various distributed and hybrid schedulers by two orders of magnitude. By the end of this talk, I hope to convince you that decentralised schedulers strike the right balance between performance and scalability and are a practical solution for today’s highly utilised data centres.
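As a rough illustration of scheduling with approximate load information, the Python sketch below gives each scheduler instance a periodically refreshed, possibly stale snapshot of node loads. It places all tasks of a job together on the nodes that look least loaded and updates its own estimates locally instead of querying nodes for every placement. This is a generic greedy least-loaded heuristic chosen for illustration; it is not Murmuration's algorithm, and the class `ApproxScheduler`, its methods, and the node names are hypothetical.

```python
import heapq


class ApproxScheduler:
    """Hypothetical decentralised scheduler instance (not Murmuration's design)."""

    def __init__(self, node_loads: dict[str, int]):
        # Local, approximate view of per-node load (e.g. queued tasks).
        # Assumption: refreshed only periodically rather than on every placement.
        self.view = dict(node_loads)

    def refresh(self, node_loads: dict[str, int]) -> None:
        # Replace the local view with a fresh snapshot from the nodes.
        self.view = dict(node_loads)

    def place_job(self, num_tasks: int) -> list[str]:
        # Place all tasks of one job together, each on the node that looks
        # least loaded in the local view (ties broken by node name).
        heap = [(load, node) for node, load in self.view.items()]
        heapq.heapify(heap)
        placement = []
        for _ in range(num_tasks):
            load, node = heapq.heappop(heap)
            placement.append(node)
            # Bump the local estimate so later tasks of the job spread out,
            # without contacting the node.
            heapq.heappush(heap, (load + 1, node))
            self.view[node] = load + 1
        return placement


scheduler = ApproxScheduler({"node-a": 0, "node-b": 2, "node-c": 1})
print(scheduler.place_job(3))  # ['node-a', 'node-a', 'node-c']
print(scheduler.view)          # {'node-a': 2, 'node-b': 2, 'node-c': 2}
```

Because each instance updates only its own copy of the loads, two instances working from the same stale snapshot may pick the same node. Refreshing the view more often reduces this at the cost of more scheduler-node traffic, which is the trade-off between accuracy and communication overhead that the abstract describes.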
Bio: Smita Vijayakumar recently completed her PhD in the Department of Computer Science and Technology at the University of Cambridge, under the supervision of Evangelia Kalyvianaki. As part of her thesis, she developed a novel decentralised scheduling framework to reduce tail task latencies in highly utilised data centres. She has over twelve years of industry experience at companies such as Cisco and Juniper, working on networking, cloud computing, and distributed systems. She also holds an MS from The Ohio State University, where her work investigated allocating cloud resources to the bottleneck stages of streaming applications. Her research has been published at top-tier conferences and as a book. She has also been actively involved in mentoring, teaching, and community leadership, including founding Women Who Go, India. Smita’s expertise spans resource management, scheduling, and scalable distributed systems.