Carl A. Waldspurger and William E. Weihl. Lottery Scheduling: Flexible Proportional-Share Resource Management. In First Symposium on Operating Systems Design and Implementation (OSDI), pages 1-11. USENIX Association, 1994.
Notes by Snorri Gylfason
February 26, 1998
The main idea is to use an economic mechanism to provide efficient and dynamic resource management. Lottery scheduling "guarantees" fairness and prevents starvation; it is simple and elegant and does not introduce much overhead.
Each client (thread) gets a number of lottery tickets. In each round (time quantum) one winner is picked. The client that owns the winning ticket gets the shared resource for the next time quantum. The winning ticket is selected from all active tickets, so when more tickets are added to the pool the value of each ticket decreases; this is called ticket inflation. We usually think of inflation as a bad thing, but here it is very useful: it is the key to dynamic balancing between clients. Instead of assigning a priority to a client, we decide how well it should do compared to others. If client A has twice as many tickets as B, it should get the resource twice as often as B, no matter how many other clients are around. A user has direct control over the service rates and, even more important, has a much better feel for what is going on than in a priority-based system.
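The draw described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the names hold_lottery and simulate, the dictionary layout and the fixed seed are made up for these notes and are not taken from the paper.

```python
import random


def hold_lottery(tickets, rng=random):
    """Pick one winner; `tickets` maps client name -> number of active tickets."""
    total = sum(tickets.values())
    winning = rng.randrange(total)  # winning ticket number in [0, total)
    running = 0
    for client, count in tickets.items():
        running += count
        if winning < running:  # the winning ticket falls in this client's range
            return client
    raise AssertionError("unreachable when total > 0")


def simulate(tickets, rounds, seed=0):
    """Hold `rounds` lotteries and count how many quanta each client wins."""
    rng = random.Random(seed)
    wins = {client: 0 for client in tickets}
    for _ in range(rounds):
        wins[hold_lottery(tickets, rng)] += 1
    return wins


if __name__ == "__main__":
    # A holds twice as many tickets as B; C is just "other clients around".
    print(simulate({"A": 200, "B": 100, "C": 50}, rounds=20000))
```

Over many rounds A should win roughly twice as often as B, whether or not C is present. Adding C only lowers the absolute share of both.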
With a little extra effort we can give each client the same load-balancing control over its sub-clients as the system has over the top-level clients. We can think of this as each client having its own lottery for its sub-clients. We are of course still playing for the shared resource, so there must be base tickets behind the tickets in the local lottery. We say that each local lottery has its own currency. We can think of a different currency as a renaming of the base tickets.
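One way to picture the "renaming" is to convert every local ticket back into base tickets. The sketch below assumes a simple representation of my own (a currency is a list of backing tickets plus the number of tickets issued in it); the function name base_value and the example currencies are hypothetical.

```python
def base_value(amount, currency, currencies):
    """Value, in base tickets, of `amount` tickets denominated in `currency`.

    `currencies` maps a currency name to (funding, issued), where `funding`
    is a list of (amount, currency) tickets backing it and `issued` is the
    number of active tickets issued in that currency. "base" is the root.
    """
    if currency == "base":
        return float(amount)
    funding, issued = currencies[currency]
    backing = sum(base_value(a, c, currencies) for a, c in funding)
    return backing * amount / issued


if __name__ == "__main__":
    currencies = {
        "alice": ([(300, "base")], 100),  # 100 alice tickets backed by 300 base
        "bob": ([(100, "base")], 100),    # 100 bob tickets backed by 100 base
    }
    print(base_value(50, "alice", currencies))  # half of alice's funding
    print(base_value(50, "bob", currencies))    # half of bob's funding
```

If alice issues more tickets to a new thread, the value of her existing threads' tickets drops, but bob's threads are unaffected. That is the isolation the local lottery is meant to give.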
A client with p% of the tickets should get p% of the shared resource. This holds if every client uses its whole quantum, but not if a client cannot do so for some reason (e.g. if it blocks while reading a file stream). To fix this the system gives that client a bunch of compensation tickets (the number depends on how much of the quantum it used). With more tickets in its pocket it has a better chance of winning relative to the others. The compensation tickets expire when the client next gets the quantum.
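As a rough sketch of how the number of compensation tickets could depend on the used part of the quantum, the code below assumes the rule that a client which used a fraction f of its quantum has its ticket value inflated by 1/f until it next wins. The function name and the time units are my own choices for illustration.

```python
def effective_tickets(tickets, used, quantum):
    """Tickets a client competes with until its next win.

    Assumes the 1/f rule: a client that used only the fraction
    f = used / quantum of its last quantum is inflated by 1/f.
    """
    if not 0 < used <= quantum:
        raise ValueError("used must be in (0, quantum]")
    return tickets * quantum / used


if __name__ == "__main__":
    # A client with 100 tickets that blocked after 20 ms of a 100 ms quantum.
    print(effective_tickets(100, used=20, quantum=100))
    # A client that used its whole quantum gets no compensation.
    print(effective_tickets(100, used=100, quantum=100))
```

Under this rule a client that always uses one fifth of its quantum competes with five times its tickets, so it wins about five times as often but holds the resource one fifth as long each time.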
I have the feeling that there is no need for compensation tickets; this should even out between clients in the long run. They also add overhead, because we must keep track of the number of compensation tickets each client has. On the other hand, I think this is a good example of the new possibilities that the dynamics of the system offer.
Lottery tickets can be transferred temporarily from one client to another. If a high-priority client needs a service from another program, it can send a bunch of tickets along with the service request. This way it gets better service. On reply the client destroys those tickets. Note that destroying the tickets must be the responsibility of the client; otherwise a malicious server could collect tickets.
A good example of where this could be useful is in interactive programs. A window system could send tickets along with events (e.g. keystrokes) to improve response time.
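A temporary transfer can be modelled as a scope that ends when the reply arrives. In this sketch the lender's own count drops for the duration of the request, on the assumption that it is blocked waiting for the reply anyway; lend_tickets and the client names are hypothetical.

```python
from contextlib import contextmanager


@contextmanager
def lend_tickets(tickets, lender, borrower, amount):
    """Move `amount` tickets to `borrower` for the duration of a request."""
    if amount > tickets[lender]:
        raise ValueError("lender does not hold that many tickets")
    tickets[lender] -= amount
    tickets[borrower] += amount
    try:
        yield
    finally:
        # The lender, not the borrower, ends the transfer on reply.
        tickets[borrower] -= amount
        tickets[lender] += amount


if __name__ == "__main__":
    tickets = {"window_system": 300, "app": 100}
    with lend_tickets(tickets, "window_system", "app", 300):
        print(tickets)  # the app handles the keystroke with 400 tickets
    print(tickets)      # back to the original allocation
```

Putting the cleanup on the lender's side mirrors the point above: the server never gets a chance to keep what it was lent.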
In the paper the authors propose a solution to the priority inversion problem. If a client tries to acquire a locked resource (e.g. a mutex) it lends its tickets to the client that currently has the resource.
This makes sense because the importance of a mutex increases with more clients (tickets) waiting for it. But there is a problem with this solution that the authors didn't mention in the paper. A malicious low-priority client could lock some shared resources and just keep them. When a higher-priority client then tries to get the resource, the malicious client "borrows" lots of tickets and can run at the higher priority for as long as the high-priority client waits.
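The lending rule, and the concern raised about it, can be made concrete with a toy model. The class FundedMutex below is my own simplification for these notes: it only tracks who holds the lock and who waits, and it says nothing about how the paper's implementation picks the next holder.

```python
class FundedMutex:
    """Toy mutex whose holder is funded by the tickets of its waiters."""

    def __init__(self):
        self.holder = None
        self.waiters = []

    def acquire(self, client):
        """Return True if `client` got the lock, else queue it as a waiter."""
        if self.holder is None:
            self.holder = client
            return True
        self.waiters.append(client)
        return False

    def holder_tickets(self, tickets):
        """Tickets the holder competes with: its own plus all waiters'."""
        if self.holder is None:
            return 0
        return tickets[self.holder] + sum(tickets[w] for w in self.waiters)


if __name__ == "__main__":
    tickets = {"low": 1, "high": 1000}
    mutex = FundedMutex()
    mutex.acquire("low")
    print(mutex.holder_tickets(tickets))  # low runs on its own ticket
    mutex.acquire("high")                 # high blocks and lends its tickets
    print(mutex.holder_tickets(tickets))  # low now competes with both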
One must keep other scheduling mechanisms in mind when reading this paper. For time-sharing resources Round-Robin is probably the simplest scheduling algorithm. It is fair, keeps the resource busy and has good throughput of clients. But it is too simple: it does not give the system any control over how much of the resource each client gets. Multilevel Queue Scheduling is an attempt to solve this. The user can classify clients and give each class a different priority. This gives us control but introduces new problems, like starvation and priority inversion. To solve those, more features have been added, such as aging and priority inheritance. As a result the scheduling algorithms have become complicated and hard to understand.
Lottery scheduling has all the desired properties. In addition it gives the user direct, dynamic control over how the resource is shared.
It is indeed simple, but it probably has more overhead than other scheduling mechanisms. For instance, in Round-Robin or Multilevel Queue scheduling there is almost no overhead (just pick the next one). Here, on the other hand, we have to search for the winning ticket (O(lg n) time).
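To see where an O(lg n) search can come from, here is one possible structure: a tree of partial ticket sums, written as a Fenwick (binary indexed) tree. The choice of a Fenwick tree and the class name TicketTree are mine; the paper may organize its partial sums differently.

```python
class TicketTree:
    """Fenwick tree over per-client ticket counts (clients indexed from 0)."""

    def __init__(self, counts):
        self.n = len(counts)
        self.tree = [0] * (self.n + 1)
        for index, count in enumerate(counts):
            self.add(index, count)

    def add(self, index, delta):
        """Give client `index` `delta` more (or fewer) tickets in O(lg n)."""
        i = index + 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def total(self):
        """Total number of active tickets."""
        i, result = self.n, 0
        while i > 0:
            result += self.tree[i]
            i -= i & -i
        return result

    def find(self, winning):
        """Index of the client holding ticket number `winning`, in O(lg n)."""
        pos = 0
        step = 1 << self.n.bit_length()
        while step:
            nxt = pos + step
            # Skip whole subtrees whose tickets all come before `winning`.
            if nxt <= self.n and self.tree[nxt] <= winning:
                pos = nxt
                winning -= self.tree[nxt]
            step >>= 1
        return pos


if __name__ == "__main__":
    tree = TicketTree([10, 2, 5, 1, 2])  # 20 tickets among five clients
    print(tree.total(), tree.find(0), tree.find(12), tree.find(19))
```

A draw is then a random number below total() followed by one find call, and changing a client's allocation is one add call, so both stay logarithmic in the number of clients.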
The paper was well written, easy to understand and fun to read. But I would have liked to see more comparison with conventional scheduling mechanisms. I also thought that the graphs were unreadable and that it would have been better to use normalized graphs (after all, we are interested in how well each client does relative to others).
This page is part of the CS614 (Advanced Systems) web, Dept. of Computer Science, Cornell University.