Decision Theory in OS Scheduling
Joe Polastre '01 (jrp17@cornell.edu) - February 28, 2001

Motivation      In the traditional view of operating systems, jobs are submitted to the system with the intention of executing before some specified deadline. A job is simply defined by its length and its deadline. All of the jobs waiting to execute sit in a ready queue, and a scheduler executes them in the order it judges best, so that as many jobs as possible finish before their deadlines.
     This model is naive and does not capture much information about a job, such as its priority. Additionally, what if failing to finish a job by its deadline results in catastrophic failure, such as a nuclear meltdown? One would certainly want to model this in a real-world system.
     Besides the deadline providing little information, it is rare that a job always takes a fixed amount of time to execute. Typically, there is a wide range of possible running times for each job, and each running time is associated with a probability that the job will take that much time. The running time, or length, of a job is left to chance. Now one can see why it may be useful to apply decision theory to short-term scheduling in operating systems.
     Most of this is very similar to Stochastic Scheduling in systems. The difference is that stochastic scheduling typically only defines a probability distribution over running times for a specific job, but does not apply any utility to completing that job at a specific time. By using utilities, we can model all of traditional scheduling, and we gain additional power through the utility functions themselves.

Decision Theory      The use of utility functions and a running-time histogram introduces a more powerful method of describing jobs. A utility function can model the traditional view of jobs having release times and deadlines: the value of the function is positive (providing some benefit to the system) if the job finishes between the release time and the deadline, and zero or negative otherwise. The task of maximizing the overall utility of the system then also results in scheduling as many jobs as possible to finish between their release times and deadlines. If a job has a fixed length, that length simply has probability 1 of occurring.

The system is defined as follows:
o Each job has a continuous utility function and a probability distribution that together describe the importance and running time of the job.
o The scheduler maximizes the utility of a real-world system (instead of scheduling by fixed running times or priorities).
o Users can have a utility "budget" (such as in a supercomputing system) and assign utility to jobs according to their importance over time.
o Previous research shows usefulness of utility in database systems [Halpern, Chu, Seshadri, 1999]
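
As a concrete illustration of this job model, here is a minimal Java sketch (the simulation described below was written in Java; the names Job, UtilityFunction, and HardDeadlineUtility are my own for illustration, not taken from the actual simulator). It pairs a discrete running-time histogram with a utility function, and shows how a hard-deadline utility function encodes a release time and deadline as described above:

    import java.util.Map;

    // Illustrative sketch of the job model: a utility over completion
    // times plus a histogram of possible running times.
    interface UtilityFunction {
        double valueAt(double completionTime);
    }

    // Hard-deadline utility: a positive benefit inside [release, deadline],
    // zero or a penalty outside it, as described in the text.
    class HardDeadlineUtility implements UtilityFunction {
        private final double release, deadline, benefit, penalty;
        HardDeadlineUtility(double release, double deadline,
                            double benefit, double penalty) {
            this.release = release; this.deadline = deadline;
            this.benefit = benefit; this.penalty = penalty;
        }
        public double valueAt(double t) {
            return (t >= release && t <= deadline) ? benefit : penalty;
        }
    }

    class Job {
        // Histogram: running time -> probability (probabilities sum to 1).
        final Map<Double, Double> runningTimeDist;
        final UtilityFunction utility;
        Job(Map<Double, Double> dist, UtilityFunction u) {
            runningTimeDist = dist; utility = u;
        }
        // A fixed-length job is the special case of a one-bar histogram.
        static Job fixedLength(double length, UtilityFunction u) {
            return new Job(Map.of(length, 1.0), u);
        }
    }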

Complexity      Two variations:
o Bounded Decision Scheduling (BDS): Is there a schedule such that the total utility is greater than or equal to a bound B?
o Optimal Decision Scheduling (ODS): Given a schedule S, is the total utility of S optimal?

BDS is NP-complete, by a reduction from Scheduling with Release Times and Deadlines.

ODS is co-NP-complete, also by a reduction from Scheduling with Release Times and Deadlines.

In order to schedule jobs efficiently in real systems, we implemented and tested four heuristics.

Heuristics First In First Out (FIFO). FIFO is rather self-explanatory. For all the jobs in the ready queue, run them in the order they were submitted to the system. The implementation of this heuristic is very simple. FIFO takes constant time to decide which job to schedule next.

Shortest Expected Job First (SEJF). SEJF is analogous to Shortest Job First in traditional scheduling. The expected running time of each job in the ready queue is calculated, and the job with the shortest expected running time is executed first.
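
Continuing the hypothetical Job sketch above, the expected running time is just the probability-weighted average of the histogram bars; SEJF then picks the minimum. (These static methods are assumed to live in a single scheduler class along with the ones sketched later.)

    import java.util.Comparator;
    import java.util.List;
    import java.util.Map;

    // Expected running time: sum of (length * probability) over the histogram.
    static double expectedRunningTime(Job job) {
        double expected = 0.0;
        for (Map.Entry<Double, Double> bar : job.runningTimeDist.entrySet()) {
            expected += bar.getKey() * bar.getValue();
        }
        return expected;
    }

    // SEJF: run the ready job with the smallest expected running time first.
    static Job pickSEJF(List<Job> readyQueue) {
        return readyQueue.stream()
                .min(Comparator.comparingDouble(j -> expectedRunningTime(j)))
                .orElse(null);
    }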

Greatest Expected Benefit First (GEBF). FIFO and SEJF do not take utility into account. It would seem that computing the expected utility of each job and greedily running the one with the greatest expected utility first should result in a better schedule.

Greatest Expected Rate of Return First (GERR). Often we want to choose the option that gives the best return for the least amount of time. For example, if you are offered $5 to work for one hour, or $10 to work for ten hours, most would choose the $5, since another offer may come along after the hour of work is completed. Likewise, GERR computes each job's expected rate of return (expected utility per unit of expected running time) and runs the job with the highest rate first.
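
A sketch of how GEBF and GERR might score jobs, again using the hypothetical classes above. One simplifying assumption here: the expected utility is evaluated as if the job starts immediately and runs without interruption.

    // Expected utility if the job starts now: weight the utility of each
    // possible completion time by that running time's probability.
    static double expectedUtility(Job job, double now) {
        double eu = 0.0;
        for (Map.Entry<Double, Double> bar : job.runningTimeDist.entrySet()) {
            eu += bar.getValue() * job.utility.valueAt(now + bar.getKey());
        }
        return eu;
    }

    // GEBF takes the maximum of expectedUtility alone. GERR divides by the
    // expected running time, favoring quick, valuable jobs (the $5-for-one-hour
    // choice in the example above).
    static Job pickGERR(List<Job> readyQueue, double now) {
        return readyQueue.stream()
                .max(Comparator.comparingDouble(
                        j -> expectedUtility(j, now) / expectedRunningTime(j)))
                .orElse(null);
    }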

Simulation      To test the real-world performance of the heuristics, a simulation was created in Java. The simulation included all four of the heuristics described earlier. It also implemented linear and exponential utility functions. A hard-deadline utility function was implemented to test the way each heuristic responds to making or missing a deadline. The simulator has two modes: (1) a batch mode, in which all the jobs are submitted before the CPU starts and no jobs are submitted while other jobs are being processed, and (2) an online mode, which allows jobs to be added to the ready queue while other jobs are running. Each job's actual running time can follow a uniform or a normal distribution.
     The test harness takes as input the number of tests to perform, the maximum number of jobs in each test, and the maximum time that a job should take. For each test, a random number of jobs (at most the specified maximum) is generated. With equal probability, a job follows either the uniform or the normal distribution. Likewise, with equal probability, the utility function associated with each job is linear, exponential, or hard-deadline. The parameters for each type of utility function are randomly generated.
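
A rough sketch of how such job generation might look, under the assumptions above (the real simulator's code is not reproduced here; the three-bar histograms crudely discretize the uniform and normal distributions, and the utility parameters are toy values):

    import java.util.Map;
    import java.util.Random;

    // Generate one random job: 50/50 uniform vs. normal running times, and a
    // utility function drawn uniformly from {linear, exponential, hard deadline}.
    static Job randomJob(double maxTime, Random rng) {
        double mean = 1.0 + rng.nextDouble() * (maxTime - 1.0);
        Map<Double, Double> dist;
        if (rng.nextBoolean()) {
            // "Uniform": three equally likely lengths around the mean.
            dist = Map.of(0.5 * mean, 1.0 / 3, mean, 1.0 / 3, 1.5 * mean, 1.0 / 3);
        } else {
            // "Normal": the central length is most likely, the tails less so.
            dist = Map.of(0.5 * mean, 0.2, mean, 0.6, 1.5 * mean, 0.2);
        }
        UtilityFunction u;
        switch (rng.nextInt(3)) {
            case 0:  u = t -> 100.0 - t;                   break; // linear
            case 1:  u = t -> 100.0 * Math.exp(-t / mean); break; // exponential
            default: u = new HardDeadlineUtility(0.0, 2.0 * mean, 100.0, 0.0);
        }
        return new Job(dist, u);
    }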

Analysis      One set of graphs depicts the results of 500 tests with jobs defined by the following properties:
o Probability Distribution
    - Uniform
    - Normal
o Utility Function
    - Linear
    - Exponential
    - Hard Deadline
The second set of graphs uses linear utility functions only.

      One hundred tests were run in all cases. The maximum number of jobs was tested at 50, 100, and 200. The maximum time for each job was tested at 10, 50, and 100. In more than 60% of the tests, GERR resulted in the highest overall utility. SEJF, on average, produced the second highest utility, followed by GEBF and lastly FIFO. As the number of jobs and the maximum length of each job increased, SEJF was a more stable algorithm and performed better than GERR. For many small jobs, or few jobs, GERR performed best. That SEJF was more stable and produced better results with many jobs and large running times is surprising, since this heuristic does not consider utility when scheduling. These results can be viewed in the graphs below.

Graph 1: 50 jobs
Graph 2: 100 jobs
Graph 3: 200 jobs
Graph 4: linear utility, 50 jobs
Graph 5: linear utility, 100 jobs
Graph 6: linear utility, 200 jobs

Open Questions      The simulations raised several unexpected results:
o Why do SEJF and FIFO become much better as the number of jobs increases?
o What relationship is there between scheduling performance and the distribution of jobs (and their probability distributions or utility functions)?
o Are there other heuristics, not yet tested, that perform better?

... this work is ongoing.

