CS410, Summer 1998, Homework 6
Forty Versions of Quicksort
Due: 11:35 AM, Thursday July 30
Last Updated: 27 July, 9:30 AM.
Note: You are responsible for following the course policy on academic integrity.
Warning
For the most part, this is not a programming
assignment, although programming is a necessary part of it. This is a
scientific experiment. You will conduct an experiment, collect the
data, and interpret the results. Only a fraction of your grade will
depend on the correctness, style, and efficiency of your program.
Overview
In class we discussed several possible improvements to the quicksort
algorithm. These include:
- Switching to insertion sort for subproblems below a particular size
threshold.
- Always doing the smaller subproblem first.
- Pivoting around the median of k randomly-selected elements.
Your job is to create a lab report interpreting the results of
applying various combinations of these strategies. You will sort
arrays of randomly-generated Sortable objects, where
Sortable is a Java interface. Your arrays will
actually consist of OrderedInteger objects where
OrderedInteger implements Sortable. Basically,
OrderedInteger is a wrapper for int that
provides a lessThan method. (The extra credit explores
the implications of using a wrapper class.)
In other words:
- Your sorting algorithm must be polymorphic over
Sortable objects.
- Collect data using arrays of
OrderedInteger objects where each object's value is a
randomly-generated integer.
What We Have Done For You
Since much of the assignment is not about programming, we are
providing you quite a bit. This includes the files:
-
Sortable.java
Defines the interface described above.
-
OrderedInteger.java
Defines the implementation of the interface described above.
-
Tester.java
A fully-functioning testing apparatus provided you have implemented the
sorting algorithms appropriately. Part of the assignment is
understanding what this file does.
-
Sorter.java
A file with holes for you.
Code you need to write
You must implement the sort method in
Sorter.java. Your implementation should use quicksort on
problems of size greater than switchSize and insertion
sort on smaller problems. If doShortFirst is
true then quicksort should always solve the smaller subproblem
first. Else it should always solve the "left" subproblem
first. To choose a pivot element, your partition algorithm should
choose medianOf random indices in the appropriate range
and choose the median of the values in those array indices.
The values of switchSize, doShortFirst, and
medianOf are set in the constructor of a
Sorter object and should not be subsequently modified.
Constant-factor efficiency matters since the whole purpose of the
experiment is to compare asymptotically-equivalent methods. Your
sorting implementation should not make any calls to new
except the one in the constructor already written for you. Your
method for finding the median of medianOf elements should
re-use the same private possible array each time (already
written for you). Use the modified form of insertion sort we
discussed in class for finding the median.
Advice: Make sure your implementation is correct. Test separately
that your median-finding code works. Test that your sorting code
works. Experimental data has little meaning if you are not conducting
the experiements you think you are. We will test your sorting program
for correctness. This includes that it sorts its input
and that it follows the method-switching,
subproblem-doing, and median-finding specifictions.
What you need to do
The test program will produce a 160 line output file which records 160
data points using your sorting methods. Specifically, every
combination of the following parameters is recorded:
- Number of elements sorted: 10000, 20000, 30000, 40000
- Threshold below which insertion sort is used: 0, 10, 25, 40
- Is the smaller subproblem always done first: yes, no
- Number of random elements from which the median is the pivot:
1, 3, 5, 7, 9
The output file has each of these parameters separated by tab
characters, followed by the running time in milliseconds. Actually,
the running time is the average of four runs with the parameters.
This format makes it convenient to use other software (such as a
spreadsheet or math program) to help interpret the data.
Your job is to create a lab report interpreting the data. Your report
should be formatted appropriately. Although you do not need to follow
this outline exactly, make sure you have:
- Title
- Introduction -- A short summary of the experiment, why it is relevant,
and what it showed.
- Experimental Procedures -- Be thorough! Include at least:
- The hardware used.
- The software used.
- The type of data used.
- The number of experiments conducted.
- The data gathered.
Some of this information can be taken from this document,
although be sure
to emphasize any deviations. Some of it will be different for each
report. Include enough information so that someone could reasonably
re-create the experiment. Give a short overview of your
implementation. Describe the different improvements you implemented.
Refer to your complete source code, which should be attached
as an appendix to your report. For the sake of completeness, include
all source files.
- Results -- Do not just quote numbers; organize and summarize
them. Explain what trends are observed. Investigate
any issues which are relevant and interesting, including:
- Whether the data supports the theoretical running time of quicksort.
- How each of the alleged improvements
affects running time.
- How combinations of improvements interact.
This should probably be the longest section of your report. It is unlikely
that you can present the data effectively without the use of several
well-chosen graphs.
- Conclusions -- Hypothesize why you observed the trends you did.
Do not just quote numbers; interpret them. Discuss
what trade-offs are probably occurring. Mention further experiments
you might perform to verify your conclusions or test new hypotheses.
- Summary -- Briefly discuss how your conclusions could be
appropriately used by others making implementation decisions.
- Raw Data -- Include the data you collected in an appendix.
Making Changes
You are welcome to make changes to the testing code in order to
collect different data which you feel will make for a better report.
Since your report will discuss the data in detail, any changes will be
apparent to the grader. Also, realize that the instructor's sample
solution takes about 2.5 minutes to run on a 266MHz Pentium II with
64MB RAM using Visual J++. You may need to change data sizes so that
the running time is reasonable for your environment (too short causes
bad data, too long causes impatience).
The code uses a PrintWriter object to produce output. It
should be trivial to change this to be compatible with older versions
of Java.
Advice
- Do not use large data sets until you are sure your implementation
is correct. Otherwise, you will be wasting your time.
- Spend most of your time on the report. You will be graded on
presentation. Style is just as important in English as it is in Java.
Be well-organized, clear, and coherent.
- Think while you crunch the numbers. The goal is to do good
science/engineering and discover natural patterns.
Extra Credit
Making our sorting implementation polymorphic greatly increases its
usefulness as we create new kinds of data which must be sorted. But
this might be at the expense of making it less efficient for storing
small numbers. For extra credit, explore the running time effect
of polymorphism in this context. Do this by comparing/contrasting
with two other possible implementations:
- Sorting
OrderedInteger objects instead of
Sortable objects. That is, make a version of your program which
knows it is sorting OrderedInteger objects.
- Sorting
ints. That is, make a version of your
program that does not sort Objects.
Make the appropriate additions to your report to present, interpret,
and conclude your findings.