CS410, Summer 1998, Homework 6

Forty Versions of Quicksort

Due: 11:35 AM, Thursday July 30

Last Updated: 27 July, 9:30 AM.

Note: You are responsible for following the course policy on academic integrity.

Warning

For the most part, this is not a programming assignment, although programming is a necessary part of it. This is a scientific experiment. You will conduct an experiment, collect the data, and interpret the results. Only a fraction of your grade will depend on the correctness, style, and efficiency of your program.

Overview

In class we discussed several possible improvements to the quicksort algorithm. These include:

Switching to insertion sort for subproblems below a particular size threshold.
Always doing the smaller subproblem first.
Pivoting around the median of k randomly-selected elements.

Your job is to create a lab report interpreting the results of applying various combinations of these strategies. You will sort arrays of randomly-generated Sortable objects, where Sortable is a Java interface. Your arrays will actually consist of OrderedInteger objects where


OrderedInteger implements Sortable

. Basically, OrderedInteger is a wrapper for int that provides a lessThan method. (The extra credit explores the implications of using a wrapper class.)

In other words:

Your sorting algorithm must be polymorphic over Sortable objects.
Collect data using arrays of OrderedInteger objects where each object's value is a randomly-generated integer.

What We Have Done For You

Since much of the assignment is not about programming, we are providing you quite a bit. This includes the files:

Sortable.java
Defines the interface described above.
OrderedInteger.java
Defines the implementation of the interface described above.
Tester.java
A fully-functioning testing apparatus provided you have implemented the sorting algorithms appropriately. Part of the assignment is understanding what this file does.
Sorter.java
A file with holes for you.

Code you need to write

You must implement the sort method in Sorter.java. Your implementation should use quicksort on problems of size greater than switchSize and insertion sort on smaller problems. If doShortFirst is


true

then quicksort should always solve the smaller subproblem first. Else it should always solve the "left" subproblem first. To choose a pivot element, your partition algorithm should choose medianOf random indices in the appropriate range and choose the median of the values in those array indices.

The values of switchSize, doShortFirst, and medianOf are set in the constructor of a Sorter object and should not be subsequently modified.

Constant-factor efficiency matters since the whole purpose of the experiment is to compare asymptotically-equivalent methods. Your sorting implementation should not make any calls to new except the one in the constructor already written for you. Your method for finding the median of medianOf elements should re-use the same private possible array each time (already written for you). Use the modified form of insertion sort we discussed in class for finding the median.

Advice: Make sure your implementation is correct. Test separately that your median-finding code works. Test that your sorting code works. Experimental data has little meaning if you are not conducting the experiements you think you are. We will test your sorting program for correctness. This includes that it sorts its input and that it follows the method-switching, subproblem-doing, and median-finding specifictions.

What you need to do

The test program will produce a 160 line output file which records 160 data points using your sorting methods. Specifically, every combination of the following parameters is recorded:

Number of elements sorted: 10000, 20000, 30000, 40000
Threshold below which insertion sort is used: 0, 10, 25, 40
Is the smaller subproblem always done first: yes, no
Number of random elements from which the median is the pivot: 1, 3, 5, 7, 9

The output file has each of these parameters separated by tab characters, followed by the running time in milliseconds. Actually, the running time is the average of four runs with the parameters. This format makes it convenient to use other software (such as a spreadsheet or math program) to help interpret the data.

Your job is to create a lab report interpreting the data. Your report should be formatted appropriately. Although you do not need to follow this outline exactly, make sure you have:

Title
Introduction -- A short summary of the experiment, why it is relevant, and what it showed.
Experimental Procedures -- Be thorough! Include at least:
- The hardware used.
- The software used.
- The type of data used.
- The number of experiments conducted.
- The data gathered.
Some of this information can be taken from this document, although be sure to emphasize any deviations. Some of it will be different for each report. Include enough information so that someone could reasonably re-create the experiment. Give a short overview of your implementation. Describe the different improvements you implemented. Refer to your complete source code, which should be attached as an appendix to your report. For the sake of completeness, include all source files.
Results -- Do not just quote numbers; organize and summarize them. Explain what trends are observed. Investigate any issues which are relevant and interesting, including:
- Whether the data supports the theoretical running time of quicksort.
- How each of the alleged improvements affects running time.
- How combinations of improvements interact.
This should probably be the longest section of your report. It is unlikely that you can present the data effectively without the use of several well-chosen graphs.
Conclusions -- Hypothesize why you observed the trends you did. Do not just quote numbers; interpret them. Discuss what trade-offs are probably occurring. Mention further experiments you might perform to verify your conclusions or test new hypotheses.
Summary -- Briefly discuss how your conclusions could be appropriately used by others making implementation decisions.
Raw Data -- Include the data you collected in an appendix.

Making Changes

You are welcome to make changes to the testing code in order to collect different data which you feel will make for a better report. Since your report will discuss the data in detail, any changes will be apparent to the grader. Also, realize that the instructor's sample solution takes about 2.5 minutes to run on a 266MHz Pentium II with 64MB RAM using Visual J++. You may need to change data sizes so that the running time is reasonable for your environment (too short causes bad data, too long causes impatience).

The code uses a PrintWriter object to produce output. It should be trivial to change this to be compatible with older versions of Java.

Advice

Do not use large data sets until you are sure your implementation is correct. Otherwise, you will be wasting your time.
Spend most of your time on the report. You will be graded on presentation. Style is just as important in English as it is in Java. Be well-organized, clear, and coherent.
Think while you crunch the numbers. The goal is to do good science/engineering and discover natural patterns.

Extra Credit

Making our sorting implementation polymorphic greatly increases its usefulness as we create new kinds of data which must be sorted. But this might be at the expense of making it less efficient for storing small numbers. For extra credit, explore the running time effect of polymorphism in this context. Do this by comparing/contrasting with two other possible implementations:

Sorting OrderedInteger objects instead of Sortable objects. That is, make a version of your program which knows it is sorting OrderedInteger objects.
Sorting ints. That is, make a version of your program that does not sort Objects.

Make the appropriate additions to your report to present, interpret, and conclude your findings.