grader 1.6
----------

This program is designed to read in a set of exam scores and generate an
appropriate graphical representation. The problem with displaying exam
scores is that scores are an inaccurate measure of student performance.
If each score is displayed as a delta spike, the display implies a level
of accuracy that the exam does not provide; it is difficult for the
reader to judge the density of scores in any given region; and because
every individual score is visible, there is some reason to be concerned
about loss of privacy. Another typical approach is to group scores into
ranges, as in a histogram. This approach discards useful information
about the scores, since an 80 and an 89, say, are treated as identical.

The approach taken in this program is to treat each score as a data
item generated by a random variable (the student). Given the data item,
we construct the distribution of this hypothesized random variable. The
distributions of all the students are then summed to construct a
hypothesized distribution of scores for an infinitely large student
population. It is this distribution that is plotted.
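The summation step can be sketched as follows. This is an illustrative
sketch, not the program's code: it assumes scores on a 0 to 100 scale,
samples the density on a fixed grid, and, for simplicity, models each
student as a Gaussian whose standard deviation is the spread, rather
than the default binomial form. The name population_density and its
parameters are hypothetical.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Normal density with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def population_density(scores, spread, step=0.5, lo=0.0, hi=100.0):
    """Sum of per-student densities, sampled on a grid of scores.

    Assumption: each student is modelled by a Gaussian centred on that
    student's score with standard deviation `spread`. The sum is divided
    by the number of students so that the result is itself a density for
    the hypothetical infinitely large population.
    """
    n_points = int(round((hi - lo) / step)) + 1
    grid = [lo + i * step for i in range(n_points)]
    density = [0.0] * n_points
    for s in scores:
        for i, x in enumerate(grid):
            density[i] += gaussian_pdf(x, s, spread)
    density = [d / len(scores) for d in density]
    return grid, density
```

For example, grid, dens = population_density([62, 75, 75, 88, 93],
spread=5) yields a smooth curve that can be handed to any plotting
tool. Note that a Gaussian puts some probability outside 0 to 100 for
scores near the ends of the range, which a binomial form avoids.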

The main input parameter is the 'spread', which is an estimate of
the accuracy of the measurement performed by the test. If a Gaussian
distribution is assumed for the student's performance, the spread is
the standard deviation of that performance. The default form of the
student random variable is binomial. The fiction is that the test is
composed of some number of sub-problems, and the student has a fixed
probability of getting each of them right. The number of sub-problems
is chosen from the spread so that, for a score of 50, the distribution
has the specified standard deviation.
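The relationship between the spread and the number of sub-problems can
be worked out directly, assuming scores on a 0 to 100 scale. If a
student gets K of n sub-problems right, with K ~ Binomial(n, p), the
score 100*K/n has standard deviation 100*sqrt(p*(1-p)/n). At a score of
50, p = 0.5 and this is 50/sqrt(n), so n = (50/spread)^2. The sketch
below encodes that arithmetic; the function names and the rounding of
n to an integer are assumptions, not taken from the program.

```python
import math


def subproblems_for_spread(spread):
    """Number of sub-problems n whose score standard deviation at p = 0.5
    equals `spread` (scores on a 0-100 scale).

    sd = 100 * sqrt(p * (1 - p) / n); with p = 0.5 this is 50 / sqrt(n),
    so n = (50 / spread) ** 2. Rounding to an integer is an assumption.
    """
    return max(1, round((50.0 / spread) ** 2))


def score_sd(p, n):
    """Standard deviation of the score 100 * K / n when K ~ Binomial(n, p)."""
    return 100.0 * math.sqrt(p * (1.0 - p) / n)


def binomial_score_pmf(p, n):
    """Probability of each possible score 100 * k / n, for k = 0..n."""
    return {100.0 * k / n: math.comb(n, k) * p**k * (1 - p) ** (n - k)
            for k in range(n + 1)}
```

With spread 5 the fictional test has 100 sub-problems. A student with
p = 0.9 then has a score standard deviation of 3 rather than 5, so under
the binomial form the student distributions narrow toward the ends of
the scale.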

The student distribution is constructed according to a posteriori
principles. Given a data item D and a hypothesis H, the probability of
H given D is P(H|D) = P(D|H) * P(H)/P(D). The term P(D|H) is given by
the form of the hypothesized distribution. The term P(H) is an a priori
estimate of the probabilities of various student distributions. The
term P(D) is obtained by integrating P(D|H)*P(H) over all hypotheses H;
dividing by it normalizes P(H|D) so that it integrates to 1. The default
a priori distribution for H is proportional to p*(1-p), where p is the
fractional score (between 0 and 1) that characterizes the hypothesis.
This can perhaps be justified on description-length principles; in
practice it gives the most pleasing plots. Other a priori distributions,
including the uniform distribution, are also available.
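A minimal numerical sketch of this construction for the binomial model
follows. It assumes that the hypothesis H is identified with the
per-sub-problem probability p, that scores are on a 0 to 100 scale, and
that p is evaluated on a fixed grid; the function posterior_on_grid and
its prior argument are hypothetical, not the program's interface. Here
n is the number of sub-problems, and the implied number correct,
k = n*score/100, need not be an integer.

```python
import math


def posterior_on_grid(score, n, prior="default", points=1000):
    """Posterior density P(H|D) over p, the hypothesised student's
    fractional score, given one observed score on a 0-100 scale.

    P(D|H) is taken proportional to p**k * (1 - p)**(n - k); the binomial
    coefficient does not depend on p and cancels on normalisation.
    P(H) is p * (1 - p) for "default" and constant for "uniform".
    """
    k = n * score / 100.0                      # implied number correct
    h = 1.0 / points
    grid = [(i + 0.5) * h for i in range(points)]  # midpoints avoid p = 0 and 1

    def log_prior(p):
        return math.log(p * (1 - p)) if prior == "default" else 0.0

    # log P(D|H) + log P(H), up to an additive constant
    logs = [k * math.log(p) + (n - k) * math.log(1 - p) + log_prior(p)
            for p in grid]
    top = max(logs)                            # shift to avoid underflow
    unnorm = [math.exp(v - top) for v in logs]
    evidence = h * sum(unnorm)                 # midpoint rule; proportional to P(D)
    return grid, [u / evidence for u in unnorm]
```

For this model the integral also has a closed form: the posterior is a
Beta(k+2, n-k+2) distribution under the p*(1-p) prior and
Beta(k+1, n-k+1) under the uniform prior, with modes (k+1)/(n+2) and
k/n respectively. The default prior therefore pulls each student's
distribution slightly toward the middle; with n = 100, a score of 80
peaks near p = 0.794 instead of 0.8.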

Andrew Myers, March 2001

May be used freely for non-profit purposes so long as this notice is
maintained in the code.
