The
system plays checkers using a standard minimax search algorithm and uses alpha-beta
pruning to reduce the size of the search tree, thereby increasing the number
of moves it can look ahead in a reasonable amount of time. As it is implemented,
the minimax search is cut off at an arbitrary, uniform depth. Using forward
pruning might significantly improve performance by allowing the search to
progress deeper on the more "promising" paths; combining this with a form
of iterative deepening would allow the computer to play in time-limited games
and always make the best move it has found so far.
[Look at a minimax search example]
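As a concrete illustration of the search, here is a minimal alpha-beta sketch over a toy, hand-built game tree. This is illustrative only: the real program generates moves from checkers positions rather than walking an explicit tree, and falls back to the utility function at the depth cutoff.

```java
// Minimal alpha-beta minimax over an explicit game tree.
// A real checkers search would generate successor boards from a game
// state; here the "game" is a hand-built tree of leaf utilities.
final class Node {
    final int utility;          // meaningful only at leaves / cutoff
    final Node[] children;      // empty array = leaf
    Node(int utility, Node... children) {
        this.utility = utility;
        this.children = children;
    }
}

public class AlphaBeta {
    // Returns the minimax value of `node`, cutting the search off
    // uniformly at `depth` plies.
    public static int search(Node node, int depth, int alpha, int beta, boolean maximizing) {
        if (depth == 0 || node.children.length == 0) {
            return node.utility;  // depth cutoff: use the utility estimate
        }
        if (maximizing) {
            int best = Integer.MIN_VALUE;
            for (Node child : node.children) {
                best = Math.max(best, search(child, depth - 1, alpha, beta, false));
                alpha = Math.max(alpha, best);
                if (beta <= alpha) break;  // beta cutoff: MIN avoids this branch
            }
            return best;
        } else {
            int best = Integer.MAX_VALUE;
            for (Node child : node.children) {
                best = Math.min(best, search(child, depth - 1, alpha, beta, true));
                beta = Math.min(beta, best);
                if (beta <= alpha) break;  // alpha cutoff: MAX avoids this branch
            }
            return best;
        }
    }

    public static void main(String[] args) {
        // Classic textbook tree: one MAX ply over three MIN nodes.
        Node tree = new Node(0,
            new Node(0, new Node(3), new Node(12), new Node(8)),
            new Node(0, new Node(2), new Node(4), new Node(6)),
            new Node(0, new Node(14), new Node(5), new Node(2)));
        System.out.println(search(tree, 2, Integer.MIN_VALUE, Integer.MAX_VALUE, true));
    }
}
```

Note that the pruning never changes the value returned; it only avoids expanding branches that cannot affect the result, which is what buys the extra lookahead depth.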
The utility function is a simple linear combination of a set of factors, which are taken from A. L. Samuel's checkers program of 1959. Most of the factors return a small integer value that is a function of the number and distribution of the pieces on the game board. Samuel's program was successful for its time, and the factors he used were based on discussions with checkers masters, research, and his own program's evaluation of their importance. Using these pre-tested factors removes some variability from the problem being studied.
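The shape of such a utility function can be sketched as a weighted sum. The feature names in the comments below (piece advantage, mobility) are illustrative stand-ins for Samuel-style terms, not the program's exact factor list:

```java
// A utility function as a linear combination of board factors.
// Each factor maps a board to a small integer; the board itself is
// abstracted away here and a precomputed feature vector is scored.
public class Utility {
    public static double score(double[] weights, int[] factors) {
        if (weights.length != factors.length) {
            throw new IllegalArgumentException("one weight per factor");
        }
        double total = 0.0;
        for (int i = 0; i < weights.length; i++) {
            total += weights[i] * factors[i];  // weighted sum of the factors
        }
        return total;
    }

    public static void main(String[] args) {
        double[] w = {1.0, 1.0, 1.0};      // coefficients start at 1 (see below)
        int[] f = {2, 0, -1};              // e.g. +2 piece advantage, -1 mobility
        System.out.println(score(w, f));   // 2*1 + 0*1 + (-1)*1 = 1.0
    }
}
```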
In Samuel's original program, the coefficients of the terms of the utility function could assume any value between -2^18 and +2^18, and in his final results, he reports coefficients ranging from -2^18 to +2^16, with values distributed fairly evenly in between. In this program, the coefficients can have any value within the range of a Java double. For this reason, the standard genetic algorithm is modified slightly. Instead of creating the initial population at random, the coefficients of the utility function are always initialized to 1 at the start of the learning process. Ultimately, this should not have any effect on the ability of the system to learn, but it does introduce some consistency that makes evaluation of the system easier.
To determine the fitnesses of the
population members, and hence their probability of reproduction, the members
are randomly paired and play against one another in a single-elimination
tournament. One win in the tournament is worth 10 fitness points. In case
of a tie, the player with more pieces is awarded 5 fitness points. When the
tournament is complete, each individual's score is divided by the total number
of fitness points that were awarded.
[Look at a fitness tournament
example]
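The tournament scoring can be sketched as follows. Since a standalone example cannot play actual games of checkers, a toy "strength" number decides each match here (an assumption for illustration); the pairing, scoring, and normalization follow the scheme described above. The toy game cannot tie, so the 5-point tie rule is noted only in a comment:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Fitness assignment via a single-elimination tournament.
// A real match pits two utility functions in a game of checkers; here
// a per-player "strength" value stands in for the game's outcome.
public class Tournament {
    public static double[] fitnesses(double[] strength, long seed) {
        int n = strength.length;
        double[] points = new double[n];
        List<Integer> round = new ArrayList<>();
        for (int i = 0; i < n; i++) round.add(i);
        Collections.shuffle(round, new Random(seed));  // random initial pairing

        while (round.size() > 1) {
            List<Integer> winners = new ArrayList<>();
            for (int i = 0; i + 1 < round.size(); i += 2) {
                int a = round.get(i), b = round.get(i + 1);
                // Stand-in for a game; a tie in real checkers would instead
                // award 5 points to the side with more pieces remaining.
                int winner = strength[a] >= strength[b] ? a : b;
                points[winner] += 10.0;                 // 10 points per win
                winners.add(winner);
            }
            if (round.size() % 2 == 1) {
                winners.add(round.get(round.size() - 1));  // odd player gets a bye
            }
            round = winners;
        }

        double total = 0.0;
        for (double p : points) total += p;
        double[] fitness = new double[n];
        for (int i = 0; i < n; i++) {
            fitness[i] = points[i] / total;  // divide by total points awarded
        }
        return fitness;
    }
}
```

With four players the tournament awards 30 points in all (three games), so the overall winner, with two wins, ends up with fitness 20/30.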
A single-point crossover operator is used. When the operator is applied to two parents, a random point is chosen in the weight vectors, and the individuals exchange the weights that fall to its right.
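A minimal sketch of the operator (not the program's exact code):

```java
import java.util.Random;

// Single-point crossover on two weight vectors: pick a random cut point
// and have the two children exchange everything to its right.
public class Crossover {
    public static double[][] cross(double[] p1, double[] p2, Random rng) {
        int point = 1 + rng.nextInt(p1.length - 1);  // cut strictly inside the vector
        double[] c1 = p1.clone(), c2 = p2.clone();
        for (int i = point; i < p1.length; i++) {    // swap weights right of the cut
            c1[i] = p2[i];
            c2[i] = p1[i];
        }
        return new double[][] { c1, c2 };
    }
}
```

The cut point is drawn from 1 to length-1 so that each child always inherits at least one weight from each parent.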
Because the weights vary over many
orders of magnitude, it was found that the mutation operator worked best when
the amount of mutation was proportional to the current value of the coefficient.
Instead of choosing a random number somewhere in the possible range of the
coefficients, a random number is chosen in the range [0, 1). This number is
squared so that the distribution is no longer uniform but is heavily skewed
towards zero. The current value of the coefficient is multiplied by this result
to determine the mutation amount, which is then added to or subtracted from the
current value to produce the new value. (If the current value of a coefficient
is zero, the new value is taken to be the squared random number itself.)
[Look at a crossover and mutation
example]
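The mutation scheme described above can be sketched in a few lines (the choice between adding and subtracting is modeled here as a fair coin flip, an assumption the text does not spell out):

```java
import java.util.Random;

// Mutation proportional to the coefficient's current value: draw r
// uniformly from [0, 1), square it (skewing the distribution towards
// zero), and add or subtract r^2 * value. A zero coefficient is
// replaced by r^2 itself.
public class Mutation {
    public static double mutate(double value, Random rng) {
        double r = rng.nextDouble();      // uniform in [0, 1)
        double amount = r * r;            // squared: heavily weighted toward 0
        if (value == 0.0) {
            return amount;                // special case for zero coefficients
        }
        double delta = value * amount;    // proportional to the current value
        return rng.nextBoolean() ? value + delta : value - delta;
    }
}
```

Because the squared draw is strictly less than 1, a mutated coefficient always stays strictly between zero and twice its previous value, however large or small that value is.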
This process provides a better approximation of polygenic inheritance in biology, where many genes code for a single trait (for example, height or intelligence). Because the trait is determined by the genes collectively, a mutation in one of them is less likely to produce a dramatic change in an individual than a mutation in a gene that codes for a trait in its entirety. Mutation can still produce a major change here, however: the new value of a coefficient can range from zero to twice its current value.