The Plan



As we have not posted our code (see The Goods), our methodology and how it works will be partially explained here (see also The Report).

The idea is, basically, that somebody who loses enough eventually gets good. If that agent (human or otherwise) happens to win a couple of times as well, that's great. So basically we implemented some machine learning, ran it a lot, and compared the agent at the end to the agent from the beginning.

Risk has several stages. At the beginning the players choose territories. Then comes a fortification stage, where they choose which territories to add armies to. Then each player chooses a territory to attack, and depending on how the dice turn out, they might gain a territory or lose an army (or armies). After all the attacks are done, another stage of fortification follows, followed by more attacks, and so on.

We decided that our agent would be entirely separate from the board interface. Thus it would have no advantage over a human player in terms of the amount of knowledge it had access to. All it knew was the territories each player had, and the number of armies in each territory.
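For the curious, the sort of thing the agent could see looks roughly like this. The names are made up for illustration; they are not the actual structures in our code.

    # A rough sketch (illustrative names only) of the information the agent
    # gets: who owns each territory and how many armies are sitting on it.
    # Nothing else about the board leaks through.
    board = {
        "Western Australia": {"owner": "Agent", "armies": 5},
        "Indonesia":         {"owner": "Human", "armies": 3},
        "New Guinea":        {"owner": "Human", "armies": 1},
    }

    def territories_of(board, player):
        """All territories currently held by a player."""
        return [name for name, info in board.items() if info["owner"] == player]

    def armies_in(board, territory):
        """Army count on a single territory."""
        return board[territory]["armies"]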

What about cards, you may ask? We decided that for the purposes of this project, we would omit the cards aspect; it adds a whole new dimension to the game that we didn't have time to explore.

We decided that our agent would calculate the utility (aka "how desirable it is") of a board position. It would examine each of the possible board positions one move ahead, and decide which one had the highest utility (the most desirable one). Then it would take steps to achieve it. (I guess you could say that the grass is always greener on the other side of the fence.)
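In Python-ish terms, the lookahead is nothing fancier than this (utility() and successors() are stand-ins for our actual routines):

    # A minimal sketch of the one-move lookahead. utility() scores a board
    # position and successors() generates every (move, resulting position)
    # pair reachable in one move; both are stand-in names.
    def best_move(board, utility, successors):
        """Pick the move whose resulting board position has the highest utility."""
        best, best_value = None, float("-inf")
        for move, next_board in successors(board):
            value = utility(next_board)
            if value > best_value:
                best, best_value = move, value
        return best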

How did it decide what was possible? First, the board returned a list of "attackables", i.e. territories it could attack. Then these went through a filtering stage. The agent had some special knowledge of probability built in, which we called "simulateBattle". Basically it would run a mock battle, "rolling the dice" by itself, given the armies from its territory being used in the attack and the armies in the territory it wanted to attack. Do this about a hundred times, and if you get, say, a 60% chance of winning, it's a good move to try. (Nice fast Quake gaming machine finally has an academic use...)
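Roughly speaking (with illustrative names, since the exact dice handling in our code may differ), simulateBattle did something like this:

    import random

    # A sketch of the simulateBattle idea: fight the battle out with standard
    # Risk dice a hundred times and report how often the attacker wins.
    def simulate_battle(attackers, defenders, trials=100):
        wins = 0
        for _ in range(trials):
            a, d = attackers, defenders
            while a > 0 and d > 0:
                a_dice = sorted((random.randint(1, 6) for _ in range(min(3, a))), reverse=True)
                d_dice = sorted((random.randint(1, 6) for _ in range(min(2, d))), reverse=True)
                for att, dfn in zip(a_dice, d_dice):
                    if att > dfn:
                        d -= 1   # defender loses an army
                    else:
                        a -= 1   # ties go to the defender
            if d == 0:
                wins += 1
        return wins / trials     # estimated chance the attack succeeds

An attack whose estimated win rate came out high enough (around that 60% mark) made it through the filter.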

We also built in some extended strategy, in the form of checking to see whether the territories adjacent to the attacked one would be able to conquer it back. (It wouldn't make sense for the agent to mount a massive assault, suffer massive casualties, and then be conquered immediately.)
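The counter-attack check is just more of the same dice-rolling, run from the other side. This sketch reuses simulate_battle() from above; neighbours() and the 0.5 threshold are illustrative stand-ins, not the exact values in our code.

    def likely_to_hold(board, territory, surviving_armies, neighbours, enemy):
        """Reject an attack if an adjacent enemy stack would probably retake the territory."""
        for adj in neighbours(territory):
            if board[adj]["owner"] == enemy:
                if simulate_battle(board[adj]["armies"], surviving_armies) > 0.5:
                    return False
        return True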

Once it had the list of possible moves, it would rate each one with the utility function. This is where the learning came in. We used a kind of learning known as "TD-learning" whereby the function updates itself. It has been used in backgammon with excellent results.
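The gist of a TD-style update, for a utility function built from a weighted sum of board features (the features, learning rate and so on here are placeholders, not the ones we actually used):

    # A sketch of a TD(0)-style update on a linear utility function.
    ALPHA = 0.1   # learning rate (placeholder value)
    GAMMA = 1.0   # no discounting within a game

    def utility(weights, features):
        return sum(w * f for w, f in zip(weights, features))

    def td_update(weights, features, next_features, reward=0.0):
        """Nudge the weights so utility(current) moves toward reward + utility(next)."""
        error = reward + GAMMA * utility(weights, next_features) - utility(weights, features)
        return [w + ALPHA * error * f for w, f in zip(weights, features)]

For the record, the backgammon result we have in mind (TD-Gammon) used a neural network rather than a simple weighted sum, but the update idea is the same.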

This, of course, lends itself to automation. The whole point of the project was that it would learn by itself; we wouldn't need to train it, it would train itself. Leave it on babbage for a couple of days, go out in the sun, while the computer does the work for you. If only all our homework worked that way. :)
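The training loop itself is then just self-play plus the update above. This sketch reuses td_update() from the previous sketch; play_game() and extract_features() are stand-ins for our actual routines.

    def train(weights, num_games, play_game, extract_features):
        for _ in range(num_games):
            # play_game returns the positions the learner saw, plus 1.0 for a
            # win and 0.0 for a loss (an assumed convention, for illustration)
            positions, outcome = play_game(weights)
            for current, following in zip(positions, positions[1:]):
                weights = td_update(weights,
                                    extract_features(current),
                                    extract_features(following))
            # final update pulls the last position toward the game's outcome
            weights = td_update(weights,
                                extract_features(positions[-1]),
                                [0.0] * len(weights),
                                reward=outcome)
        return weights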

If only our project worked that way. Babbage was really slow, probably due to the large number of floating point numbers we used in the program. A nice fast home computer did much better, but we still weren't able to crank out the ten thousand or so trials that we wanted to do overnight. (The situation was not helped by the realisation, a day before the due date, that our update function wasn't working quite right and that about eight thousand training trials had been wasted. The accidentally coded one-way bridge from East Africa to the Middle East probably didn't help the training process either.)

However, with some optimizations and tweaking, we were able to get in about six thousand training trials and matched the trained agent against the original, untrained agents. It seemed to perform better. It still wasn't up to human standards, though. (At least not our standards.)

So we decided to train it over winter break. It still is not up to human standards (or at least, not up to Dave's level of ability, which may be a little higher than the average human's, due to extensive training over the duration of the project; it wasn't just the computer agent that was trained...). As to whether it ended up smarter or dumber, a partly subjective question, we suggest you see for yourself.