THERE's $1 million at stake in a competition being run by an online DVD rental firm. If you are thinking of entering, you might want to take note of an analysis by two computer scientists which has revealed a loophole that could be used as a cheat's charter.
Netflix, based in Los Gatos, California, is offering the prize to the first person who can achieve a 10 per cent improvement in the system it uses to recommend movies to its customers. The competition has already attracted more than 11,000 teams.
The existing recommendation service predicts what movies users will enjoy based on how they have rated films in the past. Netflix has now released movie ratings from almost half a million users, along with when they watched them - but it has withheld users' names and some of each user's ratings. Competitors have to predict what these hidden ratings are.
The loophole, discovered by computer scientists Arvind Narayanan and Vitaly Shmatikov at the University of Texas, Austin, shows that for some users it is possible to uncover ratings that Netflix has withheld. If the Netflix users have also rated films on the Internet Movie Database website (IMDB), they say, it's possible to correlate the two sets of records.
This is easiest with movies that are not in Netflix's top 100: fewer people rate these movies, so those who do are easier to identify. The pair found that if you know from the date of a user's IMDB ratings roughly when they watched six films not in the top 100, the ratings they gave make it possible to identify them in the Netflix record with an accuracy of 80 per cent (www.arxiv.org/cs.CR/0610105).
Once a Netflix user has been identified, you have access to their entire IMDB ratings record. You can then simply feed these ratings to your prediction algorithm, says Shmatikov. Identifying just 5000 Netflix users in this way could boost the accuracy of a contestant's algorithm enough to give them the winning edge.
Competition judge Charles Elkan at the University of California, San Diego, agrees that the method could work if enough Netflix users also use IMDB, but he believes it will be possible to detect and disqualify cheats when they submit their computer code. Narayanan and Shmatikov aren't convinced. "There are techniques to obscure how data is introduced within the thousands of lines of code you'd submit," Shmatikov says.
So why don't Narayanan and Shmatikov just cheat and claim the prize? "It wouldn't be ethical," Shmatikov says. "Netflix have done a wonderful thing for the computer science community by releasing real data for us to work with. We want to encourage them. We don't want their competition to be tainted by accusations of cheating."