KDD Cup 2003 - Download Estimation

Goal

The goal of this task is to estimate the number of downloads that a paper receives in its first two months in the arXiv.

Timeline

The task and data will be available April 6, 2003. Submissions must be completed by July 21, 2003.

Input

Contestants will be given:

- All of the datasets available for Task 1: Citation Prediction.
- For papers published in the following months, the number of downloads each paper received from the main site on each of its first 60 days in the arXiv:
  - February and March of 2000
  - February and April of 2001
  - March and April of 2002

Output

For each paper P submitted during the periods:

April 2000
March 2001
February 2002

contestants should report the estimated total number of downloads of P during its first 60 days in the arXiv. Note that this is a single number for each paper P, whereas the download data provided as input gives a day-by-day log covering the sixty days.
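As an illustration of the relationship between the per-day input logs and the single number to be reported, the following is a minimal sketch that collapses a daily log into a 60-day total. The data layout (a dictionary from paper identifier to a list of daily counts), the paper identifiers, and the function name `total_downloads` are assumptions made for this example; the actual file format of the download data is not described here.

```python
def total_downloads(daily_counts, days=60):
    """Sum a paper's per-day download counts over its first `days` days."""
    if len(daily_counts) < days:
        raise ValueError(
            f"expected at least {days} daily counts, got {len(daily_counts)}"
        )
    return sum(daily_counts[:days])


# Hypothetical log layout: paper id -> downloads on day 1, day 2, ..., day 60.
example_log = {
    "paper-A": [5] * 60,         # 5 downloads on every one of the 60 days
    "paper-B": [10] + [0] * 59,  # 10 downloads on the first day, none after
}

totals = {paper: total_downloads(counts) for paper, counts in example_log.items()}
print(totals)  # {'paper-A': 300, 'paper-B': 10}
```

Totals computed this way for the training months could serve as the regression target when fitting an estimator for the output months.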

Evaluation

For each of the output periods (April 2000, March 2001, February 2002), the target result covers the 50 papers with the greatest number of downloads in their first 60 days. Together these form a vector X with one coordinate for each of these papers. For each such paper P, the value of the P-th coordinate is the number of downloads of P during its first 60 days.

Based on a contestant's download estimations, a vector Y will be constructed, over the same set of 150 papers (50 from each period); the P-th coordinate of Y will consist of the estimated number of downloads of P during its first 60 days.

The score of a prediction vector Y will be equal to the L_1 difference between the vectors X and Y, that is, the sum over all papers P of |X_P - Y_P|.
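The scoring rule can be sketched as follows. This is an illustrative reading of the description above, not the organizers' scoring code. The function names, the dictionary layout, the tie-breaking rule among equally downloaded papers, and the treatment of a missing estimate as 0 are all assumptions that the task description does not specify.

```python
def l1_score(actual, estimated, top_k=50):
    """L_1 difference over the top_k papers by true download count.

    actual:    paper id -> true downloads in the first 60 days
    estimated: paper id -> contestant's estimate
    """
    # Rank by true downloads, highest first; ties broken by paper id (assumption).
    ranked = sorted(actual, key=lambda p: (-actual[p], p))
    top = ranked[:top_k]
    # A paper with no submitted estimate is treated as 0 (assumption).
    return sum(abs(actual[p] - estimated.get(p, 0)) for p in top)


def total_score(actual_by_period, estimated, top_k=50):
    """Sum the per-period L_1 differences (50 papers from each period)."""
    return sum(
        l1_score(actual, estimated, top_k) for actual in actual_by_period.values()
    )


# Toy example with top_k=2: only papers "a" and "b" are scored.
actual = {"a": 100, "b": 80, "c": 10}
estimated = {"a": 90, "b": 95, "c": 500}
print(l1_score(actual, estimated, top_k=2))  # |100-90| + |80-95| = 25
```

Note that in this reading the estimate for paper "c" has no effect on the score, since only the most-downloaded papers in each period enter the vectors X and Y. A smaller L_1 difference means the estimates are closer to the true counts.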

KDD 2003