Results

Task I: Citation Prediction

  • Winner: J N Manjunatha, Raghavendra Pandey, Sivaramakrishnan R., and M Narasimha Murty (1329)
  • Second place: Claudia Perlich, Foster Provost, and Sofus Macskassy (1360)
  • Third place: David Vogel (1398)
The number in parentheses after each winner is the L_1 difference (the sum of absolute differences) between the solution and the submission.
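As a rough illustration of the L_1 score described above, the following Python sketch sums the absolute per-paper differences between a solution and a submission. The function name, the dictionary layout, the paper ids, and the rule that a missing prediction counts as 0 are all assumptions made for this example, not details of the official scoring script.

```python
def l1_difference(solution, submission):
    """Sum of absolute per-paper differences between two {paper_id: value} mappings."""
    # Assumption: a paper missing from the submission is treated as a prediction of 0.
    return sum(
        abs(value - submission.get(paper_id, 0))
        for paper_id, value in solution.items()
    )


# Made-up ids and values, for illustration only.
solution = {"paper-a": 4, "paper-b": -2, "paper-c": 0}
submission = {"paper-a": 1, "paper-b": -3}
score = l1_difference(solution, submission)
print(score)
```

A lower score means the submission is closer to the solution, which matches the ordering of the three places listed above.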

The solution for Task 1 is now available. The first column is the hep-th arxiv-id and the second column is (# of citations from May-July) - (# of citations from Feb-April) for all papers that received at least 6 citations between Feb and April.
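The second-column value can be sketched as follows, assuming per-paper citation counts for the two periods are already held in dictionaries. The function name, the argument names, and the sample counts are hypothetical; only the subtraction and the 6-citation threshold come from the description above.

```python
def citation_change(feb_apr, may_jul, min_citations=6):
    """Return {paper_id: May-July count minus Feb-April count}.

    Only papers with at least `min_citations` citations in Feb-April are kept.
    """
    return {
        # Assumption: a paper absent from the May-July counts received 0 citations.
        paper_id: may_jul.get(paper_id, 0) - count
        for paper_id, count in feb_apr.items()
        if count >= min_citations
    }


# Made-up counts, for illustration only.
feb_apr = {"paper-a": 10, "paper-b": 6, "paper-c": 5}
may_jul = {"paper-a": 7, "paper-b": 9, "paper-c": 20}
targets = citation_change(feb_apr, may_jul)
print(targets)
```

In this example "paper-c" is excluded because it has only 5 citations in the earlier period, and a negative value means a paper was cited less often in May-July than in Feb-April.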

In addition, the full list of new citations for all papers between May and July is also available.

Task II: Data Cleaning

  • Winner: David Vogel (421,582)
  • Second place: Sunita Sarawagi, Kapil M. Bhudhia, Sumana Srinivasan, and V.G.Vinod Vydiswaran (516,242)
  • Third place: Martine Cadot and Joseph di Martino (538,013)
The number in parentheses after each winner is the size of the symmetric difference between the submission and the solution.
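The symmetric-difference score above can be illustrated with a short Python sketch that treats each citation graph as a set of (citing, cited) pairs. The function name and the edge data are assumptions made for this example; the official evaluation code is not shown in the source.

```python
def symmetric_difference_size(submission_edges, solution_edges):
    """Count the edges that appear in exactly one of the two citation graphs."""
    return len(set(submission_edges) ^ set(solution_edges))


# Made-up (citing, cited) pairs, for illustration only.
solution_edges = {("p1", "p2"), ("p1", "p3"), ("p2", "p3")}
submission_edges = {("p1", "p2"), ("p2", "p3"), ("p3", "p4")}
score = symmetric_difference_size(submission_edges, solution_edges)
print(score)
```

Both a missed citation, such as ("p1", "p3"), and a spurious one, such as ("p3", "p4"), add one to the score, so a lower number indicates a cleaner reconstruction.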

The solution for Task 2 is a citation graph for hep-ph papers, provided by SLAC/SPIRES and available as a zip file. Papers in the left column cite papers in the right column.

Task III: Download Estimation

  • Winner: Janez Brank and Jure Leskovec (21,232)
  • Second place: Joseph Milana, Joseph Sirosh, Joel Carleton, Gabriela Surpi, Daragh Hartnett, and Michinari Momma (21,950.6)
  • Third place: Kohsuke Konishi (23,759)
The number in parentheses after each winner is the L_1 difference between the contestant's submission and the solution.

The actual download counts for the top 150 papers (50 from each of the three missing periods) are also available. The left column is the number of downloads the paper received in its first 60 days and the right column is the hep-th arxiv-id.

Task IV: Open Task

  • Winner: Amy McGovern, Lisa Friedland, Michael Hay, Brian Gallagher, Andrew Fast, Jennifer Neville, and David Jensen. "Exploiting Relational Structure to Understand Publication Patterns in High-Energy Physics"
  • Second place: Shou-de Lin and Hans Chalupsky. "Using Unsupervised Link Discovery Methods to Find Interesting Facts and Connections in a Bibliography Dataset"
  • Third place: Shawndra Hill and Foster Provost. "The Myth of the Double-Blind Review"

The submissions for Task 4 were evaluated by a small program committee consisting of the three KDD Cup 2003 co-chairs as well as Mark Craven (University of Wisconsin-Madison), David Page (University of Wisconsin-Madison), and Soumen Chakrabarti (Indian Institute of Technology Bombay).