coauth-DBLP dataset
This is a temporal higher-order network dataset, which here means a sequence of timestamped simplices, where each simplex is a set of nodes. In this dataset, nodes are authors and a simplex is a publication recorded on DBLP; timestamps are the year of publication. The projected graph is a weighted undirected graph in which the weight of the edge between two nodes is the number of times they co-appear in a simplex. We restricted the data to simplices consisting of at most 25 nodes. Some basic statistics of this dataset are:
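The projected graph described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used to build the released files: it assumes the simplices are available as (year, list of author IDs) pairs, and the function name `projected_graph` and the toy data are hypothetical. Each simplex larger than the 25-node cap is skipped, and every pair of nodes inside a kept simplex adds 1 to that pair's edge weight.

```python
from collections import Counter
from itertools import combinations

# Size cap on simplices, as described above.
MAX_SIMPLEX_SIZE = 25

def projected_graph(timestamped_simplices, max_size=MAX_SIMPLEX_SIZE):
    """Hypothetical helper: map each unordered node pair (u, v), u < v,
    to the number of simplices in which u and v co-appear."""
    weights = Counter()
    for _year, simplex in timestamped_simplices:
        nodes = sorted(set(simplex))  # a simplex is a set of nodes
        if len(nodes) > max_size:
            continue  # drop simplices larger than the cap
        for u, v in combinations(nodes, 2):
            weights[(u, v)] += 1
    return weights

# Toy (year, authors) pairs; these are NOT taken from DBLP.
toy = [
    (2001, [1, 2, 3]),
    (2002, [1, 2]),
    (2002, [3, 4]),
]
g = projected_graph(toy)
print(g[(1, 2)])  # 2: nodes 1 and 2 co-appear in two simplices
print(g[(1, 3)])  # 1
print(len(g))     # 4 edges: (1, 2), (1, 3), (2, 3), (3, 4)
```

Storing weights in a `Counter` keyed by sorted node pairs keeps the graph undirected without duplicating each edge.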
  • number of nodes: 1,924,991
  • number of timestamped simplices: 3,700,067
  • number of unique simplices: 2,599,087
  • number of edges in projected graph: 7,904,336
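As an illustration of how the four statistics above relate to one another, the following sketch computes them for a tiny made-up input. It assumes that a simplex counts as "unique" by its node set alone, regardless of timestamp, and that single-node simplices contribute nodes but no projected edges; the function name `basic_statistics` and the toy data are hypothetical.

```python
from itertools import combinations

def basic_statistics(timestamped_simplices):
    """Hypothetical helper: compute the listed statistics for (time, simplex) pairs."""
    nodes = set()
    unique_simplices = set()
    edges = set()
    for _time, simplex in timestamped_simplices:
        s = frozenset(simplex)  # node set; ordering and timestamp ignored
        nodes.update(s)
        unique_simplices.add(s)
        edges.update(combinations(sorted(s), 2))  # unweighted projected edges
    return {
        "nodes": len(nodes),
        "timestamped_simplices": len(timestamped_simplices),
        "unique_simplices": len(unique_simplices),
        "projected_edges": len(edges),
    }

# Toy data: {1, 2} appears at two different timestamps.
toy = [
    (2001, [1, 2, 3]),
    (2002, [1, 2]),
    (2003, [2, 1]),
    (2003, [3, 4]),
]
print(basic_statistics(toy))
# {'nodes': 4, 'timestamped_simplices': 4, 'unique_simplices': 3, 'projected_edges': 4}
```

The gap between timestamped and unique simplices comes from the same set of authors publishing together more than once.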
Two versions of the data are provided: one restricted to simplices with at most 25 nodes, and the full data without any restriction on simplex size. If you use this data, please cite the following paper:
  • Simplicial closure and higher-order link prediction.
    Austin R. Benson, Rediet Abebe, Michael T. Schaub, Ali Jadbabaie, and Jon Kleinberg.
    Proceedings of the National Academy of Sciences (PNAS), 2018.