pvc-email-W3C dataset
This is a dataset with a "planted vertex cover" core-periphery structure: every edge in the network has at least one endpoint in the core. The network comes from the corpus of crawled W3C mailing lists, and the core nodes correspond to email addresses with w3.org in the domain. The dataset identifies which nodes are in the core and also includes the timestamped emails and the email addresses. Some summary statistics of the network are:
  • number of nodes: 20,081
  • number of edges: 31,874
  • number of core nodes: 1,994
  • minimum vertex cover size: 1,107
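As a hedged illustration of the two quantities above, the sketch below assumes the edges are available as a list of (u, v) node-ID pairs and the core as a set of node IDs. The file format is not described here, so this representation is an assumption. The helper `is_vertex_cover` checks the planted property that every edge touches the core. The hypothetical helper `min_vertex_cover_size` finds a minimum vertex cover by brute force, which takes exponential time and is only usable on tiny toy graphs; minimum vertex cover is NP-hard in general, so it is not how the statistic for this network would be computed.

```python
from itertools import combinations
from typing import Hashable, Iterable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def is_vertex_cover(edges: Iterable[Edge], cover: Set[Hashable]) -> bool:
    """Return True if every edge has at least one endpoint in `cover`."""
    return all(u in cover or v in cover for u, v in edges)


def periphery_edges(edges: Iterable[Edge], core: Set[Hashable]) -> List[Edge]:
    """Edges with neither endpoint in the core (empty if the core is a vertex cover)."""
    return [(u, v) for u, v in edges if u not in core and v not in core]


def min_vertex_cover_size(edges: List[Edge]) -> int:
    """Brute-force minimum vertex cover size (hypothetical helper, tiny graphs only)."""
    nodes = list({x for edge in edges for x in edge})
    # Try candidate covers in order of increasing size; the first hit is minimum.
    for k in range(len(nodes) + 1):
        for subset in combinations(nodes, k):
            if is_vertex_cover(edges, set(subset)):
                return k
    return len(nodes)  # unreachable: the full node set always covers every edge


# Toy example (made-up data): planted core {1, 2}, periphery {3, 4}.
toy_edges = [(1, 3), (1, 4), (1, 2)]
toy_core = {1, 2}
print(is_vertex_cover(toy_edges, toy_core))  # True
print(min_vertex_cover_size(toy_edges))      # 1, since {1} alone covers every edge
```

In the toy graph the planted core is a valid vertex cover but is larger than the minimum one, mirroring how the 1,994 core nodes here exceed the minimum vertex cover size of 1,107.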
If you use this data, please cite the following papers:
  • Found Graph Data and Planted Vertex Covers.
    Austin R. Benson and Jon Kleinberg.
    Advances in Neural Information Processing Systems, 2018.
  • Overview of the TREC 2005 Enterprise Track.
    Nick Craswell, Arjen P. de Vries, and Ian Soboroff.
    TREC, 2005.