phs-email-Enron dataset
This is a hypergraph dataset of Enron emails with a core-fringe structure. Nodes are labeled as either "core" or "fringe". Core nodes correspond to the email addresses of the individuals whose inboxes were released as part of the investigation by the Federal Energy Regulatory Commission. Each hyperedge is a set of email addresses that all appeared on the same email. Every hyperedge contains at least one core node, so the core forms a hitting set for the hypergraph. We studied ways of recovering core labels from network structure alone, i.e., the problem of finding a planted hitting set. Some summary statistics of the network are:
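The hitting-set property can be checked directly: a set of nodes is a hitting set if every hyperedge contains at least one of them. The minimal Python sketch below assumes the hypergraph is already loaded as a list of Python sets (the file format is not described here), and the node labels are hypothetical. It also includes a classic greedy hitting-set heuristic as an illustrative baseline. This greedy rule is a standard textbook technique, not the recovery method from the paper, and on the toy input it returns a valid hitting set that differs from the planted core. That gap is exactly why recovering the core is non-trivial.

```python
from collections import Counter


def is_hitting_set(hyperedges, candidate):
    """Return True if every hyperedge shares at least one node with `candidate`."""
    candidate = set(candidate)
    return all(candidate & set(edge) for edge in hyperedges)


def greedy_hitting_set(hyperedges):
    """Greedy heuristic (illustrative baseline, not the paper's method):
    repeatedly pick the node that hits the most not-yet-hit hyperedges."""
    # Empty hyperedges can never be hit, so they are skipped here (assumption).
    remaining = [set(e) for e in hyperedges if e]
    chosen = set()
    while remaining:
        counts = Counter(node for edge in remaining for node in edge)
        # Break ties deterministically by node label.
        best, _ = max(counts.items(), key=lambda kv: (kv[1], kv[0]))
        chosen.add(best)
        remaining = [e for e in remaining if best not in e]
    return chosen


# Hypothetical toy hypergraph: "c*" nodes are core, "f*" nodes are fringe.
hyperedges = [{"c1", "f1"}, {"c1", "f2"}, {"c2", "f2"}, {"c1", "c2", "f3"}]
core = {"c1", "c2"}

print(is_hitting_set(hyperedges, core))          # True
print(sorted(greedy_hitting_set(hyperedges)))    # ['c1', 'f2']
```

Here the greedy output {c1, f2} hits every hyperedge but includes a fringe node and misses the core node c2.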
  • number of nodes: 4,423
  • number of hyperedges: 15,653
  • number of core nodes: 146
  • rank of hypergraph (maximum hyperedge size): 25
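The statistics above follow from simple set operations. The sketch below shows one way to compute them, assuming hyperedges are given as a list of sets and core labels as a set of node IDs. The function name and toy data are hypothetical, and hyperedges are counted exactly as provided, with no deduplication.

```python
def summary_statistics(hyperedges, core):
    """Compute basic statistics for a hypergraph with labeled core nodes.

    Nodes are taken to be every node appearing in some hyperedge plus any
    labeled core node (an assumption about how the node set is defined).
    """
    nodes = set(core).union(*hyperedges)
    return {
        "number of nodes": len(nodes),
        "number of hyperedges": len(hyperedges),
        "number of core nodes": len(core),
        # Rank = size of the largest hyperedge (0 for an empty hypergraph).
        "rank": max((len(e) for e in hyperedges), default=0),
    }


# Hypothetical toy hypergraph, not the Enron data.
hyperedges = [{"c1", "f1"}, {"c1", "f2"}, {"c2", "f2"}, {"c1", "c2", "f3"}]
core = {"c1", "c2"}
print(summary_statistics(hyperedges, core))
# {'number of nodes': 5, 'number of hyperedges': 4, 'number of core nodes': 2, 'rank': 3}
```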
If you use this data, please cite the following paper:
  • Planted Hitting Set Recovery in Hypergraphs.
    Ilya Amburg, Jon Kleinberg, and Austin R. Benson.
    Journal of Physics: Complexity, 2021.