sos-email-Enron-core dataset
This dataset is a collection of sequences of sets. Each sequence is derived from the emails sent by a particular email address at Enron, and each set in a sequence is the set of recipients of one email. We restrict the dataset to the "core" group of employees whose email inboxes were made public by the FERC investigation of the company, so each sequence corresponds to one employee's emails. All sequences contain at least 10 sets, and only sets of size at most 5 are considered. Some basic statistics of this dataset are:
  • number of sequences: 93
  • number of unique elements appearing in sets: 141
  • number of sets: 10,428
  • number of unique sets: 649
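The construction rules and statistics above can be illustrated on in-memory data with a short sketch. It assumes a hypothetical representation: a dict from a sequence id to a time-ordered list of recipient sets, one set per email. The helper names `filter_sequences` and `dataset_stats` are hypothetical. The sketch does not parse the released files, whose format is not described here. Applying the set-size filter before the sequence-length filter is also an assumption. The four counts it returns are meant to correspond to the four statistics listed above.

```python
from typing import Dict, FrozenSet, Iterable, List

# Hypothetical representation (an assumption, not the released file format):
# sequence id -> time-ordered list of recipient sets, one set per email.
Sequence = List[FrozenSet[int]]


def filter_sequences(
    raw: Dict[str, Iterable[Iterable[int]]],
    max_set_size: int = 5,
    min_sets: int = 10,
) -> Dict[str, Sequence]:
    """Keep only sets of size <= max_set_size, then keep only sequences that
    still contain at least min_sets sets. The filter order is an assumption."""
    kept: Dict[str, Sequence] = {}
    for seq_id, sets in raw.items():
        # Convert each set once so one-shot iterators are not consumed twice.
        converted = (frozenset(s) for s in sets)
        small = [s for s in converted if len(s) <= max_set_size]
        if len(small) >= min_sets:
            kept[seq_id] = small
    return kept


def dataset_stats(seqs: Dict[str, Sequence]) -> Dict[str, int]:
    """Count sequences, unique elements, sets, and unique sets."""
    all_sets = [s for seq in seqs.values() for s in seq]
    return {
        "sequences": len(seqs),
        "unique_elements": len(set().union(*all_sets)),
        "sets": len(all_sets),
        "unique_sets": len(set(all_sets)),  # frozensets are hashable
    }


if __name__ == "__main__":
    # Toy data with made-up integer ids standing in for email addresses.
    toy = {
        "alice": [{1, 2}] * 6 + [{3}] * 5 + [{1, 2, 3, 4, 5, 6}],
        "bob": [{2}, {2, 4}],
    }
    print(dataset_stats(filter_sequences(toy)))
    # prints {'sequences': 1, 'unique_elements': 3, 'sets': 11, 'unique_sets': 2}
```

On the toy input, `alice` keeps 11 of her 12 sets (the 6-recipient email exceeds the size limit), while `bob` is dropped entirely for having fewer than 10 sets.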
If you use this data, please cite the following paper:
  • Sequences of sets.
    Austin R. Benson, Ravi Kumar, and Andrew Tomkins.
    Proceedings of KDD, 2018.