sos-tags-mathoverflow dataset
This dataset is a collection of sequences of sets. Stack Exchange is a network of question-and-answer websites where users post questions and annotate them with up to 5 tags. In this dataset, each sequence is the time-ordered list of tag sets applied to the questions asked by a single user on MathOverflow, with one set per question. All sequences contain at least 10 sets, and only sets of size at most 5 are considered. Some basic statistics of this dataset are:
  • number of sequences: 1,594
  • number of unique elements appearing in sets: 1,399
  • number of sets: 44,950
  • number of unique sets: 24,157
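The four statistics above can be computed as in the following minimal Python sketch. It assumes the sequences are already loaded in memory as lists of frozensets of tag strings. The on-disk file format is not described here, so no loader is shown, and the function name `summarize` and the toy tags are hypothetical.

```python
from typing import Dict, FrozenSet, List

# Assumed in-memory representation: one sequence per user,
# each sequence a time-ordered list of tag sets (one set per question).
Sequence = List[FrozenSet[str]]


def summarize(sequences: List[Sequence]) -> Dict[str, int]:
    """Compute the four summary statistics for a collection of sequences."""
    # Flatten every sequence into one list of sets, keeping duplicates.
    all_sets = [s for seq in sequences for s in seq]
    return {
        "num_sequences": len(sequences),
        # Distinct tags appearing in any set.
        "num_unique_elements": len(set().union(*all_sets)),
        # Total number of sets, counted with repetition.
        "num_sets": len(all_sets),
        # Distinct sets; frozensets are hashable, so they can be deduplicated.
        "num_unique_sets": len(set(all_sets)),
    }


# Toy data with made-up tags, far smaller than the real dataset.
toy = [
    [
        frozenset({"ag.algebraic-geometry"}),
        frozenset({"nt.number-theory", "ag.algebraic-geometry"}),
        frozenset({"ag.algebraic-geometry"}),
    ],
    [
        frozenset({"nt.number-theory"}),
        frozenset({"co.combinatorics"}),
    ],
]

stats = summarize(toy)
print(stats)
```

On the toy data the repeated set in the first sequence is counted twice in the number of sets but only once in the number of unique sets, which is why those two figures differ in the statistics above.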
Data: If you use this data, please cite the following paper:
  • Sequences of sets.
    Austin R. Benson, Ravi Kumar, and Andrew Tomkins.
    Proceedings of KDD, 2018.