temporal-reddit-reply dataset
This is a temporal network of Reddit comments, derived from a large collection of comments curated by Jack Hessel et al., using data from Jason Baumgartner at pushshift.io. In this temporal network, an edge (i, j, t) means that user i commented on user j's post or comment at time t. Users whose accounts were deleted were removed from the data. Nodes are indexed from 1 to n. Some basic summary statistics of the dataset are as follows:
  • number of nodes: 8,396,162
  • number of timestamped edges: 636,295,809
  • number of static edges: 517,201,096
  • time span of dataset: 10.06 years
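As a rough illustration of how these statistics relate to the (i, j, t) edge definition, the following Python sketch recomputes them from a stream of edges. It is not the authors' code. It assumes, hypothetically, that each line of the file holds three whitespace-separated integers "i j t", that timestamps are in seconds, and that a "static edge" is a distinct ordered pair (i, j); none of these details are stated above. The names parse_lines and summarize are made up for this example.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Assumption: timestamps are in seconds, so one year is 365.25 days of seconds.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def parse_lines(lines: Iterable[str]) -> Iterator[Tuple[int, int, int]]:
    """Yield (i, j, t) from lines assumed to look like 'i j t'."""
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        yield int(parts[0]), int(parts[1]), int(parts[2])


def summarize(edges: Iterable[Tuple[int, int, int]]) -> Dict[str, float]:
    """Compute node count, temporal edge count, static edge count, and time span."""
    nodes = set()
    static_edges = set()  # distinct ordered (i, j) pairs, ignoring time
    temporal_edges = 0
    t_min = None
    t_max = None
    for i, j, t in edges:
        nodes.add(i)
        nodes.add(j)
        static_edges.add((i, j))
        temporal_edges += 1
        t_min = t if t_min is None else min(t_min, t)
        t_max = t if t_max is None else max(t_max, t)
    span = 0.0 if t_min is None else (t_max - t_min) / SECONDS_PER_YEAR
    return {
        "nodes": len(nodes),
        "temporal_edges": temporal_edges,
        "static_edges": len(static_edges),
        "span_years": span,
    }


if __name__ == "__main__":
    # Tiny made-up sample, not taken from the dataset.
    sample = ["1 2 0", "1 2 31557600", "3 1 63115200", "2 1 100"]
    print(summarize(parse_lines(sample)))
```

On the full dataset, holding the node and static-edge sets in memory this way may need many gigabytes of RAM, so a streaming or on-disk approach may be preferable.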
Data: If you use this data, please cite the following works:
  • Sampling Methods for Counting Temporal Motifs.
    Paul Liu, Austin R. Benson, and Moses Charikar.
    Proceedings of the ACM International Conference on Web Search and Data Mining (WSDM), 2019.
  • Science, AskScience, and BadScience: On the Coexistence of Highly Related Communities.
    Jack Hessel, Chenhao Tan, and Lillian Lee.
    Proceedings of the Tenth International AAAI Conference on Web and Social Media (ICWSM), 2016.
  • Jason Baumgartner. pushshift.io.