## Data!

This is a collection of datasets from my research projects.
I strive to make the data used in my research easily accessible.
If you encounter problems, please email me at arb@cs.cornell.edu.

#### Temporal higher-order networks

Each of these datasets is a timestamped sequence of simplices, where
a simplex is a set of k nodes from some vertex set. The datasets
also contain weighted projected graphs, where the weight is the
number of times that two nodes co-appear in a simplex. These datasets
were used in the paper

- Simplicial closure and higher-order link prediction.

Austin R. Benson, Rediet Abebe, Michael T. Schaub, Ali Jadbabaie, and Jon Kleinberg.

arXiv:1802.06916, 2018.

- coauth-DBLP: co-authorship on DBLP papers.
- coauth-MAG-Geology: co-authorship on Geology papers.
- coauth-MAG-History: co-authorship on History papers.
- tags-stack-overflow: sets of tags applied to questions on stackoverflow.com.
- tags-math-sx: sets of tags applied to questions on math.stackexchange.com.
- tags-ask-ubuntu: sets of tags applied to questions on askubuntu.com.
- threads-stack-overflow: sets of users asking and answering questions on threads on stackoverflow.com.
- threads-math-sx: sets of users asking and answering questions on threads on math.stackexchange.com.
- threads-ask-ubuntu: sets of users asking and answering questions on threads on askubuntu.com.
- NDC-substances: sets of substances making up drugs.
- NDC-classes: sets of classifications applied to drugs.
- DAWN: sets of drugs used by patients recorded in emergency room visits.
- congress-bills: sets of congresspersons cosponsoring bills.
- email-Eu: sets of email addresses on emails.
- email-Enron: sets of email addresses on emails.
- contact-high-school: groups of people in contact at a high school.
- contact-primary-school: groups of people in contact at a primary school.
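
As a rough illustration of the weighted projected graph described above, the following sketch counts how many simplices each pair of nodes shares. It works on an in-memory list of simplices and ignores timestamps. The function name `projected_graph` and the toy input are assumptions made for this example; they do not reflect the file format of the datasets.

```python
from collections import Counter
from itertools import combinations


def projected_graph(simplices):
    """Return a Counter mapping each node pair (u, v), u < v, to the
    number of simplices in which u and v co-appear.

    `simplices` is a hypothetical in-memory representation: an iterable
    of node collections, one per (timestamped) simplex.
    """
    weights = Counter()
    for simplex in simplices:
        # Deduplicate and sort so each unordered pair has one canonical key.
        nodes = sorted(set(simplex))
        for u, v in combinations(nodes, 2):
            weights[(u, v)] += 1
    return weights


# Toy example: the simplex {1, 2, 3} appears twice, {2, 3} once, {4} once.
simplices = [(1, 2, 3), (2, 3), (1, 2, 3), (4,)]
weights = projected_graph(simplices)
```

In the toy example, nodes 2 and 3 co-appear in three simplices, so the projected edge between them has weight 3, while the single-node simplex contributes no edges.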

#### Discrete subset choices

These datasets are from people making choices from a discrete set of
alternatives. In datasets with "universal choice sets," the set of
alternatives is the same for every choice that is made. In datasets
with "variable choice sets," the set of alternatives changes with each
subset selection. These datasets were used in the paper

- A Discrete Choice Model for Subset Selection.

Austin R. Benson, Ravi Kumar, and Andrew Tomkins.

In *Proceedings of the 11th ACM International Conference on Web Search and Data Mining (WSDM)*, 2018.

Code available at github.com/arbenson/discrete-subset-choice.

- uchoice-Bakery: sets of items purchased at a bakery.
- uchoice-Walmart-Items: sets of items purchased at Walmart.
- uchoice-Walmart-Depts: sets of departments from which items were purchased at Walmart.
- uchoice-Kosarak: sets of web pages viewed in a browsing session.
- uchoice-Instacart: sets of items purchased from Instacart.
- uchoice-Lastfm-Genres: sets of genres of music played by users in listening sessions.

- vchoice-Yc-Items: sets of items purchased from the items viewed in a browsing session on an e-commerce web site.
- vchoice-Yc-Cats: sets of product categories from which purchases were made from a browsing session on an e-commerce web site.
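
The distinction between universal and variable choice sets can be sketched in a few lines. The `Choice` record and the helper `has_universal_choice_set` below are hypothetical names introduced only for this illustration; they are not part of the released data or code.

```python
from typing import FrozenSet, List, NamedTuple


class Choice(NamedTuple):
    """One observed choice (hypothetical in-memory representation)."""
    alternatives: FrozenSet[str]  # the set offered to the chooser
    selected: FrozenSet[str]      # the subset that was chosen


def is_valid(choice: Choice) -> bool:
    # A selection must be a subset of the alternatives it was drawn from.
    return choice.selected <= choice.alternatives


def has_universal_choice_set(choices: List[Choice]) -> bool:
    # Universal: every choice was made from the same set of alternatives.
    return len({c.alternatives for c in choices}) <= 1


# Toy data: the same alternatives every time (universal) ...
uchoice = [
    Choice(frozenset({"a", "b", "c"}), frozenset({"a", "b"})),
    Choice(frozenset({"a", "b", "c"}), frozenset({"c"})),
]
# ... versus alternatives that change with each selection (variable).
vchoice = [
    Choice(frozenset({"a", "b"}), frozenset({"a"})),
    Choice(frozenset({"b", "c", "d"}), frozenset({"c", "d"})),
]
```

Here `has_universal_choice_set(uchoice)` is true and `has_universal_choice_set(vchoice)` is false, mirroring the `uchoice-` and `vchoice-` prefixes used in the dataset names.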

#### Manhattan taxi cab trajectories

This dataset contains 1,000 sequences of neighborhoods of Manhattan visited
by taxi cabs over a one-year period. The dataset was used in the paper

- The spacey random walk: a stochastic process for higher-order data.

Austin R. Benson, David F. Gleich, and Lek-Heng Lim.

*SIAM Review (Research Spotlights)* 59:2, 321–345, 2017.

Code available at github.com/arbenson/spacey-random-walks.

#### Flow cytometry

This flow cytometry dataset represents abundances of fluorescent
molecules labeling antibodies that bind to specific targets on the surface
of blood cells. The dataset was used in the paper

- Scalable methods for nonnegative matrix factorizations of near-separable tall-and-skinny matrices.

Austin R. Benson, Jason D. Lee, Bartek Rajwa, and David F. Gleich.

In *Proceedings of Neural Information Processing Systems (NIPS)*, 2014.

Code available at github.com/arbenson/mrnmf.