uchoice-Lastfm-Genres dataset
This is a universal subset choice dataset: it consists of a collection of subsets, each chosen from a universal set of items. The data come from the listening behavior of users of the music streaming service Last.fm. We break each user's listening history into sessions, where a new session starts whenever the user goes 20 minutes without starting a new song. Each subset choice is the collection of genres of music played in one session, where genres are derived from user-provided tags for artists: each artist is assigned the tag most commonly provided for that artist. Many subset selections contain repeated genres, corresponding to cases where a user listens to the same genre more than once in a session. This dataset was derived from data here and here. Some basic statistics of this dataset are:
  • number of items: 413
  • number of subset selections: 643,982
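The construction described above (one genre per artist, sessions split at 20-minute gaps, one genre list per session) can be sketched as follows. This is a minimal illustration rather than the code used to build the dataset: the function names, the input layout, and the toy data are all hypothetical, and treating a gap of exactly 20 minutes as a session break is an assumption.

```python
from collections import Counter
from datetime import datetime, timedelta

# Assumption: a gap of 20 minutes or more between song starts opens a new session.
SESSION_GAP = timedelta(minutes=20)

def most_common_tag(artist_tags):
    """Map each artist to the tag most often provided for it (hypothetical helper)."""
    return {artist: Counter(tags).most_common(1)[0][0]
            for artist, tags in artist_tags.items() if tags}

def split_sessions(plays, gap=SESSION_GAP):
    """Split one user's (start_time, artist) plays, sorted by time, into sessions."""
    sessions = []
    previous_start = None
    for start, artist in plays:
        if previous_start is None or start - previous_start >= gap:
            sessions.append([])  # first play, or the gap was long enough
        sessions[-1].append(artist)
        previous_start = start
    return sessions

def genre_selections(plays, artist_genre):
    """Return one genre list per session; repeated genres are kept."""
    result = []
    for session in split_sessions(sorted(plays)):
        genres = [artist_genre[a] for a in session if a in artist_genre]
        if genres:  # skip sessions in which no artist has a known genre
            result.append(genres)
    return result

# Toy data, invented for illustration only.
artist_tags = {
    "A": ["rock", "rock", "indie"],
    "B": ["jazz"],
    "C": ["rock", "pop", "pop"],
}
plays = [
    (datetime(2007, 1, 1, 12, 0), "A"),
    (datetime(2007, 1, 1, 12, 4), "B"),
    (datetime(2007, 1, 1, 12, 10), "A"),
    (datetime(2007, 1, 1, 12, 45), "C"),  # 35 minutes after the previous play
    (datetime(2007, 1, 1, 12, 50), "C"),
]

artist_genre = most_common_tag(artist_tags)
selections = genre_selections(plays, artist_genre)
print(selections)  # [['rock', 'jazz', 'rock'], ['pop', 'pop']]
```

Both toy sessions contain a repeated genre, which is the situation the description refers to when it notes that many subset selections contain repeated genres.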
Data files:
If you use this data, please cite the following references:
  • A Discrete Choice Model for Subset Selection.
    Austin R. Benson, Ravi Kumar, and Andrew Tomkins.
    In Proceedings of the 11th ACM International Conference on Web Search and Data Mining (WSDM), 2018. [bibtex]
  • Music Recommendation and Discovery in the Long Tail.
    Òscar Celma.
    Springer, 2010. [bibtex]
  • LastFM-ArtistTags2007 dataset.
    Paul Lamere, 2008. [bibtex]