Cornell Movie-Dialogs Corpus

DESCRIPTION:

This corpus contains a large, metadata-rich collection of fictional conversations extracted from raw movie scripts:

- 220,579 conversational exchanges between 10,292 pairs of movie characters

- involves 9,035 characters from 617 movies

- 304,713 utterances in total

- movie metadata included:

    - genres

    - release year

    - IMDB rating

    - number of IMDB votes

- character metadata included:

    - gender (for 3,774 characters)

    - position on movie credits (for 3,321 characters)

- see the documentation for details
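
The file layout is not described in this section, so the following is only a minimal sketch of how one utterance record might be read. It assumes that fields are separated by the string " +++$+++ " and that an utterance record has five fields (line ID, character ID, movie ID, character name, text); check both assumptions against the documentation. The names FIELD_SEP, Utterance and parse_line, and the sample record, are hypothetical and used for illustration only.

```python
from typing import NamedTuple

# Assumption: fields are separated by this literal string (verify in the documentation).
FIELD_SEP = " +++$+++ "


class Utterance(NamedTuple):
    """One utterance record; the five-field layout is an assumption."""
    line_id: str
    character_id: str
    movie_id: str
    character_name: str
    text: str


def parse_line(raw: str) -> Utterance:
    """Split one raw record into its fields.

    Only the trailing newline is stripped, so a record whose text
    field is empty still yields five parts.
    """
    parts = raw.rstrip("\n").split(FIELD_SEP, 4)
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {raw!r}")
    return Utterance(*parts)


# Hypothetical sample record, not taken from the corpus.
sample = "L1 +++$+++ u0 +++$+++ m0 +++$+++ ALICE +++$+++ Hello there."
utt = parse_line(sample)
print(utt.character_name, "->", utt.text)
```

Limiting the split to four separators keeps the utterance text intact even if it happened to contain the separator string itself.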

BibTeX ENTRY:

@InProceedings{Danescu-Niculescu-Mizil+Lee:11a,
  author={Cristian Danescu-Niculescu-Mizil and Lillian Lee},
  title={Chameleons in imagined conversations: A new approach to understanding coordination of linguistic style in dialogs.},
  booktitle={Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics, ACL 2011},
  year={2011}
}

This material is based upon work supported in part by the National Science Foundation under grant IIS-0910664.

Any opinions, findings, and conclusions or recommendations expressed above are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.