



Here are links to some resources we have made publicly available. Download and enjoy!

Congressional floor-debates data

Positive/negative-labeled documents; agree/disagree classification output, etc.
Useful for work on sentiment analysis.

English verb-object co-occurrences

Drawn from newswire text.
Useful for work on distributional similarity.

Movie-review sentiment-analysis data

Sometimes referred to as the “Cornell movie-review corpus”. Positive/negative- and “number-of-stars”-labeled documents; positive/negative and subjective/objective-labeled sentences, etc.
Useful for work on sentiment analysis.

NuPrl verbalizations

Multiple (multi-parallel) English versions of computer-generated proofs; induced paraphrase thesaurus, etc.
Useful for work on data-driven generation and paraphrasing.