Here are some datasets used in previous NLP experiments.
Sentiment-classified movie reviews
Congressional floor-debate transcripts, with support/oppose labels
AP88 data for some similarity-based pseudoword disambiguation experiments
Multi-parallel proof/verbalization data for a project on verbalizing NuPrl mathematical proofs using multiple-sequence alignment
Document sets used for ordering and summarization experiments
And here are some results from experiments.
Extracted paraphrases together with human evaluation judgments, from a project using multiple-sequence alignment to learn paraphrases from comparable corpora.
Lillian Lee's home page
Cornell NLP homepage