Search-engine results with subjectivity annotations

The dataset distributed on this webpage consists of 1346 hand-annotated documents drawn from the top 20 webpages retrieved by the Yahoo! search engine in response to 69 real, publicly available user queries. The annotations indicate whether each document is subjective or objective. Please see the README or the paper cited below for more details.


This data was introduced in Bo Pang and Lillian Lee, Using very simple statistics for review search: An exploration, Proceedings of COLING: Companion volume: Posters, pp. 73–76, 2008.

Data download

ss_data.tar.gz (23MB, tar.gz format), including ssdata.README.1.0.txt, September 2008.
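As a minimal sketch, the archive can be inspected with Python's standard `tarfile` module without unpacking it first. The only things taken from this page are the two file names, `ss_data.tar.gz` and `ssdata.README.1.0.txt`. The helper names `list_members` and `read_readme` are hypothetical, and the README's text encoding (UTF-8 here) is an assumption. The directory layout inside the archive is not assumed, so the README is located by its base name wherever it sits.

```python
import tarfile
from pathlib import PurePosixPath

# File name listed on the download page; its location inside the archive is not assumed.
README_NAME = "ssdata.README.1.0.txt"


def list_members(archive_path):
    """Return the names of all regular files in the gzip-compressed tarball."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return [m.name for m in tar.getmembers() if m.isfile()]


def read_readme(archive_path, encoding="utf-8"):
    """Return the README text, searching every member by its base name.

    The UTF-8 default is an assumption; undecodable bytes are replaced
    rather than raising an error.
    """
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile() and PurePosixPath(member.name).name == README_NAME:
                data = tar.extractfile(member).read()
                return data.decode(encoding, errors="replace")
    raise FileNotFoundError(f"{README_NAME} not found in {archive_path}")
```

For example, running `print(read_readme("ss_data.tar.gz"))` in the download directory should print the README. `list_members` gives a quick view of how the annotated documents are organized before extracting anything.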
The creation of this website is based upon work supported in part by the NSF under grant no. IIS-0329064, a Yahoo! Research Alliance gift, Google Anita Borg Memorial Scholarship funds, a Cornell Provost's Award for Distinguished Research, the Cornell Institute for the Social Sciences, and an Alfred P. Sloan Research Fellowship. Any opinions, findings, and conclusions or recommendations expressed are those of the authors and do not necessarily reflect the views or official policies, either expressed or implied, of any sponsoring institutions, the U.S. government, or any other entity.