Congressional speech data

This page is a distribution site for a congressional-speech corpus and related extracted information. The data includes speeches as individual documents, together with that extracted information.

If you have used this data, we would appreciate hearing about it (Lillian Lee is our designated contact person).

References

This data was introduced in Matt Thomas, Bo Pang, and Lillian Lee, "Get out the vote: Determining support or opposition from Congressional floor-debate transcripts". The original version of the paper appeared in the Proceedings of EMNLP, 2006, pp. 327–335. However, the paper has been updated since then; the link provided is to the most current version.

Data download

convote dataset v1.1 (9.8 MB, tar.gz format), including README.v1.1.txt, January 2008. The only difference from v1.0 is that a typo in the first line of graph_edge_data/edges_individual_document.v1.0.csv has been corrected. (This affects just a single file, and our calculations used the correct value.)

convote dataset v1.0 was released in December 2006. Please use the newer version, v1.1, which differs by one line.
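
For readers who want to unpack the tar.gz archive programmatically, the following is a minimal sketch in Python using only the standard library. The archive file name and output directory in the usage comment are assumptions for illustration, as are the helper names unpack_archive and find_files; none of them come from the dataset itself, and the README in the archive remains the authoritative description of its layout.

```python
import tarfile
from pathlib import Path


def unpack_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return the extracted file paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    root = dest.resolve()
    with tarfile.open(archive_path, mode="r:gz") as archive:
        # Refuse any member whose path would land outside dest_dir.
        for member in archive.getmembers():
            target = (dest / member.name).resolve()
            if target != root and root not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        archive.extractall(dest)
    return sorted(p for p in dest.rglob("*") if p.is_file())


def find_files(paths, suffix):
    """Return the paths whose file name ends with the given suffix."""
    return [p for p in paths if p.name.endswith(suffix)]


# Hypothetical usage; the archive name and directory below are assumed:
#   files = unpack_archive("convote_v1.1.tar.gz", "convote_data")
#   for csv_path in find_files(files, ".csv"):
#       print(csv_path)
```

Listing the .csv files after extraction is one way to locate the graph edge data mentioned above without assuming the exact directory layout inside the archive.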


The creation of this website is based upon work supported in part by the National Science Foundation (NSF) under grant no. IIS-0329064, an Alfred P. Sloan Research Fellowship, and Google Anita Borg Memorial Scholarship funds. Any opinions, findings, and conclusions or recommendations expressed above are those of the authors and do not necessarily reflect the views of the National Science Foundation or Sloan Foundation and should not be interpreted as representing the official policies, either expressed or implied, of any sponsoring institution, the U.S. government or any other entity.