Lecture Notes and Assigned Readings

Date	Lecture Topic and Handouts	Readings	Assignments
Thurs 8/23	Introduction to Human Language Technologies (.pdf)
Tues 8/28	Text categorization	Either (but not both) of: (1) Fabrizio Sebastiani. Machine Learning in Automated Text Categorization, ACM Computing Surveys, 2002. (.pdf). Read up to and including Section 5.1, then the intro to Section 6, and Section 6.2; OR (2) Christopher D. Manning, Prabhakar Raghavan and Hinrich Schuetze. Introduction to Information Retrieval. Cambridge University Press. 2008. Read Chapter 13.1, 13.2 (skip 13.2.1); Chapter 14.1, 14.2, 14.3 (skip 14.3.1). Option (2) is shorter than (1) but might be too difficult for people without any background in machine learning.	Read the Joachims paper for next week; no critique is required for this paper (we will use it next Tuesday to model what paper presentations and discussions should look like). Sign up on CMS for your first paper presentation slot (you should get an email announcing the due date for signup).
Thurs 8/30	Finish slides from Tues		(First presenter(s) should start preparing their presentation for next week. In general, all students should start preparing their presentation and/or do the reading a week ahead of time; students are responsible for keeping track of due dates for presentations, critiques, etc.)
Tues 9/4	Handout: geometric intuitions behind (linear) text classification Handout: notes by Lillian on (1)	(1) T. Joachims. Transductive Inference for Text Classification using Support Vector Machines, Proceedings of the International Conference on Machine Learning (ICML), 1999. (You can skim the theorems and section 4.1 if those parts are difficult.)
Thurs 9/6	Handout: sample "proposal" (re-branded critique)	(1) Evgeniy Gabrilovich and Shaul Markovitch, Overcoming the brittleness bottleneck using Wikipedia: Enhancing text categorization with encyclopedic knowledge, AAAI 2006. (2) Bo Pang, Lillian Lee, Shivakumar Vaithyanathan. Thumbs up? Sentiment classification using machine learning techniques. EMNLP 2002.
Tues 9/11	Detecting deception (lecture by Myle Ott) (.pdf)
Thurs 9/13	Lexical Semantics and Word Sense Disambiguation	J&M 19-19.3; 20-20.5
Tues 9/18		(1) C. Akkaya, J. Wiebe and R. Mihalcea. Subjectivity Word Sense Disambiguation, EMNLP, 2009. (2) C. Santamaria, J. Gonzalo, and J. Artiles. Wikipedia as Sense Inventory to Improve Diversity in Web Search Results. ACL, 2010.
Thurs 9/20	All-hands meeting
Tues 9/25	Class canceled (sick)
Thurs 9/27		(1) R. Barzilay and L. Lee, Learning to paraphrase: An unsupervised approach using multiple-sequence alignment, NAACL 2003. (2) C. Bannard and C. Callison-Burch, Paraphrasing with bilingual parallel corpora, ACL 2005.
Tues 10/2	Information Extraction	J&M 19.4, 20.9, 22-22.2; 22.4	PROJECT PROPOSAL due FRI 10/5 via CMS. Describe the problem or application to be addressed; the method or approach you'll employ; what data or corpora you'll use or create; evaluation plans.
Thurs 10/4	No class
Tues 10/9	Fall Break		RELATED WORK SUMMARY due FRI 10/19 via CMS. Write up short descriptions of relevant previous work in the context of your project. Focus on organizing the related work into coherent sets rather than just describing one paper after another. In addition, make clear how your work differs from (or in what respects it is the same as) the previous work.
Thurs 10/11		(1) T. Mohamed et al. Discovering relations between noun categories. EMNLP 2011. (2) M. Banko and O. Etzioni. The tradeoffs between open and traditional relation extraction. ACL, 2008.
Tues 10/16	In-class short presentations of project proposals.
Thurs 10/18		(1) Y. Choi, C. Cardie, E. Riloff, S. Patwardhan. Identifying Sources of Opinions with Conditional Random Fields and Extraction Patterns. EMNLP, 2005. (2) Y. Choi, E. Breck, C. Cardie. Joint Extraction of Entities and Relations for Opinion Recognition. EMNLP, 2006.
Tues 10/23		(1) S. Pradhan; W. Ward; K. Hacioglu; J. Martin; D. Jurafsky. Semantic Role Labeling Using Different Syntactic Views. ACL, 2005. (2) N. Chambers and D. Jurafsky. Template-based information extraction without the templates. ACL 2011.	PROGRESS REPORT due FRI 10/26 via CMS. Report progress w.r.t. coding, corpus creation, results, etc. Include a plan (with dates) for how you will complete the project.
Thurs 10/25		(1) R. Swier and S. Stevenson. Unsupervised Semantic Role Labelling. EMNLP, 2004. (2) M. Connor, Y. Gertner, C. Fisher and D. Roth. Starting from Scratch in Semantic Role Labeling. ACL, 2010.
Tues 10/30	Summarization	J&M 23.3-23.5, 23.7; E. Hovy. Automated Text Summarization. 2005. In R. Mitkov (ed), The Oxford Handbook of Computational Linguistics, pp. 583-598. Oxford: Oxford University Press. (1) A. Hassan, D. Radev, J. Cho, and A. Joshi. Content based recommendation and summarization in the blogosphere. ICWSM, 2009. (2) A. Haghighi and L. Vanderwende. Exploring Content Models for Multi-Document Summarization. HLT-NAACL, 2009.
Thurs 11/1	(Discourse Analysis)	(1) E. Bengtson and D. Roth. Understanding the Value of Features for Coreference Resolution. EMNLP, 2008. (2) A. Haghighi and D. Klein. Coreference Resolution in a Modular, Entity-Centered Model. HLT-NAACL, 2010.	INITIAL RESULTS due FRI 11/09 via CMS. Describe any results you've obtained thus far.
Tues 11/6		(1) R. Barzilay and M. Lapata. Modeling Local Coherence: An Entity-Based Approach. ACL, 2005. (2) E. Pitler, A. Louis, and A. Nenkova. Automatic Sense Prediction for Implicit Discourse Relations in Text. ACL, 2009.
Thurs 11/8		(1) C. Sauper, A. Haghighi, and R. Barzilay. Incorporating Content Structure into Text Analysis Applications. EMNLP, 2010. (2) D. Roth, M. Sammons and V. Vydiswaran. A Framework for Entailed Relation Recognition. ACL, 2009.
Tues 11/13	Other topics	(1) Y. Mao and G. Lebanon. Isotonic Conditional Random Fields and Local Sentiment Flow. Advances in Neural Information Processing Systems 19, pages 961-968, 2007. (2) Guerini, Marco, Carlo Strapparava, and Oliviero Stock. 2008. Trusting politicians' words (for persuasive NLP). In Proceedings of the 9th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing).
Thurs 11/15		(1) Bramsen, Philip, Martha Escobar-Molana, Ami Patel, and Rafael Alonso. 2011. Extracting social power relationships from natural language. Proceedings of ACL HLT. (2) Ranganath, Rajesh, Dan Jurafsky, and Dan McFarland. 2009. It's not you, it's me: Detecting flirting and its misperception in speed-dates. Proceedings of EMNLP.
Tues 11/20		(1) Shaparenko, Benyah and Thorsten Joachims. 2007. Information genealogy: Uncovering the flow of ideas in non-hyperlinked document databases. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 619-628. (2) Harper, F Maxwell, Daniel Moy, and Joseph A Konstan. 2009. Facts or friends?: Distinguishing informational and conversational questions in social Q&A sites. In Proceedings of CHI, 759-768.
Thurs 11/22	Thanksgiving Break
Tues 11/27	Project presentations
Thurs 11/29	Project presentations
			FINAL PROJECT WRITEUP due Dec 11 for Thursday presenters, Dec 13 for Tuesday presenters