TAs:
Office hours schedule listed on Piazza (Resources -> Staff)
Course homepage http://www.cs.cornell.edu/Courses/cs4300/2017sp/
Summary: How can we make sense of the vast amounts of information available online, and how can we relate it to the social context in which it appears? This course introduces basic tools for retrieving and analyzing unstructured textual information from the web and social media. Applications include information retrieval (with human feedback), sentiment analysis, and social analysis of text. The coursework will include programming projects that play on the interaction between knowledge and social factors.
Prerequisites: Strong Python skills and familiarity with IPython Notebooks.
Date | Lecture | Agenda | Assignments |
---|---|---|---|
Th, Jan 26, 2017 | #1 | Intro: Dimensions of Information Systems; Conversational Behavior and Social Information. Related material: Linguistic Coordination Toolkit (Beta: feedback welcome); NPR Story: Before The Internet, Librarians Would 'Answer Everything' — And Still Do. References: Cristian Danescu-Niculescu-Mizil, Lillian Lee, Bo Pang and Jon Kleinberg. Echoes of power: Language effects and power differences in social interaction. Cristian Danescu-Niculescu-Mizil, Michael Gamon and Susan Dumais. Mark my words! Linguistic style accommodation in social media. Proceedings of WWW, 2011. Kate G. Niederhoffer and James W. Pennebaker. Linguistic Style Matching in Social Interaction. Journal of Language and Social Psychology, 2002, 21: 337. | Setup Quiz out (on CMS); Environment instructions; environment.yml; Assignment 1 out (on CMS) |
Tu, Jan 31, 2017 | #2 | Text similarity measures: Minimum Edit Distance; Edit Distance worksheet (includes sketch of the Wagner-Fischer algorithm we used in class). Related material: Readings: J&M Chapter 3.11 | |
Th, Feb 2, 2017 | #3 | Basic text processing concepts: Sentence Splitting, Word Tokenization, Types, Tokens; Text similarity measures: Type Overlap, Jaccard similarity; Classic (ad hoc) information retrieval systems; Vector space model: binary representation; In-class demo: Proto Information Retrieval System: IPython notebook and html; Vector space model cheatsheet (useful to keep track of notation). Related material: Readings: J&M Chapters 3.8 and 23.1.1 | |
Tu, Feb 7, 2017 | #4 | Vector Space Model: geometric intuition; Cosine similarity; Inverse document frequency (IDF); TF-IDF weighting; In-class demo: (continued and updated) IPython notebook and html. Readings: MRS Chapters 6.2, 6.3, 6.4.1 and 6.4.4 | |
Th, Feb 9, 2017 | #5 | Inverted Index; Posting merge algorithm; Boolean search; Assignment 1 discussion; In-class demo: (continued and updated) IPython notebook and html. Related material: Readings: MRS Chapter 1 | Assignment 2 out (on CMS) |
Tu, Feb 14, 2017 | #6 | Term-document matrix; Efficient cosine similarity scoring using the inverted index; Fast cosine retrieval worksheet (includes sketch of the algorithm using the inverted index); In-class demo: (continued and updated) IPython notebook and html; Before and after optimizing retrieval with inverted indexes (one query on a collection of 40,000 reality TV utterances). Readings: MRS Chapter 6.3.3 | |
Th, Feb 16, 2017 | #7 | Evaluation of ranked retrieval systems: Precision@k, Precision-recall curve, Mean Average Precision; Thinking about evaluation metrics worksheet; In-class demo: IPython notebook and html. Readings: MRS Chapter 8 | |
Th, Feb 23, 2017 | #8 | Pooling, Annotation, K-statistic; Relevance feedback, Rocchio's method for query rewriting, Pseudo-Relevance feedback; Query update using relevance feedback worksheet (includes the Rocchio query update rule). Related material: Readings: MRS Chapter 9 | Assignment 3 out |
Tu, Feb 28, 2017 | #9 | Assignment 2 discussion; Project discussion | |
Th, Mar 2, 2017 | #10 | Project team-making and brainstorming | |
Tu, Mar 7, 2017 | #11 | Query expansion, Co-occurrence matrix, Pointwise Mutual Information. Readings: MRS Chapter 9 | |
Th, Mar 9, 2017 | #12 | Wrapping up Ad-hoc IR, Midterm practice | |
Tu, Mar 14, 2017 | #13 | Snowday!!! | |
Th, Mar 16, 2017 | #14 | MIDTERM (in class) | |
Tu, Mar 21, 2017 | #15 | Lecture topics: Text mining, Classifiers, Feature Representation; Midterm discussion | |
Th, Mar 23, 2017 | #16 | Lecture topics: Naive Bayes, Generative Models, Smoothing | |
Tu, Mar 28, 2017 | #17 | Lecture topics: Practical unsupervised learning on textual data: SVD (by Jack Hessel). In-class demo: IPython notebooks: Data exploration, LSI | |
Th, Mar 30, 2017 | #18 | Lecture topics: Practical unsupervised learning on textual data: Latent semantic indexing and topic modeling (by Jack Hessel). In-class demo: IPython notebooks: Kickstarter success prediction. Related material: Indexing by latent semantic analysis. Deerwester, Dumais and Harshman, 1990 | |
Tu, Apr 11, 2017 | #19 | Lecture topics: Smoothing revisited, Linear Classifiers, Conversational Features. In-class demo: IPython notebook and html | |
Th, Apr 13, 2017 | #20 | Lecture topics: Lexicons and off-the-shelf NLP tools (listed on Piazza), Feature Selection, Ordinal Regression | |
Tu, Apr 18, 2017 | #21 | Lecture topics: Opinions and Trust: Link Analysis, Hubs and Authorities, Spectral Analysis. Related material: NetworkX Python package for link analysis | |
Th, Apr 20, 2017 | #22 | Project Prototype Madness | |
Tu, Apr 25, 2017 | #23 | Opinions and Trust: Sentiment Analysis, Lexicon Expansion, Pivot features | |
Th, Apr 27, 2017 | #24 | Opinions and Trust: Using social information for sentiment analysis, Helpfulness | |
Tu, May 2, 2017 | #25 | Project presentations | |
Th, May 4, 2017 | #26 | Project presentations | |
Tu, May 9, 2017 | #27 | Lecture topics: Opinions and Trust: Deception Analysis; Trust in conversation: Betrayal, Confidence; Course wrap-up... Related material: Fin. | |