Lecture and assignment information will be posted often on this website. Check back soon.

Date Lecture Agenda Assignments

Th, Jan 25, 2018

#1

Intro: Dimensions of Information Systems

Conversational Behavior and Social Information

Projects Hall of Fame

Related material:

Linguistic Coordination Toolkit (Beta: feedback welcome)

Telephone

NPR Story: Before The Internet, Librarians Would 'Answer Everything' — And Still Do

Trailer for the movie "Her"

Slides

--

References:

Cristian Danescu-Niculescu-Mizil, Lillian Lee, Bo Pang and Jon Kleinberg. Echoes of power: Language effects and power differences in social interaction. Proceedings of WWW, 2012.

Cristian Danescu-Niculescu-Mizil, Michael Gamon and Susan Dumais. Mark my words! Linguistic style accommodation in social media. Proceedings of WWW, 2011.

Kate G. Niederhoffer and James W. Pennebaker. Linguistic Style Matching in Social Interaction. Journal of Language and Social Psychology, 21:337, 2002.

Setup Quiz out (on CMS)

Assignment 0 out (on CMS)

Tu, Jan 30, 2018

#2

Text similarity measures: Minimum Edit Distance

Edit Distance worksheet (includes a sketch of the Wagner-Fischer algorithm we used in class)
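As a companion to the worksheet, here is a minimal sketch of the Wagner-Fischer dynamic program for minimum edit distance. The function name and the default unit costs are assumptions made for illustration; the worksheet and J&M may use different costs (J&M often charge 2 for a substitution).

```python
def edit_distance(source, target, ins_cost=1, del_cost=1, sub_cost=1):
    """Minimum edit distance between two strings (Wagner-Fischer)."""
    m, n = len(source), len(target)
    # dist[i][j] = cheapest way to turn source[:i] into target[:j]
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dist[i][0] = dist[i - 1][0] + del_cost  # delete every source char
    for j in range(1, n + 1):
        dist[0][j] = dist[0][j - 1] + ins_cost  # insert every target char
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + del_cost,                       # deletion
                dist[i][j - 1] + ins_cost,                       # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost),  # match or substitution
            )
    return dist[m][n]


print(edit_distance("kitten", "sitting"))
```

Passing sub_cost=2 gives the variant in which a substitution costs as much as a deletion plus an insertion.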

Related material


Readings:

J&M Chapter 3.11

Assignment 1 out (on CMS)

Th, Feb 1, 2018

#3

Basic text processing concepts: Sentence Splitting, Word Tokenization, Types, Tokens

Text similarity measures: Type Overlap, Jaccard similarity

Classic (ad hoc) information retrieval systems

Vector space model: binary representation

In-class demo: Proto Information Retrieval System: IPython notebook and html

Vector space model cheatsheet (useful to keep track of notation)
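To make the type overlap and Jaccard measures from this lecture concrete, here is a small sketch. The regex tokenizer and the function names are assumptions for illustration only; the in-class notebook may tokenize differently.

```python
import re


def tokenize(text):
    """Very rough word tokenizer: lowercase alphabetic runs (an assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def type_overlap(text_a, text_b):
    """Number of word types shared by the two texts."""
    return len(set(tokenize(text_a)) & set(tokenize(text_b)))


def jaccard(text_a, text_b):
    """Shared types divided by all types appearing in either text."""
    types_a, types_b = set(tokenize(text_a)), set(tokenize(text_b))
    union = types_a | types_b
    if not union:
        return 0.0  # convention chosen here for two empty texts
    return len(types_a & types_b) / len(union)


print(jaccard("the cat sat", "the cat ran"))
```

Unlike raw type overlap, the Jaccard score is normalized by the size of the union, so long texts are not favored simply for containing more types.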


Related material:


Readings:

J&M Chapters 3.8 and 23.1.1

Tu, Feb 6, 2018

#4

Vector Space Model: geometric intuition

Cosine similarity

Inverse document frequency (IDF)

TF-IDF weighting

In-class demo: (continued and updated) IPython notebook and html
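The TF-IDF weighting and cosine similarity topics above can be sketched in a few lines. This is a simplified illustration, assuming raw term counts for TF and idf = log(N / df); the function names are hypothetical and the course notebook may use a different weighting variant.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of token lists. Returns one sparse {term: weight} dict per doc."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]


def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


docs = [["a", "b"], ["a", "c"], ["a", "b", "b"]]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[2]), cosine(vecs[0], vecs[1]))
```

In this toy collection the term "a" occurs in every document, so its IDF is zero and it contributes nothing to any similarity score.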

Readings:

MRS Chapters 6.2, 6.3, 6.4.1 and 6.4.4

Assignment 2 out (on CMS)

Th, Feb 8, 2018

#5

Efficient retrieval

Inverted Index

Posting merge algorithm

Boolean search

In-class demo: (continued and updated) IPython notebook and html
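For the inverted index and posting merge topics in this lecture, a minimal sketch follows. It assumes documents are identified by their position in the input list, so postings come out sorted; the function names are illustrative and not taken from the demo notebook.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of ids of documents containing it."""
    index = defaultdict(list)
    for doc_id, tokens in enumerate(docs):
        for term in set(tokens):
            index[term].append(doc_id)  # doc ids arrive in increasing order
    return index


def merge_postings(p1, p2):
    """Intersect two sorted postings lists in O(len(p1) + len(p2)) time."""
    i = j = 0
    merged = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            merged.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1  # advance the pointer sitting on the smaller doc id
        else:
            j += 1
    return merged


docs = [["a", "b"], ["b", "c"], ["a", "b", "c"]]
index = build_inverted_index(docs)
print(merge_postings(index["a"], index["c"]))  # Boolean query: a AND c
```

The merge never revisits an entry, which is why keeping postings sorted matters for Boolean AND queries.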


Related Material:


Numpy tutorial and linear algebra refresher (IPython notebook on Piazza)


Readings:

MRS Chapter 1

Tu, Feb 13, 2018

#6

Efficient cosine similarity scoring using the inverted index (algorithm)

Fast cosine retrieval worksheet (includes sketch of the algorithm using the inverted index)
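The idea behind fast cosine scoring is to touch only the documents that share at least one term with the query. Below is a rough term-at-a-time sketch with score accumulators, assuming raw-count TF and idf = log(N / df); names and weighting are assumptions and may differ from the worksheet.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Return postings {term: [(doc_id, tf), ...]}, idf weights, and doc norms."""
    postings = defaultdict(list)
    for doc_id, tokens in enumerate(docs):
        for term, tf in Counter(tokens).items():
            postings[term].append((doc_id, tf))
    n_docs = len(docs)
    idf = {t: math.log(n_docs / len(p)) for t, p in postings.items()}
    # Precompute each document's TF-IDF vector length once, at index time
    sq_norms = [0.0] * n_docs
    for term, plist in postings.items():
        for doc_id, tf in plist:
            sq_norms[doc_id] += (tf * idf[term]) ** 2
    return postings, idf, [math.sqrt(x) for x in sq_norms]


def rank(query_tokens, postings, idf, norms):
    """Score only documents found in the postings of the query terms."""
    q_weights = {t: tf * idf[t] for t, tf in Counter(query_tokens).items() if t in idf}
    q_norm = math.sqrt(sum(w * w for w in q_weights.values()))
    scores = defaultdict(float)  # accumulators: doc_id -> partial dot product
    for term, q_w in q_weights.items():
        for doc_id, tf in postings[term]:
            scores[doc_id] += q_w * tf * idf[term]
    ranked = [(s / (q_norm * norms[d]), d)
              for d, s in scores.items() if q_norm > 0 and norms[d] > 0]
    return sorted(ranked, reverse=True)


docs = [["cat", "sat"], ["dog", "sat"], ["cat", "cat", "dog"]]
postings, idf, norms = build_index(docs)
print(rank(["cat"], postings, idf, norms))
```

Documents that share no term with the query never receive an accumulator, so they are skipped entirely.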


Related Material:

Inspiration for Assignment 3: QUOTUS project and interactive visualization


Readings:

MRS Chapter 6.3.3

Assignment 3 out (on CMS)

Th, Feb 15, 2018

#7

Efficient cosine similarity scoring using the inverted index (implementation)

In-class demo: (continued and updated) IPython notebook and html

Before optimizing retrieval with inverted indexes (one query on a collection of 40,000 reality TV utterances): [image: class start]

After optimizing retrieval with inverted indexes (one query on a collection of 40,000 reality TV utterances): [image: class end]

Th, Feb 22, 2018

#8

Evaluation of ranked retrieval systems: Precision@k, Precision-recall curve, Mean Average Precision

Thinking about evaluation metrics worksheet

In-class demo: IPython notebook and html
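The ranked-retrieval metrics from this lecture can be sketched as follows. The function names are illustrative; average precision here divides by the total number of relevant documents, so relevant documents that are never retrieved count as zero, which is one common convention and is an assumption about what the course uses.

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def average_precision(ranking, relevant):
    """Mean of the precision values at the rank of each relevant result."""
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant document's rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs: list of (ranking, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),
    (["d2", "d1"], {"d1"}),
]
print(precision_at_k(*runs[0], k=2), mean_average_precision(runs))
```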


Readings:

MRS Chapter 8

Assignment 4 out (on CMS)

Tu, Feb 27, 2018

#9

Pooling, Annotation, Kappa statistic

Relevance feedback, Rocchio's method for query rewriting

Query update using relevance feedback worksheet (includes the Rocchio query update rule)
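As a rough illustration of the Rocchio query update, the sketch below moves the query vector toward the centroid of relevant documents and away from the centroid of non-relevant ones. The default weights (1.0, 0.75, 0.15) and the clipping of negative weights to zero are common choices assumed here, not values taken from the worksheet.

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return the updated query vector; all vectors are equal-length lists."""
    dims = len(query)

    def centroid(vectors):
        if not vectors:
            return [0.0] * dims
        return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]

    rel_c = centroid(relevant)
    non_c = centroid(nonrelevant)
    updated = [alpha * q + beta * r - gamma * n
               for q, r, n in zip(query, rel_c, non_c)]
    # Negative term weights are usually clipped to zero
    return [max(0.0, w) for w in updated]


print(rocchio([1, 0, 0], [[0, 2, 0], [0, 0, 0]], [[0, 0, 4]]))
```

Setting gamma to 0 gives the variant that uses positive feedback only.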


Related material


Readings:

MRS Chapter 9

Assignment 5 out (on CMS)

Th, Mar 1, 2018

#10

Geometric interpretation for query rewriting, Pseudo-relevance feedback

Query expansion, Co-occurrence matrix, scikit-learn basics


Readings:

MRS Chapter 9

Tu, Mar 6, 2018

#11

Pointwise Mutual Information

Project ideas brainstorming
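For the pointwise mutual information topic of this lecture, here is a minimal sketch that estimates PMI from document-level co-occurrence: pmi(a, b) = log2(p(a, b) / (p(a) p(b))), with probabilities taken as fractions of documents. The function name and the choice of document-level counts are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(docs):
    """docs: list of token lists. Returns {(term_a, term_b): PMI} for co-occurring pairs."""
    n_docs = len(docs)
    term_df = Counter()  # number of documents containing each term
    pair_df = Counter()  # number of documents containing both terms of a pair
    for tokens in docs:
        types = sorted(set(tokens))
        term_df.update(types)
        pair_df.update(combinations(types, 2))  # pairs in alphabetical order
    return {
        (a, b): math.log2((count / n_docs)
                          / ((term_df[a] / n_docs) * (term_df[b] / n_docs)))
        for (a, b), count in pair_df.items()
    }


docs = [["a", "b"], ["a", "b"], ["c"], ["c"]]
print(pmi_table(docs))
```

Pairs that never co-occur are left out of the table, since their PMI would be negative infinity.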


Readings:

MRS Chapter 9

Project Milestone 0 out

Th, Mar 8, 2018

#12

Wrapping up Ad-hoc IR, Midterm practice

Tu, Mar 13, 2018

#13

MIDTERM - in class

Th, Mar 15, 2018

#14

Lecture topics:

Midterm discussion

Tu, Mar 20, 2018

#15

Lecture topics:

Text mining, Classifiers, Feature Representation

In-class demo: IPython notebook and html

Th, Mar 22, 2018

#16

Lecture topics:

Naive Bayes, Generative Models, Smoothing

In-class demo: IPython notebook and html

Tu, Mar 27, 2018

#17

Lecture topics:

Practical unsupervised learning on textual data: Singular Value Decomposition (SVD)

In-class demo: IPython notebooks Data exploration, SVD

Th, Mar 29, 2018

#18

Lecture topics:

Practical unsupervised learning on textual data: Latent semantic indexing and topic modeling

In-class demo: IPython notebooks Kickstarter success prediction


Related material:

Indexing by latent semantic analysis. Deerwester, Dumais and Harshman 1990

Tu, Apr 10, 2018

#19

Lecture topics:

Applications of SVD: Question typologies

Lexicons and off-the-shelf NLP tools (listed on Piazza)


Related material:

Asking too much? The rhetorical role of questions in political discourse. Justine Zhang, Arthur Spirling, Cristian Danescu-Niculescu-Mizil. EMNLP 2017

Conversational analysis toolkit

Th, Apr 12, 2018

#20

Lecture topics:

Opinions and Trust: Link Analysis, Hubs and Authorities, Spectral Analysis
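The hubs and authorities idea can be sketched as a simple power iteration in plain Python. This is an illustrative version with assumed names and a fixed iteration count; the NetworkX package offers its own implementation that could be used instead.

```python
import math


def hits(edges, iterations=50):
    """edges: list of (source, target) links. Returns (hub, authority) score dicts."""
    nodes = sorted({node for edge in edges for node in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum of hub scores of the pages linking to a node
        auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            auth[dst] += hub[src]
        # Hub update: sum of authority scores of the pages a node links to
        hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            hub[src] += auth[dst]
        # Normalize both score vectors to unit length so they stay bounded
        for scores in (hub, auth):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for n in scores:
                scores[n] /= norm
    return hub, auth


hub, auth = hits([("a", "c"), ("b", "c"), ("a", "d")])
print(max(auth, key=auth.get), max(hub, key=hub.get))
```

The repeated updates correspond to the spectral view: the scores approach the principal eigenvectors of the link matrix multiplied by its transpose, in the two possible orders.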


Related material:

NetworkX Python package for link analysis


Reading:

Chapters 14.2 & 14.6 A. from Networks, Crowds, and Markets

Tu, Apr 17, 2018

#21

Project Prototype Madness

Th, Apr 19, 2018

#22

Project Prototype Madness

Tu, Apr 24, 2018

#23

Opinions and Trust: Sentiment Analysis, Lexicon Expansion, Pivot features


Related material:

In-class demo: Building sentiment lexicon with supervision notebook and html

In-class demo: Building sentiment lexicon without supervision notebook and html

Pulse of the Nation

Thumbs up? Sentiment Classification using Machine Learning Techniques

NLTK part-of-speech tagging

Predicting the Semantic Orientation of Adjectives

Th, Apr 26, 2018

#24

Opinions and Trust: Using social information for sentiment analysis, Helpfulness, Deception

Related Material:

Telephone

Tu, May 1, 2018

#25

Project presentations

Th, May 3, 2018

#26

Project presentations