Date Lecture Agenda Assignments

Th, Jan 26, 2017


Intro: Dimensions of Information Systems

Conversational Behavior and Social information

Related material:

Linguistic Coordination Toolkit (Beta: feedback welcome)


NPR Story: Before The Internet, Librarians Would 'Answer Everything' — And Still Do

Trailer for the movie "Her"


Cristian Danescu-Niculescu-Mizil, Lillian Lee, Bo Pang and Jon Kleinberg. Echoes of power: Language effects and power differences in social interaction. Proceedings of WWW, 2012.

Cristian Danescu-Niculescu-Mizil, Michael Gamon and Susan Dumais. Mark my words! Linguistic style accommodation in social media. Proceedings of WWW, 2011.

Kate G. Niederhoffer and James W. Pennebaker. Linguistic Style Matching in Social Interaction. Journal of Language and Social Psychology 2002 21: 337.

Setup Quiz out (on CMS). Environment instructions: environment.yml

Assignment 1 out (on CMS)

Tu, Jan 31, 2017


Text similarity measures: Minimum Edit Distance

Edit Distance worksheet (includes a sketch of the Wagner-Fischer algorithm we used in class)

Related material:


J&M Chapter 3.11
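
To make the minimum edit distance topic concrete, here is a minimal sketch of the Wagner-Fischer dynamic program. The function name edit_distance and the default unit costs for insertion, deletion, and substitution are assumptions for illustration; the class worksheet may use different costs (J&M also discuss a substitution cost of 2).

```python
def edit_distance(source, target, ins_cost=1, del_cost=1, sub_cost=1):
    """Minimum edit distance between two strings (Wagner-Fischer table)."""
    n, m = len(source), len(target)
    # dist[i][j] = cheapest way to turn source[:i] into target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    # First column: delete every character of the source prefix.
    for i in range(1, n + 1):
        dist[i][0] = dist[i - 1][0] + del_cost
    # First row: insert every character of the target prefix.
    for j in range(1, m + 1):
        dist[0][j] = dist[0][j - 1] + ins_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + del_cost,                       # deletion
                dist[i][j - 1] + ins_cost,                       # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost),  # match or substitution
            )
    return dist[n][m]


print(edit_distance("intention", "execution"))              # unit costs
print(edit_distance("intention", "execution", sub_cost=2))  # substitution costs 2
```

The full table is kept (rather than two rows) so that a backtrace for the alignment could be added later.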

Th, Feb 2, 2017


Basic text processing concepts: Sentence Splitting, Word Tokenization, Types, Tokens

Text similarity measures: Type Overlap, Jaccard similarity

Classic (ad hoc) information retrieval systems

Vector space model: binary representation

In-class demo: Proto Information Retrieval System: IPython notebook and html

Vector space model cheatsheet (useful to keep track of notation)

Related material:


J&M Chapters 3.8 and 23.1.1
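
The type overlap and Jaccard similarity measures listed above can be sketched in a few lines. The tokenizer here (lowercasing and keeping runs of ASCII letters) and the function names tokenize and jaccard are simplifying assumptions, not the tokenizer used in the class demo.

```python
import re


def tokenize(text):
    """Lowercase the text and return its word tokens (runs of a-z only)."""
    return re.findall(r"[a-z]+", text.lower())


def jaccard(text_a, text_b):
    """Jaccard similarity between the sets of word types of two texts."""
    types_a = set(tokenize(text_a))
    types_b = set(tokenize(text_b))
    union = types_a | types_b
    if not union:
        return 0.0  # assumption: two empty texts get similarity 0
    return len(types_a & types_b) / len(union)


tokens = tokenize("The cat sat. The dog sat!")
print(len(tokens), "tokens,", len(set(tokens)), "types")
print(jaccard("the cat sat", "the cat ran"))
```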

Tu, Feb 7, 2017


Vector Space Model: geometric intuition

Cosine similarity

Inverse document frequency (IDF)

TF-IDF weighting

In-class demo: (continued and updated) IPython notebook and html


MRS Chapters 6.2, 6.3, 6.4.1 and 6.4.4
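
As an illustration of TF-IDF weighting and cosine similarity, the sketch below represents each document as a sparse dictionary of term weights. It assumes raw term counts for TF and idf = log(N / df) for IDF; the function names are hypothetical and the in-class notebook may use a different weighting variant.

```python
import math
from collections import Counter


def build_idf(docs):
    """docs: list of token lists. Returns idf[t] = log(N / df_t)."""
    n_docs = len(docs)
    # Count each term once per document to get document frequencies.
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / count) for term, count in df.items()}


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector; terms unseen in the collection are dropped."""
    tf = Counter(tokens)
    return {term: count * idf[term] for term, count in tf.items() if term in idf}


def cosine(vec_a, vec_b):
    """Cosine of the angle between two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(weight * vec_b.get(term, 0.0) for term, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


docs = [["a", "b"], ["a", "c"], ["c", "d"]]
idf = build_idf(docs)
vectors = [tfidf_vector(doc, idf) for doc in docs]
print(cosine(vectors[0], vectors[1]))
```

Note that a term occurring in every document gets idf 0 under this definition and so contributes nothing to the similarity.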

Th, Feb 9, 2017


Inverted Index

Posting merge algorithm

Boolean search

Assignment 1 discussion

In-class demo: (continued and updated) IPython notebook and html

Related material:


MRS Chapter 1
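
The inverted index and the posting merge algorithm behind Boolean AND queries can be sketched as follows. The function names and the use of list positions as document IDs are assumptions for illustration.

```python
def build_inverted_index(docs):
    """docs: list of token lists. Returns term -> sorted list of doc IDs."""
    index = {}
    for doc_id, tokens in enumerate(docs):
        for term in set(tokens):
            # Doc IDs are visited in increasing order, so postings stay sorted.
            index.setdefault(term, []).append(doc_id)
    return index


def intersect_postings(postings_a, postings_b):
    """Merge two sorted postings lists in O(len(a) + len(b)) time."""
    result = []
    i = j = 0
    while i < len(postings_a) and j < len(postings_b):
        if postings_a[i] == postings_b[j]:
            result.append(postings_a[i])
            i += 1
            j += 1
        elif postings_a[i] < postings_b[j]:
            i += 1  # advance the pointer at the smaller doc ID
        else:
            j += 1
    return result


docs = [["brutus", "caesar"], ["caesar"], ["brutus", "calpurnia", "caesar"]]
index = build_inverted_index(docs)
print(intersect_postings(index["brutus"], index["calpurnia"]))
```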

Assignment 2 out (on CMS)

Tu, Feb 14, 2017


Term-document matrix

Efficient cosine similarity scoring using the inverted index

Fast cosine retrieval worksheet (includes sketch of the algorithm using the inverted index)

In-class demo: (continued and updated) IPython notebook and html

Before optimizing retrieval with inverted indexes (one query on a collection of 40,000 reality TV utterances): class start

After optimizing retrieval with inverted indexes (one query on a collection of 40,000 reality TV utterances): class end


MRS Chapter 6.3.3
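
A minimal sketch of cosine scoring with an inverted index, in the term-at-a-time style: only documents that share at least one term with the query are ever touched. The function names, the raw-count TF, and idf = log(N / df) are assumptions; this is not the worksheet's exact algorithm.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """docs: list of token lists. Returns (postings, idf, doc_norms)."""
    n_docs = len(docs)
    postings = defaultdict(list)  # term -> [(doc_id, term_count), ...]
    for doc_id, tokens in enumerate(docs):
        for term, count in Counter(tokens).items():
            postings[term].append((doc_id, count))
    idf = {term: math.log(n_docs / len(plist)) for term, plist in postings.items()}
    # Precompute each document's TF-IDF vector length once, at indexing time.
    squared = [0.0] * n_docs
    for term, plist in postings.items():
        for doc_id, count in plist:
            squared[doc_id] += (count * idf[term]) ** 2
    return postings, idf, [math.sqrt(s) for s in squared]


def cosine_search(query_tokens, postings, idf, norms):
    """Return [(doc_id, cosine)] sorted by decreasing score."""
    q_weights = {t: c * idf[t] for t, c in Counter(query_tokens).items() if t in idf}
    q_norm = math.sqrt(sum(w * w for w in q_weights.values()))
    scores = defaultdict(float)  # accumulators for candidate documents only
    for term, q_weight in q_weights.items():
        for doc_id, count in postings[term]:
            scores[doc_id] += q_weight * count * idf[term]
    ranked = [
        (doc_id, score / (q_norm * norms[doc_id]))
        for doc_id, score in scores.items()
        if q_norm > 0 and norms[doc_id] > 0
    ]
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


docs = [["cat", "sat"], ["dog", "sat"], ["cat", "cat", "dog"], ["bird"]]
postings, idf, norms = build_index(docs)
print(cosine_search(["cat"], postings, idf, norms))
```

The saving over the naive approach comes from skipping every document whose dot product with the query is zero, which for short queries is most of the collection.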

Th, Feb 16, 2017


Evaluation of ranked retrieval systems: Precision@k, Precision-recall curve, Mean Average Precision

Thinking about evaluation metrics worksheet

In-class demo: IPython notebook and html


MRS Chapter 8
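
The ranked-retrieval metrics from this lecture can be sketched as below. The function names are hypothetical, and two conventions are assumed: Precision@k always divides by k (even if fewer than k results are returned), and Average Precision divides by the total number of relevant documents, so relevant documents that are never retrieved count as zero.

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k ranked documents that are relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def average_precision(ranking, relevant):
    """Mean of Precision@rank taken at the rank of each relevant document."""
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs: list of (ranking, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


ranking = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d3", "d6"}
print(precision_at_k(ranking, relevant, 3))
print(average_precision(ranking, relevant))
```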

Th, Feb 23, 2017


Pooling, Annotation, Kappa statistic

Relevance feedback, Rocchio's method for query rewriting, Pseudo-Relevance feedback

Query update using relevance feedback worksheet (includes the Rocchio query update rule)

Related material:


MRS Chapter 9
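
A minimal sketch of Rocchio's query update, q' = alpha * q + beta * centroid(relevant) - gamma * centroid(nonrelevant), over dense vectors stored as plain lists. The function name, the default weights, and the clipping of negative weights to zero are assumptions here; the class worksheet may state the rule with different constants.

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query vector toward relevant docs and away from nonrelevant ones.

    query: list of term weights; relevant / nonrelevant: lists of such vectors.
    """
    def centroid(vectors):
        if not vectors:
            return [0.0] * len(query)  # no feedback of this kind: contributes nothing
        return [sum(column) / len(vectors) for column in zip(*vectors)]

    rel_center = centroid(relevant)
    non_center = centroid(nonrelevant)
    updated = [
        alpha * q + beta * r - gamma * n
        for q, r, n in zip(query, rel_center, non_center)
    ]
    # Assumption: negative term weights are clipped to zero.
    return [max(0.0, weight) for weight in updated]


query = [1.0, 0.0, 0.0]
relevant_docs = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
nonrelevant_docs = [[1.0, 0.0, 1.0]]
print(rocchio(query, relevant_docs, nonrelevant_docs))
```

For pseudo-relevance feedback, the same function could be called with the top-ranked documents passed as relevant and an empty nonrelevant list.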

Assignment 3 out

Tu, Feb 28, 2017


Assignment 2 discussion

Project discussion

Th, Mar 2, 2017


Project team-making and brainstorming

Tu, Mar 7, 2017


Query expansion, Co-occurrence matrix, Pointwise Mutual Information


MRS Chapter 9
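
To illustrate the co-occurrence matrix and Pointwise Mutual Information, the sketch below counts how often two terms appear in the same document and computes PMI(a, b) = log2(P(a, b) / (P(a) P(b))). Document-level co-occurrence, the base-2 logarithm, and the function name pmi_table are assumptions; a sliding word window is an equally common choice.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(docs):
    """docs: list of token lists. Returns {(term_a, term_b): PMI} for co-occurring pairs.

    Pair keys are ordered alphabetically; pairs that never co-occur are omitted.
    """
    n_docs = len(docs)
    term_df = Counter()  # number of documents containing each term
    pair_df = Counter()  # number of documents containing both terms of a pair
    for tokens in docs:
        types = sorted(set(tokens))
        term_df.update(types)
        pair_df.update(combinations(types, 2))
    return {
        (a, b): math.log2(joint * n_docs / (term_df[a] * term_df[b]))
        for (a, b), joint in pair_df.items()
    }


docs = [["new", "york"], ["new", "york", "city"], ["new", "car"], ["old", "car"]]
pmi = pmi_table(docs)
print(pmi[("new", "york")], pmi[("car", "new")])
```

For query expansion, terms with high PMI relative to a query term are natural candidates to add to the query.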

Th, Mar 9, 2017


Wrapping up Ad-hoc IR, Midterm practice

Tu, Mar 14, 2017


MIDTERM (in class)