Lecture and assignment information is posted regularly on this website. Check back often.

Date Lecture Agenda Assignments

Wed, Jan 21, 2015

#1

Intro: Dimensions of Information Systems

Conversational Behavior and Social information

Slides on linguistic style coordination (most of course content is chalk-on-blackboard only)


Related material:

Telephone

NPR story: Before The Internet, Librarians Would 'Answer Everything' — And Still Do

Trailer for the movie "Her"


References:

Cristian Danescu-Niculescu-Mizil, Lillian Lee, Bo Pang and Jon Kleinberg. Echoes of power: Language effects and power differences in social interaction. Proceedings of WWW, 2012.

Cristian Danescu-Niculescu-Mizil, Michael Gamon and Susan Dumais. Mark my words! Linguistic style accommodation in social media. Proceedings of WWW, 2011.

Kate G. Niederhoffer and James W. Pennebaker. Linguistic Style Matching in Social Interaction. Journal of Language and Social Psychology, 21: 337, 2002.

Assignment 1 out [Description, ZIP]

Due Feb 3, 5pm

Mon, Jan 26, 2015

#2

Lecture topics:

Text similarity measures: Minimum Edit Distance, Jaccard Similarity

Basic text processing concepts: Sentence Splitting, Word Tokenization, Types, Tokens

Edit Distance worksheet (includes a sketch of the Wagner-Fischer algorithm we used in class)

In-class demo: Proto Information Retrieval System: IPython notebook and html


Readings:

J&M Chapters 3.9 and 3.11

Wed, Jan 28, 2015

#3

Lecture topics:

Classic (ad hoc) information retrieval

Vector space model

Document preprocessing: stemming, deduplication, shingling

Vector space model cheatsheet (useful to keep track of notation)


Readings:

J&M Chapters 3.8 and 23.1.1; MRS Chapter 19.6

Mon, Feb 2, 2015

#4

Lecture topics:

Vector Space Model: geometric intuition

Cosine similarity

Inverse document frequency (IDF)

TF-IDF weighting

In-class demo: (continued and updated) IPython notebook and html


Readings:

MRS Chapters 6.2, 6.3 and 6.4.1

Wed, Feb 4, 2015

#5

Lecture topics:

Inverted index

Postings merge algorithm

Boolean search

Efficient cosine similarity scoring using the inverted index

In-class demo: (continued and updated) IPython notebook and html

Before optimizing retrieval with inverted indexes (one query on a collection of 40,000 reality TV utterances): class start

After optimizing retrieval with inverted indexes (one query on a collection of 40,000 reality TV utterances): class end


Readings:

MRS Chapters 1 and 6.3.3

Mon, Feb 9, 2015

#6

Lecture topics:

Discussion of Assignment 1: "Keeping up with social information" (+ importance of data pre-processing)

Evaluation of IR systems

Fast cosine retrieval worksheet (includes sketch of the algorithm using the inverted index)


Readings:

MRS Chapters 8.1 and 8.3

Wed, Feb 11, 2015

#7

Lecture topics:

Evaluation of ranked retrieval systems: Precision@k, Precision-recall curve, Mean Average Precision, Pooling

Relevance feedback, Rocchio's algorithm

In-class demo: IPython notebook and html


Readings:

MRS Chapters 8.4 and 9.1

Assignment 2 out [Description, ZIP]

Due Feb 24, 5pm

Wed, Feb 18, 2015

#8

Lecture topics:

Guest lecture by Prof. David Mimno

Query expansion: Pointwise mutual information


Readings:

J&M Chapter 20.7

Mon, Feb 23, 2015

#9

Lecture topics:

Classic (ad hoc) information retrieval wrap-up

Query expansion, Relevance feedback

Pseudo-relevance feedback

Query update using relevance feedback worksheet (includes the Rocchio query update rule)


Readings:

MRS Chapter 9.2

Wed, Feb 25, 2015

#10

Lecture topics:

Guest lecture by Josh Moore: Music Information Retrieval

Mon, Mar 2, 2015

#11

Lecture topics:

Text Classification, Naive Bayes

In-class demo: IPython notebook and html

Readings:

MRS Chapters 13.1 and 13.3

Wed, Mar 4, 2015

#12

Lecture topics:

Discussion of Assignment 2: "Search your transcripts. You will know it to be true." (+ updating the inverted index)

Document representation

Generative models: Multinomial Naive Bayes, Bernoulli Naive Bayes

Smoothing

Readings:

MRS Chapter 13.4

Mon, Mar 9, 2015

#13

Midterm (Gates 203)

Wed, Mar 11, 2015

#14

Lecture topics:

Vector space classification

Conversational features

In-class demo: IPython notebook and html

Readings:

MRS Chapter 14.4

Project milestone 1 (proposal abstract)

Due Mar 24, 5pm

Mon, Mar 16, 2015

#15

Lecture topics:

Midterm solutions

Project planning

Feature selection, Feature design

Readings:

MRS Chapter 13.5

Wed, Mar 18, 2015

#16

Lecture topics:

Result re-ranking using machine learning

Paired classification

Cognitive biases: Anchoring effect


Readings:

MRS Chapter 15.4

Project milestone 2 (full proposal)

Due Apr 8, 5pm

Mon, Mar 23, 2015

#17

Lecture topics:

PageRank


Readings:

MRS Chapter 21.2

Assignment 3 out [Description, ZIP]

Wed, Mar 25, 2015

#18

Lecture topics:

Weighted PageRank, Topic-specific PageRank


Readings:

MRS Chapter 21.2

Mon, Apr 6, 2015

#19

Lecture topics:

Case study: Discussions in task-oriented groups

Statistical Significance: Permutation Test


Wed, Apr 8, 2015

#20

Lecture topics:

Statistical Significance Tests

Hubs and Authorities


Readings:

MRS Chapter 21.3; Networks, Crowds, and Markets: Reasoning about a Highly Connected World (Chapter 14)

Project milestone 3 (data)

Due Apr 15, 5pm