Lecture and assignment information is posted on this website and updated often. Check back regularly.

Date Lecture Agenda Assignments

Th, Jan 28, 2016

#1

Lecture topics:

Intro: Dimensions of Information Systems

Conversational Behavior and Social information


Related material:

Telephone

NPR story: Before The Internet, Librarians Would 'Answer Everything' — And Still Do

Trailer for the movie "Her"


References:

Cristian Danescu-Niculescu-Mizil, Lillian Lee, Bo Pang and Jon Kleinberg. Echoes of power: Language effects and power differences in social interaction. Proceedings of WWW, 2012.

Cristian Danescu-Niculescu-Mizil, Michael Gamon and Susan Dumais. Mark my words! Linguistic style accommodation in social media. Proceedings of WWW, 2011.

Kate G. Niederhoffer and James W. Pennebaker. Linguistic Style Matching in Social Interaction. Journal of Language and Social Psychology 2002 21: 337.

Assignment 1 out [Description, ZIP]

First part due on Thursday Feb 4 at noon.

Second part due on Thursday Feb 11 at noon. You might want this pickle

Tu, Feb 2, 2016

#2

Lecture topics:

Text similarity measures: Minimum Edit Distance

Edit Distance worksheet (includes a sketch of the Wagner-Fischer algorithm we used in class)


Related material:


Readings:

J&M Chapter 3.11
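
For reference, a minimal Python sketch of the minimum edit distance dynamic program (Wagner-Fischer) named in this lecture's topics. This is not the worksheet's own code: the function name `edit_distance` and the unit cost for insertions, deletions and substitutions are assumptions made for illustration, and other cost schemes (for example, substitution cost 2) give different distances.

```python
def edit_distance(source, target):
    """Minimum edit distance between two strings.

    Assumes unit cost for insertion, deletion and substitution
    (plain Levenshtein distance); this cost scheme is an assumption.
    """
    n, m = len(source), len(target)
    # dist[i][j] = cheapest way to turn source[:i] into target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete all i characters of the source prefix
    for j in range(1, m + 1):
        dist[0][j] = j  # insert all j characters of the target prefix
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitution = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                  # deletion
                dist[i][j - 1] + 1,                  # insertion
                dist[i - 1][j - 1] + substitution,   # match or substitution
            )
    return dist[n][m]
```

The table has (n+1) x (m+1) cells and each cell is filled in constant time, so the sketch runs in O(nm) time and space.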

Th, Feb 4, 2016

#3

No lecture

Second part of Assignment 1 due on Thursday Feb 11 at noon. You might want this pickle

Tu, Feb 9, 2016

#4

Basic text processing concepts: Sentence Splitting, Word Tokenization, Types, Tokens

Text similarity measures: Overlap, Jaccard similarity

Classic (ad hoc) information retrieval systems

Vector space model: binary representation

In-class demo: Proto Information Retrieval System: IPython notebook and html

Vector space model cheatsheet (useful to keep track of notation)


Related material:


Readings:

J&M Chapters 3.8 and 23.1.1

Th, Feb 11, 2016

#5

Lecture topics:

Vector Space Model: geometric intuition

Cosine similarity

Inverse document frequency (IDF)

TF-IDF weighting

Pivot length normalization

In-class demo: (continued and updated) IPython notebook and html


Readings:

MRS Chapters 6.2, 6.3, 6.4.1 and 6.4.4

Th, Feb 18, 2016

#6

Lecture topics:

Assignment 1 discussion

Inverted Index

Posting merge algorithm

Boolean search

In-class demo: (continued and updated) IPython notebook and html


Related material:


Readings:

MRS Chapter 1

Assignment 2 out

[Description, ZIP]

Due: Wednesday, March 2, 11:59pm

Tu, Feb 23, 2016

#7

Lecture topics:

Efficient cosine similarity scoring using the inverted index

Fast cosine retrieval worksheet (includes sketch of the algorithm using the inverted index)

In-class demo: (continued and updated) IPython notebook and html

Before optimizing retrieval with inverted indexes (one query on a collection of 40,000 reality TV utterances): class start

After optimizing retrieval with inverted indexes (one query on a collection of 40,000 reality TV utterances): class end


Readings:

MRS Chapter 6.3.3
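
For reference, a minimal Python sketch of cosine scoring driven by an inverted index, in the spirit of this lecture's topic. It is not the in-class notebook code: the helper names `build_index` and `cosine_search`, the raw-count TF weighting and the `log(N / df)` IDF are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Build an inverted index, IDF table and document norms.

    docs is a list of token lists. TF is the raw count and
    IDF is log(N / df); both choices are assumptions.
    """
    index = defaultdict(list)  # term -> list of (doc_id, tf) postings
    for doc_id, tokens in enumerate(docs):
        for term, tf in Counter(tokens).items():
            index[term].append((doc_id, tf))
    n_docs = len(docs)
    idf = {term: math.log(n_docs / len(postings))
           for term, postings in index.items()}
    norms = [0.0] * n_docs
    for term, postings in index.items():
        for doc_id, tf in postings:
            norms[doc_id] += (tf * idf[term]) ** 2
    norms = [math.sqrt(total) for total in norms]
    return index, idf, norms


def cosine_search(query_tokens, index, idf, norms):
    """Score only documents that share at least one term with the query."""
    scores = defaultdict(float)
    query_norm_sq = 0.0
    for term, query_tf in Counter(query_tokens).items():
        if term not in index:
            continue  # terms outside the collection contribute nothing
        query_weight = query_tf * idf[term]
        query_norm_sq += query_weight ** 2
        # Walk this term's postings and accumulate the dot product.
        for doc_id, tf in index[term]:
            scores[doc_id] += query_weight * tf * idf[term]
    query_norm = math.sqrt(query_norm_sq)
    results = [(doc_id, score / (query_norm * norms[doc_id]))
               for doc_id, score in scores.items()
               if query_norm > 0 and norms[doc_id] > 0]
    return sorted(results, key=lambda pair: pair[1], reverse=True)
```

The saving comes from the accumulator loop: documents that contain none of the query terms are never visited, whereas a naive scan computes a similarity for every document in the collection.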

Th, Feb 25, 2016

#8

Lecture topics:

Evaluation of ranked retrieval systems: Precision@k, Precision-recall curve

Search at Facebook (Guest speaker Ves Stoyanov)

Thinking about evaluation metrics worksheet

In-class demo: IPython notebook and html


Readings:

MRS Chapter 8

Tu, Mar 1, 2016

#9

Lecture topics:

Evaluation of ranked retrieval systems: Mean Average Precision, Pooling, Kappa statistic

Relevance feedback

In-class demo: IPython notebook and html


Related material:


Readings:

MRS Chapters 8, 9.1
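
For reference, a minimal Python sketch of the ranked-retrieval metrics named in the evaluation lectures: Precision@k, Average Precision and Mean Average Precision. It is not the in-class notebook code: the function names and the representation of relevance judgments as a set of document ids are assumptions made for illustration.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top k results that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant doc appears.

    Relevant documents that are never retrieved contribute zero,
    because the sum is divided by the total number of relevant docs.
    """
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs is a list of (ranked list, relevant set) pairs, one per query."""
    return sum(average_precision(ranked, relevant)
               for ranked, relevant in runs) / len(runs)
```

For the ranking `["d1", "d2", "d3", "d4"]` with relevant set `{"d1", "d3"}`, Precision@2 is 1/2 and Average Precision is (1/1 + 2/3) / 2 = 5/6.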

Th, Mar 3, 2016

#10

Lecture topics:

Rocchio's method for query rewriting, Pseudo Relevance feedback

Query expansion, Co-occurrence matrix

Query update using relevance feedback worksheet (includes the Rocchio query update rule)


Readings:

MRS Chapter 9
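
For reference, a minimal Python sketch of a Rocchio-style query update, the method named in this lecture's topics. It is not the worksheet's own code: the function name `rocchio`, the default weights `alpha`, `beta` and `gamma`, and the optional clipping of negative weights are assumptions made for illustration.

```python
def rocchio(query, relevant, irrelevant,
            alpha=1.0, beta=0.75, gamma=0.15, clip=True):
    """Move the query vector toward relevant docs and away from irrelevant ones.

    query is a list of term weights; relevant and irrelevant are lists of
    document vectors of the same length. The default weights are
    illustrative assumptions, not values prescribed by the course.
    """
    dims = len(query)

    def centroid(vectors):
        # Mean vector of a set of documents (all zeros if the set is empty).
        if not vectors:
            return [0.0] * dims
        return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]

    rel_centroid = centroid(relevant)
    irr_centroid = centroid(irrelevant)
    updated = [alpha * q + beta * r - gamma * s
               for q, r, s in zip(query, rel_centroid, irr_centroid)]
    if clip:
        # Negative term weights are set to zero in this sketch.
        updated = [max(0.0, w) for w in updated]
    return updated
```

For pseudo relevance feedback, the same update can be applied with the top-ranked results of an initial search passed as `relevant` and an empty list as `irrelevant`.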

Assignment 3 out

[Description, ZIP]

Due: Wednesday, March 9, 11:59pm

Midterm date: March 15, during class time

Tu, Mar 8, 2016

#11

Lecture topics:

Term-document matrix recap, Pointwise Mutual Information


Readings:

MRS Chapter 9

Th, Mar 10, 2016

#12

Lecture topics:

Wrapping up ad hoc IR

Tu, Mar 15, 2016

#13

Lecture topics:

In-class midterm

Th, Mar 17, 2016

#14

Lecture topics:

Midterm discussion

Project discussion

Project milestone 1

Due date 1 (piazza): Monday, March 21 at midnight

Due date 2 (CMS): Thursday, March 23 at midnight

Tu, Mar 22, 2016

#15

Lecture topics:

Text Mining

Th, Mar 24, 2016

#16

Lecture topics:

Practical text mining (by Xanda Schofield)

Slides with text mining tips and libraries

Tu, Apr 5, 2016

#17

Lecture topics:

Text mining, naive bayes, generative models

In-class demo: IPython notebook and html

Th, Apr 7, 2016

#18

Lecture topics:

One on one project meetings

Tu, Apr 12, 2016

#19

Lecture topics:

Feature selection, Conversational Features, Ordinal ranking

In-class demo: IPython notebook and html

Th, Apr 14, 2016

#20

Lecture topics:

Practical unsupervised learning on textual data: SVD (by Jack Hessel)

Slides

In-class demo: IPython notebooks Data exploration, LSI

Tu, Apr 19, 2016

#21

Lecture topics:

Practical unsupervised learning on textual data: Latent semantic indexing and topic modeling (by Jack Hessel)

Slides

In-class demo: IPython notebooks Kickstarter success prediction


Related material:

Indexing by latent semantic analysis. Deerwester, Dumais and Harshman 1990

Th, Apr 21, 2016

#21

Lecture topics:

Opinions and Trust: Link Analysis, Hubs and Authorities, Spectral Analysis


Related material:

NetworkX python package for link analysis


Reading:

Chapters 14.2 & 14.6 A from Networks, Crowds, and Markets

Tu, Apr 26, 2016

#22

Project Prototype Madness (project links posted on Piazza): fun, fun, fun!

Th, Apr 28, 2016

#23

Opinions and Trust: Sentiment Analysis