Lecture and assignment information will be posted regularly on this website. Check back often.

Date Lecture Agenda Assignments

Tu, Jan 21, 2020

#1

Intro: Dimensions of Information Systems

Conversational Behavior and Social Information

Projects Hall of Fame

Related material:

Linguistic Coordination Toolkit

Telephone

NPR Story: Before The Internet, Librarians Would 'Answer Everything' — And Still Do

Trailer for the movie "Her"

Google Duplex example and write-up in The Verge

Slides

--

References:

Cristian Danescu-Niculescu-Mizil, Lillian Lee, Bo Pang and Jon Kleinberg. Echoes of power: Language effects and power differences in social interaction. Proceedings of WWW, 2012.

Cristian Danescu-Niculescu-Mizil, Michael Gamon and Susan Dumais. Mark my words! Linguistic style accommodation in social media. Proceedings of WWW, 2011.

Kate G. Niederhoffer and James W. Pennebaker. Linguistic Style Matching in Social Interaction. Journal of Language and Social Psychology 2002 21: 337.

Filip Radlinski and Nick Craswell. A Theoretical Framework for Conversational Search. Proceedings of CHIIR 2017.

Setup Quiz out (on CMS)

Assignment 0 out (on CMS)

Th, Jan 23, 2020

#2

Text similarity measures: Minimum edit distance

Edit Distance worksheet (includes a sketch of the Wagner-Fischer algorithm we used in class)
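
For reference, a minimal sketch of the dynamic program behind minimum edit distance. This is not the worksheet version: the function name and the default unit costs are assumptions made for illustration, and the costs used in class may differ.

```python
def edit_distance(source, target, ins_cost=1, del_cost=1, sub_cost=1):
    """Minimum edit distance via the Wagner-Fischer dynamic program.

    The cost parameters are illustrative assumptions (unit costs by default).
    """
    m, n = len(source), len(target)
    # dist[i][j] = cheapest way to turn source[:i] into target[:j]
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dist[i][0] = dist[i - 1][0] + del_cost   # delete everything
    for j in range(1, n + 1):
        dist[0][j] = dist[0][j - 1] + ins_cost   # insert everything
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + del_cost,                       # deletion
                dist[i][j - 1] + ins_cost,                       # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost),  # match or substitution
            )
    return dist[m][n]
```

For example, edit_distance("kitten", "sitting") evaluates to 3 with unit costs.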

Related material


Readings:

J&M Chapter 3.11

Assignment 1 out (on CMS)

Tu, Jan 28, 2020

#3

Basic text processing concepts: Sentence Splitting, Word Tokenization, Types, Tokens

Text similarity measures: Type Overlap, Jaccard similarity

Classic (ad hoc) information retrieval systems

Vector space model cheatsheet (useful to keep track of notation)

In-class demo: Proto Information Retrieval System: IPython notebook and html
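
A small sketch of the tokens/types distinction and of Jaccard similarity over types. The regular-expression tokenizer is a deliberately crude assumption for illustration, not the tokenizer used in the demo notebook.

```python
import re


def tokenize(text):
    """Lowercase the text and keep runs of letters (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def jaccard(text_a, text_b):
    """Jaccard similarity: |A & B| / |A | B| over the word types of each text."""
    types_a, types_b = set(tokenize(text_a)), set(tokenize(text_b))
    union = types_a | types_b
    if not union:
        return 0.0  # convention chosen here for two empty texts
    return len(types_a & types_b) / len(union)
```

For example, "The cat, the hat!" has four tokens but only three types, and jaccard("the cat sat", "the cat ran") is 0.5 (two shared types out of four).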


Related material:


Readings:

J&M Chapters 3.8 and 23.1.1

Th, Jan 30, 2020

#4

Vector Space Model

Dot product similarity, Cosine similarity, Geometric intuition

Inverse document frequency (IDF)

TF-IDF weighting

In-class demo: (continued and updated) IPython notebook and html
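
A compact sketch of TF-IDF weighting and cosine similarity on sparse vectors stored as dictionaries. The raw term frequency and the log2(N / df) form of IDF are assumptions; the notebook and MRS discuss several variants.

```python
import math
from collections import Counter


def build_tfidf(docs):
    """docs is a list of token lists. Returns (idf, list of TF-IDF vectors)."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log2(n_docs / df) for term, df in doc_freq.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return idf, vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors (dict term -> weight)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Note that a term occurring in every document gets IDF 0 under this formula, so it contributes nothing to the similarity.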

Readings:

MRS Chapters 6.2, 6.3, 6.4.1 and 6.4.4

Assignment 2 out (on CMS)

Tu, Feb 4, 2020

#5

Term document matrix

Efficient retrieval

Inverted Index

Postings merge algorithm

Boolean search

In-class demo: (continued and updated) IPython notebook and html

Postings merge quiz (includes sketch of the algorithm we used in class)
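
A minimal sketch of an inverted index and of the postings merge used for a Boolean AND query. Function names are illustrative assumptions; the quiz version may be organized differently.

```python
def build_inverted_index(docs):
    """Map each term to the sorted list of ids of documents containing it."""
    index = {}
    for doc_id, tokens in enumerate(docs):
        for term in set(tokens):
            # doc ids arrive in increasing order, so each list stays sorted
            index.setdefault(term, []).append(doc_id)
    return index


def merge_postings(postings_a, postings_b):
    """Intersect two sorted postings lists in O(len(a) + len(b)) steps."""
    i = j = 0
    merged = []
    while i < len(postings_a) and j < len(postings_b):
        if postings_a[i] == postings_b[j]:
            merged.append(postings_a[i])
            i += 1
            j += 1
        elif postings_a[i] < postings_b[j]:
            i += 1  # advance the pointer sitting on the smaller doc id
        else:
            j += 1
    return merged
```

The merge relies on both lists being sorted by document id, which is why the index builder appends ids in increasing order.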


Related Material:


Readings:

MRS Chapter 1

Th, Feb 6, 2020

#6

Efficient cosine similarity scoring using the inverted index (algorithm)

Fast cosine retrieval worksheet (includes sketch of the algorithm using the inverted index)
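
A sketch of term-at-a-time cosine scoring with an inverted index: only documents that share a term with the query ever receive a score. The TF-IDF weighting and all function names are assumptions for illustration; in a real system the IDF values and document norms would be precomputed once rather than on every query.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Map each term to a list of (doc_id, term_frequency) postings."""
    index = defaultdict(list)
    for doc_id, tokens in enumerate(docs):
        for term, tf in Counter(tokens).items():
            index[term].append((doc_id, tf))
    return index


def doc_norms(index, idf):
    """Euclidean norm of every document's TF-IDF vector, read off the index."""
    norms_sq = defaultdict(float)
    for term, postings in index.items():
        for doc_id, tf in postings:
            norms_sq[doc_id] += (tf * idf[term]) ** 2
    return {doc_id: math.sqrt(v) for doc_id, v in norms_sq.items()}


def rank(query_tokens, index, n_docs):
    """Return (cosine, doc_id) pairs, best first, for documents matching the query."""
    idf = {t: math.log2(n_docs / len(p)) for t, p in index.items()}
    norms = doc_norms(index, idf)
    scores = defaultdict(float)  # accumulators for the dot products
    q_norm_sq = 0.0
    for term, q_tf in Counter(query_tokens).items():
        if term not in index:
            continue  # unseen query terms are ignored in this sketch
        q_weight = q_tf * idf[term]
        q_norm_sq += q_weight ** 2
        for doc_id, tf in index[term]:
            scores[doc_id] += q_weight * tf * idf[term]
    q_norm = math.sqrt(q_norm_sq)
    ranked = [
        (dot / (q_norm * norms[doc_id]), doc_id)
        for doc_id, dot in scores.items()
        if q_norm * norms[doc_id] > 0
    ]
    return sorted(ranked, reverse=True)
```

The work per query is proportional to the total length of the postings lists of the query terms, not to the size of the collection.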


Related Material:

Inspiration for Assignment 3: QUOTUS project and interactive visualization


Readings:

MRS Chapter 6.3.3

Assignment 3 out (on CMS)

Tu, Feb 11, 2020

#7

Efficient cosine similarity scoring using the inverted index (implementation)

In-class demo: (continued and updated) IPython notebook and html

Before optimizing retrieval with inverted indexes (one query on a collection of 40,000 reality TV utterances): [image: class start]

After optimizing retrieval with inverted indexes (one query on a collection of 40,000 reality TV utterances): [image: class end]

Th, Feb 13, 2020

#8

Evaluation of ranked retrieval systems: Intuition, Precision, Recall and F1

Thinking about evaluation metrics worksheet
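
A minimal sketch of set-based precision, recall and F1 for a single query. The function name and the convention of returning 0.0 for empty sets are assumptions for illustration.

```python
def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall and F1 for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, retrieving documents {1, 2, 3, 4} when {2, 4, 6} are relevant gives precision 1/2, recall 2/3 and F1 of 4/7.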


Readings:

MRS Chapter 8

Assignment 4 out (on CMS)

Tu, Feb 18, 2020

#9

Evaluation of ranked retrieval systems: Precision@K, Recall@K, Precision-Recall Plot, Mean Average Precision, Discounted Cumulative Gain

In-class demo: IPython notebook and html
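
A short sketch of three rank-aware metrics. Function names are illustrative, and the log2(rank + 1) discount is one common form of DCG (an assumption here; other variants exist). Mean Average Precision is the mean of average_precision over a set of queries.

```python
import math


def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def average_precision(ranking, relevant):
    """Mean of precision@rank over the ranks where a relevant doc appears.

    Relevant documents that are never retrieved contribute zero.
    """
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def dcg(gains):
    """Discounted cumulative gain for graded relevance values in rank order."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))
```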


Readings:

MRS Chapter 8

Th, Feb 20, 2020

#10

Relevance feedback, Rocchio's method for query rewriting, Pseudo-relevance feedback

Annotation: Pooling, Kappa statistic

Query update using relevance feedback worksheet (includes the Rocchio query update rule)
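
A sketch of the Rocchio update on sparse vectors stored as dictionaries. The default values of alpha, beta and gamma and the clipping of negative weights to zero are common choices assumed here for illustration; the worksheet may use different settings.

```python
def rocchio(query, relevant_docs, nonrelevant_docs,
            alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio query update.

    new_q = alpha * q + beta * centroid(relevant) - gamma * centroid(nonrelevant)
    All vectors are dicts mapping term -> weight.
    """
    updated = {term: alpha * w for term, w in query.items()}
    for docs, coeff in ((relevant_docs, beta), (nonrelevant_docs, -gamma)):
        for doc in docs:
            for term, w in doc.items():
                # dividing by len(docs) turns the sum into a centroid
                updated[term] = updated.get(term, 0.0) + coeff * w / len(docs)
    # Negative term weights are clipped to zero (an assumed convention).
    return {term: max(w, 0.0) for term, w in updated.items()}
```

Pseudo-relevance feedback uses the same update but treats the top-ranked results as relevant instead of asking the user.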


Related material


Readings:

MRS Chapters 8 and 9

Tu, Feb 25, 2020

FEBRUARY BREAK

Th, Feb 27, 2020

#11

Query expansion, Co-occurrence matrix, Pointwise Mutual Information

Scikit Learn basics

In-class demo: IPython notebook and html
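
A small sketch of pointwise mutual information computed from document-level co-occurrence counts. Counting co-occurrence per document (rather than within a sliding window) and the function name are assumptions for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(docs):
    """PMI for every pair of terms that co-occur in at least one document.

    PMI(a, b) = log2( P(a, b) / (P(a) * P(b)) ), with probabilities estimated
    as fractions of documents. Keys are alphabetically ordered term pairs.
    """
    n_docs = len(docs)
    term_count = Counter()
    pair_count = Counter()
    for doc in docs:
        types = sorted(set(doc))
        term_count.update(types)
        pair_count.update(combinations(types, 2))
    return {
        (a, b): math.log2(n_ab * n_docs / (term_count[a] * term_count[b]))
        for (a, b), n_ab in pair_count.items()
    }
```

Positive PMI means two terms co-occur more often than independence would predict, which makes high-PMI neighbours natural candidates for query expansion.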


Readings:

MRS Chapter 9

Assignment 5 out (on CMS)

Tu, Mar 3, 2020

#12

Wrapping up Ad-hoc IR, Midterm practice

Th, Mar 5, 2020

#13

Project discussion and brainstorming session

Tu, Mar 10, 2020

#14

MIDTERM - in class

Th, Mar 12, 2020

#15

MIDTERM discussion

Tu, Apr 7, 2020

#16

Lecture topics:

Text mining, Classifiers, Feature Representation

Th, Apr 9, 2020

#17

Lecture topics:

Bernoulli Naive Bayes, Smoothing

In-class demo: IPython notebook and html
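
A minimal sketch of a Bernoulli Naive Bayes classifier with additive smoothing. The dictionary-based model layout and the smoothing constant alpha=1 are assumptions for illustration, not the notebook's implementation.

```python
import math


def train_bernoulli_nb(docs, labels, alpha=1.0):
    """Estimate class priors and smoothed P(term present | class)."""
    vocab = sorted({term for doc in docs for term in doc})
    model = {"vocab": vocab, "log_prior": {}, "p_term": {}}
    for c in sorted(set(labels)):
        class_docs = [set(d) for d, y in zip(docs, labels) if y == c]
        n_c = len(class_docs)
        model["log_prior"][c] = math.log(n_c / len(docs))
        # Add alpha to the "present" count and 2 * alpha to the total,
        # since each term has two outcomes: present or absent.
        model["p_term"][c] = {
            t: (sum(t in d for d in class_docs) + alpha) / (n_c + 2 * alpha)
            for t in vocab
        }
    return model


def predict(model, doc):
    """Return the class with the highest log posterior for a token list."""
    present = set(doc)
    best_class, best_score = None, -math.inf
    for c, log_prior in model["log_prior"].items():
        score = log_prior
        for t in model["vocab"]:
            p = model["p_term"][c][t]
            # Absent vocabulary terms also count as evidence in the Bernoulli model.
            score += math.log(p if t in present else 1.0 - p)
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```

Smoothing keeps every probability strictly between 0 and 1, so a term never seen with a class cannot zero out that class's score.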

Tu, Apr 14, 2020

#18

Lecture topics:

Multinomial Naive Bayes, Generative Models, Linear Classifiers

Th, Apr 16, 2020

#19

Lecture topics:

Practical unsupervised learning on textual data: Singular Value Decomposition (SVD)

In-class demo: IPython notebook


Related material:

Indexing by latent semantic analysis. Deerwester, Dumais and Harshman 1990

Example recent research using SVD

Tu, Apr 21, 2020

#20

Project Prototype Madness

Th, Apr 23, 2020

#21

Project Prototype Madness

Tu, Apr 28, 2020

#22

Lecture topics:

Opinions and Trust: Link Analysis, Hubs and Authorities, Spectral Analysis
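
A pure-Python sketch of the hubs-and-authorities iteration on a directed graph given as a list of edges. The fixed iteration count and normalizing scores to sum to 1 are assumptions for illustration; the NetworkX package listed under related material provides its own implementation.

```python
def hits(edges, iterations=50):
    """Iterate the hub and authority update rules on (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum the hub scores of pages linking to the node.
        auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            auth[dst] += hub[src]
        # Hub update: sum the authority scores of the pages the node links to.
        hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            hub[src] += auth[dst]
        # Normalize so that scores stay bounded across iterations.
        for scores in (auth, hub):
            total = sum(scores.values())
            if total > 0:
                for n in scores:
                    scores[n] /= total
    return hub, auth
```

The spectral view of the same computation: the authority vector converges toward a principal eigenvector of the matrix A^T A, where A is the adjacency matrix.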


Related material:

NetworkX python package for link analysis


Reading:

Sections 14.2 and 14.6 (part A) of Networks, Crowds, and Markets

Th, Apr 30, 2020

#23

Opinions and Trust: Sentiment analysis, Opinion mining, Helpfulness, Credibility

Related Material:

Tu, May 5, 2020

#24

Project presentations

Th, May 7, 2020

#25

Project presentations

Tu, May 12, 2020

#26

Misinformation and Anti-Social behavior
