Lecture Notes and Assigned Readings

Date Lecture Topic and Handouts Readings

Thurs 8/29 Introduction to the field Some of the lecture material is covered in: Croft et al., Ch. 1; Manning et al. (MRS), first few pages of Ch. 1.
Tues 9/03 Relevance, evaluation and users  
Thurs 9/05 Search engine architecture: indexing Croft et al., Ch. 2
Tues 9/10 Search engine architecture: querying  
Thurs 9/12 Web crawlers I

Croft et al., Ch. 3-3.4
MRS Ch. 20

Critique 1 due (Weds 9/11, 11:59pm via CMS): Sergey Brin and Lawrence Page. 1998. The anatomy of a large-scale hypertextual Web search engine. In Proceedings of the seventh international conference on World Wide Web 7 (WWW7).

Tues 9/17 Web crawlers II

Croft et al., Ch. 3.5-3.8

Thurs 9/19 No Class (no classroom)  
Tues 9/24 Web statistics, tokenization, stopping

Croft et al., Ch. 4.1-4.3.3
MRS Ch. 2-2.4

Thurs 9/26

Stemming and link analysis

Croft et al., Ch. 4.3.4-4.5
MRS Ch. 2.2.4; Ch. 21
Tues 10/1 Phrases and named entities Croft et al., Ch. 4.3-4.7
MRS 2.2, 2.4

Project 1 - Analytical Questions due Mon 9/30, 11:59pm via CMS; hardcopy in class Tues 10/1.

Thurs 10/3

Guest lecture: PhD Candidate Joshua Moore

Playlist prediction and music retrieval

Croft et al., Ch. 11.6

Project 1 - Programming Questions due Wed 10/2, 11:59pm via CMS; hardcopy in class Thurs 10/3.

Tues 10/08

Inverted indexes

Croft et al., Ch. 5-5.3
MRS 1.1-1.2
Thurs 10/10 Text categorization: detecting fake on-line reviews Ott et al. (ACL 2011)
Tues 10/15 FALL BREAK - NO CLASSES  
Thurs 10/17 Index compression Croft et al., Ch. 5.4
MRS Ch. 5
Tues 10/22

Index construction

Croft et al., 5.5-5.7
Thurs 10/24 Evaluation basics

Croft et al. 8-8.4.1, 8.4.3 / MRS 8-8.4

Tues 10/29 Retrieval models I

Croft et al. 7-7.2

Critique 2 due (Mon 10/28, 11:59pm via CMS): Henry Feild, James Allan, and Rosie Jones. 2010. Predicting Searcher Frustration. In Proceedings of the 33rd International ACM Conference on Research and Development in Information Retrieval (SIGIR 2010).

MRS Ch. 1, Ch. 6

Thurs 10/31

Retrieval models II
More on metrics

Croft et al. 7.2, 8.4.2 / MRS Ch. 11
Tues 11/5 Language models I

Project 2 - Analytical Questions due Mon 11/4, 11:59pm via CMS; hardcopy in class Tues 11/5.

Thurs 11/7 Language models II and relevance feedback

Croft et al. 7.3, 6.2.4
Project 2 - Programming Questions due Wed 11/6, 11:59pm via CMS; hardcopy in class Thurs 11/7.

Tues 11/12

Guest lecture: Prof. David Mimno

Topic Models

Demo materials are here: http://mimno.infosci.cornell.edu/sotu-model.zip

Thurs 11/14

Guest lecture: PhD Candidate Parvaz Mahdabi

Patent Retrieval

 
Tues 11/19

Guest lecture: PhD Candidate Jon Park

Query refinement

Croft et al. 6-6.2.1, 6.2.3

Thurs 11/21

Clustering

Croft et al. 9.2
MRS 16-16.1, 16.4, 17-17.4

Tues 11/26 Learning to rank

Croft et al. 7.5, 7.6, 9.1.2
MRS 15-15.2.2; 15.4.4

Thurs 11/28 THANKSGIVING BREAK  
Tues 12/3 Text classification applications: Spam detection and sentiment analysis

Croft et al. 9.1.5
MRS 19.2.2

Thurs 12/5 Online advertising
Semester review

Croft et al. 9.1.5
MRS 19.3

Project 3 - Analytical Questions due Weds 12/4, 11:59pm via CMS; hardcopy in class Thurs 12/5.

Project 3 - Programming Questions due Fri 12/6, 11:59pm via CMS; hardcopy to Claire or the TAs on Mon 12/9.

Critique 3 due Fri 12/6, 11:59pm (however, no late points are deducted until Mon 11:59pm); hardcopy to Claire or the TAs on Tues 12/10.