CS 430
Information Discovery
Fall 2003

Readings


General Books

There is no textbook for this course. The following books cover much of the course material.

Discussion Classes

Readings for discussion classes are to be studied in preparation for the classes on Wednesday evenings.

Discussion Class 1, September 3, 2003

In preparation for this class, explore three information retrieval systems and compare them:

Consider the two information discovery tasks:

Study each search service in two ways. (a) From a technical viewpoint. Does the service search full text or surrogates? Are fielded searches offered? What Boolean operators are supported? What regular expressions? How does it handle non-Roman character sets? What is the stop list? How are results ranked? Are they sorted and, if so, in what order? (b) From a usability viewpoint. What style of user interface(s) is provided? What training or help services? If there are basic and advanced user interfaces, what does each offer?

Overall, how effective is each service? What do you consider its strengths and its weaknesses? When would you use it?

Discussion Class 2, September 10, 2003

Read and be prepared to discuss:

G. Salton, A. Wong and C. S. Yang, A vector space model for automatic indexing. Communications of the ACM, Volume 18, Issue 11 (November 1975), pages 613-620. http://doi.acm.org/10.1145/361219.361220

This paper describes many of the concepts behind the vector space model and the SMART system.

{Note that to access this paper from the ACM Digital Library, you need to use a computer with a Cornell IP address.}
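As a warm-up for the discussion, the central idea of the vector space model can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the SMART implementation: documents and queries are represented by raw term-frequency vectors (the paper also discusses more refined term weighting), tokenization is a plain whitespace split, and the function names term_vector, cosine and rank are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Map a text to a sparse vector of raw term frequencies."""
    # Assumption: lowercase and split on whitespace; no stop list or stemming.
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(weight * v[term] for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


def rank(query, documents):
    """Return (score, document) pairs, most similar to the query first."""
    q = term_vector(query)
    scored = [(cosine(q, term_vector(d)), d) for d in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Calling rank("information retrieval", docs) on a small list of strings orders them by how closely their term vectors point in the same direction as the query vector. Documents that share no terms with the query score 0.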

Discussion Class 3, September 17, 2003

Read and be prepared to discuss:

M. F. Porter, An algorithm for suffix stripping. (Originally published in Program, 14 no. 3, pp 130-137, July 1980.) http://www.tartarus.org/~martin/PorterStemmer/def.txt

This paper describes one of the standard algorithms used for stemming English text.
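To give a feel for the style of the algorithm before reading, here is a minimal Python sketch of Step 1a only, which handles plural suffixes. It is not a full stemmer: the later steps, which depend on the paper's measure of a word, are omitted, and the function name porter_step_1a is hypothetical.

```python
def porter_step_1a(word):
    """Apply Step 1a of Porter's algorithm; the longest matching suffix wins."""
    if word.endswith("sses"):
        return word[:-2]  # SSES -> SS
    if word.endswith("ies"):
        return word[:-2]  # IES -> I
    if word.endswith("ss"):
        return word       # SS -> SS (unchanged)
    if word.endswith("s"):
        return word[:-1]  # S -> (removed)
    return word


# Examples taken from the paper's description of Step 1a.
for w in ["caresses", "ponies", "ties", "caress", "cats"]:
    print(w, "->", porter_step_1a(w))
```

The rules are tested in order from the longest suffix to the shortest, so "caress" is left intact by the SS rule rather than being shortened by the S rule.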



William Y. Arms
(wya@cs.cornell.edu)
Last changed: September 15, 2003