CS 430
Information Discovery
Spring 2001

Readings and References


Text Book

William B. Frakes and Ricardo Baeza-Yates, Information Retrieval: Data Structures and Algorithms. Prentice Hall, 1992.

Readings

Readings for discussion classes are to be studied in preparation for the classes on Wednesday evenings.

Week 1: Introduction to information retrieval

Discussion class
  • Frakes, W.B., Introduction to information storage and retrieval systems. (Frakes and Baeza-Yates, Chapter 1)
Other readings
  • W. Bruce Croft, What Do People Want from Information Retrieval? D-Lib Magazine, November 1995. http://www.dlib.org/dlib/november95/11croft.html
  • Baeza-Yates, R.A., Introduction to data structures and algorithms related to information retrieval. (Frakes and Baeza-Yates, Chapter 2)
  • Zipf, G. K., Human Behaviour and the Principle of Least Effort. Addison-Wesley, 1949.

Week 2: File structures

Discussion class
  • Harman, D., Fox, E., Baeza-Yates, R.A., Inverted files. (Frakes and Baeza-Yates, Chapter 3)
Other readings
  • Faloutsos, C., Signature files. (Frakes and Baeza-Yates, Chapter 4)
  • Gonnet, G.H., Baeza-Yates, R.A., Lee, W., New Indices for Text: PAT trees and PAT arrays. (Frakes and Baeza-Yates, Chapter 5)

Week 3: Descriptive metadata 1

Discussion class
Other readings

Week 4: Descriptive metadata 2 / Automatic indexing 1

Discussion class
  • Fox, C., Lexical Analysis and Stoplists. (Frakes and Baeza-Yates, Chapter 7)
    [Do not study the details of the computer code in 7.8, 7.9, and 7.10.]
Other readings
  • Andy Powell, DC-Dot, Dublin Core Metadata Editor. http://www.ukoln.ac.uk/metadata/dcdot/
  • Charlotte Jenkins and Dave Inman, Server-side Automatic Metadata Generation using Qualified Dublin Core and RDF. 2000 Kyoto International Conference on Digital Libraries: Research and Practice, Kyoto, Japan, November 13-16, 2000.
  • Forest Press, Dewey Decimal Classification. http://www.oclc.org/oclc/fp/

Week 5: Automatic indexing 2

Discussion class [No discussion class.]
Other readings

Week 6: Retrieval evaluation

Discussion class
  • Frakes, W.B., Stemming Algorithms. (Frakes and Baeza-Yates, Chapter 8)
Other readings
  • Cleverdon, Cyril William. Report on the testing and analysis of an investigation into the comparative efficiency of indexing systems. Cranfield, England, College of Aeronautics; 1962. 305p. LC: 63-60414.
  • Cleverdon, Cyril William. The Cranfield tests on index language devices, in ASLIB Proceedings, June 1967, v. 19, n. 6, pp. 173-194.
  • Text Retrieval Conferences (TREC). http://trec.nist.gov/

Week 7: Thesauruses 1

Discussion class
  • [No discussion class.]
Other readings

Week 8: Thesauruses 2

Discussion class
  • Srinivasan, P., Thesaurus construction. (Frakes and Baeza-Yates, Chapter 9)
Other readings

Week 9: Ranking algorithms

Discussion class
  • Harman, D., Ranking algorithms. (Frakes and Baeza-Yates, Chapter 14)
Other readings

Week 10: User interfaces

Discussion class
Other readings

Week 11: User interfaces / query modification

Discussion class
  • Harman, D., Relevance feedback and other query modification techniques. (Frakes and Baeza-Yates, Chapter 11)
Other readings

Week 12: Beyond text / web search systems

Discussion class
Other readings

Week 13: Beyond text / Boolean methods

Discussion class
Other readings
  • William Y. Arms, "Automated digital libraries." D-Lib Magazine, July/August 2000. http://www.dlib.org/dlib/july00/07contents.html
  • Wartik, S., Boolean operators. (Frakes and Baeza-Yates, Chapter 12)
  • Fox, E., Betrabet, S., Koushik, M., Lee, W., Extended Boolean models. (Frakes and Baeza-Yates, Chapter 15)
  • Harman, D., Relevance feedback and other query modification techniques. (Frakes and Baeza-Yates, Chapter 11, Section 11.2.6)

Week 14: Clustering algorithms

Discussion class
  • Rasmussen, E., Clustering algorithms. (Frakes and Baeza-Yates, Chapter 16)
Other readings


William Y. Arms
(wya@cs.cornell.edu)
Last changed: April 26, 2001