CS 501 Software Engineering: Project Suggestion

CS 501
Software Engineering
Spring 2005

Project Suggestion: Library of Congress Classifications

Library of Congress Classifications

Libraries have organized materials by subject for many decades, if not longer. Librarians have devised systems that sort materials linearly by subject while preserving a subject hierarchy of broader and narrower terms. One such system is the Library of Congress Classification (LCC).

Two projects have been proposed:

A Platform for Exploring Scalability of the Hierarchical Interface to Library of Congress Classification in the Virtual (Book) Spine Viewer

Client

Adam Chandler, Cornell University Library, alc28@cornell.edu.

Description

The Hierarchical Interface to Library of Congress Classification (HILCC) is a method for mapping Library of Congress classification numbers to subject headings within a tree structure. It was originally designed for the Columbia University Libraries collection of 25,000 electronic journals and is in production at: http://www.columbia.edu/cu/lweb/eresources/ejournals/subjects/index.html.

For more information about Columbia HILCC, see: http://www.columbia.edu/cu/libraries/inside/projects/metadata/classify/.
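
The kind of mapping described above, from a classification number to a subject heading in a tree, can be sketched roughly as follows. This is only an illustration under stated assumptions: the table `RANGES`, its number ranges and headings, and the function names `parse_class_number` and `subject_path` are all hypothetical and are not taken from the actual HILCC data or software.

```python
import re

# Hypothetical mapping table: (class letters, low number, high number, subject path).
# The ranges and headings are illustrative only; they are NOT actual HILCC data.
RANGES = [
    ("QA", 1.0, 74.99, ("Science", "Mathematics")),
    ("QA", 75.0, 76.95, ("Science", "Computer Science")),
    ("QC", 1.0, 999.0, ("Science", "Physics")),
]

# Leading class letters followed by the class number, e.g. "QA76.9" in "QA76.9 .D3".
CALL_NUMBER = re.compile(r"^\s*([A-Z]{1,3})\s*(\d+(?:\.\d+)?)")


def parse_class_number(call_number):
    """Return (letters, number) for a call number, or None if it cannot be parsed."""
    match = CALL_NUMBER.match(call_number.upper())
    if match is None:
        return None
    return match.group(1), float(match.group(2))


def subject_path(call_number):
    """Return the subject path (broadest heading first) for a call number, or None."""
    parsed = parse_class_number(call_number)
    if parsed is None:
        return None
    letters, number = parsed
    for cls, low, high, path in RANGES:
        if letters == cls and low <= number <= high:
            return path
    return None
```

Under these assumptions, `subject_path("QA76.9 .D3")` looks up the class letters `QA` and the number `76.9` and returns the path for the range that contains them. A real mapping table would be far larger and would need to handle gaps and overlapping ranges.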

As more and more academic print collections are moved offsite, the need to create better interfaces for browsing will gain urgency. With that problem in mind, Jim LeBlanc and Adam Chandler of the Cornell University Library, with the involvement of Karen Calhoun, investigated what would be required to scale up Columbia HILCC to accommodate an undergraduate (or "core") collection of some 150,000 titles. The details of the study are available at: http://www.library.cornell.edu/cts/browseandextend/.

Scaling up a tree structure such as HILCC depends on how well browsing within the interface functions. Even the best Web site interfaces really only allow users to browse a few dozen titles before they get bored and move on. Naomi Dushay's Virtual (Book) Spine Viewer is an intriguing companion to the Cornell Library HILCC investigation for print collections. The purpose of this project would be to insert the HILCC tree structure (using data from the Uris print collection) into the Virtual (Book) Spine Viewer, then debug and refine the interface. At that point, some very interesting usability testing could be conducted with this demo system, and the findings would be applicable to both print and digital collections.

A Visualizer for the Library of Congress Classifications

Client

Naomi Dushay, National Science Digital Library, Naomi@cs.cornell.edu.

Description

See: http://infoviz.comm.nsdl.org/cgi-bin/wiki.pl?MEngLCCViz

Information visualizations of the LCC would be of interest to the NSDL, if not the greater library community. LCC visualization could be used to inform NSDL users about the scope and contents of an online collection such as the National Science Digital Library (http://nsdl.org), while also helping educate NSDL users about subject hierarchies in general -- a concept they will encounter in libraries all over the world.

We need a clear way to communicate classification system concepts to end users:

what are the topics/concepts?
how are the topics/concepts related to each other?
what are the general shape, flow, and relations of all the topics?

Goal of this project: one or more useful interactive visualizations of LCC or a subset of it, preferably with free tools or open source code. One or more of these visualizations should be ready to be incorporated into the National Science Digital Library, as well as a toolkit for other library collections. The visualizations need to be as easily configured as possible to accommodate different contexts of use. Ideally, the work will allow a variety of data to be visualized.

The NSDL has an electronic version of the entire LCC (600,000 records). Each record contains information about how it fits into the entire LCC hierarchy. We would initially be interested in a visualization of only those topics pertinent to the NSDL, scoping the problem somewhat.
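
As a rough sketch of how records that describe their own place in the hierarchy might be turned into a tree and scoped to a subset of topics, consider the following. The record layout (`record_id`, `parent_id`, `caption`), the sample records, and the function names are assumptions made for illustration; the actual format of the NSDL's LCC records is not described here and may differ.

```python
from collections import defaultdict

# Hypothetical records: (record_id, parent_id, caption).
# The real LCC records may encode hierarchy differently; this layout is assumed.
RECORDS = [
    ("Q", None, "Science"),
    ("QA", "Q", "Mathematics"),
    ("QC", "Q", "Physics"),
    ("QA76", "QA", "Computer science"),
    ("N", None, "Fine Arts"),
    ("NA", "N", "Architecture"),
]


def build_children(records):
    """Index the records by parent so each node's children can be looked up."""
    children = defaultdict(list)
    for record_id, parent_id, _caption in records:
        children[parent_id].append(record_id)
    return children


def subtree(records, roots):
    """Return (depth, record_id, caption) tuples for the given roots, depth-first."""
    children = build_children(records)
    captions = {record_id: caption for record_id, _parent, caption in records}
    result = []
    stack = [(root, 0) for root in reversed(roots)]
    while stack:
        node, depth = stack.pop()
        result.append((depth, node, captions[node]))
        # Push children in reverse so they are visited in their original order.
        for child in reversed(children.get(node, [])):
            stack.append((child, depth + 1))
    return result
```

Calling `subtree(RECORDS, ["Q"])` keeps only the science branch and leaves out the `N` branch, which is one simple way to restrict a visualization to the topics pertinent to a particular collection. The depth value in each tuple could drive indentation or nesting in whatever visualization tool is chosen.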

For more information on some LCC visualization thoughts, see: http://infoviz.comm.nsdl.org/cgi-bin/wiki.pl?ClassVizReq.

For information on why bibliographic metadata poses visualization challenges, see: http://infoviz.comm.nsdl.org/cgi-bin/wiki.pl?VizChallenges

For more information about how libraries organize materials: http://infoviz.comm.nsdl.org/cgi-bin/wiki.pl?CallNumbers

For more general information about information visualization: http://infoviz.comm.nsdl.org/cgi-bin/wiki.pl

(Technical) Requirements: This project requires self-directed research into available information visualization tools and approaches (we have the documents for a decent start; see above), and a variety of programming and non-programming skills to get from the available electronic records to a demonstrable visualization. XML parsing may or may not be required. Extensive programming may or may not be required, depending on the software chosen. Excellent documentation skills are needed to track the software investigated, to justify the decisions made, and to make clear how to use the end products.

William Y. Arms
(wya@cs.cornell.edu)
Last changed: January 22, 2005