CS 5150 Software Engineering: Project Suggestion

CS 5150
Software Engineering
Fall 2009

Project Suggestion:
Legal Information Institute

The Legal Information Institute

Client

Tom Bruce, Director of the Legal Information Institute, Cornell Law School, trb2@cornell.edu.

The Legal Information Institute

Cornell Law School's Legal Information Institute (LII) is a pre-eminent publisher of open access electronic legal information. It accounts for over 20 percent of Cornell's Web traffic, reaches users in more than 200 countries and territories, and receives more than a million page views each day. It is a leader in developing applications that work with legal information and make it more accessible to the public.

In previous years, there have been several successful CS 501 projects for the Legal Information Institute.

The Spaeth database of Supreme Court statistics

The LII receives numerous questions about the voting records of Supreme Court justices. These come from a wide range of people, including high-school and college students writing papers, political-science researchers, ordinary citizens, and journalists. For example, the LII has done research projects for the staff of Sixty Minutes, and for a reporter at the Washington Post writing a book on Clarence Thomas.

The main source of answers for such questions is the Spaeth database, a comprehensive database of Supreme Court statistics developed and maintained by political scientists. It is very difficult to understand and use, which is one reason that people come to the LII for answers rather than consult the Spaeth database directly. Another reason is that the LII is easier to find, and widely recognized as a publisher of Supreme Court opinions.

The purpose of this high-visibility project is to make the information in the Spaeth database easy for an average person to use, and to capture collective wisdom about its contents. Part of the challenge is that its underlying data model is hard to understand, and part is that the database itself is very compactly (some would say cryptically) encoded, in the style of social scientists of perhaps 40 years ago.

Development and capture of user-contributed queries

The project will take a generalized approach to the development and capture of user-contributed queries. Researchers and other data-compilers, such as governments, increasingly have the ability (and the desire) to expose large compilations of data to the public via the Internet. Typically the data is made available to the public through dynamic web pages that, on one level, are simply the output of those database queries that the publisher believes the audience might like to make, expressed as a web page or pages.

A difficulty with this approach is that the public can view the data only in those ways that the publisher can anticipate and hard-wire into prepackaged query-and-display systems. It would be better if we could build systems that allowed users to build and capture their own queries about the underlying datasets, since it is virtually impossible for the publishers of datasets to anticipate all of the audiences for their information. The need for this is particularly urgent at a time when government is about to release large quantities of data to the public as bulk XML.

The project, then, is to build a system that will show an (arbitrary) XML database to an audience that may or may not be knowledgeable about its contents and allow that audience to build queries about the data, store those queries under descriptive labels, and share them with others. The user should not need to understand either XML or any particular query language. In specific terms:

The system should present the structure and content of a suitably-annotated XML database in a way that allows the user to select those elements she wishes to know about (or combine into a query). It would be reasonable to require that the XML schema for the database be annotated in ways that would be helpful to the system (for example, that elements be described in ways that the system can recycle into labels for UI purposes).
The system should allow the user to create queries based on those elements, taking advantage of both structure and content of the database, possibly via a visual interface.
The system should allow the user to store those queries under descriptive labels for reuse, and to publish those queries for use by others (possibly under a three-level system similar to Facebook -- me, my friends, the world).
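
The three requirements above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not a proposed design: the XML element names, the sample values, the LABELS annotations, and the SavedQuery and QueryStore classes are all hypothetical, and the real Spaeth encoding is not represented here. It shows annotations being recycled into UI labels, a user's selections being translated into a path expression so the user never writes one, and queries being stored under descriptive labels with three visibility levels.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical sample data: element names and values are invented for
# illustration and do not reflect the real Spaeth database.
SAMPLE_XML = """
<cases>
  <case term="2001">
    <vote justice="Justice A" direction="majority"/>
    <vote justice="Justice B" direction="dissent"/>
  </case>
  <case term="2002">
    <vote justice="Justice A" direction="dissent"/>
    <vote justice="Justice B" direction="majority"/>
  </case>
</cases>
"""

# Hypothetical schema annotations: human-readable labels for UI purposes.
LABELS = {
    "case/@term": "Court term",
    "vote/@justice": "Justice",
    "vote/@direction": "Voted with",
}

VISIBILITY_LEVELS = ("me", "friends", "world")


def field_menu():
    """Return the labels a UI could show instead of raw element names."""
    return [LABELS[key] for key in sorted(LABELS)]


@dataclass
class SavedQuery:
    label: str            # descriptive label chosen by the user
    owner: str
    element: str          # element the user selected, e.g. "vote"
    filters: dict         # attribute name -> required value
    visibility: str = "me"

    def to_path(self):
        """Translate the user's selections into an ElementTree path."""
        predicates = "".join(
            f"[@{name}='{value}']" for name, value in sorted(self.filters.items())
        )
        return f".//{self.element}{predicates}"

    def run(self, root):
        """Return all elements in the tree that match the stored query."""
        return root.findall(self.to_path())


class QueryStore:
    """In-memory store with three sharing levels: me, friends, world."""

    def __init__(self):
        self._queries = []
        self._friends = {}  # owner -> set of friend names

    def add_friend(self, owner, friend):
        self._friends.setdefault(owner, set()).add(friend)

    def save(self, query):
        if query.visibility not in VISIBILITY_LEVELS:
            raise ValueError(f"unknown visibility: {query.visibility}")
        self._queries.append(query)

    def visible_to(self, user):
        """Return the queries that the given user is allowed to see."""
        visible = []
        for query in self._queries:
            is_friend = user in self._friends.get(query.owner, set())
            if (query.owner == user
                    or query.visibility == "world"
                    or (query.visibility == "friends" and is_friend)):
                visible.append(query)
        return visible


root = ET.fromstring(SAMPLE_XML)
query = SavedQuery(
    label="Dissents by Justice A",
    owner="alice",
    element="vote",
    filters={"justice": "Justice A", "direction": "dissent"},
    visibility="friends",
)
store = QueryStore()
store.add_friend("alice", "bob")
store.save(query)
```

In this sketch, a user named bob (a friend of alice) would see the stored query and could rerun it against the same data, while an unrelated user would not see it. A real system would need persistent storage, authentication, and a far richer query model than attribute equality.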

This generalized problem is a major software challenge. Therefore, the target for this project is narrower: the development and capture of queries to the Spaeth database of Supreme Court voting records.

wya@cs.cornell.edu
Last changed: May, 2009