CS 501
Software Engineering
Fall 2000

Project Suggestions

Legal Information Institute


Client

Thomas R. Bruce, Co-Director of the Legal Information Institute (trb2@cornell.edu)

Project outline

Cornell's Legal Information Institute is the premier open access legal service on the Internet (http://www.law.cornell.edu/).  This project requires the development of tools that will take raw legal information and convert it to XML for reuse.

United States Code Conversion 
 
The United States Code is released to the general public by the US House of Representatives on its Web site (at http://uscode.house.gov/download.htm). The release is a fairly plain-vanilla ASCII version to which the Legal Information Institute adds value (visible at http://www4.law.cornell.edu/uscode/). This value is currently added by an "HTMLizer" that handles formatting, internal cross-linking, and similar tasks. The Legal Information Institute would like a program, or suite of programs, to convert the raw ASCII output of the House of Representatives to XML for subsequent reuse in various settings.
 
This project would be a flagship for the Legal Information Institute. The US Code currently gets about half a million hits daily.
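As an illustration of the ASCII-to-XML step described above, the following is a minimal sketch in Python (the document leaves the technical environment open, so Python is used here only for illustration). It assumes, hypothetically, that each section of the raw text begins on a line such as "Sec. 1. Words denoting number"; the real House release format may differ. The element names `title`, `section`, `heading`, and `para` are invented for the example and are not an LII schema.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical heading pattern: assumes a section starts on a line like
# "Sec. 101. Definitions". The actual House ASCII format may differ.
SECTION_RE = re.compile(r"^\s*Sec\.\s+(\d+[A-Za-z-]*)\.\s+(.*\S)\s*$")


def ascii_to_xml(text: str) -> ET.Element:
    """Group plain-text lines into <section> elements keyed on heading lines.

    Element names are illustrative only. Lines that appear before the
    first recognized heading are ignored in this sketch.
    """
    root = ET.Element("title")
    current = None
    for line in text.splitlines():
        match = SECTION_RE.match(line)
        if match:
            # Start a new section; keep the section number as an attribute.
            current = ET.SubElement(root, "section", num=match.group(1))
            ET.SubElement(current, "heading").text = match.group(2)
        elif line.strip() and current is not None:
            # Each non-blank body line becomes a paragraph of the current section.
            ET.SubElement(current, "para").text = line.strip()
    return root
```

Calling `ET.tostring(ascii_to_xml(raw_text), encoding="unicode")` yields the XML as a string. A real converter would also need to handle nested subsections, notes, and cross-references, which this sketch does not attempt.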
 
Publication of the decisions of the New York Court of Appeals
 
The New York Court of Appeals issues its opinions in two formats (currently HTML and WordPerfect). The Legal Information Institute would like conversion software to convert the "raw" opinions to XML.

Both projects are exercises in flexible design and in dealing with exceptions to hazy rules. The principal challenge, simply put, is that in both cases the input appears regular and consistent but is in fact highly variable in subtle but important ways. The requirements for these projects put a great deal of emphasis on transparent design, reusability, and structure, since the input formats inevitably change from year to year and code maintenance is a major concern.
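One way to keep "exceptions to hazy rules" visible rather than silently mishandled is to drive the converter from a rule table and report any line the rules cannot classify unambiguously. The Python sketch below is a minimal illustration under stated assumptions: the tag names and regular expressions in `RULES` are hypothetical and are not taken from the Court of Appeals' actual format. Keeping the rules in one data structure means a format change usually requires editing the table rather than the control flow.

```python
import re

# Hypothetical rule table: tag names and patterns are invented for
# illustration, not derived from the Court of Appeals' real output.
RULES = {
    "caption": re.compile(r"^[A-Z0-9][A-Z0-9 .,'&-]*$"),   # all-caps line
    "docket": re.compile(r"^No\.\s+\d+\s*$", re.IGNORECASE),
}


def classify(lines, rules=RULES):
    """Tag each line by the rules it matches.

    Lines matching no rule are treated as ordinary text. Lines matching
    more than one rule are tagged "ambiguous" and also returned in a
    separate exceptions list, so a person can tighten the rules instead
    of the converter guessing silently.
    """
    tagged, exceptions = [], []
    for lineno, line in enumerate(lines, start=1):
        hits = [name for name, pattern in rules.items() if pattern.match(line)]
        if not hits:
            tagged.append(("text", line))
        elif len(hits) == 1:
            tagged.append((hits[0], line))
        else:
            tagged.append(("ambiguous", line))
            exceptions.append((lineno, line, hits))
    return tagged, exceptions
```

For example, an all-caps docket line such as "NO. 12" matches both hypothetical rules, so it is reported as an exception with its line number instead of being assigned to whichever rule happens to be checked first.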

Technical

You can select the technical environment for this project.  Work in this area is typically carried out in Perl.



William Y. Arms

(wya@cs.cornell.edu)
Last changed: August 22, 2000