CS 501
Software Engineering
Spring 2007
Project Suggestions: Legal Information Institute
Client

Tom Bruce, Director, Legal Information Institute, trb2@cornell.edu.

The Legal Information Institute

Cornell Law School's Legal Information Institute (LII) is one of the most highly ranked sources of legal information on the Web. It is also one of the most heavily used web sites at Cornell, with approximately ten million hits per week. In previous years, there have been several successful CS 501 projects for the Legal Information Institute. This year, two projects have been proposed.

E-rulemaking

Each year, the Federal government creates thousands of new regulations covering everything from the labelling of organic foods to the operation of nuclear power plants. Many of these regulations are created through a process known as "notice and comment" rulemaking, in which the public has an opportunity to comment directly to the agency about proposed regulations. Often, a particular rulemaking raises a number of different issues: analysts working on one recent Department of Transportation regulation identified 38 separate issues within the regulation that are likely to attract public response, and some or all of these might be addressed in a single comment. While the median number of comments for a given rulemaking is fairly small, some high-profile rulemakings have attracted as many as half a million comments, making management and categorization of these submissions an important challenge for all Federal regulatory agencies.

Natural language processing (NLP) techniques promise to help greatly in this process. A multidisciplinary team at Cornell is investigating the use of NLP to automatically tag or sort comments depending on which of a set of issues they address.
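To make the sorting task concrete, here is a minimal Python sketch of grouping comments by the issues they address. It assumes the issue labels have already been assigned (by the NLP software or by an analyst); the `Comment` class, the `sort_into_folders` function, and the "Unsorted" folder name are hypothetical names chosen for illustration, not part of the Cornell team's system.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Comment:
    """A public comment; `issues` holds the issue labels assigned to it."""
    comment_id: str
    text: str
    issues: Set[str] = field(default_factory=set)


def sort_into_folders(comments: List[Comment]) -> Dict[str, List[str]]:
    """Group comment ids by issue.

    A comment that addresses several issues appears in several folders.
    Comments with no labels go to a hypothetical "Unsorted" folder.
    """
    folders = defaultdict(list)
    for comment in comments:
        # sorted() returns an empty list for an untagged comment,
        # so the fallback folder is used in that case.
        for issue in sorted(comment.issues) or ["Unsorted"]:
            folders[issue].append(comment.comment_id)
    return dict(folders)
```

Because a single comment may address several issues, each folder stores comment ids rather than copies of the comment, so one comment can be listed under every issue it touches.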
If these experimental techniques are to be successful in the field, agency personnel will need to be comfortable with an application that permits them to interact in various ways with the NLP software as they read, analyze, annotate, and consider the comments submitted by the public. This is a high-profile prototype application that will, we expect, be the model for future systems deployed in Federal agencies. It (or some evolution of it) will be tested at the Departments of Commerce and Transportation.

The project is to construct a client application for the handling, sorting, and annotation of comments, modelled on popular e-mail clients such as Thunderbird and Outlook. It will need to support the management of comments in a set of issue-oriented folders, interacting with the NLP software that does the initial categorization of comments and feeding corrections back into the NLP system as human analysts interact with its output. It must also support the annotation of comments in a manner similar to applications used by NLP researchers (such as Callisto). The application needs to permit simultaneous use by multiple users and should operate cross-platform. The tentative plan is an AJAX application with a back-end native-XML database, but this is open to discussion as the specification evolves. We have a preference for inexpensive, open-source solutions.

Project supervision will be by Tom Bruce, Director of the Cornell Legal Information Institute. His colleagues in the e-rulemaking project are Claire Cardie (FCIS), Cynthia Farina (Law), and Erica Wagner (Hotel).

Search Engine Interface

The appeal of the Legal Information Institute for users is strongly based in search and navigation functionality, which the LII wishes to improve. Specifically, it would like to add two features. The first is an AJAX-based "word wheel" interface similar to Google Suggest, to be added to the search interfaces for all LII collections.
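The core of a "word wheel" is a prefix lookup: as the user types, the server returns indexed terms that begin with the typed characters. The Python sketch below illustrates only that lookup, assuming an in-memory list of terms. The function names `build_index` and `suggest` and the sample vocabulary are hypothetical, and an actual solution would need to draw its terms from the LII's existing search index.

```python
from bisect import bisect_left
from typing import Iterable, List


def build_index(terms: Iterable[str]) -> List[str]:
    """Return a sorted, de-duplicated, lower-cased list of terms."""
    return sorted({term.lower() for term in terms})


def suggest(index: List[str], prefix: str, limit: int = 10) -> List[str]:
    """Return up to `limit` terms from the sorted index that start with `prefix`."""
    prefix = prefix.lower()
    if not prefix:
        return []
    # In a sorted list, all terms sharing a prefix are contiguous and
    # begin at the insertion point of the prefix itself.
    start = bisect_left(index, prefix)
    matches = []
    for term in index[start:start + limit]:
        if not term.startswith(prefix):
            break
        matches.append(term)
    return matches
```

An AJAX front end would call such a lookup on each keystroke and display the returned list under the search box. Binary search keeps each lookup fast even for a large vocabulary.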
The second feature, specific to the LII's US Code collection, is a structure-based "drill down" interface that could be used to guide users more effectively to search results in areas that interest them. The appearance and functionality of this interface are open; the project team might use the information on categorized search result visualization at http://www.cs.umd.edu/hcil/categorizedsearch/ as a general guide to what is expected.

Neither of these sub-projects is particularly challenging in itself (AJAX-based interfaces similar to Google Suggest are described in at least two elementary AJAX books), but each is constrained by the need to use legacy software, to retain existing functionality, and to create a minimal learning curve for staff programmers at the LII. In particular, solutions must be based on the swish-e search engine (preferably using its Perl API), must be bundled in such a way as to permit easy incorporation into existing LII page templates, and must permit, for example, the injection of metadata gathered from external sources into various subitems within lists of search results.

The impact of this project, for which team members will receive full attribution on the LII web site, should extend well beyond the LII to the general community of swish-e users.
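One way to picture a structure-based drill-down is as a count of search hits grouped by the next level of the US Code hierarchy beneath the user's current position. The Python sketch below is an illustration under stated assumptions only. It assumes each hit carries its structural path as a tuple such as (title, chapter, section). The function name `drill_down` and the sample paths are hypothetical, and a delivered solution would obtain hits from swish-e, preferably through its Perl API.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple


def drill_down(hits: Iterable[Tuple[str, ...]],
               path: Sequence[str] = ()) -> Dict[str, int]:
    """Count the hits that fall under `path`, grouped by the next level down.

    With an empty path, hits are grouped by their top-level element.
    """
    prefix = tuple(path)
    depth = len(prefix)
    counts = Counter()
    for hit in hits:
        # Keep only hits that lie strictly below the current position.
        if len(hit) > depth and tuple(hit[:depth]) == prefix:
            counts[hit[depth]] += 1
    return dict(counts)
```

Starting with an empty path gives per-title counts; passing the title the user selects as the path gives per-chapter counts within it, and so on down the hierarchy.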
William Y. Arms
(wya@cs.cornell.edu)
Last changed: January 18, 2006