Legal Information Institute: Save the Constitution
Thomas Bruce, Director, Legal Information Institute, Cornell Law School
Email: <tom@liicornell.org>
Advisor
Craig Newton, Legal Information Institute, Cornell Law School
Email: <craig@liicornell.org>
Student contact
Evan King <esk79@cornell.edu> is setting up a team for this project. If you are interested in joining the team, please contact him.
Who we are
The Legal Information Institute operates the single most active web site at Cornell, with approximately 150,000 unique visitors per day and over 100 million page views during the last calendar year. That is about two-thirds of all Cornell's web traffic.
Last year, the LII provided Federal statutes and regulations to an audience of more than 32 million people from 246 countries. Our work, and that of the students who collaborate with us, has high visibility at Cornell and within the legislative and executive branches of the Federal government.
Since 1992, the LII has been a leader in the application of Internet-based technologies to legal data. It was the first legal website, and one of the first 30 websites in the world. We have worked with several successful CS 5150 project teams over the years.
Project summary
Each year, the Congressional Research Service (CRS) produces a document called "The Constitution of the United States, Analysis and Interpretation". Popularly known as the "Constitution Annotated", or "CONAN", it provides legal analysis and interpretation of the Constitution, and particularly of Constitutional case law as decided by the Supreme Court. It is a very highly regarded source of information about the fundamentals of the American system of government, and one of a very few sources of information about the Constitution that is free of partisan bias.
In 1996, the LII received a copy of CONAN, in XML, from the editors at CRS who were responsible for its preparation. We have been unable to obtain an XML version since then; the only regularly updated, publicly available version is a PDF published by the Government Publishing Office (GPO). Repeated requests for an XML edition -- from the LII, the Sunlight Foundation, and members of the Senate Judiciary Committee -- have gone unanswered for nearly a decade.
Why is this such a big deal? For one thing, the PDF edition published by GPO is effectively unreadable on mobile devices. For another, CONAN has great value as data, associating very specific parts of the text of the Constitution with the court cases that interpret them. Finally, we know from experience that the public cares a great deal about this. During the middle 20 minutes of the first GOP Presidential primary debate of the 2016 election season, half a million people came to view the Fourteenth Amendment on our web site (it is essential to an understanding of current Federal policy on both healthcare and immigration). No doubt some up-to-date, non-partisan explanation would have been helpful.
We would like to create an XML version by extracting the text from the PDF edition published by the GPO, and from that XML version create a number of RDF repositories that model the important data contained in CONAN. Major features of such a project would include accurate extraction and inlining of footnotes, identification of "lines of cases" associated with different facets of the analysis, and (optionally) extraction and identification of print and other analytic resources identified in the text. We can provide some code libraries that would assist in the extraction of legal citations, and have fairly well-developed data models for both caselaw and for Constitutional concepts.
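To make the footnote and citation work more concrete, here is a minimal Python sketch of two of the steps described above: inlining footnotes into an XML paragraph, and pulling U.S. Reports citations out of footnote text. Everything specific in it is an assumption made for illustration: the [^N] marker convention, the para and footnote element names, and the function names are hypothetical, and the regular expression covers only citations of the form "347 U.S. 483". It is not the LII's citation library or data model, and real PDF extraction would need considerably more care.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: an earlier extraction pass has rewritten footnote references
# in the body text as [^N]. This marker convention is hypothetical.
MARKER = re.compile(r"\[\^(\d+)\]")

# Matches U.S. Reports citations such as "347 U.S. 483" only;
# other reporters and parallel citations are deliberately ignored here.
US_CITE = re.compile(r"\b(\d{1,3}) U\.S\. (\d{1,4})\b")


def _append_text(parent, last_child, text):
    """Append text after the last child, or to the parent if it has no children."""
    if last_child is None:
        parent.text = (parent.text or "") + text
    else:
        last_child.tail = (last_child.tail or "") + text


def inline_footnotes(body, notes):
    """Return a <para> element in which each known marker becomes a <footnote>.

    body  -- paragraph text containing [^N] markers
    notes -- dict mapping footnote number (as a string) to footnote text
    """
    para = ET.Element("para")  # hypothetical element name
    last = None  # most recently added child element
    pos = 0
    for match in MARKER.finditer(body):
        _append_text(para, last, body[pos:match.start()])
        number = match.group(1)
        if number in notes:
            last = ET.SubElement(para, "footnote", {"n": number})
            last.text = notes[number]
        else:
            # No matching note: keep the marker as literal text.
            _append_text(para, last, match.group(0))
        pos = match.end()
    _append_text(para, last, body[pos:])
    return para


def extract_us_citations(text):
    """Return (volume, page) pairs for each U.S. Reports citation in text."""
    return [(int(vol), int(page)) for vol, page in US_CITE.findall(text)]


if __name__ == "__main__":
    body = "Segregated schools are inherently unequal.[^1] See also later cases.[^2]"
    notes = {"1": "Brown v. Board of Education, 347 U.S. 483 (1954)."}
    print(ET.tostring(inline_footnotes(body, notes), encoding="unicode"))
    print(extract_us_citations(notes["1"]))
```

In this sketch, a marker with no matching footnote is left in the text rather than dropped, so that extraction gaps remain visible for later review. The (volume, page) pairs are the kind of identifier that could later be linked to a caselaw data model when building the RDF repositories.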