CS 5150
Software Engineering
Fall 2012

Project Suggestion: Federal Data Search and Discovery Software


Federal Data Search and Discovery Software

Client

Bill Block
wcb67@cornell.edu

Project Summary

The Labor Dynamics Institute, in collaboration with the Cornell Institute for Social and Economic Research (CISER), has been awarded funding from the National Science Foundation (NSF) to build a Comprehensive Extensible Data Documentation and Access Repository (CED²AR). The repository is designed to improve the documentation and discoverability of both public and restricted data from the federal statistical system. CED²AR will be based on leading metadata standards and will be designed to ingest documentation flexibly from a variety of sources.

Technical Summary

The CED²AR system will require extract-transform-load (ETL) “connectors” that move documentation from a number of heterogeneous data sources into an XML database repository. This repository will be based on the Data Documentation Initiative (DDI) schema (version 2.5) and will serve as the backend to a search and discovery API that implements the OAI-PMH standard (Open Archives Initiative Protocol for Metadata Harvesting). Finally, a user interface will be needed to interact with the API.
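
To make the pipeline described above more concrete, here is a minimal sketch of two of its pieces: the transform step of a connector, which maps a source record to a DDI-style XML fragment, and the construction of an OAI-PMH request URL. It is written in Python only for brevity; the project itself lists Java and XQuery. The function names `to_ddi_codebook` and `oai_pmh_url`, the sample study, and the base URL are hypothetical and chosen for illustration. The XML uses a handful of DDI Codebook element names but is heavily simplified and has not been validated against the 2.5 schema.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespace URI used by DDI Codebook 2.5 documents.
DDI_NS = "ddi:codebook:2_5"
ET.register_namespace("", DDI_NS)  # serialize with a default namespace

# The six request verbs defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}


def _q(tag):
    """Qualify a tag name with the DDI namespace (ElementTree notation)."""
    return "{%s}%s" % (DDI_NS, tag)


def to_ddi_codebook(title, variables):
    """Transform step (hypothetical): build a minimal codeBook element.

    `variables` is an iterable of (name, label) pairs taken from some
    source system; a real connector would map many more fields.
    """
    code_book = ET.Element(_q("codeBook"))
    stdy_dscr = ET.SubElement(code_book, _q("stdyDscr"))
    citation = ET.SubElement(stdy_dscr, _q("citation"))
    titl_stmt = ET.SubElement(citation, _q("titlStmt"))
    ET.SubElement(titl_stmt, _q("titl")).text = title

    data_dscr = ET.SubElement(code_book, _q("dataDscr"))
    for name, label in variables:
        var = ET.SubElement(data_dscr, _q("var"), {"name": name})
        ET.SubElement(var, _q("labl")).text = label
    return code_book


def oai_pmh_url(base_url, verb, **params):
    """Build an OAI-PMH GET request URL for the given verb and arguments."""
    if verb not in OAI_VERBS:
        raise ValueError("unknown OAI-PMH verb: %r" % verb)
    return base_url + "?" + urlencode({"verb": verb, **params})


if __name__ == "__main__":
    # Illustrative data only; not taken from any real codebook.
    doc = to_ddi_codebook("Example Study", [("age", "Age in years")])
    print(ET.tostring(doc, encoding="unicode"))
    # http://example.org/oai is a placeholder, not a real endpoint.
    print(oai_pmh_url("http://example.org/oai", "ListRecords",
                      metadataPrefix="oai_dc"))
```

In a fuller system the element returned by `to_ddi_codebook` would be loaded into the XML database, and a harvester would issue requests like the one built by `oai_pmh_url`. The `oai_dc` metadata prefix is the one every OAI-PMH repository must support; a prefix for DDI records would be defined by the repository itself.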

Technologies used extensively: Java, XQuery, SQL, jQuery, DDI 2.5, BaseX, PostgreSQL

Opportunity

This project has the potential to make an enormous impact on the way research is conducted using data from the federal statistical system. It is also an opportunity to work closely with a number of Cornell faculty members in Economics and Information Science.
