CS 502
Architecture of Web Information Systems and Digital Libraries
Spring 2002

Assignments

 

Assignment 0 (January 29, 2002)
Assignment 1 (February 14, 2002)
Assignment 2 (March 7, 2002)
Assignment 3 (April 18, 2002)
Assignment 4 (May 15, 2002)

Philosophy

There will be several programming assignments throughout the semester. The assignments are designed to give students some practical experience with the technologies that make the Web and digital libraries work. In general, the assignments will require students to understand the relevant protocol or specification documents and to write a small to moderate amount of Java code that demonstrates an understanding of those specifications.

These assignments are not a test of your programming skills. Rather, they are meant to encourage you to read protocol specifications and APIs. In the real world this is not done in isolation. Thus, students are expected to work in groups on these assignments. At the beginning of the semester the class will break up into groups of three that will remain together for the remainder of the semester. Members of the group are expected to share information, jointly understand protocol documents and APIs, and write the final code product. Every member of the group will receive the same grade for the assignment, and it is the responsibility of the group to ensure that the work is apportioned fairly.

Prerequisites

The assignments assume that students can program in Java and understand how to download and use class libraries. No Java or programming tutorials will be offered.

Grading

This is not a programming course. Imaginative algorithms or data structures will neither be required nor play a role in grading. Instead, grading will be based on completion of the assigned task and demonstrated understanding of the concepts and protocols underlying the assignment. Nevertheless, assignments should demonstrate good programming practices and documentation commensurate with the 500 level of this course.

Programming Environment

Metrowerks CodeWarrior and Borland JBuilder are the preferred environments for developing and testing programs. One of the two is required for submitting assignments.  CodeWarrior is in all the CIT and CSUG labs.  A personal version of JBuilder is available for free at http://www.borland.com/jbuilder/personal/. The assignments have been tested in both environments.

Submitting Assignments

All assignments are due by the beginning of the lecture on the due date.  NO LATE ASSIGNMENTS WILL BE ACCEPTED.

To identify your assignments and make grading easier, assignments MUST conform to the following guidelines:

Submissions that fail to conform to these guidelines will be rejected.

Work Groups

The assignment groups are:

Assignment 0 - Due 1/29/2002

The purpose of this assignment is to ensure that you are familiar with the assignment submission process.  

Resources for assignment 0

Directions for assignment 0

Write a Java program that prints out your name and the assignment number in the format "FirstLastAssignment#".
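A minimal sketch of such a program is shown below. The name "Jane Doe" is a hypothetical placeholder (substitute your own), and the sketch assumes that "#" stands for the assignment number, so the output for this assignment would be JaneDoeAssignment0.

```java
class JaneDoeAssignment0 {
    // Hypothetical placeholder name; replace with your own first and last name.
    static final String FIRST = "Jane";
    static final String LAST = "Doe";
    static final int ASSIGNMENT = 0;

    /** Builds the required "FirstLastAssignment#" string. */
    static String label(String first, String last, int assignment) {
        return first + last + "Assignment" + assignment;
    }

    public static void main(String[] args) {
        System.out.println(label(FIRST, LAST, ASSIGNMENT)); // prints JaneDoeAssignment0
    }
}
```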

Assignment 1 - Due 2/14/2002

The purpose of this assignment is to develop some experience with the HTTP protocol that serves as the basis of the World Wide Web. Most of us experience the Web only through a browser, such as Internet Explorer, but there is interesting and relatively simple technology under the covers.
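To make "under the covers" concrete, the sketch below does by hand what a browser does: it opens a plain TCP socket to port 80, sends an HTTP/1.0 GET request, and prints the status code and response headers. It is an illustration only, under stated assumptions: the default host `www.example.com` is a placeholder, and HTTPS, redirects, and chunked transfer encoding are not handled.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class RawHttpGet {
    /** Builds an HTTP/1.0 GET request: request line, Host header, then a blank line (all CRLF-terminated). */
    static String buildGetRequest(String host, String path) {
        return "GET " + path + " HTTP/1.0\r\n"
             + "Host: " + host + "\r\n"
             + "\r\n";
    }

    /** Extracts the numeric status code from a status line such as "HTTP/1.0 200 OK". */
    static int parseStatusCode(String statusLine) {
        String[] parts = statusLine.split(" ", 3);
        if (parts.length < 2 || !parts[0].startsWith("HTTP/")) {
            throw new IllegalArgumentException("Not an HTTP status line: " + statusLine);
        }
        return Integer.parseInt(parts[1]);
    }

    public static void main(String[] args) throws IOException {
        String host = args.length > 0 ? args[0] : "www.example.com"; // placeholder host
        String path = args.length > 1 ? args[1] : "/";
        try (Socket socket = new Socket(host, 80)) {
            OutputStream out = socket.getOutputStream();
            out.write(buildGetRequest(host, path).getBytes(StandardCharsets.US_ASCII));
            out.flush();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.ISO_8859_1));
            String statusLine = in.readLine();
            if (statusLine == null) {
                throw new IOException("Server closed the connection without a response");
            }
            System.out.println("Status code: " + parseStatusCode(statusLine));
            // Response headers end at the first empty line; the body (not printed) follows.
            String line;
            while ((line = in.readLine()) != null && !line.isEmpty()) {
                System.out.println(line);
            }
        }
    }
}
```

Running it as `java RawHttpGet www.example.com /` should print a status line code followed by headers such as `Content-Type`. HTTP/1.0 is used so that the server closes the connection after one response, which keeps the reading loop simple.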

Resources for assignment 1

Directions for assignment 1

Write a Java program that:

Assignment 2 - Due 3/7/2002

The purpose of this assignment is to develop experience parsing and manipulating XML documents using DOM, SAX, and XSLT. XML is a fundamental part of the toolset that moves the Web from a network of documents to a globally distributed database.

Resources for assignment 2

Directions for assignment 2

This assignment has three parts, described below. Please submit the assignment as one zip file. The zip file should contain three directories, part1, part2, and part3, corresponding to the parts described below. The part1 and part2 directories will be JBuilder project directories, and part3 will contain a single XSL file.

Part 1: DOM processing

Write a Java program that:
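As general orientation for DOM processing, here is a minimal sketch using the JAXP parser bundled with the JDK (`javax.xml.parsers`). DOM parses the whole document into an in-memory tree that can then be searched and navigated; the sketch collects the text of every element with a given tag name. The sample XML and the `title` element name are made-up inputs, not the assignment's data.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class DomSketch {
    /** Parses an XML string into a DOM tree and returns the trimmed text of each element named tagName. */
    static List<String> textOf(String xml, String tagName) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        NodeList nodes = doc.getElementsByTagName(tagName); // matches in document order
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent().trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<books><book><title>First Title</title></book>"
                   + "<book><title>Second Title</title></book></books>";
        for (String title : textOf(xml, "title")) {
            System.out.println(title);
        }
    }
}
```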

Part 2: SAX Processing

Write a Java program that:
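In contrast to DOM, SAX never builds a tree: the parser streams through the document and calls back into a handler for each event (start tag, text, end tag). The minimal sketch below, again using the JDK's bundled JAXP parser, counts how often each element name occurs. The counting task and sample input are illustrative choices, not the assignment's actual requirements.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SaxSketch {
    /** Streams through the document and counts occurrences of each element name, without building a tree. */
    static Map<String, Integer> countElements(String xml) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by element name
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                counts.merge(qName, 1, Integer::sum); // called once per start tag
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return counts;
    }

    public static void main(String[] args) {
        String xml = "<books><book><title>A</title></book><book><title>B</title></book></books>";
        System.out.println(countElements(xml)); // {book=2, books=1, title=2}
    }
}
```

Because SAX holds only the current event in memory, this style suits very large documents, at the cost of having to keep any needed context (such as the current parent element) in the handler yourself.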

Part 3: XSL Processing

Write an XSL document that processes an XML document using Saxon and produces an HTML document formatted as meta_report.html. Notes:
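For testing a stylesheet from Java, a minimal sketch using the JAXP TrAX interfaces (`javax.xml.transform`) is shown below. As written it runs with the XSLT processor bundled with the JDK. With the Saxon jar on the classpath, the factory is normally discovered automatically, or it can be forced by setting the system property `javax.xml.transform.TransformerFactory` to Saxon's factory class (for example `net.sf.saxon.TransformerFactoryImpl`; the exact class name depends on the Saxon version, so treat it as an assumption). The inline stylesheet and input are made-up examples, not meta_report.html.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltSketch {
    /** Applies an XSLT stylesheet to an XML document and returns the serialized result. */
    static String transform(String xml, String xsl) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance(); // JDK default or Saxon
            Transformer transformer = factory.newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<books><book><title>A</title></book><book><title>B</title></book></books>";
        // Turns each book title into an HTML list item.
        String xsl =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
          + "<xsl:output method='html'/>"
          + "<xsl:template match='/'><html><body><ul>"
          + "<xsl:for-each select='books/book'><li><xsl:value-of select='title'/></li></xsl:for-each>"
          + "</ul></body></html></xsl:template>"
          + "</xsl:stylesheet>";
        System.out.println(transform(xml, xsl));
    }
}
```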

Assignment 3 - Due 4/18/2002

The purpose of this assignment is to give you some experience with the most developed of the RDF tools, the Jena toolkit developed by HP Labs in Bristol, UK. RDF is one of the basic building blocks of the Semantic Web and provides the primitives for ontology development and processing.

Resources for assignment 3

Directions for assignment 3

This assignment has three parts. Please submit the assignment as one zip file. The zip file should contain two directories, part1 and part3, corresponding to the parts described below. The part1 directory should contain one file, an RDF schema. The part3 directory should contain a JBuilder project. Note that part 2 of the assignment has no deliverable.

Part 1: RDF schema extension

The ABC schema at ABC.rdfs provides basic resource classes and property types that can be extended for specific community uses. You will write an RDF schema, pubABC.rdfs, that derives from basic ABC concepts for use in the publishing community.

This schema defines the following classes:

This schema defines the following properties:

Note the following:

Part 2: RDF Modeling

In this part of the assignment you will build an RDF model that uses entities from three namespaces:

  1. http://metadata.net/harmony# - the ABC ontology
  2. http://purl.org/dc/terms# - the Dublin Core properties
  3. http://BooksAreUs.com/pubABC# - the extension to the ABC properties that you developed in Part 1 of this assignment.

You should develop an RDF model that describes the following:

A publishing event took place on January 1, 1999. The event involved an author agent named "Mary Doe" and a publishing agent named "HB publishers". The event leads to a situation in the context of which a hard cover book called "APIs are Fun - First Release" exists. A follow-on publishing event took place on January 1, 2000. This event involved an illustrator agent named "John Smith" and a publishing agent named "SB publishers". The event leads to a situation in the context of which a soft cover book called "APIs are Fun - Illustrated" exists. The soft cover and hard cover books are both realizations of a work called "APIs are Fun".

Please note the following in completing the above:

Part 3: RDF Model Programming

The Jena toolkit provides an API for building and manipulating RDF models. Use Jena to write a program that does the following, based on the RDF model you developed in Part 2:

  1. Builds a copy of the model in memory.
  2. Prints out the values of all Dublin Core 'title' properties (no special formatting required).
  3. Prints out the dates of all the publishing events (no special formatting required).

Before working on this programming task, you will find it extremely useful to work through the tutorials in the Jena release (you will find them in the tutorial directory).
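Jena's own API is not reproduced here. As a library-free sketch of the same three steps, the code below stores an RDF model as plain subject-predicate-object triples and answers the two queries by scanning them, which is conceptually what selecting statements from a Jena model does. The namespace URIs come from Part 2 and `rdf:type` is the standard RDF type property, but the resource URIs, the `PublishingEvent` class name, and the use of a Dublin Core `date` property for event dates are hypothetical choices, not taken from the ABC or pubABC schemas. Agents and situations are omitted for brevity.

```java
import java.util.ArrayList;
import java.util.List;

class TripleSketch {
    // Namespaces from Part 2, plus the standard rdf:type property.
    static final String DC = "http://purl.org/dc/terms#";
    static final String PUB = "http://BooksAreUs.com/pubABC#";
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

    /** One RDF statement; objects are kept as plain strings in this sketch. */
    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    final List<Triple> triples = new ArrayList<>();

    void add(String s, String p, String o) { triples.add(new Triple(s, p, o)); }

    /** Objects of all statements with the given predicate, for any subject. */
    List<String> objectsOf(String predicate) {
        List<String> out = new ArrayList<>();
        for (Triple t : triples) if (t.p.equals(predicate)) out.add(t.o);
        return out;
    }

    /** Subjects that have an rdf:type statement naming the given class. */
    List<String> subjectsOfType(String type) {
        List<String> out = new ArrayList<>();
        for (Triple t : triples) if (t.p.equals(RDF_TYPE) && t.o.equals(type)) out.add(t.s);
        return out;
    }

    /** Objects of statements with the given subject and predicate. */
    List<String> values(String subject, String predicate) {
        List<String> out = new ArrayList<>();
        for (Triple t : triples) if (t.s.equals(subject) && t.p.equals(predicate)) out.add(t.o);
        return out;
    }

    public static void main(String[] args) {
        TripleSketch m = new TripleSketch();                // step 1: model in memory
        String event = PUB + "PublishingEvent";             // hypothetical class name
        m.add(PUB + "event1", RDF_TYPE, event);             // hypothetical resource URIs
        m.add(PUB + "event1", DC + "date", "1999-01-01");
        m.add(PUB + "event2", RDF_TYPE, event);
        m.add(PUB + "event2", DC + "date", "2000-01-01");
        m.add(PUB + "work1", DC + "title", "APIs are Fun");
        m.add(PUB + "book1", DC + "title", "APIs are Fun - First Release");
        m.add(PUB + "book2", DC + "title", "APIs are Fun - Illustrated");

        for (String title : m.objectsOf(DC + "title")) System.out.println(title); // step 2
        for (String e : m.subjectsOfType(event)) {                               // step 3
            for (String date : m.values(e, DC + "date")) System.out.println(date);
        }
    }
}
```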

Assignment 4 - Due 5/15/2002

The purpose of this assignment is to provide the opportunity to examine the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). This protocol is increasingly popular as a low-barrier interoperability mechanism.

Special Rules for Assignment 4

Due Date and Time

This assignment MUST be submitted via email by 5pm on May 15.  Final grades are due soon after that.

Individual Effort

Unlike previous assignments, assignment 4 should be worked on individually and you will receive an individual grade for it.  You may freely borrow code from your previous assignments, much of which will be applicable for this assignment.

Resources for Assignment 4

Directions for Assignment 4

Write a Java program that does the following:

  1. Read the OAI data provider registry at http://www.openarchives.org/Register/ListFriends.pl .
  2. List the BASE-URL of repositories that support a metadata format in addition to the required oai_dc format.
  3. For each repository that supports an additional format, indicate whether any records have been created in that format since April 1, 2002. You need only indicate whether "records are available" for one of the additional formats. For example, if a repository supports the oai_dc, marc, and rfc1807 formats, you can pick one (e.g., marc) and check whether any records are available in that format since April 1, 2002. Note that this means that you don't have to harvest and count all the records available in that format. Therefore you can ignore issues related to resumptionTokens; i.e., you can stop after receiving the first "incomplete list".
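Steps 2 and 3 map onto two OAI-PMH requests: `verb=ListMetadataFormats` to learn which metadataPrefix values a repository supports, and a selective `verb=ListRecords` with `metadataPrefix` and `from` arguments to see whether anything in that format has a datestamp on or after 2002-04-01. The minimal sketch below assumes OAI-PMH 2.0 responses (namespace `http://www.openarchives.org/OAI/2.0/`, UTF-8 encoded); repositories answering with an earlier protocol version would need different namespace handling. Reading the registry (step 1) is not shown, so base URLs are taken from the command line. An error response such as `noRecordsMatch` contains no `record` elements and is therefore reported as "no records", and any resumptionToken is ignored, as the directions allow.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class OaiSketch {
    static final String OAI_NS = "http://www.openarchives.org/OAI/2.0/";

    static String listMetadataFormatsUrl(String baseUrl) {
        return baseUrl + "?verb=ListMetadataFormats";
    }

    /** Selective harvest: records in one format with datestamps on or after 'from'. */
    static String listRecordsUrl(String baseUrl, String prefix, String from) {
        return baseUrl + "?verb=ListRecords&metadataPrefix=" + encode(prefix) + "&from=" + encode(from);
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    private static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true); // OAI-PMH elements live in the OAI namespace
            return f.newDocumentBuilder().parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Unparseable OAI-PMH response", e);
        }
    }

    /** metadataPrefix values other than the mandatory oai_dc, from a ListMetadataFormats response. */
    static List<String> additionalFormats(String responseXml) {
        NodeList prefixes = parse(responseXml).getElementsByTagNameNS(OAI_NS, "metadataPrefix");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < prefixes.getLength(); i++) {
            String p = prefixes.item(i).getTextContent().trim();
            if (!p.equals("oai_dc")) result.add(p);
        }
        return result;
    }

    /** True if a ListRecords response holds at least one OAI record element. */
    static boolean recordsAvailable(String responseXml) {
        return parse(responseXml).getElementsByTagNameNS(OAI_NS, "record").getLength() > 0;
    }

    static String fetch(String url) throws IOException {
        try (InputStream in = URI.create(url).toURL().openStream();
             Scanner s = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
            return s.hasNext() ? s.next() : "";
        }
    }

    public static void main(String[] args) throws IOException {
        for (String baseUrl : args) {
            List<String> extra = additionalFormats(fetch(listMetadataFormatsUrl(baseUrl)));
            if (extra.isEmpty()) continue;      // only oai_dc: not listed
            String prefix = extra.get(0);       // checking one additional format suffices
            boolean any = recordsAvailable(fetch(listRecordsUrl(baseUrl, prefix, "2002-04-01")));
            System.out.println(baseUrl + " " + prefix + ": " + (any ? "records are available" : "no records"));
        }
    }
}
```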

Hints and Notes:

  1. Repeating what was stated above, you may reuse code from earlier assignments including HTTP request and response handling and XML parsing. 
  2. The bulk of the work in this assignment is understanding the OAI-PMH protocol document. You are advised to spend time up front understanding the relevant protocol issues and formulating any questions you might have in this area.
  3. Output formatting should be as simple as possible; e.g., simple text lists.  No credit will be given for pretty XML formatting.
  4. Please submit your work as a single JBuilder or CodeWarrior project.