CS 431
Architecture of Web Information Systems
Spring 2004
Assignments
Reaction Papers
The reaction paper assignments are structured as follows: you should cover at
least two closely related papers relevant to the current section of the course.
One of the papers should be from the course syllabus (assigned for the discussion
section on which the paper is due or one of the two preceding sections). The other should be a
related paper that you discover by another method, such as following references in the
papers you have read or searching Google, ResearchIndex, the library gateway,
or another information source. Think of finding this paper as a mini resource
discovery exercise. You should then write approximately 3-4 pages
(approximately 1500-2000 words) in which
you address the following points:
- What is the main content of the papers?
- Why is it interesting in relation to the course,
as reflected in both readings and lecture?
- What are the weaknesses of the papers, and how could they be improved?
- What are some promising further research questions in the direction of the
papers, and how could they be pursued?
Reaction papers should not just be summaries of the papers you read; most of
your text should be focused on synthesis of the underlying ideas, your own
perspective on the papers, and thinking on how the content of the papers relates
to the overall content of the course. Reaction papers should be done individually (i.e. not
in groups).
The reaction papers will be graded on a 12 point scale, with points allocated
in the following categories:
- Choice of papers (2 points) - Points will be awarded based on the
scholarly nature of the second paper that is chosen and its relationship to
the course content and to the paper selected from the syllabus.
- Presentation (2 points) - Points will be awarded based on clarity
in preparation and coherence of ideas presented.
- Content understanding and summarization (4 points) - Points will be
awarded based on the demonstrated understanding of the content of the two
papers and the way in which that understanding demonstrates an understanding
of the course content in general.
- Synthesis (4 points) - Points will be awarded based on the depth of
analysis of the relationship between the papers, critique of their content,
and integration into the issues raised by the course in general.
Submission procedure for reaction papers is as follows:
- A physical version of the paper should be handed in at discussion section
on the due date.
- An electronic copy of your physical submission should also be sent via email attachment and should be addressed to
lagoze@cs.cornell.edu, ags@cs.cornell.edu.
The subject of the email should be formatted as <your name>:Reaction:<due
date>, as in 'Carl Lagoze:Reaction:2004-02-22'. The date/time stamp
of this electronic submission will provide verification of your submission.
The email must arrive by the beginning of discussion section. Late submissions
will not be accepted. Permitted formats are Word and PDF.
- Please state at the beginning of the paper bibliographic references to the
two papers discussed therein. You should format these references according to
the IEEE reference formats at
http://www.computer.org/author/style/refer.htm.
Reaction paper due dates are on the syllabus.
Programming projects tentative due
dates are April 5 and May 17.
The projects are designed to give students some practical experience in
dealing with the technologies that make the Web and digital libraries
work. In general, the assignments will require students to understand
relevant protocol or specification documents and write a moderate
amount of Java code that demonstrates an understanding of those
specifications.
These assignments are not mainly a test of your programming skills. Rather,
they are meant to encourage you to read protocol specifications and understand the APIs
that implement them. In
the real world this is not done in isolation. Thus, students are
expected to work in groups on these assignments. At the beginning of the
semester the class will break up into groups of 2 that will remain together
for the remainder of the semester. Members of the group are expected to
share information, jointly understand protocol documents and APIs, and write the
final code product. Grades will be awarded based on the final product of
the group and each student's contribution to the work of the group.
Prerequisites
The assignments assume that students can program in Java and understand how
to download and use class libraries. No Java or programming tutorials will
be offered.
Grading Criteria
This is not a programming course. Imaginative algorithms or data
structures will not be required or play a role in grading. Instead,
grading will be based on completion of the assigned task and demonstrated
understanding of the concepts and protocols underlying the assignment.
Nevertheless, assignments should demonstrate good programming practices and
documentation commensurate with the 400 level of this course.
Programming Environment
Programming assignments should be done using the Eclipse IDE. This is
available for free from http://www.eclipse.org
for all major operating systems. Submissions will be in the form of
Eclipse projects.
Tools
Working with XML, XSLT, and the like is considerably easier if you don't have
to worry about syntactic details. Fortunately, there are a number of
excellent tools available to avoid this. Two that I recommend are:
- xmlspy. You may download it onto
your personal machine for a 30-day free trial, which may be renewed. The
purchase price is ridiculously high! This
tool is available for Windows only.
- oxygen. Also
available for a 30-day free trial. It has a very attractive academic
license fee. It also integrates as a plug-in for Eclipse, and it runs
on Windows, Mac OS X, and Linux!
Submitting Assignments
All assignments are due by 11:59PM on the due
date. NO LATE ASSIGNMENTS WILL BE ACCEPTED.
To identify your assignments and make grading easier, assignments MUST
conform to the following guidelines:
- Each group should identify a group leader when they form. That
person's name will serve as the "firstlast" in the remainder of these
instructions.
- The Eclipse project must be named as firstlastassignment#
(e.g., CarlLagozeAssignment1)
- The first executable line of the program should be
System.out.println("TeamMember1, TeamMember2, TeamMember2")
- The assignment should be submitted to CMS at
http://cms.csuglab.cornell.edu.
Submissions that fail to conform to these guidelines will be rejected.
The purpose of this assignment is to ensure that you are familiar with the
assignment submission process. It will not be graded but your
submission of it registers the existence of your project group.
Resources for assignment 0
- Sample submission zip file - here.
Directions for assignment 0
Write a Java program that prints to the console two lines:
- firstlastassignment0
- The three names of group members separated by commas
In this assignment you will harvest Dublin Core metadata via the OAI-PMH,
transform that metadata via XSLT to conform to a new FRBR-based schema that you
design, and publish that metadata via RSS 1.0.
Resources for assignment 1
Directions for assignment 1
- The first part of the assignment involves some modeling work based on
Dublin Core and FRBR. The DC properties have been criticized because they are a simple flat
list. Semantically the properties can be partitioned among the four
entities in the IFLA FRBR entity model: work, manifestation, expression,
and item. Write a new RDF schema (expressed in RDF/XML) that defines the four
classes of resources described by FRBR, defines properties to
associate the FRBR entities with the described resource, and then associates
the respective Dublin Core properties with the proper FRBR entity via domain
constraints. You should include the schema in your submitted zip file
with the name dc_frbr.rdfs. You should include comments in your
rdfs file sufficient to justify your modeling decisions.
- Building on this modeling work, you should then write a single Java
program that takes no arguments and does the following:
- Harvest metadata from baseURL
http://services.nsdl.org:8080/nsdloai/OAI. You should restrict your
harvest to the set 'arXiv:org' and metadata format 'nsdl_dc' and to records that
are new since June 1, 2003. You can do a
single harvest, ignoring the resumptionToken (indicating that there is another
group of records to harvest for this request).
- Transform the harvested metadata into an RSS 1.0 channel that contains an
item for each OAI record harvested and which translates the harvested
metadata to conform to the new schema you designed in part 1.
- Write out the resulting RSS/XML channel as a file called RSS.xml.
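As a rough illustration of the harvest step, the sketch below only assembles the ListRecords request URL from the parameters given above; it does not contact the server. The class and method names (OaiRequestSketch, buildListRecordsUrl) are made up for this example, and the from value assumes that "new since June 1, 2003" maps to the day-granularity date 2003-06-01. The verb, metadataPrefix, set, and from arguments are standard OAI-PMH request arguments.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class OaiRequestSketch {

    /** Builds a ListRecords request URL for a selective harvest (hypothetical helper). */
    static String buildListRecordsUrl(String baseUrl, String metadataPrefix,
                                      String set, String from) {
        return baseUrl
                + "?verb=ListRecords"
                + "&metadataPrefix=" + encode(metadataPrefix)
                + "&set=" + encode(set)
                + "&from=" + encode(from);
    }

    /** URL-encodes a single argument value; the ':' in the set name must be escaped. */
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Values taken from the assignment text; the date format is an assumption.
        String url = buildListRecordsUrl(
                "http://services.nsdl.org:8080/nsdloai/OAI",
                "nsdl_dc", "arXiv:org", "2003-06-01");
        System.out.println(url);
    }
}
```

Fetching the URL and parsing the response are left out here. A resumptionToken in the response can be ignored for this assignment, as noted above.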
Guidance for assignment 1
This assignment really doesn't require a significant amount of programming.
The bulk of the work is understanding the schema design, protocol specifications, APIs, and
tools such as XSLT. Much of the material will be introduced in lecture
over the next few weeks. I'd recommend, however, that you get an early
start by looking at and downloading the relevant resources and experimenting
with them. Before writing the XSLT transformation, I recommend manually
(using Oxygen) writing a trial RSS 1.0 channel to see what you are headed
towards.
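To get a feel for the XSLT step, here is a small sketch that applies a stylesheet from Java using the standard javax.xml.transform API. The input record is a simplified, made-up stand-in (a bare record element holding three Dublin Core elements), not the actual nsdl_dc structure returned by the server, and the output is a plain RSS 1.0 item with no FRBR-based properties. The class name DcToRssSketch and the sample values are likewise hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class DcToRssSketch {

    // Hypothetical stylesheet: maps one simplified DC record to one RSS 1.0 item.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'"
        + " xmlns:dc='http://purl.org/dc/elements/1.1/'"
        + " xmlns='http://purl.org/rss/1.0/'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/record'>"
        + "<item rdf:about='{dc:identifier}'>"
        + "<title><xsl:value-of select='dc:title'/></title>"
        + "<link><xsl:value-of select='dc:identifier'/></link>"
        + "<dc:creator><xsl:value-of select='dc:creator'/></dc:creator>"
        + "</item>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Made-up sample input, not real harvested data.
    static final String SAMPLE_RECORD =
        "<record xmlns:dc='http://purl.org/dc/elements/1.1/'>"
        + "<dc:identifier>http://example.org/paper/1</dc:identifier>"
        + "<dc:title>Sample Title</dc:title>"
        + "<dc:creator>A. Author</dc:creator>"
        + "</record>";

    /** Applies STYLESHEET to the given record and returns the serialized result. */
    static String transform(String recordXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(recordXml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(SAMPLE_RECORD));
    }
}
```

In practice you would keep the stylesheet in its own file, where a tool such as Oxygen can check it, and load it with a file-based StreamSource. It is inlined here only to keep the sketch self-contained.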
In this assignment, you will integrate Fedora and Jena to provide a metadata
repository for various entities and reflect the relationships among those
entities in a Jena model. The entities (content) that you work with will be
based on a simple modeling of information on Amazon.
Resources for assignment 2
Directions for assignment 2
- Pick a person who is the creator of both books and music on amazon.com.
An example of such a person is James McBride, who has authored books, one of which
is the fantastic "Color of Water", and is a jazz musician. You can use
McBride or any other person as long as s/he has creations in two very
different genres of materials. One other restriction is that
amazon.com should have at least one or two reviews for the books and music
created by your chosen person (this shouldn't be hard to meet since there are
reviews for virtually everything on amazon.com).
- Create a simple ontology expressed in RDF-s that provides the framework
for describing the class/sub-class and property/sub-property relationships in
the information from amazon. This does not have to be very complex and
only needs to express the following structure:
- There are two genres of creations: CDs and Books.
- People can have three roles: author, musician, reviewer
- There are properties that express the relationships among people in these
roles and their creations.
- You should include the schema in your submitted zip file with the name
amazon.rdfs.
- Set up a Fedora content repository. Create digital objects for the
following entities:
- The person that is the creator of the books and music.
- At least one of the books created by this person and at least one of the
CDs.
- At least two of the reviewers of these resources.
- At least one review from each of these people.
- Set up data streams for these objects as follows:
- For the content (reviews, books, music), fill in the default Dublin Core
record with information for the content resource. Don't get carried away with the
completeness of the DC record. A minimal amount of information to
describe the content (e.g., creator, title, subject, type) is enough.
- For the people, create an additional data stream that is a simple vcard record as described at
Representing vCard Objects in RDF/XML. Again, don't
get carried away with the completeness of the vcard record.
- Add a disseminator for each content object that disseminates the Dublin
Core information as an RSS 1.0 item. The RSS 1.0 documentation on the dc
module at
http://web.resource.org/rss/1.0/modules/dc/ gives a nice easy example of
this item expression format.
- Create another data stream in each digital object that is an RDF/XML
fragment expressing its relationship to another object in your repository.
This RDF fragment should use vocabulary from the simple relationship taxonomy described
by your RDF-s. For example, the relationship data stream in the digital
object corresponding to a book might express its connection to the digital
objects corresponding to the reviews of that book.
- Write a small Java program that:
- Extracts the rss item fragment disseminations from each of the content
objects and combines them into a single xml document representing an rss
channel. You should do this via manipulation of the XML as a DOM
tree using JDOM, rather than doing
textual manipulations. Write the rss channel xml out to a file
called rss.xml.
- Extracts the relationship fragment disseminations and joins them into
a single Jena model. Write the model out into a single RDF/XML file called relationships.xml.
- You can then use IsaViz to view the RDF graph produced.
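The first part of the Java program described above (combining item fragments into one channel) can be sketched roughly as follows. Note that this sketch uses the JDK's built-in org.w3c.dom API in place of JDOM so that it runs without extra libraries; the same steps carry over to JDOM. The item fragments are made-up stand-ins for the Fedora disseminations, and the names RssChannelSketch, buildChannel, appendText, and sampleItem are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class RssChannelSketch {
    static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String RSS = "http://purl.org/rss/1.0/";
    static final String XMLNS = "http://www.w3.org/2000/xmlns/";

    /** Combines RSS 1.0 item fragments into one channel document via DOM operations. */
    static String buildChannel(String channelUri, String title, List<String> itemFragments) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();

            Document doc = builder.newDocument();
            Element root = doc.createElementNS(RDF, "rdf:RDF");
            root.setAttributeNS(XMLNS, "xmlns:rdf", RDF);
            root.setAttributeNS(XMLNS, "xmlns", RSS);
            doc.appendChild(root);

            Element channel = doc.createElementNS(RSS, "channel");
            channel.setAttributeNS(RDF, "rdf:about", channelUri);
            root.appendChild(channel);
            appendText(doc, channel, "title", title);
            appendText(doc, channel, "link", channelUri);
            appendText(doc, channel, "description", title);

            // The channel lists its items in an rdf:Seq; the items follow as siblings.
            Element items = doc.createElementNS(RSS, "items");
            Element seq = doc.createElementNS(RDF, "rdf:Seq");
            items.appendChild(seq);
            channel.appendChild(items);

            for (String fragment : itemFragments) {
                Document itemDoc = builder.parse(new InputSource(new StringReader(fragment)));
                // Import the parsed item into the channel document as a DOM subtree.
                Element item = (Element) doc.importNode(itemDoc.getDocumentElement(), true);
                root.appendChild(item);
                Element li = doc.createElementNS(RDF, "rdf:li");
                li.setAttributeNS(RDF, "rdf:resource", item.getAttributeNS(RDF, "about"));
                seq.appendChild(li);
            }

            StringWriter out = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not build RSS channel", e);
        }
    }

    static void appendText(Document doc, Element parent, String name, String text) {
        Element child = doc.createElementNS(RSS, name);
        child.setTextContent(text);
        parent.appendChild(child);
    }

    /** Builds made-up sample input only; real fragments would come from Fedora. */
    static String sampleItem(String uri, String title) {
        return "<item xmlns='" + RSS + "' xmlns:rdf='" + RDF + "' rdf:about='" + uri + "'>"
            + "<title>" + title + "</title><link>" + uri + "</link></item>";
    }

    public static void main(String[] args) {
        System.out.println(buildChannel("http://example.org/rss.xml", "Demo channel",
            Arrays.asList(sampleItem("http://example.org/item/1", "First item"),
                          sampleItem("http://example.org/item/2", "Second item"))));
    }
}
```

The sketch returns the channel as a string rather than writing rss.xml, and it does not cover the Jena part of the program (joining the relationship fragments into one model).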
Guidance for assignment 2
You should run your Fedora repository with the built-in McKoi Java-based
database. This is the easiest way to get Fedora up and running.
Make sure to take a look at some of the sample objects that come with the
Fedora distribution. The use of XSLT transforms in the sample objects is a
template for the type of objects you will set up in your Fedora repository.
As said above, don't spend a huge amount of time creating the metadata for
each object. Your grade will not be based on how complete the metadata is.
You only need enough to supply the material for the rest of the project.
Submission Procedure
You will use the standard CMS submission procedure for packaging your Java
code and associated rdf and xml files by the due date. However, it will
be difficult for you to "submit" your Fedora repository to us. Therefore, we
will grade you via short 15-20 minute presentations on Tuesday May 18 during
which you will have the chance to give an overview of your work. The schedule
for presentations is as follows:
Time | Group
9:00 | Joseph Egbulefu & Marc Almendarez
9:30 |
10:00 | Ricky M. Yu & Gee-Hsien Chuang
10:30 | Stephanie Moy & Ari Tivon Epstein
11:00 | Michael Mahar
11:30 | Boris Suchkov & Theodore Tang
12:00 | Mikolaj Franaszczuk & Gerald Yean
12:30 | Dave Vitek & Mike Pape
13:00 | Brian Rogan & Karl Schulze
13:30 | Mina Radhakrishnan & Patty Reeder
14:00 | Benjamin Ee & David Boxer
14:30 | Deva Mishra & Chaitanya Desai
15:00 |
15:30 |
16:00 |
16:30 |
17:00 |
17:30 |
18:00 | Abhiram Rajendran & Judhajit De
18:30 | Jackie Bodine & Vlad Muste
19:00 | Arthur Chitikian & Todd Defilippi
19:30 | Will Kruse & Matthew Wachs
20:00 | Raghav Venkat Agnihothri & Carlos Zednik
Carl Lagoze (lagoze@cs.cornell.edu)
Last changed: 05/18/2004