CS 5150 Software Engineering: Project Suggestion

CS 5150
Software Engineering
Spring 2009

Project Suggestion:
Cornell Collaborative Web Publishing Project

Client

Raj Smith, Cornell Cooperative Extension, Department of Natural Resources
Email: raj.smith@cornell.edu
Phone: 607-257-4578

Objective

The client has a long-term project to develop a schema-driven, collaborative content management system with publishing capabilities for the web, print, syndication, and mobile devices. He has proposed a specific project within that overall framework.

Overview of the Web Publishing Project

Data is at the heart of computer programming much as food is the soul of cooking. There are circuits, logic loops, algorithms, numerical calculations, and there is text. Text is the data that makes up words, sentences, paragraphs, books and documents. Text is the data that shares our thoughts and makes up our language. Manipulating text is a challenge in itself. Our project creates collaborative websites to capture and repurpose text. We create schemas to capture information in a particular format. We store text in databases, XML files, and a variety of proprietary formats. The challenge is to manipulate the text, reformat it, search it, print it, syndicate it, and display it on the web and on mobile devices.

The holy grail in the publishing world is the marriage of programmatic control of text with the output of a sophisticated word processing or page layout program. Text must be kept separate from formatting until it is ready for the final medium in which it is displayed. Cornell Cooperative Extension is working to develop a complex collaborative system using Microsoft SharePoint technologies, ASP.NET, XML, XSLT, SQL and DocBook to create, store, format and manipulate thousands of pages of information. We have developed an end-to-end system in which we guide authors through the process of creating information, store it in SQL, docx or XML files, and then apply a custom XSLT to create the final output. We use .NET to script workflows and to manipulate outputs. The goal is to create derivative works from our content management system, convert them to PDF on the fly, and send them to an ecommerce system for payment before delivering them to the user, print shop, website or wherever their final medium and destination might be.

The Cornell Cooperative Extension collaborative web publishing project is a fascinating exploration of many aspects of computer science not seen in the classroom. We work with a variety of proprietary and open source technologies to solve real-life problems in an academic environment. We work with collaborative, social networking technologies, XML, RSS, .NET, C#, SQL, and most of the other technologies you have learned in the classroom. Our program has attracted national attention and has the potential to generate revenue for the University while facilitating the delivery of much-needed information for public use. We produce information to make the world a better place.

The CS 5150 Project

The student team will be required to evaluate and modify the prototype of our collaborative web publishing system. Author input is guided by an XML schema, which can be used in MS Word, MS InfoPath or a custom .NET web form. Students will be asked to modify all three types of forms. A custom XSLT is run against the XML file to create a new DocBook XML file that contains the metadata and attributes needed for publishing to various media. Students will need to learn how to customize XSLT files. The files are stored in MS SharePoint. When a user requests a file, in print or HTML, another XSLT file is run to create the desired output. A user may request that the material in multiple files be placed together in a single PDF file that can be printed on demand. This requires a custom script written for .NET, for example in C#. The PDF file must then be passed to an ecommerce system (which has not yet been developed). An XML attribute containing the value of the text or data must be parsed to create an invoice for the total cost of the derivative work, and the PDF file will then be delivered to the user or sent to a printer.
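
The combining and invoicing steps described above can be sketched as follows. This is only an illustration, written in Python for brevity rather than in the .NET tools the project uses. The element names, the sample documents, the `price` attribute, and the `build_derivative_work` function are all hypothetical; the real schema and attribute names would come from the client's prototype.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical source files. The "price" attribute name is an assumption,
# standing in for "an XML attribute containing the value of the text".
DOC_A = '<article price="2.50"><title>Pond Management</title><para>Text A</para></article>'
DOC_B = '<article price="4.00"><title>Deer Fencing</title><para>Text B</para></article>'


def build_derivative_work(xml_documents, title):
    """Combine several articles under one <book> root and total their prices.

    Returns the combined element, a list of (title, price) invoice lines,
    and the invoice total.
    """
    book = ET.Element("book")
    ET.SubElement(book, "title").text = title

    line_items = []
    total = Decimal("0")
    for source in xml_documents:
        article = ET.fromstring(source)
        # Decimal avoids the rounding surprises of binary floats for money.
        price = Decimal(article.get("price", "0"))
        line_items.append((article.findtext("title", default="(untitled)"), price))
        total += price
        book.append(article)

    return book, line_items, total


if __name__ == "__main__":
    book, items, total = build_derivative_work([DOC_A, DOC_B], "Custom Guide")
    for item_title, price in items:
        print(f"{item_title}: {price}")
    print(f"Total: {total}")
    # The combined XML would next go through XSLT and PDF rendering.
    print(ET.tostring(book, encoding="unicode"))
```

In the real system the combined document would be transformed by XSLT and rendered to PDF before the invoice is handed to the ecommerce system; those steps are omitted here.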

To be successful on this project, students should have some interest in publishing information. This system has significant implications for how information is generated, stored and disseminated to the public. Skills are needed in .NET, C#, XML, XSLT, HTML, and SQL. We will be using MS SharePoint, Visual Studio, Office, DocBook (http://docbook.org), and the Oxygen XML Editor, and we will be choosing and implementing an ecommerce system.

William Y. Arms
(wya@cs.cornell.edu)
Last changed: January 20, 2009