CS514: Fault-tolerant Distributed Computer Systems -- Overview and Organization

Course Overview. Distributed systems are difficult to build and understand, for all sorts of reasons.  Failures are common in large systems, and we can't let them shut the application down; a well-engineered system will tolerate failures and repair any damage they cause.  A single event, perceived at multiple locations, may not be totally ordered with respect to other, conflicting events.  Networks have annoying connectivity and bandwidth properties that force the designer to confront challenging engineering trade-offs.

The focus of CS514 is on the principles and techniques that one can use to achieve a high-quality, trustworthy, fault-tolerant distributed system. Lectures will present the principles; programming assignments will enable students to put these principles into practice.  The course project will expose students to state-of-the-art technology platforms (notably web services) but will also involve using cutting-edge techniques that are not (yet) available in products.  The goal is to understand what these kinds of platforms can be expected to do “without help” but also to get used to the idea that one can push beyond their limitations when necessary, and that doing so can open the door to all sorts of creative possibilities.

Our treatment of the topic will be set in the context of a major new architectural standard for building distributed systems: the so-called "Web Services" architecture.  Web Services are easy to build and the architecture is easy to understand, but the limitations just mentioned are serious obstacles to building "trustworthy" Web Services applications that scale well, manage themselves, are fault-tolerant, and have other robustness properties.  With this in mind, we'll skim the basic architecture quickly and will focus our treatment on the best ways of achieving these kinds of properties in systems that also adhere to the Web Services standards.

Course URL:   http://courses.cs.cornell.edu/cs514/2007sp

Lecture: Attendance is required. During the spring semester of 2007, the class meets on Monday and Wednesday but NOT on Friday.  The idea is to leave a day for travel – we know that many of you will be interviewing for jobs, pitching ideas to venture capitalists (see assignment 3!), etc.  If you leave town on Wednesday you can interview on Thursday and Friday and still be rested and ready on Monday morning!


Hacking help is ALWAYS available!  Moreover, we all monitor the newsgroup.  And feel free to jump in if you know the answer to someone else's question. 


TA Hours in a by-name format:


Primary course TA:  Krzys Ostrowski


TA Office Hours:



Teaching Staff:

Professor Ken Birman   (255-9199)
4115C Upson Hall
Office hours:   Ken doesn't have specific office hours but is available from 10am until 2:30pm most days.  Just drop in.  If you prefer an appointment, contact Bill Hogan (whh@cs.cornell.edu).
email: ken@cs.cornell.edu.  

Prerequisites. The course is open to any undergraduate or graduate student who has mastered the material in CS414 (Operating Systems) or CS519 (Engineering Computer Networks) or EE445 (Computer Networks and Telecommunications).   We do not require that you have taken these courses at Cornell, nor is it required that you have taken all of these courses (however, a student who hasn’t taken any of them probably won’t be ready for CS514). 

The course programming assignments are designed to be completed in C# under Visual Studio, using the ASP.NET framework.  We also accept projects in Java using the J2EE environment, but if you do use Java, we may not be able to help you out if you get stuck!  C# and Java are essentially identical languages; the real differences are in the associated runtime environments and libraries.  Thus if you know how to program in Java, you know C#, and just didn't know that you know it!  The libraries are extensively documented in the Visual Studio online help system, and you should be able to cut and paste anything elaborate, such as thread creation or tricky kinds of event handling.

Reading.  Reliable Distributed Systems: Technologies, Web Services and Applications (Ken Birman; Springer Verlag).  This is available on reserve in the Engineering library if you prefer not to own a copy. 


Past students have recommended an e-book that may help you quickly learn C# programming.  The book is available for free from within the Cornell network just by pressing a button for agreement: http://library.books24x7.com/book/id_13568/toc.asp?bookid=13568


Assignments and Grading. In keeping with the professional (and practical) orientation of the course, homework assignments are underspecified, open-ended, and motivated by problems that arise in the real world (messy as it is). You will have to think, refine problem specifications, make reasonable and defensible assumptions, and be creative.

Most of your grade is based on the programming project, which involves the design and implementation of fault-tolerant, distributed and scalable services.  Some of the assignments are based on the kinds of things a bank might need if it were supported by data centers at multiple locations.  Other assignments focus on adding dynamic content to the Internet, which some people predict will be the defining characteristic of Web 3.0, a much-anticipated revolution in Internet content.  Notice that in contrast to previous offerings of CS514, we won’t be building a single big project.  Instead, we’ll do several smaller ones throughout the semester.

Your final course grades will be computed as follows:

Assignments are due on the date stipulated.  If you can’t complete an assignment on time, you should meet with Professor Birman or one of the TAs to discuss the situation.  Sometimes we prefer to receive an incomplete solution so that you can get to work on the next assignment; in other cases we might be able to help you break through whatever is holding you up and finish the solution quickly.  But no matter how a problem is to be resolved, we want you to begin the dialog with us well before the deadline, not at the last second.

Students must work alone on assignments 1 and 2, but can form 2- or 3-person groups for assignment 3. Although all our projects will be of a size that a single person could tackle on their own, working with other people can lead to a better understanding of the material, and will help you to develop collaboration skills that should prove helpful throughout your career. All participants, however, should be able to explain the entire content of any submitted solution.  Large groups are not permitted for the first two assignments in spring of 2007.

MEng Project Option. Students enrolled in the Master of Engineering program in Computer Science may use the CS514 project to satisfy their project requirement for that degree.  Here's what you would need to do: