The 4999 and 7999 projects that I supervise generally involve a mix of
reading research papers, design and implementation work, and a final writeup of
about 5,000 words. Students should have solid programming experience; taking a
prior 4000-level project course is a good idea. A good background for many of these
projects is a course with the number x1y0, where x ≥ 4. Examples include
CS 4120 (Compilers), CS 4110 (Programming Languages), and CS 6110 (Advanced Programming
The following are some projects that may be available. Students seeking a
project should think about what they would like to work on before talking to
provides a sophisticated way of automatically identifying
the locations of programmer mistakes that cause programs not to type-check. It has been
shown to significantly increase the accuracy of error localization for multiple
programming languages: OCaml, Haskell, and Jif. SHErrLoc defines a powerful constraint
language that can be used to express a wide range of interesting type systems, and could
in principle be used as the type checker for future compilers. However, it's
not currently in a state where it is easy to adopt. There are two major directions that
could lead to a lot of impact:
SHErrLoc is currently implemented as a standalone Java program. For
integration into compiler implementations, it would be much better to have it
implemented as a library. Further, the constraint language is currently not
well documented; a reference manual is needed, including examples of how to
encode various type system features into the constraint language.
One of the best examples of the effectiveness of SHErrLoc is its application
to the Haskell programming language. We showed that accuracy of type
inference errors could be increased to about 90%, a significant improvement
over GHC. However, implementing SHErrLoc as a separate Java program is not a
good strategy for integration into GHC. A good project would be to build a
GHC type-checker plugin that implements the SHErrLoc algorithm; such a plugin
could have a large impact on the Haskell ecosystem by easing the Haskell
The Genus programming language (http://www.cs.cornell.edu/projects/genus) is
Java-like language with a powerful new capability for generic programming,
related to Haskell's type classes. Some projects:
- explore how to use the new language features most effectively by
developing libraries for Genus, such as the library of collection classes.
- work on the compiler to improve code quality and support for
- The new Familia language extends Genus with support for family inheritance.
We also are developing a compiler for that language.
- Reduct is a new
game for teaching programming languages via operational semantics. We are expanding the game to
become a real programming environment. Students are needed with interests in
user interface, programming languages, education, and game design.
[Reduct version 1 |
Reduct version 2 ]
is a secure wide-area distributed object store and a big system-building
project. Fabric objects look like Java objects but are secure and are
transparently distributed and persistent.
- Automatically moving computation and data storage to distributed locations
to make programs secure and efficient
- Extending the language to support parallel distributed applications (MapReduce
- Implementing larger Fabric applications (e.g., CMS) to explore how well
we can capture security requirements.
- Studying how to make Fabric's software transactions as efficient as possible.
- Adding mechanisms for adaptively switching between optimistic
concurrency control and pessimistic locking.
- Improving the power of policy inference.
- Exploring richer security policy languages.
- Developing an Eclipse plugin for Fabric.
- Developing an Android version of Fabric.
is a new, secure voting system. Civitas is the first
voting system implementation that allows voters to vote securely from the
remote client of their choice.
A secure voter client that supports coercion resistance and cut-and-choose style
protection against malicious clients.
Additional tabulation mechanisms to protect against application-level DoS attacks.
Using threshold cryptography to improve availability of the system.
CIVS is the actively running voting system
that we developed and support. We have 17+ years of data from polls people
have run, including more than 13,000 polls and more than 300,000 votes.
This is a rich source of data about how preferential voting works in the
wild. It would be interesting to analyze this data to see how different
voting systems compare -- how often does IRV choose a non-Condorcet winner?
How often do different Condorcet methods disagree? And can we produce a
truly anonymized, publicly releasable version of the dataset that still
yields the same conclusions?
Polyglot: an extensible compiler framework for building extensions
to the Java programming language. Dozens of compilers have been built on top
of Polyglot by research groups all over the world, so work on Polyglot will
immediately be appreciated.
- Make Polyglot work on Java 9+. Currently its back-end compiler support relies on
an older Java compiler infrastructure that went away with Java 8. This should not be
very difficult and would deliver a lot of benefits to Polyglot users.
- Add features from Java 8-10—especially, lambdas. Currently Java 7
is implemented by Polyglot.
- Add more features to JLang,
our LLVM back end for Polyglot. JLang supports ahead-of-time
compilation from Java to any architecture supported
Implement a back end targeting the recently released
Counterexample generation for Bison. As CS 4120 students learn,
Polyglot's parser generator is easier to use than other parser
generators because it reports compact counterexamples when grammars
can't be parsed. A nice self-contained project with high impact would
be to extend the widely used parser generator Bison, the GNU version of
Yacc, to implement this algorithm. The current Bison maintainers
seem enthusiastic about this extension.
- Improve support for Javadoc, so that extended languages can generate
documentation easily. This can be built using Polyglot itself.
- Polyglot has some support for easy development of Eclipse IDEs for
extended languages, but it could be improved significantly, especially
for navigation and debugging.
- Extend the existing Polyglot tutorial so developers can more easily
learn advanced features of Polyglot like program analysis, quasiquoting,
and in-place translation.
- Course Management System.
The course management system used by the Computer Science Department
and now also hosted at CIT and used by other departments
was developed by students.
New team members are welcome.
[ CMS web site
| CMS system ]
- adding new scheduling and calendaring features,
- making it scale to the size of MOOCs
- developing the external API for integration with other services
- porting it (back) to Fabric.
- Also/A-mud: The
Also programming language is intended for
building internally extensible, decentralized virtual environments, along the lines of
Vinge's True Names,
Gibson's Matrix, or
A core technical difficulty with this vision has always been security. The
Also language has some novel and powerful security mechanisms to allow
secure systems to be built, and these mechanisms have been used in the
construction of A-mud, a
simple text-based virtual world that anyone can play with. The language
Also seems to be a good testbed for designing blockchain programming
language security mechanisms. However, the security of A-mud needs careful
evaluation and perhaps extension, and this will likely affect the language
design too. It would also be interesting to explore whether Also/A-mud can
be made concurrent and geodistributed, to support massive numbers of
- Hierarchical partitioned searching: The classic single-player game
of Sokoban requires hundreds of moves of searching; fortunately, it has some
properties that makes it amenable to a new kind of hierarchical searching
algorithm. Other games like Go are encouragingly similar. Experience
in search algorithms and ML programming is important.
- netview: an intuitive distributed graphical network monitoring tool
- codewriter: a web markup language for high-quality
pretty-printing of program code on web pages, so program code
fills the available horizontal width while remaining readable.
- jshell: a shell interface with integrated Java and Windows
support, smoothly combining a language-based (console prompt) interface with
a icon-based graphical interface.
- JMatch: pattern matching and logic programming for Java.
Some papers on JMatch: [ Iterable Abstract Pattern Matching for Java (PADL'03), Interruptible Iterators (POPL'06) ]
- unifying pattern matching with predicate dispatch
- unifying pattern matching with lenses
- adding more powerful predicate solving using external solvers (SAT, numerical, ...)
- developing an Eclipse plugin using the Polyglot IDE
- Building an LLVM back end (based on JLang) that can exploit LLVM
coroutines to achieve excellent performance
The ultimate graphical diff-merge tool, supporting N-way merging, RCS/CVS support, and
editing. Related to other tools like
a viewer and editor for version-controlled files
under RCS and CVS. Rcsview helps the user to figure out a source file
evolved and to identify who's responsible for any given code segment.
- Constrain is a
constraint-based system for creating animated figures on web pages and in
web-based presentations. There are many opportunities to extend its design: for
example, to add interactive editing or 3-D graphics.
We are programming robots using the Jif programming language, for greater
software security when robots are being attacked in various ways. There are
a lot of subprojects associated with this; one near-term goal is to get
Jif compiling onto the RISC V processor via our Polyglot/LLVM front end.
We can then run Jif code on an FPGA simulating RISC V.