Undergraduate and Masters Research Opportunities

The 4999 and 7999 projects that I supervise generally involve a mix of reading research papers, design and implementation work, and a final writeup of about 5,000 words. Students should have solid programming experience; taking a prior 4000-level project course is a good idea. A good background for many of these projects is a course with the number x1y0, where x ≥ 4. Examples include CS 4120 (Compilers), CS 4110 (Programming Languages), and CS 6110 (Advanced Programming Languages).

The following are some projects that may be available. Students seeking a project should think about what they would like to work on before talking to me.

  • Constrain is a constraint-based system for creating animated figures on web pages and in web-based presentations. There are many opportunities to add new functionality:
    • interactive WYSIWYG editing, so you can update a figure either by writing code or through a UI
    • 3-D graphics
    • GPU-based constraint solving
  • The SHErrLoc system provides a sophisticated way of automatically identifying the locations of programmer mistakes that cause programs not to type-check. It has been shown to significantly increase the accuracy of error localization for multiple programming languages: OCaml, Haskell, and Jif. SHErrLoc defines a powerful constraint language that can be used to express a wide range of interesting type systems, and could in principle be used as the type checker for future compilers. However, it's not currently in a state where it is easy to adopt. There are two major directions that could lead to a lot of impact:
    • SHErrLoc is currently implemented as a standalone Java program. For integration into compiler implementations, it would be much better to have it implemented as a library. Further, the constraint language is currently not well documented; a reference manual is needed, including examples of how to encode various type system features into the constraint language.
    • One of the best examples of the effectiveness of SHErrLoc is its application to the Haskell programming language. We showed that accuracy of type inference errors could be increased to about 90%, a significant improvement over GHC. However, implementing SHErrLoc as a separate Java program is not a good strategy for integration into GHC. A good project would be to build a GHC type-checker plugin that implements the SHErrLoc algorithm; such a plugin could have a large impact on the Haskell ecosystem by easing the Haskell learning curve.
  • Genus The Genus programming language (http://www.cs.cornell.edu/projects/genus) is Java-like language with a powerful new capability for generic programming, related to Haskell's type classes. Some projects:
    • explore how to use the new language features most effectively by developing libraries for Genus, such as the library of collection classes.
    • work on the compiler to improve code quality and support for code evolution.
    • The new Familia language extends Genus with support for family inheritance. We also are developing a compiler for that language.
  • Reduct is a new game for teaching programming languages via operational semantics. We are expanding the game to become a real programming environment. Students are needed with interests in user interface, programming languages, education, and game design. [Reduct version 1 | Reduct version 2 ]
  • Fabric is a secure wide-area distributed object store and a big system-building project. Fabric objects look like Java objects but are secure and are transparently distributed and persistent.
    Projects:
    • Automatically moving computation and data storage to distributed locations to make programs secure and efficient
    • Extending the language to support parallel distributed applications (MapReduce and beyond)
    • Implementing larger Fabric applications (e.g., CMS) to explore how well we can capture security requirements.
    • Studying how to make Fabric's software transactions as efficient as possible.
    • Adding mechanisms for adaptively switching between optimistic concurrency control and pessimistic locking.
    • Improving the power of policy inference.
    • Exploring richer security policy languages.
    • Developing an Eclipse plugin for Fabric.
    • Developing an Android version of Fabric.
  • Civitas is a new, secure voting system. Civitas is the first voting system implementation that allows voters to vote securely from the remote client of their choice.
    Projects:
    • A secure voter client that supports coercion resistance and cut-and-choose style protection against malicious clients.
    • Additional tabulation mechanisms to protect against application-level DoS attacks.
    • Using threshold cryptography to improve availability of the system.
  • CIVS is the actively running voting system that we developed and support. We have 17+ years of data from polls people have run, including more than 13,000 polls and more than 300,000 votes. This is a rich source of data about how preferential voting works in the wild. It would be interesting to analyze this data to see how different voting systems compare -- how often does IRV choose a non-Condorcet winner? How often do different Condorcet methods disagree? And can we produce a truly anonymized, publicly releasable version of the dataset that still yields the same conclusions?
  • Polyglot: an extensible compiler framework for building extensions to the Java programming language. Dozens of compilers have been built on top of Polyglot by research groups all over the world, so work on Polyglot will immediately be appreciated.
    Projects:
    • Make Polyglot work on Java 9+. Currently its back-end compiler support relies on an older Java compiler infrastructure that went away with Java 8. This should not be very difficult and would deliver a lot of benefits to Polyglot users.
    • Add features from Java 8-10—especially, lambdas. Currently Java 7 is implemented by Polyglot.
    • Add more features to JLang, our LLVM back end for Polyglot. JLang supports ahead-of-time compilation from Java to any architecture supported by LLVM.
    • Implement a back end targeting the recently released GraalVM.
    • Counterexample generation for Bison. As CS 4120 students learn, Polyglot's parser generator is easier to use than other parser generators because it reports compact counterexamples when grammars can't be parsed. A nice self-contained project with high impact would be to extend the widely used parser generator Bison, the GNU version of Yacc, to implement this algorithm. The current Bison maintainers seem enthusiastic about this extension.
    • Improve support for Javadoc, so that extended languages can generate documentation easily. This can be built using Polyglot itself.
    • Polyglot has some support for easy development of Eclipse IDEs for extended languages, but it could be improved significantly, especially for navigation and debugging.
    • Extend the existing Polyglot tutorial so developers can more easily learn advanced features of Polyglot like program analysis, quasiquoting, and in-place translation.
  • Course Management System. The course management system used by the Computer Science Department and now also hosted at CIT and used by other departments was developed by students. New team members are welcome.
    Projects:
    • adding new scheduling and calendaring features,
    • making it scale to the size of MOOCs
    • developing the external API for integration with other services
    • porting it (back) to Fabric.
    CMS web siteCMS system ]
  • Also/A-mud: The Also programming language is intended for building internally extensible, decentralized virtual environments, along the lines of Vinge's True Names, Gibson's Matrix, or Stephenson's Metaverse. A core technical difficulty with this vision has always been security. The Also language has some novel and powerful security mechanisms to allow secure systems to be built, and these mechanisms have been used in the construction of A-mud, a simple text-based virtual world that anyone can play with. The language Also seems to be a good testbed for designing blockchain programming language security mechanisms. However, the security of A-mud needs careful evaluation and perhaps extension, and this will likely affect the language design too. It would also be interesting to explore whether Also/A-mud can be made concurrent and geodistributed, to support massive numbers of simultaneous players.
  • Hierarchical partitioned searching: The classic single-player game of Sokoban requires hundreds of moves of searching; fortunately, it has some properties that makes it amenable to a new kind of hierarchical searching algorithm. Other games like Go are encouragingly similar. Experience in search algorithms and ML programming is important.
  • netview: an intuitive distributed graphical network monitoring tool
  • codewriter: a web markup language for high-quality pretty-printing of program code on web pages, so program code fills the available horizontal width while remaining readable. Has been implemented both in Java and as a reusable JavaScript module.
  • jshell: a shell interface with integrated Java and Windows support, smoothly combining a language-based (console prompt) interface with a icon-based graphical interface.
  • JMatch: pattern matching and logic programming for Java.
    Some papers on JMatch: [ Iterable Abstract Pattern Matching for Java (PADL'03), Interruptible Iterators (POPL'06) ]
    Projects:
    • unifying pattern matching with predicate dispatch
    • unifying pattern matching with lenses
    • adding more powerful predicate solving using external solvers (SAT, numerical, ...)
    • developing an Eclipse plugin using the Polyglot IDE
    • Building an LLVM back end (based on JLang) that can exploit LLVM coroutines to achieve excellent performance
  • ngdiff: The ultimate graphical diff-merge tool, supporting N-way merging, RCS/CVS support, and editing. Related to other tools like rcsview, a viewer and editor for version-controlled files under RCS and CVS. Rcsview helps the user to figure out a source file evolved and to identify who's responsible for any given code segment.
  • We are programming robots using the Jif programming language, for greater software security when robots are being attacked in various ways. There are a lot of subprojects associated with this; one near-term goal is to get Jif compiling onto the RISC V processor via our Polyglot/LLVM front end. We can then run Jif code on an FPGA simulating RISC V.