Department of Computer Science 


CS 6410: Advanced Systems

Fall 2015

   
    
 

[ Final Projects | Project Information | Project Ideas | Project Proposal | Project Survey Paper | Midterm Paper | Peer Review | Final Demo | Final Paper ]


"Mini" Project information

Starting in Fall 2014, students in CS6410 will need to do two mini-projects aimed at confirming their mastery of basic architecture and operating systems material.  Neither is intended to take long or be very hard, and both are designed to also be useful to you even if you plan to work in a research area very remote from systems.

Mini-project I: Due September 15.

Goal of the project:  The goal here is to make some application very, very fast by optimizing it to perform well on a specific multicore architecture -- it could be the multicore machine you use at home, your desktop at Cornell, or one of the more heavily multicore platforms in our labs.  (It's totally up to you what you decide to use; pick something convenient, but with at least 2 and ideally 6 or more cores.)

Key steps:  If you are a reasonable hacker, we would like you to build a multithreaded version of Conway's Game of Life cellular automaton.  This is a very simple computation that runs in "generations" on a flat surface -- if you like, you can think of the n'th generation as living within a 2n x 2n array of small cells, each containing a one-bit value (true = "live" or false = "dead"), with 0,0 at the center.  The basic game is this: the user employs a GUI to dot some initial configuration of occupied cells by clicking them on the display.  Then each generation is computed from the prior one: to compute generation k+1 from generation k, apply the following rules:

  1. Any live cell with fewer than two live neighbours dies, as if caused by under-population.
  2. Any live cell with two or three live neighbours lives on to the next generation.
  3. Any live cell with more than three live neighbours dies, as if by overcrowding.
  4. Any dead cell with exactly three live neighbours becomes a live cell, as if by reproduction.

You can see some nice animations of the game on the Wikipedia page listed above.  Most implementations have some form of "active block" representation, so that rather than needing to scan the entire 2n x 2n region that could have live cells in it, they focus on smaller blocks that actually do have live cells in them.  Further, most good solutions will be multi-threaded.  Again, you can read about coding ideas on the Wikipedia page, and can even find links to fairly good open source implementations.  One, called Golly, is remarkably fast -- it can compute the 90-quintillionth generation for one complex configuration in just 90 seconds on a desktop.  Of course, there might not be a lot of speed left to squeeze from that implementation.  Doing one of your own guarantees that there will be plenty of ways to optimize the code.

I'm not very interested in the performance of the GUI, but obviously you will want one.  I would suggest that you use your favorite GUI builder and support a few modes of operation.  But then when measuring performance, don't look at the cost of the GUI itself.  Focus on how many generations you can compute per second for some hard scenario, like the Turing Machine used in that Golly example I mentioned.  Your GUI could have a slider for how often to display the picture...

Now, we really do prefer that you build and tune your own code.  But if you are not a good hacker, or have built some other big performance intensive program recently and prefer to optimize that program, or (worst choice) really, really want to use code from some other source, we'll allow that.  But keep in mind: if you didn't code it, you probably won't understand it well and may have trouble optimizing it.  So that could make this project harder, not easier!

What we want you to do:  You'll work on Linux or Windows (we don't have a preference).  We  recommend that your code be written in C or C++.  We aren't going to insist, but this is still our preference.

So in this language, you will either have coded the Game of Life, or will have decided to work with some other performance-intensive chunk of code from your own past, or from some other source.  Find a test case that really makes this program work hard.  Make sure your code is multithreaded and able to leverage a multicore computer -- we aren't interested in unthreaded code here!

Now, run your program on a multicore computer and compare its speed for the identical task on a single core versus multiple cores, for the same number of threads.  Most of the desktop computers in the department are at least 4-core machines and many have more; there are quite a few 32-core machines around.  Theo can explain how to track such a machine down; you may need to ask the user's permission, but you should be able to log in remotely to pretty much anything with permission.  You should aim to build and test your project on your 4/8-core desktop, and only collect occasional performance data from larger machines.  Your laptop probably has 2 cores but might even have as many as 8 or 12.  It isn't hard to check.

Your challenge: tracking what you do at each step, squeeze all the speed you can out of the solution you built.  Test it on a mix of "classic" Game of Life scenarios (you can find a bunch on the Wikipedia web site link shown above) and on random 2n x 2n initial configurations, for various scenarios that should be challenging to your program.  We aren't going to be very specific here about what exactly to measure in order to assess speed, because we see that as part of your challenge: we want you to figure out how one can even characterize speed, given the very different ways "life" evolves depending on the initial configuration.

Baffled?  Read that Wikipedia web page about the Game of Life, the one mentioned above.  It is very interesting, very well written, and has lots of pictures.  You'll definitely "get it" in no time.

Things to think about: your multithreaded code presumably loses some time in lock contention.  How much?  It may have poor L2 cache behavior: would it make sense to check for low L2 cache hit performance, or false page sharing?  What about CPU prefetch stalls and branch-prediction errors? 

You can find tools that will help you online.  A whole discussion of this topic can be found here.  We would recommend that you start with simple things: you know the adage about 90% of the time being spent in 10% of the code.  Do you know which is "your" core 10%?  Use one of these tools to help you narrow in on it, focusing on how your program behaves when it is NOT doing GUI I/O (the GUI logic will be very slow compared to just computing the next generation).  We don't care about slow execution in the GUI, but we definitely care about the speed of computing future generations.  Ideally, work from "big issues" down and inward: are your data structures efficient?  Is locking and unlocking taking an unexpectedly large amount of time?  Try to visualize where time is being spent in that heavily loaded inner loop.

It may be tempting to just recode your version of the game itself again and again.  But we're actually more interested in how to hold one version kind of constant, then optimize it -- making very small changes.  This sheds more light on the role of the computer architecture in shaping speed.  Anyone can speed up a program by recoding it 20 times.  But taking a single instance and tweaking it to run fast -- that's the talent we hoped you might exercise on this project.

As you get fancier, the more sophisticated tools will be more useful.  But the best ones only work for code in C or C++.  So this is one more reason to do this particular assignment in a simple language where the mapping from code to instructions is kind of obvious.

What to hand in: When finished, hand in a short (certainly no more than 6 pages; 2 should be fine) writeup describing your findings, written in a scientific style (e.g. as distinct from an email to a friend from home).  Include graphical content if that's the best way to explain things, or "microbenchmark" results in tabular form, or other kinds of analytic material.  Think hard about what you learned and how to communicate that knowledge to Ken and Matt.  Some ideas regarding the paper: consider looking at good quality graphics in other kinds of performance-oriented systems papers: how did the authors use graphs to communicate succinctly?  Consider the ways one might challenge your claims: are you adequately supporting your assertions with rigorous evidence?  If your paper feels too long, think hard about what really matters and what is more or less window-dressing.  Which were the top five lessons learned?  Are you using a lot of paper on the next ten, or on things that were dead ends?

Why so short?  We believe that a really good systems analysis gets to the point without wandering around, and is structured and clear.  Further, writing in a short and clear way forces you to think about using graphics as a communication tool.  Many systems workshops (like HotOS) have a six page limit; some actually have a two page limit.

What are we hoping you might learn?  Everyone worries about performance no matter what they are doing in CS.  Performance of a multicore program can be surprisingly hard to optimize: you need to think about scheduling, contention, sharing, cache evictions, etc.  So it can be hard to really get high speed even from a powerful 12-core desktop "supercomputer".  You'll gain experience doing so, and will also find it challenging to communicate your findings in a succinct way, which is a skill that will benefit you later when writing other kinds of papers, whether in systems or in other areas of CS.

Mini-project II: Due Oct 2.

This will involve running Paxos as a service on EC2 using Elastic Beanstalk and then experimenting to find the performance-limiting bottlenecks.  Details will be posted soon.

"Main" Project information


Introduction

For the project, you can write a research paper or, if you prefer a more hands-on approach, design, build, and implement an interesting system of your choice. There are six deadlines (see the section links at the top of this page).

Ideas

You should feel free to choose any project you like, as long as it is related to storage systems, distributed systems, or operating systems. It must have a substantial system-building and evaluation component. A successful class project usually has very well defined goals and is modest in scope (remember, you only have 2.5 months to finish it). You could look for inspiration about hot topics in the on-line proceedings of recent SOSP, OSDI, Usenix, and NSDI conferences. Tips on preparing a paper appear here. Here's a list of ideas that we think could lead to interesting projects.  Start as soon as you possibly can!

  • Operating system features to better leverage RDMA
  • New cloud-scale computing services, perhaps focused on applications such as the smart power grid, smart self-driving cars, internet of things, smart homes
  • Study the security and distributed systems properties of Bitcoin
  • New systems concepts aimed at better supporting “self aware” applications in cloud computing settings (or even in other settings)
  • Building better memory-mapped file systems: current model has become outmoded and awkward
  • Tools for improving development of super fast multicore applications like the one in mini-project one. 
  • Software defined network infrastructure on the systems or network side (as distinct from Nate’s focus on the PL side)

The best projects will lead to papers you can submit to a good conference.  We're ok with projects that have CS 6410 content but are tied to work you are starting with a faculty member in the field.  The key thing is that projects need to be serious research activities that lead to publishable or at least very high quality papers (12-14 pages, written very well and in a scientifically literate style, aimed at a clearly defined community).

Questions or comments? email ken@cs.cornell.edu

Policy on academic integrity