Department of Computer Science 


CS 5413: High Performance Systems and Networking (Fall 2014)

Mon/Wed/Fri 1:25-2:15; 205 Thurston Hall

   
    
 
  Tiny Data with Raspberry Pi's: An Exploration of Low-Cost MapReduce Clusters
Jeremy Feinstein, Brian Kutsop, Kuan-Lin Chen
 

As data centers have grown, there has been a push to build clusters from cheap commodity parts that can be replaced easily upon failure, lowering the overall cost of the data center. However, such large clusters remain out of reach for individuals and small businesses. It is worthwhile to see whether much smaller clusters could serve such users, and to compare their price-performance to that of traditional data centers. In our case, we explored building such a cluster from Raspberry Pis, which are $30, credit-card-sized, single-board computers. More specifically, we built a distributed data processing architecture in Python that runs on a cluster of four Raspberry Pis and closely resembles Google's MapReduce architecture. To profile the performance of the system, we wrote several example MapReduce jobs, such as counting words, calculating baseball statistics, and counting n-gram frequencies in text documents.
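To make the MapReduce structure concrete, the following is a minimal single-process sketch of the map, shuffle, and reduce phases for the word-count and n-gram jobs mentioned above. It is not the project's code: all function names (`map_words`, `make_ngram_mapper`, `shuffle`, `reduce_counts`, `run_job`) are hypothetical, and the sketch assumes whitespace tokenization and runs everything in one process instead of distributing work across the four Pis.

```python
from collections import defaultdict


def map_words(document):
    """Map phase: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in document.lower().split()]


def make_ngram_mapper(n):
    """Build a mapper that emits (n-gram, 1) pairs (hypothetical helper)."""
    def mapper(document):
        words = document.lower().split()
        return [(" ".join(words[i:i + n]), 1)
                for i in range(len(words) - n + 1)]
    return mapper


def shuffle(pairs):
    """Shuffle phase: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce phase: sum the counts collected for one key."""
    return key, sum(values)


def run_job(documents, mapper, reducer):
    """Run map, shuffle, and reduce in sequence in a single process.

    In a real cluster, the mapper calls would run on worker nodes and the
    shuffle would move data over the network; here everything is local.
    """
    pairs = []
    for document in documents:
        pairs.extend(mapper(document))
    grouped = shuffle(pairs)
    return dict(reducer(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog"]
    print(run_job(docs, map_words, reduce_counts))
    print(run_job(docs, make_ngram_mapper(2), reduce_counts))
```

Because the job is defined only by its mapper and reducer, the same `run_job` driver handles both word counts and n-gram counts; a distributed version would presumably keep this interface and change only where the phases execute.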

This project will be extended into a "plug-and-complete" networking project that can be used to introduce and teach networking concepts in one of Cornell's primary systems classes, CS 3410 or CS 4410. The extension will also include a complete set of instructions and guidelines to support students completing the project. During implementation, all members of our team learned new skills, including how to create a cluster, how to program the infrastructure that sits beneath a single physical switch, and how to analyze system throughput.

 
    Paper [ PDF ]
Presentation [ PDF ]
Source Code [ TGZ ]