Department of Computer Science 

CS 5413: High Performance Systems and Networking (Fall 2014)

Mon/Wed/Fri 1:25-2:15; 205 Thurston Hall

* Home
* Schedule
* Labs
* Project

Final Projects

Raspberry Pi related projects

Tiny Data with Raspberry Pi's: An Exploration of Low-Cost MapReduce Clusters
Jeremy Feinstein, Brian Kutsop, Kuan-Lin Chen
As data centers have increased in size, there has been a push to build clusters out of cheaper commodity parts that can easily be replaced upon failure, making data centers more affordable overall. However, even such clusters remain out of reach for individuals and small businesses. It is a worthwhile exercise to see whether much smaller clusters could serve these users, and to compare their performance/price ratio to that of traditional data centers. In our case, we explored creating such a cluster with Raspberry Pis, which are $30 credit-card-sized, single-board computers. More specifically, we built a distributed data processing architecture in Python that runs on a cluster of four Raspberry Pis and closely resembles Google's MapReduce architecture. To profile the performance of the system, we wrote several example MapReduce jobs, such as counting words, calculating baseball statistics, and counting n-gram frequencies in text documents.
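The word-count job mentioned above is the canonical MapReduce example. As an illustration of the programming model only (a single-process sketch, not the authors' cluster code, whose API is not shown on this page):

```python
from itertools import groupby
from operator import itemgetter

def map_phase(document):
    # Emit an intermediate (word, 1) pair for every word in the document.
    for word in document.split():
        yield (word.lower(), 1)

def reduce_phase(word, counts):
    # Sum the partial counts for a single word.
    return (word, sum(counts))

def run_job(documents):
    # Shuffle: sort and group intermediate pairs by key, standing in for
    # the network shuffle between worker nodes in a real cluster.
    intermediate = sorted(
        (pair for doc in documents for pair in map_phase(doc)),
        key=itemgetter(0),
    )
    return dict(
        reduce_phase(word, (count for _, count in pairs))
        for word, pairs in groupby(intermediate, key=itemgetter(0))
    )
```

On the cluster described above, the map and reduce phases would run on different Pis, with the shuffle crossing the switch; the structure of the job is the same.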

This project will be extended to create a "plug-and-complete" networking project that can be used to teach and introduce networking concepts in one of Cornell's primary systems classes, CS 3410 or CS 4410. This will also include writing a complete instruction set and a set of guidelines to support students completing the project. During the implementation process, every member of our team learned new skills, including how to create a cluster, how to program the infrastructure underlying a single physical switch, and how to analyze system throughput.

HoneyPi: A distributed network Honeypot on the Raspberry Pi platform
Maxwell Dergosits, Rob McGuinness, Naman Aragwal
We describe the operation of HoneyPi, an architecture utilizing Raspberry Pis to create a scalable distributed honeypot to capture and analyze network traffic. We evaluate the feasibility of using Raspberry Pis in such an architecture, and the performance of the system itself. Finally, we aim to create a general project that may be utilized by CS 3410 students.

SoNIC related projects

SoNIC over 1G
Adithya Venkatesh, Nandini Nagaraj, and Rafael Farias Marinheiro
In standard environments, both the Data Link and Physical (PHY) layers are implemented in the Network Interface Card (NIC) and cannot be accessed in real time from software. However, these lower layers contain valuable information that can be used to measure and improve the performance of the network. Recently, SoNIC was proposed to provide real-time access to the physical layer, and it was used to accurately measure the performance of a complex high-speed wired network. The current SoNIC design and implementation operates only on 10 GbE (Gigabit Ethernet) devices. In this project, we present a SoNIC design for 1 GbE devices as an effort to extend the SoNIC ecosystem.

Abhishek Kumar Maurya, Amarinder Singh Bindra, and Gaurav Aggarwal
We implement a Transmission Control Protocol (TCP) layer on top of the Software-defined Network Interface Card (SoNIC). SoNIC is currently optimized to generate UDP packets because of its real-time constraints. Generating TCP traffic over SoNIC requires maintaining a TCP state machine, to track connection state and transfer data reliably, while still satisfying real-time constraints and line speed. We aim to build a TCP layer on top of SoNIC.
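The connection state machine these TCP-over-SoNIC projects must maintain can be pictured as a transition table. The following is a simplified client-side sketch using RFC 793 state and event names; it is an illustration of the bookkeeping involved, not the SoNIC code:

```python
# Simplified client-side TCP state machine (RFC 793 names); comments note
# the segment that would be sent on each transition.
TCP_TRANSITIONS = {
    ("CLOSED",      "active_open"):  "SYN_SENT",     # send SYN
    ("SYN_SENT",    "recv_syn_ack"): "ESTABLISHED",  # send ACK
    ("ESTABLISHED", "close"):        "FIN_WAIT_1",   # send FIN
    ("FIN_WAIT_1",  "recv_ack"):     "FIN_WAIT_2",
    ("FIN_WAIT_2",  "recv_fin"):     "TIME_WAIT",    # send ACK
    ("TIME_WAIT",   "timeout"):      "CLOSED",
}

class TcpConnection:
    def __init__(self):
        self.state = "CLOSED"

    def handle(self, event):
        # Reject events that are not valid in the current state.
        key = (self.state, event)
        if key not in TCP_TRANSITIONS:
            raise ValueError(f"event {event!r} invalid in state {self.state}")
        self.state = TCP_TRANSITIONS[key]
        return self.state
```

The hard part in SoNIC is not the table itself but driving it (plus retransmission timers and sequence-number tracking) without ever stalling the line-rate transmit path.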

TCP over SoNIC: Light Speed
Xuke Fang and XingXiang Lao
SoNIC, a Software-defined Network Interface Card, provides access to the physical and data link layers by implementing them in software, giving complete control over the network in real time; hence it gives system programmers unprecedented precision for network measurement and research. However, SoNIC currently supports only the UDP transmission protocol. In this paper, we present an implementation of TCP over SoNIC and an evaluation of TCP throughput over different configurations.

Network Analysis using SoNIC
Junyu Chen, Yicheng Liang, and Zhihong Liu
The physical and data link layers of the network stack contain valuable information. SoNIC (Software-defined Network Interface Card) provides software access to these layers by implementing them in software, giving complete control over the network stack in real time. With SoNIC, we are able to perform precise network measurements. GENI (Global Environment for Network Innovations), meanwhile, provides a virtual laboratory for networking and distributed systems research and education. We can therefore gather valuable information on GENI using SoNIC, in particular available-bandwidth estimates, and perform the corresponding analysis.

NetSlice related projects

NetSlice Bottleneck
David Kelly and Ross Hanson
This paper provides a basic analysis of the performance of NetSlice versus two similar frameworks, Intel DPDK and netmap. NetSlice, while extremely portable and scalable up to 30 Gbps, hits a bottleneck when scaled further. Intel DPDK and netmap, on the other hand, are capable of scaling up to 40 Gbps. We hope to find this bottleneck and provide evidence that NetSlice can also be scaled up to 40 Gbps.

Intel DPDK and Virtual Machine related projects

Performance Analysis of Intel DPDK on Physical and Virtual Machines
Vidhya Sankaran and Divyesh Darde
Intel DPDK (Data Plane Development Kit) is a set of libraries and drivers for fast packet processing on x86 platforms. It provides a programming framework that scales from Intel Atom processors to Intel Xeon processors and enables faster development of high speed data packet networking applications.

In this project, we measured the performance of Intel DPDK on physical and virtual machines, analysed the results, and suggested possible improvements for better performance on VMs.

Supercloud related projects

TincVPN Optimization
Derek Chiang, Jasdeep Hundal, and Jisun Jung
We explored ways to improve the performance of tincVPN, a virtual private network (VPN) implementation. VPNs are typically used to create a (usually private) network between computers connected over the public Internet. TincVPN is noted for being easy to configure, but is about 5-15 times slower at sending data between two machines that are otherwise linked by a sufficiently fast connection, including virtual servers running on the same physical host.

In this paper we first detail the design and architecture of tincVPN and how it relates to the observed performance issues. Then we propose two key optimizations: buffering the data packets that tincVPN sends, and switching the mechanism by which tincVPN decides whether there is incoming data available to be read from the Linux system call select to the system call epoll. Finally, we compare the performance profiles of the unoptimized and optimized versions of tincVPN and discuss areas for further improvement. Unfortunately, we have not seen the gains expected from our implemented optimizations.
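The select-to-epoll switch can be illustrated with two small readiness-check helpers. This is a hedged sketch of the mechanism (Linux-only, since epoll does not exist on other platforms), not tincVPN's C code:

```python
import os
import select

def wait_readable_select(fds, timeout=0.0):
    # select scans every fd in the set on each call, so its cost grows
    # with the number of registered fds.
    readable, _, _ = select.select(fds, [], [], timeout)
    return readable

def wait_readable_epoll(fds, timeout=0.0):
    # epoll is notified by the kernel per-fd, so a call's cost scales with
    # the number of READY fds, not the number registered.
    ep = select.epoll()
    try:
        for fd in fds:
            ep.register(fd, select.EPOLLIN)
        return [fd for fd, _events in ep.poll(timeout)]
    finally:
        ep.close()
```

In real use the epoll object is created once and kept for the life of the program, amortizing the registration cost; it is built per-call here only to keep the sketch self-contained.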

Optimization of VM migration in SuperCloud
Remya Balasubramanian, Sarah Alabdullatif, and Yashaswini Shekarappa
Supercloud is a cloud of clouds: a cloud service distribution layer that is decoupled from the provider. Supercloud leverages a nested paravirtualization layer called the Xen-Blanket, which provides multi-cloud homogeneity. The Xen-Blanket layer exposes a single VM interface to blanket guests, so a single guest image can run on any cloud without modification. This enables migration of VMs across cloud providers. Live migration has become an integral part of any cloud infrastructure design. In this project, we aim to profile and analyze live VM migration to identify and reduce the downtime, especially in the stop-and-copy phase.

Supercloud Storage Measurement Study and Optimization
Sneha Prasad and Lu Yang
Over the last few years, a significant number of organisations have chosen to host their data and services with a few cloud service providers. These cloud providers act as both producers and distributors of cloud services, meaning that they control a whole ecosystem of proprietary interfaces that are not compatible across different cloud providers. A customer of one cloud service provider cannot switch vendors without incurring significant, expensive downtime; such a customer is said to be "locked in" by the vendor. In this context, it becomes very important to control and regulate these cloud providers to prevent vendor lock-in. Supercloud is a system proposed by [1].

It decouples providers and distributors by providing a uniform cloud service interface, an abstraction layered on top of resources obtained from several diverse infrastructure-as-a-service (IaaS) cloud resource providers. The top layer of Supercloud presents a uniform interface to customers, while the bottom layer talks to the different service providers. Decoupling customers from cloud providers gives customers the flexibility to migrate across providers without incurring the cost of starting from scratch each time. The decoupling and layering of the OS is achieved primarily through virtualization.

RACS related projects

Gary Zibrat
Cloud storage is becoming increasingly popular and cheap. It is convenient for companies to simply store their data online so that they do not have to buy a large amount of storage, set it up, and pay people to maintain it. Companies such as Netflix store as much as 3 petabytes of data in the cloud (typical home computers store 1-2 terabytes, and 1 petabyte is 1024 terabytes). When storing such a volume of data, the client is susceptible to being locked in with a provider, because the cost of transferring data out of the cloud is high relative to the cost of actually storing it. This report discusses the feasibility of storing files evenly over multiple providers in a system called Redundant Array of Cloud Storage (RACS). First I discuss RACS's evolution, then I explain the current version of RACS, and finally I compare the performance of two versions of RACS.
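The RAID-style idea behind spreading files evenly over providers can be sketched in a few lines: split the data into n-1 shares plus one XOR parity share, so the data survives losing (or leaving) any single provider. This is an illustration of the concept only, not the RACS implementation or its actual coding scheme:

```python
def stripe(data: bytes, n_providers: int):
    # Round-robin the bytes into n-1 data shares, one per provider.
    k = n_providers - 1
    shares = [bytearray() for _ in range(k)]
    for i, byte in enumerate(data):
        shares[i % k].append(byte)
    # Parity share: byte-wise XOR of the data shares (zero-padded).
    width = max(len(s) for s in shares)
    parity = bytearray(width)
    for s in shares:
        for j, byte in enumerate(s):
            parity[j] ^= byte
    return [bytes(s) for s in shares] + [bytes(parity)]

def recover_share(shares, lost_index):
    # Rebuild the lost share by XOR-ing all the remaining ones.
    width = max(len(s) for i, s in enumerate(shares) if i != lost_index)
    rebuilt = bytearray(width)
    for i, s in enumerate(shares):
        if i == lost_index:
            continue
        for j, byte in enumerate(s):
            rebuilt[j] ^= byte
    return bytes(rebuilt)
```

With four providers, each holds roughly a third of the data plus parity overhead, and reading back needs any three of the four.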


Project information

How CS5413 Projects Work:

What are the different project stages and what do we need in each one of them:

Team Formation and Project Idea (due early-September [Wed, Sept 3rd])

  1. Create project groups of two or three people. The full list of project participants needs to be submitted via CMS.

Project Idea (due early-September [Fri, Sept 12])

  1. The project title and difficulty level (either from the list below or, if you propose a new project, similar in style).
  2. A brief description of the project (either from the list below or, if you propose a new project, similar in style).
  3. If you will use your project for MEng credit, a sentence saying "This project will be used for CS MEng credit, approved by (Prof. Weatherspoon/Han/Ki Suh)". Please note the rules given below! You must respect them or we cannot approve the CS MEng credit request. The name to list can be Professor Weatherspoon or one of the TAs, and that person must meet with you, discuss your plan for MEng project credit, and approve your plan.

Project Proposal (due mid-September [Fri, Sept. 19])

  1. One paragraph on the high level context/motivation for what you are proposing.
  2. A paragraph (or bullet points) on what you will do to carry out the project: e.g.   "We will get access to the NSF GENI Testbed with a SoNIC slice and implement our own SoNIC-enabled network measurement architecture for monitoring available bandwidth" etc.  You should also accompany these actions with an estimated date of completion (time schedule for the project). You can evolve this plan later if needed; the one you file is an initial concept.
  3. A paragraph (or bullet points again) explaining how you will demonstrate the project (on completion, we will have a visual demo and a poster.  The demo will show....)  This can evolve over time too.
  4. Of the team of three, who will do what?  How often will you meet?  How many hours per week will you work on the effort?  How many credits are you taking: can you really spend that number of hours per week?  We allow teams of three for any project. 

Intermediate Report I (due late October [Wed, Oct 29])

    Old and current (if they are different) proposals, along with information about what has been achieved and how (green status), what is in progress and how it is going (yellow status) and what is not done at all (red status). You should also justify any changes made to the previous proposal.

Intermediate Report II (due mid November [Wed, Nov 19])

    Same as Intermediate Report I.

Final Report (due December 5)

    Same as Intermediate Reports. We expect to see everything that is reported here as completed (green status) in the demo/poster day mentioned below.

Demo Day (December 12)

    A presentation and/or poster explaining the architecture of your system and how it works and a live demo that demonstrates the full capabilities of the system. We'll also have the original plan with us and will want you to explain how you departed from the plan if the thing you actually do isn't quite what you originally had in mind.

All the documents need to be committed to CMS. (The posters can also be used in BOOM in the spring.)

MEng Project Credit:

As you know, Cornell MEng students must do an MEng project.  You can use your CS5413 project to satisfy this requirement.  If you wish to use CS5413 for MEng project credit, just sign up for 3 credits, graded, of CS5999 with Professor Weatherspoon's code.  We will use the CS5413 grade as the CS5999 grade.  Note that this means your quiz scores in CS5413 actually count towards your CS5999 grade too.  We must explicitly agree to your MEng project plan before you can consider yourself approved to do this.  So please meet with Professor Weatherspoon, Ki Suh, or Han within the first days of classes to get their approval for this plan.

Due Date:  CS5413 projects are due on the last day of the course, which is set aside as a project demo day. 

Project Ideas

More than one person or team can independently undertake the same project.  Possible project ideas are listed below. It is preferred that groups select projects from this list, but student groups can propose their own project ideas as well.

SoNIC-related project ideas

  1. Network analysis on GENI (requires knowledge of SoNIC, SDN, and GENI)
    In this project, students will perform various network measurement experiments over ExoGENI. There are four SoNIC nodes in GENI, two in RENCI and two in UC Davis. Students will set up various network topologies and use SoNIC nodes to perform network profiling, available bandwidth estimation, and covert channels.
  2. TCP over SoNIC (requires kernel module programming experience)
    In this project, students will implement a network stack, in particular the TCP layer, on top of the SoNIC implementation. SoNIC is currently optimized to generate UDP packets because of its real-time constraints. Generating TCP packets is challenging because it requires maintaining a state machine while satisfying real-time constraints and line speed. Students will build a TCP layer on top of SoNIC that operates at line speed.
  3. 1G SoNIC (requires FPGA and kernel module programming experience)
    The current SoNIC design and implementation operates only on 10 GbE (gigabit Ethernet). This project aims to port SoNIC's precise network measurement capability to a 1 GbE environment. Students are required to implement encoding/decoding of the 1 GbE physical layer in software, as well as the FPGA firmware to operate SoNIC with 1G optical fibers.
  4. 40G SoNIC (requires kernel module programming experience)
    The current SoNIC design and implementation operates only on 10 GbE. This project aims to port SoNIC's precise network measurement capability to a 40 GbE environment. 40 GbE in essence uses four 10 GbE links to achieve high bandwidth. Students are required to design and implement the physical layer of 40 GbE in software. This project is challenging because the line speed requirement is much stricter than for 10 GbE.
  5. Router simulator
    In this project, students will build a simulated router that follows the router model discussed in this paper. Students will be given data collected from various experiments on a commodity router, and will build a software router simulator that follows both the data and the theoretical model, given an input stream and external traffic.
  6. Removing variability in interpacket spacing in commodity servers
    Commodity network interface cards and servers are normally optimized for high bandwidth and throughput. In particular, they often batch multiple packets together in order to achieve high bandwidth. This optimization is fine for most applications, but it perturbs interpacket spacing, which is crucial for network measurements and for detecting covert timing channels. In this project, students will look into interrupt coalescing and CPU scheduling in order to reduce variability in interpacket spacing while maintaining high bandwidth on 10 GbE.
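The batching effect in item 6 is easy to make concrete: batching leaves the mean interpacket gap (and therefore throughput) unchanged while inflating its variance. A small illustrative helper (the timestamps in the usage below are made up, not measured data):

```python
from statistics import mean, pstdev

def gap_stats(timestamps_ns):
    # Given packet arrival timestamps in nanoseconds, return the mean and
    # population standard deviation of the interpacket gaps. Batching shows
    # up as a large deviation even when the mean gap is unchanged.
    gaps = [b - a for a, b in zip(timestamps_ns, timestamps_ns[1:])]
    return mean(gaps), pstdev(gaps)
```

For example, evenly paced arrivals `[0, 100, 200, 300]` and batched arrivals `[0, 1, 2, 300]` have the same mean gap (100 ns) but very different deviations, which is exactly the distortion these projects try to remove.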


NetFPGA related project ideas

  1. SoNIC on NetFPGA (requires FPGA programming experience)
    NetFPGA is a general network research platform. The board is equipped with four SFP+ modules (10 GbE cables can be plugged into the SFP+ ports). The project will enable precise network measurement capability on NetFPGA. Students are mainly required to develop and debug the DMA engine that transfers data between the NetFPGA and host memory, as well as port the SoNIC implementation to the Xilinx FPGA.
  2. Building low latency NIC with NetFPGA
    Cloud computing applications such as RAMCloud reside mostly in memory and require low request and response latency for high performance. HyperDex, a state-of-the-art key-value store, can also benefit from a low-latency network stack. Recently, there has been a proposal, NIQ, to minimize the latency of the network stack by designing a low-latency NIC using NetFPGA. This project will try to reproduce the results of NIQ and use NIQ to evaluate the performance improvement on modern cloud applications. Students will be exposed to the NetFPGA platform as well as modern scale-out cloud applications. We will measure and analyze the benefit of a low-latency network interface for overall system performance.
  3. Building openflow switch with NetFPGA
    The NetFPGA platform is a research platform for the networking research community. One of its contributed projects is an OpenFlow switch. In this project we will look at how to build and test an OpenFlow switch on the NetFPGA platform. We will also implement an SDN controller program that interacts with the prototype OpenFlow switch, and modify the internals of the OpenFlow switch to experiment with DCTCP.
  4. Building network tester with NetFPGA
    Precise network testing instruments are expensive and closed-source. There is an open-source alternative, OSNT, which has received a lot of attention from the research community this year. In this project, we will build a network tester using OSNT and perform a variety of network measurement tasks with it. Students will work with the Linux kernel networking stack and the NetFPGA platform. They will also be exposed to top-of-rack switches and network measurement applications.

SDN related (requires OpenFlow specification, Mininet)

  1. An Openflow-based Network Measurement Service
    Network function virtualization is a popular paradigm for migrating network appliances to the cloud. In this project, we will investigate a path to migrate a network measurement service to the cloud by implementing an OpenFlow-based network measurement middlebox. Students will implement a network measurement service, such as topology discovery or bandwidth estimation, which can be chained together with other network functions. We will work on kernel networking and the SoNIC platform.

Software routers and Netslice related

  1. TCP accelerator
  2. Profiling and Improving NetSlice (requires kernel module, netperf)
    Scaling NetSlice to 40G and beyond is challenging because of contention in cache and memory. In this project, students will profile NetSlice to find the bottleneck that prevents it from scaling to 40G, and improve/optimize the system to scale beyond 40G.
  3. A study of Intel DPDK performance with Virtual Machines
    Intel DPDK is becoming increasingly popular in industry for providing a low-latency, high-throughput network stack. An example is the newly added Intel DPDK support in Open vSwitch. This project will study the performance of Intel DPDK in a virtualized network environment. We will compare and contrast Intel DPDK with the existing kernel network stack to gain a deeper understanding of the origin of its performance gains. Students will work on the kernel networking stack and Intel NICs.
  4. A comparison of Netmap and Intel DPDK architecture
    Netmap and Intel DPDK are the two most popular userspace networking frameworks in industry; netmap is available in FreeBSD and Intel DPDK in Linux. In this project, we will look deeply into the architecture of these two frameworks, study the differences between them, and learn how to design a high-performance userspace networking framework. In addition, we will build on top of Intel DPDK or netmap to evaluate the performance of software vswitches that use these technologies. Students will be exposed to kernel networking and Linux performance profiling.

SoNIC and Datacenter Network related project ideas

  1. A study of current software and hardware traffic pacing using SoNIC (requires kernel programming experience and possibly using the NetFPGA)
    The datacenter network is increasingly shared among multiple untrusted tenants and many hundreds of different services. In order to provide fine-grained control for bandwidth partitioning among tenants, software-based and hardware-based rate limiters are often used. This project will study the effectiveness and accuracy of existing software-based and hardware-based rate limiters, namely hierarchical token buckets (HTB) and EyeQ. The goal is to compare, contrast, and evaluate these two approaches. Students will have the chance to develop in the Linux kernel and on the NetFPGA platform.
  2. A study of current network switch buffer allocation scheme using SoNIC
    Current datacenter networks use commodity switches as the building block of their network. Many of these commodity switches share a common architecture (shared buffer memory) and use the same internal chipset (Broadcom Trident II). This project will study the shared-buffer architecture of a commodity switch chipset and its effects on traffic between multiple switch ports. We will try to reproduce the results of Planck and evaluate them with SoNIC. Students will study how end-host network stack latency and batching can affect the accuracy of network measurements. Students will work with 10 GbE top-of-rack switches as well as the SoNIC platform.
  3. A study of network bandwidth estimation algorithms using SoNIC
    Current datacenter networks lack an accurate method of estimating available bandwidth inside the network. There have been many proposals for available-bandwidth estimation, but most are hindered by poor control over packet timing at the end host. MinProbe proposed using SoNIC to provide accurate packet timing control to improve state-of-the-art bandwidth estimation algorithms. This project will apply SoNIC to various existing bandwidth estimation algorithms (Pathload, pathChirp, etc.) to compare and contrast their accuracy, and will look for new bandwidth estimation algorithms. Students will work on Linux userspace and kernel programming, as well as use the SoNIC FPGA platform.
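The token-bucket mechanism underlying HTB (item 1 above) fits in a few lines. This is an illustrative model of the algorithm, not the Linux HTB implementation:

```python
class TokenBucket:
    """Token-bucket rate limiter: rate in bytes/sec, burst in bytes."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = burst   # start with a full bucket
        self.last = 0.0       # time of the last refill

    def allow(self, size, now):
        # Refill tokens for the elapsed time, capped at the burst size.
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        # Admit the packet only if enough tokens are available.
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False
```

The accuracy questions these projects study live in the refill step: in software the clock reads and timer granularity add jitter to `now`, so the actual pacing of admitted packets drifts from the ideal; hardware rate limiters move that step off the host CPU.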

Supercloud-related project ideas

  1. SuperCloud networking optimization
    SuperCloud uses tinc VPN as the underlying networking solution for communication between management nodes and VMs. However, tinc can currently achieve only 20% of the throughput of the physical link. In this project, students will install and test tinc VPN, try to find its performance bottleneck, and port it into kernel space if possible. The goal is to achieve performance as close as possible to that of the physical network.
  2. SuperCloud storage measurement study
    SuperCloud is built on top of XenServer and OpenStack. Besides NFS storage, these also support other shared storage for enabling VM migration, such as iSCSI. A VM can also attach to a local storage volume to achieve higher disk performance. In this project, students will set up and enable the different storage options for SuperCloud, and use a set of benchmarks to conduct a measurement study of the different types of storage (NFS, iSCSI, local volume, etc.).
  3. SuperCloud VM migration optimization
    SuperCloud allows a VM to be live-migrated across different cloud providers. Currently the migration suffers from a long VM downtime (more than 30 seconds when migrating an idle VM with only 64 MB of memory). In this project, students will study the VM migration mechanism, try to understand the bottleneck causing the long downtime, and improve it. Student groups should also investigate a migration policy.
  4. SuperCloud smart VM placement: mechanism and policy
    SuperCloud uses OpenStack as its resource management platform. Although OpenStack has a placement engine that can work with different scheduling algorithms, it is not designed for a cloud that spans a wide area and has a complicated network/storage topology. In this project, students will modify the placement engine in OpenStack to handle deploying VMs to multiple clouds, and show the benefit of this modification by testing it with at least one application.
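The stop-and-copy downtime in item 3 comes from the standard pre-copy live-migration loop: copy memory while the VM runs, then repeatedly re-copy the pages dirtied in the meantime, and finally pause the VM to send the remainder. A toy model (illustrative parameters, not SuperCloud measurements) shows why downtime blows up when the dirtying rate approaches the copy rate:

```python
def precopy_rounds(total_pages, dirty_rate, copy_rate, max_rounds=30):
    """Toy pre-copy model. Rates are in pages/second.

    Returns (rounds, downtime_seconds), where downtime is the time to
    stop-and-copy the pages still dirty after the iterative rounds.
    """
    to_copy = float(total_pages)
    for rounds in range(1, max_rounds + 1):
        copy_time = to_copy / copy_rate      # time to send this round
        dirtied = dirty_rate * copy_time     # pages dirtied meanwhile
        if dirtied >= to_copy:               # not converging: give up
            break
        to_copy = dirtied                    # next round resends the dirt
        if to_copy < 1:                      # small enough to stop the VM
            break
    return rounds, to_copy / copy_rate
```

When `dirty_rate` is small relative to `copy_rate`, the remaining set shrinks geometrically and the final pause is tiny; when the two rates are comparable, the loop never converges and the downtime approaches the time to copy the whole memory, which is the regime the profiling in this project targets.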