Here are some of the projects that I’ve done in the past.

  • Coordination-Free Distributed Shared Log

    Distributed Systems, RDMA

    The distributed shared log is a popular abstraction for building consistent, high-throughput, fault-tolerant distributed applications. Existing protocols make trade-offs: Scalog provides scalability and high throughput, but at the cost of high latency; NoPaxos provides low latency, but is not scalable (it requires a network packet sequencer); and systems like Kafka provide high throughput and low latency, but do not provide a global total order across partitions. Our protocol bridges the gap with a new kind of consensus protocol that combines batching and pre-ordering to achieve all three goals, high throughput (i.e. scalability), low latency, and total ordering, while remaining crash fault-tolerant.

  • Understanding Host Network Stack Overheads

    Systems, Networking

    The adoption of high-bandwidth access links (100 Gbps and beyond) in the data center has shifted bottlenecks from the network core to end-host processing. We perform exhaustive benchmarking of the Linux network stack and analyse several metrics, including throughput, CPU utilisation, and cache miss rate, to understand (1) the impact of the offloads present in commodity NICs on performance, and (2) the causes of overheads in existing network stacks; we then provide recommendations for the design of future transport protocols, network stacks, and network hardware.

    I talked about this project at NetDev 0x15.

  • Gandivafair

    Systems for Deep Learning, Scheduling, Fairness

    Gandivafair is a scheduler built on top of Gandiva that provides cluster-level fair sharing of GPU throughput. It also exploits the differential speedups that different models obtain on heterogeneous GPU architectures, together with an automatic trading policy based on second-price auctions, to improve overall cluster throughput.

  • Gandiva

    Systems for Deep Learning, Resource Management

    Gandiva is a cluster scheduler for deep learning that uses CPU-scheduling-like primitives, e.g. time-slicing at minute scales and migration, to schedule GPUs efficiently by providing coarse-grained GPU sharing across jobs.

  • ltdp-viterbi-algorithm

    C++, Parallel Programming, Dynamic Programming, Viterbi Algorithm

    An LTDP (linear-tropical dynamic programming) parallelisation of the Viterbi algorithm, based on Maleki et al.
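
    The recurrence being parallelised is the standard Viterbi dynamic program; here is a minimal sequential sketch in C++ (illustrative only, not the parallel implementation; the tables are arbitrary log-scores, not values from any real model):

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// Sequential Viterbi in log-space: v[t][j] = max over i of
// v[t-1][i] + trans[i][j] + emit[j][obs[t]].  LTDP views each such
// stage as a max-plus matrix-vector product, which is what makes the
// apparently sequential t-loop parallelisable.
std::vector<int> viterbi(const std::vector<int>& obs,
                         const std::vector<double>& init,               // log P(s0)
                         const std::vector<std::vector<double>>& trans, // log P(j | i)
                         const std::vector<std::vector<double>>& emit)  // log P(o | j)
{
    const int S = init.size(), T = obs.size();
    const double NEG = -std::numeric_limits<double>::infinity();
    std::vector<std::vector<double>> v(T, std::vector<double>(S, NEG));
    std::vector<std::vector<int>> back(T, std::vector<int>(S, 0));

    for (int j = 0; j < S; ++j) v[0][j] = init[j] + emit[j][obs[0]];
    for (int t = 1; t < T; ++t)
        for (int j = 0; j < S; ++j)
            for (int i = 0; i < S; ++i) {
                double cand = v[t-1][i] + trans[i][j] + emit[j][obs[t]];
                if (cand > v[t][j]) { v[t][j] = cand; back[t][j] = i; }
            }

    // Backtrack the most likely state sequence.
    std::vector<int> path(T);
    path[T-1] = std::max_element(v[T-1].begin(), v[T-1].end()) - v[T-1].begin();
    for (int t = T-1; t > 0; --t) path[t-1] = back[t][path[t]];
    return path;
}
```

    Working in log-space turns products of probabilities into sums and avoids underflow; the parallel version in the paper additionally repairs speculative stages via rank convergence of the max-plus products.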

  • traycer-hs

    Haskell, Graphics

    A ray-tracing image renderer written in pure Haskell.
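
    At the core of any ray tracer is primitive intersection. A minimal ray-sphere test, sketched here in C++ rather than traycer-hs's Haskell (the names and representation are my own illustration, not the project's code):

```cpp
#include <cmath>
#include <optional>

struct Vec3 {
    double x, y, z;
    Vec3 operator-(const Vec3& o) const { return {x - o.x, y - o.y, z - o.z}; }
    double dot(const Vec3& o) const { return x * o.x + y * o.y + z * o.z; }
};

// Solve |orig + t*dir - center|^2 = radius^2 for the nearest t > 0;
// returns the ray parameter of the closest hit, or nullopt on a miss.
std::optional<double> hitSphere(const Vec3& orig, const Vec3& dir,
                                const Vec3& center, double radius) {
    Vec3 oc = orig - center;
    double a = dir.dot(dir);
    double b = 2.0 * oc.dot(dir);
    double c = oc.dot(oc) - radius * radius;
    double disc = b * b - 4 * a * c;
    if (disc < 0) return std::nullopt;          // ray misses the sphere
    double t = (-b - std::sqrt(disc)) / (2 * a); // nearer root first
    if (t > 0) return t;
    t = (-b + std::sqrt(disc)) / (2 * a);        // origin may be inside
    return t > 0 ? std::optional<double>(t) : std::nullopt;
}
```

    The full renderer then shades the nearest hit and recurses for reflections; expressing this pipeline as pure functions is what makes Haskell a natural fit.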

  • simple_ra

    C++, DBMS, Programming Language

    An interpreter for relational algebra with its own relational database management system.
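
    To illustrate the kind of expressions such an interpreter evaluates, here is a hypothetical C++ sketch of two relational operators, selection (σ) and projection (π); the types and names are my own illustration, not simple_ra's actual representation or syntax:

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <vector>

// A relation: a schema (column names) plus rows of string values.
struct Relation {
    std::vector<std::string> schema;
    std::vector<std::vector<std::string>> rows;
};

// Selection sigma_p(R): keep the rows satisfying the predicate.
Relation selectRows(const Relation& r,
                    const std::function<bool(const std::vector<std::string>&)>& pred) {
    Relation out{r.schema, {}};
    for (const auto& row : r.rows)
        if (pred(row)) out.rows.push_back(row);
    return out;
}

// Projection pi_cols(R): keep only the named columns, in the given order.
// Assumes every requested column exists in the schema.
Relation projectCols(const Relation& r, const std::vector<std::string>& cols) {
    std::vector<size_t> idx;
    for (const auto& c : cols)
        idx.push_back(std::find(r.schema.begin(), r.schema.end(), c) - r.schema.begin());
    Relation out{cols, {}};
    for (const auto& row : r.rows) {
        std::vector<std::string> projected;
        for (size_t i : idx) projected.push_back(row[i]);
        out.rows.push_back(projected);
    }
    return out;
}
```

    The interpreter's job is then to parse an algebra expression and compose such operators over stored relations.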

  • simple-scheme

    Haskell, Programming Language

    A simple Scheme (R5RS) implementation in Haskell.

  • publications_portal

    Python, Django, HTML, CSS, JavaScript

    A publications portal (IMS) written in Python using Django.