projects

Here are some of the projects that I’ve done in the past.

Fast Replica Coordination with Zip

Distributed Systems, RDMA, Consensus, SMR

State Machine Replication (SMR) is an essential abstraction for building reliable distributed applications. Existing protocols that implement SMR incur high overheads in both throughput and latency, preventing efficient utilization of modern data center networking hardware. We propose a new protocol, Zip, that provides the same abstraction as SMR to the clients of a replicated service, but performs replication in 1-RTT in the absence of failures and asynchrony, using a combination of speculative agreement and off-path ordering. We also built two applications that use Zip as the underlying replication protocol: Ziplog, a shared log that provides high throughput and low latency, and Zipkvs, a state-of-the-art transactional key-value store.

This work is under submission at SOSP '25.

Understanding Host Network Stack Overheads

Systems, Networking

Adoption of high-bandwidth access links (100 Gbps and beyond) in the data center has led to a shift in the bottlenecks from the network core to end-host processing. We perform exhaustive benchmarking of the Linux network stack and analyse several metrics, such as throughput, CPU utilisation, and cache miss rate, to understand (1) the impact of offloads present in commodity NICs on performance, and (2) the causes of overheads in existing network stacks. Based on this analysis, we provide recommendations for the design of future transport protocols, network stacks, and network hardware.

I talked about this project at NetDev 0x15. It was also published at SIGCOMM '21.

Gandiva

Systems for Deep Learning, Scheduling, Fairness

Gandiva is a cluster scheduler for deep learning that uses CPU-scheduling-like primitives, e.g., time-slicing at minute scales and migration, to schedule GPUs efficiently by providing coarse-grained GPU sharing across jobs. Gandiva_fair is a scheduler built on top of Gandiva that provides a cluster-level fair share of GPU throughput. It also uses the differential speedups that different models obtain on heterogeneous GPU architectures, together with an automatic trading policy based on second-price auctions, to improve overall cluster throughput.
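For readers unfamiliar with the term, here is a minimal sketch of a generic sealed-bid second-price auction: the highest bidder wins but pays the second-highest bid. This is not Gandiva_fair's trading policy; the function name, the job names, and the idea that a bid is a single number per job are all assumptions made for illustration.

```python
from typing import Dict, Tuple


def second_price_auction(bids: Dict[str, float]) -> Tuple[str, float]:
    """Return (winner, price) for a sealed-bid second-price auction.

    `bids` maps a hypothetical bidder name to its bid. The winner is the
    highest bidder; the price it pays is the second-highest bid.
    """
    if len(bids) < 2:
        raise ValueError("a second-price auction needs at least two bids")
    # Sort bidders from highest to lowest bid; ties keep insertion order.
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]
    return winner, price


# Hypothetical bids, e.g. how much each job values a faster GPU.
winner, price = second_price_auction({"job_a": 3.0, "job_b": 5.0, "job_c": 4.0})
```

Because the price does not depend on the winner's own bid, bidders have no incentive to misreport their valuation, which is the usual reason for choosing this auction format.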

This work was published at EuroSys '20. This research was also incorporated into Microsoft's Project Singularity.

ltdp-viterbi-algorithm

C++, Parallel Programming, Dynamic Programming, Viterbi Algorithm

An LTDP parallelisation of the Viterbi algorithm, based on Maleki et al.
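For context, the sketch below shows the standard sequential Viterbi recurrence in log space, i.e. the baseline that a parallelisation starts from. It is written in Python for brevity and is not the project's C++ code; the function names and the dictionary-based tables are hypothetical, and it assumes all probabilities are non-zero so that their logarithms exist.

```python
import math
from typing import Dict, List, Sequence

LogTable = Dict[str, Dict[str, float]]


def to_log(table: Dict[str, Dict[str, float]]) -> LogTable:
    """Convert a nested table of (non-zero) probabilities to log space."""
    return {row: {col: math.log(p) for col, p in cols.items()}
            for row, cols in table.items()}


def viterbi(obs: Sequence[str], states: Sequence[str],
            log_start: Dict[str, float], log_trans: LogTable,
            log_emit: LogTable) -> List[str]:
    """Return the most likely state sequence for the observations."""
    if not obs:
        return []
    # score[s]: best log-probability of any path that ends in state s.
    score = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    back = []  # back[t][s]: best predecessor of state s at step t + 1
    for o in obs[1:]:
        prev, score, pointers = score, {}, {}
        for s in states:
            # Max-plus step: pick the predecessor with the highest score.
            best = max(states, key=lambda p: prev[p] + log_trans[p][s])
            score[s] = prev[best] + log_trans[best][s] + log_emit[s][o]
            pointers[s] = best
        back.append(pointers)
    # Trace the back-pointers from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path
```

Each step depends on the scores of the previous step, which is what makes the loop over observations sequential in this form.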
traycer-hs

Haskell, Graphics

A ray tracing image rendering system in pure Haskell.

simple_ra

C++, DBMS, Programming Language

An interpreter for relational algebra with its own relational database management system.

simple-scheme

Haskell, Programming Language

A simple Scheme (R5RS) implementation in Haskell.
