projects
Here are some of the projects that I’ve done in the past.
-
Fast Replica Coordination with Zip
Distributed Systems, RDMA, Consensus, SMR
State Machine Replication (SMR) is an essential abstraction for building reliable distributed applications. Existing protocols that implement SMR incur high overheads in both throughput and latency, preventing efficient use of modern data center networking hardware. We propose a new protocol, Zip, that provides the same abstraction as SMR to the clients of a replicated service but completes replication in 1 RTT in the absence of failures and asynchrony, using a combination of speculative agreement and off-path ordering. We also built two applications that use Zip as the underlying replication protocol: Ziplog, a high-throughput, low-latency shared log, and Zipkvs, a state-of-the-art transactional key-value store.
This work is under submission at SOSP '25.
-
Understanding Host Network Stack Overheads
Systems, Networking
The adoption of high-bandwidth access links (100 Gbps and beyond) in the data center has shifted the bottleneck from the network core to end-host processing. We exhaustively benchmark the Linux network stack and analyse metrics such as throughput, CPU utilisation, and cache miss rate to understand (1) the impact of the offloads present in commodity NICs on performance and (2) the causes of overheads in existing network stacks. Based on these findings, we provide recommendations for the design of future transport protocols, network stacks, and network hardware.
I talked about this project at NetDev 0x15. It was also published at SIGCOMM '21.
-
Gandiva
Systems for Deep Learning, Scheduling, Fairness
Gandiva is a cluster scheduler for deep learning that uses primitives borrowed from CPU scheduling, e.g. time-slicing at minute scales and migration, to schedule GPUs efficiently by providing coarse-grained GPU sharing across jobs. Gandivafair is a scheduler built on top of Gandiva that provides a cluster-level fair share of GPU throughput. It also exploits the different speedups that models obtain on heterogeneous GPU architectures, together with an automatic trading policy based on second-price auctions, to improve overall cluster throughput.
This work was published at EuroSys '20. This research was also incorporated in Microsoft's Project Singularity.
-
ltdp-viterbi-algorithm
C++, Parallel Programming, Dynamic Programming, Viterbi Algorithm
An LTDP parallelisation of the Viterbi algorithm, based on Maleki et al.
-
traycer-hs
Haskell, Graphics
A ray tracing image rendering system in pure Haskell.
-
simple_ra
C++, DBMS, Programming Language
An interpreter for relational algebra with its own relational database management system.
-
simple-scheme
Haskell, Programming Language
A simple Scheme (R5RS) implementation in Haskell.