Here are some of the projects that I’ve done in the past.
Distributed Systems, RDMA
The distributed shared log is a popular abstraction for building consistent, high-throughput, and fault-tolerant distributed applications. Existing protocols like Scalog provide scalability and high throughput, but at the cost of high latency. Meanwhile, NoPaxos provides low latency but is not scalable, since it requires a packet sequencer. Lastly, other implementations like Kafka provide high throughput and low latency, at the cost of not providing a global total order across partitions. Our protocol bridges the gap with a new kind of consensus protocol that combines batching and pre-ordering to achieve all three goals while remaining crash fault-tolerant: high throughput (i.e., scalability), low latency, and total ordering.
The adoption of high-bandwidth access links (100 Gbps and beyond) in the data center has shifted the bottlenecks from the network core to end-host processing. We perform exhaustive benchmarking of the Linux network stack and analyse several metrics, such as throughput, CPU utilisation, and cache miss rate, to understand (1) the impact of offloads present in commodity NICs on performance, and (2) the causes of overheads in existing network stacks. Based on these findings, we provide recommendations for the design of future transport protocols, network stacks, and network hardware.
I talked about this project at NetDev 0x15.
Systems for Deep Learning, Scheduling, Fairness
Gandivafair is a scheduler built on top of Gandiva to provide a cluster-level fair share of GPU throughput. It also exploits the differential speedups that different models obtain on heterogeneous GPU architectures, together with an automatic trading policy based on second-price auctions, to improve overall cluster throughput.
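To illustrate the auction mechanism mentioned above, here is a minimal sketch of a generic sealed-bid second-price auction: the highest bidder wins but pays the second-highest bid. This is not Gandivafair's actual trading policy. The function name `second_price_auction`, the bidder names, and the bid values are hypothetical and chosen only for illustration.

```python
def second_price_auction(bids):
    """Run a sealed-bid second-price auction.

    bids: dict mapping a bidder name to a numeric bid (hypothetical inputs).
    Returns (winner, price), where price is the second-highest bid.
    """
    if len(bids) < 2:
        raise ValueError("a second-price auction needs at least two bids")
    # Rank bidders from highest to lowest bid.
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]   # the highest bidder wins
    price = ranked[1][1]    # but pays the runner-up's bid
    return winner, price


if __name__ == "__main__":
    # Hypothetical bids, e.g. how much each user values a faster GPU.
    print(second_price_auction({"user_a": 3.0, "user_b": 5.0, "user_c": 4.0}))
```

Because the winner's payment does not depend on their own bid, bidding one's true valuation is a dominant strategy in this kind of auction, which is the usual reason for choosing it.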
Systems for Deep Learning, Resource Management
Gandiva is a cluster scheduler for deep learning that uses CPU-scheduling-like primitives, e.g., time-slicing at minute scales and migration, to schedule GPUs efficiently by providing coarse-grained GPU sharing across jobs.
C++, Parallel Programming, Dynamic Programming, Viterbi Algorithm
An LTDP parallelisation of the Viterbi algorithm, based on Maleki et al.
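For context, here is a minimal sketch of the plain sequential Viterbi recurrence that such a parallelisation starts from. It is not the LTDP version and not the project's C++ code. The function name `viterbi`, its parameters, and the toy hidden Markov model in the usage example are assumptions made for illustration.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely hidden-state sequence.

    obs: non-empty list of observations; states: list of hidden states;
    start_p[s], trans_p[r][s], emit_p[s][o]: model probabilities as dicts.
    """
    # best[s] = probability of the most likely path so far that ends in s.
    best = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    back = []  # one dict per later step: state -> its best predecessor
    for o in obs[1:]:
        prev, best, ptr = best, {}, {}
        for s in states:
            # Pick the predecessor r that maximises the path probability.
            p, arg = max((prev[r] * trans_p[r][s], r) for r in states)
            best[s] = p * emit_p[s][o]
            ptr[s] = arg
        back.append(ptr)
    # Trace back from the most likely final state.
    last = max(states, key=lambda s: best[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return best[last], path


if __name__ == "__main__":
    # Toy model with made-up probabilities.
    states = ["Healthy", "Fever"]
    start = {"Healthy": 0.6, "Fever": 0.4}
    trans = {"Healthy": {"Healthy": 0.7, "Fever": 0.3},
             "Fever": {"Healthy": 0.4, "Fever": 0.6}}
    emit = {"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
            "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}
    print(viterbi(["normal", "cold", "dizzy"], states, start, trans, emit))
```

Each step depends on the previous step's `best` values, and this sequential dependency across steps is what a parallelisation has to break.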
A ray tracing image rendering system in pure Haskell.
C++, DBMS, Programming Language
An interpreter for relational algebra with its own relational database management system.
Haskell, Programming Language
A simple Scheme (R5RS) implementation in Haskell.
A Publication Portal (IMS) written in Django and Python.