Traditionally, datacenters have been built around server-centric architectures where each server tightly couples the compute and storage resources on the same motherboard. However, the end of Dennard's scaling and the slowdown of Moore's Law have led to surfacing of several fundamental limitations of such server-centric architectures. As a result, a new disaggregated architecture is emerging, where resources are built as individual blades and a network fabric interconnects these blades. Such disaggregated architectures break the many fundamental assumptions made in design and optimization of existing systems, networks and applications. We are working on re-architecting systems and applications for this emerging computing paradigm.
Web services today collect an immense amount of private data about their users. This private data is then used to support wide variety of interactive queries. Unfortunately, existing systems are able to achieve only one of the two desirable properties: data confidentiality or query interactivity.
We are building new systems that have the potential to avoid leakage of user's private data while maintaining query interactivity. To achieve this, we have designing new techniques that exploit our prior work on interactive queries on compressed data, and extending these techniques to also support data confidentiality for a wide range of real-world queries.
Results: MiniCrypt [EuroSys'17]
Today's networks are increasingly complex. And yet, we continue to push more complexity into these networks -- using in-network scheduling techniques, in-network monitoring techniques, in-network sampling techniques, dynamic programmability, etc. Are all these in-network techniques really necessary? Could we keep the core of the network as dumb as possible and keep all the complexity at the edge?
We are exploring the above questions in the context of network scheduling, network monitoring and network debugging.
Results: SwitchP[NSDI'18], PathDump [OSDI'16], UPS [NSDI'16], pHost [CoNext'15], CherryPick [SOSR'15], Anteater [SIGCOMM'11]
RAM access latencies are still 100x better than SSD access latencies. Thus, in-memory query execution is one of the keys to achieving query interactivity. However, executing queries in main memory is increasingly challenging since data sizes are growing faster than Moore's Law.
This project developed, analyzed and implemented a distributed data store named Succinct that supports functionality comparable to state-of-the-art NoSQL stores and yet, enables query interactivity for an order of magnitude larger data sizes than what is possible today (or, alternatively, up to two orders of magnitude faster queries at scale). To achieve this, Succinct uses new techniques to execute a wide range of queries -- e.g., search, range, and even regular expressions -- directly on compressed data. Succinct achieves scale by storing the input data in a compressed form, and interactivity by avoiding data scans and data decompression.
Succinct is open-sourced, and is already being adopted in production clusters of several large-scale web services.
Results: ZipG [SIGMOD'17], BlowFish [NSDI'16], Succinct [NSDI'15]
In News: Datanami, GigaOm, Adtmag, AMPLab, Databricks
Many web services today personalize user query responses by using user’s social network. A fundamental primitive in these queries is computation of distance between node pairs in the social graph.
This project developed, analyzed and implemented new algorithms that leverage the sparsity in real-world graphs to achieve the first theoretical improvement over several decade-old results for this problem. For such real-world graphs, we showed that classic lower bounds established in theoretical computer science are far from optimal and that new algorithms can achieve all the desirable properties: better bounds on approximation factor (good quality of results), low storage overhead and low query latency in practice.
The techniques and algorithms developed in this project have already played a key role in design of new memory-efficient data structures for dynamic graphs and efficient routing schemes on real-world networks.
Results: Stretch less than 2 [ESA'14, SODA'13], BGP [ToN'14], Stretch 2 [PODC'13], In Practice [WOSN'12], Linear Space [INFOCOM'11]