Networks that never drop packets

Need for Lossless Networks

  • High Performance w/ Low CPU Overhead: At today's bandwidths (40-100 Gbps), the OS has become a bottleneck. Lossless networks need only a much simpler transport, which is easier to offload to hardware and run bypassing the kernel, providing high throughput and ultra-low latency at minimal CPU overhead.

  • Eliminating Large & Unexpected (Tail) Latencies: Packet drops and large, unbounded queueing inflate latency and especially degrade interactive applications, whose tail latencies can be orders of magnitude larger than the median.

  • Enabling Next-Gen Infrastructure: Reliable, low-latency, high-throughput communication, enabled by lossless fabrics, is essential to realize ongoing datacenter trends such as high-speed remote I/O, remote memory, and resource disaggregation.

Existing Lossless Mechanisms Are Insufficient

  • Distributed Schemes – Credit-based flow control (InfiniBand, QuickPath, PCIe, etc.) and Priority Flow Control (PFC)
    • Scalable to datacenter topologies
    • No throughput guarantees; can lead to congestion collapse
    • Other known problems (head-of-line (HOL) blocking, deadlocks, congestion spreading, etc.)
  • Centralized Scheme – Fastpass
    • Worst-case throughput guarantees
    • Not scalable
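To illustrate why credit-based schemes never drop packets (and where their backpressure comes from), here is a minimal sketch of hop-by-hop credit-based flow control on a single link. It assumes zero link delay, one packet per buffer slot, and hypothetical `Sender`/`Receiver`/`simulate` names; it is not the InfiniBand, QuickPath, or PCIe mechanism in detail, and PFC (which uses pause frames rather than credits) is not modeled.

```python
from collections import deque


class Receiver:
    """Downstream port with a fixed buffer (hypothetical model)."""

    def __init__(self, buffer_slots):
        self.buffer = deque()
        self.capacity = buffer_slots
        self.drops = 0

    def accept(self, pkt):
        if len(self.buffer) >= self.capacity:
            self.drops += 1  # only reachable if the sender ignored its credits
            return
        self.buffer.append(pkt)

    def drain(self, n):
        """Forward up to n packets onward; return how many slots were freed."""
        freed = 0
        while self.buffer and freed < n:
            self.buffer.popleft()
            freed += 1
        return freed


class Sender:
    """Upstream port that starts with one credit per downstream buffer slot."""

    def __init__(self, initial_credits):
        self.credits = initial_credits
        self.queue = deque()

    def try_send(self, receiver):
        # Transmit only while credits remain; otherwise packets wait upstream.
        sent = 0
        while self.queue and self.credits > 0:
            receiver.accept(self.queue.popleft())
            self.credits -= 1
            sent += 1
        return sent


def simulate(slots=4, offered=20, drain_per_tick=1, ticks=30):
    rx = Receiver(slots)
    tx = Sender(initial_credits=slots)
    tx.queue.extend(range(offered))
    delivered = 0
    max_occupancy = 0
    for _ in range(ticks):
        tx.try_send(rx)
        max_occupancy = max(max_occupancy, len(rx.buffer))
        freed = rx.drain(drain_per_tick)
        delivered += freed
        tx.credits += freed  # credits return immediately (zero-delay simplification)
    # (packets delivered, packets dropped, peak receiver occupancy, backlog left upstream)
    return delivered, rx.drops, max_occupancy, len(tx.queue)
```

With the defaults, `simulate()` delivers all 20 packets with no drops and a peak occupancy of 4. With `drain_per_tick=0` (a stalled downstream port), nothing is dropped either; instead 16 packets stay queued at the sender. In a multi-hop fabric, that held-back traffic is what spreads congestion upstream and causes HOL blocking, and none of it guarantees throughput.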

The goal of this project is to design network fabrics (and end-host stacks) for datacenter topologies that guarantee, for arbitrary input workloads:

  1. Zero packet drops in the network;
  2. Near-optimal network utilization; and
  3. A scalable, decentralized design.

Current Problems

Designing network fabrics with the following properties:

1. Bounded Queueing w/ Throughput Guarantees

  • Loco logically decomposes a tree topology into multiple single logical switches, each scheduled independently
  • Scheduling is performed at each logical switch by computing graph matchings, providing near-optimal utilization
  • Clean-slate design, implemented on FPGAs
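The text does not specify which matching algorithm Loco uses. As a stand-in, the sketch below computes a maximum-size bipartite matching (Kuhn's augmenting-path algorithm) for one logical switch in each time slot. It assumes per-input virtual output queues, where the hypothetical `voq[i][j]` counts packets at input `i` destined to output `j`. Because each slot's matching pairs every output with at most one input, no output ever receives more than one packet per slot.

```python
def max_matching(requests):
    """requests[i] is the set of outputs input i has packets for.
    Returns match_out, where match_out[j] is the input matched to output j (or None)."""
    n_out = 1 + max((j for outs in requests for j in outs), default=-1)
    match_out = [None] * n_out

    def try_assign(i, seen):
        for j in sorted(requests[i]):
            if j in seen:
                continue
            seen.add(j)
            # Take output j if it is free, or if its current input can be re-routed.
            if match_out[j] is None or try_assign(match_out[j], seen):
                match_out[j] = i
                return True
        return False

    for i in range(len(requests)):
        try_assign(i, set())
    return match_out


def run_slots(voq):
    """Serve one matching per time slot until all queues drain; return slot count."""
    slots = 0
    while any(any(row) for row in voq):
        requests = [{j for j, q in enumerate(row) if q > 0} for row in voq]
        for j, i in enumerate(max_matching(requests)):
            if i is not None:
                voq[i][j] -= 1  # matched pair (i, j) transfers one packet this slot
        slots += 1
    return slots
```

For example, `max_matching([{0, 1}, {0}, {1, 2}])` returns `[1, 0, 2]`, and `run_slots([[1, 1, 0], [1, 0, 0], [0, 1, 1]])` drains all five packets in 2 slots. No schedule can do better, because input 0 and outputs 0 and 1 each carry 2 packets. A hardware scheduler would typically use an iterative or approximate matching rather than this recursive one.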

2. Zero Queueing w/ Throughput Guarantees

  • DZQ provides even stronger guarantees – deterministic zero-queueing in network switches
  • Admission control performed using techniques from graph matching and edge coloring
  • Online, fully distributed mechanism, implementable using available programmable switches
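The text does not spell out DZQ's algorithm. The centralized, offline sketch below only illustrates why admission control plus edge coloring can yield zero queueing, and it rests on several assumptions: a single switch, a repeating frame of `frame_slots` time slots, unit-rate flows, and a hypothetical greedy admission rule. Each admitted flow is an edge (input, output) of a bipartite multigraph. By König's edge-coloring theorem, if no port has more than `frame_slots` edges, the edges can be colored with `frame_slots` colors so that no port has two edges of the same color. Each color is then a time slot in which every input sends, and every output receives, at most one packet, so nothing ever waits in the switch. The coloring uses the classical alternating-path swap.

```python
from collections import Counter


def admit(flows, frame_slots):
    """Hypothetical greedy admission: accept flow (i, j) only if neither
    input i nor output j would exceed frame_slots flows per frame."""
    load = Counter()
    admitted = []
    for i, j in flows:
        if load['in', i] < frame_slots and load['out', j] < frame_slots:
            load['in', i] += 1
            load['out', j] += 1
            admitted.append((i, j))
    return admitted


def edge_color(edges, num_colors):
    """Give each (input, output) edge a color (time slot) in range(num_colors)
    so that no port has two edges of the same color. Requires every port to
    appear in at most num_colors edges (sufficient for bipartite multigraphs)."""
    color = [None] * len(edges)
    at = {}  # at[port][c] = index of the edge colored c at that port

    def ports(e):
        i, j = edges[e]
        return ('in', i), ('out', j)

    def other(e, port):
        p, q = ports(e)
        return q if port == p else p

    def free_color(port):
        used = at.setdefault(port, {})
        return next(c for c in range(num_colors) if c not in used)

    for e in range(len(edges)):
        u, v = ports(e)
        a, b = free_color(u), free_color(v)
        if a in at[v]:
            # Collect the path from v whose edges alternate colors a, b, a, ...
            path, x, c = [], v, a
            while c in at[x]:
                f = at[x][c]
                path.append(f)
                x = other(f, x)
                c = b if c == a else a
            # Swap a <-> b along the path; bipartiteness keeps u off this path,
            # so afterwards color a is free at both u and v.
            for f in path:
                for p in ports(f):
                    del at[p][color[f]]
            for f in path:
                color[f] = b if color[f] == a else a
                for p in ports(f):
                    at[p][color[f]] = f
        color[e] = a
        at[u][a] = e
        at[v][a] = e
    return color
```

For example, with `frame_slots = 2`, `admit([(0, 0), (0, 1), (1, 0), (1, 1), (0, 0), (2, 1)], 2)` rejects the last two flows because input 0 and output 1 are already full. `edge_color` then assigns the four admitted flows to slots `[1, 0, 0, 1]`, a conflict-free schedule for a 2-slot frame.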

3. Zero Queueing w/ Throughput Guarantees w/ Commodity Hardware

We are currently working on ensuring deterministic zero queueing without any support from the network.

Papers

Download

git clone https://github.com/sakshamagarwals/lossless

(Will be available here soon.)

Members