Summary

Consider a system consisting of a relatively small number of hosts in a large network (like the Internet). We would like to know the properties of every pairwise path between the hosts in the system, but we would prefer not to have to measure all such paths. Somewhat surprisingly, this is possible. Many path metrics, such as latency, jitter, and (log) packet transmission success rates, can be written as sums of corresponding measures on the edges that make up the paths. Therefore, we can write a linear system relating row properties to path properties via a path matrix. The rank of the path matrix is typically much lower than the number of end-to-end paths. Thus, we can use linear dependencies between rows of the path matrix to infer properties of all the paths from a relatively small number of measurements. This path matrix also serves as the basis of inference about properties of individual links, when such inference is possible. Our current work involves building fast algorithms for factoring path matrices based on understanding the relation between the linear algebraic structure and the structure of the underlying network routing.

Papers

Y. Zhao, Y. Chen, and D. Bindel, “Towards Unbiased End-to-End Network Diagnosis,” IEEE/ACM Transactions on Networking, vol. 17, no. 6, pp. 1724–1737, Dec. 2009.
@article{2009-tons,
  author = {Zhao, Yao and Chen, Yan and Bindel, David},
  title = {Towards Unbiased End-to-End Network Diagnosis},
  journal = {IEEE/ACM Transactions on Networking},
  volume = {17},
  number = {6},
  pages = {1724--1737},
  month = dec,
  year = {2009},
  doi = {10.1109/TNET.2009.2022158}
}

Abstract:

Internet fault diagnosis is extremely important for end-users, overlay network service providers (like Akamai ), and even Internet service providers (ISPs). However, because link-level properties cannot be uniquely determined from end-to-end measurements, the accuracy of existing statistical diagnosis approaches is subject to uncertainty from statistical assumptions about the network. In this paper, we propose a novel least-biased end-to-end network diagnosis (in short, LEND) system for inferring link-level properties like loss rate. We define a minimal identifiable link sequence (MILS) as a link sequence of minimal length whose properties can be uniquely identified from end-to-end measurements. We also design efficient algorithms to find all the MILSs and infer their loss rates for diagnosis. Our LEND system works for any network topology and for both directed and undirected properties and incrementally adapts to network topology and property changes. It gives highly accurate estimates of the loss rates of MILSs, as indicated by both extensive simulations and Internet experiments. Furthermore, we demonstrate that such diagnosis can be achieved with fine granularity and in near real-time even for reasonably large overlay networks. Finally, LEND can supplement existing statistical inference approaches and provide smooth tradeoff between diagnosis accuracy and granularity.

Y. Chen, D. Bindel, H. Song, B. Chavez, and R. Katz, “Algebra-Based Scalable Overlay Network Monitoring: Algorithms, Evaluation, and Applications,” ACM Transactions on Networking, vol. 15, no. 5, pp. 1084–1097, Oct. 2007.
@article{2007-tons,
  author = {Chen, Yan and Bindel, David and Song, Hanhee and Chavez, Brian and Katz, Randy},
  title = {Algebra-Based Scalable Overlay Network Monitoring:
             Algorithms, Evaluation, and Applications},
  journal = {ACM Transactions on Networking},
  volume = {15},
  number = {5},
  pages = {1084--1097},
  month = oct,
  year = {2007},
  doi = {10.1109/TNET.2007.896251}
}

Abstract:

Overlay network monitoring enables distributed Internet applications to detect and recover from path outages and periods of degraded performance within seconds. For an overlay network with end hosts, existing systems either require measurements, and thus lack scalability, or can only estimate the latency but not congestion or failures. Our earlier extended abstract [Y. Chen, D. Bindel, and R. H. Katz, “Tomography-based overlay network monitoring,” Proceedings of the ACM SIGCOMM Internet Measurement Conference (IMC), 2003] briefly proposes an algebraic approach that selectively monitors linearly independent paths that can fully describe all the paths. The loss rates and latency of these paths can be used to estimate the loss rates and latency of all other paths. Our scheme only assumes knowledge of the underlying IP topology, with links dynamically varying between lossy and normal. In this paper, we improve, implement, and extensively evaluate such a monitoring system. We further make the following contributions: i) scalability analysis indicating that for reasonably large n (e.g., 100), the growth of $k$ is bounded as $O(n \log n)$, ii) efficient adaptation algorithms for topology changes, such as the addition or removal of end hosts and routing changes, iii) measurement load balancing schemes, iv) topology measurement error handling, and v) design and implementation of an adaptive streaming media system as a representative application. Both simulation and Internet experiments demonstrate we obtain highly accurate path loss rate estimation while adapting to topology changes within seconds and handling topology errors.

Y. Zhao, Y. Chen, and D. Bindel, “Toward Unbiased End-to-End Network Diagnosis,” in Proceedings of SIGCOMM 2006, 2006, pp. 219–230.
@inproceedings{2006-sigcomm,
  author = {Zhao, Yao and Chen, Yan and Bindel, David},
  title = {Toward Unbiased End-to-End Network Diagnosis},
  booktitle = {Proceedings of SIGCOMM 2006},
  pages = {219--230},
  year = {2006},
  doi = {10.1145/1151659.1159939}
}

Abstract:

Internet fault diagnosis is extremely important for end users, overlay network service providers (like Akamai [1]) and even Internet service providers (ISPs). However, because link-level properties cannot be uniquely determined from end-to-end measurements, the accuracy of existing statistical diagnosis approaches is subject to uncertainty from statistical assumptions about the network. In this paper, we propose a novel Least-biased End-to-end Network Diagnosis (in short, LEND) system for inferring link-level properties like loss rate. We define a minimal identifiable link sequence (MILS) as a link sequence of minimal length whose properties can be uniquely identified from end-to-end measurements. We also design efficient algorithms to find all the MILSes and infer their loss rates for diagnosis. Our LEND system works for any network topology and for both directed and undirected properties, and incrementally adapts to network topology and property changes. It gives highly accurate estimates of the loss rates of MILSes, as indicated by both extensive simulations and Internet experiments. Furthermore, we demonstrate that such diagnosis can be achieved with fine granularity and in near real-time even for reasonably large overlay networks. Finally, LEND can supplement existing statistical inference approaches and provide smooth tradeoff between diagnosis accuracy and granularity.

Y. Zhao, Y. Chen, and D. Bindel, “Toward Deterministic Overlay Diagnosis,” in ACM SIGMETRICS/Performance 2006, 2006, pp. 387–388.
@inproceedings{2006-sigmetrics,
  author = {Zhao, Yao and Chen, Yan and Bindel, David},
  title = {Toward Deterministic Overlay Diagnosis},
  booktitle = {ACM SIGMETRICS/Performance 2006},
  pages = {387--388},
  year = {2006},
  doi = {10.1145/1140277.1140333}
}
Y. Chen, D. Bindel, H. Song, and R. Katz, “An Algebraic Approach to Practial and Scalable Overlay Network Monitoring,” in Proceedings of ACM SIGCOMM 2004, 2004.
@inproceedings{2004-sigcomm,
  author = {Chen, Yan and Bindel, David and Song, Hanhee and Katz, Randy},
  title = {An Algebraic Approach to Practial and Scalable
             Overlay Network Monitoring},
  booktitle = {Proceedings of ACM SIGCOMM 2004},
  year = {2004},
  doi = {10.1145/1015467.1015475}
}

Abstract:

Overlay network monitoring enables distributed Internet applications to detect and recover from path outages and periods of degraded performance within seconds. For an overlay network with $n$ end hosts, existing systems either require $O(n^2)$ measurements, and thus lack scalability, or can only estimate the latency but not congestion or failures. Our earlier extended abstract briefly proposes an algebraic approach that selectively monitors $k$ linearly independent paths that can fully describe all the $O(n^2)$ paths. The loss rates and latency of these $k$ paths can be used to estimate the loss rates and latency of all other paths. Our scheme only assumes knowledge of the underlying IP topology, with links dynamically varying between lossy and normal.In this paper, we improve, implement and extensively evaluate such a monitoring system. We further make the following contributions: i) scalability analysis indicating that for reasonably large n (e.g., 100), the growth of $k$ is bounded as $O(n \log n)$, ii) efficient adaptation algorithms for topology changes, such as the addition or removal of end hosts and routing changes, iii) measurement load balancing schemes, and iv) topology measurement error handling. Both simulation and Internet experiments demonstrate we obtain highly accurate path loss rate estimation while adapting to topology changes within seconds and handling topology errors.

Y. Chen, D. Bindel, and R. Katz, “Tomography-Based Overlay Network Monitoring,” in Proceedings of ACM SIGCOMM Internet Measurement Conference (IMC), 2003.
@inproceedings{2003-imc,
  author = {Chen, Yan and Bindel, David and Katz, Randy},
  title = {Tomography-Based Overlay Network Monitoring},
  booktitle = {Proceedings of ACM SIGCOMM Internet
                 Measurement Conference (IMC)},
  year = {2003},
  doi = {10.1145/948205.948233}
}

Abstract:

Overlay network monitoring enables distributed Internet applications to detect and recover from path outages and periods of degraded performance within seconds. For an overlay network with $n$ end hosts, existing systems either require $O(n^2)$ measurements, and thus lack scalability, or can only estimate the latency but not congestion or failures. Unlike other network tomography systems, we characterize end-to-end losses (this extends to any additive metrics, including latency) rather than individual link losses. We find a minimal basis set of $k$ linearly independent paths that can fully describe all the $O(n^2)$ paths. We selectively monitor and measure the loss rates of these paths, then apply them to estimate the loss rates of all other paths. By extensively studying synthetic and real topologies, we find that for reasonably large n (e.g., 100), $k$ is only in the range of $O(n \log n)$. This is explained by the moderately hierarchical nature of Internet routine.Our scheme only assumes the knowledge of underlying IP topology, and any link can become lossy or return to normal. In addition, our technique is tolerant to topology measurement inaccuracies, and is adaptive to topology changes.

Y. Chen, D. Bindel, H. Song, and R. Katz, “Tomography-Based Overlay Network Monitoring,” UC Berkeley Computer Science Division, CSD-03-1252, Jun. 2003. Appeared in SIGCOMM IMC 2003
@techreport{2003-tomography-tr,
  author = {Chen, Yan and Bindel, David and Song, Hanhee and Katz, Randy},
  title = {Tomography-Based Overlay Network Monitoring},
  number = {CSD-03-1252},
  institution = {UC Berkeley Computer Science Division},
  month = jun,
  year = {2003},
  status = {unrefereed},
  submit = {Appeared in SIGCOMM IMC 2003}
}

Abstract:

Overlay network monitoring enables distributed Internet applications to detect and recover from path outages and periods of degraded performance within seconds. For an overlay network with $n$ end hosts, existing systems either require $O(n^2)$ measurements, and thus lack scalability, or can only estimate the latency but not congestion or failures. Unlike other network tomography systems, we characterize end-to-end losses (this extends to any additive metrics, including latency) rather than individual link losses. We find a minimal basis set of $k$ linearly independent paths that can fully describe all the $O(n^2)$ paths. We selectively monitor and measure the loss rates of these paths, then apply them to estimate the loss rates of all other paths. By extensively studying synthetic and real topologies, we find that for reasonably large n (e.g., 100), $k$ is only in the range of $O(n \log n)$. This is explained by the moderately hierarchical nature of Internet routine.Our scheme only assumes the knowledge of underlying IP topology, and any link can become lossy or return to normal. In addition, our technique is tolerant to topology measurement inaccuracies, and is adaptive to topology changes.

Talks

Matrix Factorizations for Computer Network Tomography

Householder Symposium
tomographymeeting external invited plenary

Matrix Factorizations for Computer Network Tomography

NYU NA Seminar
tomographyseminar external invited

Linear Algebra for Network Loss Characterization

UC Berkeley LAPACK Seminar
tomographyseminar local

Linear Algebra for Network Loss Characterization

UC Berkeley LAPACK seminar
tomographyseminar local