### Summary

Consider a system consisting of a relatively small number of hosts in
a large network (like the Internet). We would like to know the
properties of *every* pairwise path between the hosts in the system,
but we would prefer not to have to measure all such paths. Somewhat
surprisingly, this is possible. Many path metrics, such as latency,
jitter, and (log) packet transmission success rates, can be written as
sums of corresponding measures on the edges that make up the paths.
Therefore, we can write a linear system relating row properties to
path properties via a *path* matrix. The rank of the path matrix is
typically much lower than the number of end-to-end paths. Thus, we
can use linear dependencies between rows of the path matrix to infer
properties of all the paths from a relatively small number of
measurements. This path matrix also serves as the basis of inference
about properties of individual links, when such inference is possible.
Our current work involves building fast algorithms for factoring path
matrices based on understanding the relation between the linear
algebraic structure and the structure of the underlying network
routing.

### Papers

*IEEE/ACM Transactions on Networking*, vol. 17, no. 6, pp. 1724–1737, Dec. 2009.

@article{2009-tons, author = {Zhao, Yao and Chen, Yan and Bindel, David}, title = {Towards Unbiased End-to-End Network Diagnosis}, journal = {IEEE/ACM Transactions on Networking}, volume = {17}, number = {6}, pages = {1724--1737}, month = dec, year = {2009}, doi = {10.1109/TNET.2009.2022158} }

#### Abstract:

Internet fault diagnosis is extremely important for end-users, overlay network service providers (like Akamai ), and even Internet service providers (ISPs). However, because link-level properties cannot be uniquely determined from end-to-end measurements, the accuracy of existing statistical diagnosis approaches is subject to uncertainty from statistical assumptions about the network. In this paper, we propose a novel least-biased end-to-end network diagnosis (in short, LEND) system for inferring link-level properties like loss rate. We define a minimal identifiable link sequence (MILS) as a link sequence of minimal length whose properties can be uniquely identified from end-to-end measurements. We also design efficient algorithms to find all the MILSs and infer their loss rates for diagnosis. Our LEND system works for any network topology and for both directed and undirected properties and incrementally adapts to network topology and property changes. It gives highly accurate estimates of the loss rates of MILSs, as indicated by both extensive simulations and Internet experiments. Furthermore, we demonstrate that such diagnosis can be achieved with fine granularity and in near real-time even for reasonably large overlay networks. Finally, LEND can supplement existing statistical inference approaches and provide smooth tradeoff between diagnosis accuracy and granularity.

*ACM Transactions on Networking*, vol. 15, no. 5, pp. 1084–1097, Oct. 2007.

@article{2007-tons, author = {Chen, Yan and Bindel, David and Song, Hanhee and Chavez, Brian and Katz, Randy}, title = {Algebra-Based Scalable Overlay Network Monitoring: Algorithms, Evaluation, and Applications}, journal = {ACM Transactions on Networking}, volume = {15}, number = {5}, pages = {1084--1097}, month = oct, year = {2007}, doi = {10.1109/TNET.2007.896251} }

#### Abstract:

Overlay network monitoring enables distributed Internet applications to detect and recover from path outages and periods of degraded performance within seconds. For an overlay network with end hosts, existing systems either require measurements, and thus lack scalability, or can only estimate the latency but not congestion or failures. Our earlier extended abstract [Y. Chen, D. Bindel, and R. H. Katz, “Tomography-based overlay network monitoring,” Proceedings of the ACM SIGCOMM Internet Measurement Conference (IMC), 2003] briefly proposes an algebraic approach that selectively monitors linearly independent paths that can fully describe all the paths. The loss rates and latency of these paths can be used to estimate the loss rates and latency of all other paths. Our scheme only assumes knowledge of the underlying IP topology, with links dynamically varying between lossy and normal. In this paper, we improve, implement, and extensively evaluate such a monitoring system. We further make the following contributions: i) scalability analysis indicating that for reasonably large n (e.g., 100), the growth of $k$ is bounded as $O(n \log n)$, ii) efficient adaptation algorithms for topology changes, such as the addition or removal of end hosts and routing changes, iii) measurement load balancing schemes, iv) topology measurement error handling, and v) design and implementation of an adaptive streaming media system as a representative application. Both simulation and Internet experiments demonstrate we obtain highly accurate path loss rate estimation while adapting to topology changes within seconds and handling topology errors.

*Proceedings of SIGCOMM 2006*, 2006, pp. 219–230.

@inproceedings{2006-sigcomm, author = {Zhao, Yao and Chen, Yan and Bindel, David}, title = {Toward Unbiased End-to-End Network Diagnosis}, booktitle = {Proceedings of SIGCOMM 2006}, pages = {219--230}, year = {2006}, doi = {10.1145/1151659.1159939} }

#### Abstract:

Internet fault diagnosis is extremely important for end users, overlay
network service providers (like Akamai [1]) and even Internet service
providers (ISPs). However, because link-level properties cannot be
uniquely determined from end-to-end measurements, the accuracy of
existing statistical diagnosis approaches is subject to uncertainty
from statistical assumptions about the network. In this paper, we
propose a novel *Least-biased End-to-end Network Diagnosis* (in short,
LEND) system for inferring link-level properties like loss rate. We
define a *minimal identifiable link sequence* (MILS) as a link sequence
of minimal length whose properties can be uniquely identified from
end-to-end measurements. We also design efficient algorithms to find
all the MILSes and infer their loss rates for diagnosis. Our LEND
system works for any network topology and for both directed and
undirected properties, and incrementally adapts to network topology
and property changes. It gives highly accurate estimates of the loss
rates of MILSes, as indicated by both extensive simulations and
Internet experiments. Furthermore, we demonstrate that such diagnosis
can be achieved with fine granularity and in near real-time even for
reasonably large overlay networks. Finally, LEND can supplement
existing statistical inference approaches and provide smooth tradeoff
between diagnosis accuracy and granularity.

*ACM SIGMETRICS/Performance 2006*, 2006, pp. 387–388.

*Proceedings of ACM SIGCOMM 2004*, 2004.

@inproceedings{2004-sigcomm, author = {Chen, Yan and Bindel, David and Song, Hanhee and Katz, Randy}, title = {An Algebraic Approach to Practial and Scalable Overlay Network Monitoring}, booktitle = {Proceedings of ACM SIGCOMM 2004}, year = {2004}, doi = {10.1145/1015467.1015475} }

#### Abstract:

Overlay network monitoring enables distributed Internet applications to detect and recover from path outages and periods of degraded performance within seconds. For an overlay network with $n$ end hosts, existing systems either require $O(n^2)$ measurements, and thus lack scalability, or can only estimate the latency but not congestion or failures. Our earlier extended abstract briefly proposes an algebraic approach that selectively monitors $k$ linearly independent paths that can fully describe all the $O(n^2)$ paths. The loss rates and latency of these $k$ paths can be used to estimate the loss rates and latency of all other paths. Our scheme only assumes knowledge of the underlying IP topology, with links dynamically varying between lossy and normal.In this paper, we improve, implement and extensively evaluate such a monitoring system. We further make the following contributions: i) scalability analysis indicating that for reasonably large n (e.g., 100), the growth of $k$ is bounded as $O(n \log n)$, ii) efficient adaptation algorithms for topology changes, such as the addition or removal of end hosts and routing changes, iii) measurement load balancing schemes, and iv) topology measurement error handling. Both simulation and Internet experiments demonstrate we obtain highly accurate path loss rate estimation while adapting to topology changes within seconds and handling topology errors.

*Proceedings of ACM SIGCOMM Internet Measurement Conference (IMC)*, 2003.

@inproceedings{2003-imc, author = {Chen, Yan and Bindel, David and Katz, Randy}, title = {Tomography-Based Overlay Network Monitoring}, booktitle = {Proceedings of ACM SIGCOMM Internet Measurement Conference (IMC)}, year = {2003}, doi = {10.1145/948205.948233} }

#### Abstract:

Overlay network monitoring enables distributed Internet applications to detect and recover from path outages and periods of degraded performance within seconds. For an overlay network with $n$ end hosts, existing systems either require $O(n^2)$ measurements, and thus lack scalability, or can only estimate the latency but not congestion or failures. Unlike other network tomography systems, we characterize end-to-end losses (this extends to any additive metrics, including latency) rather than individual link losses. We find a minimal basis set of $k$ linearly independent paths that can fully describe all the $O(n^2)$ paths. We selectively monitor and measure the loss rates of these paths, then apply them to estimate the loss rates of all other paths. By extensively studying synthetic and real topologies, we find that for reasonably large n (e.g., 100), $k$ is only in the range of $O(n \log n)$. This is explained by the moderately hierarchical nature of Internet routine.Our scheme only assumes the knowledge of underlying IP topology, and any link can become lossy or return to normal. In addition, our technique is tolerant to topology measurement inaccuracies, and is adaptive to topology changes.

@techreport{2003-tomography-tr, author = {Chen, Yan and Bindel, David and Song, Hanhee and Katz, Randy}, title = {Tomography-Based Overlay Network Monitoring}, number = {CSD-03-1252}, institution = {UC Berkeley Computer Science Division}, month = jun, year = {2003}, status = {unrefereed}, submit = {Appeared in SIGCOMM IMC 2003} }

#### Abstract:

Overlay network monitoring enables distributed Internet applications to detect and recover from path outages and periods of degraded performance within seconds. For an overlay network with $n$ end hosts, existing systems either require $O(n^2)$ measurements, and thus lack scalability, or can only estimate the latency but not congestion or failures. Unlike other network tomography systems, we characterize end-to-end losses (this extends to any additive metrics, including latency) rather than individual link losses. We find a minimal basis set of $k$ linearly independent paths that can fully describe all the $O(n^2)$ paths. We selectively monitor and measure the loss rates of these paths, then apply them to estimate the loss rates of all other paths. By extensively studying synthetic and real topologies, we find that for reasonably large n (e.g., 100), $k$ is only in the range of $O(n \log n)$. This is explained by the moderately hierarchical nature of Internet routine.Our scheme only assumes the knowledge of underlying IP topology, and any link can become lossy or return to normal. In addition, our technique is tolerant to topology measurement inaccuracies, and is adaptive to topology changes.

### Talks

#### Matrix Factorizations for Computer Network Tomography

Householder Symposium

tomography
•
meeting external invited plenary

#### Matrix Factorizations for Computer Network Tomography

NYU NA Seminar

tomography
•
seminar external invited

#### Linear Algebra for Network Loss Characterization

UC Berkeley LAPACK Seminar

tomography
•
seminar local

#### Linear Algebra for Network Loss Characterization

UC Berkeley LAPACK seminar

tomography
•
seminar local