The Structure of Information Networks
Computer Science 6850
Time: MWF 10:10-11:00 am.
Place: G01 Gates Hall.
The past decade has seen a convergence of social and technological
networks, with systems such as the World Wide Web characterized
by the interplay between rich information content, the
millions of individuals and organizations who create it,
and the technology that supports it.
This course covers recent research on the structure and analysis of
such networks, and on models that abstract their basic properties.
Topics include combinatorial and probabilistic techniques for
link analysis, centralized and decentralized search algorithms,
network models based on random graphs, and connections with work
in the social sciences.
The course prerequisites include introductory-level background
in algorithms, graphs, probability, and linear algebra, as well
as some basic programming experience (to be able to manipulate
network datasets).
The work for the course
will consist primarily of
two problem sets, a short reaction paper,
and a more substantial project.
Coursework should be handed in through
(1) Random Graphs and Small-World Properties
A major goal of the course is to illustrate
how networks across a variety of domains exhibit
common structure at a qualitative level.
One area in which this arises is in the study
of "small-world properties" in networks:
many large networks have short paths between most pairs of nodes,
even though they are highly clustered at a local level,
and they are searchable in the sense that one can
navigate to specified target nodes without global knowledge.
These properties turn out to provide insight into the
structure of large-scale social networks, and,
in a different direction, to have applications
to the design of decentralized peer-to-peer systems.
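As a minimal illustration of the first two properties -- short paths coexisting with heavy local clustering -- the following Python sketch builds a ring lattice, randomly rewires a fraction of its edges in the style of the Watts-Strogatz construction, and measures average clustering and path length. The graph model and all parameters here are illustrative assumptions, not material from the course readings.

```python
import random
from collections import deque

def ring_lattice(n, k):
    """Ring of n nodes, each linked to its k nearest neighbors on each side."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for j in range(1, k + 1):
            adj[v].add((v + j) % n)
            adj[(v + j) % n].add(v)
    return adj

def rewire(adj, p, rng):
    """Move one endpoint of each edge to a random node with probability p."""
    n = len(adj)
    edges = [(u, v) for u in adj for v in adj[u] if u < v]
    for u, v in edges:
        if rng.random() < p:
            w = rng.randrange(n)
            if w != u and w not in adj[u]:
                adj[u].discard(v); adj[v].discard(u)
                adj[u].add(w); adj[w].add(u)
    return adj

def avg_path_length(adj):
    """Mean shortest-path distance over reachable pairs (BFS from each node)."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    q.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

def clustering(adj):
    """Average fraction of a node's neighbor pairs that are themselves linked."""
    total = 0.0
    for v, nbrs in adj.items():
        nbrs = list(nbrs)
        d = len(nbrs)
        if d < 2:
            continue
        links = sum(1 for i in range(d) for j in range(i + 1, d)
                    if nbrs[j] in adj[nbrs[i]])
        total += 2.0 * links / (d * (d - 1))
    return total / len(adj)

rng = random.Random(0)
lattice = ring_lattice(200, 3)
g = rewire(ring_lattice(200, 3), 0.1, rng)
print(clustering(g), avg_path_length(g))
```

Even a small fraction of rewired "shortcut" edges typically collapses the average path length while leaving most of the local clustering of the lattice intact.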
- Small-world experiments in social networks.
- Basic Random Graph Models, and the Consequences of Expansion.
- Decentralized Search in Networks.
- Decentralized Search in Peer-to-Peer Systems.
- Nearest-Neighbor Search in Metric Spaces.
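To make the third property -- searchability -- concrete, here is a sketch of greedy decentralized search on a small-world-style network: nodes sit on a ring, each gets one long-range contact chosen with probability inversely proportional to ring distance, and a message is always forwarded to whichever neighbor is closest to the target. The construction and parameters below are illustrative assumptions rather than any particular result from the readings.

```python
import random

def harmonic_contact(v, n, rng):
    """Pick a long-range contact for v with Pr[u] proportional to 1/d(v, u)."""
    nodes, weights = [], []
    for u in range(n):
        if u == v:
            continue
        d = min((u - v) % n, (v - u) % n)   # ring distance
        nodes.append(u)
        weights.append(1.0 / d)
    return rng.choices(nodes, weights)[0]

def greedy_route(v, t, n, contacts):
    """Forward to the neighbor (ring or long-range) closest to target t."""
    def dist(a, b):
        return min((a - b) % n, (b - a) % n)
    steps = 0
    while v != t:
        nbrs = [(v - 1) % n, (v + 1) % n, contacts[v]]
        v = min(nbrs, key=lambda u: dist(u, t))  # ring step guarantees progress
        steps += 1
    return steps

rng = random.Random(1)
n = 1024
contacts = {v: harmonic_contact(v, n, rng) for v in range(n)}
trials = [greedy_route(rng.randrange(n), rng.randrange(n), n, contacts)
          for _ in range(200)]
print(sum(trials) / len(trials))
```

With this inverse-distance contact distribution, greedy routing finds short paths using only local information -- the average hop count stays far below the ~n/4 expected on the plain ring.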
(2) Cascading Behavior in Networks
We can think of a network as a large circulatory system,
through which information continuously flows.
This diffusion of information can happen rapidly or slowly;
it can be disastrous -- as in a panic or cascading failure --
or beneficial -- as in the spread of an innovation.
Work in several areas has proposed models for such processes,
and investigated when a network is more or less susceptible
to their spread.
This type of diffusion or cascade process
can also be used as a design principle for network protocols.
This leads to the idea of
epidemic algorithms, also called gossip-based algorithms,
in which information is propagated through a collection
of distributed computing hosts, typically using some
form of randomization.
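The gossip idea can be sketched in its simplest possible setting: assume a complete network of n hosts, where in each round every informed host pushes the message to one host chosen uniformly at random. The complete-graph assumption and the pure "push" variant are simplifications for illustration.

```python
import random

def push_gossip(n, seed=0):
    """Rounds until a rumor started at host 0 reaches all n hosts,
    when each round every informed host pushes to one random host."""
    rng = random.Random(seed)
    informed = {0}
    rounds = 0
    while len(informed) < n:
        # Each informed host picks one target; duplicates and
        # already-informed targets simply have no effect.
        informed |= {rng.randrange(n) for _ in range(len(informed))}
        rounds += 1
    return rounds

print(push_gossip(1000))
```

Since the informed set can at most double each round, at least log2(n) rounds are needed; the randomized push variant typically finishes in O(log n) rounds as well, which is what makes epidemic protocols attractive.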
- Models of Collective Action.
- Threshold-Based Models of Diffusion in Networks.
- Simple Probabilistic Models of Contagion.
- Finding Influential Sets of Nodes.
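The threshold-based models above can be sketched in a few lines: each node activates once the fraction of its active neighbors reaches that node's threshold, and the process is iterated to a fixed point. The graph, thresholds, and seed set below are toy choices for illustration.

```python
def threshold_cascade(adj, thresholds, seeds):
    """Run a threshold cascade to a fixed point.

    adj: dict node -> set of neighbors (undirected graph)
    thresholds: dict node -> fraction of active neighbors needed to activate
    seeds: initially active nodes
    Returns the final set of active nodes.
    """
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v in adj:
            if v in active or not adj[v]:
                continue
            frac = len(adj[v] & active) / len(adj[v])
            if frac >= thresholds[v]:
                active.add(v)
                changed = True
    return active

# Toy example: a path 0-1-2-3-4 with uniform threshold 0.5.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
th = {v: 0.5 for v in path}
print(sorted(threshold_cascade(path, th, {0})))  # -> [0, 1, 2, 3, 4]
```

Raising the uniform threshold just above 0.5 stops the cascade at the seed -- a tiny instance of how susceptibility to spread depends delicately on both thresholds and network structure.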
(3) Heavy-Tailed Distributions in Networks
The degree of a node in a network is the number
of neighbors it has.
For many large networks -- including the Web, the Internet,
collaboration networks, and semantic networks --
the fraction of nodes with very high degrees is much larger
than one would expect based on "standard" models of random graphs.
The particular form of the distribution --
the fraction of nodes with degree d
decays roughly as d^(-c) for some fixed exponent c > 0 -- is called a power law.
What processes are capable of generating such power laws,
and why should they be ubiquitous in large networks?
The investigation of these questions suggests that power laws
are just one reflection of the local and global processes driving
the evolution of these networks.
- Preferential Attachment and Rich-get-Richer Processes.
- Hierarchical Network Models and Zipf's Law.
- Power Laws through Optimization.
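A minimal simulation of a preferential-attachment (rich-get-richer) process, along the general lines of the Barabasi-Albert model: each arriving node links to m existing nodes sampled in proportion to their current degree. The star-shaped seed graph and all parameters are illustrative assumptions.

```python
import random

def ba_degrees(n, m, seed=0):
    """Degree sequence of an n-node preferential-attachment graph,
    with each new node creating m links."""
    rng = random.Random(seed)
    degree = [0] * n
    endpoints = []   # each node appears here once per unit of its degree
    # Seed graph: a star on the first m+1 nodes, so sampling is well-defined.
    for v in range(1, m + 1):
        degree[0] += 1; degree[v] += 1
        endpoints += [0, v]
    for v in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            # Uniform choice from `endpoints` = degree-proportional choice.
            chosen.add(rng.choice(endpoints))
        for u in chosen:
            degree[u] += 1; degree[v] += 1
            endpoints += [u, v]
    return degree

deg = ba_degrees(5000, 2)
avg = sum(deg) / len(deg)
print(avg, max(deg))
```

The maximum degree dwarfs the average (which stays near 2m) -- a signature of the heavy tail that a "standard" random graph with the same edge count would not exhibit.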
(4) Spectral Analysis of Networks
One can gain a lot of insight into the structure of a network
by analyzing the
eigenvalues and eigenvectors of its adjacency matrix.
The connection between spectral parameters and
the more combinatorial properties of networks and datasets
is a subtle issue, and
while many results have been established about this connection,
it is still not fully understood.
This connection has also led to a number of applications,
including the development of link analysis algorithms for Web search.
- Link Analysis and Web Search
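As one concrete instance of link analysis, here is a small sketch of a hubs-and-authorities (HITS-style) iteration: authority scores are summed from incoming hub scores and hub scores from outgoing authority scores, with normalization each round -- in matrix terms, power iteration converging toward principal eigenvectors of A^T A and A A^T. The toy link structure is made up for illustration.

```python
def hits(links, iters=50):
    """HITS-style iteration on a directed graph given as dict node -> set
    of nodes it links to. Returns (hub, authority) score dicts."""
    nodes = set(links) | {v for vs in links.values() for v in vs}
    hub = {v: 1.0 for v in nodes}
    auth = {v: 1.0 for v in nodes}
    for _ in range(iters):
        # authority(v) = sum of hub scores of pages pointing to v
        auth = {v: sum(hub[u] for u in nodes if v in links.get(u, ()))
                for v in nodes}
        norm = sum(x * x for x in auth.values()) ** 0.5
        auth = {v: x / norm for v, x in auth.items()}
        # hub(u) = sum of authority scores of pages u points to
        hub = {u: sum(auth[v] for v in links.get(u, ())) for u in nodes}
        norm = sum(x * x for x in hub.values()) ** 0.5
        hub = {u: x / norm for u, x in hub.items()}
    return hub, auth

# Toy web: pages a, b, c all point to d; d points to e.
web = {'a': {'d'}, 'b': {'d'}, 'c': {'d'}, 'd': {'e'}}
hub, auth = hits(web)
print(max(auth, key=auth.get))  # -> 'd', the page everyone points to
```

Here d emerges as the top authority and a, b, c as the top hubs, reflecting exactly the eigenvector structure the spectral view predicts.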
(5) Clustering and Communities in Networks
Clustering is one of the oldest and most well-established
problems in data analysis; in the context of networks,
it can be used to search for densely connected communities.
A number of techniques have been applied to this problem,
including combinatorial and spectral methods.
A task closely related to clustering is the problem
of classifying the nodes of a network using a known set of labels.
For example, suppose we wanted to
classify Web pages into topic categories.
Automated text analysis can give us an estimate of the topic of each page;
but we also suspect that pages have some tendency to be similar
to neighboring pages in the link structure.
How should we combine these two sources of evidence?
A number of probabilistic frameworks are useful for this task,
including the formalism of Markov random fields,
which -- for quite different applications --
has been extensively studied in computer vision.
- Hierarchical Clustering of Networks
- Minimum Cuts for Graph Partitioning
Yuri Boykov, Olga Veksler, and Ramin Zabih.
Fast Approximate Energy Minimization via Graph Cuts.
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 11, pp. 1222-1239, 2001.
Avrim Blum and Shuchi Chawla.
Learning from Labeled and Unlabeled Data using Graph Mincuts.
International Conference on Machine Learning (ICML), 2001.
Gary Flake, Kostas Tsioutsiouliklis, and Robert Tarjan.
Graph Clustering Techniques Based on Minimum Cut Trees.
Technical Report 2002-06, NEC, Princeton, NJ, 2002.
- Enumerating Small Communities
- Partitioning Signed Networks: Structural Balance Theory
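The combination of text evidence with link structure described above can be sketched -- not with the full Markov random field machinery, but with a simple iterative relaxation in its spirit: each node's score is a weighted average of its text-classifier prior and its neighbors' current scores. The graph, the priors, and the mixing weight alpha below are all illustrative assumptions.

```python
def relax_labels(adj, prior, alpha=0.5, iters=30):
    """Classify nodes by combining per-node priors with neighbor agreement.

    adj: dict node -> set of neighbors (undirected graph)
    prior: dict node -> text-based probability that the node has label 1
    score[v] = alpha * prior[v] + (1 - alpha) * mean of neighbors' scores,
    iterated (synchronously) to near-convergence; label by threshold at 0.5.
    """
    score = dict(prior)
    for _ in range(iters):
        score = {v: alpha * prior[v] +
                    (1 - alpha) * ((sum(score[u] for u in adj[v]) / len(adj[v]))
                                   if adj[v] else score[v])
                 for v in adj}
    return {v: 1 if s >= 0.5 else 0 for v, s in score.items()}

# Toy example: two triangles {0,1,2} and {3,4,5} joined by the edge 0-3.
# Node 2's text prior points the wrong way, but its link neighborhood
# (both neighbors in the high cluster) pulls it to the correct label.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0, 4, 5}, 4: {3, 5}, 5: {3, 4}}
prior = {0: 0.9, 1: 0.9, 2: 0.3, 3: 0.1, 4: 0.1, 5: 0.1}
print(relax_labels(adj, prior, alpha=0.4))
```

Thresholding the priors alone would mislabel node 2; the relaxation flips it because its neighbors agree -- the same trade-off between node-level evidence and edge-level smoothness that the Markov random field formalism makes precise.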