The Structure of Information Networks
Computer Science 6850
Time: MWF 10:10-11:00 am.
Place: G01 Gates Hall.
The past decade has seen a convergence of social and technological
networks, with systems such as the World Wide Web characterized
by the interplay between rich information content, the
millions of individuals and organizations who create it,
and the technology that supports it.
This course covers recent research on the structure and analysis of
such networks, and on models that abstract their basic properties.
Topics include combinatorial and probabilistic techniques for
link analysis, centralized and decentralized search algorithms,
network models based on random graphs, and connections with work
in the social sciences.
The course prerequisites include introductory-level background
in algorithms, graphs, probability, and linear algebra, as well
as some basic programming experience (to be able to manipulate
network datasets).
The work for the course
will consist primarily of
two problem sets, a short reaction paper,
and a more substantial project.
Coursework should be handed in through
(1) Random Graphs and Small-World Properties
A major goal of the course is to illustrate
how networks across a variety of domains exhibit
common structure at a qualitative level.
One area in which this arises is in the study
of "small-world properties" in networks:
many large networks have short paths between most pairs of nodes,
even though they are highly clustered at a local level,
and they are searchable in the sense that one can
navigate to specified target nodes without global knowledge.
These properties turn out to provide insight into the
structure of large-scale social networks, and,
in a different direction, to have applications
to the design of decentralized peer-to-peer systems.
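As a minimal illustration of the first two properties -- short paths coexisting with heavy local clustering -- the following Python sketch builds a ring lattice, randomly rewires a fraction of its edges in the style of the Watts-Strogatz construction, and measures average clustering and path length. The graph model and all parameters here are illustrative assumptions, not material from the course readings.

```python
import random
from collections import deque

def ring_lattice(n, k):
    """Ring of n nodes, each linked to its k nearest neighbors on each side."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for j in range(1, k + 1):
            adj[v].add((v + j) % n)
            adj[(v + j) % n].add(v)
    return adj

def rewire(adj, p, rng):
    """Move one endpoint of each edge to a random node with probability p."""
    n = len(adj)
    edges = [(u, v) for u in adj for v in adj[u] if u < v]
    for u, v in edges:
        if rng.random() < p:
            w = rng.randrange(n)
            if w != u and w not in adj[u]:
                adj[u].discard(v); adj[v].discard(u)
                adj[u].add(w); adj[w].add(u)
    return adj

def avg_path_length(adj):
    """Mean shortest-path distance over reachable pairs (BFS from each node)."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    q.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

def clustering(adj):
    """Average fraction of a node's neighbor pairs that are themselves linked."""
    total = 0.0
    for v, nbrs in adj.items():
        nbrs = list(nbrs)
        d = len(nbrs)
        if d < 2:
            continue
        links = sum(1 for i in range(d) for j in range(i + 1, d)
                    if nbrs[j] in adj[nbrs[i]])
        total += 2.0 * links / (d * (d - 1))
    return total / len(adj)

rng = random.Random(0)
lattice = ring_lattice(200, 3)
g = rewire(ring_lattice(200, 3), 0.1, rng)
print(clustering(g), avg_path_length(g))
```

Even a small fraction of rewired "shortcut" edges typically collapses the average path length while leaving most of the local clustering of the lattice intact.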
- Small-world experiments in social networks.
- Basic Random Graph Models, and the Consequences of Expansion.
- Decentralized Search in Networks.
- Decentralized Search in Peer-to-Peer Systems.
- Nearest-Neighbor Search in Metric Spaces.
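To make the third property -- searchability -- concrete, here is a sketch of greedy decentralized search on a small-world-style network: nodes sit on a ring, each gets one long-range contact chosen with probability inversely proportional to ring distance, and a message is always forwarded to whichever neighbor is closest to the target. The construction and parameters below are illustrative assumptions rather than any particular result from the readings.

```python
import random

def harmonic_contact(v, n, rng):
    """Pick a long-range contact for v with Pr[u] proportional to 1/d(v, u)."""
    nodes, weights = [], []
    for u in range(n):
        if u == v:
            continue
        d = min((u - v) % n, (v - u) % n)   # ring distance
        nodes.append(u)
        weights.append(1.0 / d)
    return rng.choices(nodes, weights)[0]

def greedy_route(v, t, n, contacts):
    """Forward to the neighbor (ring or long-range) closest to target t."""
    def dist(a, b):
        return min((a - b) % n, (b - a) % n)
    steps = 0
    while v != t:
        nbrs = [(v - 1) % n, (v + 1) % n, contacts[v]]
        v = min(nbrs, key=lambda u: dist(u, t))  # ring step guarantees progress
        steps += 1
    return steps

rng = random.Random(1)
n = 1024
contacts = {v: harmonic_contact(v, n, rng) for v in range(n)}
trials = [greedy_route(rng.randrange(n), rng.randrange(n), n, contacts)
          for _ in range(200)]
print(sum(trials) / len(trials))
```

With this inverse-distance contact distribution, greedy routing finds short paths using only local information -- the average hop count stays far below the ~n/4 expected on the plain ring.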
(2) Cascading Behavior in Networks
We can think of a network as a large circulatory system,
through which information continuously flows.
This diffusion of information can happen rapidly or slowly;
it can be disastrous -- as in a panic or cascading failure --
or beneficial -- as in the spread of an innovation.
Work in several areas has proposed models for such processes,
and investigated when a network is more or less susceptible
to their spread.
This type of diffusion or cascade process
can also be used as a design principle for network protocols.
This leads to the idea of
epidemic algorithms, also called gossip-based algorithms,
in which information is propagated through a collection
of distributed computing hosts, typically using some
form of randomization.
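The gossip idea can be sketched in its simplest possible setting: assume a complete network of n hosts, where in each round every informed host pushes the message to one host chosen uniformly at random. The complete-graph assumption and the pure "push" variant are simplifications for illustration.

```python
import random

def push_gossip(n, seed=0):
    """Rounds until a rumor started at host 0 reaches all n hosts,
    when each round every informed host pushes to one random host."""
    rng = random.Random(seed)
    informed = {0}
    rounds = 0
    while len(informed) < n:
        # Each informed host picks one target; duplicates and
        # already-informed targets simply have no effect.
        informed |= {rng.randrange(n) for _ in range(len(informed))}
        rounds += 1
    return rounds

print(push_gossip(1000))
```

Since the informed set can at most double each round, at least log2(n) rounds are needed; the randomized push variant typically finishes in O(log n) rounds as well, which is what makes epidemic protocols attractive.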
- Models of Collective Action.
- Threshold-Based Models of Diffusion in Networks.
- Simple Probabilistic Models of Contagion.
- Finding Influential Sets of Nodes.
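The threshold-based models above can be sketched in a few lines: each node activates once the fraction of its active neighbors reaches that node's threshold, and the process is iterated to a fixed point. The graph, thresholds, and seed set below are toy choices for illustration.

```python
def threshold_cascade(adj, thresholds, seeds):
    """Run a threshold cascade to a fixed point.

    adj: dict node -> set of neighbors (undirected graph)
    thresholds: dict node -> fraction of active neighbors needed to activate
    seeds: initially active nodes
    Returns the final set of active nodes.
    """
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v in adj:
            if v in active or not adj[v]:
                continue
            frac = len(adj[v] & active) / len(adj[v])
            if frac >= thresholds[v]:
                active.add(v)
                changed = True
    return active

# Toy example: a path 0-1-2-3-4 with uniform threshold 0.5.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
th = {v: 0.5 for v in path}
print(sorted(threshold_cascade(path, th, {0})))  # -> [0, 1, 2, 3, 4]
```

Raising the uniform threshold just above 0.5 stops the cascade at the seed -- a tiny instance of how susceptibility to spread depends delicately on both thresholds and network structure.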
(3) Heavy-Tailed Distributions in Networks
The degree of a node in a network is the number
of neighbors it has.
For many large networks -- including the Web, the Internet,
collaboration networks, and semantic networks --
the fraction of nodes with very high degrees is much larger
than one would expect based on "standard" models of random graphs.
The particular form of the distribution --
the fraction of nodes with degree d
decays roughly as d^(-c) for some fixed exponent c > 0 -- is called a power law.
What processes are capable of generating such power laws,
and why should they be ubiquitous in large networks?
The investigation of these questions suggests that power laws
are just one reflection of the local and global processes driving
the evolution of these networks.
- Preferential Attachment and Rich-get-Richer Processes.
- Hierarchical Network Models and Zipf's Law.
- Power Laws through Optimization.
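A minimal simulation of a preferential-attachment (rich-get-richer) process, along the general lines of the Barabasi-Albert model: each arriving node links to m existing nodes sampled in proportion to their current degree. The star-shaped seed graph and all parameters are illustrative assumptions.

```python
import random

def ba_degrees(n, m, seed=0):
    """Degree sequence of an n-node preferential-attachment graph,
    with each new node creating m links."""
    rng = random.Random(seed)
    degree = [0] * n
    endpoints = []   # each node appears here once per unit of its degree
    # Seed graph: a star on the first m+1 nodes, so sampling is well-defined.
    for v in range(1, m + 1):
        degree[0] += 1; degree[v] += 1
        endpoints += [0, v]
    for v in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            # Uniform choice from `endpoints` = degree-proportional choice.
            chosen.add(rng.choice(endpoints))
        for u in chosen:
            degree[u] += 1; degree[v] += 1
            endpoints += [u, v]
    return degree

deg = ba_degrees(5000, 2)
avg = sum(deg) / len(deg)
print(avg, max(deg))
```

The maximum degree dwarfs the average (which stays near 2m) -- a signature of the heavy tail that a "standard" random graph with the same edge count would not exhibit.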
(4) Spectral Analysis of Networks
One can gain a lot of insight into the structure of a network
by analyzing the
eigenvalues and eigenvectors of its adjacency matrix.
The connection between spectral parameters and
the more combinatorial properties of networks and datasets
is a subtle issue, and
while many results have been established about this connection,
it is still not fully understood.
This connection has also led to a number of applications,
including the development of link analysis algorithms for Web search.
- Link Analysis and Web Search
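As one concrete instance of link analysis, here is a small sketch of a hubs-and-authorities (HITS-style) iteration: authority scores are summed from incoming hub scores and hub scores from outgoing authority scores, with normalization each round -- in matrix terms, power iteration converging toward principal eigenvectors of A^T A and A A^T. The toy link structure is made up for illustration.

```python
def hits(links, iters=50):
    """HITS-style iteration on a directed graph given as dict node -> set
    of nodes it links to. Returns (hub, authority) score dicts."""
    nodes = set(links) | {v for vs in links.values() for v in vs}
    hub = {v: 1.0 for v in nodes}
    auth = {v: 1.0 for v in nodes}
    for _ in range(iters):
        # authority(v) = sum of hub scores of pages pointing to v
        auth = {v: sum(hub[u] for u in nodes if v in links.get(u, ()))
                for v in nodes}
        norm = sum(x * x for x in auth.values()) ** 0.5
        auth = {v: x / norm for v, x in auth.items()}
        # hub(u) = sum of authority scores of pages u points to
        hub = {u: sum(auth[v] for v in links.get(u, ())) for u in nodes}
        norm = sum(x * x for x in hub.values()) ** 0.5
        hub = {u: x / norm for u, x in hub.items()}
    return hub, auth

# Toy web: pages a, b, c all point to d; d points to e.
web = {'a': {'d'}, 'b': {'d'}, 'c': {'d'}, 'd': {'e'}}
hub, auth = hits(web)
print(max(auth, key=auth.get))  # -> 'd', the page everyone points to
```

Here d emerges as the top authority and a, b, c as the top hubs, reflecting exactly the eigenvector structure the spectral view predicts.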
(5) Clustering and Communities in Networks
Clustering is one of the oldest and most well-established
problems in data analysis; in the context of networks,
it can be used to search for densely connected communities.
A number of techniques have been applied to this problem,
including combinatorial and spectral methods.
A task closely related to clustering is the problem
of classifying the nodes of a network using a known set of labels.
For example, suppose we wanted to
classify Web pages into topic categories.
Automated text analysis can give us an estimate of the topic of each page;
but we also suspect that pages have some tendency to be similar
to neighboring pages in the link structure.
How should we combine these two sources of evidence?
A number of probabilistic frameworks are useful for this task,
including the formalism of Markov random fields,
which -- for quite different applications --
has been extensively studied in computer vision.
- Hierarchical Clustering of Networks
- Minimum Cuts for Graph Partitioning
Yuri Boykov, Olga Veksler, and Ramin Zabih.
Fast Approximate Energy Minimization via Graph Cuts.
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 11, pp. 1222-1239, 2001.
Avrim Blum and Shuchi Chawla.
Learning from Labeled and Unlabeled Data using Graph Mincuts.
International Conference on Machine Learning (ICML), 2001.
Gary Flake, Kostas Tsioutsiouliklis, and Robert Tarjan.
Graph Clustering Techniques Based on Minimum Cut Trees.
Technical Report 2002-06, NEC, Princeton, NJ, 2002.
- Enumerating Small Communities
- Partitioning Signed Networks: Structural Balance Theory
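The combination of text evidence with link structure described above can be sketched -- not with the full Markov random field machinery, but with a simple iterative relaxation in its spirit: each node's score is a weighted average of its text-classifier prior and its neighbors' current scores. The graph, the priors, and the mixing weight alpha below are all illustrative assumptions.

```python
def relax_labels(adj, prior, alpha=0.5, iters=30):
    """Classify nodes by combining per-node priors with neighbor agreement.

    adj: dict node -> set of neighbors (undirected graph)
    prior: dict node -> text-based probability that the node has label 1
    score[v] = alpha * prior[v] + (1 - alpha) * mean of neighbors' scores,
    iterated (synchronously) to near-convergence; label by threshold at 0.5.
    """
    score = dict(prior)
    for _ in range(iters):
        score = {v: alpha * prior[v] +
                    (1 - alpha) * ((sum(score[u] for u in adj[v]) / len(adj[v]))
                                   if adj[v] else score[v])
                 for v in adj}
    return {v: 1 if s >= 0.5 else 0 for v, s in score.items()}

# Toy example: two triangles {0,1,2} and {3,4,5} joined by the edge 0-3.
# Node 2's text prior points the wrong way, but its link neighborhood
# (both neighbors in the high cluster) pulls it to the correct label.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0, 4, 5}, 4: {3, 5}, 5: {3, 4}}
prior = {0: 0.9, 1: 0.9, 2: 0.3, 3: 0.1, 4: 0.1, 5: 0.1}
print(relax_labels(adj, prior, alpha=0.4))
```

Thresholding the priors alone would mislabel node 2; the relaxation flips it because its neighbors agree -- the same trade-off between node-level evidence and edge-level smoothness that the Markov random field formalism makes precise.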