Room: Hollister
368.
Overview
This class will look at information networks, with a particular
emphasis on the Web, but also studying graphs such as social
networks, the email graph, citation graphs, and others. We will
look at empirical studies of the properties of these graphs,
algorithms for mining information from these graphs, mathematical
models of information networks, issues in the search of these
networks, and constructing networks to enable decentralized
search.
An overview of the course topics, including course prerequisites
and assignments, can be found in the following handout.
The content of the course will be drawn from a number of recent
papers in the area. This list will be updated as the term
progresses.
Assignments
Reaction paper 1.
Reaction paper 2.
Reaction paper 3.
Papers
(1) Overview and Background.
What ideas led to the creation of the Web? What can we say
about its basic structure?  V. Bush.
As We May Think. Atlantic Monthly, July 1945.
 World Wide Web Consortium. A Little History of the
World Wide Web, 19451995.
 R. Albert, H. Jeong, A. Barabasi.
Diameter of the WorldWide Web. Nature 401:130131, 1999.
 A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S.
Rajagopalan, R. Stata, A. Tomkins, J. Wiener. Graph structure in the
web. 9th International World Wide Web Conference, May 2000.
 S. Dill, R. Kumar, K. McCurley, S. Rajagopalan, D.
Sivakumar, A. Tomkins.
Selfsimilarity in the Web. 27th International Conference on
Very Large Data Bases, 2001.

Krishna Bharat, Andrei Broder, Monika Henzinger, Puneet Kumar,
and Suresh Venkatasubramanian.
The Connectivity Server: fast access to linkage information on the Web.
Proc. 7th International World Wide Web Conference, 1998. Also, Computer Networks and ISDN Systems,
30:469477, 1998.
 Other documents
(2) Uses of link information.
What kinds of information can we infer from the existence of links in information networks?
 The HITS and PageRank algorithms

 Connections with citation analysis

 E. Garfield.
Citation analysis as a tool in journal evaluation.
Science, 178:471479, 1972.

G. Pinski, F. Narin.
Citation influence for journal aggregates of scientific
publications: Theory, with application
to the literature of physics.
Information Processing and Management, 12(1976), pp. 297312.

 Comparisons of indegree and PageRankbased link methods
 Probabilistic models for link analysis

 Issues in and extensions of linkbased methods

B. D. Davison.
Recognizing Nepotistic Links on the Web.
AAAI Workshop on Artificial Intelligence for Web Search, 2000.

K. Bharat and M. R. Henzinger.
Improved algorithms for topic distillation in a hyperlinked environment.
21st International Conference on Research and Development in
Information Retrieval (SIGIR 1998).

S. Chakrabarti, B. Dom, D. Gibson, J. Kleinberg, S.R. Kumar,
P. Raghavan, S. Rajagopalan, A. Tomkins.
Mining the link structure of the World Wide Web.
IEEE Computer, August 1999.

D. Cohn, T. Hofmann,
The Missing Link  A Probabilistic Model of Document Content
and Hypertext Connectivity.
Advances in Neural Information Processing Systems (NIPS) 13, 2000.

S. Chakrabarti, M. M. Joshi and V. B. Tawde.
Enhanced topic distillation using text, markup tags, and hyperlinks.
24th Annual International Conference on Research and Development
in Information Retrieval, 2001.

D. Rafiei, A. Mendelzon.
What is this Page Known for? Computing Web Page Reputations.
Proc. WWW9 Conference, Amsterdam, May 2000

P. Domingos, M. Richardson.
The Intelligent Surfer: Probabilistic Combination of
Link and Content Information in PageRank.
Advances in Neural Information Processing Systems 14, 2002.

T. H. Haveliwala.
TopicSensitive PageRank.
11th International World Wide Web Conference, 2002.
 G. Jen, J. Widom.
Scaling personalized web search.
12th International World Wide Web Conference, 2003.
 Z. Gyongyi, H. GarciaMolina, and J. Pedersen.
Combating web spam with TrustRank.
Proceedings of the 30th VLDB Conference, 2004.
 M. Bianchini, M. Gori, and F. Scarselli.
Inside PageRank.
To appear in ACM Transactions on Internet Technology 5, 2005.
 A.N. Langville, C.D. Meyer.
Deeper inside PageRank.
To appear in Internet Mathematics 5, 2005.

 Uses of anchortext in pagefinding tasks

 N. Craswell, D. Hawking, and S. Robertson.
Effective site finding using link anchor information.
24th International Conference on Research and Development in
Information Retrieval (SIGIR 2001), 250257, 2001.

 N. Eiron, and K. McCurley.
Analysis of anchor text for web search.
26th International Conference on Research and Development in
Information Retrieval (SIGIR 2003), 459460 (abstract), 2003.
 T. Upstill, N. Craswell, and D. Hawking.
Queryindependent evidence in home page finding.
ACM Transactions on Information Systems, 21:286313, 2003.

 Related papers

 A. Broder. A taxonomy of web search. SIGIR Forum, 36:310, 2002.
 D. E. Rose and D. Levinson. Understanding user goals in web search. WWW 2004.

 R. Fagin, A. Karlin, J. Kleinberg, P. Raghavan, S. Rajagopalan, R. Rubinfeld, M. Sudan, and A. Tomkins.
Random walks with "back buttons". Annals of Applied Probability 11:810862, 2001.
(3) Finding communities.
How can we automatically find "communities" of common interests in information networks?

 Spectral methods

S. Deerwester, S.T. Dumais, T.K. Landauer, G.W. Furnas
and R.A. Harshman.
Indexing by latent semantic analysis.
Journal of the Society for Information Science, 41(6), 391407 (1990).

C. Papadimitriou, P. Raghavan, H. Tamaki, and S. Vempala.
Latent Semantic Indexing: A Probabilistic Analysis.
17th ACM Symposium on the Principles of Database Systems, 1998.

D. Gibson, J. Kleinberg, P. Raghavan.
Inferring Web communities from link topology.
Proc. 9th ACM Conference on Hypertext and Hypermedia, 1998.

P. Drineas, R. Kannan, A. Frieze, S. Vempala and V. Vinay.
Clustering large graphs via the Singular Value Decomposition.
Machine Learning 56:933, 2004.

Y. Azar, A. Fiat, A. Karlin, F. McSherry and J. Saia.
Spectral Analysis of Data.
33rd ACM Symposium on Theory of Computing, 2001.

D. Achlioptas, A. Fiat, A. Karlin, and F. McSherry,
Web Search via Hub Synthesis.
42nd IEEE Symposium on Foundations of Computer Science, 2001, p.611618.

A. Y. Ng, A. X. Zheng, and M. I. Jordan.
Link analysis, eigenvectors, and stability.
International Joint Conference on Artificial Intelligence (IJCAI), 2001.

A. Y. Ng, A. X. Zheng, and M. I. Jordan.
Stable algorithms for link analysis.
24th International Conference on Research and Development in
Information Retrieval (SIGIR 2001).
 Combinatorial methods


R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins.
Trawling the web for emerging cybercommunities.
8th International World Wide Web Conference, May 1999.

R. Agrawal, R. Srikant.
Fast algorithms for mining association rules.
20th Int'l Conference on Very Large Databases (VLDB), 1994.

G.W. Flake, S. Lawrence, and C.L. Giles.
Efficient identification of web communities.
Proc. of the 6th International Conference on Knowledge Discovery and Data Mining (KDD), 2000.

G. Flake, S. Lawrence, C. L. Giles, and F. Coetzee.
Selforganization and identification of web communities.
IEEE Computer, 35:3, March 2002.


G. Flake, K. Tsioutsiouliklis, and R.E. Tarjan.
Graph clustering techniques based on minimum cut trees.
Technical Report 200206, NEC, Princeton, NJ, 2002.

M. Girvan and M. E. J. Newman.
Community structure in social and biological networks.
Proc. Natl. Acad. Sci. USA 99, 82718276 (2002).

M. Granovetter.
The strength of weak ties.
American Journal of Sociology, 78(6):13601380, 1973.

 Related papers
(4) Rank aggregation and metasearch.
Given multiple rankings of web pages, can we combine these in a principled way into a single ranking?

K. Arrow.
Social choice and individual values. Wiley, 1951.

H.P. Young.
Condorcet's theory of voting.
American Political Science Review 82(1988), pp. 12311244.

Cynthia Dwork, Ravi Kumar, Moni Naor, D. Sivakumar.
Rank aggregation methods for the Wwb.
10th International World Wide Web Conference, May 2001.

Erik Selberg, Oren Etzioni.
The MetaCrawler architecture for resource aggregation on the web.
IEEE Expert, 1997.

Weiyi Meng, Clement Yu, KingLup Liu.
Building efficient and effective metasearch engines.
ACM Computing Surveys 34(2002).
 Related papers.
(5) Power law distributions.
Many properties of information networks obey power law distributions. Where has this been observed? What accounts for it? How can it be modelled?
 Surveys

M. Mitzenmacher
A brief history of generative models for power law and lognormal distributions.
Internet Mathematics, vol 1, No. 2, pp. 226251, 2004.

R. Albert and A.L.Barabasi.
Statistical mechanics of complex networks.
Reviews of Modern Physics 74, 47 (2002).

W. Aiello, F. Chung, L. Lu.
Random evolution of massive graphs.
Handbook of Massive Data Sets, (Eds. James Abello et al.), Kluwer, 2002,
pages 97122.

 M.E.J. Newman.
The structure and function of complex networks.
SIAM Review 45: 167256, 2003.
 Papers
A.L. Barabasi, R. Albert, and H. Jeong.
Meanfield theory for scalefree random networks.
Physica A 272 173187 (1999).

B.A. Huberman, L.A. Adamic.
Growth dynamics of the WorldWide Web.
Nature, 399 (1999) 130.

M. Faloutsos, P. Faloutsos and C. Faloutsos.
On powerlaw relationships of the Internet topology.
ACM SIGCOMM 1999.

R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, and E. Upfal.
Stochastic models for the web graph.
41th IEEE Symp. on Foundations of Computer Science, 2000, pp. 5765.

B. Bollobas, C. Borgs, J. Chayes, and O. Riordan
Directed scalefree graphs
Proceedings of the 14th ACMSIAM Symposium
on Discrete Algorithms (2003), 132139.

R. Kleinberg, J. Kleinberg,
Isomorphism and embedding problems for infinite limits of scalefree graphs.
Proc. 15th ACMSIAM Symposium on Discrete Algorithms, 2005.

A. Fabrikant, E. Koutsoupias, C. Papadimitriou.
Heuristically Optimized Tradeoffs: A new paradigm for
power laws in the Internet.
29th International Colloquium on Automata, Languages,
and Programming (ICALP), 2002.

N. Berger, C. Borgs, J. Chayes, R. D'Souza, and R. Kleinberg.
Competitioninduced preferential attachment.
To appear in Combinatorics, Probability, and Computing.
Extended abstract appeared in Proceedings of the
31st International Colloquium on Automata, Languages, and Programming
(ICALP 2004), pages 208221.

M. E. J. Newman, S. H. Strogatz and D. J. Watts,
Random graphs with arbitrary degree distributions and their applications.
Phys. Rev. E 64, 026118 (2001).

Q. Chen, H. Chang, R. Govindan, S. Jamin, S.J. Shenker, W. Willinger.
The origin of power laws in Internet topologies revisited.
Proc. of IEEE Infocom 2002.

J. Carlson and J. Doyle.
Highly Optimized Tolerance: A mechanism for power laws in designed systems.
Physical Review E 60:2(1999).

A. Lakhina, J.W. Byers, M. Crovella, and P. Xie.
Sampling biases in IP topology measurements.
INFOCOM 2003.

A. Clauset and C. Moore.
Accuracy and scaling phenomena in Internet mapping.
Physical Review Letters, 94:018701, 2005.

D. Achlioptas, A. Clauset, D. Kempe, C. Moore.
The bias of traceroute sampling. To appear in STOC 2005.

D. Aldous.
A tractable complex network model based on the stochastic meanfield model of distance.
In Complex Networks, ed. E. benNaim, et al., pp. 5187, 2004.
(6) Change and decay on the web.
How is the web changing over time? How can we detect changes? How can we use link structure to decide if a page is no longer being
maintained?
 A.Z. Broder, S.C. Glassman, M.S. Manasse, G. Zweig.
Syntactic clustering of the Web. WWW 1997.
 J. Cho, H. GarciaMolina.
The evolution of the web and implications for an incremental crawler.
VLDB 2000.

D. Fetterly, M. Manasse, M. Najork, J.L. Wiener.
A largescale study of the evolution of web pages.
WWW 2003.

A. Ntoulas, J. Cho, C. Olston.
What's new on the web? The evolution of the web from a search engine perspective.
WWW 2004.

W. Koehler.
A longitudinal study of Web pages continued: a consideration of document persistence.
Information Research 9(2), paper 174, 2004.

D. Spinellis.
The decay and failures of web references.
Communications of the ACM 46:7177, 2003.

S. Lawrence, D.M. Pennock, G.W. Flake, R. Krovetz, F.M. Coetzee, E. Glover, F.A. Nielsen, A. Kruger, C.L. Giles.
Persistence of web references in scientific research.
IEEE Computer 34:2631, 2001.

Z. BarYossef, A.Z. Broder, R. Kumar, A. Tomkins.
Sic Transit Gloria Telae: Towards an understanding of the web's decay.
WWW 2004.
(7) Smallworld networks.
Many social networks appear to have a "small world" property. What is the property? Where has it been observed? What accounts for it? How can it be modelled?

S. Milgram.
The small world problem.
Psychology Today 1:6167, 1967.

J. Travers and S. Milgram.
An experimental study of the small world problem.
Sociometry 32:425443, 1969.

C. Korte and S. Milgram.
Acquaintance networks between racial groups: application of the small world method.
Journal of Personality and Social Psychology 15:101108, 1970.

P. Killworth and H. Bernard,
Reverse small world experiment.
Social Networks 1:159192, 1978.

D.J. Watts and S. H. Strogatz.
Collective dynamics of 'smallworld' networks.
Nature 393:44042(1998).

J. Kleinberg.
The smallworld phenomenon: An algorithmic perspective.
Proc. 32nd ACM Symposium on Theory of Computing, 2000.

J. Kleinberg.
Smallworld phenomena and the dynamics of information.
Advances in Neural Information Processing Systems (NIPS) 14, 2001.

D. J. Watts, P. S. Dodds, M. E. J. Newman
Identity and search in social networks.
Science, 296, 13021305, 2002.

J. Kleinfeld.
The small world problem.
Society 39:6166, 2002.

P.S. Dodds, R. Muhamad, D.J. Watts.
An experimental study of search in global social networks.
Science 301(2003), 827.

F. Menczer.
Growing and navigating the small world web by local content.
Proc. Natl. Acad. Sci. USA 99(22): 1401414019, 2002

L. A. Adamic, R. M. Lukose, A. R. Puniyani, B. A. Huberman.
Search in powerlaw networks.
Phys. Rev. E, 64 46135 (2001).

L. Adamic, E. Adar.
How to search a social network.
submitted to Social Networks, 2004.

L.A. Adamic and B.A. Huberman
Information dynamics in a networked world.
in: Eli BenNaim, Hans Frauenfelder, Zoltan Toroczkai, (Eds.),
'Complex Networks', Lecture Notes in Physics, Springer, 2003.

A. Clauset and C. Moore
How do networks become navigable?
preprint at arxiv.org, 2003.

G.S. Manku, M. Bawa, and P. Raghavan.
Symphony: Distributed hashing in a small world.
USENIX Symposium on Internet Technologies and Systems 2003.
 G.S. Manku, M. Naor, and U. Wieder.
Know thy neighbor's neighbor: the power of lookahead in randomized P2P networks.
STOC 2004.
 H. Balakrishnan, M.F. Kaashoek, D. Karger, R. Morris, and I. Stoica.
Looking up data in P2P systems.
Communications of the ACM 46:4348, February 2003.
 Related resources
Interlude: Blogs
Ravi Kumar gave a presentation on 4/20 about his work on weblogs. Below are the
references to some of the papers he mentioned.
 R. Kumar, J. Novak, P. Raghavan, A. Tomkins.
Structure and evolution of Blogspace.
Communcations of the ACM 47:3539, 2004.
 R. Kumar, J. Novak, P. Raghavan, A. Tomkins.
On the bursty evolution of Blogspace. WWW 2003.
 D. Gruhl, R. Guha, D. LibenNowell, A. Tomkins.
Information diffusion through Blogspace. WWW 2004.
(8) Information diffusion
How does information and behavior propagate in various kinds of information networks?

T. Schelling. Dynamic models of segregation. Journal of Mathematical Sociology 1:143186, 1971.

M. Granovetter.
Threshold models of collective behavior.
American Journal of Sociology 83:14201443, 1978.

S. Morris.
Contagion.
Review of Economic Studies 67:5778, 2000.

H. Peyton Young.
The diffusion of innovations in social networks.
Santa Fe Institute Working Paper 0204018.

E. Berger.
Dynamic monopolies of constant size.
Journal of Combinatorial Theory Series B 83:191200, 2001.

P. Domingos, M. Richardson.
Mining knowledgesharing sites for viral marketing.
Eighth International Conference on Knowledge Discovery and Data Mining,
KDD2002.

D. Kempe, J. Kleinberg, E. Tardos.
Maximizing the spread of influence through a social network.
Proc. 9th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, 2003.

P. Dodds and D. J. Watts.
Universal behavior in a generalized model of contagion.
Phyical Review Letters, 2004.

R. Karp, C. Schindelhauer, S. Shenker, B. Vocking.
Randomized rumor spreading.
41st IEEE Symposium on Foundations of Computer Science, 2000.

M. HarcholBalter, T. Leighton, D. Lewin.
Resource discovery in distributed networks.
ACM Symposium on Principles
of Distributed Computing (PODC), 1999, pp. 229238.
Coda: What's next?
What might be the next generation of information networks? What kinds of interesting issues arise in these networks?

T. BernersLee, J. Hendler, O. Lassila.
The semantic web. Scientific American, May 2001.
 M. Richardson, R. Agrawal, P. Domingos.
Trust management for the semantic web.
The Second International Semantic Web Conference, 2003.
 R. Guha, R. Kumar, P. Raghavan, A. Tomkins.
Propagation of trust and distrust.
WWW 2004.

Related links.
 Tim BernersLee presentation
about the Semantic Web at MIT LCS, September 2002.