Emergence of the web and cyberspace gave rise to detailed traces of human social activity. This offers great opportunities to analyze and model behaviors of millions of people. For example, we examined ''planetary scale'' dynamics of a full Microsoft Instant Messenger network that contains 240 million people, with more than 255 billion exchanged messages per month (4.5TB of data), which makes it the largest social network analyzed to date.
In this talk I will focus on two aspects of the dynamics of large real-world networks: (a) dynamics of information diffusion and cascading behavior in networks, and (b) dynamics of the structure of time evolving networks. First, I will consider network cascades that are created by the diffusion process where behavior cascades from node to node like an epidemic. We study two related scenarios: information diffusion among blogs, and a viral marketing setting of 16 million product recommendations among 4 million people. Motivated by our empirical observations we develop algorithms for detecting disease outbreaks and finding influential bloggers that create large cascades. We exploit the ''submodularity'' principle to develop an efficient algorithm that finds near optimal solutions, while scaling to large problems and being 700 times faster than a simple greedy solution. Second, in our recent work we found counter intuitive patterns that change some of the basic assumptions about fundamental structural properties of networks varying over time. Leveraging our observations we developed a Kronecker graph generator model that explains processes governing network evolution. Moreover, we can fit the model to large networks, and then use it to generate realistic graphs and give formal statements about their properties. Estimating the model naively takes O(N!N^2) while we develop a linear time O(E) algorithm.
Jure Leskovec (www.cs.cmu.edu/~jure) is a PhD candidate in Machine Learning Department at Carnegie Mellon University. He is also a Microsoft Research Graduate Fellow. He received the ACM KDD 2005 and ACM KDD 2007 best paper awards, won the ACM KDD cup in 2003 and topped the Battle of the Sensor Networks 2007 competition. Jure holds three patents. His research interests include applied machine learning and large-scale data mining focusing on the analysis and modeling of large real-world networks as the study of phenomena across the social, technological, and natural worlds.