Microsoft Instant Messenger Communication Network How does the world communicate? Jure Leskovec Machine Learning Department

Presentation transcript:

Microsoft Instant Messenger Communication Network How does the world communicate? Jure Leskovec Machine Learning Department Joint work with: Eric Horvitz, Microsoft Research

Networks: Why?  Today: large on-line systems leave detailed records of social activity  On-line communities: MySpace, Facebook  Blogging, instant messaging  On-line publication repositories: arXiv, MedLine  Emerging behavior (need lots of data):  Actions of individual nodes are independent but global patterns and regularities emerge

The Largest Social Network  What is the largest social network in the world (that we can relatively easily obtain)? For the first time we had a chance to look at complete (anonymized) communication of the whole planet (using Microsoft MSN instant messenger network) 3

Instant Messaging Contact (buddy) list Messaging window 4

Instant Messaging as a Network 5 Buddy Conversation

IM – Phenomena at planetary scale Observe social phenomena at planetary scale:  How does communication change with user demographics (distance, age, sex)?  How does geography affect communication?  What is the structure of the communication network? 6

Communication data The record of communication  Presence data  user status events (login, status change)  Communication data  who talks to whom  Demographics data  user age, sex, location 7

Data description: Presence  Events:  Login, Logout  Is this first ever login  Add/Remove/Block buddy  Add unregistered buddy (invite new user)  Change of status (busy, away, BRB, Idle,…)  For each event:  User Id  Time 8

Data description: Communication  For every conversation (session) we have a list of users who participated in the conversation  There can be multiple people per conversation  For each conversation and each user:  User Id  Time Joined  Time Left  Number of Messages Sent  Number of Messages Received 9
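
The per-conversation, per-user record described on this slide could be represented roughly as follows. This is only a sketch: the class name, the field names, and the `session_id` key are hypothetical and are not taken from the actual MSN logs.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Participation:
    """One user's part in one conversation (all field names are hypothetical)."""
    session_id: int
    user_id: int
    time_joined: datetime
    time_left: datetime
    messages_sent: int
    messages_received: int

    def duration_seconds(self) -> float:
        # Time this user spent in the conversation.
        return (self.time_left - self.time_joined).total_seconds()


def participants(records, session_id):
    """Return the sorted user ids that took part in the given session."""
    return sorted({r.user_id for r in records if r.session_id == session_id})


# Made-up example records: session 1 has two users, session 2 has one record.
records = [
    Participation(1, 10, datetime(2006, 6, 1, 12, 0), datetime(2006, 6, 1, 12, 5), 4, 3),
    Participation(1, 11, datetime(2006, 6, 1, 12, 0), datetime(2006, 6, 1, 12, 5), 3, 4),
    Participation(2, 10, datetime(2006, 6, 1, 13, 0), datetime(2006, 6, 1, 13, 1), 1, 0),
]
```

Keeping one row per (conversation, user) pair, as the slide describes, makes multi-party conversations easy to represent without a variable-length participant list.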

Data description: Demographics  For every user (self reported):  Age  Gender  Location (Country, ZIP)  Language  IP address (we can do reverse geo IP lookup) 10

Data collection  Log size: 150 GB/day  Just copying over the network takes 8 to 10 h  Parsing and processing takes another 4 to 6 h  After parsing and compressing: ~45 GB/day  Collected data for 30 days of June 2006:  Total: 1.3 TB of compressed data 11

Network: Conversations 12 Conversation

Data statistics Activity over June 2006 (30 days)  245 million users logged in  180 million users engaged in conversations  17.5 million new accounts activated  More than 30 billion conversations 13

Data statistics per day Activity on June  1 billion conversations  93 million users logged in  65 million different users talked (exchanged messages)  1.5 million invitations for new accounts sent 14

User characteristics: age 15

Age pyramid: MSN vs. the world 16

Conversation: Who talks to whom?  Cross-gender edges:  300 million male-male and 235 million female-female edges  640 million female-male edges 17

Number of people per conversation  Max number of people simultaneously talking is 20, but conversation can have more people 18

Conversation duration  Most conversations are short 19

Conversations: number of messages Sessions between fewer people run out of steam 20

Time between conversations  Individuals are highly diverse  What is the probability of logging into the system after t minutes?  Power-law with exponent 1.5  Task queuing model [Barabasi]  My , Darwin’s and Einstein’s letters follow the same pattern 21
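
To make the power-law claim concrete, the sketch below draws synthetic gaps from a continuous power law with exponent 1.5 and recovers the exponent with the standard maximum-likelihood estimator. The slides do not say how the exponent was estimated, so the estimator, the function names, and the cutoff `x_min` are all assumptions made for illustration.

```python
import math
import random


def sample_power_law(alpha, x_min, n, rng):
    """Draw n values from p(x) ~ x^(-alpha), x >= x_min, by inverse transform."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def fit_exponent(xs, x_min):
    """Maximum-likelihood estimate of alpha for a continuous power-law tail."""
    tail = [x for x in xs if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)


# Synthetic "time between logins" data, not the MSN measurements.
rng = random.Random(0)
gaps = sample_power_law(1.5, 1.0, 50_000, rng)
alpha_hat = fit_exponent(gaps, 1.0)
```

With this many samples the estimate should land very close to 1.5; on real data the choice of `x_min` matters and would need to be justified separately.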

Age: Number of conversations User self reported age High Low 22

Age: Total conversation duration User self reported age High Low 23

Age: Messages per conversation User self reported age High Low 24

Age: Messages per unit time User self reported age High Low 25

Who talks to whom: Number of conversations 26

Who talks to whom: Conversation duration 27

Geography and communication  Count the number of users logging in from particular location on the earth 28

How is Europe talking  Logins from Europe 29

Users per geo location Blue circles have more than 1 million logins. 30

Users per capita Fraction of population using MSN: Iceland: 35% Spain: 28% Netherlands, Canada, Sweden, Norway: 26% France, UK: 18% USA, Brazil: 8% 31

Communication heat map  For each conversation between geo points (A,B) we increase the intensity on the line between A and B 32
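
The accumulation step on this slide can be sketched as below. It assumes geo points have already been mapped to integer cells of a flat grid and draws straight segments on that grid; the actual map projection and line-drawing method used for the slides are not stated, and the function names are hypothetical.

```python
def add_line(grid, a, b):
    """Add 1 to every grid cell on the straight segment from cell a to cell b."""
    (r0, c0), (r1, c1) = a, b
    steps = max(abs(r1 - r0), abs(c1 - c0))
    seen = set()
    for i in range(steps + 1):
        t = i / steps if steps else 0.0
        cell = (round(r0 + t * (r1 - r0)), round(c0 + t * (c1 - c0)))
        if cell not in seen:  # count each cell once per conversation
            seen.add(cell)
            grid[cell[0]][cell[1]] += 1


def heat_map(rows, cols, conversations):
    """Accumulate one line per conversation between geo cells (A, B)."""
    grid = [[0] * cols for _ in range(rows)]
    for a, b in conversations:
        add_line(grid, a, b)
    return grid


# Three made-up conversations on a 4x4 grid.
grid = heat_map(4, 4, [((0, 0), (3, 3)), ((0, 3), (3, 0)), ((0, 0), (0, 3))])
```

Cells shared by several conversation lines end up brighter, which is the effect the heat map relies on.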

Homophily (birds of a feather flock together): Age vs. Age  Correlation  Probability 33

Per country statistics  On a particular typical day… 34
Country | # of logins | # of users | # of messages | Messages per user
USA | 38,319,363 | 13,261,337 | 412,729, |
Brazil | 20,582,613 | 7,864,424 | 467,972, |
France | 19,163,131 | 6,475,858 | 518,931, |
Unknown | 18,444,352 | 6,872,347 | 191,167, |
Spain | 16,868,549 | 6,140,895 | 503,759, |
UK | 16,659,009 | 5,724,826 | 487,018, |
Canada | 14,558,692 | 5,021,185 | 160,249, |
China | 14,225,163 | 5,314,463 | 101,003, |
Turkey | 13,619,789 | 4,696,555 | 353,540, |
Mexico | 10,756,989 | 4,359,932 | 209,195, |
Note that global usage and market share statistics are higher if we accumulate data over longer time periods.

Per typical user per country  On a typical day, an MSN user from a country … 35
Country | Logins on a particular day | Users on a particular day | Messages sent | Messages per user
Slovenia | 364,988 | 130,884 | 15,919, |
Malta | 122,846 | 41,829 | 4,993, |
Hungary | 1,214,268 | 427,320 | 47,623, |
Bosnia | 105,584 | 35,689 | 3,254, |
Reunion | 100,335 | 33,399 | 3,041, |
Gibraltar | 19,096 | 6,452 | 581, |
UK | 16,659,009 | 5,724,826 | 487,018, |
Macedonia | 126,729 | 43,754 | 3,669, |
Netherlands | 7,399,160 | 2,696,669 | 221,300, |
Spain | 16,868,549 | 6,140,895 | 503,759, |
Note that global usage and market share numbers are higher if we accumulate data over longer time periods.

What about Slovenia (per capita)?
Statistic | Number | Rank (per capita)
Conversations inside | 19,868,886 | 22
Conversations to outside | 7,868,483 | 48
Total conversations | 27,737,369 | 29
Avg. time inside | |
Avg. time outside | |
Avg. time inside (pct.) | |
Messages sent inside | |
Messages sent outside | |
Messages inside (pct.) | |

Who is Slovenia talking to? 37
Rank | Target country | Pairs of people | Number of conversations | Avg. time per conv. | Avg. # of messages
1 | Slovenia | 13,41,250 | 19,868, | |
 | USA | 61,794 | 922, | |
 | Spain | 27,650 | 310, | |
 | UK | 14,709 | 204, | |
 | Germany | 9,047 | 129, | |
 | Bosnia | 9,956 | 114, | |
 | Yugoslavia | 8,194 | 104, | |
 | Italy | 8,612 | 100, | |
 | Croatia | 6,838 | 84, | |
 | Turkey | 10,763 | 77, | |
 | Albania | 9,517 | 76, | |
 | Sweden | 5,083 | 69, | |
 | Netherlands | 5,061 | 68, | |
 | Canada | 5,003 | 60, | |

Instant Messaging as a Network 38 Buddy

IM Communication Network  Buddy graph:  240 million people (people who logged in during June ’06)  9.1 billion edges (friendship links)  Communication graph:  There is an edge if the users exchanged at least one message in June 2006  180 million people  1.3 billion edges  30 billion conversations 39
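
The edge rule for the communication graph (an edge if two users exchanged at least one message) can be sketched as follows. The input format and the treatment of multi-party conversations, where every pair of participants is linked, are assumptions made for this illustration and are not confirmed by the slides.

```python
from collections import defaultdict
from itertools import combinations


def communication_graph(sessions):
    """Build an undirected graph from (participants, messages_exchanged) pairs."""
    adj = defaultdict(set)
    for users, n_messages in sessions:
        if n_messages < 1:
            continue  # no message exchanged, so no edge
        for u, v in combinations(sorted(set(users)), 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj


def edge_count(adj):
    """Each undirected edge appears in two adjacency sets."""
    return sum(len(nbrs) for nbrs in adj.values()) // 2


# Made-up sessions: repeated a-b conversations still give a single edge.
sessions = [(["a", "b"], 5), (["a", "b"], 2), (["b", "c", "d"], 7), (["a", "d"], 0)]
graph = communication_graph(sessions)
```

Using sets for adjacency is what collapses the 30 billion conversations into a much smaller number of distinct edges.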

Buddy network: Number of buddies  Buddy graph: 240 million nodes, 9.1 billion edges (~40 buddies per user) 40

Communication Network: Degree  Number of people a user talks to in a month 41

Network: Small-world  6 degrees of separation [Milgram ’60s]  Average distance 5.5  90% of nodes can be reached in < 8 hops
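
Hop counts and average distance of this kind are typically obtained with breadth-first search. The sketch below runs BFS from every node of a tiny example graph; at the scale of the MSN network one would presumably sample start nodes instead, and the slides do not describe the exact procedure, so treat this purely as an illustration.

```python
from collections import deque


def hop_counts(adj, source):
    """Return {hops: number of nodes at exactly that distance from source}."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    counts = {}
    for d in dist.values():
        counts[d] = counts.get(d, 0) + 1
    return counts


def average_distance(adj):
    """Mean shortest-path length over all reachable ordered pairs."""
    total = pairs = 0
    for s in adj:
        for d, n in hop_counts(adj, s).items():
            if d > 0:
                total += d * n
                pairs += n
    return total / pairs


# A ring of six nodes as a toy example.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
```

On the ring, each node sees two nodes at one hop, two at two hops, and one at three hops, which gives an average distance of 1.8.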

Network: Searchability  Milgram’s experiment showed:  (1) short paths exist in networks  (2) humans are able to find them  Assume the following setting:  Nodes are scattered on a plane  Given a starting node u, we want to reach a target node v  Algorithm: always navigate to the neighbor that is geographically closest to the target node v  Surprise: Geo-routing finds the short paths (for an appropriate distance measure) 43
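
The greedy geo-routing rule above can be sketched as follows. Plain Euclidean distance on the plane is assumed here, whereas the slide only says "appropriate distance measure"; the function name, the `max_hops` limit, and the stop-when-stuck behavior are choices made for this illustration.

```python
import math


def greedy_route(adj, pos, u, v, max_hops=100):
    """Walk from u toward v, always stepping to the neighbor closest to v.

    Returns the path as a list of nodes, or None if the walk gets stuck
    (no neighbor is closer to v than the current node) or runs too long.
    """
    path = [u]
    while path[-1] != v and len(path) <= max_hops:
        here = path[-1]
        if not adj[here]:
            return None
        nxt = min(adj[here], key=lambda w: math.dist(pos[w], pos[v]))
        if math.dist(pos[nxt], pos[v]) >= math.dist(pos[here], pos[v]):
            return None  # local minimum: greedy routing cannot make progress
        path.append(nxt)
    return path if path[-1] == v else None


# Toy example: five nodes on a plane.
pos = {"u": (0, 0), "a": (1, 0), "b": (0, 2), "c": (2, 1), "v": (3, 0)}
adj = {"u": ["a", "b"], "a": ["u", "c"], "b": ["u"], "c": ["a", "v"], "v": ["c"]}
route = greedy_route(adj, pos, "u", "v")
```

The routing uses only local information (each node's neighbors and their coordinates), which is what makes its success on real networks notable.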

Communication network: Clustering  How many triangles are closed?  Clustering normally decays as k^(-1)  Communication network is highly clustered: k High clustering Low clustering 44
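
"How many triangles are closed" is usually measured with the local clustering coefficient, averaged over nodes of the same degree k. The sketch below uses that standard definition; whether the slides use exactly this variant is an assumption.

```python
def local_clustering(adj, u):
    """Fraction of pairs of u's neighbors that are themselves connected."""
    nbrs = list(adj[u])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in adj[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))


def clustering_by_degree(adj):
    """Average local clustering coefficient for each degree k."""
    sums, counts = {}, {}
    for u in adj:
        k = len(adj[u])
        sums[k] = sums.get(k, 0.0) + local_clustering(adj, u)
        counts[k] = counts.get(k, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}


# Toy graph: a triangle a-b-c with a pendant node d attached to a.
toy = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
by_degree = clustering_by_degree(toy)
```

Plotting the resulting values against k on log-log axes is how a decay exponent such as the k^(-1) baseline would be read off.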

Communication Network Connectivity 45

k-Cores decomposition  What is the structure of the core of the network? 46

k-Cores: core of the network  People with k<20 are the periphery  Core is composed of 79 people, each having 68 edges among them 47
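
A k-core is what remains after repeatedly deleting nodes with fewer than k neighbors. The sketch below implements that standard peeling procedure on a toy graph; it is not the code used for the slides, and the function name is hypothetical.

```python
def k_core(adj, k):
    """Return the set of nodes in the k-core of an undirected graph."""
    alive = set(adj)
    degree = {u: len(adj[u]) for u in adj}
    stack = [u for u in alive if degree[u] < k]
    while stack:
        u = stack.pop()
        if u not in alive:
            continue  # already removed via an earlier stack entry
        alive.remove(u)
        for v in adj[u]:
            if v in alive:
                degree[v] -= 1
                if degree[v] < k:
                    stack.append(v)
    return alive


# Toy graph: a 4-clique a,b,c,d with a tail d-e-f hanging off it.
toy = {
    "a": {"b", "c", "d"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c", "e"},
    "e": {"d", "f"},
    "f": {"e"},
}
```

Here the tail is peeled away for k = 2 and k = 3, leaving the clique as the core, while no node survives for k = 4.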

Network robustness  We delete nodes (in some order) and observe how the network falls apart:  Number of edges deleted  Size of largest connected component 48
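
The deletion experiment can be sketched as below: remove nodes in a given order and record the size of the largest connected component after each removal. For simplicity this sketch recomputes components from scratch at every step and tracks only component size, not the number of deleted edges; the function names and the toy graph are made up for illustration.

```python
def largest_component(adj, removed):
    """Size of the largest connected component once `removed` nodes are deleted."""
    seen = set(removed)
    best = 0
    for s in adj:
        if s in seen:
            continue
        seen.add(s)
        stack, size = [s], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best


def robustness_curve(adj, order):
    """Largest-component size before any deletion and after each deletion in order."""
    removed = set()
    curve = [largest_component(adj, removed)]
    for u in order:
        removed.add(u)
        curve.append(largest_component(adj, removed))
    return curve


# Toy star graph: hub "h" connected to four leaves.
star = {"h": {1, 2, 3, 4}, 1: {"h"}, 2: {"h"}, 3: {"h"}, 4: {"h"}}
```

On the star, deleting the hub first shatters the graph immediately, while deleting leaves shrinks it one node at a time, which is the contrast between targeted and random removal orders.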

Robustness: Nodes vs. Edges 49

Robustness: Connectivity 50

Conclusion  A first look at a planetary-scale social network  The largest social network analyzed  Strong presence of homophily: people that communicate share attributes  Well connected: in only a few hops one can reach most of the network  Very robust: Many (random) people can be removed and the network is still connected 51

References  Leskovec and Horvitz: Worldwide Buzz: Planetary-Scale Views on an Instant-Messaging Network, 2007