Analysis of Large-Scale Cell Phone Networks. 10-802 Course Project. Leman Akoglu, Bhavana Dalvi, Skyler Speakman. April 22, 2010.

Presentation transcript:


Analysis of Large-Scale Cell Phone Networks
– 3.8 million anonymized customers from India: gender, activation date, age (sketchy)
– 6 months of time-stamped directed phone calls: time of day, duration; switching stations removed (bummer)
– 220 million text messages: time of day

Analysis of Large-Scale Cell Phone Networks
– Analysis of Tie Strengths and Mutuality: Leman Akoglu
– Persistence of Social Ties: Bhavana Dalvi
– Pattern & Event Detection in Social Networks: Skyler Speakman

Analysis of Ties in Composite Networks Link Prediction in Large SMS+CALL Networks Presented by Leman Akoglu April 22, 2010

Sub-Problem
Goal: Link prediction
– In integrated networks (SMS+VOICE)
Questions:
1. How do different methods perform?
2. Does edge-weight information matter?
3. Does knowledge of VOICE interactions improve SMS predictions, and vice versa?
Similar to: D. Liben-Nowell, J. Kleinberg. The Link Prediction Problem for Social Networks. Proc. 12th International Conference on Information and Knowledge Management (CIKM), 2003.
– They use very small graphs with up to 5K nodes and 50K edges. Here we have networks of millions of users.
– They did not use the weighted version of most methods.

Methods used in Link Prediction
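As a rough illustration of the neighborhood-based methods named in the results (common neighbors, Jaccard index, Jaccard index*CN, Adamic/Adar, preferential attachment), here is a minimal unweighted sketch. The function names and the toy edge list are assumptions made for illustration; the project's actual implementation, and its weighted variants, are not shown in the slides.

```python
import math
from collections import defaultdict

def build_adjacency(edges):
    """Build undirected adjacency sets from (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def link_scores(adj, x, y):
    """Unweighted neighborhood scores for a candidate link (x, y), x != y."""
    nx, ny = adj.get(x, set()), adj.get(y, set())
    common = nx & ny
    union = nx | ny
    cn = len(common)
    jaccard = cn / len(union) if union else 0.0
    # A common neighbor is adjacent to both x and y, so its degree is >= 2
    # and log(degree) is strictly positive.
    adamic_adar = sum(1.0 / math.log(len(adj[z])) for z in common)
    return {
        "common_neighbors": cn,
        "jaccard": jaccard,
        "jaccard_x_cn": jaccard * cn,
        "adamic_adar": adamic_adar,
        "preferential_attachment": len(nx) * len(ny),
    }

# Toy graph (hypothetical): a and b share neighbors c and d.
toy_edges = [("a", "c"), ("b", "c"), ("a", "d"), ("b", "d"), ("d", "e")]
toy_scores = link_scores(build_adjacency(toy_edges), "a", "b")
```

Candidate pairs would then be ranked by one of these scores, with the top-ranked pairs taken as predicted links.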

Results

UNWEIGHTED METHODS (values per row in column order: VOICE only; VOICE+SMS; SMS only; SMS+VOICE /merge)
A random prediction: ~5.27 (0.01%); ~1.67 (0.0022%)
#Common neighbors CN: 1316 (2.69%); 1323 (2.70%); 2299 (2.96%); 3495 (4.51%)
Jaccard index: 1064 (2.17%); 268 (0.54%); 890 (1.14%); 3251 (4.19%)
Jaccard index*CN: 1813 (3.71%); 1208 (2.47%); 4836 (6.24%); 5207 (6.72%)
Adamic/Adar: 1318 (2.69%); 1324 (2.71%); 1821 (2.35%); 3597 (4.64%)
Preferential attachment: 63 (0.12%); 577 (0.74%); 572 (0.73%)
Katz (rank=100):
– β = (0.62%); 860 (1.11%)
– β = (2.01%); 888 (1.14%)
– β = (2.13%); 1856 (2.39%)
Pagerank (rank=100):
– α = (1.67%); 418 (0.53%)
– α = (1.68%); 731 (0.84%)
– α = (1.67%); 1009 (1.30%)
– α = (1.68%); 1016 (1.31%)
– α = (1.69%); 999 (1.28%)

WEIGHTED METHODS (values per row in column order: VOICE only; VOICE+SMS; SMS only; SMS+VOICE)
#Common neighbors CN: 1665 (3.40%); 1662 (3.40%); 1275 (1.64%); 2037 (2.63%)
Jaccard index: 2003 (4.10%); 1164 (2.38%); 1545 (1.99%); 4495 (5.80%)
Jaccard index*CN: 1918 (3.92%); 1759 (3.60%); 1588 (2.05%); 2879 (3.71%)
Adamic/Adar: 1716 (3.51%); 1714 (3.50%); 1013 (1.30%); 1663 (2.14%)
Preferential attachment: 52 (0.10%); 280 (0.36%)
Katz (rank=100):
– β = (0.01%); 5 (0.0065%)
– β = (0.01%); 5 (0.0065%)
– β = (0.01%); 4 (0.0052%)
Pagerank (rank=100):
– α = (2.07%); 361 (0.46%)
– α = (2.07%); 377 (0.48%)
– α = (2.07%); 459 (0.59%)
– α = (2.07%); 569 (0.73%)
– α = (2.08%); 657 (0.84%)

In general, low prediction accuracy (up to ~7%)

Sub-Problem II
Main sub-project goal: Analysis of ties/links
– In integrated networks (SMS+VOICE)
Questions:
1. How do mutual and non-mutual networks differ?
2. How equal is reciprocity?
3. Is there a correlation between node degree and its neighbors’ degrees?
4. How does total duration or number of phone calls/SMSs grow with the number of contacts?
5. Does strength of a tie depend on neighborhood overlap?

1. How do mutual and non-mutual networks differ? (Figure panels: SMS, PHONECALL.) In the mutual network of SMS, 70% of the nodes become singletons!
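A small sketch of how a mutual network and its singleton fraction might be computed from directed edges. It assumes a tie is "mutual" when contact exists in both directions, and that a singleton is a node left with no mutual tie; both the function names and that reading are assumptions, since the slide does not spell out the definitions.

```python
def mutual_network(directed_edges):
    """Keep only ties where both (u, v) and (v, u) appear; returns undirected pairs."""
    edge_set = set(directed_edges)
    return {
        frozenset((u, v))
        for u, v in edge_set
        if u != v and (v, u) in edge_set
    }

def singleton_fraction(directed_edges):
    """Fraction of nodes that have no tie left in the mutual network."""
    nodes = {n for edge in directed_edges for n in edge}
    if not nodes:
        return 0.0
    connected = {n for pair in mutual_network(directed_edges) for n in pair}
    return (len(nodes) - len(connected)) / len(nodes)

# Hypothetical toy data: only 1 <-> 2 is reciprocated.
toy_directed = [(1, 2), (2, 1), (1, 3), (4, 1)]
toy_fraction = singleton_fraction(toy_directed)
```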

2. How equal is reciprocity? (Figure panels: SMS, PHONECALL.)

3. Is there a correlation between node degree and its neighbors’ degrees? (Figure panel: SMS.) Disassortative vs. assortative mixing. Annotation: high-degree nodes with low-degree neighbors, where all edges also have the same weight.

3. Is there a correlation between node degree and its neighbors’ degrees? (Figure panel: PHONECALL.)

4. How does total duration or number of phone calls/SMSs grow with the number of contacts? (Figure panels: SMS, PHONECALL.)

5. Does strength of a tie depend on neighborhood overlap? (Figure panel: SMS.)

5. Does strength of a tie depend on neighborhood overlap? (Figure panel: PHONECALL.)

CONCLUSIONS:
1. How do mutual and non-mutual networks differ? There is far less mutuality in the SMS network.
2. Is reciprocity balanced? Yes; balanced and small reciprocity is more common.
3. Is there a correlation between node degree and its neighbors’ degrees? Yes; the degree of a node and the average degree of its neighbors show assortative mixing for nodes of degree > ~10.
4. How does total duration or number of phone calls/SMSs grow with the number of contacts? Total node strength grows super-linearly (power law) with increasing degree.
5. Does strength of a tie depend on neighborhood overlap? Yes; on average, tie strength increases with increasing neighborhood overlap.

Network Structure and Tie Persistence in mobile network Bhavana Dalvi

Goal: Predict which of the existing ties will survive.
Questions:
– Which link features matter?
– Which node features matter?
– How are they correlated to each other?
– Which prediction method to use?

Related Work
Structure and tie strengths in mobile communication network - Onnela, Barabasi - PNAS 2007
– Coupling between tie strengths and local network structure
– Information diffusion through strong ties vs weak ties
The dynamics of a mobile phone network - Hidalgo et al. ScienceDirect Jan 2008
– Relation between structure of mobile network and link persistence
– Rule-based prediction
– We formulate it as a prediction problem.

Problem Formulation: Divide the data into time panels. Given the links and network structure in panel 1, predict which links will persist in panels 2, 3, 4, etc.
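The panel setup can be sketched as follows. This is an illustrative reading only: the record layout `(caller, callee, timestamp)`, the fixed `panel_length`, and treating ties as directed are assumptions, not details given in the slides.

```python
from collections import defaultdict

def split_into_panels(calls, start, panel_length):
    """Map 1-based panel index -> set of directed links observed in that panel."""
    panels = defaultdict(set)
    for caller, callee, t in calls:
        k = int((t - start) // panel_length) + 1
        panels[k].add((caller, callee))
    return panels

def persistence_labels(panels, k):
    """For every link present in panel 1: 1 if it also appears in panel k, else 0."""
    later = panels.get(k, set())
    return {link: int(link in later) for link in panels.get(1, set())}

# Hypothetical toy log with timestamps in days and 10-day panels.
toy_calls = [("a", "b", 0), ("a", "c", 5), ("a", "b", 12), ("b", "c", 25)]
toy_panels = split_into_panels(toy_calls, start=0, panel_length=10)
toy_labels = persistence_labels(toy_panels, k=2)
```

The labels produced for each later panel k are what a classifier would be trained to predict from panel 1 features.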

Concept Definitions: Persistence of tie; Perseverance of user

Random Sample
– Selected seed uniformly at random
– Took a subgraph of the original graph by traversing neighbors and their neighbors
– # users: 5K
– # links: 14.6K
– Duration: 3 months

Tie persistence distribution: Bimodal distribution. Ties are either active most of the time or rarely active.

Tie Attributes
Reciprocity (R)
– 1: if the tie is reciprocal
– 0: otherwise
Topological Overlap (TO)
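A hedged sketch of the two tie attributes. The slide's own formula for topological overlap is not preserved; the version below assumes the definition used in the Onnela et al. work cited under Related Work, O_ij = n_ij / ((k_i - 1) + (k_j - 1) - n_ij), where n_ij is the number of common neighbors and k_i, k_j are the endpoint degrees. The function names are illustrative.

```python
def reciprocity(directed_edges, u, v):
    """R = 1 if calls exist in both directions between u and v, else 0."""
    return int((u, v) in directed_edges and (v, u) in directed_edges)

def topological_overlap(adj, u, v):
    """Assumed definition: common neighbors over the other neighbors of u or v."""
    common = len(adj[u] & adj[v])
    denom = (len(adj[u]) - 1) + (len(adj[v]) - 1) - common
    return common / denom if denom > 0 else 0.0

# Hypothetical toy graph: a-b share the neighbor c; d hangs off a.
toy_adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
toy_directed = {("a", "b"), ("b", "a"), ("a", "d")}
toy_to = topological_overlap(toy_adj, "a", "b")
toy_r = reciprocity(toy_directed, "a", "b")
```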

Node Attributes
– Degree (K)
– Cluster Coefficient (C)
– Average reciprocity (r): fraction of ties containing both incoming and outgoing calls

Pearson Correlation Coefficient: Measures the dependence between two quantities.
$\mathrm{Corr}(X,Y) = \dfrac{\mathrm{cov}(X,Y)}{\sqrt{\mathrm{var}(X)\,\mathrm{var}(Y)}}$
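For reference, the Pearson coefficient divides the covariance by the product of the two standard deviations. A minimal pure-Python sketch (the function name and sample values are illustrative):

```python
import math

def pearson(xs, ys):
    """Pearson correlation: cov(X, Y) / sqrt(var(X) * var(Y))."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # The 1/n factors of covariance and variances cancel, so sums suffice.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Perfectly linear toy data gives a coefficient of +1 or -1.
toy_positive = pearson([1, 2, 3], [2, 4, 6])
toy_negative = pearson([1, 2, 3], [3, 2, 1])
```

The function raises a ZeroDivisionError when either input is constant, since the coefficient is undefined in that case.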

Tie Persistence: correlation matrix over Delta_C, Delta_K, Delta_r, R, TO, and Tie_persistence (table values not preserved in the transcript).

User Perseverance: correlation matrix over C, K, r, and User_perseverance (table values not preserved in the transcript).

Example regression Coefficients for Tie Persistence Delta_C : Delta_K : Delta_r : R : TO :

Prediction Problem
Input:
– Links in panel 1
– For each link: Delta_C, Delta_K, Delta_r, R, and TO (from panel 1 data)
Output:
– Will a link in panel 1 persist in panel k? k = 2, 3, 4, 5, 6

Variants of Logistic regression for tie persistence prediction Using both node and tie attributes improves the prediction accuracy
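To make the logistic regression setup concrete, here is a small pure-Python sketch trained by batch gradient descent. It is not the project's implementation: the learning rate, the epoch count, and the toy feature rows (only R and TO, with made-up values) are assumptions for illustration.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss; w[0] is the bias term."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def predict(w, x):
    """Predict 1 (tie persists) when the estimated probability is >= 0.5."""
    return int(sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))) >= 0.5)

# Hypothetical rows: [R, TO]; label 1 means the tie persisted.
toy_X = [[1, 0.5], [1, 0.4], [0, 0.05], [0, 0.0], [1, 0.3], [0, 0.1]]
toy_y = [1, 1, 0, 0, 1, 0]
toy_w = fit_logistic(toy_X, toy_y)
toy_pred = [predict(toy_w, x) for x in toy_X]
```

Adding the node-attribute differences (Delta_C, Delta_K, Delta_r) would only mean extending each feature row.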

Comparison with rule-based method: LR performs better than the rule-based method: if (R = 1 and TO > 0.1) then predict 1, else 0.
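The rule-based baseline is simple enough to write out directly. The accuracy helper and the toy rows are illustrative assumptions; only the rule itself (R = 1 and TO > 0.1) comes from the slide.

```python
def rule_predict(R, TO, threshold=0.1):
    """Baseline rule: predict persistence only for reciprocal ties with TO > threshold."""
    return int(R == 1 and TO > threshold)

def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    return sum(int(p == t) for p, t in zip(predictions, labels)) / len(labels)

# Hypothetical (R, TO, persisted) rows.
toy_rows = [(1, 0.5, 1), (1, 0.05, 1), (0, 0.3, 0), (0, 0.0, 0)]
toy_preds = [rule_predict(R, TO) for R, TO, _ in toy_rows]
toy_acc = accuracy(toy_preds, [label for _, _, label in toy_rows])
```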

Conclusion: Local network attributes do help to predict the persistence of existing ties. LR-like techniques give better accuracy than rule-based techniques.

Analysis of Social Media Presentation Contribution from Skyler Speakman April

Pattern Detection through Subset Scanning (A reminder): Find the subset of locations for a given region that has the highest score. (Figure labels: affected locations; unaffected locations contributing to region score.) (Neill, 2008)

Connectivity Constraints: Increase power to detect non-circular clusters. Create an adjacency graph of the locations and score every connected subset.

Social Media: Can pattern detection work with people on a ‘societal scale’?
– Automatic (participatory sensing)
– Self-reported (healthmap.org)

In the News… (American Teenagers) Texting has surpassed: – Face-to-face – – Instant Message – Voice calling 1 in 3 send more than 100 texts a day Pew Internet & American Life Project

Anomaly Detection through Subset Scanning
– Assume texts ~ Poisson(b_i) (learned from historical data)
– We wish to maximize a scoring function over all possible connected subsets, S
– Provides a likelihood score that the counts in S are generated from a different distribution (anomalous)
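A brute-force sketch of scanning over connected subsets. The slides do not state the scoring function; the code assumes the expectation-based Poisson statistic F(S) = C log(C/B) + B - C for C > B (0 otherwise), with C the total count and B the total baseline in S. It enumerates every subset, so it is exponential in the number of nodes and is not the GraphScan algorithm itself; all names are illustrative.

```python
import math
from itertools import combinations

def poisson_score(counts, baselines, subset):
    """Assumed expectation-based Poisson log-likelihood ratio for a subset."""
    C = sum(counts[i] for i in subset)
    B = sum(baselines[i] for i in subset)
    if B <= 0 or C <= B:
        return 0.0  # no excess over baseline (or degenerate baseline)
    return C * math.log(C / B) + B - C

def is_connected(subset, adj):
    """Depth-first search restricted to the nodes of the subset."""
    subset = set(subset)
    start = next(iter(subset))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in subset and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == subset

def best_connected_subset(counts, baselines, adj):
    """Exhaustively score every connected subset and return the best one."""
    nodes = list(adj)
    best, best_score = None, 0.0
    for size in range(1, len(nodes) + 1):
        for candidate in combinations(nodes, size):
            if is_connected(candidate, adj):
                score = poisson_score(counts, baselines, candidate)
                if score > best_score:
                    best, best_score = set(candidate), score
    return best, best_score

# Hypothetical path graph 0-1-2-3 with elevated counts at nodes 1 and 2.
toy_adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
toy_counts = {0: 10, 1: 30, 2: 28, 3: 9}
toy_baselines = {0: 10, 1: 10, 2: 10, 3: 10}
toy_best, toy_best_score = best_connected_subset(toy_counts, toy_baselines, toy_adj)
```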

Initial Attempt: Formed a very simple social network based off of ‘1 call’
– Add a threshold? … Still running
Focus on a much smaller group of extremely active texters

Trimming the data…
Require a threshold of monthly activity in order to be considered
– 500 incoming & outgoing texts every month
– 468 customers
Require a threshold of messages exchanged in order to be connected
(Table: Threshold, Edges; values not preserved in the transcript.)
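The two trimming steps could look roughly like this. The slide is ambiguous on whether "500 incoming & outgoing texts" is a combined count or a per-direction count; the sketch assumes combined. The data layout (one list of `(sender, receiver)` pairs per month) and the function names are also assumptions.

```python
from collections import Counter

def active_users(texts_by_month, min_texts=500):
    """Users with at least min_texts incoming plus outgoing texts in every month."""
    active = None
    for month_texts in texts_by_month:
        activity = Counter()
        for sender, receiver in month_texts:
            activity[sender] += 1
            activity[receiver] += 1
        qualified = {user for user, count in activity.items() if count >= min_texts}
        active = qualified if active is None else active & qualified
    return active or set()

def threshold_edges(texts, users, min_messages):
    """Connect two retained users if they exchanged at least min_messages texts."""
    exchanged = Counter()
    for sender, receiver in texts:
        if sender in users and receiver in users and sender != receiver:
            exchanged[frozenset((sender, receiver))] += 1
    return {pair for pair, count in exchanged.items() if count >= min_messages}

# Hypothetical two-month toy log with a tiny activity threshold.
toy_months = [
    [("a", "b")] * 3 + [("c", "a")],
    [("a", "b")] * 2 + [("b", "c")] * 2,
]
toy_users = active_users(toy_months, min_texts=2)
toy_all_texts = [text for month in toy_months for text in month]
toy_graph = threshold_edges(toy_all_texts, toy_users, min_messages=3)
```

Raising `min_messages` yields a sparser graph, which is the knob the runtime comparison varies.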

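Table columns: Threshold, Edges, Runtime (1 month). Threshold and edge values are not preserved in the transcript; runtimes in table order: 20s, 155s, 385s, 9.7m, 26.8m, 49.2m, 500m, 104h, --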

Maximum likelihood ratio score for every day in May. Highest scoring connected subset for a selection of days.

Conclusions
– The GraphScan algorithm can reasonably scale to graphs of a few hundred nodes
– Performance is highly dependent on the underlying graph structure
– Future improvements through heuristics are possible (necessary)
– Realistic anomaly detection is difficult with unlabeled data, but we have demonstrated a solid proof of principle