Structure, Tie Persistence and Event Detection in Large Phone and SMS Networks. Leman Akoglu and Bhavana Dalvi, Carnegie Mellon University.


Affiliation: Carnegie Mellon University and iLab. The dataset used for this work was provided by iLab at Carnegie Mellon University.

Dataset
Around 2M users, 50M edges and 500M phone calls/SMS over 6 months of data.
Tie strength is based on (a) number of SMS, (b) number of phone calls, (c) duration of phone calls.

Structure Analysis (phone and SMS network)
If A calls/texts B n times, can we say anything about how many times B calls/texts A? Reciprocity patterns can be used to spot outliers.
Are a node's degree and its neighbors' degrees correlated? The degree of a node and the average degree of its neighbors exhibit assortative mixing.
How does the total duration or the number of phone calls and SMSs grow with the number of contacts a user has? Total node strength grows super-linearly with degree.
Does the tie strength of i and j depend on their neighborhood overlap? On average, tie strength increases with neighborhood overlap.

Tie Persistence
How can we predict whether a link will persist in the future? Which link and node attributes are important in prediction?

Tie Persistence (TP): the stability of a tie across time, measured as the number of time-ticks in which the link is observed over the total number P of time-ticks.
User Perseverance (UP): the average of the persistence of all of a user's ties.

Tie attributes:
Reciprocity (R): 1 if the tie is reciprocal in the time tick.
Topological Overlap (TO): based on the number of common neighbours.

Node attributes:
Degree (K): node degree.
Cluster Coefficient (C): based on the number of triads in which the node is involved.
User reciprocity (r): fraction of ties containing both incoming and outgoing calls.

How are these attributes correlated to each other and to TP and UP?
Correlation table for tie attributes (Delta_C, Delta_K, Delta_r, R, TO, TP): the only entry recoverable here is TO vs. TP = 0.22.
Correlation table for node attributes (C, K, r, UP): the only entry recoverable here is r vs. UP = 0.39.

Findings: Local network attributes help to predict tie persistence. Using both tie attributes and node attributes improves prediction accuracy. Regression techniques give better accuracy than rule-based techniques.

Event Detection
At which points in time does the behavior of the customers change considerably? Can the detected change-points be attributed to a set of nodes, i.e. can we characterize which customer(s) cause most of the change?

Methodology
(1) Feature extraction: Characterize nodes with 12 network features F: degree (number of contacts), total weight (phone call duration), … This gives one T×N time-series matrix per feature, with T = 183 days and N = 1.8M users.
(2) Event detection: Define a sliding window of size W (set to 5 days). Generate a correlation matrix C, with Cij being Pearson's correlation between the time series of the pair (i, j) over window W. The principal eigenvector of C gives the "activity" of each node. Compare "activity" vectors over time by taking the dot product score Z (1 if the same, 0 if perpendicular; flag small Z).

Figure: (left) Z score vs. time with W = 5 and F = inweight (number of calls received). The top 10 days with the largest Z score are highlighted with red bars. (middle) u(t) vs. r(t-1) for each node at T = Dec 26th. The top 5 nodes with the largest change are marked with red stars. (right) inweight vs. time for the top 5 marked nodes; notice the change in calling behavior during the Christmas week.

Findings: Large changes in users' "eigen-behaviors" are flagged as change-points (events) in time. Our method detected "events" that coincide with major holidays and festivals in our data set. These results can be used to spot the top users who contribute most to the detected changes.
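The definitions of Tie Persistence (TP) and User Perseverance (UP) can be illustrated with a short Python sketch. It assumes the data is given as one set of ties per time-tick and that ties are treated as undirected; the function names, the input layout and the toy data are hypothetical and not taken from the poster.

```python
from collections import defaultdict


def tie_persistence(snapshots):
    """TP of each tie: number of time-ticks in which it is observed, over P.

    snapshots: list of P collections of (u, v) pairs, one per time-tick.
    Ties are treated as undirected here (an assumption of this sketch).
    """
    P = len(snapshots)
    counts = defaultdict(int)
    for ties in snapshots:
        # Normalise direction and count each tie at most once per tick.
        for tie in {tuple(sorted(t)) for t in ties}:
            counts[tie] += 1
    return {tie: c / P for tie, c in counts.items()}


def user_perseverance(tp):
    """UP of each user: average persistence of all of that user's ties."""
    per_user = defaultdict(list)
    for (u, v), p in tp.items():
        per_user[u].append(p)
        per_user[v].append(p)
    return {u: sum(ps) / len(ps) for u, ps in per_user.items()}


# Toy data: P = 4 time-ticks, three users.
snapshots = [
    {("a", "b"), ("a", "c")},
    {("a", "b")},
    {("b", "a")},
    {("a", "b"), ("b", "c")},
]
tp = tie_persistence(snapshots)
up = user_perseverance(tp)
```

In this toy data the tie between a and b is seen in every tick, so its persistence is 1, while the other two ties are each seen once out of four ticks.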
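The sliding-window event detection steps can be sketched with NumPy as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it compares the activity vectors of consecutive windows (the poster compares u(t) against r(t-1), whose exact construction is not given here), it takes the absolute value of the dot product because the sign of an eigenvector is arbitrary, and the names `activity_vector` and `change_scores` are hypothetical.

```python
import numpy as np


def activity_vector(window):
    """Unit-norm principal eigenvector of the Pearson correlation matrix.

    window: array of shape (W, N), one column of feature values per node.
    """
    C = np.corrcoef(window, rowvar=False)  # N x N, C[i, j] = Pearson corr.
    C = np.nan_to_num(C)                   # constant series yield NaN entries
    _, vecs = np.linalg.eigh(C)            # eigenvalues in ascending order
    return vecs[:, -1]                     # eigenvector of largest eigenvalue


def change_scores(X, W=5):
    """Score Z for each pair of consecutive windows over a T x N matrix X.

    Z is near 1 when the activity vectors agree and near 0 when they are
    perpendicular, so small Z flags a candidate change-point.
    """
    T = X.shape[0]
    us = [activity_vector(X[t:t + W]) for t in range(T - W + 1)]
    return np.array([abs(us[k] @ us[k - 1]) for k in range(1, len(us))])


# Toy data: three nodes whose series are all positively scaled copies of s,
# so every window has the same correlation structure.
s = np.arange(1.0, 13.0) ** 2
X = np.outer(s, [1.0, 2.0, 3.0]) + np.array([0.0, 5.0, -1.0])
scores = change_scores(X, W=5)
```

With 1.8M users the full correlation matrix would not fit in memory, so a real pipeline would need a different way to obtain the principal eigenvector; the sketch only shows the logic on small inputs.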