Neighborhood Formation and Anomaly Detection in Bipartite Graphs. Jimeng Sun, Huiming Qu, Deepayan Chakrabarti & Christos Faloutsos.

Presentation transcript:

Neighborhood Formation and Anomaly Detection in Bipartite Graphs. Jimeng Sun, Huiming Qu, Deepayan Chakrabarti & Christos Faloutsos. Presented by Bhavana Dalvi.

Outline
- Motivation
- Problem definition
- Neighborhood formation
- Anomaly detection
- Experiments
- Related work
- Conclusion and future work

Bipartite graphs and interesting questions
Author-paper graph (authors on one side, papers on the other), with a query author 'a':
- Which authors are most related to 'a'? [Figure: an author 'b' is highlighted with the value 0.8]
- Which paper written by 'a' is the uncommon one?

Bipartite graphs and interesting questions
P2P network (users on one side, files on the other):
- Which users have preferences similar to those of a particular user?
- Which files are downloaded by users with very different preferences?
(Source: Jimeng Sun's presentation at ICDM 2005)


Problem definition
Setting: a bipartite graph with node sets V1 and V2, edge set E, and a query node q.
- Neighborhood formation (NF). Input: query node q in V1. Output: relevance scores of all the nodes in V1 to q.
- Anomaly detection (AD). Input: query node q in V1. Output: normality scores for the nodes in V2 that link to q.


Neighborhood formation
- Relevance(b, q) ∝ number of short paths from q to b.
- A connection that links only b and q contributes more relevance than a connection that links b, q, and other nodes.

Exact NF algorithm: random walk with restart
Input: a graph G and a query node q
Output: relevance scores to q
- Construct the transition matrix, where every node in the graph becomes a state and every state has a restart probability c of jumping back to the query node q.
- Find the steady-state probability u, which gives the relevance score of every node to q.
(Source: Jimeng Sun's presentation at ICDM 2005)

Finding Steady State Probabilities
- |V1| = k, |V2| = n
- M: k x n matrix representing the weighted graph G
- M_A: adjacency matrix of G, of size (k+n) x (k+n), built from M
- P_A = col_norm(M_A): column-normalized adjacency matrix
- q_A: (k+n) x 1 vector for query node 'a', with 1 in the a-th entry and 0 elsewhere
- u_A: steady-state probability vector with restart probability c
- Bipartite structure: if k << n, the savings are significant
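The steady-state formula itself did not survive on this slide. The sketch below assumes the usual random-walk-with-restart fixed point, u_A = (1 - c) P_A u_A + c q_A, and the block layout [[0, M], [M^T, 0]] for M_A; both are assumptions, as are the restart value c = 0.15, the function names, and the toy matrix. It uses plain power iteration on a dense matrix and does not exploit the bipartite savings mentioned above.

```python
def col_normalize(matrix):
    """Scale each column to sum to 1 (all-zero columns are left as zeros)."""
    size = len(matrix)
    col_sums = [sum(matrix[r][c] for r in range(size)) for c in range(size)]
    return [[matrix[r][c] / col_sums[c] if col_sums[c] else 0.0
             for c in range(size)] for r in range(size)]


def bipartite_adjacency(M):
    """Assumed block layout [[0, M], [M^T, 0]]: V1 nodes first, then V2 nodes."""
    k, n = len(M), len(M[0])
    A = [[0.0] * (k + n) for _ in range(k + n)]
    for i in range(k):
        for j in range(n):
            A[i][k + j] = M[i][j]
            A[k + j][i] = M[i][j]
    return A


def random_walk_with_restart(M, a, c=0.15, tol=1e-12, max_iter=10000):
    """Iterate u <- (1 - c) * P_A u + c * q_A until the update is below tol."""
    P = col_normalize(bipartite_adjacency(M))
    size = len(P)
    q = [0.0] * size
    q[a] = 1.0  # query vector: 1 in the a-th entry, 0 elsewhere
    u = q[:]
    for _ in range(max_iter):
        nxt = [(1 - c) * sum(P[r][j] * u[j] for j in range(size)) + c * q[r]
               for r in range(size)]
        if sum(abs(x - y) for x, y in zip(nxt, u)) < tol:
            return nxt
        u = nxt
    return u


# Toy author-paper matrix (hypothetical): rows are authors, columns are papers.
# Authors 0 and 1 share paper 0; author 2 reaches them only through paper 1.
M = [[1, 0, 0],
     [1, 1, 0],
     [0, 1, 1]]
u = random_walk_with_restart(M, a=0)
relevance_to_authors = u[:3]  # the first k entries correspond to V1
```

In this toy graph the query author scores highest, followed by the co-author and then the more distant author, which matches the "short paths" intuition from the neighborhood formation slide.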

Extensions to NF Algorithm
- Parallel NF: if there are multiple queries, the computation can be done in parallel.
- Approximate NF:
  1. Cluster the nodes into k partitions (preprocessing).
  2. Given a query node q, find the partition G_i it belongs to.
  3. Run the exact NF algorithm only on G_i.
  4. Set relevance = 0 for nodes not in G_i.
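The approximate NF steps can be sketched as follows. This is a minimal illustration, not the authors' implementation: the partition labels are assigned by hand here (in practice they would come from a graph partitioner), and the helper names, the edge-list representation, the restart value c = 0.15, and the fixed iteration count are all assumptions.

```python
def rwr(edges, query, c=0.15, iters=200):
    """Random walk with restart on an undirected edge list; returns {node: score}."""
    neighbors = {}
    for x, y in edges:
        neighbors.setdefault(x, []).append(y)
        neighbors.setdefault(y, []).append(x)
    if query not in neighbors:
        return {query: 1.0}  # isolated query: all mass stays on it
    u = {node: 0.0 for node in neighbors}
    u[query] = 1.0
    for _ in range(iters):
        nxt = {node: 0.0 for node in neighbors}
        for node, mass in u.items():
            share = (1 - c) * mass / len(neighbors[node])
            for nb in neighbors[node]:
                nxt[nb] += share
        nxt[query] += c  # restart mass returns to the query node
        u = nxt
    return u


def approximate_nf(edges, partition_of, query):
    """Run RWR only inside the query's partition; every other node scores 0."""
    part = partition_of[query]
    local_edges = [(x, y) for x, y in edges
                   if partition_of[x] == part and partition_of[y] == part]
    scores = {node: 0.0 for node in partition_of}
    scores.update(rwr(local_edges, query))
    return scores


# Hypothetical author-paper edges and a hand-made 2-way partition.
edges = [("a1", "p1"), ("a2", "p1"), ("a2", "p2"),
         ("a3", "p2"), ("a3", "p3"), ("a4", "p3")]
partition_of = {"a1": 0, "a2": 0, "p1": 0, "p2": 0,
                "a3": 1, "a4": 1, "p3": 1}
scores = approximate_nf(edges, partition_of, "a1")
```

Edges that cross partitions, such as ("a3", "p2") here, are simply dropped, which is where the approximation error comes from.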


Anomaly Detection
A node x in V2 is normal if the nodes in V1 that link to x are in the same neighborhood.
[Figure: two example bipartite graphs (V1, V2), one where x has low normality and one where x has high normality]

Anomaly Detection Algorithm
Input: node t in V2, bipartite transition matrix P
Output: normality score of t
1. Set S_t = neighbors of t in V1.
2. Compute RS_t, the pairwise relevance scores for the nodes in S_t.
3. Normality score ns(t) = function(RS_t), e.g. the mean over the non-diagonal elements of RS_t.
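Step 3 with the mean as the scoring function can be sketched as below. The relevance values are hypothetical placeholders rather than output of a real NF run, and returning None for a node with fewer than two neighbors is an assumption, since the slide does not cover that case.

```python
def normality_score(relevance, t_neighbors):
    """Mean of relevance[a][b] over ordered pairs a != b drawn from t_neighbors."""
    pairs = [(a, b) for a in t_neighbors for b in t_neighbors if a != b]
    if not pairs:
        return None  # assumption: undefined when t has fewer than two neighbors
    return sum(relevance[a][b] for a, b in pairs) / len(pairs)


# Hypothetical pairwise relevance scores among three V1 nodes:
# a1 and a2 are close to each other, a3 sits in a different neighborhood.
relevance = {
    "a1": {"a1": 0.6, "a2": 0.3, "a3": 0.02},
    "a2": {"a1": 0.3, "a2": 0.6, "a3": 0.02},
    "a3": {"a1": 0.02, "a2": 0.02, "a3": 0.9},
}
ns_same = normality_score(relevance, ["a1", "a2"])   # neighbors from one neighborhood
ns_mixed = normality_score(relevance, ["a1", "a3"])  # neighbors from different ones
```

A V2 node whose neighbors come from different neighborhoods gets the lower score, which is the behavior the slide describes as low normality.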


Datasets
Dataset                  |V1|    |V2|    |E|     Avg. deg (V1)   Avg. deg (V2)
Conference-Author (CA)   […]     […]K    662K    510             5
Author-Paper (AP)        316K    472K    1M      3               2
IMDB                     553K    204K    2.2M    4               11

Do the neighborhoods make sense?
[Figure: relevance score (y-axis) of the most relevant neighbors (x-axis)]
The nodes with the highest relevance scores are indeed very relevant to the query node.

How accurate is the approximate NF?
[Figure: two precision plots, with panel settings "neighborhood size = 20" and "num of partitions = 10"]
- Precision = fraction of overlap between ApprNF and NF among the top k neighbors.
- The precision drops slowly as the number of partitions increases.
- The precision remains high for a wide range of neighborhood sizes.
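The precision measure above can be sketched as a top-k overlap. The score dictionaries are hypothetical; excluding the query node from both rankings and breaking ties by node name are assumptions made to keep the example deterministic.

```python
def top_k(scores, k, exclude=()):
    """Return the k highest-scoring nodes, ties broken by node name."""
    ranked = sorted((n for n in scores if n not in exclude),
                    key=lambda n: (-scores[n], n))
    return ranked[:k]


def precision_at_k(exact_scores, approx_scores, k, query):
    """Fraction of the exact top-k neighbors that also appear in the approximate top-k."""
    exact_top = set(top_k(exact_scores, k, exclude={query}))
    approx_top = set(top_k(approx_scores, k, exclude={query}))
    return len(exact_top & approx_top) / k


# Hypothetical relevance scores from the exact and the approximate algorithm.
exact = {"q": 0.4, "a": 0.3, "b": 0.2, "c": 0.1, "d": 0.0}
approx = {"q": 0.5, "a": 0.3, "c": 0.2, "b": 0.0, "d": 0.0}
precision = precision_at_k(exact, approx, k=2, query="q")
```

Here the exact top 2 are a and b while the approximate top 2 are a and c, so one of the two neighbors overlaps.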

Do the anomalies make sense?
[Figure: avg. normality score]
Injection:
- Inject 100 nodes into V2, each connected to k nodes in V1, where k = avg. degree of nodes in V2.
- The nodes in V1 are picked at random from those with degree = 10 x avg. degree of nodes in V1.
- Assumption: this will induce connections across neighborhoods.

What about the computational cost?


Related Work
- Random walks on graphs: PageRank [ISDN 1998], Topic-Sensitive PageRank [WWW 2002]
- Outlier detection: outlier detection in high-dimensional data, Aggarwal and Yu [SIGMOD 2001]; Outlier Detection Using Random Walks [ICTAI 2006]; finding outlier clusters
- Graph partitioning: METIS package; spectral clustering methods; neighborhoods can become personalized clusters


Conclusions and Future Work
- Solutions to two problems on bipartite graphs: Neighborhood Formation (NF) and Anomaly Detection (AD).
- Random walk with restart, combined with graph partitioning, can be used to solve NF efficiently.
- AD can be done based on the relevance scores generated by NF.
- Experiments on real datasets show good results.
- Proximity Tracking on Time-Evolving Graphs (SIAM 2008 paper): defines proximity scores in a dynamic setting, with efficient incremental updates.

Thank you