CMU SCS, KDD '09. Faloutsos, Miller, Tsourakakis. P5-1. Large Graph Mining: Power Tools and a Practitioner’s Guide. Task 5: Graphs over time & tensors

P5-1 Large Graph Mining: Power Tools and a Practitioner’s Guide
Task 5: Graphs over time & tensors
Faloutsos, Miller, Tsourakakis (CMU)

P5-2 Outline
Introduction – Motivation
Task 1: Node importance
Task 2: Community detection
Task 3: Recommendations
Task 4: Connection sub-graphs
Task 5: Mining graphs over time
Task 6: Virus/influence propagation
Task 7: Spectral graph theory
Task 8: Tera/peta graph mining: hadoop
Observations – patterns of real graphs
Conclusions

P5-3 Thanks to Tamara Kolda (Sandia) for the foils on tensor definitions and on TOPHITS.

P5-4 Detailed outline
Motivation
Definitions: PARAFAC and Tucker
Case study: web mining

P5-5 Examples of matrices: authors and terms
[Figure: a matrix whose rows are authors (John, Peter, Mary, Nick, ...) and whose columns are terms (data, mining, classif., tree, ...)]

P5-6 Motivation: Why tensors?
Q: What is a tensor?

P5-7, P5-8 Motivation: Why tensors?
A: N-D generalization of a matrix.
[Figure: the authors x terms matrix (rows John, Peter, Mary, Nick, ...; columns data, mining, classif., tree, ...) shown first for KDD’09 alone, then as one slice per conference: KDD’07, KDD’08, KDD’09]

P5-9 Tensors are useful for 3 or more modes
Terminology: ‘mode’ (or ‘aspect’)
[Figure: the authors x terms x conferences tensor with its axes labeled Mode (== aspect) #1, Mode #2, Mode #3]
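To make the notion of modes concrete, here is a minimal sketch of building a 3-mode count tensor from (author, keyword, conference) records. The records, the helper name `build_count_tensor`, and the dense NumPy representation are illustrative assumptions, not part of the tutorial; real data of this kind is usually sparse.

```python
import numpy as np

# Hypothetical (author, keyword, conference) records; values are illustrative only.
records = [
    ("John", "data", "KDD'09"),
    ("John", "mining", "KDD'09"),
    ("Peter", "data", "KDD'08"),
    ("Mary", "classif.", "KDD'07"),
    ("John", "data", "KDD'09"),
]

def build_count_tensor(records):
    """Return a dense count tensor and the sorted label list of each mode."""
    # One label list per mode: authors, keywords, conferences.
    modes = [sorted(set(column)) for column in zip(*records)]
    index = [{label: i for i, label in enumerate(labels)} for labels in modes]
    tensor = np.zeros([len(labels) for labels in modes])
    for record in records:
        position = tuple(index[m][label] for m, label in enumerate(record))
        tensor[position] += 1
    return tensor, modes

tensor, modes = build_count_tensor(records)
print(tensor.shape)  # one axis per mode
```

Adding a fourth field to every record would add a fourth mode without changing the helper.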

P5-10 Notice
The 3rd mode does not need to be time
We can have more than 3 modes
[Figure: a tensor with modes IP source, IP destination, dest. port]

P5-11 Notice
The 3rd mode does not need to be time
We can have more than 3 modes
– E.g., fMRI: x, y, z, time, person-id, task-id
From DENLAB, Temple U. (Prof. V. Megalooikonomou +)

P5-12 Motivating applications
Why are tensors useful?
– Web mining (TOPHITS)
– Environmental sensors
– Intrusion detection (src, dst, time, dest-port)
– Social networks (src, dst, time, type-of-contact)
– Face recognition
– etc.

P5-13 Detailed outline
Motivation
Definitions: PARAFAC and Tucker
Case study: web mining

P5-14 Tensor basics
Multi-mode extensions of SVD. Recall that:

P5-15 Reminder: SVD
– Best rank-k approximation in L2: $A \approx U \Sigma V^T$, where $A$ is m x n

P5-16 Reminder: SVD
– Best rank-k approximation in L2: $A \approx \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T + \dots$, where $A$ is m x n
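The rank-k reminder can be checked numerically. The following is a small sketch using NumPy's `numpy.linalg.svd`; the random test matrix and the helper name `rank_k_approximation` are assumptions made for illustration.

```python
import numpy as np

def rank_k_approximation(A, k):
    """Keep the k largest singular triplets: sum of sigma_i * u_i * v_i^T for i < k."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))          # illustrative m x n matrix
A2 = rank_k_approximation(A, 2)

singular_values = np.linalg.svd(A, compute_uv=False)
spectral_error = np.linalg.norm(A - A2, 2)       # L2 (spectral) norm of the residual
frobenius_error = np.linalg.norm(A - A2, "fro")

# By the Eckart-Young theorem the L2 error equals the first discarded singular value.
print(spectral_error, singular_values[2])
```

The same truncation is also optimal in the Frobenius norm, where the error is the root of the sum of the squared discarded singular values.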

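P5-17 Goal: extension to >=3 modes
An I x J x K tensor ≈ a diagonal R x R x R core multiplied by factor matrices A (I x R), B (J x R), C (K x R)
= a sum of R rank-1 terms: $a_1 \circ b_1 \circ c_1 + \dots + a_R \circ b_R \circ c_R$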

P5-18 Main points
2 major types of tensor decompositions: PARAFAC and Tucker
Both can be solved with "alternating least squares" (ALS)

P5-19 Specially structured tensors (our notation)
Tucker tensor: an I x J x K tensor = an R x S x T "core" multiplied by U (I x R), V (J x S), W (K x T)
Kruskal tensor: an I x J x K tensor = a diagonal R x R x R core multiplied by U (I x R), V (J x R), W (K x R)
= $u_1 \circ v_1 \circ w_1 + \dots + u_R \circ v_R \circ w_R$
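The two structured forms can be written out with `numpy.einsum`. This is a minimal sketch; the function names, the random factor matrices, and the sizes are assumptions for illustration. It also shows that a Kruskal tensor is a Tucker tensor whose core is the superdiagonal identity.

```python
import numpy as np

def kruskal_to_tensor(U, V, W):
    """Sum over r of the outer products u_r o v_r o w_r."""
    return np.einsum("ir,jr,kr->ijk", U, V, W)

def tucker_to_tensor(core, U, V, W):
    """Multiply an R x S x T core by U (I x R), V (J x S), W (K x T)."""
    return np.einsum("rst,ir,js,kt->ijk", core, U, V, W)

rng = np.random.default_rng(0)
I, J, K, R = 4, 3, 2, 2                  # illustrative sizes
U = rng.standard_normal((I, R))
V = rng.standard_normal((J, R))
W = rng.standard_normal((K, R))

# Superdiagonal core: ones at positions (r, r, r), zeros elsewhere.
core = np.zeros((R, R, R))
core[np.arange(R), np.arange(R), np.arange(R)] = 1.0

X_kruskal = kruskal_to_tensor(U, V, W)
X_tucker = tucker_to_tensor(core, U, V, W)
print(np.allclose(X_kruskal, X_tucker))
```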

P5-20 Tucker decomposition - intuition
An I x J x K tensor (author x keyword x conference) ≈ a core G (R x S x T) multiplied by A (I x R), B (J x S), C (K x T)
A: author x author-group
B: keyword x keyword-group
C: conf. x conf-group
G: how groups relate to each other
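One simple way to obtain Tucker factors is the truncated higher-order SVD (HOSVD): take the leading left singular vectors of each mode's unfolding, then project the tensor onto them to get the core. This is a sketch of that swapped-in technique, not the ALS procedure the slides mention; the helper names and the synthetic tensor are assumptions.

```python
import numpy as np

def unfold(X, mode):
    """Matricize X so that the chosen mode indexes the rows."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def hosvd(X, ranks):
    """Truncated HOSVD of a 3-mode tensor; returns core G and factors A, B, C."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(X, mode), full_matrices=False)
        factors.append(U[:, :r])         # orthonormal basis for this mode's groups
    A, B, C = factors
    G = np.einsum("ijk,ir,js,kt->rst", X, A, B, C)   # how the groups relate
    return G, A, B, C

# Synthetic author x keyword x conference tensor with exact group structure.
rng = np.random.default_rng(0)
true_core = rng.standard_normal((2, 3, 2))
X = np.einsum("rst,ir,js,kt->ijk", true_core,
              rng.standard_normal((5, 2)),
              rng.standard_normal((6, 3)),
              rng.standard_normal((4, 2)))

G, A, B, C = hosvd(X, ranks=(2, 3, 2))
X_hat = np.einsum("rst,ir,js,kt->ijk", G, A, B, C)
print(np.linalg.norm(X - X_hat))
```

On data without exact low-rank structure the HOSVD result is only an approximation and is often used as a starting point for ALS.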

P5-21 Intuition behind core tensor
2-d case: co-clustering
[Dhillon et al., Information-Theoretic Co-clustering, KDD’03]

P5-22 [Figure: an m x n matrix (e.g., terms x documents) approximated by the product of an m x k, a k x l, and an l x n matrix]

P5-23 [Figure: the terms x documents matrix factored into term x term-group, term-group x doc-group, and doc x doc-group matrices; term groups: med. terms, cs terms, common terms; doc groups: med. doc, cs doc]

P5-24 Tensor tools - summary
Two main tools
– PARAFAC
– Tucker
Both find row-, column-, tube-groups
– but in PARAFAC the three modes share the same number of groups, R, matched one-to-one (the r-th row-group goes with the r-th column-group and tube-group)
(To solve: Alternating Least Squares)
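Alternating least squares for PARAFAC fixes two factor matrices and solves an ordinary least-squares problem for the third, cycling over the modes. Below is a bare-bones sketch for dense 3-mode tensors; the function names, random initialization, fixed iteration count, and toy tensor are assumptions, and production code (for example the Tensor Toolbox) adds normalization, convergence checks, and sparse storage.

```python
import numpy as np

def khatri_rao(P, Q):
    """Column-wise Kronecker product; row p * Q.shape[0] + q holds P[p] * Q[q]."""
    return np.einsum("pr,qr->pqr", P, Q).reshape(-1, P.shape[1])

def unfold(X, mode):
    """Matricize X so that the chosen mode indexes the rows."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def reconstruct(factors):
    """Sum of R rank-1 terms a_r o b_r o c_r."""
    A, B, C = factors
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def parafac_als(X, rank, n_iter=100, seed=0):
    """Fit a PARAFAC model of the given rank to a 3-mode tensor by ALS."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, rank)) for dim in X.shape]
    for _ in range(n_iter):
        for mode in range(3):
            # Fix the other two factor matrices and solve a linear
            # least-squares problem for the remaining one.
            P, Q = [factors[m] for m in range(3) if m != mode]
            design = khatri_rao(P, Q)
            solution = np.linalg.lstsq(design, unfold(X, mode).T, rcond=None)[0]
            factors[mode] = solution.T
    return factors

# Toy data: an exactly rank-1 tensor of size 4 x 3 x 2.
a = np.arange(1.0, 5.0)
b = np.array([1.0, -1.0, 2.0])
c = np.array([0.5, 3.0])
X = np.einsum("i,j,k->ijk", a, b, c)

factors = parafac_als(X, rank=1, n_iter=5)
relative_error = np.linalg.norm(X - reconstruct(factors)) / np.linalg.norm(X)
print(relative_error)
```

Each update can only lower (or keep) the squared reconstruction error, but for rank above 1 the procedure may converge slowly or to a local minimum, so several random restarts are common.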

P5-25 Detailed outline
Motivation
Definitions: PARAFAC and Tucker
Case study: web mining

P5-26 Web graph mining
How to rank web pages by importance?
– Kleinberg’s algorithm HITS
– PageRank
– Tensor extension of HITS (TOPHITS)

P5-27 Kleinberg’s Hubs and Authorities (the HITS method)
Sparse adjacency matrix and its SVD
[Figure: the from x to adjacency matrix written as a sum of rank-1 terms; the first term pairs the hub scores and authority scores for the 1st topic, the second term those for the 2nd topic]
Kleinberg, JACM, 1999
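The link between HITS and the SVD can be illustrated on a tiny graph: the leading left singular vector of the from x to adjacency matrix gives hub scores, and the leading right singular vector gives authority scores. The 4-page graph below is a made-up example, and a dense SVD is used only for brevity; the real adjacency matrix is sparse.

```python
import numpy as np

# Hypothetical 4-page web graph: adjacency[i, j] = 1 if page i links to page j.
adjacency = np.array([
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
], dtype=float)

U, s, Vt = np.linalg.svd(adjacency)

# Singular vectors are defined up to sign, so take absolute values.
hubs = np.abs(U[:, 0])            # hub scores for the 1st topic
authorities = np.abs(Vt[0, :])    # authority scores for the 1st topic

print("hubs:", hubs)
print("authorities:", authorities)
```

Pages 0 and 3 point to the same two pages, so they receive equal hub scores; page 2 is linked from every page that has out-links and gets the top authority score.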

P5-28 HITS authorities on sample data
We started our crawl from [URL missing] and crawled 4700 pages, resulting in 560 cross-linked hosts.
[Figure: the from x to decomposition of P5-27, with hub and authority scores for the 1st and 2nd topics]

P5-29 Three-dimensional view of the web
Observe that this tensor is very sparse!
Kolda, Bader, Kenny, ICDM05

P5-32 Topical HITS (TOPHITS)
Main idea: extend the idea behind the HITS model to incorporate term (i.e., topical) information.
[Figure: the from x to x term tensor written as a sum of rank-1 terms; each term combines the hub scores, authority scores, and term scores for one topic (1st topic, 2nd topic)]

P5-34 TOPHITS terms & authorities on sample data
TOPHITS uses 3D analysis to find the dominant groupings of web pages and terms.
Tensor PARAFAC; $w_k$ = # unique links using term k
[Figure: the from x to x term tensor as a sum of rank-1 terms, with hub, authority, and term scores for the 1st and 2nd topics]
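The quantity $w_k$ can be computed in one pass over the crawl. The sketch below assumes the links arrive as (from, to, term) triples; the host names and terms are made up, and how $w_k$ then enters the tensor entries is not stated in this transcript, so it is left out.

```python
from collections import defaultdict

# Hypothetical (from_host, to_host, anchor_term) observations.
links = [
    ("a.example", "b.example", "graph"),
    ("a.example", "b.example", "graph"),   # same link seen twice, counted once
    ("a.example", "c.example", "graph"),
    ("b.example", "c.example", "tensor"),
    ("c.example", "a.example", "tensor"),
    ("c.example", "a.example", "mining"),
]

def unique_links_per_term(links):
    """Return w[k] = number of distinct (from, to) pairs that use term k."""
    pairs_by_term = defaultdict(set)
    for source, target, term in links:
        pairs_by_term[term].add((source, target))
    return {term: len(pairs) for term, pairs in pairs_by_term.items()}

w = unique_links_per_term(links)
print(w)
```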

P5-35 Conclusions
Real data are often in high dimensions with multiple aspects (modes)
Tensors provide elegant theory and algorithms
– PARAFAC and Tucker: discover groups

P5-36 References
T. G. Kolda, B. W. Bader and J. P. Kenny. Higher-Order Web Link Analysis Using Multilinear Algebra. In: ICDM 2005, Pages , November
Jimeng Sun, Spiros Papadimitriou, Philip Yu. Window-based Tensor Analysis on High-dimensional and Multi-aspect Streams. Proc. of the Int. Conf. on Data Mining (ICDM), Hong Kong, China, Dec 2006

P5-37 Resources
See tutorial on tensors, KDD’07 (w/ Tamara Kolda and Jimeng Sun):

P5-38 Tensor tools - resources
Toolbox from Tamara Kolda: csmr.ca.sandia.gov/~tgkolda/TensorToolbox
T. G. Kolda and B. W. Bader. Tensor Decompositions and Applications. SIAM Review, Volume 51, Number 3, September 2009. csmr.ca.sandia.gov/~tgkolda/pubs/bibtgkfiles/TensorReview-preprint.pdf
T. Kolda and J. Sun. Scalable Tensor Decomposition for Multi-Aspect Data Mining (ICDM 2008)