Peta-Graph Mining (CMU SCS; Yahoo/Hadoop, 2008). Christos Faloutsos; Aditya Prakash, Suyash Shringarpure, Charalampos Tsourakakis, Ana Appel, Polo Chau, Jure Leskovec, U Kang

Presentation transcript:

Peta-Graph Mining. CMU SCS; Yahoo/Hadoop, 2008. Christos Faloutsos; Aditya Prakash, Suyash Shringarpure, Charalampos Tsourakakis, Ana Appel, Polo Chau, Jure Leskovec, U Kang

Our goal: one-stop solution for mining huge graphs

Outline
Status by task (columns: Centralized, Hadoop):
  Degree Distribution: old
  Pagerank: old
  Diameter/ANF: old, X
  Communities: old, X
  Triangles: X, todo
  Visualization: X, todo
Datasets:
  (a) Synthetic (‘Kronecker’, ~300M nodes, 1B edges)
  (b) NetFlix (20K movies, ~500K users, 100M edges)

Degree Distributions - NetFlix, movie in-degree ([…] machines, 8 min). [Plot: count vs. movie in-degree; annotation: ‘Theoretically expected’]

Degree Distributions - NetFlix, user out-degree ([…] machines, 8 min). [Plot: count vs. user out-degree; annotations: ‘Theoretically expected’, ‘Sharp drop below 100 ratings’]

Degree Distributions - Kronecker. Nodes: 259M; edges: 1B; 100 machines, 6 h. [Plot: count vs. degree]

Degree Distributions - timings. [Plot: time (sec) vs. edge file size (MB), for 1, 24, and 48 tasks]
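The slides do not show how the degree-distribution job is written. As a rough illustration only, the sketch below imitates two chained map/reduce passes in plain Python: the first turns an edge list into per-node out-degrees, the second turns those into a degree histogram. The function names, the toy edge list, and the in-memory sort standing in for the Hadoop shuffle are all assumptions made for this example, not the authors' implementation.

```python
from itertools import groupby

def map_edges(edges):
    """Map phase (hypothetical): emit (source, 1) for every edge."""
    for src, _dst in edges:
        yield src, 1

def reduce_counts(pairs):
    """Stand-in for shuffle + reduce: sort by key, then sum values per key."""
    for key, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield key, sum(value for _key, value in group)

def out_degree_distribution(edges):
    # Pass 1: node -> out-degree.
    degrees = reduce_counts(map_edges(edges))
    # Pass 2: out-degree -> number of nodes having that out-degree.
    return dict(reduce_counts((deg, 1) for _node, deg in degrees))

# Toy user -> movie rating edges (made-up data).
edges = [("u1", "m1"), ("u1", "m2"), ("u2", "m1"), ("u3", "m1"), ("u3", "m3")]
distribution = out_degree_distribution(edges)
```

Swapping `src` for `_dst` in the map step would give the in-degree distribution instead.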

Diameter of a graph
  Diameter: maximum shortest path. Normally, > O(N**2)
  ANF: ‘Approximate Neighborhood Function’ [Palmer+02]: O(E)
  Goal: calculate the neighborhood function
  Neighborhood N(h): number of pairs of nodes within distance h
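To make the definition of N(h) concrete, here is a small exact computation for an undirected toy graph. It is not ANF itself: ANF replaces the exact per-node sets used here with compact probabilistic sketches so that each hop costs O(E), whereas this version keeps full sets and only suits small graphs. The sketch assumes N(h) counts ordered pairs and includes each node paired with itself; the function names and the example graph are hypothetical.

```python
def neighborhood_function(edges, max_hops):
    """Return [N(0), N(1), ..., N(max_hops)] for an undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # reach[u] holds every node within the current number of hops of u.
    reach = {u: {u} for u in adj}
    counts = [sum(len(s) for s in reach.values())]
    for _ in range(max_hops):
        # One more hop: merge u's set with the sets of its neighbours.
        reach = {u: reach[u].union(*(reach[v] for v in adj[u])) for u in adj}
        counts.append(sum(len(s) for s in reach.values()))
    return counts

def diameter_from_counts(counts):
    """Smallest h at which N(h) stops growing, or None if it never does."""
    for h in range(len(counts) - 1):
        if counts[h] == counts[h + 1]:
            return h
    return None

# Path graph a - b - c - d: the longest shortest path has 3 hops.
path = [("a", "b"), ("b", "c"), ("c", "d")]
counts = neighborhood_function(path, max_hops=4)
diameter = diameter_from_counts(counts)
```

The diameter is read off as the point where the hop plot flattens, which is how a value such as "Diameter: 3" would be obtained from N(h).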

Diameter. For large jobs, parallelization helps. Unstable results due to shared machines. [Plot: time (min) vs. edge file size (MB), for 1, 28, and 48 nodes]

Diameter / Hop Plot (Netflix). [Plot: # of reachable pairs within <= h hops vs. h, the # of hops] Diameter: 3

Community detection. Cross associations [Chakrabarti+ ’04]

Triangles: ‘friends of friends are friends’
  Naïve algorithm: 3-way join (slow)
  [Tsourakakis’08]: # triangles ~ sum of cubes of eigenvalues
  Thus, super-fast computation of # triangles (100x - 25,000x faster than naïve; >95% accuracy)

Triangles: easy to implement on Hadoop, since it only needs eigenvalues (to do, with Lanczos)
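The eigenvalue idea can be checked on a small graph. For a simple undirected graph with adjacency matrix A, the number of triangles equals trace(A^3) / 6, which is the sum of the cubed eigenvalues of A divided by 6. The sketch below uses NumPy's dense `eigvalsh` rather than Lanczos, so it only illustrates the identity and is not a scalable or Hadoop implementation. The `top_k` parameter is a hypothetical knob showing how an approximation might keep only the largest-magnitude eigenvalues.

```python
import numpy as np

def triangles_from_eigenvalues(adj, top_k=None):
    """Estimate the triangle count of a simple undirected graph.

    adj   : symmetric 0/1 adjacency matrix with a zero diagonal.
    top_k : if given, use only the top_k eigenvalues of largest magnitude
            (an approximation); if None, use all of them (exact up to
            floating-point error).
    """
    eigvals = np.linalg.eigvalsh(adj)  # eigenvalues of a symmetric matrix
    if top_k is not None:
        order = np.argsort(-np.abs(eigvals))
        eigvals = eigvals[order[:top_k]]
    return float(np.sum(eigvals ** 3) / 6.0)

# Complete graph on 4 nodes: every 3-subset of nodes forms a triangle.
k4 = np.ones((4, 4)) - np.eye(4)
exact = triangles_from_eigenvalues(k4)
approx = triangles_from_eigenvalues(k4, top_k=1)
```

On real graphs with skewed spectra, a handful of top eigenvalues tends to dominate the cubed sum, which is why an iterative eigensolver such as Lanczos is a natural fit.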

Visualization: principled visualization of large graphs (show the few most ‘important’ edges)

Summary
Goal: one-stop solution for mining huge graphs
Status by task (columns: Centralized, Hadoop):
  Degree Distribution: old
  Pagerank: old
  Diameter/ANF: old, X
  Communities: old, X
  Triangles: X, todo
  Visualization: X, todo