1 CSC 463 Fall 2010 Dr. Adam P. Anthony Class #27

2 Machine Learning III Chapter 18.9

3 Today’s class Support vector machines Clustering (unsupervised learning)

Support Vector Machines 4 These SVM slides were borrowed from Andrew Moore’s PowerPoint slides on SVMs. Andrew’s PowerPoint repository is here: Comments and corrections gratefully received.

5 Methods For Classification
Decision Trees – Model-based data structure; works best with discrete data. For a new instance, choose label C based on the rules laid out by the tree.
Probabilistic Classifiers – Also model-based; works with any type of data. For a new instance, choose the label C that maximizes P([f1…fn, C] | Data).
K-Nearest Neighbor – Instance-based. For a new instance, choose a label based on the majority vote of the k nearest points in Data.
Boundary-Based Classifiers (NEW!) – Model-based; only works with continuous data. Establish a numerical function that acts as a fence between positive and negative examples.

Copyright © 2001, 2003, Andrew W. Moore Linear Classifiers (figure: 2-D data, dots denote +1 / −1) Classifier: f(x,w,b) = sign(w·x + b). Given values for x1, x2: if the formula is > 0 the point is above the line; if it is < 0 the point is below the line. The line x2 = m·x1 + b can be written as w1·x1 − w2·x2 + b' = 0, where m = w1/w2 and b = b'/w2.
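The decision rule above is easy to make concrete. Below is a minimal sketch (not from Moore's slides) in Python/NumPy; the weight vector, bias, and test points are made-up values chosen so that the separating line is x2 = 2·x1 + 1.

```python
import numpy as np

def linear_classify(x, w, b):
    """The slide's f(x, w, b) = sign(w.x + b): +1 on the positive side
    of the hyperplane w.x + b = 0, and -1 on the negative side."""
    return 1 if np.dot(w, x) + b > 0 else -1

# Hypothetical 2-D example: w and b encode 2*x1 - x2 + 1 = 0, i.e. x2 = 2*x1 + 1.
w, b = np.array([2.0, -1.0]), 1.0
print(linear_classify(np.array([0.0, 0.0]), w, b))   # +1  (w.x + b =  1 > 0)
print(linear_classify(np.array([0.0, 5.0]), w, b))   # -1  (w.x + b = -4 < 0)
```

Note that any positive rescaling of (w, b) encodes the same line and yields the same classifications.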

Copyright © 2001, 2003, Andrew W. Moore Linear Classifiers f(x,w,b) = sign(w·x + b) How would you classify this data?

Copyright © 2001, 2003, Andrew W. Moore Linear Classifiers f(x,w,b) = sign(w·x − b) How would you classify this data?

Copyright © 2001, 2003, Andrew W. Moore Linear Classifiers f(x,w,b) = sign(w·x − b) How would you classify this data?

Copyright © 2001, 2003, Andrew W. Moore Linear Classifiers f(x,w,b) = sign(w·x − b) How would you classify this data?

Copyright © 2001, 2003, Andrew W. Moore Linear Classifiers f(x,w,b) = sign(w·x − b) Any of these would be fine... but which is best?

Copyright © 2001, 2003, Andrew W. Moore Classifier Margin f(x,w,b) = sign(w·x − b) Define the margin of a linear classifier as the width by which the boundary could be increased before hitting a datapoint.

Copyright © 2001, 2003, Andrew W. Moore Maximum Margin f(x,w,b) = sign(w·x − b) The maximum margin linear classifier is the linear classifier with the, um, maximum margin. This is the simplest kind of SVM (called an LSVM): the linear SVM.

Copyright © 2001, 2003, Andrew W. Moore Maximum Margin f(x,w,b) = sign(w·x − b) The maximum margin linear classifier is the linear classifier with the, um, maximum margin. This is the simplest kind of SVM (called an LSVM): the linear SVM. Support Vectors are those datapoints that the margin pushes up against.

Copyright © 2001, 2003, Andrew W. Moore Why Maximum Margin? f(x,w,b) = sign(w·x − b) The maximum margin linear classifier is the linear classifier with the, um, maximum margin. This is the simplest kind of SVM (called an LSVM). Support Vectors are those datapoints that the margin pushes up against.
1. Intuitively this feels safest.
2. If we’ve made a small error in the location of the boundary (it’s been jolted in its perpendicular direction), this gives us the least chance of causing a misclassification.
3. LOOCV is easy, since the model is immune to the removal of any non-support-vector datapoints.
4. There’s some theory (using VC dimension) that is related to (but not the same as) the proposition that this is a good thing.
5. Empirically it works very, very well.

Copyright © 2001, 2003, Andrew W. Moore Specifying a line and margin How do we represent this mathematically? …in m input dimensions? (figure: plus-plane, minus-plane, classifier boundary, and the “Predict Class = +1” / “Predict Class = −1” zones)

Copyright © 2001, 2003, Andrew W. Moore Specifying a line and margin Plus-plane = { x : w·x + b = +1 } Minus-plane = { x : w·x + b = −1 } Classify as +1 if w·x + b ≥ 1; classify as −1 if w·x + b ≤ −1; the universe explodes if −1 < w·x + b < 1. (figure: the parallel lines w·x + b = +1, 0, −1)

Copyright © 2001, 2003, Andrew W. Moore Computing the margin width Plus-plane = { x : w·x + b = +1 } Minus-plane = { x : w·x + b = −1 } M = margin width. How do we compute M in terms of w and b? Claim: the vector w is perpendicular to the plus-plane. Why?

Copyright © 2001, 2003, Andrew W. Moore Computing the margin width Plus-plane = { x : w·x + b = +1 } Minus-plane = { x : w·x + b = −1 } Claim: the vector w is perpendicular to the plus-plane. Why? Let u and v be two vectors on the plus-plane. What is w·(u − v)? And so of course the vector w is also perpendicular to the minus-plane.

Copyright © 2001, 2003, Andrew W. Moore Computing the margin width Plus-plane = { x : w·x + b = +1 } Minus-plane = { x : w·x + b = −1 } The vector w is perpendicular to the plus-plane. Let x⁻ be any point on the minus-plane, and let x⁺ be the closest plus-plane point to x⁻ (any location in R^m, not necessarily a datapoint).

Copyright © 2001, 2003, Andrew W. Moore Computing the margin width Plus-plane = { x : w·x + b = +1 } Minus-plane = { x : w·x + b = −1 } The vector w is perpendicular to the plus-plane. Let x⁻ be any point on the minus-plane, and let x⁺ be the closest plus-plane point to x⁻. Claim: x⁺ = x⁻ + λw for some value of λ. Why?

Copyright © 2001, 2003, Andrew W. Moore Computing the margin width Plus-plane = { x : w·x + b = +1 } Minus-plane = { x : w·x + b = −1 } The vector w is perpendicular to the plus-plane. Let x⁻ be any point on the minus-plane, and let x⁺ be the closest plus-plane point to x⁻. Claim: x⁺ = x⁻ + λw for some value of λ. Why? The line from x⁻ to x⁺ is perpendicular to the planes, so to get from x⁻ to x⁺ you travel some distance λ in direction w.

Copyright © 2001, 2003, Andrew W. Moore Computing the margin width What we know: w·x⁺ + b = +1, w·x⁻ + b = −1, x⁺ = x⁻ + λw, |x⁺ − x⁻| = M. It’s now easy to get M in terms of w and b.

Copyright © 2001, 2003, Andrew W. Moore Computing the margin width What we know: w·x⁺ + b = +1, w·x⁻ + b = −1, x⁺ = x⁻ + λw, |x⁺ − x⁻| = M. Then w·(x⁻ + λw) + b = 1 ⇒ w·x⁻ + b + λ w·w = 1 ⇒ −1 + λ w·w = 1 ⇒ λ = 2 / (w·w).

Copyright © 2001, 2003, Andrew W. Moore Computing the margin width What we know: w·x⁺ + b = +1, w·x⁻ + b = −1, x⁺ = x⁻ + λw, λ = 2 / (w·w). So M = |x⁺ − x⁻| = |λw| = λ √(w·w) = 2 / √(w·w).
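The result M = 2/√(w·w) is easy to sanity-check numerically. The snippet below is an illustrative example (not part of the original deck) using an arbitrary w = (3, 4) and b = 0: it picks a point on the minus-plane, steps λ = 2/(w·w) along w to reach the plus-plane, and measures the distance.

```python
import numpy as np

w = np.array([3.0, 4.0])
b = 0.0

# Any x_minus on the minus-plane w.x + b = -1 works; this one was chosen by hand.
x_minus = np.array([-3.0 / 25.0, -4.0 / 25.0])   # w.x_minus + b = -1
lam = 2.0 / np.dot(w, w)                         # lambda from the slide's derivation
x_plus = x_minus + lam * w                       # lands on the plus-plane w.x + b = +1

margin = np.linalg.norm(x_plus - x_minus)
print(np.dot(w, x_minus) + b)                    # -1.0
print(np.dot(w, x_plus) + b)                     # +1.0
print(margin, 2.0 / np.sqrt(np.dot(w, w)))       # both 0.4
```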

Copyright © 2001, 2003, Andrew W. Moore Learning the Maximum Margin Classifier M = margin width = 2 / √(w·w). Given a guess of w and b we can: compute whether all data points are in the correct half-planes, and compute the width of the margin. So now we just need to write a program to search the space of w’s and b’s to find the widest margin that matches all the datapoints. How? Gradient descent? Simulated annealing? Matrix inversion? EM? Newton’s method?

27 Learning SVMs
Trick #1: Just find the points that would be closest to the optimal separating plane (the “support vectors”) and work directly from those instances.
Trick #2: Represent this as a quadratic optimization problem, and use quadratic programming techniques.
Trick #3 (the “kernel trick”): Instead of just using the features, represent the data in a high-dimensional feature space constructed from a set of basis functions (polynomial and Gaussian combinations of the base features are the most common). Then find a separating plane / SVM in that high-dimensional space. Voila: a nonlinear classifier!

Copyright © 2001, 2003, Andrew W. Moore Common SVM basis functions z_k = (polynomial terms of x_k of degree 1 to q) z_k = (radial basis functions of x_k) z_k = (sigmoid functions of x_k)
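As a rough illustration of these three basis-function families (a sketch, not the deck's own notation), the functions below map an input vector x to polynomial, radial-basis, and sigmoid features; the centres, gamma, and the weight matrix W and offsets c are hypothetical parameters a user would supply.

```python
import numpy as np
from itertools import combinations_with_replacement

def polynomial_features(x, q):
    """All monomials of the entries of x from degree 1 up to degree q."""
    feats = []
    for d in range(1, q + 1):
        for idx in combinations_with_replacement(range(len(x)), d):
            feats.append(np.prod(x[list(idx)]))
    return np.array(feats)

def rbf_features(x, centers, gamma=1.0):
    """One radial basis function per centre: exp(-gamma * ||x - c||^2)."""
    return np.exp(-gamma * np.sum((centers - x) ** 2, axis=1))

def sigmoid_features(x, W, c):
    """Sigmoid units tanh(W.x + c), one per row of W."""
    return np.tanh(W @ x + c)

x = np.array([1.0, 2.0])
print(polynomial_features(x, 2))   # [x1, x2, x1^2, x1*x2, x2^2] -> [1. 2. 1. 2. 4.]
print(rbf_features(x, np.array([[0.0, 0.0], [1.0, 2.0]]), gamma=0.5))
```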

Copyright © 2001, 2003, Andrew W. Moore SVM Performance Anecdotally they work very, very well indeed. Example: they are currently the best-known classifier on a well-studied handwritten-character recognition benchmark. Another example: Andrew knows several reliable people doing practical real-world work who claim that SVMs have saved them when their other favorite classifiers did poorly. There is a lot of excitement and religious fervor about SVMs as of 2001.

Unsupervised Learning: Clustering 30

31 Unsupervised Learning
Learn without a “supervisor” who labels instances: clustering, scientific discovery, pattern discovery, associative learning.
Clustering: given a set of instances without labels, partition them such that each instance is similar to the other instances in its partition (intra-cluster similarity) and dissimilar from instances in other partitions (inter-cluster dissimilarity).

Clustering Techniques Partitional clustering –k-means clustering Agglomerative clustering –Single-link clustering –Complete-link clustering –Average-link clustering Spectral clustering 32

33 Formal Data Clustering
Data clustering is: dividing a set of data objects into groups such that there is a clear pattern (e.g., similarity to each other) for why objects are in the same cluster.
A clustering algorithm requires: a data set D, a clustering description C, a clustering objective Obj(C), and an optimization method Opt(D) → C.
Obj measures the goodness of the best clustering C that Opt(D) can find.

What does D look like? Training Set 6.3,2.5,5.0,1.9,Iris-virginica 6.5,3.0,5.2,2.0,Iris-virginica 6.2,3.4,5.4,2.3,Iris-virginica 5.9,3.0,5.1,1.8,Iris-virginica 5.7,3.0,4.2,1.2,Iris-versicolor 5.7,2.9,4.2,1.3,Iris-versicolor 6.2,2.9,4.3,1.3,Iris-versicolor 5.1,2.5,3.0,1.1,Iris-versicolor 5.1,3.4,1.5,0.2,Iris-setosa 5.0,3.5,1.3,0.3,Iris-setosa 4.5,2.3,1.3,0.3,Iris-setosa 4.4,3.2,1.3,0.2,Iris-setosa Test Set 5.1,3.5,1.4,0.2,?? 4.9,3.0,1.4,0.2,?? 4.7,3.2,1.3,0.2,?? 4.6,3.1,1.5,0.2,?? 5.0,3.6,1.4,0.2,?? 5.4,3.9,1.7,0.4,?? 4.6,3.4,1.4,0.3,?? 5.0,3.4,1.5,0.2,?? 4.4,2.9,1.4,0.2,?? 4.9,3.1,1.5,0.1,?? 5.4,3.7,1.5,0.2,?? 4.8,3.4,1.6,0.2,?? 34 Supervised learning (KNN, C4.5, SVM, etc.)

What does D look like? Training Set 6.3,2.5,5.0,1.9,?? 6.5,3.0,5.2,2.0,?? 6.2,3.4,5.4,2.3,?? 5.9,3.0,5.1,1.8,?? 5.7,3.0,4.2,1.2,?? 5.7,2.9,4.2,1.3,?? 6.2,2.9,4.3,1.3,?? 5.1,2.5,3.0,1.1,?? 5.1,3.4,1.5,0.2,?? 5.0,3.5,1.3,0.3,?? 4.5,2.3,1.3,0.3,?? 4.4,3.2,1.3,0.2,?? Test Set 5.1,3.5,1.4,0.2,?? 4.9,3.0,1.4,0.2,?? 4.7,3.2,1.3,0.2,?? 4.6,3.1,1.5,0.2,?? 5.0,3.6,1.4,0.2,?? 5.4,3.9,1.7,0.4,?? 4.6,3.4,1.4,0.3,?? 5.0,3.4,1.5,0.2,?? 4.4,2.9,1.4,0.2,?? 4.9,3.1,1.5,0.1,?? 5.4,3.7,1.5,0.2,?? 4.8,3.4,1.6,0.2,?? 35 Un-supervised learning (Clustering!)

What does C look like? After clustering, the output looks like a ‘labeled’ data set for a supervised learning algorithm: –6.3,2.5,5.0,1.9,1 6.5,3.0,5.2,2.0,1 6.2,3.4,5.4,2.3,1 5.9,3.0,5.1,1.8,1 5.7,3.0,4.2,1.2,2 5.7,2.9,4.2,1.3,2 6.2,2.9,4.3,1.3,2 5.1,2.5,3.0,1.1,2 5.1,3.4,1.5,0.2,3 5.0,3.5,1.3,0.3,3 4.5,2.3,1.3,0.3,3 4.4,3.2,1.3,0.2,3 (the final column is the clustering vector)

Big Questions About Clustering How do we even begin clustering? How do we know we’ve found anything? How do we know if what we found is even useful? –How to evaluate the results? What do we apply this to? –What’s the truth, versus the hope, of reality? 37

38 K-Means Clustering
D = numeric d-dimensional data
C = a partitioning of the data points into k clusters
Obj(C) = Root Mean Squared Error (RMSE) – the average distance between each object and its cluster’s mean value
Optimization method:
1. Select k random objects as the initial means
2. While the current clustering is different from the previous one:
   a. Move each object to the cluster with the closest mean
   b. Re-compute the cluster means
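A minimal NumPy sketch of this procedure, with RMSE as Obj(C), might look like the following; the two-blob demo data, the choice k = 2, and the five restarts are made up for illustration, and an emptied cluster simply keeps its previous mean.

```python
import numpy as np

def kmeans(D, k, rng):
    """k-means as on the slide: pick k random objects as the initial means, then
    alternate assignment / mean re-computation until the clustering stops changing."""
    means = D[rng.choice(len(D), size=k, replace=False)]
    labels = np.full(len(D), -1)
    while True:
        # 1. Move each object to the cluster with the closest mean.
        dists = np.linalg.norm(D[:, None, :] - means[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):          # convergence test
            break
        labels = new_labels
        # 2. Re-compute the cluster means (keep the old mean if a cluster empties).
        means = np.array([D[labels == j].mean(axis=0) if np.any(labels == j)
                          else means[j] for j in range(k)])
    # Obj(C): root mean squared distance of each object to its cluster mean.
    rmse = np.sqrt(np.mean(dists.min(axis=1) ** 2))
    return labels, means, rmse

# Different random initialisations give different results, so in practice
# run the algorithm several times and keep the clustering with the lowest RMSE.
rng = np.random.default_rng(0)
D = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(3.0, 0.3, (20, 2))])  # toy data
best_labels, best_means, best_rmse = min(
    (kmeans(D, k=2, rng=rng) for _ in range(5)), key=lambda run: run[2])
print(best_rmse)
```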

39 K-Means Demo

K-Means Comments K-means has some randomness in its initialization, which means: two different executions on the same data with the same number of clusters will likely produce different results, and two different executions may have very different run-times due to the convergence test. In practice, run it multiple times and keep the result with the best RMSE. 40

41 ___-Link Clustering
1. Initialize each object in its own cluster
2. Compute the cluster distance matrix M using the selected criterion (below)
3. While there are more than k clusters:
   a. Join the two clusters with the shortest distance
   b. Update M using the selected criterion
Criterion for ___-link clustering:
– Single-link: use the distance between the closest objects in the two clusters
– Complete-link: use the distance between the most distant objects in the two clusters
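An illustrative (and deliberately naive, roughly O(n³)) sketch of this loop under Euclidean distance; the four-point data set and k = 2 are hypothetical.

```python
import numpy as np

def linkage_cluster(D, k, criterion="single"):
    """Agglomerative ___-link clustering as outlined on the slide: start with one
    cluster per object and repeatedly join the two closest clusters until k remain."""
    clusters = [[i] for i in range(len(D))]
    pd = np.linalg.norm(D[:, None, :] - D[None, :, :], axis=2)   # pairwise distances

    def cluster_dist(a, b):
        dists = [pd[i, j] for i in a for j in b]
        # single-link: closest pair of objects; complete-link: most distant pair.
        return min(dists) if criterion == "single" else max(dists)

    while len(clusters) > k:
        # Find the pair of clusters with the shortest distance and join them.
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: cluster_dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

D = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
print(linkage_cluster(D, k=2, criterion="complete"))   # [[0, 1], [2, 3]]
```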

42 ___-Link Demo How can we measure the distance between these clusters? What is best for: spherical data (above)? Chain-like data? (figure: single-link distance vs. complete-link distance between two clusters)

___-Link Comments The ___-link algorithms are not random in any way, which means you’ll get the same results whenever you use the same data and the same number of clusters. Choosing between these algorithms and k-means (or any other clustering algorithm) requires a lot of research and careful analysis. 43

44 My Research: Relational Data Clustering

45 Relational Data Clustering is: the task of organizing objects into logical groups, or clusters, taking into account the relational links between objects.

46 Relational Data
Formally: a set of object domains, sets of instances from those domains, and sets of relational tuples, or links, between instances.
In practice: “relational data” refers only to data that necessitates the use of links; information not encoded using a relation is referred to as an attribute.
Spaces: attribute space = ignore relations; relation space = ignore attributes.
(example: a People table – Sally/F, Fred/M, Joe/M – plus a Friends relation linking them)

What does D Look Like Now? Nodes + edges (pointers!!!), or an adjacency matrix (figure: conceptual representation vs. implementation representation). Aggregation methods – AverageAgeOfNeighbors, DominantGenderOfNeighbors, AvgSalaryOfNeighbors – lead to a non-relational space that can be clustered using the methods previously discussed. 47

48 Block Models
A block model is a partitioning of the links in a relation: reorder the rows and columns of an adjacency matrix by cluster label, and place boundaries between clusters.
Block b_ij: the set of edges from cluster i to cluster j (also referred to as a block position for a single link).
If some blocks are dense and the rest are sparse, we can generate a summary graph.
Block modeling is useful for both visualization and numerical analysis.
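A small sketch of building a block model from an adjacency matrix, assuming an undirected 0/1 matrix and integer cluster labels; the toy two-community graph below is made up.

```python
import numpy as np

def block_model(A, labels, k):
    """Reorder the adjacency matrix by cluster label and count the links
    in each block b_ij (edges from cluster i to cluster j)."""
    order = np.argsort(labels, kind="stable")
    A_sorted = A[np.ix_(order, order)]            # rows/columns grouped by cluster
    counts = np.zeros((k, k), dtype=int)
    for i in range(k):
        for j in range(k):
            counts[i, j] = A[np.ix_(labels == i, labels == j)].sum()
    return A_sorted, counts

# Toy graph: two communities {0,1,2} and {3,4} joined by one bridge edge 2-3.
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]])
labels = np.array([0, 0, 0, 1, 1])
_, counts = block_model(A, labels, k=2)
print(counts)   # dense diagonal blocks, sparse off-diagonal blocks
```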

49 Two Relational Clustering Algorithms
Community Detection: maximizes connectivity within clusters and minimizes connectivity between clusters. Intuitive concept that links identify classes. Equivalent to maximizing density only on the diagonal blocks. Faster than more general relational clustering approaches.
Stochastic Block Modeling: maximizes the likelihood that two objects in the same cluster have the same linkage pattern (linkage may be within, or between, clusters). Subsumes community detection. Equivalent to maximizing density in any block, rather than just the diagonal. Generalizes relational clustering.

50 My Work: Block Modularity
A general block-model-based clustering approach that models relations only.
Motivated by the poor scalability of stochastic block modeling: it would be useful to have a block modeling approach that scales as well as community detection algorithms.
Contributions: a clearly defined measure of general relational structure (block modularity), and an iterative clustering algorithm that is much faster than prior work.

51 Relational Structure What is “structure”? At a high level: non-randomness. Relational structure: a non-random connectivity pattern. A relation is structured if its observed connectivity pattern is clearly distinguished from that of a random relation.

52 Approach Overview
Assume that there exists a “model” random relation: any clustering of this relation will have a similar block model.
In contrast, for any non-random relation there should exist at least one clustering (a structure-identifying clustering) that distinguishes the relation from the random block model.
Structure-based clustering requires:
1. a means of comparing relational structures
2. a definition of a “model” random relation
3. a method for finding the most structure-identifying clustering
(figure: a random clustering vs. a structure-identifying clustering of the same relation)

53 Comparing Structure: Block Modularity
Given an input relation, a model random relation*, and a structure-identifying clustering, we compute block modularity:
1. Find the block model for each relation
2. Compute the absolute difference of the number of links in each block
3. Compute the sum of all the cells in the difference matrix
(Optional) Normalize the value by twice the number of links
*Required: the model random relation must have the same number of links as the input relation
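A hedged sketch of this computation. The slides only require the model random relation to have the same number of links as the input, so the usage example below stands in a uniformly random 0/1 matrix with an equal number of entries as one hypothetical choice.

```python
import numpy as np

def block_counts(A, labels, k):
    """Block model of a relation: number of links in each block b_ij
    (links from cluster i to cluster j) under the clustering `labels`."""
    C = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            C[i, j] = A[np.ix_(labels == i, labels == j)].sum()
    return C

def block_modularity(A, A_random, labels, k, normalize=True):
    """Sum of absolute differences between the block counts of the input relation
    and those of a model random relation with the same number of links;
    optionally normalized by twice the link count."""
    diff = np.abs(block_counts(A, labels, k) - block_counts(A_random, labels, k))
    value = diff.sum()
    return value / (2.0 * A.sum()) if normalize else value

# Toy input relation: two communities {0,1,2} and {3,4} joined by one bridge.
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]])
labels = np.array([0, 0, 0, 1, 1])

# Hypothetical model random relation: same size and same number of entries,
# placed uniformly at random (the slides only fix the link count).
rng = np.random.default_rng(1)
R = np.zeros_like(A)
R.flat[rng.choice(A.size, size=int(A.sum()), replace=False)] = 1

print(block_modularity(A, R, labels, k=2))   # larger values = more structure identified
```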

54 Finding a Structure-Identifying Clustering ( Or, Clustering With Block Modularity ) Referred to as BMOD for brevity

55 Experimental Evaluation (Work in Progress)
Past evaluation: comparing against small, manageable data sets to evaluate the increase in speed.
New ideas:
– A non-block-modeling algorithm is a currently popular approach: is BMOD faster than it? If not, how much slower?
– Scaling up: we have demonstrated speed on “small” data sets (~3000 nodes, 4000 edges). How would we do on, say, Facebook? 500 M nodes and, at an average of 100 friends per node, 5 B edges.
– Challenges: we can’t download Facebook or any comparable data source. How do we generate a ‘realistic’ artificial data set with features similar to Facebook’s? Anyone want to help???

56 Block Modularity Clustering Results

Methodology 57 Goals: assess the speed and accuracy of block modularity vs. a leading stochastic method, the Degree-Corrected Stochastic Block Model (DCBM) (Karrer & Newman, 2011). Accuracy: Normalized Mutual Information. Data: generated using the DCBM (next slide).
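For reference, Normalized Mutual Information can be computed directly from the contingency table of the two clusterings; the sketch below normalizes by the average of the two cluster entropies (one common convention, and an assumption here, since the slides do not specify which normalization was used).

```python
import numpy as np

def nmi(labels_true, labels_pred):
    """Normalized Mutual Information between two clusterings,
    normalized by the average of the two cluster entropies."""
    a, b = np.asarray(labels_true), np.asarray(labels_pred)
    n = len(a)
    ua, ub = np.unique(a), np.unique(b)
    # Contingency table: joint counts of (true cluster, predicted cluster).
    cont = np.array([[np.sum((a == i) & (b == j)) for j in ub] for i in ua])
    pij = cont / n
    pi, pj = pij.sum(axis=1), pij.sum(axis=0)

    nz = pij > 0
    mi = np.sum(pij[nz] * np.log(pij[nz] / np.outer(pi, pj)[nz]))

    def h(p):
        p = p[p > 0]
        return -np.sum(p * np.log(p))

    return mi / ((h(pi) + h(pj)) / 2)

print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))   # 1.0 -- identical up to relabelling
print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))   # 0.0 -- independent clusterings
```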

Data Generation 58 Given a degree distribution and parameters for the DCBM, produce a block-model configuration matrix, then mix the perfect model with a random graph model.

Results 59

Stress Test: Mock Facebook 60 Sampled a degree distribution from a subset of 100K Facebook users with 8M edges (Gjoka et al., 2010). Planted an artificial cluster structure: repeated bridges for 1000 total clusters.

Future Work – 1000s of clusters: getting nowhere fast? – Post-analysis and applications – Information propagation – Map/Reduce implementation

62 Conclusion
Block modularity is fast and effective when compared to stochastic block modeling.
It is iterative and requires only some basic counting mechanisms: much simpler and less error-prone than implementing a stochastic algorithm, and its fewer mathematical prerequisites make the algorithm accessible to more programmers.
It is a measure of structure, not just an identifier, and its value can be used for other applications.