Jianping Fan Dept of Computer Science UNC-Charlotte http://webpages.uncc.edu/jfan/

Key issues for Clustering
- Similarity or distance function (inter-cluster and intra-cluster similarity or distance)
- Number of clusters
- Decision rule for assigning data to clusters
- Objective function: inter-cluster distances are maximized, intra-cluster distances are minimized

Summary of K-means

Problems of K-means
- Locations of the centers
- Choice of the number of clusters K
- Sensitivity to outliers
- Data manifolds

Experiences
- Centers: initialize randomly or by a density scan
- K: start from a small K and split clusters, or start from a large K and merge clusters
- Outliers:

Problems of K-means
- Distance function: a fixed geometric distance (e.g. Euclidean) is assumed
- Assignment step: each point x_i is assigned to its nearest center, c_i = argmin_k ||x_i - mu_k||^2
- Optimization step: each center mu_k is re-estimated as the mean of its assigned points, so that intra-cluster distances are minimized and inter-cluster distances are maximized
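A minimal sketch of one K-means iteration (the assignment and optimization steps above), assuming NumPy; the function name kmeans_step and its interface are illustrative, not from the slides.

```python
import numpy as np

def kmeans_step(X, centers):
    """One K-means iteration: assignment step, then center-update step.

    X: (n, d) data matrix; centers: (K, d) current cluster centers.
    """
    # Assignment step: each point goes to its nearest center (Euclidean distance).
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)  # (n, K)
    labels = dists.argmin(axis=1)

    # Optimization (update) step: each center becomes the mean of its assigned
    # points, which minimizes the intra-cluster squared distances.
    new_centers = centers.copy()
    for k in range(centers.shape[0]):
        members = X[labels == k]
        if len(members) > 0:          # keep the old center if a cluster goes empty
            new_centers[k] = members.mean(axis=0)
    return labels, new_centers
```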

Problems of K-means & Spectral Clustering
- One-way decision: the centers and the distance function alone decide the clustering
- Data points and clusters (centers) should be on an equal footing!
- A two-way decision is expected!

Outliers
[Figure: two clusters with outlier points; inter-cluster distances are maximized, intra-cluster distances are minimized]

Outliers
How can we identify outliers?
[Figure: the same clusters with the outlier points highlighted]

AP Clustering: a two-way decision is used!

Affinity Propagation
A clustering algorithm that finds a set of exemplars (prototypes) in the data and assigns the remaining data points to those exemplars [Frey07].
- Input: pair-wise similarities (e.g. negative squared error) and per-point preferences (larger = more likely to be chosen as an exemplar)
- Objective: approximate maximization of the sum of similarities of points to their exemplars
- Mechanism: message passing in a factor graph
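A hedged sketch of constructing the AP input described above: pair-wise similarities as negative squared Euclidean distances, with the preferences placed on the diagonal (the median of the similarities is a common default when no preference is given). The helper name build_similarity is an assumption for illustration.

```python
import numpy as np

def build_similarity(X, preference=None):
    """AP input: s(i,k) = -||x_i - x_k||^2, with preferences on the diagonal."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    S = -sq_dists
    if preference is None:
        # A common default: the median of the off-diagonal similarities.
        preference = np.median(S[~np.eye(len(X), dtype=bool)])
    np.fill_diagonal(S, preference)
    return S
```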

Responsibility
r(i,k) is initialized as s(i,k).
Sending responsibility r(i,k) from data point i to candidate exemplar k: how well-suited data point k is to serve as the exemplar for data point i.
Update rule: r(i,k) <- s(i,k) - max_{k' != k} { a(i,k') + s(i,k') }
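A possible NumPy implementation of this responsibility update, with the damping commonly used by AP implementations; the function name and the damping parameter are illustrative.

```python
import numpy as np

def update_responsibilities(S, A, R, damping=0.5):
    """r(i,k) <- s(i,k) - max_{k' != k} [a(i,k') + s(i,k')], with damping."""
    n = len(S)
    AS = A + S
    # For each row i, find the largest and second-largest value of a(i,k') + s(i,k').
    idx_max = AS.argmax(axis=1)
    row_max = AS[np.arange(n), idx_max]
    AS_tmp = AS.copy()
    AS_tmp[np.arange(n), idx_max] = -np.inf
    row_second = AS_tmp.max(axis=1)
    # Subtract the row maximum everywhere, except at the maximizing column,
    # where the second-largest value must be used instead (since k' != k).
    R_new = S - row_max[:, None]
    R_new[np.arange(n), idx_max] = S[np.arange(n), idx_max] - row_second
    return damping * R + (1 - damping) * R_new
```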

Availability
a(i,k) is initialized as 0.
Sending availability a(i,k) from candidate exemplar k to data point i: how appropriate it would be for data point i to choose data point k as its exemplar.
Update rules: a(i,k) <- min{ 0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)) } for i != k, and a(k,k) <- sum_{i' != k} max(0, r(i',k))
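A matching sketch of the availability update, again with damping; the helper name update_availabilities is illustrative.

```python
import numpy as np

def update_availabilities(R, A, damping=0.5):
    """a(i,k) <- min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k))) for i != k;
    a(k,k) <- sum_{i' != k} max(0, r(i',k)); both with damping."""
    Rp = np.maximum(R, 0)
    np.fill_diagonal(Rp, R.diagonal())              # keep r(k,k) itself (may be negative)
    col_sums = Rp.sum(axis=0)                       # r(k,k) + sum of positive r(i',k)
    A_new = np.minimum(0, col_sums[None, :] - Rp)   # drop the i-th term for each entry
    np.fill_diagonal(A_new, col_sums - R.diagonal())  # self-availability a(k,k)
    return damping * A + (1 - damping) * A_new
```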

In the first iteration, a(i,k) = 0, so the responsibility update reduces to r(i,k) = s(i,k) - max_{k' != k} s(i,k').

Preferences and the number of clusters
- s(i,k) is the similarity between data points i and k
- P(s(i,k)) is an exemplar-dependent probability model
- Data points with larger preference values s(i,i) are more likely to be chosen as exemplars
- The number of clusters is determined by (a) the values of the input preferences and (b) the message-passing (competition) procedure

Summary
1. The competitive procedure for updating responsibilities and availabilities is data-driven and does not take into account how many other points favor each candidate exemplar.
2. At any point during affinity propagation, availabilities and responsibilities can be combined to identify exemplars. For data point i, the value of k that maximizes a(i,k) + r(i,k) either identifies data point i as an exemplar (if k = i) or identifies the data point that is the exemplar for data point i.
3. Each iteration of the AP procedure consists of (a) updating all responsibilities given the availabilities; (b) updating all availabilities given the responsibilities; (c) combining availabilities and responsibilities to monitor the exemplar decisions, and terminating the algorithm when these decisions do not change for 10 iterations.
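Putting the pieces together, a sketch of the full AP loop that alternates the two updates, monitors the exemplar decisions a(i,k) + r(i,k), and stops when they are unchanged for a number of iterations. It reuses the update_responsibilities and update_availabilities helpers sketched above; all names and defaults are illustrative.

```python
import numpy as np

def affinity_propagation(S, max_iter=200, convergence_iter=10, damping=0.5):
    """Alternate responsibility/availability updates and stop once the
    exemplar decisions argmax_k a(i,k) + r(i,k) stay stable."""
    n = S.shape[0]
    R = np.zeros((n, n))
    A = np.zeros((n, n))
    last_exemplars, stable = None, 0
    for _ in range(max_iter):
        R = update_responsibilities(S, A, R, damping)   # (a) responsibilities
        A = update_availabilities(R, A, damping)        # (b) availabilities
        exemplars = (A + R).argmax(axis=1)              # (c) current exemplar decisions
        if last_exemplars is not None and np.array_equal(exemplars, last_exemplars):
            stable += 1
            if stable >= convergence_iter:              # unchanged long enough: stop
                break
        else:
            stable = 0
        last_exemplars = exemplars
    return (A + R).argmax(axis=1)   # labels[i] == i means point i is an exemplar
```

For example, labels = affinity_propagation(build_similarity(X)) would cluster a data matrix X using the sketches above.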

Semi-supervised Learning
- Large amounts of unlabeled training data
- Limited amounts of side information: partial labels, equivalence constraints
[Figure: half-moon data]

Some Motivating examples

AP with partial labels
- All points sharing the same label should be in the same cluster.
- Points with different labels should not be in the same cluster.
Imposing constraints:
- via the similarity matrix
- via explicit function nodes

Same-label constraints
- Set the similarity among all similarly labeled data to be maximal.
- Propagate to other points ("teleportation"), e.g. [Klein02].
- Without teleportation, local neighborhoods do not 'move closer'.
[Figure: similarly labeled points x1, x2 and their neighbors y1, y2, with S(x1,x2) = 0]

Different-label constraints
- We can do a similar trick and set the similarity between all pairs of differently labeled data to be minimal.
- But there is no equivalent notion of "anti-teleportation".
[Figure: differently labeled points x1 and x2]
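A hedged sketch of imposing the partial-label constraints through the similarity matrix: same-label pairs get the maximal similarity (0, since similarities here are negative squared distances) and differently labeled pairs get a very large negative value standing in for -inf. The helper name, the -1 convention for unlabeled points, and the specific constants are assumptions for illustration.

```python
import numpy as np

def impose_label_constraints(S, labels):
    """Return a copy of the AP similarity matrix with label constraints imposed.

    S: (n, n) similarity matrix (negative squared distances, so <= 0).
    labels: length-n array; -1 marks an unlabeled point, otherwise a class id.
    """
    S = S.copy()
    n = len(labels)
    for i in range(n):
        for j in range(n):
            if i == j or labels[i] == -1 or labels[j] == -1:
                continue
            if labels[i] == labels[j]:
                S[i, j] = 0.0        # same label: maximal similarity
            else:
                S[i, j] = -1e12      # different labels: effectively -inf
    return S
```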

Adding explicit constraints to account for side-information


Problems
- Let's call all the labeled points "portals"; they induce the ability to teleport.
- At test time, if we want to determine a label for a new point, we need to evaluate its closest exemplar, possibly via all pairs of portals, which is expensive.
- Pair-wise not-in-class function nodes for each pair of differently labeled points are also expensive.
Introducing…

Meta-Portals
- An alternative way of propagating neighborhood information.
- Meta-portals are 'dummy' points, constructed using the similarities of all portals of a certain label.
- We add N new entries to the similarity matrix, where N is the number of unique labels.

Meta-portals
- MTPs can be exemplars.
- Unlike regular exemplars, MTPs can serve as exemplars for other points while choosing a different exemplar for themselves.
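The slides do not spell out exactly how the meta-portal similarities are computed, so the following is only an assumed construction for illustration: one dummy row and column is appended per unique label, and a meta-portal's similarity to a point is taken as that point's best similarity to any portal carrying that label. The helper name and the preference handling are likewise assumptions.

```python
import numpy as np

def add_meta_portals(S, labels, preference):
    """Append one meta-portal (dummy point) per unique label to the similarity matrix.

    labels: length-n array, -1 for unlabeled points; preference: self-similarity
    assigned to each meta-portal (an assumed choice).
    """
    labels = np.asarray(labels)
    classes = sorted(set(labels.tolist()) - {-1})
    n, m = S.shape[0], len(classes)
    S_aug = np.full((n + m, n + m), -1e12)   # default: effectively -inf
    S_aug[:n, :n] = S
    for idx, c in enumerate(classes):
        portals = np.where(labels == c)[0]
        sim = S[:, portals].max(axis=1)      # best similarity to a portal of class c
        S_aug[:n, n + idx] = sim             # points may choose the meta-portal
        S_aug[n + idx, :n] = sim             # the meta-portal may choose a data point
        S_aug[n + idx, n + idx] = preference
    return S_aug
```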

These function nodes force the MTPs to choose other data points as their exemplars. Similarities alone are not enough, since both MTPs could choose the same exemplar and still have -inf similarity to each other.

Some toy data results