Constraint-Based Clustering. Why Constraint-Based Cluster Analysis? Need user feedback: Users know their applications the best Less parameters but more.

Slides:



Advertisements
Similar presentations
Coloring Warm-Up. A graph is 2-colorable iff it has no odd length cycles 1: If G has an odd-length cycle then G is not 2- colorable Proof: Let v 0, …,
Advertisements

O(N 1.5 ) divide-and-conquer technique for Minimum Spanning Tree problem Step 1: Divide the graph into  N sub-graph by clustering. Step 2: Solve each.
GRAPH BALANCING. Scheduling on Unrelated Machines J1 J2 J3 J4 J5 M1 M2 M3.
Iterative Deepening A* & Constraint Satisfaction Problems Lecture Module 6.
Design and Analysis of Algorithms Single-source shortest paths, all-pairs shortest paths Haidong Xue Summer 2012, at GSU.
NUS CS5247 Motion Planning for Camera Movements in Virtual Environments By Dennis Nieuwenhuisen and Mark H. Overmars In Proc. IEEE Int. Conf. on Robotics.
Data Structure and Algorithms (BCS 1223) GRAPH. Introduction of Graph A graph G consists of two things: 1.A set V of elements called nodes(or points or.
Evaluating Path Queries over Route Collections Panagiotis Bouros NTUA, Greece (supervised by Y. Vassiliou)
Cooperative Weakest Link Games Yoram Bachrach, Omer Lev, Shachar Lovett, Jeffrey S. Rosenschein & Morteza Zadimoghaddam CoopMAS 2013 St. Paul, Minnesota.
A Probabilistic Framework for Semi-Supervised Clustering
Spatial Clustering Methods
Content Based Image Clustering and Image Retrieval Using Multiple Instance Learning Using Multiple Instance Learning Xin Chen Advisor: Chengcui Zhang Department.
Region Segmentation. Find sets of pixels, such that All pixels in region i satisfy some constraint of similarity.
© University of Minnesota Data Mining for the Discovery of Ocean Climate Indices 1 CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance.
By Fernando Seoane, April 25 th, 2006 Demo for Non-Parametric Classification Euclidean Metric Classifier with Data Clustering.
Placement of Integration Points in Multi-hop Community Networks Ranveer Chandra (Cornell University) Lili Qiu, Kamal Jain and Mohammad Mahdian (Microsoft.
Cluster Analysis.  What is Cluster Analysis?  Types of Data in Cluster Analysis  A Categorization of Major Clustering Methods  Partitioning Methods.
Segmentation Graph-Theoretic Clustering.
CS541 Advanced Networking 1 Routing and Shortest Path Algorithms Neil Tang 2/18/2009.
Semi-Supervised Clustering Jieping Ye Department of Computer Science and Engineering Arizona State University
Chapter 4 Graphs.
The University of Hong Kong 1 Capacity Constrained Assignment in Spatial Databases Authors: Leong Hou U, University of Hong Kong Man Lung Yiu, Aalborg.
Fixed Parameter Complexity Algorithms and Networks.
Segmentation Course web page: vision.cis.udel.edu/~cv May 7, 2003  Lecture 31.
1 Lecture 10 Clustering. 2 Preview Introduction Partitioning methods Hierarchical methods Model-based methods Density-based methods.
Clustering Spatial Data Using Random Walk David Harel and Yehuda Koren KDD 2001.
6.1 Hamilton Circuits and Paths: Hamilton Circuits and Paths: Hamilton Path: Travels to each vertex once and only once… Hamilton Path: Travels to each.
Efficient Route Computation on Road Networks Based on Hierarchical Communities Qing Song, Xiaofan Wang Department of Automation, Shanghai Jiao Tong University,
Andrew Nealen / Olga Sorkine / Mark Alexa / Daniel Cohen-Or SoHyeon Jeong 2007/03/02.
MAX FLOW APPLICATIONS CS302, Spring 2012 David Kauchak.
COMP Data Mining: Concepts, Algorithms, and Applications 1 K-means Arbitrarily choose k objects as the initial cluster centers Until no change,
Learning Spectral Clustering, With Application to Speech Separation F. R. Bach and M. I. Jordan, JMLR 2006.
Cluster Analysis Potyó László. Cluster: a collection of data objects Similar to one another within the same cluster Similar to one another within the.
Cluster Analysis.
Compiled By: Raj Gaurang Tiwari Assistant Professor SRMGPC, Lucknow Unsupervised Learning.
Efficient Clustering of Uncertain Data Wang Kay Ngai, Ben Kao, Chun Kit Chui, Reynold Cheng, Michael Chau, Kevin Y. Yip Speaker: Wang Kay Ngai.
Community Discovery in Social Network Yunming Ye Department of Computer Science Shenzhen Graduate School Harbin Institute of Technology.
10-1 An Introduction to Systems A _______ is a set of sentences joined by the word ____ or by a ________________. Together these sentences describe a ______.
Airline Optimization Problems Constraint Technologies International
Optimal Reverse Prediction: Linli Xu, Martha White and Dale Schuurmans ICML 2009, Best Overall Paper Honorable Mention A Unified Perspective on Supervised,
Cluster Analysis Dr. Bernard Chen Assistant Professor Department of Computer Science University of Central Arkansas.
Mr. Idrissa Y. H. Assistant Lecturer, Geography & Environment Department of Social Sciences School of Natural & Social Sciences State University of Zanzibar.
Cluster Analysis Dr. Bernard Chen Ph.D. Assistant Professor Department of Computer Science University of Central Arkansas Fall 2010.
Chapter 14 Section 3 - Slide 1 Copyright © 2009 Pearson Education, Inc. AND.
Grade 11 AP Mathematics Graph Theory Definition: A graph, G, is a set of vertices v(G) = {v 1, v 2, v 3, …, v n } and edges e(G) = {v i v j where 1 ≤ i,
DATA MINING: CLUSTER ANALYSIS (3) Instructor: Dr. Chun Yu School of Statistics Jiangxi University of Finance and Economics Fall 2015.
Christoph F. Eick Questions Review October 12, How does post decision tree post-pruning work? What is the purpose of applying post-pruning in decision.
Solving Compound Inequalities When the word and is used, the solution includes all values that satisfy both inequalities. This is the intersection of the.
Exhaustive search Exhaustive search is simply a brute- force approach to combinatorial problems. It suggests generating each and every element of the problem.
Document Clustering with Prior Knowledge Xiang Ji et al. Document Clustering with Prior Knowledge. SIGIR 2006 Presenter: Suhan Yu.
CSE 373: Data Structures and Algorithms Lecture 21: Graphs V 1.
COMP24111 Machine Learning K-means Clustering Ke Chen.
What Is Cluster Analysis?
Semi-Supervised Clustering
1-4 Solving Inequalities
Constrained Clustering -Semi Supervised Clustering-
One-layer neural networks Approximation problems
Data Mining Cluster Analysis: Advanced Concepts and Algorithms
Systems of Inequalities
CS223 Advanced Data Structures and Algorithms
Segmentation Graph-Theoretic Clustering.
Instructor: Shengyu Zhang
Distance and Midpoint Formulas; Circles
Floyd’s Algorithm (shortest-path problem)
Graphs and Algorithms (2MMD30)
Lecture 19 Linear Program
Graphs of Quadratic Functions Day 2
Graphical Solutions of Trigonometric Equations
All Pairs Shortest Path Examples While the illustrations which follow only show solutions from vertex A (or 1) for simplicity, students should note that.
Presentation transcript:

Constraint-Based Clustering

Why Constraint-Based Cluster Analysis? Need user feedback: Users know their applications the best Less parameters but more user-desired constraints, e.g., an ATM allocation problem: obstacle & desired clusters

A Classification of Constraints in Cluster Analysis Clustering in applications: desirable to have user-guided (i.e., constrained) cluster analysis Different constraints in cluster analysis: Constraints on individual objects (do selection first) Cluster on houses worth over $300K Constraints on distance or similarity functions Weighted functions, obstacles (e.g., rivers, lakes) Constraints on the selection of clustering parameters # of clusters, MinPts, etc. User-specified constraints Contain at least 500 valued customers and 5000 ordinary ones Semi-supervised: giving small training sets as “constraints” or hints

Clustering With Obstacle Objects K-medoids is more preferable since k-means may locate the ATM center in the middle of a lake Visibility graph and shortest path Triangulation and micro-clustering Two kinds of join indices (shortest- paths) worth pre-computation VV index: indices for any pair of obstacle vertices MV index: indices for any pair of micro-cluster and obstacle indices

An Example: Clustering With Obstacle Objects Taking obstacles into accountNot Taking obstacles into account

Clustering with User-Specified Constraints Example: Locating k delivery centers, each serving at least m valued customers and n ordinary ones Proposed approach Find an initial “solution” by partitioning the data set into k groups and satisfying user-constraints Iteratively refine the solution by micro-clustering relocation (e.g., moving δ μ-clusters from cluster C i to C j ) and “deadlock” handling (break the microclusters when necessary) Efficiency is improved by micro-clustering How to handle more complicated constraints? E.g., having approximately same number of valued customers in each cluster?! — Can you solve it?