Data Science Algorithms: The Basic Methods
Clustering (WFH: Data Mining, Chapter 4.8)
Rodney Nielsen. Many of these slides were adapted from: I. H. Witten, E. Frank and M. A. Hall.

Algorithms: The Basic Methods
- Inferring rudimentary rules
- Naïve Bayes, probabilistic model
- Constructing decision trees
- Constructing rules
- Association rule learning
- Linear models
- Instance-based learning
- Clustering

Clustering
Clustering techniques apply when there is no class to be predicted. The aim is to divide the instances into "natural" groups. Clusters can be:
- Disjoint OR overlapping
- Deterministic OR probabilistic
- Flat OR hierarchical
The classic clustering algorithm is k-means; k-means clusters are disjoint, deterministic, and flat.

Unsupervised Learning
[Figure: six unlabeled instances, a through f, plotted as points with no class labels]

Hierarchical Agglomerative Clustering
[Figure: instances a through f merged bottom-up into a dendrogram: {b,c} and {d,e} first, then {d,e,f}, then {b,c,d,e,f}, and finally {a,b,c,d,e,f}]
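
As a hedged illustration of this merge process (not from the original slides), the sketch below runs SciPy's agglomerative linkage on six made-up 2-D points labeled a through f; the coordinates and the "average" linkage choice are illustrative assumptions.

```python
# A minimal sketch of hierarchical agglomerative clustering with SciPy.
# The six 2-D points (labeled a-f, echoing the slide) are made up for illustration.
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

points = np.array([[0.0, 4.0],   # a
                   [1.0, 1.0],   # b
                   [1.2, 0.8],   # c
                   [3.0, 2.0],   # d
                   [3.1, 2.2],   # e
                   [3.5, 1.5]])  # f
labels = ["a", "b", "c", "d", "e", "f"]

# Each row of Z records one merge: (cluster i, cluster j, distance, new size),
# so reading Z top to bottom replays the bottom-up agglomeration.
Z = linkage(points, method="average")
print(Z)

# dendrogram(Z, labels=labels) would draw the merge tree (requires matplotlib).
```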

k-means Clustering
1. Choose the number of clusters, e.g., k = 3
2. Select random centroids (often actual examples)
3. Until convergence:
4. Iterate over all examples and assign each to the cluster whose centroid is closest
5. Re-compute each cluster's centroid
(A from-scratch sketch of these steps follows below.)
Student Q: In k-means clustering, what do the initial k points represent (what is their function)?

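The following from-scratch NumPy sketch is one way to realize the numbered steps above; the kmeans function name, its defaults, and the synthetic blob data are illustrative assumptions, not material from the slides.

```python
# A from-scratch sketch of the k-means steps listed above (NumPy only).
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 2: pick k random examples as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 4: assign each example to the cluster whose centroid is closest
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # Step 5: recompute each centroid as the mean of its assigned examples
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[assign == j]
            if len(members) > 0:  # guard: keep the old centroid if a cluster empties
                new_centroids[j] = members.mean(axis=0)
        # Step 3: stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, assign

# Illustrative run on three synthetic Gaussian blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.5, size=(50, 2)) for c in (0.0, 3.0, 6.0)])
centroids, assign = kmeans(X, k=3)
```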

k-means Clustering
[Figure: four snapshots (1 through 4) of the k-means iterations, showing the example points reassigned and the centroids recomputed until convergence]

Expectation Maximization
Student Q: In probability-based clustering, can regions for different classifications overlap?
Student Q: How do we avoid overfitting with clustering?
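
As a concrete, hedged example of EM-style probabilistic clustering, the sketch below fits a Gaussian mixture with scikit-learn (a library choice assumed here, not prescribed by the slides). The soft assignments it returns show how cluster regions can overlap, and restricting the covariance is one common guard against overfitting.

```python
# Sketch: probabilistic (soft) clustering via EM on a Gaussian mixture model.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
               rng.normal(2.5, 1.0, size=(100, 2))])

# covariance_type="diag" restricts each component's covariance matrix,
# one common way to limit the parameter count and reduce overfitting.
gmm = GaussianMixture(n_components=2, covariance_type="diag",
                      random_state=0).fit(X)

# Soft assignments: each row sums to 1, so cluster regions can overlap,
# unlike the hard, disjoint assignments of k-means.
print(gmm.predict_proba(X[:5]))
```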

Discussion
- k-means minimizes the squared distance to the cluster centers
- The result can vary significantly, depending on the initial choice of seeds
- The algorithm can get trapped in a local minimum
- To increase the chance of finding the global optimum: restart with different random seeds (a sketch follows below)
- For hierarchical clustering, k-means can be applied recursively with k = 2
[Figure: two runs from different initial cluster centers (seed instances), one converging to a poor local minimum]
Student Q: Why are the final clusters sensitive to the initial cluster centers?
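
One minimal way to act on the restart advice, sketched here with scikit-learn's KMeans (an assumed tool, not the course's): its n_init parameter reruns the algorithm from several random seeds and keeps the solution with the lowest within-cluster squared distance.

```python
# Sketch: multiple random restarts to reduce the risk of a bad local minimum.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.5, size=(50, 2)) for c in (0.0, 3.0, 6.0)])

# n_init=10 restarts from 10 different random seeds and keeps the run with
# the lowest within-cluster sum of squared distances (inertia_).
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.inertia_)
```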

Clustering: How Many Clusters?
How do we choose k in k-means? Possibilities:
- Choose the k that minimizes the cross-validated squared distance to the cluster centers
- Use a penalized squared distance on the training data (e.g., using an MDL criterion)
- Apply k-means recursively with k = 2 and use a stopping criterion (e.g., based on MDL)
- Seeds for the subclusters can be chosen by seeding along the direction of greatest variance in the cluster (one standard deviation away in each direction from the parent cluster's center)
- This is implemented in the X-means algorithm (which uses the Bayesian Information Criterion instead of MDL); a BIC-based sketch follows below
Student Q: How can we determine the best attribute to use with k-nearest neighbor?
Student Q: In k-means clustering, what is a simple way to determine the number of cluster points you will need if you do not already know?
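
A small sketch of one of these options: scoring candidate values of k with the Bayesian Information Criterion, as X-means does, here approximated with scikit-learn Gaussian mixtures (an assumed substitute, since the slides do not prescribe an implementation).

```python
# Sketch: pick k by fitting a mixture model per candidate k and minimizing BIC.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.7, size=(60, 2)) for c in (0.0, 3.0, 6.0)])

# Lower BIC is better: it penalizes model complexity, so it discourages
# simply adding more clusters to drive the training error down.
bics = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
        for k in range(1, 8)}
best_k = min(bics, key=bics.get)
print(best_k, bics[best_k])
```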

Student Questions
Student Q: Why couldn't a training set labeled via the clustering method be used to train for a rule?
Student Q: Is there a concrete benefit to visualizing the clusters like in the book, or does it just look nice?
Student Q: Why exactly does saying all of the attributes are covariant contribute to overfitting?
Student Q: In what sense is clustering different from unsupervised classification?
Student Q: Is k-means restricted to data for which there is a notion of a center (centroid)?
Student Q: I am wondering how k-means handles outliers, or how it deals with empty clusters, which can be obtained if no points are allocated to a cluster during the assignment step. Can this potentially lead to a larger squared error than necessary?
Student Q: If clusters can be formed from the data set, couldn't it be said that there are classifications that could be learned as well, or that those points were classified as that cluster? And if that is the case, is clustering just instance-based learning?


Faster Distance Calculations
Can we use kD-trees or ball trees to speed up the process? Yes:
- First, build the tree, which remains static, over all the data points
- At each node, store the number of instances and the sum of all instances
- In each iteration, descend the tree and find out which cluster each node belongs to
- We can stop descending as soon as we find that a node belongs entirely to a particular cluster
- Use the statistics stored at the nodes to compute the new cluster centers (a simplified sketch follows below)
Student Q: It seems like when constructing a kD-tree, it's possible to process multiple data points in one go. Is this a realistic assumption, or a best-case scenario?
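
The slide describes the filtering approach, where the tree is built once over the data with per-node statistics. The sketch below illustrates only a simpler, related speedup: it puts the centroids in a kD-tree so each assignment step becomes one nearest-neighbor query per point. The data and centroids are made up for illustration.

```python
# Sketch: kD-tree over the centroids to speed up the k-means assignment step.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 2))
centroids = X[rng.choice(len(X), size=5, replace=False)]

# One nearest-neighbor query per point replaces the brute-force
# point-to-every-centroid distance computation.
tree = cKDTree(centroids)
_, assign = tree.query(X, k=1)   # assign[i] = index of the closest centroid
```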

Dimensionality Reduction
- Principal Components Analysis
- Singular Value Decomposition
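
A minimal sketch relating the two techniques, on synthetic data assumed for illustration: PCA computed directly from NumPy's SVD of the centered data matrix.

```python
# Sketch: PCA via SVD. The right singular vectors of the centered data
# are the principal directions; the singular values give the variances.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))

Xc = X - X.mean(axis=0)                  # center the data first
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

components = Vt                          # rows are the principal directions
explained_var = S ** 2 / (len(X) - 1)    # variance captured by each direction
X_reduced = Xc @ Vt[:2].T                # project onto the first two components
```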
