
1 Rodney Nielsen. Many of these slides were adapted from: I. H. Witten, E. Frank and M. A. Hall.
Data Science Algorithms: The Basic Methods: Clustering
WFH: Data Mining, Chapter 4.8

2 Rodney Nielsen, Human Intelligence & Language Technologies Lab
Algorithms: The Basic Methods
- Inferring rudimentary rules
- Naïve Bayes, probabilistic model
- Constructing decision trees
- Constructing rules
- Association rule learning
- Linear models
- Instance-based learning
- Clustering

3 Rodney Nielsen, Human Intelligence & Language Technologies Lab
Clustering
- Clustering techniques apply when there is no class to be predicted
- Aim: divide instances into “natural” groups
- Clusters can be: disjoint or overlapping; deterministic or probabilistic; flat or hierarchical
- Classic clustering algorithm: k-means
  - k-means clusters are disjoint, deterministic, and flat

4 Rodney Nielsen, Human Intelligence & Language Technologies Lab
Unsupervised Learning
[Figure: six unlabeled data points, a through f]

5 Rodney Nielsen, Human Intelligence & Language Technologies Lab
Hierarchical Agglomerative Clustering
[Figure: points a through f merged bottom-up into a dendrogram: b+c form bc, d+e form de, de+f form def, bc+def form bcdef, and a+bcdef form abcdef]
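As a hedged illustration of the bottom-up merging in the figure, the sketch below runs SciPy's agglomerative clustering on six made-up 2-D points standing in for a through f; the coordinates and the "average" linkage choice are assumptions for illustration, not from the slides.

```python
# Hierarchical agglomerative clustering: repeatedly merge the two
# closest clusters until a single cluster remains.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[0.0, 8.0],   # a
              [4.0, 1.0],   # b
              [4.5, 1.5],   # c
              [8.0, 6.0],   # d
              [8.5, 6.5],   # e
              [9.5, 5.0]])  # f

# Each row of Z records one merge: (cluster i, cluster j, distance, new size).
Z = linkage(X, method="average")
print(Z)

# Cut the dendrogram to recover, e.g., two flat clusters.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```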

6 Rodney Nielsen, Human Intelligence & Language Technologies Lab
k-means Clustering
[Figure: data points with three initial centroids labeled a, b, c]
1. Choose the number of clusters, e.g., k = 3
2. Select random centroids (often actual examples)
3. Until convergence:
4. Iterate over all examples and assign each to the cluster whose centroid is closest
5. Re-compute each cluster centroid
Student Q: In k-means clustering, what do the initial k points represent (what is their function)?
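The steps above translate almost line for line into code. Below is a minimal NumPy sketch (my own illustration, not the lecture's implementation; the seeding scheme and the empty-cluster guard are assumptions):

```python
# Plain k-means, following the five steps above: assign each example to
# the nearest centroid, move each centroid to the mean of its members,
# and repeat until the assignments stop changing.
import numpy as np

def kmeans(X, k=3, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 2: pick k random examples as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.full(len(X), -1)
    for _ in range(n_iter):                          # Step 3: until convergence
        # Step 4: assign every example to its closest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):       # assignments stable
            break
        labels = new_labels
        # Step 5: re-compute each centroid as the mean of its members.
        for j in range(k):
            if np.any(labels == j):                  # guard against empty clusters
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels
```

Convergence here means the assignments no longer change; since neither the assignment step nor the centroid update can increase the total squared distance, the loop always terminates.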

9 Rodney Nielsen, Human Intelligence & Language Technologies Lab
k-means Clustering
[Figure: four panels (1-4) animating the iterations on points labeled a, b, c: assignment to the nearest centroid alternates with centroid re-computation until nothing changes]

10 Rodney Nielsen, Human Intelligence & Language Technologies Lab
Expectation Maximization
[Figure: four panels (1-4) illustrating EM iterations]
Student Q: In probability-based clustering, can regions for different classifications overlap?
Student Q: How do we avoid overfitting with clustering?
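Since the slide itself is only a figure, here is a hedged one-dimensional sketch of EM for a two-component Gaussian mixture; the data and the crude initialization are invented for illustration and this is not the lecture's code.

```python
# EM for a 1-D mixture of two Gaussians: the E-step computes soft
# (probabilistic) cluster memberships, the M-step re-estimates the
# parameters from those memberships.
import numpy as np

def gaussian_pdf(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def em_two_gaussians(x, n_iter=50):
    # Crude initialization (an assumption, not a recommendation).
    mu = np.array([x.min(), x.max()])
    var = np.array([x.var(), x.var()])
    pi = np.array([0.5, 0.5])                        # mixing weights
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        p = pi * np.stack([gaussian_pdf(x, mu[j], var[j]) for j in (0, 1)], axis=1)
        r = p / p.sum(axis=1, keepdims=True)
        # M-step: weighted re-estimates of the parameters.
        n_j = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / n_j
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / n_j
        pi = n_j / len(x)
    return mu, var, pi

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)])
print(em_two_gaussians(x))
```

Unlike k-means's hard assignments, the responsibilities r let a point belong partially to both components, which is exactly how the overlapping regions in the first student question arise.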

11 Rodney Nielsen, Human Intelligence & Language Technologies Lab
Discussion
- k-means minimizes the squared distance of instances to their cluster centers
- The result can vary significantly based on the initial choice of seeds
- The algorithm can get trapped in a local minimum
- Example: [Figure: the same instances clustered from two different sets of initial cluster centers]
- To increase the chance of finding the global optimum: restart with different random seeds (see the sketch below)
- For hierarchical clustering, k-means can be applied recursively with k = 2
Student Q: Why are the final clusters sensitive to the initial cluster centers?
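One way to realize the restart advice is scikit-learn's KMeans, whose n_init parameter runs that many independent random seedings and keeps the solution with the lowest total squared distance to centers (its "inertia"). A small sketch with invented data:

```python
# Multiple random restarts: run k-means n_init times from different
# random seeds and keep the run with the smallest inertia, i.e. the
# smallest total squared distance to cluster centers.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2))
               for c in ([0, 0], [5, 5], [0, 5])])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.inertia_)   # best (lowest) objective over the 10 restarts
```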

12 Rodney Nielsen, Human Intelligence & Language Technologies Lab
Clustering: How Many Clusters?
How to choose k in k-means? Possibilities:
- Choose the k that minimizes the cross-validated squared distance to cluster centers (see the sketch below)
- Use penalized squared distance on the training data (e.g., using an MDL criterion)
- Apply k-means recursively with k = 2 and use a stopping criterion (e.g., based on MDL)
  - Seeds for subclusters can be chosen by seeding along the direction of greatest variance in the cluster (one standard deviation away in each direction from the parent cluster's center)
  - Implemented in an algorithm called X-means (using the Bayesian Information Criterion instead of MDL)
Student Q: How can we determine the best attribute to use with k-nearest neighbor?
Student Q: In k-means clustering, what is a simple way to determine the number of cluster points you will need if you do not already know?
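A hedged sketch of the first possibility, cross-validated squared distance: hold out part of the data, fit k-means for several values of k, and score each k by how close the held-out points fall to the learned centers. The dataset, the 70/30 split, and the range of k are illustrative assumptions.

```python
# Choosing k by cross-validated squared distance: centers that merely
# memorize the training set do not generalize, so the held-out squared
# distance stops improving once k is large enough.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.6, size=(100, 2))
               for c in ([0, 0], [6, 0], [3, 5])])
X_train, X_test = train_test_split(X, test_size=0.3, random_state=0)

for k in range(1, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_train)
    # Squared distance of each held-out point to its closest center.
    sq = (km.transform(X_test).min(axis=1) ** 2).sum()
    print(k, round(sq, 1))
```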

13 Rodney Nielsen, Human Intelligence & Language Technologies Lab
Student Questions
Student Q: Why couldn't a training set labeled via the clustering method be used to train for a rule?
Student Q: Is there a concrete benefit to visualizing the clusters like in the book, or does it just look nice?
Student Q: Why exactly does saying all of the attributes are covariant contribute to overfitting?
Student Q: In what sense is clustering different from unsupervised classification?
Student Q: Is k-means restricted to data for which there is a notion of a center (centroid)?

14 Rodney Nielsen, Human Intelligence & Language Technologies Lab
Student Questions
Student Q: I am wondering how k-means handles outliers, or how it deals with empty clusters, which can occur if no points are allocated to a cluster during the assignment step. Can this potentially lead to a larger squared error than necessary?
Student Q: If clusters can be formed from the data set, couldn't it be said that there are classifications that could be learned as well, or that those points were classified as that cluster? And if that is the case, is clustering just instance-based learning?

15 Rodney Nielsen, Human Intelligence & Language Technologies Lab
Faster Distance Calculations
Can we use kD-trees or ball trees to speed up the process? Yes:
- First, build the tree, which remains static, for all the data points
- At each node, store the number of instances and the sum of all instances
- In each iteration, descend the tree and find out which cluster each node belongs to
  - Can stop descending as soon as we find out that a node belongs entirely to a particular cluster
  - Use the statistics stored at the nodes to compute the new cluster centers
Student Q: It seems like, when constructing a kD-tree, it's possible to process multiple data points in one go. Is this a realistic assumption, or a best-case scenario?
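The sketch below is one hedged interpretation of this idea, not the lecture's or WFH's code: a small kD-tree whose nodes store their count, vector sum, and bounding box, and an assignment pass that stops descending as soon as one centroid provably owns a node's whole box (a conservative corner-distance test is used as the ownership check).

```python
# Tree-accelerated k-means assignment: if one centroid is provably
# closest to every point in a node's bounding box, add the node's
# stored count and sum to that centroid's accumulators and stop there.
import numpy as np

class Node:
    def __init__(self, pts, depth=0, leaf_size=8):
        self.count = len(pts)
        self.sum = pts.sum(axis=0)
        self.lo, self.hi = pts.min(axis=0), pts.max(axis=0)   # bounding box
        if len(pts) <= leaf_size:
            self.pts, self.left, self.right = pts, None, None
        else:
            d = depth % pts.shape[1]                          # split dimension
            order = pts[:, d].argsort()
            mid = len(pts) // 2
            self.pts = None
            self.left = Node(pts[order[:mid]], depth + 1, leaf_size)
            self.right = Node(pts[order[mid:]], depth + 1, leaf_size)

def min_dist2(c, lo, hi):   # squared distance from centroid c to the box
    return (np.maximum(0, np.maximum(lo - c, c - hi)) ** 2).sum()

def max_dist2(c, lo, hi):   # squared distance to the farthest box corner
    return (np.maximum(np.abs(c - lo), np.abs(c - hi)) ** 2).sum()

def assign(node, centroids, sums, counts):
    near = [min_dist2(c, node.lo, node.hi) for c in centroids]
    best = int(np.argmin(near))
    others = [d for j, d in enumerate(near) if j != best]
    # Pruning test: `best` beats every other centroid for all points in the box.
    if max_dist2(centroids[best], node.lo, node.hi) <= min(others, default=np.inf):
        sums[best] += node.sum
        counts[best] += node.count
    elif node.pts is not None:                    # leaf: assign point by point
        d = np.linalg.norm(node.pts[:, None] - centroids[None], axis=2)
        for p, j in zip(node.pts, d.argmin(axis=1)):
            sums[j] += p
            counts[j] += 1
    else:
        assign(node.left, centroids, sums, counts)
        assign(node.right, centroids, sums, counts)

# One assignment pass on invented data; the tree is built once and reused.
X = np.random.default_rng(0).normal(size=(500, 2))
centroids = X[:3].copy()
root = Node(X)
sums, counts = np.zeros_like(centroids), np.zeros(len(centroids))
assign(root, centroids, sums, counts)
```

After the pass, the new centers are simply sums[j] / counts[j]; only the centroids change between iterations, so the tree and its node statistics never need rebuilding, as the slide says.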

16 Rodney Nielsen, Human Intelligence & Language Technologies Lab
Dimensionality Reduction
- Principal Components Analysis
- Singular Value Decomposition
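A hedged NumPy sketch of how the two relate: PCA can be computed from the SVD of the mean-centered data matrix (the data here is invented, and keeping two components is an arbitrary choice):

```python
# PCA via SVD: center the data and take its SVD; the right singular
# vectors are the principal components, and the singular values give
# the variance explained along each component.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))            # 200 instances, 5 attributes

Xc = X - X.mean(axis=0)                  # mean-center each attribute
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

components = Vt[:2]                      # top two principal components
Z = Xc @ components.T                    # data projected to 2 dimensions
explained_var = S ** 2 / (len(X) - 1)    # variance along each component
print(Z.shape, explained_var[:2])
```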

17 Rodney Nielsen, Human Intelligence & Language Technologies Lab
[Figure-only slide]

18 Example
[Figure-only slide]

