Data Mining: Practical Machine Learning Tools and Techniques
Chapter 4: Algorithms: The Basic Methods
Section 4.8: Clustering
Rodney Nielsen
Many of these slides were adapted from: I. H. Witten, E. Frank and M. A. Hall

Algorithms: The Basic Methods
● Inferring rudimentary rules
● Naïve Bayes, probabilistic model
● Constructing decision trees
● Constructing rules
● Association rule learning
● Linear models
● Instance-based learning
● Clustering

Clustering
● Clustering techniques apply when there is no class to be predicted
● Aim: divide the instances into “natural” groups
● Clusters can be:
  ● Disjoint OR overlapping
  ● Deterministic OR probabilistic
  ● Flat OR hierarchical
● Classic clustering algorithm: k-means
  ● k-means clusters are disjoint, deterministic, and flat

Unsupervised Learning
[figure: six unlabeled instances, a–f, plotted in a two-dimensional feature space]

Hierarchical Agglomerative Clustering
[figure: the same instances a–f and the tree built by repeatedly merging the closest clusters: b+c → bc, d+e → de, de+f → def, bc+def → bcdef, a+bcdef → abcdef]
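To illustrate the merging process shown in the figure, here is a brief sketch using SciPy's hierarchical-clustering routines. The slides themselves contain no code, and the six 2-D points below are invented stand-ins for a–f:

```python
# Sketch of hierarchical agglomerative clustering with SciPy (assumed available).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

points = np.array([[0.0, 5.0],   # a
                   [1.0, 1.0],   # b
                   [1.2, 1.1],   # c
                   [4.0, 3.0],   # d
                   [4.1, 3.2],   # e
                   [4.8, 3.5]])  # f

# Repeatedly merge the two closest clusters; 'single' linkage measures the
# distance between clusters by their closest pair of points.
Z = linkage(points, method='single')
print(Z)                                          # each row: the two clusters merged, their distance, new size
print(fcluster(Z, t=2, criterion='maxclust'))     # cut the tree into 2 flat clusters
```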

k-means Clustering
[figure: instances labeled a, b, c chosen as the initial cluster centroids]
1. Choose the number of clusters, e.g., k = 3
2. Select k random centroids (often actual examples)
3. Until convergence:
4.   Iterate over all examples and assign each to the cluster whose centroid is closest
5.   Re-compute each cluster centroid as the mean of its assigned examples
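The steps above can be turned into a short NumPy sketch; the function and variable names are mine, not from the slides or from Weka:

```python
# Minimal k-means sketch following the numbered steps above.
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 2: pick k random examples as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):                                    # Step 3: until convergence
        # Step 4: assign every example to the cluster whose centroid is closest.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Step 5: re-compute each centroid as the mean of its assigned examples.
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        if np.allclose(new_centroids, centroids):               # assignments have stabilized
            break
        centroids = new_centroids
    return centroids, labels

# Example usage on three invented blobs of 2-D data.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(loc, 0.5, size=(50, 2)) for loc in ([0, 0], [3, 3], [0, 4])])
centers, assignment = kmeans(data, k=3)
```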

k-means Clustering
[figure: the examples re-assigned to clusters a, b, c and the centroids re-computed over successive iterations until convergence]

Expectation Maximization
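The slide gives only the title. As a minimal sketch, assuming a two-component, one-dimensional Gaussian mixture, expectation maximization can be seen as the probabilistic, soft-assignment counterpart of k-means: the E-step computes each component's responsibility for every point, and the M-step re-estimates the parameters from those weighted points. All names and the toy data below are illustrative:

```python
# Sketch: EM for a two-component 1-D Gaussian mixture (soft k-means analogue).
import numpy as np

def em_gaussian_mixture(x, n_iter=50):
    x = np.asarray(x, dtype=float)
    mu = np.array([x.min(), x.max()])            # start the means at the data extremes
    sigma = np.array([x.std(), x.std()]) + 1e-6
    pi = np.array([0.5, 0.5])                    # mixing proportions
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        dens = (pi / (sigma * np.sqrt(2 * np.pi)) *
                np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2))
        resp = dens / (dens.sum(axis=1, keepdims=True) + 1e-12)
        # M-step: re-estimate parameters from the responsibility-weighted points.
        nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk) + 1e-6
        pi = nk / len(x)
    return mu, sigma, pi

# Example: two overlapping groups of points.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(0, 1, 200), rng.normal(4, 1.5, 300)])
print(em_gaussian_mixture(data))
```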

Discussion
● k-means minimizes the squared distance from instances to their cluster centers
● The result can vary significantly based on the initial choice of seeds
● The algorithm can get trapped in a local minimum
  ● [figure: example instances and initial cluster centers that lead to a poor local minimum]
● To increase the chance of finding the global optimum: restart with different random seeds (see the sketch below)
● For hierarchical clustering, k-means can be applied recursively with k = 2
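A minimal sketch of the random-restart strategy, reusing the kmeans() function sketched earlier; the restart loop and names are my own illustration, not code from the slides:

```python
# Run k-means several times from different seeds and keep the run with the
# lowest within-cluster sum of squared distances.
import numpy as np

def best_of_restarts(X, k, n_restarts=10):
    X = np.asarray(X, dtype=float)
    best_centroids, best_sse = None, np.inf
    for seed in range(n_restarts):
        centroids, labels = kmeans(X, k, seed=seed)
        sse = ((X - centroids[labels]) ** 2).sum()   # squared distance to assigned centers
        if sse < best_sse:
            best_centroids, best_sse = centroids, sse
    return best_centroids, best_sse
```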

Faster Distance Calculations
● Can we use kD-trees or ball trees to speed up the process? Yes:
  ● First, build the tree, which remains static, for all the data points
  ● At each node, store the number of instances and the sum of all instances
  ● In each iteration, descend the tree and find out which cluster each node belongs to
    ● Can stop descending as soon as we find that a node belongs entirely to a particular cluster
  ● Use the statistics stored at the nodes to compute the new cluster centers (sketched below)
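A sketch of this idea, loosely following a Pelleg-and-Moore-style filtering approach: each tree node stores its instance count, vector sum, and bounding box, and a whole subtree is credited to a centroid as soon as that centroid is provably closest for every point in the box. The details here are my own illustrative assumptions, not code from the slides or Weka:

```python
# kD-tree-accelerated k-means sketch using stored (count, sum, bounding box) statistics.
import numpy as np

class KDNode:
    def __init__(self, points, depth=0, leaf_size=16):
        self.count = len(points)
        self.sum = points.sum(axis=0)            # vector sum of all points below this node
        self.lo = points.min(axis=0)             # bounding-box corners
        self.hi = points.max(axis=0)
        self.left = self.right = self.points = None
        if self.count <= leaf_size:
            self.points = points                 # leaf: keep the raw points
        else:
            axis = depth % points.shape[1]       # split on alternating axes at the median
            order = points[:, axis].argsort()
            mid = self.count // 2
            self.left = KDNode(points[order[:mid]], depth + 1, leaf_size)
            self.right = KDNode(points[order[mid:]], depth + 1, leaf_size)

def dominates(c, c2, lo, hi):
    """True if every point in the box [lo, hi] is at least as close to c as to c2."""
    p = np.where(c2 > c, hi, lo)                 # corner of the box most favourable to c2
    return np.sum((p - c) ** 2) <= np.sum((p - c2) ** 2)

def assign_node(node, centroids, sums, counts):
    mid = (node.lo + node.hi) / 2
    best = int(((centroids - mid) ** 2).sum(axis=1).argmin())
    # If the nearest centroid dominates all others over this node's box, the whole
    # subtree belongs to it: use the stored statistics and stop descending.
    if all(dominates(centroids[best], c2, node.lo, node.hi)
           for j, c2 in enumerate(centroids) if j != best):
        sums[best] += node.sum
        counts[best] += node.count
    elif node.points is not None:                # leaf: assign its points one by one
        d2 = ((node.points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        for x, j in zip(node.points, d2.argmin(axis=1)):
            sums[j] += x
            counts[j] += 1
    else:
        assign_node(node.left, centroids, sums, counts)
        assign_node(node.right, centroids, sums, counts)

def kmeans_kdtree(X, k, n_iter=20, seed=0):
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]
    root = KDNode(X)                             # built once; never changes
    for _ in range(n_iter):
        sums = np.zeros_like(centroids)
        counts = np.zeros(k)
        assign_node(root, centroids, sums, counts)
        nonempty = counts > 0
        centroids[nonempty] = sums[nonempty] / counts[nonempty][:, None]
    return centroids
```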

Example

Dimensionality Reduction
● Principal Components Analysis
● Singular Value Decomposition
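A minimal sketch, assuming principal components analysis is computed via the singular value decomposition of the centered data matrix; the names and the random example data are illustrative, not from the slides:

```python
# PCA via SVD: the right singular vectors of the centered data are the principal directions.
import numpy as np

def pca(X, n_components=2):
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                       # center each attribute
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]                # top principal directions
    explained_variance = (S ** 2) / (len(X) - 1)  # variance captured by each direction
    return Xc @ components.T, components, explained_variance[:n_components]

# Example: project 5-dimensional data onto its first two principal components.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
scores, components, var = pca(X, n_components=2)
print(scores.shape, var)
```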