Introduction to Machine learning


Introduction to Machine learning Prof. Eduardo Bezerra (CEFET/RJ) ebezerra@cefet-rj.br

clustering

Overview
- Introduction
- K-means
- Other clustering techniques

Introduction

Clustering
Clustering consists of grouping objects into subsets with the goal of finding trends or patterns in the data, e.g., which objects in the collection are similar to each other? It is an unsupervised learning task: unlike in classification, there are no labeled examples.

General procedure
Input:
- A collection of unlabeled objects.
- A similarity measure (e.g., cosine similarity) or a distance measure (e.g., Euclidean distance).
Output:
- Multiple groups of objects.
Constraint: maximize intra-group similarity and minimize similarity between groups.
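The two measures named above can be sketched in a few lines. This is a minimal illustration in plain Python, assuming objects are equal-length numeric vectors; the function names are chosen here for illustration and do not come from the slides.

```python
import math


def euclidean_distance(a, b):
    """Distance measure: smaller values mean the two vectors are more alike."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Similarity measure: larger values mean the two vectors point in closer directions.

    Assumes neither vector is all zeros (otherwise the norm is 0 and the
    division fails).
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


d = euclidean_distance([0.0, 0.0], [3.0, 4.0])
s = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Note that the two measures run in opposite directions: a clustering procedure that maximizes similarity is equivalent to one that minimizes distance.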

Clustering
[Figure: dataset (in two dimensions) with a clearly grouped structure.]

k-means

K-means
K-means is the best-known algorithm of the clustering family. It assumes that the objects to be grouped are represented as vectors.
K-means determines the centroid (or center of gravity, or mean) of the points of each group c: $\mu(c) = \frac{1}{|c|} \sum_{x \in c} x$
The formation of the groups is based on the distance between the examples x(i) and the centroids sj.
It works with the notion of a vector (point) that is representative of each group to be formed: the prototype. This prototype should be some central point of the group, e.g.:
- medoid: the point belonging to the collection that is closest to the center of the group;
- centroid: the point that is the "average" of all objects in the group.
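The difference between the two kinds of prototype can be illustrated with a small sketch. It assumes points are numeric tuples of equal length and uses Euclidean distance; the medoid is taken here as the member whose total distance to all other members is smallest, which is one common way to make "closest to the center" precise. The helper names and the sample data are illustrative only.

```python
import math


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Component-wise mean of the points; it need not be one of the points."""
    n = len(points)
    dims = len(points[0])
    return [sum(p[d] for p in points) / n for d in range(dims)]


def medoid(points):
    """Member of the group with the smallest total distance to all members."""
    def total_distance(candidate):
        return sum(euclidean_distance(candidate, p) for p in points)
    return min(points, key=total_distance)


# Hypothetical group with one far-away point.
group = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
c = centroid(group)  # pulled toward the far-away point
m = medoid(group)    # always an actual member of the group
```

In this example the centroid lies between the group members, while the medoid is one of them; the far-away point shifts the centroid more than it shifts the medoid.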

K-means - algorithm
1. Select k initial centroids.
2. Repeat until the convergence criterion is met:
   a. For each x(i), assign x(i) to the group cj such that dist(x(i), sj) is minimum, where sj is the centroid of cluster cj.
   b. For each cj, update its centroid: sj ← μ(cj), the mean of the points in cj.
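The steps above can be sketched as follows. This is a minimal illustration rather than a reference implementation, and it makes several assumptions not stated in the slides: the initial centroids are k distinct points drawn at random from the data, the distance is Euclidean, the loop stops when no assignment changes (or after `max_iters` passes), and a group that ends up empty keeps its previous centroid.

```python
import math
import random


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def kmeans(points, k, max_iters=100, seed=0):
    rng = random.Random(seed)
    # Step 1: select k initial centroids (assumption: random data points).
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = None
    for _ in range(max_iters):
        # Step 2a: assign each point to the group with the nearest centroid.
        new_assignment = []
        for x in points:
            nearest = min(range(k), key=lambda j: dist(x, centroids[j]))
            new_assignment.append(nearest)
        # Convergence criterion (assumption): assignments stopped changing.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 2b: move each centroid to the mean of its group.
        for j in range(k):
            members = [x for x, a in zip(points, assignment) if a == j]
            if members:  # an empty group keeps its old centroid
                centroids[j] = mean(members)
    return centroids, assignment


# Hypothetical data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
```

The group indices in `labels` are arbitrary: which group is called 0 and which is called 1 depends on the initial centroids.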

K-means (example for K=2) Source: https://en.wikipedia.org/wiki/K-means_clustering

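K-means – objective function
K-means solves an optimization problem. One measure of how well the centroids represent their respective groups is the residual sum of squares (RSS). For a clustering C = {c1, c2, …, ck} with centroids s1, s2, …, sk: $\mathrm{RSS}(C) = \sum_{j=1}^{k} \sum_{x^{(i)} \in c_j} \lVert x^{(i)} - s_j \rVert^2$
K-means seeks the centroids and groups that minimize the RSS.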
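As a rough illustration of the objective, the RSS can be computed directly from the points, the centroids, and the group index of each point. The function name `rss` and the sample data below are hypothetical; the sketch assumes the squared Euclidean distance.

```python
def rss(points, centroids, assignment):
    """Sum, over all points, of the squared distance to the assigned centroid."""
    total = 0.0
    for x, j in zip(points, assignment):
        total += sum((xi - si) ** 2 for xi, si in zip(x, centroids[j]))
    return total


points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]

# Two centroids, each sitting in the middle of its pair of points.
two_groups = rss(points, [(1.0, 0.0), (11.0, 0.0)], [0, 0, 1, 1])

# A single centroid in the middle of all four points.
one_group = rss(points, [(6.0, 0.0)], [0, 0, 0, 0])
```

Here the two-centroid solution has a much lower RSS than the single-centroid one, which is what "the centroids represent their groups well" means numerically.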

K-means – implementation aspects
- Choice of a value for K
- Choice of seeds
- Choice of convergence criterion
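The slides do not say which convergence criterion is used. As one possible sketch, a common choice is to stop when no centroid moves more than a small tolerance between two iterations, with a cap on the number of iterations as a safety net. The names `max_centroid_shift` and `has_converged` and the default values of `tol` and `max_iters` are assumptions made for this example.

```python
import math


def max_centroid_shift(old_centroids, new_centroids):
    """Largest Euclidean distance moved by any centroid in one iteration."""
    return max(
        math.sqrt(sum((a - b) ** 2 for a, b in zip(old, new)))
        for old, new in zip(old_centroids, new_centroids)
    )


def has_converged(old_centroids, new_centroids, iteration, tol=1e-4, max_iters=100):
    """Stop when the centroids barely move or the iteration budget is spent."""
    if iteration >= max_iters:
        return True
    return max_centroid_shift(old_centroids, new_centroids) < tol


still_moving = has_converged([[0.0, 0.0]], [[1.0, 0.0]], iteration=1)
settled = has_converged([[0.0, 0.0]], [[0.0, 0.00001]], iteration=1)
```

A looser tolerance ends the loop sooner at the cost of slightly less settled centroids; the iteration cap guarantees termination either way.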

Limitations of k-means
k-means works properly when the groups:
- are spherical
- are far apart
- have similar volumes
- have similar numbers of points

Limitations of k-means Source: https://en.wikipedia.org/wiki/K-means_clustering

Other clustering techniques

Other clustering techniques
- K-medoids
- Gaussian mixtures (EM)
- DBSCAN
- OPTICS
- Hierarchical clustering algorithms
“A medoid can be defined as the object of a cluster whose average dissimilarity to all the objects in the cluster is minimal. i.e. it is a most centrally located point in the cluster.” https://en.wikipedia.org/wiki/K-medoids