
Classification
Categorization is the process in which ideas and objects are recognized, differentiated and understood. Categorization implies that objects are grouped into categories, usually for some specific purpose. Ideally, a category illuminates a relationship between the subjects and objects of knowledge. Categorization is fundamental in language, prediction, inference, decision making and in all kinds of interaction with the environment.
Statistical classification is a procedure in which individual items are placed into groups based on quantitative information on one or more characteristics inherent in the items (referred to as traits, variables, characters, etc.) and on a training set of previously labeled items.
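As an illustration of placing items into groups from a labeled training set, here is a minimal sketch of a nearest-centroid classifier. The technique is chosen only for brevity and is not named in the slides; the two-feature items and the labels "water" and "forest" are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Component-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(column) / n for column in zip(*points))

def train(training_set):
    """Map each label to the centroid of its training items."""
    by_label = {}
    for features, label in training_set:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(items) for label, items in by_label.items()}

def classify(centroids, item):
    """Return the label whose centroid is closest to the item."""
    return min(centroids, key=lambda label: squared_distance(item, centroids[label]))

# Hypothetical training set: (trait values, label) pairs.
training_set = [
    ((0.1, 0.2), "water"),
    ((0.2, 0.1), "water"),
    ((0.8, 0.9), "forest"),
    ((0.9, 0.7), "forest"),
]
centroids = train(training_set)
print(classify(centroids, (0.15, 0.25)))
print(classify(centroids, (0.7, 0.8)))
```

Any classifier trained on labeled items would serve here; this one simply makes the role of the training set visible.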

The essential problem
Inputs: multispectral, hyperspectral, radar, categorical and topographic data.
Output: a thematic map, produced by “classification”.
Rasters are better suited to this task, because each cell is a sample point with n layers of attributes.

Methods
– Rule-based (overlay analysis)
– Optimization Methods
  – Neural Networks
  – Genetic Algorithms
  – Fuzzy Logic
– Statistical Methods
  – Clustering
  – Principal Component Analysis (Ordination Analysis)
  – Regression (ordinal logistic regression)
  – Classification and Regression Trees (CART)
  – Bayesian Methods
  – Maximum Likelihood
– Spatio-Temporal Analysis
  – Spatio-Temporal Clustering

Image Classification
[Figure: classified image. Legend: Water/Shadow/Dark Rock; Ponderosa Pine/Pinyon-Juniper; Pinyon-Juniper (Mixed); Mixed Grassland w/Scrub; Mixed Scrub w/Grass; Mixed Scrub (Blackbrush/Shadscale); Dark Volcanic Rock w/Mixed Pinyon. Map labels: Painted Desert, Canyon de Chelly, Black Mesa, Hopi Buttes.]
Unsupervised classification “is a process whereby numerical operations are performed that search for natural groupings of the spectral properties of pixels.” (Jensen, “Introductory Digital Image Processing,” NJ: Prentice Hall)

Clustering
Clustering is the classification of objects into different groups, or more precisely, the partitioning of a data set into subsets (clusters), so that the data in each subset (ideally) share some common trait, often proximity according to some defined distance measure.
An important step in any clustering is to select a distance measure, which determines how the similarity of two elements is calculated. This choice influences the shape of the clusters, as some elements may be close to one another according to one distance and further apart according to another.
Many methods exist (ISODATA, K-means, fuzzy c-means, hierarchical).
The main requirements that a clustering algorithm should satisfy are:
– scalability;
– dealing with different types of attributes;
– discovering clusters with arbitrary shape;
– minimal requirements for domain knowledge to determine input parameters;
– ability to deal with noise and outliers;
– insensitivity to the order of input records;
– ability to handle high dimensionality;
– interpretability and usability.
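To make the point about distance measures concrete, the following minimal sketch compares Euclidean and Manhattan distance on the same points. The coordinates are made up for illustration, and Manhattan distance is just one possible alternative measure; the slides do not name it.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def nearest(query, candidates, distance):
    """Return the candidate closest to the query under the given distance."""
    return min(candidates, key=lambda c: distance(query, c))

# Hypothetical points: which candidate is "closest" depends on the measure.
query = (0.0, 0.0)
candidates = [(3.0, 3.0), (0.0, 5.0)]

print(nearest(query, candidates, euclidean))  # Euclidean: about 4.24 vs 5.0
print(nearest(query, candidates, manhattan))  # Manhattan: 6.0 vs 5.0
```

Under Euclidean distance the query is closer to (3, 3); under Manhattan distance it is closer to (0, 5). A clustering built on one measure can therefore group these points differently from a clustering built on the other.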

Clustering
Potential problems with clustering are:
– current clustering techniques do not address all the requirements adequately (and concurrently);
– dealing with a large number of dimensions and a large number of data items can be problematic;
– the effectiveness of the method depends on the definition of “distance” (for distance-based clustering);
– if an obvious distance measure doesn’t exist we must “define” it, which is not always easy, especially in multi-dimensional spaces;
– the result of the clustering algorithm (which in many cases can itself be arbitrary) can be interpreted in different ways.

Principal Component Analysis (PCA)
– Numerical method
– Dimensionality reduction technique
– Primarily for visualization of arrays/samples
– “Unsupervised” method used to explore the intrinsic variability of the data
– Performs a rotation of the data that maximizes the variance in the new axes

PCA
– Projects high-dimensional data into a low-dimensional sub-space (visualized in 2-3 dimensions)
– Often captures much of the total data variation in a few dimensions (< 5)
– Principal Components
  – 1st principal component (PC1): direction along which there is greatest variation
  – 2nd principal component (PC2): direction with maximum variation left in the data, orthogonal to PC1
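A minimal sketch of these ideas for the two-dimensional case, where the principal directions of the 2x2 covariance matrix have a closed form. The function name `pca_2d` and the sample points are hypothetical; real data sets with many variables would normally be handled with a linear algebra library instead.

```python
import math

def pca_2d(points):
    """Return PC1, PC2, their variances, and the rotated (centered) data."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)

    # Rotation angle of the direction of greatest variance.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    pc1 = (math.cos(theta), math.sin(theta))
    pc2 = (-math.sin(theta), math.cos(theta))  # orthogonal to PC1

    # Eigenvalues of the covariance matrix: variance along PC1 and PC2.
    half_trace = (sxx + syy) / 2
    spread = math.hypot((sxx - syy) / 2, sxy)
    variances = (half_trace + spread, half_trace - spread)

    # Coordinates of each centered point in the rotated axes.
    scores = [(x * pc1[0] + y * pc1[1], x * pc2[0] + y * pc2[1])
              for x, y in centered]
    return pc1, pc2, variances, scores

# Hypothetical data lying close to a straight line.
points = [(1.0, 1.0), (2.0, 2.1), (3.0, 2.9), (4.0, 4.2), (5.0, 4.8)]
pc1, pc2, variances, scores = pca_2d(points)
explained = variances[0] / sum(variances)
print("PC1 direction:", pc1)
print("Share of variance along PC1:", explained)
```

Because these points lie almost on a line, nearly all of the variance falls along PC1, so keeping only the first coordinate of `scores` would lose very little information.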

[Figure: PCA, showing the first and second principal components]

Distance Measurement
An important component of a clustering algorithm is the distance measure between data points. If the components of the data instance vectors are all in the same physical units, then the simple Euclidean distance metric may be sufficient to group similar data instances successfully. This is what is done in remote sensing. However, even in this case the Euclidean distance can sometimes be misleading. Below is an example of the width and height measurements of an object. As the figure shows, different scalings can lead to different clusterings.
[Figure: width and height measurements of an object under different scalings]
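The effect of scaling can be sketched numerically. In the example below the (width, height) values are invented, and the factor of 10 stands in for a unit change on one axis (for instance centimetres to millimetres); neither comes from the original figure.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_index(points, i):
    """Index of the point closest to points[i], excluding itself."""
    others = (j for j in range(len(points)) if j != i)
    return min(others, key=lambda j: euclidean(points[i], points[j]))

def rescale(points, factors):
    """Multiply each coordinate by the matching scale factor."""
    return [tuple(v * f for v, f in zip(p, factors)) for p in points]

# Hypothetical (width, height) measurements of three objects.
objects = [(1.0, 10.0), (1.2, 20.0), (5.0, 11.0)]

# Original units: object 0 is closest to object 2.
print(nearest_index(objects, 0))

# Width expressed in units ten times smaller: object 0 is now closest to object 1.
print(nearest_index(rescale(objects, (10.0, 1.0)), 0))
```

Nothing about the objects changed between the two calls, only the units of one axis, yet the nearest neighbour of object 0 switched. A distance-based clustering would inherit the same sensitivity.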

K-Means Clustering
K-means is one of the simplest unsupervised learning algorithms for solving a clustering problem. The procedure classifies a given data set into a number of clusters (assume k clusters) fixed a priori. The main idea is to define k centroids, one for each cluster.
Procedure (for k clusters):
– Make initial guesses for the means m1, m2, ..., mk
– Until there are no changes in any mean:
  – Use the estimated means to classify the samples into clusters
  – For i from 1 to k:
    – Replace mi with the mean of all of the samples in cluster i
  – end_for
– end_until
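The procedure can be sketched in plain Python as follows. This is a minimal illustration, not a production implementation: the function name `k_means`, the `max_iterations` safety cap, the rule of keeping the old mean when a cluster is empty, and the sample data are all assumptions added here.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    """Component-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(column) / n for column in zip(*points))

def k_means(samples, initial_means, max_iterations=100):
    """Repeat assignment and mean update until no mean changes."""
    means = [tuple(m) for m in initial_means]
    clusters = [[] for _ in means]
    for _ in range(max_iterations):
        # Use the estimated means to classify the samples into clusters.
        clusters = [[] for _ in means]
        for sample in samples:
            closest = min(range(len(means)),
                          key=lambda j: squared_distance(sample, means[j]))
            clusters[closest].append(sample)
        # Replace each mean with the mean of the samples in its cluster
        # (an empty cluster keeps its previous mean).
        new_means = [mean_point(c) if c else m
                     for c, m in zip(clusters, means)]
        if new_means == means:  # no change in any mean: stop
            break
        means = new_means
    return means, clusters

# Hypothetical 2-D samples forming two visible groups, with k = 2.
samples = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
           (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
means, clusters = k_means(samples, initial_means=[(0.0, 0.0), (10.0, 10.0)])
print(means)
print(clusters)
```

The result depends on the initial guesses for the means, so in practice the procedure is often run from several starting points and the best partition kept.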

Classification of watersheds based on abiotic factors