

1 Pattern Recognition: Statistical and Neural Lonnie C. Ludeman Lecture 30 Nov 11, 2005 Nanjing University of Science & Technology

2 Lecture 30 Topics
1. General comments about the clustering problem
2. Presentation of my small programs that can be used for performing clustering
3. Demonstration of the programs
4. Closing comments

3 Clustering is the art of grouping together pattern vectors that in some sense belong together because they have similar characteristics and are different from other pattern vectors. In the most general problem, the number of clusters or subgroups is unknown, as are the properties that make them similar. Review

4 Question: How do we start the process of finding clusters and identifying similarities? Answer: First, realize that clustering is an art and there is no correct answer, only feasible alternatives. Second, explore the structure of the data, similarity measures, and the limitations of various clustering procedures. Review

5 Problems in performing meaningful clustering:
1. Scaling
2. The nonuniqueness of results
3. Programs always give clusters, even when there are no clusters
Review

6 There are no correct answers; the clusters provide us with different interpretations of the data, where the closeness of patterns is measured with different definitions of similarity. The results may produce ways of looking at the data that we have not considered or noticed. These structural insights may prove useful in the pattern recognition process. Review

7 Methods for Clustering Quantitative Data
1. K-Means Clustering Algorithm
2. Hierarchical Clustering Algorithm
3. ISODATA Clustering Algorithm
4. Fuzzy Clustering Algorithm
Review

8 K-Means Clustering Algorithm
1. Randomly select K cluster centers from the pattern space.
2. Distribute the set of patterns to the cluster centers using minimum distance.
3. Compute new cluster centers for each cluster.
4. Repeat steps 2 and 3 until the cluster centers do not change.
Review
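The four steps above can be sketched in plain Python. This is a minimal illustration only, not the LCLKmean.exe program; it assumes Euclidean distance, and an empty cluster simply keeps its old center.

```python
import random

def kmeans(patterns, k, seed=0):
    """Cluster a list of d-dimensional points by the K-Means steps above."""
    rng = random.Random(seed)
    centers = rng.sample(patterns, k)  # step 1: random initial centers
    while True:
        # step 2: assign each pattern to the nearest center (minimum distance)
        clusters = [[] for _ in range(k)]
        for x in patterns:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centers[c])))
            clusters[i].append(x)
        # step 3: recompute each center as the mean of its cluster
        new_centers = [
            tuple(sum(comp) / len(cl) for comp in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        # step 4: stop when the centers no longer change
        if new_centers == centers:
            return centers, clusters
        centers = new_centers

data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centers, clusters = kmeans(data, k=2)
```

On this small data set the two tight groups are recovered regardless of the random initialization.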

9 Agglomerative Hierarchical Clustering
Consider a set S of patterns to be clustered: S = {x_1, x_2, ..., x_k, ..., x_N}
Define Level N by S_1(N) = {x_1}, S_2(N) = {x_2}, ..., S_N(N) = {x_N}
Clusters at Level N are the individual pattern vectors.
Review

10 Define Level N-1 to be the N-1 clusters formed by merging two of the Level N clusters by the following process: compute the distances between all the clusters at Level N and merge the two with the smallest distance (resolving ties randomly) to give the Level N-1 clusters S_1(N-1), S_2(N-1), ..., S_{N-1}(N-1). Clusters at Level N-1 result from this merging. Review

11 The process of merging two clusters at each step is performed sequentially until Level 1 is reached. Level 1 is a single cluster containing all samples: S_1(1) = {x_1, x_2, ..., x_k, ..., x_N}. Thus hierarchical clustering provides cluster assignments for all numbers of clusters from N to 1. Review
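A minimal sketch of this bottom-up merging in Python (an illustration only, not the LCLHier.exe program). Single-link distance, i.e. the minimum pairwise squared Euclidean distance between clusters, is assumed, and ties are broken by first occurrence rather than randomly:

```python
def agglomerate(patterns):
    """Return the clusters at every level, from Level N (singletons) to Level 1."""
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    def cluster_dist(c1, c2):
        # single-link: distance between the closest pair of members
        return min(dist2(a, b) for a in c1 for b in c2)

    level = [[x] for x in patterns]  # Level N: each pattern is its own cluster
    levels = {len(level): [list(c) for c in level]}
    while len(level) > 1:
        # merge the pair of clusters with the smallest distance
        pairs = [(i, j) for i in range(len(level)) for j in range(i + 1, len(level))]
        i, j = min(pairs, key=lambda p: cluster_dist(level[p[0]], level[p[1]]))
        level[i] = level[i] + level[j]
        del level[j]
        levels[len(level)] = [list(c) for c in level]
    return levels

data = [(0.0,), (0.4,), (5.0,), (5.3,), (9.0,)]
levels = agglomerate(data)
```

Indexing `levels[c]` then gives the cluster assignment for any desired number of clusters c, exactly as the slide describes.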

12 Fuzzy C-Means Clustering Preliminary
Given a set S composed of pattern vectors which we wish to cluster, S = {x_1, x_2, ..., x_N}, define C cluster membership functions μ_1, μ_2, ..., μ_C, where μ_i(x_k) gives the degree of membership of pattern x_k in fuzzy cluster i.
Review

13 Define C cluster centroids as follows: let V_i be the cluster centroid for fuzzy cluster Cl_i, i = 1, 2, ..., C. Define a performance objective J_m as

J_m = Σ_{k=1}^{N_s} Σ_{i=1}^{C} [μ_i(x_k)]^m (x_k - V_i)^T A (x_k - V_i)

where m, A, and N_s are defined on the next slide. Review

14 The Fuzzy C-Means Algorithm minimizes J_m by selecting V_i and μ_i, i = 1, 2, ..., C, by an alternating iterative procedure as described in the algorithm's details.
Definitions:
m = fuzziness index (m > 1), with higher values being more fuzzy
A = a symmetric positive definite matrix
N_s = the total number of pattern vectors
Review

15 Fuzzy C-Means Clustering Algorithm (a): flow diagram (figure not reproduced; the loop alternates the centroid and membership updates until a convergence test is satisfied). Review
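The alternating procedure can be sketched in Python. This is an illustration only, not the LCLFuzz.exe program: A is taken as the identity matrix (plain Euclidean norm), m = 2, and a fixed iteration count stands in for the flow diagram's convergence test.

```python
import random

def fuzzy_c_means(patterns, c, m=2.0, iters=100, seed=0):
    """Minimize J_m by alternating centroid and membership updates."""
    rng = random.Random(seed)
    n, dim = len(patterns), len(patterns[0])
    # random initial memberships u[i][k], normalized over the C clusters
    u = [[rng.random() for _ in range(n)] for _ in range(c)]
    for k in range(n):
        s = sum(u[i][k] for i in range(c))
        for i in range(c):
            u[i][k] /= s
    for _ in range(iters):
        # centroid update: V_i is the mu_i^m-weighted mean of the patterns
        V = []
        for i in range(c):
            w = [u[i][k] ** m for k in range(n)]
            V.append(tuple(sum(w[k] * patterns[k][d] for k in range(n)) / sum(w)
                           for d in range(dim)))
        # membership update from the distances to the new centroids
        for k in range(n):
            dist = [sum((a - b) ** 2 for a, b in zip(patterns[k], V[i])) ** 0.5
                    for i in range(c)]
            if any(d == 0.0 for d in dist):
                for i in range(c):  # pattern sits on a centroid: crisp membership
                    u[i][k] = 1.0 if dist[i] == 0.0 else 0.0
            else:
                for i in range(c):
                    u[i][k] = 1.0 / sum((dist[i] / dist[j]) ** (2.0 / (m - 1.0))
                                        for j in range(c))
    return V, u

data = [(0.0,), (0.2,), (8.0,), (8.2,)]
V, u = fuzzy_c_means(data, c=2)
```

On well-separated data the memberships become nearly crisp; with overlapping clusters they stay genuinely fuzzy, which is the point of the method.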

16 General Programs for Performing Clustering
1. Available commercial packages: SPSS, SAS, GPSS
2. Small programs for classroom use: LCLKmean.exe, LCLHier.exe, LCLFuzz.exe

17 Small programs for classroom use:
LCLKmean.exe: uses the K-Means Algorithm to cluster small data sets
LCLHier.exe: performs hierarchical clustering of small data sets
LCLFuzz.exe: performs fuzzy and crisp clustering of small data sets

18 Data File Format for the LCL Programs (text file):
N_S = number of data samples
V_S = data vector size
DATA = row vectors with a space between components
The file lists N_S, then V_S, then the DATA.
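A small reader for files of this form might look like the following. This is a sketch under the assumption that N_S and V_S precede the data and that whitespace separates all values; the exact layout expected by the LCL programs is not shown in the slides.

```python
def read_lcl_data(text):
    """Parse a data file of the form above: N_S, V_S, then N_S row vectors."""
    tokens = text.split()
    n_s, v_s = int(tokens[0]), int(tokens[1])
    values = [float(t) for t in tokens[2:]]
    assert len(values) == n_s * v_s, "file does not contain N_S * V_S components"
    # slice the flat value list into N_S rows of V_S components each
    return [values[r * v_s:(r + 1) * v_s] for r in range(n_s)]

sample = """3
2
0.0 1.0
2.0 3.0
4.0 5.0
"""
vectors = read_lcl_data(sample)
```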

19 All the clustering techniques presented so far use a measure of distance or similarity. Many of these give equal-distance contours that represent hyperspheres and hyperellipsoids. If these techniques are used directly on patterns that are not describable by those types of regions, we can expect to obtain poor results. Food for Thought

20 In some cases each cluster occupies a limited region (a subspace of the total pattern space) described by a nonlinear functional relation between components. An example appears below. (figure: two groups of existing pattern vectors lying along nonlinear curves) Standard K-Means, hierarchical, or fuzzy cluster analysis applied directly to the data will produce unsatisfactory results.

21 For this type of problem the patterns should first be preprocessed before a clustering procedure is performed. Two almost contradictory approaches can be used for this processing.
1. Extend the pattern space, by techniques comparable to functional link nets, so that the clusters can be separated by spherical and elliptical regions.
2. Reduce the dimension of the space by a nonlinear form of processing, involving principal-component-like processing, before clustering.
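As a toy illustration of the first approach, consider hypothetical data consisting of an inner blob surrounded by a ring: no hypersphere in the original (x, y) plane separates them, but extending each pattern with a nonlinear feature, its radius, makes the two groups trivially separable in the extended space.

```python
import math

# hypothetical data: a tight blob at the origin and a surrounding ring
inner = [(0.1 * math.cos(t), 0.1 * math.sin(t)) for t in range(12)]
ring = [(5.0 * math.cos(t), 5.0 * math.sin(t)) for t in range(12)]
points = inner + ring

# extend each pattern (x, y) to (x, y, r) with the nonlinear feature r = sqrt(x^2 + y^2);
# in the extended space the clusters become separable by spherical regions
extended = [(x, y, math.hypot(x, y)) for x, y in points]

# here even a crude split on the new component separates the groups
radii = sorted(p[2] for p in extended)
threshold = (radii[11] + radii[12]) / 2  # midpoint of the gap in radius
labels = [0 if p[2] < threshold else 1 for p in extended]
```

Any of the standard algorithms above, applied to the extended patterns, would find the same two groups; applied to the raw (x, y) pairs, K-Means would cut the ring into arcs instead.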

22 Both methods imply that we know additional information about the structure of the data. This additional information may be known to us, or it may need to be determined. The process of finding structure within data has been put in the large category of "Data Mining". So get a shovel and start looking. Good luck in your search for gold in the mounds of practical data.

23 Several very important topics in Pattern Recognition were not covered in this course because of time limitations. The following topics deserve your special attention to make your educational experience complete:
1. Feature selection and extraction
2. Hopfield and feedback neural nets
3. Syntactical pattern recognition
4. Special learning theory

24 I would like to thank Nanjing University of Science & Technology and Lu Jian Feng, Yang Jing-yu, and Wang Han for inviting me to present this course on Statistical and Neural Pattern Recognition.

25 A very special thanks to my new friends Lu Jian Feng, Wang Qiong, and Wang Huan for looking after me. Their kindness and gentle assistance have made my stay in Nanjing a very enjoyable and unforgettable experience.

26 Last but not least, I would like to thank all you students for your kind attention throughout this course. Without your interest and cheerful faces it would have been difficult for me to teach. My apologies for teaching in English, which, I am sure, made your work a lot harder. Best of luck to all of you in your studies and life.

27 "As you travel through life, may all your trails be downhill and the wind always be at your back." Bye for now, and I hope our paths cross again in the future. I will have pleasant thoughts about NUST students and faculty, Nanjing, and China as I head back to New Mexico!

28 New Mexico Land of Enchantment

29 End of Lecture 30