Introduction to undirected Data Mining: Clustering


Introduction to undirected Data Mining: Clustering. IBM Data Mining Concepts. Prepared by David Douglas, University of Arkansas. Hosted by the University of Arkansas.

Clustering Quick Refresher. Data mining is used to find previously unknown, meaningful patterns in data. Patterns are not always easy to find: there may be no discernible patterns, an excess of patterns (noise), or structure so complex that it is difficult to uncover. Clustering provides a way to learn about the structure of complex data.

Clustering (cont). Clustering refers to grouping records, observations, or cases into classes of similar objects. A cluster is a collection of records that are similar to one another; records in one cluster are dissimilar to records in other clusters. Clustering is an unsupervised (undirected) data mining task; therefore, no target variable is specified. Clustering algorithms segment records so that within-cluster variation is minimized and between-cluster variation is maximized.

Clustering (cont). Clustering is placed in the exploratory category and is seldom used in isolation, because finding clusters is not often an end in itself. Many times clustering results are used for downstream data mining tasks; for example, a cluster number could be added to each record of the dataset before building a decision tree.

Clustering Example: graph from Berry & Linoff (figure omitted).

Clustering techniques covered: k-means; Kohonen Networks (Self-Organizing Maps, SOM).

K-Means. The selection of k cannot be glossed over; in many cases there is no a priori reason for a particular k. Thus, try several values of k and then evaluate the strength of the clusters, for example by comparing the average distance between records within clusters to the average distance between clusters, or by some other method (see the sketch below). Sometimes the result is one giant central cluster surrounded by a number of small clusters; these small clusters may identify fraud or defects.
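One hedged illustration of this evaluation idea (it is not the only possible measure) is sketched below: compare the average within-cluster distance to the average distance between cluster centers. The function name `cluster_strength` and its inputs `X`, `labels`, and `centers` are illustrative and assumed to come from whatever clustering run is being evaluated.

```python
import numpy as np

def cluster_strength(X, labels, centers):
    """Compare average between-center distance to average within-cluster
    distance; higher ratios suggest stronger (better separated) clusters."""
    within = np.mean([np.linalg.norm(x - centers[c])
                      for x, c in zip(X, labels)])
    pairs = [(i, j) for i in range(len(centers))
             for j in range(i + 1, len(centers))]
    between = np.mean([np.linalg.norm(centers[i] - centers[j])
                       for i, j in pairs])
    return between / within
```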

Measurement Issues. Convert numeric values into the 0 to 1 range, and convert categorical values into numeric values. By default, some software transforms set fields into groups of numeric fields with values between 0 and 1.0. Some software sets the default weighting value for a flag field to the square root of 0.5 (approximately 0.707107); values closer to 1.0 weight set fields more heavily than numeric fields. A sketch of this preprocessing appears below.
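A minimal preprocessing sketch, assuming pandas-style data with hypothetical column names; the 0.707107 flag weight mirrors the default described above, but the exact behavior of any particular tool may differ.

```python
import numpy as np
import pandas as pd

def prepare_for_clustering(df, numeric_cols, flag_cols, flag_weight=np.sqrt(0.5)):
    """Min-max scale numeric fields to [0, 1] and encode flag (True/False)
    fields as 0 or flag_weight, roughly mirroring the defaults described above."""
    out = pd.DataFrame(index=df.index)
    for col in numeric_cols:
        lo, hi = df[col].min(), df[col].max()
        out[col] = (df[col] - lo) / (hi - lo) if hi > lo else 0.0
    for col in flag_cols:
        out[col] = df[col].astype(float) * flag_weight
    return out

# Hypothetical usage:
# X = prepare_for_clustering(customers, ["Age", "Income"], ["Homeowner"])
```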

Clustering Illustrated. The emphasis in clustering is similarity. Between-cluster variation (BCV): for example, the distance d(c1, c2) between cluster centers. Within-cluster variation (WCV): for example, the sum of squared distances of records from their own cluster centroid (the SSE). Good clusterings have large BCV relative to WCV. Adapted from Larose.

K-means Algorithm. Step 1: The analyst specifies k, the number of clusters into which the data are partitioned. Step 2: k initial cluster centers are randomly assigned. Step 3: For each record, find the nearest cluster center; each cluster center thus "owns" a subset of records, resulting in k clusters C1, C2, ..., Ck. Step 4: For each of the k clusters, find the cluster centroid and update the cluster center location to that centroid. Step 5: Repeat Steps 3 and 4 until convergence or termination. The k-means algorithm terminates when the centroids no longer change, that is, when for clusters C1, C2, ..., Ck all records "owned" by each cluster remain in that cluster. A minimal implementation sketch follows. Adapted from Larose.
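A minimal NumPy sketch of the steps above; the random initial centers and the convergence test (centroids no longer moving) follow the listed steps, and the function and variable names are illustrative rather than taken from any particular library.

```python
import numpy as np

def k_means(X, k, max_iter=100, seed=0):
    """Basic k-means: random initial centers, assign, recompute, repeat."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]       # Step 2
    for _ in range(max_iter):                                     # Step 5 loop
        # Step 3: assign each record to its nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: move each center to the centroid of the records it owns
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):                     # convergence
            break
        centers = new_centers
    return labels, centers
```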

Numeric Example. Step 1: Determining the Cluster Centroid. Assume n data points (a1, b1, c1), (a2, b2, c2), ..., (an, bn, cn). The centroid of the points is their center of gravity, that is, the mean of each coordinate: ((a1 + ... + an)/n, (b1 + ... + bn)/n, (c1 + ... + cn)/n). For example, for the four points (1, 1, 1), (1, 2, 1), (1, 3, 1), and (2, 1, 1) in three-dimensional space, the centroid is ((1+1+1+2)/4, (1+2+3+1)/4, (1+1+1+1)/4) = (1.25, 1.75, 1), as checked in the snippet below. Adapted from Larose.
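A one-line check of this arithmetic in NumPy (the points are the four from the example above):

```python
import numpy as np

points = np.array([(1, 1, 1), (1, 2, 1), (1, 3, 1), (2, 1, 1)])
print(points.mean(axis=0))   # -> [1.25 1.75 1.  ], the centroid
```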

Numeric Example (cont). Assume k = 2 to cluster the following data points. Step 1: k = 2 specifies the number of clusters to partition. Step 2: Randomly assign k = 2 cluster centers, for example c1 = (1, 1) and c2 = (2, 1). First Iteration, Step 3: For each record, find the nearest cluster center. The Euclidean distances from the points to c1 and c2 are shown below.

Point  Coordinates  Distance from c1  Distance from c2  Cluster Membership
a      (1, 3)       2.00              2.24              C1
b      (3, 3)       2.83              2.24              C2
c      (4, 3)       3.61              2.83              C2
d      (5, 3)       4.47              3.61              C2
e      (1, 2)       1.00              1.41              C1
f      (4, 2)       3.16              2.24              C2
g      (1, 1)       0.00              1.00              C1
h      (2, 1)       1.00              0.00              C2

Adapted from Larose.

Numeric Example (cont). Cluster c1 contains {a, e, g} and c2 contains {b, c, d, f, h}. With cluster membership assigned, the SSE is calculated: SSE = 2.00² + 2.24² + 2.83² + 3.61² + 1.00² + 2.24² + 0² + 0² ≈ 36. Recall that clusters should be constructed so that between-cluster variation (BCV) is large compared to within-cluster variation (WCV). A possible measure for this is the distance between the cluster centers divided by the SSE; for this example, BCV/WCV = d(c1, c2)/SSE = 1/36 ≈ 0.0278. Note: this ratio is expected to increase for successive iterations. A short script verifying these numbers appears below. Adapted from Larose.
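A short NumPy check of these numbers; the data points and initial centers are those of the example, and the variable names are illustrative.

```python
import numpy as np

points = np.array([[1, 3], [3, 3], [4, 3], [5, 3],
                   [1, 2], [4, 2], [1, 1], [2, 1]])   # records a..h
centers = np.array([[1, 1], [2, 1]])                  # c1, c2

dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
labels = dists.argmin(axis=1)                         # 0 -> c1, 1 -> c2
sse = np.sum(dists[np.arange(len(points)), labels] ** 2)
ratio = np.linalg.norm(centers[0] - centers[1]) / sse

print(labels)           # [0 1 1 1 0 1 0 1] -> c1 = {a, e, g}, c2 = {b, c, d, f, h}
print(round(sse, 2))    # 36.0
print(round(ratio, 4))  # 0.0278
```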

Numeric Example (cont). Step 4: For each of the k clusters, find the cluster centroid and update the center location. Cluster 1 = [(1 + 1 + 1)/3, (3 + 2 + 1)/3] = (1, 2). Cluster 2 = [(3 + 4 + 5 + 4 + 2)/5, (3 + 3 + 3 + 2 + 1)/5] = (3.6, 2.4), as verified in the snippet below. The figure (omitted here) plots the data points and shows the movement of the cluster centers c1 and c2 (triangles) after the first iteration of the algorithm. Adapted from Larose.
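The same centroid update in NumPy, using the cluster memberships found in the first assignment step:

```python
import numpy as np

# Points owned by each cluster after the first assignment step
c1_points = np.array([[1, 3], [1, 2], [1, 1]])                  # a, e, g
c2_points = np.array([[3, 3], [4, 3], [5, 3], [4, 2], [2, 1]])  # b, c, d, f, h

print(c1_points.mean(axis=0))  # [1. 2.]   -> new c1
print(c2_points.mean(axis=0))  # [3.6 2.4] -> new c2
```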

Numeric Example (cont). Continue with Steps 3 and 4 until convergence. Recall that convergence may be declared when the cluster centroids are essentially static and records no longer change clusters, or when some other stopping criterion, such as elapsed time or number of iterations, is met. Adapted from Larose.

K-Means Summary. k-means is not guaranteed to find the global minimum SSE; instead, a local minimum may be found. Invoking the algorithm with a variety of initial cluster centers improves the probability of reaching the global minimum. One approach places the first cluster center at a random point, with the remaining centers placed far from the previous ones (Moore). What is an appropriate value for k? This is a potential problem when applying k-means, although the analyst may have a priori knowledge of k. A restart sketch appears below. Adapted from Larose.
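One common way to act on this advice is to run the algorithm from several random initializations and keep the lowest-SSE result. The sketch below does this with scikit-learn's KMeans purely as an illustration; the function name `best_of_restarts` is made up here, and scikit-learn's own n_init parameter performs the same kind of restart internally.

```python
from sklearn.cluster import KMeans

def best_of_restarts(X, k, n_restarts=10):
    """Run k-means from several random initializations and keep the
    solution with the lowest SSE (inertia), reducing the chance of a
    poor local minimum."""
    best = None
    for seed in range(n_restarts):
        km = KMeans(n_clusters=k, init="random", n_init=1, random_state=seed).fit(X)
        if best is None or km.inertia_ < best.inertia_:
            best = km
    return best
```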

Kohonen SOM (Self-Organizing Maps). Applicable for clustering. Based on competitive learning, in which output nodes compete to become the winning node (neuron). Nodes become selectively tuned to input patterns during the competitive learning process (Haykin). An example SOM architecture (figure omitted) has two inputs, Age and Income, in the input layer, connected by weighted connections to the nodes of the output layer. Adapted from Larose.

Kohonen SOM (cont). Input nodes pass variable values to the network. SOMs are feedforward (no looping allowed) and completely connected (each node in the input layer is connected to every node in the output layer); they are neural networks without hidden layer(s). Every connection between two nodes has a weight, and weight values are initialized randomly between 0 and 1. Adjusting the weights is the key feature of the learning process. Attribute values are normalized or standardized. Adapted from Larose.

SOM (cont). Assume input records have attributes Age and Income, and that the first input record has Age = 0.69 and Income = 0.88. The attribute values for Age and Income enter through their respective input nodes and are passed to all output nodes. These values, together with the connection weights, determine the value of the scoring function for each output node; the output node with the "best" score is designated the winning node for the record (see the sketch below). Adapted from Larose.
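A tiny sketch of that scoring step, assuming (as the later slides state) that the score is the Euclidean distance between the input record and each node's weight vector, with the smallest distance winning; the weight values here are placeholders.

```python
import numpy as np

record = np.array([0.69, 0.88])            # Age, Income for the first record
weights = np.array([[0.9, 0.8],            # one (age, income) weight vector
                    [0.9, 0.2],            # per output node (placeholder values)
                    [0.1, 0.8],
                    [0.1, 0.2]])

scores = np.linalg.norm(weights - record, axis=1)   # Euclidean distance score
winner = scores.argmin()                             # "best" (smallest) score wins
print(winner, scores.round(2))
```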

SOM (cont). SOM learning has three characteristics: competition, cooperation, and adaptation. Competition: output nodes compete with one another for the "best" score; the Euclidean distance function is commonly used, so the winning node is the one with the smallest distance between the input values and its connection weights. Cooperation: the winning node becomes the center of a neighborhood, and output nodes in the neighborhood share the "excitement" or "reward". This emulates the behavior of biological neurons, which are sensitive to the output of their neighbors; nodes in the output layer are not directly connected, yet they come to share common features because of the neighborhood behavior. Adapted from Larose.

SOM (cont). Adaptation: the nodes in the neighborhood participate in adaptation (learning); their weights are adjusted to improve the score function, which increases the likelihood that, in subsequent iterations, these nodes win records with similar values. Adapted from Larose.

Kohonen Network Algorithm (Fausett). START ALGORITHM: Initialize: assign random values to the weights; assign initial learning rate and neighborhood size values. LOOP: for each input record: Competition: for each output node, calculate the scoring function, and find the winning output node. Adapted from Larose.

Kohonen Network Algorithm (Fausett), continued. Cooperation: identify the output nodes j within the neighborhood of the winning node J, as defined by the neighborhood size R. Adaptation: adjust the weights of all neighborhood nodes; adjust the learning rate and neighborhood size (decreasing them) as needed. Nodes not attracting a sufficient number of hits may be pruned. Stop when the termination criteria are met. END ALGORITHM. A compact sketch of the whole loop follows. Adapted from Larose.
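A compact, hedged Python sketch of the algorithm as described on these two slides, for a one-dimensional row of output nodes; the weight-update rule w = w + learning_rate * (x - w) matches the adjustment used in the worked example that follows, and all names are illustrative.

```python
import numpy as np

def train_som(X, n_nodes, epochs=10, lr=0.5, radius=1, seed=0):
    """Toy SOM with a 1-D row of output nodes (illustrative only)."""
    rng = np.random.default_rng(seed)
    W = rng.random((n_nodes, X.shape[1]))         # Initialize: random weights in [0, 1)
    for _ in range(epochs):
        for x in X:                               # LOOP over input records
            # Competition: Euclidean distance score, smallest wins
            winner = np.linalg.norm(W - x, axis=1).argmin()
            # Cooperation: nodes within `radius` of the winner form the neighborhood
            neighborhood = [j for j in range(n_nodes) if abs(j - winner) <= radius]
            # Adaptation: move neighborhood weights toward the input record
            for j in neighborhood:
                W[j] += lr * (x - W[j])
        lr *= 0.9                                  # decrease the learning rate
        radius = max(0, radius - 1)                # shrink the neighborhood
    return W
```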

Example. Use a simple 2 x 2 Kohonen network with neighborhood size R = 0 and learning rate = 0.5. The input data consist of four records with attributes Age and Income (values normalized):

Record 1: x11 = 0.8, x12 = 0.8  (older person with high income)
Record 2: x21 = 0.8, x22 = 0.1  (older person with low income)
Record 3: x31 = 0.2, x32 = 0.9  (younger person with high income)
Record 4: x41 = 0.1, x42 = 0.1  (younger person with low income)

Initial network weights (randomly assigned):

Node 1: w11 = 0.9, w21 = 0.8
Node 2: w12 = 0.9, w22 = 0.2
Node 3: w13 = 0.1, w23 = 0.8
Node 4: w14 = 0.1, w24 = 0.2

Adapted from Larose.

Example (cont). The network diagram (figure omitted) shows the two input nodes, Age and Income, fully connected to output nodes 1 through 4 with the initial weights listed above; Record 1, with Age = 0.8 and Income = 0.8, is presented at the input layer. Adapted from Larose.

Example (cont). First record x1 = (0.8, 0.8). Competition phase: compute the Euclidean distance between the input vector and each node's weight vector; the distances to Nodes 1 through 4 are approximately 0.10, 0.61, 0.70, and 0.92. The winning node is Node 1 (it minimizes the distance, at 0.10). Note that Node 1's weights are the most similar to the input record's values, so Node 1 may exhibit an affinity (cluster) for records of "older persons with high income". Adapted from Larose.

Example (cont). First record x1 = (0.8, 0.8). Cooperation phase: the neighborhood size is R = 0, so there is no "excitement" of neighboring nodes; only the winning node receives a weight adjustment. Adaptation phase: the weights for Node 1 are adjusted toward the first record's values using the learning rate of 0.5: age: 0.9 + 0.5(0.8 - 0.9) = 0.85; income: 0.8 + 0.5(0.8 - 0.8) = 0.8. The snippet below reproduces these steps. Adapted from Larose.
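The same competition and adaptation steps for Record 1, written out in NumPy; the weights and inputs are the values from the example.

```python
import numpy as np

weights = np.array([[0.9, 0.8], [0.9, 0.2], [0.1, 0.8], [0.1, 0.2]])  # Nodes 1-4
x1 = np.array([0.8, 0.8])          # Record 1: Age, Income
lr = 0.5                           # learning rate

winner = np.linalg.norm(weights - x1, axis=1).argmin()   # Node 1 (index 0) wins
weights[winner] += lr * (x1 - weights[winner])            # R = 0: only the winner updates
print(winner + 1, weights[winner])                        # 1 [0.85 0.8 ]
```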

Example (cont). First record x1 = (0.8, 0.8). Note the direction of the weight adjustments: the weights move toward the input field values. The initial weight w11 = 0.9 is adjusted in the direction of x11 = 0.8; with a learning rate of 0.5, w11 moves half the distance from 0.9 to 0.8 and is therefore updated to 0.85. The algorithm then moves to the second record and repeats the process with the new Node 1 weights. Adapted from Larose.

Clustering Lessons Learned. Clustering is exploratory, as much an art as a science; the key is to find interesting and useful clusters. The resulting clusters may be used as predictors; in that case, the field of interest should be excluded from the cluster-building process. For example, churn may be the target variable for a classification data mining application: the clusters are built without churn, and the cluster membership field, used as an input to the classification model, may improve classification accuracy (a sketch follows). Adapted from Larose.
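A hedged end-to-end sketch of that idea using scikit-learn; the column name "churn" and the function name are hypothetical, and the predictors are assumed to already be numeric. The clusters are built without the target, and the cluster label is then fed to a decision tree alongside the original fields.

```python
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

def add_cluster_feature(df, target="churn", k=3):
    """Build clusters without the target field, then use cluster membership
    as an extra input field for a classification model."""
    X = df.drop(columns=[target])                    # exclude the field of interest
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    X_aug = X.assign(cluster=labels)                 # cluster number added to each record
    model = DecisionTreeClassifier(random_state=0).fit(X_aug, df[target])
    return model, X_aug
```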