Conceptual Clustering


Conceptual Clustering
- Unsupervised and spontaneous: categorizes or postulates concepts without a teacher
- Conceptual clustering forms a classification tree
  - All initial observations start in the root
  - New children are created using a single attribute (not good), attribute combinations (all), information metrics, etc.
  - Each node is a class
- Must decide the quality of a class partition and its significance (noise)
- Many models use search to discover hierarchies that satisfy some heuristic within and/or between clusters: similarity, cohesiveness, etc.

Cobweb
- Cobweb is an incremental hill-climbing strategy with bidirectional operators: it does not backtrack, but in theory it could return to an earlier state
- Starts empty and creates a full concept hierarchy (classification tree) in which each leaf represents a single instance/object
- You can choose how deep in the hierarchy to go for the specific application at hand
- Objects are described as nominal attribute-value pairs
- Each created node is a probabilistic concept (a class) that stores its probability of being matched (count/total) and, for each attribute, the probability of each value, P(a=v|C); only counts need to be stored
- Arcs in the tree are just connections: nodes store information across all attributes (unlike ID3, etc.)
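As a rough illustration of the "only counts need be stored" point, here is a minimal Python sketch of a probabilistic concept node. The class name `ConceptNode` and its method names are hypothetical and chosen for this example; they do not come from the slides or from any Cobweb library.

```python
from collections import defaultdict


class ConceptNode:
    """Hypothetical probabilistic concept: keeps counts, derives probabilities."""

    def __init__(self):
        self.count = 0  # number of instances matched to this concept
        # value_counts[attribute][value] -> number of matched instances with that value
        self.value_counts = defaultdict(lambda: defaultdict(int))
        self.children = []  # arcs are plain connections to sub-concepts

    def add(self, instance):
        """Update the counts with one instance given as {attribute: value}."""
        self.count += 1
        for attribute, value in instance.items():
            self.value_counts[attribute][value] += 1

    def match_probability(self, total):
        """P(C): this concept's count over a total (for example the parent's count)."""
        return self.count / total

    def value_probability(self, attribute, value):
        """P(a = v | C), computed on demand from the stored counts."""
        if self.count == 0:
            return 0.0
        return self.value_counts.get(attribute, {}).get(value, 0) / self.count


node = ConceptNode()
node.add({"build": "tall", "sex": "male"})
node.add({"build": "beefy", "sex": "male"})
print(node.value_probability("sex", "male"))    # 1.0
print(node.value_probability("build", "tall"))  # 0.5
```

Because every probability is a ratio of counts, adding an instance is a constant-time update per attribute, which suits the incremental setting.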

Category Utility: Heuristic Measure
- Tradeoff between intra-class similarity and inter-class dissimilarity; sums measures from each individual attribute
- Intra-class similarity is a function of P(Ai = Vij|Ck), the predictability of V given C
  - Larger P means that if the class is C, A is likely to be V. Objects within a class should have similar attributes.
- Inter-class dissimilarity is a function of P(Ck|Ai = Vij), the predictiveness of V for C
  - Larger P means that A = V suggests the instance is a member of class C rather than some other class. A is a stronger predictor of class C.
- Example: players from the men's football and basketball teams, with attribute values such as Tall and Lanky, Beefy, Male

Category Utility Intuition
- Both should be high over all (most) attributes for a good class breakdown
- Predictability: P(V|C) could be high for multiple classes, giving a relatively low P(C|V), so it is not good for discrimination (e.g., Male in the first example)
- Predictiveness: P(C|V) could be high for a class while P(V|C) is relatively low because V occurs rarely; this is good for discrimination but not for intra-class similarity (e.g., 7 foot tall)
- When both are high, we get the best categorization balance between discrimination and intra-class similarity
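To make the two quantities concrete, the sketch below computes them on a tiny roster in the spirit of the team example. The eight rows are invented purely for illustration, and the function names are hypothetical.

```python
# Hypothetical toy data (invented for illustration): (team, build, sex)
players = [
    ("basketball", "tall_lanky", "male"),
    ("basketball", "tall_lanky", "male"),
    ("basketball", "tall_lanky", "male"),
    ("basketball", "beefy", "male"),
    ("football", "beefy", "male"),
    ("football", "beefy", "male"),
    ("football", "beefy", "male"),
    ("football", "tall_lanky", "male"),
]


def predictability(rows, cls, index, value):
    """P(A = V | C): fraction of the class's members that have the value."""
    members = [r for r in rows if r[0] == cls]
    return sum(r[index] == value for r in members) / len(members)


def predictiveness(rows, cls, index, value):
    """P(C | A = V): fraction of instances with the value that belong to the class."""
    having = [r for r in rows if r[index] == value]
    return sum(r[0] == cls for r in having) / len(having)


# "male" is perfectly predictable from either team but does not discriminate.
print(predictability(players, "basketball", 2, "male"))  # 1.0
print(predictiveness(players, "basketball", 2, "male"))  # 0.5
# "tall_lanky" scores reasonably on both for basketball.
print(predictability(players, "basketball", 1, "tall_lanky"))  # 0.75
print(predictiveness(players, "basketball", 1, "tall_lanky"))  # 0.75
```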

Category Utility
- For each category, sum predictability times predictiveness for each attribute value, weighted by P(Ai = Vij), with k proposed categories, i attributes, and j values per attribute
- The result is the expected number of attribute values one could guess given C
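The weighted sum described here can be rewritten with Bayes' rule, which shows why it reads as an expected number of correct guesses. This is the standard derivation, written with the slide's symbols; the guessing strategy it assumes is probability matching (guess value V with probability P(V|C)).

```latex
\sum_{k}\sum_{i}\sum_{j}
  P(A_i = V_{ij})\, P(C_k \mid A_i = V_{ij})\, P(A_i = V_{ij} \mid C_k)
\;=\;
\sum_{k} P(C_k) \sum_{i}\sum_{j} P(A_i = V_{ij} \mid C_k)^2
% since  P(A_i = V_{ij})\, P(C_k \mid A_i = V_{ij}) = P(C_k)\, P(A_i = V_{ij} \mid C_k)
```

The inner double sum is the expected number of attribute values guessed correctly for a member of C_k, and the outer sum averages it over the categories.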

Category Utility
- Category Utility is the increase in the expected number of attribute values that could be guessed, given a partitioning into categories (leaf nodes)
- CU({C_1, C_2, ..., C_K}) = \frac{1}{K} \sum_{k=1}^{K} P(C_k) \left[ \sum_i \sum_j P(A_i = V_{ij} \mid C_k)^2 - \sum_i \sum_j P(A_i = V_{ij})^2 \right]
- K normalizes CU for different numbers of categories in the candidate partition
- Since learning is incremental, there is a limited number of possible categorization partitions
- If Ai = Vij is independent of class membership (irrelevant), CU = 0
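A minimal Python sketch of this measure follows, assuming the standard form of category utility (expected correct guesses within each category minus the baseline over all instances, weighted by P(C_k) and divided by K). The function name and the list-of-dicts data layout are assumptions made for the example, and every cluster is assumed to be non-empty.

```python
from collections import Counter


def category_utility(partition):
    """CU of a partition: a list of non-empty clusters, each a list of {attribute: value} dicts."""
    instances = [x for cluster in partition for x in cluster]
    n = len(instances)
    attributes = sorted({a for x in instances for a in x})

    def expected_correct(rows):
        # Sum over attributes and values of P(A = V | rows)^2
        total = 0.0
        for a in attributes:
            counts = Counter(x[a] for x in rows if a in x)
            total += sum((c / len(rows)) ** 2 for c in counts.values())
        return total

    baseline = expected_correct(instances)  # guessing without knowing the category
    gain = sum(len(c) / n * (expected_correct(c) - baseline) for c in partition)
    return gain / len(partition)  # the 1/K normalization


red, blue = {"color": "red"}, {"color": "blue"}
print(category_utility([[red, red], [blue, blue]]))  # clean separation: 0.25
print(category_utility([[red, blue], [red, blue]]))  # attribute irrelevant: 0.0
```

The second call illustrates the last bullet: when the attribute's distribution is the same inside every category, knowing the category adds nothing and CU is zero.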

Cobweb Learning Algorithm
1. Incrementally add a new training example
2. Recurse down from the root until a new node containing just this example is added. Update the appropriate probabilities at each level.
3. At each level of the tree, calculate the scores for all valid modifications using category utility (CU)
4. Act according to which of the following gives the best score:
   - Classify into an existing class: then recurse
   - Create a new class node: done, can get the next example
   - Combine two classes into a single class (Merging): then recurse
   - Divide a class into multiple classes (Splitting): then recurse

Cobweb Learning Mechanisms
- Classifying (Matching): calculate the overall CU for each case of putting the example in a node at the current level
- New Class: calculate the overall CU for putting the example into a single new class
  - Note the greedy (hill-climbing) nature: it does not go back and try all possible new partitions
  - If the new class is created under an internal node, simply add it as a child
  - If it is created at a leaf node, split the leaf into two children, one for the new example and one for the old
- These two operators alone are quite order dependent; splitting and merging allow bidirectionality, the ability to undo earlier decisions

Cobweb Learning Mechanisms
- Merging: for the best matching node (the one that would be chosen for classification) and the second best matching node at that level, calculate CU when both are merged into one node with two children
- Splitting: for the best matching node, calculate CU if that node were deleted and its children added to the current level
- Both schemes could be extended to test other nodes, at the cost of increased computational complexity
- Can overcome initial "misconceptions"
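The four operators can be compared at a single tree level with a short sketch like the one below. It is a simplified illustration, not a full Cobweb implementation: clusters are plain lists of instances, `cu` is a compact category-utility helper, `best_operator` and `sub_partitions` are hypothetical names, and the treatment of the new instance after a split (placing it in whichever promoted cluster scores best) is an assumption made here so that every candidate partition contains the new instance.

```python
from collections import Counter


def cu(partition):
    """Category utility of a list of non-empty clusters (lists of {attr: value} dicts)."""
    rows = [x for c in partition for x in c]

    def guess(group):
        counts = Counter((a, v) for x in group for a, v in x.items())
        return sum((k / len(group)) ** 2 for k in counts.values())

    base = guess(rows)
    return sum(len(c) / len(rows) * (guess(c) - base) for c in partition) / len(partition)


def place(clusters, i, new):
    """Copy of clusters with the new instance added to cluster i."""
    return [c + [new] if j == i else c for j, c in enumerate(clusters)]


def best_operator(children, new, sub_partitions):
    """Score classify / new / merge / split at one level (children must be non-empty).

    sub_partitions[i] holds the child clusters of children[i], or [] for a leaf.
    """
    ranked = sorted(((cu(place(children, i, new)), i) for i in range(len(children))),
                    reverse=True)
    best_score, best = ranked[0]
    scores = {("classify", best): best_score,
              ("new",): cu(children + [[new]])}
    if len(ranked) > 1:  # merge the best and second-best hosts
        second = ranked[1][1]
        rest = [c for j, c in enumerate(children) if j not in (best, second)]
        scores[("merge", best, second)] = cu(rest + [children[best] + children[second] + [new]])
    if sub_partitions[best]:  # split: promote the best host's children
        promoted = [c for j, c in enumerate(children) if j != best] + sub_partitions[best]
        scores[("split", best)] = max(cu(place(promoted, i, new)) for i in range(len(promoted)))
    return max(scores, key=scores.get), scores


red, blue, green = {"c": "red"}, {"c": "blue"}, {"c": "green"}
level = [[red, red], [blue, blue]]
print(best_operator(level, red, [[], []])[0])    # ('classify', 0)
print(best_operator(level, green, [[], []])[0])  # ('new',)
```

In a full learner the winning operator would be applied and the procedure repeated one level down, as in the algorithm's "then recurse" steps.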

Cobweb Comments
- Generalization is done by just executing the recursive classification step
- Could use different variations on CU and on the search strategy
- Complexity: O(A V B^2 log K) for each example, where B is the branching factor, A the number of attributes, V the average number of values per attribute, and K the number of classes
- Empirically, B is usually between 2 and 5
- Does not directly handle noise: there is no defined significance mechanism
- Tends to make "bushy" trees; however, the high levels should hold the most important class categories (because merging and splitting cause the best breaks to float up, though with no guarantee of optimality), and one could just use the nodes highest in the tree for classification
- Does not support continuous values

Extensions - Classit
- Cobweb cannot store probability counts for continuous data
- Classit uses a scheme similar to Cobweb, but assumes a normal distribution around an attribute and thus can just store a mean and variance (not always a reasonable assumption)
- Also uses a formal cut-off (significance) mechanism to better support generalization and noise handling (a class node can then include outliers)
- More work needed
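One way a node could keep "just a mean and variance" for a continuous attribute while still learning incrementally is a running update such as Welford's method, sketched below. The choice of Welford's update and the class name `NumericSummary` are assumptions for illustration; the slides do not say how Classit maintains these statistics.

```python
import math


class NumericSummary:
    """Running mean and variance for one continuous attribute (hypothetical helper)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x):
        """Fold in one observation without storing the raw values."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Population variance of the values seen so far (0.0 if none)."""
        return self._m2 / self.n if self.n else 0.0

    @property
    def std(self):
        return math.sqrt(self.variance)


heights = NumericSummary()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    heights.add(value)
print(heights.mean, heights.variance, heights.std)
```

This plays the same role for continuous attributes that value counts play for nominal ones: each node holds a fixed-size summary that is updated as instances arrive.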