Ch10 Machine Learning: Symbol-Based

Ch10 Machine Learning: Symbol-Based Dr. Bernard Chen Ph.D. University of Central Arkansas Spring 2011

Machine Learning Outline The book presents four chapters on machine learning, reflecting four approaches to the problem: Symbol Based Connectionist Genetic/Evolutionary Stochastic

Ch.10 Outline A framework for Symbol-Based Learning ID3 Decision Tree Unsupervised Learning

The Framework for Symbol-Based Learning

The Framework Example Data The representation: Size(small)^color(red)^shape(round) Size(large)^color(red)^shape(round)

The Framework Example A set of operations: Based on Size(small)^color(red)^shape(round), replacing a single constant with a variable produces the generalizations: Size(X)^color(red)^shape(round) Size(small)^color(X)^shape(round) Size(small)^color(red)^shape(X)
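A minimal sketch in Python (my own code, not from the slides) of this substitution operator, representing a conjunctive description as an attribute-to-value mapping and writing a variable as "X":

def generalizations(description):
    # Yield every description obtained by turning one constant into a variable.
    for attr in description:
        gen = dict(description)   # copy so the original description is untouched
        gen[attr] = "X"           # replace this attribute's constant with a variable
        yield gen

example = {"size": "small", "color": "red", "shape": "round"}
for g in generalizations(example):
    print(g)
# {'size': 'X', 'color': 'red', 'shape': 'round'}
# {'size': 'small', 'color': 'X', 'shape': 'round'}
# {'size': 'small', 'color': 'red', 'shape': 'X'}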

The Framework Example The concept space The learner must search this space to find the desired concept. The complexity of this concept space is a primary measure of the difficulty of a learning problem

The Framework Example

The Framework Example Heuristic search: Based on Size(small)^color(red)^shape(round) The learner will make that example a candidate “ball” concept; this concept correctly classifies the only positive instance If the algorithm is given a second positive instance Size(large)^color(red)^shape(round) The learner may generalize the candidate “ball” concept to Size(Y)^color(red)^shape(round)
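One way to code this generalization step, sketched under the assumption that a concept and an instance are both attribute-to-value mappings (the helper name generalize is illustrative, not the book's):

def generalize(concept, positive_instance):
    # Variabilize every attribute on which the concept and the new positive instance disagree.
    return {attr: (val if positive_instance.get(attr) == val else "X")
            for attr, val in concept.items()}

candidate = {"size": "small", "color": "red", "shape": "round"}
second_positive = {"size": "large", "color": "red", "shape": "round"}
print(generalize(candidate, second_positive))
# {'size': 'X', 'color': 'red', 'shape': 'round'}  i.e. Size(X)^color(red)^shape(round)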

Learning process The training data is a series of positive and negative examples of the concept: examples of blocks-world structures that fit the category, along with near misses. The latter are instances that almost belong to the category but fail on one property or relation.

Examples and near misses for the concept arch

Examples and near misses for the concept arch

Examples and near misses for the concept arch

Examples and near misses for the concept arch

Learning process This approach was proposed by Patrick Winston (1975) The program performs a hill-climbing search on the concept space guided by the training data Because the program does not backtrack, its performance is highly sensitive to the order of the training examples A bad order can lead the program to dead ends in the search space

Ch.10 Outline A framework for Symbol-Based Learning ID3 Decision Tree Unsupervised Learning

ID3 Decision Tree ID3, like candidate elimination, induces concepts from examples It is particularly interesting for Its representation of learned knowledge Its approach to the management of complexity Its heuristic for selecting candidate concepts Its potential for handling noisy data

ID3 Decision Tree

ID3 Decision Tree The previous table can be represented as the following decision tree:

ID3 Decision Tree In a decision tree, each internal node represents a test on some property Each possible value of that property corresponds to a branch of the tree Leaf nodes represent classifications, such as low or moderate risk
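A small data-structure sketch (my own, not from the slides) of this representation: an internal node stores the property it tests and one subtree per value, and a leaf stores a classification such as "low" or "moderate":

from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Node:
    test_property: Optional[str] = None                          # None for a leaf
    branches: Dict[str, "Node"] = field(default_factory=dict)    # property value -> subtree
    label: Optional[str] = None                                  # classification at a leaf

    def classify(self, example: Dict[str, str]) -> str:
        if self.label is not None:
            return self.label
        return self.branches[example[self.test_property]].classify(example)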

ID3 Decision Tree A simplified decision tree for credit risk management

ID3 Decision Tree ID3 constructs decision trees in a top-down fashion. ID3 selects a property to test at the current node of the tree and uses this test to partition the set of examples The algorithm recursively constructs a sub-tree for each partition This continues until all members of the partition are in the same class

ID3 Decision Tree For example, ID3 selects income as the root property for the first step

ID3 Decision Tree

ID3 Decision Tree How to select the 1st node? (and the following nodes) ID3 measures the information gained by making each property the root of the current subtree It picks the property that provides the greatest information gain

ID3 Decision Tree If we assume that all the examples in the table occur with equal probability, then: P(risk is high)=6/14 P(risk is moderate)=3/14 P(risk is low)=5/14

ID3 Decision Tree Based on these probabilities, the information in the initial set of examples is I[6,3,5] = -(6/14)log2(6/14) - (3/14)log2(3/14) - (5/14)log2(5/14) = 1.531 bits

ID3 Decision Tree

ID3 Decision Tree The information gain from income is: Gain(income) = I[6,3,5] - E[income] = 1.531 - 0.564 = 0.967 Similarly, Gain(credit history)=0.266 Gain(debt)=0.063 Gain(collateral)=0.206

ID3 Decision Tree Since income provides the greatest information gain, ID3 will select it as the root of the tree

Attribute Selection Measure: Information Gain (ID3/C4.5) Select the attribute with the highest information gain Let pi be the probability that an arbitrary tuple in D belongs to class Ci, estimated by |Ci,D|/|D| Expected information (entropy) needed to classify a tuple in D: Info(D) = -Σi pi log2(pi)

Attribute Selection Measure: Information Gain (ID3/C4.5) Information needed (after using A to split D into v partitions) to classify D: InfoA(D) = Σj (|Dj|/|D|) × Info(Dj) Information gained by branching on attribute A: Gain(A) = Info(D) - InfoA(D)
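These two definitions are easy to check numerically; a short Python sketch (my own code) that reproduces the I[6,3,5] = 1.531 figure used earlier:

from math import log2

def info(counts):
    # Expected information (entropy) for a list of class counts, e.g. [6, 3, 5].
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def gain(total_counts, partitions):
    # Gain(A) = Info(D) - Info_A(D); `partitions` lists the class counts in each branch.
    n = sum(total_counts)
    info_a = sum(sum(p) / n * info(p) for p in partitions)
    return info(total_counts) - info_a

print(round(info([6, 3, 5]), 3))   # 1.531, matching the credit-risk example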

ID3 Decision Tree Pseudo Code
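The pseudocode itself is not reproduced in this transcript; the following is a sketch of the standard ID3 recursion (my own rendering, not the slide's listing), with examples represented as property-to-value dictionaries and target naming the class property:

from collections import Counter
from math import log2

def entropy(examples, target):
    counts = Counter(e[target] for e in examples)
    total = len(examples)
    return -sum(c / total * log2(c / total) for c in counts.values())

def id3(examples, properties, target):
    classes = {e[target] for e in examples}
    if len(classes) == 1:                        # all examples in one class: make a leaf
        return classes.pop()
    if not properties:                           # no properties left: majority-class leaf
        return Counter(e[target] for e in examples).most_common(1)[0][0]

    def remainder(p):                            # expected information after splitting on p
        return sum(len(sub) / len(examples) * entropy(sub, target)
                   for v in {e[p] for e in examples}
                   for sub in [[e for e in examples if e[p] == v]])

    best = min(properties, key=remainder)        # smallest remainder = greatest information gain
    subtree = {}
    for v in {e[best] for e in examples}:
        subset = [e for e in examples if e[best] == v]
        subtree[v] = id3(subset, [p for p in properties if p != best], target)
    return {best: subtree}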

Another Decision Tree Example

Decision Tree Example Info(Tenured) = I(3,3) = -(3/6)log2(3/6) - (3/6)log2(3/6) = 1 To compute a base-2 logarithm by hand, use log2(x) = log(x)/log(2); for example, log2(12) = 1.07918/0.30103 = 3.584958 Tutorial on logarithms: http://www.ehow.com/how_5144933_calculate-log.html Convenient tool: http://web2.0calc.com/
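To check that log arithmetic (an illustrative one-liner using Python's math module):

import math
print(math.log10(12) / math.log10(2))   # 3.5849625..., i.e. log2(12) via common logs
print(math.log2(12))                    # the same value computed directly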

Decision Tree Example InfoRANK(Tenured) = 3/6 I(1,2) + 2/6 I(1,1) + 1/6 I(1,0) = 3/6 * (0.918) + 2/6 * (1) + 1/6 * (0) = 0.79 3/6 I(1,2) means "Assistant Prof" accounts for 3 of the 6 samples, with 1 yes and 2 no's. 2/6 I(1,1) means "Associate Prof" accounts for 2 of the 6 samples, with 1 yes and 1 no. 1/6 I(1,0) means "Professor" accounts for 1 of the 6 samples, with 1 yes and 0 no's.

Decision Tree Example InfoYEARS(Tenured) = 1/6 I(1,0) + 2/6 I(0,2) + 1/6 I(0,1) + 2/6 I(2,0) = 0 1/6 I(1,0) means "years=2" accounts for 1 of the 6 samples, with 1 yes and 0 no's. 2/6 I(0,2) means "years=3" accounts for 2 of the 6 samples, with 0 yes's and 2 no's. 1/6 I(0,1) means "years=6" accounts for 1 of the 6 samples, with 0 yes's and 1 no. 2/6 I(2,0) means "years=7" accounts for 2 of the 6 samples, with 2 yes's and 0 no's.
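A quick way to verify both weighted sums above (my own helper, with I(yes, no) as the two-class entropy):

from math import log2

def I(yes, no):
    total = yes + no
    return -sum(c / total * log2(c / total) for c in (yes, no) if c > 0)

info_rank  = 3/6 * I(1, 2) + 2/6 * I(1, 1) + 1/6 * I(1, 0)
info_years = 1/6 * I(1, 0) + 2/6 * I(0, 2) + 1/6 * I(0, 1) + 2/6 * I(2, 0)
print(round(info_rank, 2), info_years)   # 0.79 0.0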

Ch.10 Outline A framework for Symbol-Based Learning ID3 Decision Tree Unsupervised Learning

Unsupervised Learning The learning algorithms discussed so far implement forms of supervised learning They assume the existence of a teacher, some fitness measure, or other external method of classifying training instances Unsupervised learning eliminates the teacher and requires that the learners form and evaluate concepts on their own

Unsupervised Learning Science is perhaps the best example of unsupervised learning in humans Scientists do not have the benefit of a teacher. Instead, they propose hypotheses to explain observations and evaluate those hypotheses against further observations

Unsupervised Learning The clustering problem starts with (1) a collection of unclassified objects and (2) a means for measuring the similarity of objects The goal is to organize the objects into classes that meet some standard of quality, such as maximizing the similarity of objects in the same class

Unsupervised Learning Numeric taxonomy is one of the oldest approaches to the clustering problem A reasonable similarity metric treats each object as a point in n-dimensional space The similarity of two objects is the Euclidean distance between them in this space

Unsupervised Learning Using this similarity metric, a common clustering algorithm builds clusters in a bottom-up fashion, also known as agglomerative clustering: Examine all pairs of objects, select the pair with the highest degree of similarity, and mark that pair as a cluster Define the features of the cluster as some function (such as the average) of the features of the component members, and then replace the component objects with this cluster definition Repeat this process on the collection of objects until all objects have been reduced to a single cluster
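A compact sketch of this bottom-up procedure (my own code, using Euclidean distance as the similarity measure and the average as the cluster definition):

import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerate(points):
    objects = [tuple(p) for p in points]
    while len(objects) > 1:
        # select the pair of objects with the highest similarity (smallest distance)
        i, j = min(((i, j) for i in range(len(objects))
                           for j in range(i + 1, len(objects))),
                   key=lambda ij: euclidean(objects[ij[0]], objects[ij[1]]))
        # define the cluster's features as the average of its members' features
        merged = tuple((a + b) / 2 for a, b in zip(objects[i], objects[j]))
        print(f"merge {objects[i]} and {objects[j]} -> {merged}")
        objects = [o for k, o in enumerate(objects) if k not in (i, j)] + [merged]
    return objects[0]

agglomerate([(1, 1), (1.5, 1), (5, 5), (5, 6), (9, 9)])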

Unsupervised Learning The result of this algorithm is a Binary Tree whose leaf nodes are instances and whose internal nodes are clusters of increasing size We may also extend this algorithm to objects represented as sets of symbolic features.

Unsupervised Learning Object1={small, red, rubber, ball} Object2={small, blue, rubber, ball} Object3={large, black, wooden, ball} This metric would compute the similarity values: Similarity(object1, object2) = 3/4 Similarity(object1, object3) = 1/4
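The slides do not state the symbolic metric explicitly; one simple choice that reproduces the 3/4 and 1/4 values is the fraction of properties the two objects have in common:

object1 = {"small", "red", "rubber", "ball"}
object2 = {"small", "blue", "rubber", "ball"}
object3 = {"large", "black", "wooden", "ball"}

def similarity(a, b):
    # proportion of shared properties (each object here has four properties)
    return len(a & b) / max(len(a), len(b))

print(similarity(object1, object2))   # 0.75
print(similarity(object1, object3))   # 0.25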

Partitioning Algorithms: Basic Concept Given a k, find a partition of k clusters that optimizes the chosen partitioning criterion Global optimal: exhaustively enumerate all partitions Heuristic methods: k-means and k-medoids algorithms k-means (MacQueen’67): Each cluster is represented by the center of the cluster k-medoids or PAM (Partition around medoids) (Kaufman & Rousseeuw’87): Each cluster is represented by one of the objects in the cluster

The K-Means Clustering Method Given k, the k-means algorithm is implemented in four steps: Partition objects into k nonempty subsets Compute seed points as the centroids of the clusters of the current partition (the centroid is the center, i.e., mean point, of the cluster) Assign each object to the cluster with the nearest seed point Go back to Step 2; stop when no new assignments are made
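A compact sketch of these four steps for points in the plane (my own code; here the initial seed points are simply the first k objects):

import math

def kmeans(points, k, max_iters=100):
    centroids = list(points[:k])                      # arbitrary initial cluster centers
    for _ in range(max_iters):
        clusters = [[] for _ in range(k)]
        for p in points:                              # assign each object to the nearest seed point
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = [tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centroids[i]
                         for i, cl in enumerate(clusters)]
        if new_centroids == centroids:                # stop when no assignments change
            return centroids, clusters
        centroids = new_centroids
    return centroids, clusters

centers, groups = kmeans([(1, 1), (1, 2), (0, 1), (8, 8), (9, 8)], k=2)
print(centers)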

K-means Clustering

K-means Clustering

K-means Clustering

K-means Clustering

K-means Clustering

The K-Means Clustering Method (diagram): with K=2, arbitrarily choose K objects as initial cluster centers, assign each object to the most similar center, update the cluster means, reassign objects, and repeat until no object changes cluster.

Example Run K-means clustering with 3 clusters (initial centroids: 3, 16, 25) for at least 2 iterations

Example Iteration 1 – the cluster with centroid 3 contains the points 2, 3, 4, 7, 9; new centroid: 5

Example Iteration 2 – the cluster with centroid 5 again contains 2, 3, 4, 7, 9; new centroid: 5
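As a quick check of the arithmetic (the full data set is not reproduced in this transcript, so only this cluster's points are used):

points = [2, 3, 4, 7, 9]            # the points assigned to the centroid-3 cluster
print(sum(points) / len(points))    # 5.0 -> the new centroid, unchanged on the second pass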

In-Class Practice Run K-means clustering with 3 clusters (initial centroids: 3, 12, 19) for at least 2 iterations