1 Lecture 5: Automatic cluster detection Lecture 6: Artificial neural networks Lecture 7: Evaluation of discovered knowledge Brief introduction to lectures.


1 Lecture 5: Automatic cluster detection
Lecture 6: Artificial neural networks
Lecture 7: Evaluation of discovered knowledge
Brief introduction to lectures
Transparencies prepared by Ho Tu Bao [JAIST]

2 Lecture 5: Automatic Cluster Detection
One of the most widely used KDD classification techniques for unsupervised data.
Content of the lecture
1. Introduction
2. Partitioning Clustering
3. Hierarchical Clustering
4. Software and case-studies
Prerequisite: Nothing special

3 Partitioning Clustering
Each cluster must contain at least one object.
Each object must belong to exactly one group.

4 Partitioning Clustering
What is a “good” partitioning clustering?
Key ideas: Objects within each group are similar, and objects in different groups are dissimilar. Minimize the within-group distance and maximize the between-group distance.
Notice: There are many ways to define the “within-group distance” (the average distance to the group’s center, the average distance between all pairs of objects in the group, etc.) and the “between-group distance”. It is in general impossible to find the optimal clustering.
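The two criteria above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the "average distance to the group's center" variant for the within-group distance and the average distance between group centers for the between-group distance, both of which are just one of the many possible definitions. The function names and the toy data are hypothetical and do not come from the lecture.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def within_group_distance(clusters):
    """Average distance from each object to the center of its own group."""
    total = 0.0
    count = 0
    for group in clusters:
        center = centroid(group)
        total += sum(dist(p, center) for p in group)
        count += len(group)
    return total / count


def between_group_distance(clusters):
    """Average distance between the centers of all pairs of groups.

    Assumes at least two groups.
    """
    centers = [centroid(group) for group in clusters]
    pairs = [(a, b) for i, a in enumerate(centers) for b in centers[i + 1:]]
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


# Two partitions of the same four objects (made-up data).
good = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
bad = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 2.0), (10.0, 2.0)]]

for name, clusters in (("good", good), ("bad", bad)):
    print(name, within_group_distance(clusters), between_group_distance(clusters))
```

Under these definitions the first partition has the smaller within-group distance and the larger between-group distance, which is what the key ideas ask for.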

5 Hierarchical Clustering
A hierarchical clustering is a sequence of partitions in which each partition is nested into the next partition in the sequence.
Partition Q is nested into partition P if every component of Q is a subset of a component of P.
(This definition is for bottom-up hierarchical clustering. In the case of top-down hierarchical clustering, “next” becomes “previous”.)
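The nesting definition translates almost directly into code. The following is a minimal sketch, assuming partitions are represented as lists of Python sets; the names `is_nested` and `is_hierarchy` and the example sequence are illustrative, not part of the lecture.

```python
def is_nested(q, p):
    """True if every component of partition q is a subset of some component of p."""
    return all(any(component <= other for other in p) for component in q)


def is_hierarchy(sequence):
    """True if each partition is nested into the next one (bottom-up order)."""
    return all(is_nested(a, b) for a, b in zip(sequence, sequence[1:]))


# A bottom-up sequence of partitions over four objects (made-up example).
sequence = [
    [{"x1"}, {"x2"}, {"x3"}, {"x4"}],
    [{"x1", "x2"}, {"x3"}, {"x4"}],
    [{"x1", "x2"}, {"x3", "x4"}],
    [{"x1", "x2", "x3", "x4"}],
]

print(is_hierarchy(sequence))  # each partition is nested into the next
print(is_nested([{"x1", "x3"}, {"x2"}], [{"x1", "x2"}, {"x3"}]))  # {"x1", "x3"} fits no component
```

For a top-down clustering the same check applies to the reversed sequence.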

6 Bottom-up Hierarchical Clustering
[Figure: bottom-up clustering of objects x1, x2, x3, x4, x5, x6]
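A bottom-up procedure can be sketched as follows. The slide does not say how the distance between two groups is measured, so this example assumes single linkage (the distance between the closest pair of members) on one-dimensional values; the function names and the six values are hypothetical stand-ins for x1 to x6.

```python
def single_link(a, b):
    """Distance between two groups: the closest pair of members (an assumption)."""
    return min(abs(x - y) for x in a for y in b)


def bottom_up(values):
    """Start with one group per object and repeatedly merge the two closest groups.

    Returns the whole sequence of partitions, from singletons to one group.
    """
    partition = [[v] for v in values]
    history = [[list(group) for group in partition]]
    while len(partition) > 1:
        # Find the indices of the two closest groups.
        i, j = min(
            ((i, j) for i in range(len(partition)) for j in range(i + 1, len(partition))),
            key=lambda ij: single_link(partition[ij[0]], partition[ij[1]]),
        )
        merged = partition[i] + partition[j]
        partition = [g for k, g in enumerate(partition) if k not in (i, j)] + [merged]
        history.append([list(group) for group in partition])
    return history


values = [1.0, 1.5, 5.0, 7.0, 20.0, 30.0]  # made-up stand-ins for x1..x6
history = bottom_up(values)
for partition in history:
    print(partition)
```

Each partition in `history` is nested into the one after it, so the output is a hierarchical clustering in the sense defined on the previous slide.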

7 Top-Down Hierarchical Clustering
[Figure: top-down clustering of objects x1, x2, x3, x4, x5, x6]

8 OSHAM: Hybrid Model
[Figure: OSHAM applied to the Wisconsin Breast Cancer Data. Panel labels: Attributes; Brief Description of Concepts; Concept Hierarchy; Multiple Inheritance Concepts; Discovered Concepts]

9 Brief introduction to lectures
Lecture 1: Overview of KDD
Lecture 2: Preparing data
Lecture 3: Decision tree induction
Lecture 4: Mining association rules
Lecture 5: Automatic cluster detection
Lecture 6: Artificial neural networks
Lecture 7: Evaluation of discovered knowledge

10 Lecture 6: Neural networks
One of the most widely used KDD classification techniques.
Content of the lecture
1. Neural network representation
2. Feed-forward neural networks
3. Using back-propagation algorithm
4. Case-studies
Prerequisite: Nothing special

12 Lecture 7: Evaluation of discovered knowledge
One of the most widely used KDD classification techniques.
Content of the lecture
1. Cross validation
2. Bootstrapping
3. Case-studies
Prerequisite: Nothing special

13 Out-of-sample testing
[Figure: Historical data (warehouse) -> sampling method -> sample data -> sampling method -> training data (2/3) and testing data (1/3); training data -> induction method -> model; model and testing data -> error estimation -> error]
The quality of the test-sample estimate depends on the number of test cases and on the validity of the independence assumption.
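The 2/3 and 1/3 split can be sketched as below. The slide does not name an induction method, so a trivial majority-class predictor stands in for it here; that predictor, the function names, and the toy cases are all assumptions made for illustration.

```python
import random


def holdout_split(cases, train_fraction=2 / 3, seed=0):
    """Shuffle the cases and split them into training and testing data."""
    rng = random.Random(seed)
    shuffled = cases[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def induce_majority(training):
    """Placeholder induction method: always predict the most frequent label."""
    labels = [label for _, label in training]
    winner = max(sorted(set(labels)), key=labels.count)
    return lambda x: winner


def error_rate(model, testing):
    """Fraction of test cases the model gets wrong."""
    wrong = sum(1 for x, label in testing if model(x) != label)
    return wrong / len(testing)


# Made-up sample: 30 cases, each a (feature, label) pair.
cases = [(i, "yes" if i % 3 else "no") for i in range(30)]

training, testing = holdout_split(cases)
model = induce_majority(training)
error = error_rate(model, testing)
print(len(training), len(testing), error)
```

With only 10 test cases the estimate is coarse, which is the point the slide makes about the number of test cases.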

14 Cross Validation
[Figure: Historical data (warehouse) -> sampling method -> sample data -> sampling method -> Sample 1, Sample 2, ..., Sample n (mutually exclusive, equal size); induction method -> model -> error estimation -> run's error; iterate over the samples]
10-fold cross validation appears adequate (n = 10).

15 Evaluation: k-fold cross validation (k=3)
Given: a data set and a method to be evaluated.
1. Randomly split the data set into 3 subsets of equal size.
2. Run the method on each pair of subsets as training data to find knowledge.
3. Test on the remaining subset as testing data to evaluate the accuracy.
4. Average the accuracies as the final evaluation.
[Figure: three runs over subsets 1, 2, 3; each run trains on two subsets and tests on the third]
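The procedure on this slide can be sketched in Python as follows. The "method to be evaluated" is not specified in the lecture, so a simple threshold learner is assumed here purely as a placeholder; the function names and the 12 toy cases are likewise hypothetical. Subsets come out exactly equal in size only when the number of cases is divisible by k.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Randomly split indices 0..n-1 into k mutually exclusive subsets."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]


def induce_threshold(training):
    """Placeholder method: learn a cut point between the two classes."""
    lows = [x for x, label in training if not label]
    highs = [x for x, label in training if label]
    cut = (max(lows) + min(highs)) / 2
    return lambda x: x >= cut


def cross_validate(cases, induce, k=3, seed=0):
    """Train on k-1 subsets, test on the remaining one, and average the accuracies."""
    accuracies = []
    for held_out in k_fold_indices(len(cases), k, seed):
        held = set(held_out)
        training = [c for i, c in enumerate(cases) if i not in held]
        testing = [cases[i] for i in held_out]
        model = induce(training)
        correct = sum(1 for x, label in testing if model(x) == label)
        accuracies.append(correct / len(testing))
    return sum(accuracies) / k, accuracies


# Made-up data set: 12 cases, labelled True when the feature is at least 6.
cases = [(x, x >= 6) for x in range(12)]

folds = k_fold_indices(len(cases), 3)
mean_accuracy, per_fold = cross_validate(cases, induce_threshold, k=3)
print(per_fold, mean_accuracy)
```

Setting `k=10` gives the 10-fold variant mentioned on the previous slide, provided the data set is large enough for every training set to contain both classes.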

16 Outline of the presentation
Objectives, Prerequisite and Content
Brief Introduction to Lectures
Discussion and Conclusion
This presentation summarizes the content and organization of lectures in module “Knowledge Discovery and Data Mining”

