Data Mining By Farzana Forhad CS 157B. Agenda Decision Tree and ID3 Rough Set Theory Clustering.

Introduction
Data mining is a component of a wider process called knowledge discovery from databases.
The basic foundations of data mining:
–decision tree
–association rules
–clustering
–other statistical techniques

Decision Tree
ID3 (Quinlan 1986) represents concepts as decision trees.
A decision tree is a classifier in the form of a tree structure where each node is either:
–a leaf node, indicating a class of instances, or
–a decision node, which specifies a test to be carried out on a single attribute value, with one branch and a sub-tree for each possible outcome of the test

Decision Tree
The set of records available for classification is divided into two disjoint subsets:
–a training set: used for deriving the classifier
–a test set: used to measure the accuracy of the classifier
Attributes whose domain is numerical are called numerical attributes.
Attributes whose domain is not numerical are called categorical attributes.

Decision Tree
A decision tree is a tree with the following properties:
–An inner node represents an attribute
–An edge represents a test on the attribute of the parent node
–A leaf represents one of the classes
Construction of a decision tree:
–Based on the training data
–Top-down strategy

Training Dataset

Test Dataset

Decision Tree
RULE 1: If it is sunny and the humidity is not above 75%, then play.
RULE 2: If it is sunny and the humidity is above 75%, then do not play.
RULE 3: If it is overcast, then play.
RULE 4: If it is rainy and not windy, then play.
RULE 5: If it is rainy and windy, then do not play.
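
The five rules above can be written directly as nested tests, one per path from the root of the tree to a leaf. The following is a minimal sketch; the function name `play_decision`, the parameter names, and the string labels are illustrative assumptions, not part of the slides.

```python
def play_decision(outlook: str, humidity: float, windy: bool) -> str:
    """Classify one record using the five rules read off the decision tree."""
    if outlook == "sunny":
        # Rules 1 and 2: the sunny branch tests humidity against 75%.
        return "play" if humidity <= 75 else "do not play"
    if outlook == "overcast":
        # Rule 3: overcast is a leaf.
        return "play"
    if outlook == "rainy":
        # Rules 4 and 5: the rainy branch tests wind.
        return "do not play" if windy else "play"
    raise ValueError(f"unknown outlook: {outlook!r}")


print(play_decision("sunny", 80, False))
print(play_decision("rainy", 60, False))
```

Each `if` corresponds to a decision node and each `return` to a leaf, which is why a tree with five leaves yields exactly five rules.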

Training Dataset

Decision Tree for Zip Code and Age

Iterative Dichotomizer 3 (ID3)
Quinlan (1986)
Each node corresponds to a splitting attribute
–Entropy is used to measure how informative a node is.
–The algorithm uses the criterion of information gain to determine the goodness of a split.
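
The slide names entropy and information gain without giving the formulas. A minimal sketch, assuming the usual definitions (Shannon entropy of the class labels, and gain as the entropy before a split minus the weighted entropy after it); the toy records and the attribute names `outlook`, `windy`, and `play` are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of `target` minus the weighted entropy after splitting on `attribute`."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical toy records, not taken from the slides.
rows = [
    {"outlook": "sunny", "windy": True, "play": "no"},
    {"outlook": "sunny", "windy": False, "play": "no"},
    {"outlook": "overcast", "windy": True, "play": "yes"},
    {"outlook": "rainy", "windy": False, "play": "yes"},
]

# ID3 picks the attribute with the largest gain as the splitting attribute.
best = max(["outlook", "windy"], key=lambda a: information_gain(rows, a, "play"))
print(best)
```

On this toy data, splitting on `outlook` separates the classes completely while splitting on `windy` does not separate them at all, so ID3 would choose `outlook` for the root.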

Iterative Dichotomizer 3 (ID3)

Rough Set Theory
–A useful means for discovering patterns, rules, and knowledge in data
–A rough set is the approximation of a vague concept by a pair of precise concepts, called the lower and upper approximations.

Rough Set Theory
–The lower approximation is a description of the domain objects which are known with certainty to belong to the subset of interest.
–The upper approximation is a description of the objects which may possibly belong to the subset.
–Any subset defined through its lower and upper approximations is called a rough set if the boundary region (upper minus lower approximation) is not empty.
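
The two approximations can be computed by grouping objects that are indistinguishable on the chosen attributes. This is a minimal sketch assuming the standard construction via indiscernibility classes, which the slides do not spell out; the function name `approximations` and the patient records are hypothetical.

```python
from collections import defaultdict


def approximations(objects, attributes, target):
    """Return (lower, upper) approximations of `target`.

    objects: dict mapping an object name to a dict of attribute values.
    target:  set of object names forming the subset of interest.
    """
    # Objects with identical values on `attributes` cannot be told apart.
    classes = defaultdict(set)
    for name, values in objects.items():
        classes[tuple(values[a] for a in attributes)].add(name)

    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:   # whole class inside the target: certain members
            lower |= block
        if block & target:    # class overlaps the target: possible members
            upper |= block
    return lower, upper


# Hypothetical records, not taken from the slides.
patients = {
    "p1": {"headache": "yes", "temperature": "high"},
    "p2": {"headache": "yes", "temperature": "high"},
    "p3": {"headache": "no", "temperature": "normal"},
    "p4": {"headache": "yes", "temperature": "normal"},
}
flu = {"p1", "p4"}

lower, upper = approximations(patients, ["headache", "temperature"], flu)
boundary = upper - lower
print(sorted(lower), sorted(upper), sorted(boundary))
```

Here p1 and p2 have identical attribute values but only p1 is in the target, so both fall in the boundary region; because that region is not empty, the target is a rough set with respect to these attributes.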

Lower and Upper Approximations of a Rough Set

Association Rule Mining
Basket Analysis

Definition of Association Rules

Mining the Rules

Two Steps of Association Rule Mining

Clustering
Clustering: the process of organizing objects into groups whose members are similar in some way
Statistics, machine learning, and database researchers have studied data clustering
Recent emphasis on large datasets

Different Approaches to Clustering
Two main approaches to clustering:
–partitioning clustering
–hierarchical clustering
Clustering algorithms differ among themselves in the following ways:
–in their ability to handle different types of attributes (numeric and categorical)
–in accuracy of clustering
–in their ability to handle disk-resident data

Problem Statement
N objects to be grouped in k clusters
Number of different possibilities:
The objective is to find a grouping such that the distance between objects within a group is minimal
Several algorithms find a near-optimal solution
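
The formula for the number of possibilities did not survive in the transcript. Assuming it counts the ways to partition N objects into k non-empty, unlabeled clusters, that count is the Stirling number of the second kind, which the sketch below evaluates with the standard inclusion-exclusion sum. The function name `count_clusterings` is an illustrative assumption.

```python
from math import comb, factorial


def count_clusterings(n: int, k: int) -> int:
    """Stirling number of the second kind: partitions of n objects into k non-empty groups."""
    total = sum((-1) ** (k - i) * comb(k, i) * i ** n for i in range(k + 1))
    return total // factorial(k)


print(count_clusterings(9, 2))
print(count_clusterings(25, 4))
```

The count grows very quickly with N, which is why exhaustive search is impractical and heuristic algorithms that find near-optimal groupings are used instead.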

k-Means Algorithm
1. Randomly select k points to be the starting points for the centroids of the k clusters.
2. Assign each object to the centroid closest to the object, forming k exclusive clusters of examples.
3. Calculate new centroids of the clusters: take the average of all the attribute values of the objects belonging to the same cluster.
4. Check if the cluster centroids have changed their coordinates. If yes, repeat from Step 2. If no, cluster detection is finished, and all objects have their cluster memberships defined.
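
The four steps above can be sketched for one-dimensional data as follows. To keep the run reproducible, this sketch takes the initial centroids as an argument instead of drawing them at random (Step 1); the function name `k_means_1d`, the `max_iter` safety limit, and the nine sample values are assumptions, since the slide tables are not in the transcript.

```python
def k_means_1d(points, initial_centroids, max_iter=100):
    """Cluster 1-D points; returns (centroids, clusters)."""
    centroids = list(initial_centroids)
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Step 2: assign each object to its closest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Step 3: the new centroid is the mean of the cluster's members
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            sum(members) / len(members) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        # Step 4: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical data: nine values, with the first two used as initial centroids.
points = [2, 4, 10, 12, 3, 20, 30, 11, 25]
centroids, clusters = k_means_1d(points, points[:2])
print(centroids, clusters)
```

Different starting centroids can lead to different final clusters, which is why the result is described as near-optimal and not guaranteed optimal.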

Example
One-dimensional database with N = 9
Objects labeled z1 … z9
Let k = 2
Let us start with z1 and z2 as the initial centroids
Table: One-dimensional database

Example Table: New cluster assignments

Example Table: Reassignment of objects to two clusters

Questions? Thank You