Presentation on theme: "Data Mining and Decision Tree CS157B Spring 2006 Masumi Shimoda."— Presentation transcript:
Data Mining and Decision Tree CS157B Spring 2006 Masumi Shimoda
Outline Brief introduction to data mining: definition, objective, application. Decision tree.
What is Data Mining? The process of automatically finding relationships and patterns in, and extracting meaning from, enormous amounts of data. Also called “knowledge discovery”.
Objective Extracting hidden or not easily recognizable knowledge from large data … know the past. Predicting what is likely to happen if a particular type of event occurs … predict the future.
Application Marketing example: a company sends direct mail to randomly chosen people. A database of the recipients’ attribute data (e.g. gender, marital status, number of children) is available. How can this company increase the response rate of its direct mail?
Application (Cont’d) Figure out the patterns and relationships among the attributes that those who responded have in common. This helps the company decide what kind of group of people it should target.
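One simple way to look for such a pattern is to compare response rates across the values of a single attribute. The sketch below assumes a tiny, made-up set of mailing records; the records, the field order, and the helper name `response_rate_by` are all hypothetical and only illustrate the idea.

```python
from collections import defaultdict

# Hypothetical mailing records: (gender, marital_status, num_children, responded)
records = [
    ("F", "married", 2, True),
    ("F", "single", 0, False),
    ("M", "married", 1, True),
    ("M", "single", 0, False),
    ("F", "married", 0, True),
    ("M", "married", 3, False),
]

def response_rate_by(records, index):
    """Return {attribute value: fraction of recipients who responded}."""
    sent = defaultdict(int)
    responded = defaultdict(int)
    for row in records:
        value = row[index]
        sent[value] += 1
        if row[-1]:  # last field is the "responded" flag
            responded[value] += 1
    return {value: responded[value] / sent[value] for value in sent}

# Compare response rates across marital status (field index 1).
rates_by_marital_status = response_rate_by(records, 1)
```

In this made-up data, married recipients respond far more often than single ones, which is the kind of difference that would suggest whom to target.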
Data mining helps with analyzing large amounts of data and making decisions… but how exactly does it work? One commonly used method is the decision tree.
Decision Tree One of many methods to perform data mining, particularly classification. Divides the dataset into multiple groups by evaluating attributes. A decision tree can be explained as a series of nested if-then-else statements.
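The nested if-then-else view can be written out directly. This is a minimal sketch for the direct mail example; the attributes tested, the thresholds, and the class names "target" and "skip" are assumptions made for illustration, not a tree learned from real data.

```python
def classify_recipient(marital_status, num_children):
    """Hypothetical hand-written decision tree: returns 'target' or 'skip'."""
    if marital_status == "married":      # root node tests marital status
        if num_children > 0:             # inner node tests number of children
            return "target"              # leaf: class "target"
        else:
            return "skip"                # leaf: class "skip"
    else:
        return "skip"                    # leaf: class "skip"
```

Each `if` corresponds to a non-leaf node, and each `return` corresponds to a leaf.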
Decision Tree (Cont’d) Each non-leaf node has an associated predicate that tests an attribute of the data. Each leaf node represents a class, or category. To classify a data item, start from the root node and traverse down the tree by testing predicates and taking branches.
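The node structure and the traversal described above could be sketched as follows. The `Node` class, its field names, and the example tree are hypothetical; the tree is built by hand here rather than learned from data.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

@dataclass
class Node:
    # A leaf node carries a class label; a non-leaf node carries a predicate
    # and one child per discrete predicate outcome.
    label: Optional[str] = None
    predicate: Optional[Callable[[dict], str]] = None
    branches: Dict[str, "Node"] = field(default_factory=dict)

def classify(node, item):
    """Start at the root and follow branches until a leaf is reached."""
    while node.label is None:
        outcome = node.predicate(item)   # test an attribute of the item
        node = node.branches[outcome]    # take the matching branch
    return node.label

# Hypothetical tree for the direct mail example.
tree = Node(
    predicate=lambda item: item["marital_status"],
    branches={
        "married": Node(
            predicate=lambda item: "yes" if item["num_children"] > 0 else "no",
            branches={"yes": Node(label="target"), "no": Node(label="skip")},
        ),
        "single": Node(label="skip"),
    },
)
```

For example, `classify(tree, {"marital_status": "married", "num_children": 2})` walks two predicates before reaching a leaf.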
Advantages of Decision Tree Easy to visualize the process of classification: you can easily tell why a data item is classified in a particular category; just trace the path to the leaf and it explains the reason. Simple, fast processing: once the tree is made, just traverse down the tree to classify the data.
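To illustrate how the path itself serves as the explanation, the sketch below records each test taken on the way to the leaf. The tuple-based tree encoding, the attribute names, and the function name `classify_with_path` are assumptions chosen for brevity.

```python
# Hypothetical tree: a non-leaf node is (attribute, {value: subtree});
# a leaf is simply a class label string.
tree = ("marital_status", {
    "married": ("has_children", {"yes": "target", "no": "skip"}),
    "single": "skip",
})

def classify_with_path(tree, item):
    """Return the class label and the list of tests that led to it."""
    path = []
    node = tree
    while not isinstance(node, str):       # stop when a leaf is reached
        attribute, branches = node
        value = item[attribute]
        path.append(f"{attribute} = {value}")
        node = branches[value]
    return node, path

label, path = classify_with_path(
    tree, {"marital_status": "married", "has_children": "no"}
)
```

Here `path` reads as the reason for the classification: the item was married and had no children, so it ended up in "skip". The cost of classifying one item is proportional to the depth of the tree.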
Decision Tree is for… Classifying a dataset in which the predicates return discrete values, and in which no attribute has the same value for all data.
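The second condition can be checked before building a tree: an attribute with the same value in every row cannot split the dataset into groups. A minimal sketch, assuming rows are stored as dictionaries; the sample rows and the function name `constant_attributes` are hypothetical.

```python
def constant_attributes(rows):
    """Return the names of attributes whose value is identical in every row."""
    if not rows:
        return []
    return [
        name for name in rows[0]
        if len({row[name] for row in rows}) == 1   # only one distinct value
    ]

# Hypothetical dataset: "country" is the same for every recipient.
rows = [
    {"gender": "F", "country": "US", "marital_status": "married"},
    {"gender": "M", "country": "US", "marital_status": "single"},
]
useless = constant_attributes(rows)
```

In this sample only "country" is reported, since testing it would send every row down the same branch.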