Berendt: Advanced databases, winter term 2007/08 – Advanced databases: Inferring implicit/new knowledge from data(bases)


Berendt: Advanced databases, winter term 2007/08, 1 Advanced databases – Inferring implicit/new knowledge from data(bases): Single-table and multirelational data mining Bettina Berendt Katholieke Universiteit Leuven, Department of Computer Science Last update: 26 November 2007

Berendt: Advanced databases, winter term 2007/08, 2 Agenda
- Input: Concepts
- Input: Instances
- Input: Attributes and levels of measurement
- Data preparation for WEKA and beyond
- Output: Decision trees (and related patterns)
- Algorithm: ID3 (and variants)
- Multirelational data mining
- Motivation / Focus on decision tree learning

Berendt: Advanced databases, winter term 2007/08, 3 Recall: Knowledge discovery and styles of reasoning
1. Business understanding – Evaluation
   - Learn a model from the data (observed instances)
   - Generally involves induction (during Modelling)
2. Deployment
   - Apply the model to new instances
   - Corresponds to deduction (if one assumes that the model is true)

Berendt: Advanced databases, winter term 2007/08, 4 Phases talked about today
1. Business understanding – Evaluation
   - Learn a model from the data (observed instances)
   - Generally involves induction (during Modelling)

Berendt: Advanced databases, winter term 2007/08, 5 Decision tree learning (1): Decision rules What contact lenses to give to a patient? (Could be based on background knowledge, but can also be learned from the WEKA contact lens data)

Berendt: Advanced databases, winter term 2007/08, 6 Decision tree learning (2): Classification / prediction In which weather will someone play (tennis etc.)? (Learned from the WEKA weather data)

Berendt: Advanced databases, winter term 2007/08, 7 Learned from data like this:
Outlook   Temp  Humidity  Windy  Play
Sunny     Hot   High      False  No
Sunny     Hot   High      True   No
Overcast  Hot   High      False  Yes
Rainy     Mild  High      False  Yes
Rainy     Cool  Normal    False  Yes
Rainy     Cool  Normal    True   No
Overcast  Cool  Normal    True   Yes
Sunny     Mild  High      False  No
Sunny     Cool  Normal    False  Yes
Rainy     Mild  Normal    False  Yes
Sunny     Mild  Normal    True   Yes
Overcast  Mild  High      True   Yes
Overcast  Hot   Normal    False  Yes
Rainy     Mild  High      True   No

Berendt: Advanced databases, winter term 2007/08, 9 What's a concept?
Styles of learning:
- Classification learning: predicting a discrete class
- Association learning: detecting associations between features
- Clustering: grouping similar instances into clusters
- Numeric prediction: predicting a numeric quantity
Concept: the thing to be learned
Concept description: the output of the learning scheme

Berendt: Advanced databases, winter term 2007/08, 10 Classification learning
- Example problems: weather data, contact lenses, irises, labor negotiations
- Classification learning is supervised: the scheme is provided with the actual outcome
- The outcome is called the class of the example
- Measure success on fresh data for which class labels are known (test data)
- In practice, success is often measured subjectively

Berendt: Advanced databases, winter term 2007/08, 11 Example weather data
Outlook   Temp  Humidity  Windy  Play
Sunny     Hot   High      False  No
Sunny     Hot   High      True   No
Overcast  Hot   High      False  Yes
Rainy     Mild  High      False  Yes
Rainy     Cool  Normal    False  Yes
Rainy     Cool  Normal    True   No
Overcast  Cool  Normal    True   Yes
Sunny     Mild  High      False  No
Sunny     Cool  Normal    False  Yes
Rainy     Mild  Normal    False  Yes
Sunny     Mild  Normal    True   Yes
Overcast  Mild  High      True   Yes
Overcast  Hot   Normal    False  Yes
Rainy     Mild  High      True   No

Berendt: Advanced databases, winter term 2007/08, 12 Association learning
- Can be applied if no class is specified and any kind of structure is considered "interesting"
- Differences to classification learning:
  - Can predict any attribute's value, not just the class, and more than one attribute's value at a time
  - Hence: far more association rules than classification rules
  - Thus: constraints are necessary (minimum coverage and minimum accuracy)

Berendt: Advanced databases, winter term 2007/08, 13 Clustering
- Finding groups of items that are similar
- Clustering is unsupervised: the class of an example is not known
- Success is often measured subjectively
Example: the iris data, with attributes sepal length, sepal width, petal length, petal width, and type (Iris setosa, Iris versicolor, Iris virginica); numeric values omitted.

Berendt: Advanced databases, winter term 2007/08, 14 Numeric prediction
- Variant of classification learning where the "class" is numeric (also called "regression")
- Learning is supervised: the scheme is provided with a target value
- Measure success on test data
Outlook   Temperature  Humidity  Windy  Play-time
Sunny     Hot          High      False  5
Sunny     Hot          High      True   0
Overcast  Hot          High      False  55
Rainy     Mild         Normal    False  40
…

Berendt: Advanced databases, winter term 2007/08, 16 What's in an example?
- Instance: specific type of example
  - The thing to be classified, associated, or clustered
  - An individual, independent example of the target concept
  - Characterized by a predetermined set of attributes
- Input to the learning scheme: a set of instances / a dataset
  - Represented as a single relation / flat file
- Rather restricted form of input: no relationships between objects
- Most common form in practical data mining

Berendt: Advanced databases, winter term 2007/08, 17 Instances in the weather data
Outlook   Temp  Humidity  Windy  Play
Sunny     Hot   High      False  No
Sunny     Hot   High      True   No
Overcast  Hot   High      False  Yes
Rainy     Mild  High      False  Yes
Rainy     Cool  Normal    False  Yes
Rainy     Cool  Normal    True   No
Overcast  Cool  Normal    True   Yes
Sunny     Mild  High      False  No
Sunny     Cool  Normal    False  Yes
Rainy     Mild  Normal    False  Yes
Sunny     Mild  Normal    True   Yes
Overcast  Mild  High      True   Yes
Overcast  Hot   Normal    False  Yes
Rainy     Mild  High      True   No

Berendt: Advanced databases, winter term 2007/08, 19 What's in an attribute?
- Each instance is described by a fixed, predefined set of features, its "attributes"
- But: the number of attributes may vary in practice
  - Possible solution: "irrelevant value" flag
- Related problem: the existence of an attribute may depend on the value of another one
- Possible attribute types ("levels of measurement"): nominal, ordinal, interval, and ratio

Berendt: Advanced databases, winter term 2007/08, 20 Nominal quantities
- Values are distinct symbols
  - Values themselves serve only as labels or names
  - "Nominal" comes from the Latin word for "name"
- Example: attribute "outlook" from the weather data
  - Values: "sunny", "overcast", and "rainy"
- No relation is implied among nominal values (no ordering or distance measure)
- Only equality tests can be performed

Berendt: Advanced databases, winter term 2007/08, 21 Ordinal quantities
- Impose an order on values
- But: no distance between values is defined
- Example: attribute "temperature" in the weather data
  - Values: "hot" > "mild" > "cool"
- Note: addition and subtraction don't make sense
- Example rule: temperature < hot → play = yes
- The distinction between nominal and ordinal is not always clear (e.g. attribute "outlook")

Berendt: Advanced databases, winter term 2007/08, 22 Interval quantities
- Interval quantities are not only ordered but measured in fixed and equal units
- Example 1: attribute "temperature" expressed in degrees Fahrenheit
- Example 2: attribute "year"
- The difference of two values makes sense
- The sum or product doesn't make sense: the zero point is not defined!

Berendt: Advanced databases, winter term 2007/08, 23 Ratio quantities
- Ratio quantities are ones for which the measurement scheme defines a zero point
- Example: attribute "distance"
  - The distance between an object and itself is zero
- Ratio quantities are treated as real numbers: all mathematical operations are allowed
- But: is there an "inherently" defined zero point?
  - The answer depends on scientific knowledge (e.g. Fahrenheit knew no lower limit to temperature)

Berendt: Advanced databases, winter term 2007/08, 25 Notes on preparing the input (CRISP-DM Data preparation stage)
- Denormalization is not the only issue
- Problem: different data sources (e.g. sales department, customer billing department, …)
  - Differences: styles of record keeping, conventions, time periods, data aggregation, primary keys, errors
  - Data must be assembled, integrated, cleaned up
  - "Data warehouse": a consistent point of access
- External data may be required ("overlay data")
- Critical: type and level of data aggregation

Berendt: Advanced databases, winter term 2007/08, 26 The ARFF format
%
% ARFF file for weather data with some numeric features
%
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {true, false}
@attribute play? {yes, no}
@data
sunny, 85, 85, false, no
sunny, 80, 90, true, no
overcast, 83, 86, false, yes
...
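To make the structure of the format concrete, here is a minimal sketch of a reader for the ARFF subset used in this example. It is not Weka's parser (which handles quoting, sparse instances, and more); the function name and the simplifications are my own.

```python
# Minimal sketch of reading the ARFF structure shown above.
# Handles only @attribute/@data and comma-separated data lines.
def parse_arff(text):
    attributes, data, in_data = [], [], False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('%'):
            continue                      # skip blanks and % comments
        lower = line.lower()
        if lower.startswith('@attribute'):
            attributes.append(line.split()[1])
        elif lower.startswith('@data'):
            in_data = True                # everything after @data is instances
        elif in_data:
            data.append([v.strip() for v in line.split(',')])
    return attributes, data

arff = """\
% ARFF file for weather data with some numeric features
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {true, false}
@attribute play? {yes, no}
@data
sunny, 85, 85, false, no
sunny, 80, 90, true, no
overcast, 83, 86, false, yes
"""
attrs, rows = parse_arff(arff)
print(attrs)  # ['outlook', 'temperature', 'humidity', 'windy', 'play?']
```

In practice one would use Weka (or another library) to read ARFF; the sketch only illustrates the declaration/data split of the format.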

Berendt: Advanced databases, winter term 2007/08, 27 Additional attribute types
- ARFF supports string attributes:
  - Similar to nominal attributes, but the list of values is not pre-specified
  - Example: @attribute description string
- It also supports date attributes:
  - Uses the ISO-8601 combined date and time format yyyy-MM-dd'T'HH:mm:ss
  - Example: @attribute today date

Berendt: Advanced databases, winter term 2007/08, 28 Sparse data
- In some applications most attribute values in a dataset are zero
  - E.g.: word counts in a text categorization problem
- ARFF supports sparse data:
  0, 26, 0, 0, 0, 0, 63, 0, 0, 0, "class A"
  0, 0, 0, 42, 0, 0, 0, 0, 0, 0, "class B"
  becomes
  {1 26, 6 63, 10 "class A"}
  {3 42, 10 "class B"}
- This also works for nominal attributes (where the first value corresponds to "zero")

Berendt: Advanced databases, winter term 2007/08, 29 Attribute types
- The interpretation of attribute types in ARFF depends on the learning scheme
- Numeric attributes are interpreted as
  - ordinal scales if less-than and greater-than are used
  - ratio scales if distance calculations are performed (normalization/standardization may be required)
- Instance-based schemes define the distance between nominal values (0 if values are equal, 1 otherwise)
- Integers in some given data file: nominal, ordinal, or ratio scale?

Berendt: Advanced databases, winter term 2007/08, 30 Nominal vs. ordinal
- Attribute "age" nominal:
  If age = young and astigmatic = no and tear production rate = normal then recommendation = soft
  If age = pre-presbyopic and astigmatic = no and tear production rate = normal then recommendation = soft
- Attribute "age" ordinal (e.g. "young" < "pre-presbyopic" < "presbyopic"):
  If age ≤ pre-presbyopic and astigmatic = no and tear production rate = normal then recommendation = soft

Berendt: Advanced databases, winter term 2007/08, 31 Missing values
- Frequently indicated by out-of-range entries
  - Types: unknown, unrecorded, irrelevant
  - Reasons: malfunctioning equipment, changes in experimental design, collation of different datasets, measurement not possible
- A missing value may have significance in itself (e.g. a missing test in a medical examination)
  - Most schemes assume that this is not the case: "missing" may need to be coded as an additional value
  - "?" denotes a missing value in ARFF

Berendt: Advanced databases, winter term 2007/08, 32 Inaccurate values
- Reason: the data has not been collected for the purpose of mining it
- Result: errors and omissions that don't affect the original purpose of the data (e.g. age of customer)
- Typographical errors in nominal attributes → values need to be checked for consistency
- Typographical and measurement errors in numeric attributes → outliers need to be identified
- Errors may be deliberate (e.g. wrong zip codes)
- Other problems: duplicates, stale data

Berendt: Advanced databases, winter term 2007/08, 33 Getting to know the data
- Simple visualization tools are very useful
  - Nominal attributes: histograms (is the distribution consistent with background knowledge?)
  - Numeric attributes: graphs (any obvious outliers?)
- 2-D and 3-D plots show dependencies
- Need to consult domain experts
- Too much data to inspect? Take a sample!

Berendt: Advanced databases, winter term 2007/08, 35 Decision trees
- The "divide-and-conquer" approach produces a tree
- Nodes involve testing a particular attribute
- Usually, an attribute value is compared to a constant
- Other possibilities:
  - Comparing the values of two attributes
  - Using a function of one or more attributes
- Leaves assign a classification, a set of classifications, or a probability distribution to instances
- An unknown instance is routed down the tree
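Routing an instance down a tree can be sketched in a few lines. The nested-dict representation below is illustrative (not Weka's internal one); the tree itself is the well-known one for the weather data.

```python
# A decision tree as nested dicts: internal nodes test one attribute,
# leaves are plain class labels. Illustrative representation only.
tree = {'attribute': 'outlook',
        'branches': {'sunny': {'attribute': 'humidity',
                               'branches': {'high': 'no', 'normal': 'yes'}},
                     'overcast': 'yes',
                     'rainy': {'attribute': 'windy',
                               'branches': {'false': 'yes', 'true': 'no'}}}}

def classify(node, instance):
    # Follow branches until a leaf (a plain string) is reached
    while isinstance(node, dict):
        node = node['branches'][instance[node['attribute']]]
    return node

print(classify(tree, {'outlook': 'sunny', 'humidity': 'normal'}))  # yes
```

Each test consumes one attribute value of the instance; nominal attributes are tested at most once on any path, matching the slide above.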

Berendt: Advanced databases, winter term 2007/08, 36 Nominal and numeric attributes
- Nominal: the number of children is usually equal to the number of values → an attribute won't get tested more than once
  - Other possibility: division into two subsets
- Numeric: test whether the value is greater or less than a constant → an attribute may get tested several times
  - Other possibility: three-way split (or multi-way split)
    - Integer: less than, equal to, greater than
    - Real: below, within, above

Berendt: Advanced databases, winter term 2007/08, 37 Missing values
- Does the absence of a value have some significance?
- Yes → "missing" is a separate value
- No → "missing" must be treated in a special way
  - Solution A: assign the instance to the most popular branch
  - Solution B: split the instance into pieces
    - Pieces receive weight according to the fraction of training instances that go down each branch
    - Classifications from leaf nodes are combined using the weights that have percolated to them

Berendt: Advanced databases, winter term 2007/08, 38 Classification rules
- Popular alternative to decision trees
- Antecedent (pre-condition): a series of tests (just like the tests at the nodes of a decision tree)
- Tests are usually logically ANDed together (but may also be general logical expressions)
- Consequent (conclusion): class, set of classes, or probability distribution assigned by the rule
- Individual rules are often logically ORed together
  - Conflicts arise if different conclusions apply

Berendt: Advanced databases, winter term 2007/08, 39 An example
If outlook = sunny and humidity = high then play = no
If outlook = rainy and windy = true then play = no
If outlook = overcast then play = yes
If humidity = normal then play = yes
If none of the above then play = yes
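The five rules above can be sketched as an ordered list of (conditions, class) pairs. Applying them first-match-first is one simple way to resolve the conflicts mentioned on the previous slide (a decision-list reading); the function and data-structure names are my own.

```python
# The weather rules as an ordered decision list; the empty condition set
# at the end plays the role of "if none of the above".
rules = [
    ({'outlook': 'sunny', 'humidity': 'high'}, 'no'),
    ({'outlook': 'rainy', 'windy': 'true'}, 'no'),
    ({'outlook': 'overcast'}, 'yes'),
    ({'humidity': 'normal'}, 'yes'),
    ({}, 'yes'),
]

def apply_rules(instance):
    # Return the class of the first rule whose conditions all hold
    for conditions, label in rules:
        if all(instance.get(a) == v for a, v in conditions.items()):
            return label

print(apply_rules({'outlook': 'rainy', 'windy': 'true'}))  # no
```

Note that reordering the list can change predictions, which is exactly why ORing individual rules together needs a conflict-resolution policy.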

Berendt: Advanced databases, winter term 2007/08, 40 Trees for numeric prediction
- Regression: the process of computing an expression that predicts a numeric quantity
- Regression tree: a "decision tree" where each leaf predicts a numeric quantity
  - The predicted value is the average value of the training instances that reach the leaf
- Model tree: a "regression tree" with linear regression models at the leaf nodes
  - Linear patches approximate a continuous function

Berendt: Advanced databases, winter term 2007/08, 41 An example
Outlook   Temperature  Humidity  Windy  Play-time
Sunny     Hot          High      False  5
Sunny     Hot          High      True   0
Overcast  Hot          High      False  55
Rainy     Mild         Normal    False  40
…

Berendt: Advanced databases, winter term 2007/08, 43 Constructing decision trees
- Strategy: top down, in recursive divide-and-conquer fashion
  - First: select an attribute for the root node; create a branch for each possible attribute value
  - Then: split the instances into subsets, one for each branch extending from the node
  - Finally: repeat recursively for each branch, using only the instances that reach the branch
- Stop if all instances have the same class
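The recursive scheme above can be sketched as follows. This is an ID3-style sketch for nominal attributes only (no missing values, no pruning, no numeric tests); the function names are illustrative, and attribute selection by information gain is anticipated from the next slides.

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def build_tree(instances, labels, attributes):
    if len(set(labels)) == 1:            # stop: all instances have the same class
        return labels[0]
    if not attributes:                   # nothing left to split on: majority class
        return Counter(labels).most_common(1)[0][0]

    def split_info(attr):                # expected information after splitting on attr
        total = 0.0
        for value in {x[attr] for x in instances}:
            subset = [l for x, l in zip(instances, labels) if x[attr] == value]
            total += len(subset) / len(labels) * entropy(subset)
        return total

    best = min(attributes, key=split_info)   # max gain = min remaining info
    branches = {}
    for value in {x[best] for x in instances}:
        xs = [x for x in instances if x[best] == value]
        ls = [l for x, l in zip(instances, labels) if x[best] == value]
        branches[value] = build_tree(xs, ls, [a for a in attributes if a != best])
    return {'attribute': best, 'branches': branches}

# The weather data from the slides
rows = [
    ('sunny', 'hot', 'high', 'false', 'no'),
    ('sunny', 'hot', 'high', 'true', 'no'),
    ('overcast', 'hot', 'high', 'false', 'yes'),
    ('rainy', 'mild', 'high', 'false', 'yes'),
    ('rainy', 'cool', 'normal', 'false', 'yes'),
    ('rainy', 'cool', 'normal', 'true', 'no'),
    ('overcast', 'cool', 'normal', 'true', 'yes'),
    ('sunny', 'mild', 'high', 'false', 'no'),
    ('sunny', 'cool', 'normal', 'false', 'yes'),
    ('rainy', 'mild', 'normal', 'false', 'yes'),
    ('sunny', 'mild', 'normal', 'true', 'yes'),
    ('overcast', 'mild', 'high', 'true', 'yes'),
    ('overcast', 'hot', 'normal', 'false', 'yes'),
    ('rainy', 'mild', 'high', 'true', 'no'),
]
attrs = ['outlook', 'temperature', 'humidity', 'windy']
instances = [dict(zip(attrs, r[:4])) for r in rows]
labels = [r[4] for r in rows]
tree = build_tree(instances, labels, attrs)
print(tree['attribute'])  # outlook
```

On the weather data this reproduces the familiar tree: outlook at the root, humidity below sunny, windy below rainy, and a pure "yes" leaf below overcast.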

Berendt: Advanced databases, winter term 2007/08, 44 Which attribute to select?

Berendt: Advanced databases, winter term 2007/08, 46 Criterion for attribute selection
- Which is the best attribute?
  - We want to get the smallest tree
  - Heuristic: choose the attribute that produces the "purest" nodes
- Popular impurity criterion: information gain
  - Information gain increases with the average purity of the subsets
- Strategy: choose the attribute that gives the greatest information gain

Berendt: Advanced databases, winter term 2007/08, 47 Computing information
- Measure information in bits
  - Given a probability distribution, the information required to predict an event is the distribution's entropy
  - Entropy gives the information required in bits (can involve fractions of bits!)
- Formula for computing the entropy:
  entropy(p1, …, pn) = −p1 log2(p1) − p2 log2(p2) − … − pn log2(pn)
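The entropy computation can be sketched directly from class counts; the helper name is my own. For the class counts [9, 5] of the weather data (9 "yes", 5 "no"), this gives the information needed to predict "play".

```python
from math import log2

def entropy(counts):
    """Entropy of a class distribution given as counts, in bits."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

print(round(entropy([9, 5]), 3))  # 0.94
```

As the slide notes, the result is usually a fractional number of bits: a 50/50 split costs exactly 1 bit, a pure node 0 bits.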

Berendt: Advanced databases, winter term 2007/08, 48 Example: attribute Outlook
- outlook = sunny: info([2,3]) = 0.971 bits
- outlook = overcast: info([4,0]) = 0 bits
- outlook = rainy: info([3,2]) = 0.971 bits
- Expected information for the attribute: info([2,3],[4,0],[3,2]) = (5/14) × 0.971 + (4/14) × 0 + (5/14) × 0.971 = 0.693 bits

Berendt: Advanced databases, winter term 2007/08, 49 Computing information gain
- Information gain: information before splitting − information after splitting
  gain(Outlook) = info([9,5]) − info([2,3],[4,0],[3,2]) = 0.940 − 0.693 = 0.247 bits
- Information gain for the attributes from the weather data:
  gain(Outlook) = 0.247 bits
  gain(Temperature) = 0.029 bits
  gain(Humidity) = 0.152 bits
  gain(Windy) = 0.048 bits
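The Outlook calculation can be reproduced numerically: the information before the split (class counts [9, 5]) minus the weighted information of the subsets [2,3], [4,0], [3,2] produced by splitting on outlook. The helper names below are my own.

```python
from math import log2

def info(counts):
    """Entropy of a class distribution given as counts, in bits."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def gain(before, subsets):
    """Information gain: info before splitting minus weighted info after."""
    total = sum(before)
    after = sum(sum(s) / total * info(s) for s in subsets)
    return info(before) - after

g = gain([9, 5], [[2, 3], [4, 0], [3, 2]])
print(round(g, 3))  # 0.247
```

Running the same function on the windy split ([6,2] for false, [3,3] for true) reproduces the 0.048 bits listed above.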

Berendt: Advanced databases, winter term 2007/08, 50 Continuing to split (within the outlook = sunny branch)
gain(Temperature) = 0.571 bits
gain(Humidity) = 0.971 bits
gain(Windy) = 0.020 bits

Berendt: Advanced databases, winter term 2007/08, 51 Final decision tree
- Note: not all leaves need to be pure; sometimes identical instances have different classes
- Splitting stops when the data can't be split any further

Berendt: Advanced databases, winter term 2007/08, 52 Wishlist for a purity measure
Properties we require from a purity measure:
- When a node is pure, the measure should be zero
- When impurity is maximal (i.e. all classes equally likely), the measure should be maximal
- The measure should obey the multistage property (i.e. decisions can be made in several stages):
  measure([2,3,4]) = measure([2,7]) + (7/9) × measure([3,4])
Entropy is the only function that satisfies all three properties!

Berendt: Advanced databases, winter term 2007/08, 53 Properties of the entropy
- The multistage property:
  entropy(p, q, r) = entropy(p, q + r) + (q + r) × entropy(q / (q + r), r / (q + r))
- Simplification of computation:
  info([2,3,4]) = −(2/9) log2(2/9) − (3/9) log2(3/9) − (4/9) log2(4/9) = [−2 log2(2) − 3 log2(3) − 4 log2(4) + 9 log2(9)] / 9
- Note: instead of maximizing information gain we could just minimize information

Berendt: Advanced databases, winter term 2007/08, 54 Discussion / outlook: decision trees
- Top-down induction of decision trees: ID3, an algorithm developed by Ross Quinlan
- Various improvements, e.g.
  - C4.5: deals with numeric attributes, missing values, noisy data
  - Gain ratio instead of information gain (see Witten & Frank slides, ch. 4, pp. …)
- Similar approach: CART
- …

Berendt: Advanced databases, winter term 2007/08, 55 Weather data with mixed attributes
Some attributes have numeric values:
Outlook   Temperature  Humidity  Windy  Play
Sunny     85           85        False  No
Sunny     80           90        True   No
Overcast  83           86        False  Yes
Rainy     75           80        False  Yes
…
If outlook = sunny and humidity > 83 then play = no
If outlook = rainy and windy = true then play = no
If outlook = overcast then play = yes
If humidity < 85 then play = yes
If none of the above then play = yes

Berendt: Advanced databases, winter term 2007/08, 56 Dealing with numeric attributes
- Discretize numeric attributes: divide each attribute's range into intervals
  - Sort the instances according to the attribute's values
  - Place breakpoints where the (majority) class changes
  - This minimizes the total error
- Example: temperature from the weather data
  Yes | No | Yes Yes Yes | No No Yes | Yes Yes | No | Yes Yes | No
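The breakpoint placement described above can be sketched as: sort the instances by the numeric attribute, then put a candidate cut midway between adjacent distinct values whose classes differ. The function name and the sample values are illustrative, not the slide's exact temperatures.

```python
def breakpoints(values, classes):
    """Candidate split points where the class changes along the sorted values."""
    pairs = sorted(zip(values, classes))
    cuts = []
    for (v1, c1), (v2, c2) in zip(pairs, pairs[1:]):
        if c1 != c2 and v1 != v2:       # class change between distinct values
            cuts.append((v1 + v2) / 2)  # split midway between the two values
    return cuts

cuts = breakpoints([64, 65, 68, 69, 70, 72],
                   ['yes', 'no', 'yes', 'yes', 'yes', 'no'])
print(cuts)  # [64.5, 66.5, 71.0]
```

As the next slide discusses, using every such cut overfits noisy data, which motivates requiring a minimum number of majority-class instances per interval before keeping a cut.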

Berendt: Advanced databases, winter term 2007/08, 57 The problem of overfitting
- This procedure is very sensitive to noise
  - One instance with an incorrect class label will probably produce a separate interval
- Also: a time stamp attribute would have zero errors
- Simple solution: enforce a minimum number of instances in the majority class per interval
- Example (with min = 3):
  Yes | No | Yes Yes Yes | No No Yes | Yes Yes | No | Yes Yes | No
  becomes
  Yes No Yes Yes Yes | No No Yes Yes Yes | No Yes Yes No

Berendt: Advanced databases, winter term 2007/08, 59 Motivation: Fictitious e-commerce example – who is a big spender?
- Typical possible result so far: …
- However, this might be the case: …

Berendt: Advanced databases, winter term 2007/08, 60 On terminology (1)
- Reduces to propositional logic → „propositional rule“
- Info from multiple relations of a relational DB, expressed in a subset of first-order logic (aka predicate logic aka relational logic) → „relational rule“

Berendt: Advanced databases, winter term 2007/08, 61 On terminology (2)
- Info from multiple relations of a relational DB, expressed in a subset of first-order logic (aka predicate logic aka relational logic) → „relational rule“
- „relational data mining“ = „multirelational data mining“
- Intersection of machine learning and logic programming: „inductive logic programming“

Berendt: Advanced databases, winter term 2007/08, 62 The (most common) problem: Learning the logical definition of a relation
- Given:
  - (Training) examples: tuples that belong or do not belong to the target relation
  - Background knowledge: other relation definitions etc. in the same logical language
- Find:
  - A predicate definition that defines the target relation in terms of the relations from the background knowledge
- Formally: the learning-from-entailment setting (Muggleton, 1991)

Berendt: Advanced databases, winter term 2007/08, 63 Enter real data
- In most cases, the criteria of consistency and completeness must be relaxed → statistical criteria

Berendt: Advanced databases, winter term 2007/08, 64 Example Given: Learn the predicate definition:

Berendt: Advanced databases, winter term 2007/08, 65 First solution idea: join or aggregate towards a single table
- Slightly modify the example:
  - customer(CustID; Name; Age; SpendsALot)
  - purchase(CustID; ProductID; Date; Value; PaymentMode)
- Combination 1: natural join → purchase1(CustID; ProductID; Date; Value; PaymentMode; Name; Age; SpendsALot)
- Combination 2: aggregation → customer1(CustID; Age; NOfPurchases; TotalValue; SpendsALot)
- Can we find this pattern?

Berendt: Advanced databases, winter term 2007/08, 66 Second solution idea: A cleverer way of propositionalization
- Early approach: LINUS (Lavrac, Dzeroski, & Grobelnik, 1991)
- Use the background knowledge to create new attributes
- Works for a restricted class of problems: function-free program clauses which are
  - typed (each variable is associated with a predetermined set of values),
  - constrained (all variables in the body of a clause also appear in the head), and
  - nonrecursive (the predicate symbol in the head does not appear in any of the literals in the body)

Berendt: Advanced databases, winter term 2007/08, 67 Third solution idea: Upgrade propositional approaches
- MRDM algorithms have much in common with their propositional counterparts:
  - Learning as search (search for patterns valid in the given data)
  - A lattice of hypotheses as the search space
- Key differences:
  - Representation of data and patterns
  - Refinement operators
  - Testing coverage (whether a rule explains an example)
- A general „recipe“ for upgrading: Van Laer & De Raedt (2001)
  - Keep as much as possible, upgrade only the key notions needed

Berendt: Advanced databases, winter term 2007/08, 68 An example: First-order decision trees: TILDE (Blockeel & De Raedt, 1998), with propositional counterpart and special case C4.5
- This tree predicts the maintenance action A that should be taken on machine M: maintenance(M,A)
- How to expand a node?
  - Search in the space of clauses (cf. propositional case: attributes/values)
  - Use constraints to limit the number of candidates („declarative bias“)
- Each refinement operator:
  - applies a substitution to the clause, or
  - adds a literal to the body of the clause

Berendt: Advanced databases, winter term 2007/08, 69 Major difference between propositional and relational decision trees
- It lies in the tests that can appear in internal nodes
- Each test is a query: a conjunction of literals with existentially quantified variables
- If the query succeeds → take the „yes“ branch
- Variables can be shared among nodes
- The actual test in a node = the conjunction of the literals in the node itself and the literals on the path from the root of the tree

Berendt: Advanced databases, winter term 2007/08, 70 Why order matters (another example)
- Forall x Exists y: haspart(x,y) (every x has some part y)
- Exists y Forall x: haspart(x,y) (one and the same y is a part of every x)

Berendt: Advanced databases, winter term 2007/08, 71 Interpretation of the tree in terms of a decision list n List: the order of tests expressed by the clauses matters!

Berendt: Advanced databases, winter term 2007/08, 72 Next lecture
- Input: Concepts
- Input: Instances
- Input: Attributes and levels of measurement
- Data preparation for WEKA and beyond
- Output: Decision trees (and related patterns)
- Algorithm: ID3 (and variants)
- Multirelational data mining
- Motivation / Focus on decision tree learning
- How good are these models? Evaluation

Berendt: Advanced databases, winter term 2007/08, 73 References / background reading; acknowledgements
- All parts on single-table processing are based on:
  - Witten, I. H., & Frank, E. (2005). Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. 2nd ed. Morgan Kaufmann.
  - In particular, pp. … are based on the instructor slides for that book (chapters 1–4), available online as …chapter2.pdf, chapter3.pdf, chapter4.pdf or …chapter2.odp, chapter3.odp, chapter4.odp.
- The MRDM part is based on: Džeroski, S. (2003). Multi-Relational Data Mining: An Introduction. SIGKDD Explorations, 5(1).