
1 Experiments with MRDTL – A Multi-relational Decision Tree Learning Algorithm
Hector Leiva, Anna Atramentov and Vasant Honavar*
Artificial Intelligence Laboratory, Department of Computer Science and Graduate Program in Bioinformatics and Computational Biology, Iowa State University, Ames, IA 50011, USA
www.cs.iastate.edu/~honavar/aigroup/html
* Support provided in part by the National Science Foundation, the Carver Foundation, and Pioneer Hi-Bred, Inc.

2 Motivation
Importance of multi-relational learning:
- growth of data stored in multi-relational databases (MRDBs)
- techniques for learning from unstructured data often extract the data into an MRDB
Frameworks for multi-relational learning:
- Blockeel's framework (ILP) (1998)
- Knobbe's framework (MRDM) (1999)
- Getoor's framework (first-order extensions of probabilistic models) (2001)
Problem: no experimental results available.
Goals:
- perform experiments and evaluate the performance of Knobbe's framework
- understand the strengths and limits of the approach

3 Multi-Relational Learning Literature
- Inductive Logic Programming
- First-order extensions of probabilistic models
- Multi-Relational Data Mining
- Propositionalization methods
- PRM extensions for cumulative learning, for learning and reasoning as agents interact with the world
- Approaches for mining data in the form of graphs

Blockeel, 1998; De Raedt, 1998; Knobbe et al., 1999; Friedman et al., 1999; Koller, 1999; Krogel and Wrobel, 2001; Getoor, 2001; Kersting et al., 2000; Pfeffer, 2000; Dzeroski and Lavrac, 2001; Dehaspe and De Raedt, 1997; Dzeroski et al., 2001; Jaeger, 1997; Karalic and Bratko, 1997; Holder and Cook, 2000; Gonzalez et al., 2000

4 Problem Formulation
Given: data stored in a relational database.
Goal: build a decision tree for predicting the target attribute in the target table.

Example of a multi-relational database (schema and instances):

    Department (ID, Specialization, #Students)
      d1  Math              1000
      d2  Physics            300
      d3  Computer Science   400

    Staff (ID, Name, Department, Position, Salary)
      p1  Dale    d1  Professor          70-80k
      p2  Martin  d3  Postdoc            30-40k
      p3  Victor  d2  Visitor Scientist  40-50k
      p4  David   d3  Professor          80-100k

    Grad.Student (ID, Name, GPA, #Publications, Advisor, Department)
      s1  John    2.0   4  p1  d3
      s2  Lisa    3.5  10  p4  d3
      s3  Michel  3.9   3  p4  d4

5 Propositional decision tree algorithm. Construction phase

    Tree_induction(D: data)
      A = optimal_attribute(D)
      if stopping_criterion(D)
        return leaf(D)
      else
        D_left := split(D, A)
        D_right := split_complement(D, A)
        child_left := Tree_induction(D_left)
        child_right := Tree_induction(D_right)
        return node(A, child_left, child_right)

Example (PlayTennis data):

    Day  Outlook   Temperature  Humidity  Wind    PlayTennis
    d1   Sunny     Hot          High      Weak    No
    d2   Sunny     Hot          High      Strong  No
    d3   Overcast  Hot          High      Weak    Yes
    d4   Overcast  Cold         Normal    Weak    No

The root splits {d1, d2, d3, d4} on Outlook: the "sunny" branch {d1, d2} is a "No" leaf, and the "not sunny" branch {d3, d4} is split again on Temperature, giving a "Yes" leaf for {d3} (hot) and a "No" leaf for {d4} (not hot).
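
As a concrete illustration of the pseudocode above, here is a minimal Python sketch (our own, not the slide's implementation): optimal_attribute and stopping_criterion are instantiated as "best binary attribute = value test by information gain" and "node is pure".

    import math
    from collections import Counter

    def entropy(labels):
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def tree_induction(rows, label, attrs):
        labels = [r[label] for r in rows]
        if len(set(labels)) == 1:            # stopping criterion: pure node
            return {"leaf": labels[0]}
        best = None
        for a in attrs:
            for v in {r[a] for r in rows}:
                left = [r for r in rows if r[a] == v]    # split(D, A)
                right = [r for r in rows if r[a] != v]   # split_complement(D, A)
                if not left or not right:
                    continue
                gain = entropy(labels) - sum(
                    len(part) / len(rows) * entropy([r[label] for r in part])
                    for part in (left, right))
                if best is None or gain > best[0]:
                    best = (gain, a, v, left, right)
        if best is None:                     # no useful split: majority leaf
            return {"leaf": Counter(labels).most_common(1)[0][0]}
        _, a, v, left, right = best
        return {"test": f"{a} = {v}",
                "left": tree_induction(left, label, attrs),
                "right": tree_induction(right, label, attrs)}

    # The PlayTennis rows d1-d4 from the slide:
    rows = [
        {"Outlook": "Sunny", "Temp": "Hot", "Humidity": "High", "Wind": "Weak", "Play": "No"},
        {"Outlook": "Sunny", "Temp": "Hot", "Humidity": "High", "Wind": "Strong", "Play": "No"},
        {"Outlook": "Overcast", "Temp": "Hot", "Humidity": "High", "Wind": "Weak", "Play": "Yes"},
        {"Outlook": "Overcast", "Temp": "Cold", "Humidity": "Normal", "Wind": "Weak", "Play": "No"},
    ]
    tree = tree_induction(rows, "Play", ["Outlook", "Temp", "Humidity", "Wind"])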

6 MR setting. Splitting data with Selection Graphs
In the multi-relational setting, the data is split with selection graphs instead of single-attribute tests. For the example database, the selection graph Staff → Grad.Student (GPA > 2.0) selects the staff members who advise at least one graduate student with GPA > 2.0; its complement selection graph selects the remaining staff members. Together the two graphs partition the rows of the target table (Staff).

[Diagram: the Department, Staff and Grad.Student tables, with the Staff rows partitioned by the selection graph Staff → Grad.Student (GPA > 2.0) and its complement.]

7 What is a selection graph?
- It corresponds to a subset of the instances of the target table.
- Nodes correspond to tables in the database.
- Edges correspond to associations between tables.
- Open edge = "have at least one".
- Closed edge = "have none of".

[Diagram: an example selection graph over Staff, Grad.Student (GPA > 3.9) and Department (Specialization = math), with open and closed edges.]
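
To make the definition concrete, here is one possible (purely illustrative) Python representation of a selection graph; the class and field names are our own, not the paper's implementation.

    from dataclasses import dataclass, field

    @dataclass
    class SGNode:
        table: str                                      # database table of this node
        conditions: list = field(default_factory=list)  # e.g. ["GPA > 3.9"]

    @dataclass
    class SGEdge:
        parent: "SGNode"
        child: "SGNode"
        keys: tuple           # (parent column, child column) of the association
        open: bool = True     # open = "have at least one"; closed = "have none of"

    @dataclass
    class SelectionGraph:
        root: "SGNode"                                  # node for the target table
        edges: list = field(default_factory=list)

    # The graph from the slide: staff members having at least one
    # graduate student with GPA > 3.9.
    staff = SGNode("Staff")
    grad = SGNode("Graduate_Student", ["GPA > 3.9"])
    sg = SelectionGraph(staff, [SGEdge(staff, grad, ("id", "Advisor"))])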

8 Automatic transformation of selection graphs into SQL queries

Generic query:

    select distinct T0.primary_key
    from table_list
    where join_list and condition_list

Staff node with the condition Position = Professor:

    select distinct T0.id
    from Staff T0
    where T0.Position = 'Professor'

Staff with an open edge to Grad.Student (advises at least one graduate student):

    select distinct T0.id
    from Staff T0, Graduate_Student T1
    where T0.id = T1.Advisor

Staff with a closed edge to Grad.Student (advises no graduate students):

    select distinct T0.id
    from Staff T0
    where T0.id not in (select T1.Advisor
                        from Graduate_Student T1)

Staff with an open edge to Grad.Student and a closed edge to Grad.Student with GPA > 3.9:

    select distinct T0.id
    from Staff T0, Graduate_Student T1
    where T0.id = T1.Advisor
      and T0.id not in (select T1.Advisor
                        from Graduate_Student T1
                        where T1.GPA > 3.9)
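
The translation can be automated directly from the graph structure. A minimal sketch on top of the illustrative classes from the previous snippet (the traversal assumes edges are listed parent-before-child; all helper names are our own):

    def to_sql(sg):
        """Translate a SelectionGraph into the generic query form above:
        open edges become joins, closed edges become NOT IN subqueries."""
        alias = {id(sg.root): "T0"}
        tables = [f"{sg.root.table} T0"]
        where = [f"T0.{c}" for c in sg.root.conditions]
        for i, e in enumerate(sg.edges, start=1):
            pk, fk = e.keys                  # (parent column, child column)
            p = alias[id(e.parent)]
            if e.open:                       # "have at least one": join
                alias[id(e.child)] = f"T{i}"
                tables.append(f"{e.child.table} T{i}")
                where.append(f"{p}.{pk} = T{i}.{fk}")
                where += [f"T{i}.{c}" for c in e.child.conditions]
            else:                            # "have none of": NOT IN subquery
                sub = " and ".join(f"S.{c}" for c in e.child.conditions) or "1=1"
                where.append(f"{p}.{pk} not in (select S.{fk} "
                             f"from {e.child.table} S where {sub})")
        return (f"select distinct T0.id from {', '.join(tables)} "
                f"where {' and '.join(where) or '1=1'}")

    # The graph built earlier translates to:
    # select distinct T0.id from Staff T0, Graduate_Student T1
    #   where T0.id = T1.Advisor and T1.GPA > 3.9
    print(to_sql(sg))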

9 MR decision tree
- Each node contains a selection graph.
- Each child's selection graph is a supergraph of its parent's selection graph.

[Diagram: a multi-relational decision tree; the root holds the selection graph Staff, and its children hold Staff → Grad.Student (GPA > 3.9) and its complement.]

10 How to choose selection graphs in nodes?
Problem: there are too many supergraph selection graphs to choose from at each node.
Solution:
- start with an initial selection graph
- use a greedy heuristic to choose supergraph selection graphs: refinements
- use binary splits for simplicity
- for each refinement, get the complement refinement
- choose the best refinement based on the information gain criterion
Problem: some potentially good refinements may give no immediate benefit.
Solution: look-ahead capability.
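
A sketch of the greedy step, assuming two hypothetical helpers (our names, not the paper's): count_matches(sg), returning the class counts of the target-table instances covered by a selection graph (e.g. via its SQL translation above), and refinements(sg), enumerating candidate refinement/complement pairs.

    import math

    def info(counts):
        """Entropy of a {class: count} dict."""
        n = sum(counts.values()) or 1
        return -sum(c / n * math.log2(c / n) for c in counts.values() if c)

    def best_refinement(sg, count_matches, refinements):
        """Greedy step: pick the refinement/complement pair with the
        highest information gain over the instances covered by sg."""
        parent = count_matches(sg)
        n = sum(parent.values()) or 1
        best, best_gain = None, -1.0
        for ref, comp in refinements(sg):
            left, right = count_matches(ref), count_matches(comp)
            nl, nr = sum(left.values()), sum(right.values())
            gain = info(parent) - (nl / n) * info(left) - (nr / n) * info(right)
            if gain > best_gain:
                best, best_gain = (ref, comp), gain
        return best, best_gain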

11 Refinements of selection graph
- add a condition to a node – explores attribute information in the tables
- add a present edge and open node – explores relational properties between the tables

[Diagram: the example selection graph over Staff, Grad.Student (GPA > 3.9) and Department (Specialization = math).]

Both kinds are illustrated on the following slides; a sketch of how such refinement pairs could be generated is given below.
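
One illustrative way to generate both kinds of refinement pairs on the classes sketched earlier (nodes_of is a hypothetical helper; MRDTL's actual refinement enumeration is richer than this):

    import copy

    def nodes_of(g):
        """All nodes of a selection graph, root first."""
        return [g.root] + [e.child for e in g.edges]

    def add_condition(sg, node_index, cond, neg_cond):
        """'Add condition' refinement: attach cond to one node; the
        complement graph carries the negated condition (cf. slide 12)."""
        ref, comp = copy.deepcopy(sg), copy.deepcopy(sg)
        nodes_of(ref)[node_index].conditions.append(cond)
        nodes_of(comp)[node_index].conditions.append(neg_cond)
        return ref, comp

    def add_edge(sg, node_index, table, keys):
        """'Add present edge and open node' refinement: the complement
        graph gets the same edge closed ('have none of')."""
        ref, comp = copy.deepcopy(sg), copy.deepcopy(sg)
        for g, is_open in ((ref, True), (comp, False)):
            parent = nodes_of(g)[node_index]
            g.edges.append(SGEdge(parent, SGNode(table), keys, open=is_open))
        return ref, comp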

12 Refinements of selection graph: add condition to the node
[Diagram: the refinement adds the condition Position = Professor to the Staff node of the example selection graph (Staff, Grad.Student with GPA > 3.9, Department with Specialization = math); the complement refinement adds Position != Professor instead.]

13 Refinements of selection graph: add condition to the node
[Diagram: the refinement adds the condition GPA > 2.0 to the open Grad.Student node of the example selection graph; the complement refinement carries the complementary condition.]

14 Refinements of selection graph: add condition to the node
[Diagram: the refinement adds the condition #Students > 200 to the Department node of the example selection graph; the complement refinement carries the complementary condition.]

15 Refinements of selection graph: add present edge and open node
[Diagram: the refinement adds a present edge and open node to the example selection graph; the complement refinement adds the corresponding closed edge.]
Note: information gain = 0.

16 Refinements of selection graph: add present edge and open node
[Diagram: another "add present edge and open node" refinement of the example selection graph, shown with its complement refinement.]

17 Refinements of selection graph: add present edge and open node
[Diagram: a further "add present edge and open node" refinement, extending the example selection graph at a different node, shown with its complement refinement.]

18 Refinements of selection graph: add present edge and open node
[Diagram: the refinement adds a present edge from the Department node to a second Grad.Student node; the complement refinement adds the corresponding closed edge.]

19 Look ahead capability
[Diagram: a look-ahead refinement of the example selection graph: an "add present edge and open node" step that alone gives no gain is combined with a follow-up refinement in a single step, shown with its complement.]

20 Look ahead capability
[Diagram: the look-ahead refinement adds the edge to Department together with the condition #Students > 200 in a single step, shown with its complement.]

21 MR decision tree algorithm. Construction phase
For each non-leaf node:
- consider all possible refinements of the node's selection graph, together with their complements
- choose the best pair based on the information gain criterion
- create the children nodes
A compact sketch of this loop follows below.

[Diagram: the tree grows from the root selection graph (Staff) to children holding Staff → Grad.Student (GPA > 3.9) and its complement.]
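
Putting the pieces together, a compact sketch of the construction phase in terms of the hypothetical helpers from the earlier snippets (best_refinement, count_matches, refinements):

    def build_mr_tree(sg, count_matches, refinements, min_gain=1e-6):
        """Recursive construction phase: refine the node's selection graph
        while some refinement still yields information gain, else make a
        leaf labelled with the majority class of the covered instances."""
        counts = count_matches(sg)
        choice, gain = best_refinement(sg, count_matches, refinements)
        if choice is None or gain < min_gain:
            return {"graph": sg, "leaf": max(counts, key=counts.get)}
        ref, comp = choice
        return {"graph": sg,
                "left": build_mr_tree(ref, count_matches, refinements, min_gain),
                "right": build_mr_tree(comp, count_matches, refinements, min_gain)}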

22 MR decision tree algorithm. Classification phase
For each leaf:
- apply the selection graph of the leaf to the test data
- classify the resulting instances with the classification of the leaf

[Diagram: leaves of the example tree predicting the salary classes 70-80k and 80-100k, reached via tests such as Department Spec = math / Spec = physics and Position = Professor.]
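
A matching sketch of the classification phase; run_query is a hypothetical helper that executes a query against the test database and returns the matching target-table ids, and to_sql is the translation sketched earlier:

    def classify(leaves, run_query):
        """leaves: list of (selection_graph, label) pairs taken from the
        tree. Every test instance matched by a leaf's selection graph
        receives that leaf's classification."""
        predictions = {}
        for sg, label in leaves:
            for instance_id in run_query(to_sql(sg)):
                predictions[instance_id] = label
        return predictions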

23 Experimental results. Mutagenesis
The most widely used DB in ILP. It describes molecules of certain nitroaromatic compounds.
Goal: predict their mutagenic activity (label attribute) – the ability to cause DNA to mutate. High mutagenic activity can cause cancer.
Class distribution:

    Compounds              Active  Inactive  Total
    Regression friendly    125     63        188
    Regression unfriendly  13      29        42
    Total                  138     92        230

There are 5 levels of background knowledge: B0, B1, B2, B3, B4; they provide increasingly richer descriptions of the examples. Only the first three levels (B0, B1, B2) are used here.

24 Experimental results. Mutagenesis
Results of 10-fold cross-validation for the regression friendly set:

                 Accuracy (%)      Time (secs.)
    Systems      B0    B1    B2    B0      B1      B2
    Progol       79    86    85    95462   76530
    Progol       76    81    83    117k    64k     42k
    FOIL         61    61    83    4950    9138    0.5
    TILDE        75    79    85    41      170     142
    MRDTL        67    87    88    0.85    332     221

Size of decision trees:

    Systems   Number of nodes
              B0    B1    B2
    MRDTL     1     53    51

25 Experimental results. Mutagenesis
Results of leave-one-out cross-validation for the regression unfriendly set:

    Background  Accuracy  Time       #Nodes
    B0          70%       0.6 secs   1
    B1          81%       86 secs    24
    B2          81%       60 secs    22

Two recent approaches, (Sebag and Rouveirol, 1997) and (Kramer and De Raedt, 2001), using B3 have achieved 93.6% and 94.7%, respectively, on the mutagenesis database.

26 Experimental results. KDD Cup 2001
The database consists of a variety of details about the genes of one particular type of organism. Genes code for proteins, and these proteins tend to localize in various parts of cells and interact with one another to perform crucial functions.
Task: prediction of gene/protein localization (15 possible values).
Target table: Gene. Target attribute: Localization.
862 training genes, 381 test genes.
Challenge: many attribute values are missing.
Approach: use a special value to encode a missing value (a sketch follows below).
Result: accuracy of 50%.
Conclusion: good techniques for filling in missing values are needed.
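
A minimal sketch of this first approach, assuming the attributes sit in a pandas DataFrame (our illustration; the slide does not describe the implementation):

    import pandas as pd

    def encode_missing(df, token="MISSING"):
        """Treat a missing value as just another attribute value, so every
        test in the tree routes missing values down a definite branch.
        Note that numeric columns become object-typed once the string
        token is inserted."""
        return df.fillna(token)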

27 Experimental results. KDD Cup 2001
Approach: replace missing values with the most common value of the attribute for the class (sketched below).
Results:
- accuracy of around 85% with a decision tree of 367 nodes, with no limit on the number of times an association can be instantiated
- accuracy of 80% when the number of times an association can be instantiated is limited
- accuracy of around 75% when associations are followed only in the forward direction
This shows that providing reasonable guesses for missing values can significantly enhance the performance of MRDTL on real-world data sets. In practice, however, the class labels of the test data are unknown, so this method cannot be applied directly.
Approach: an extension of the Naïve Bayes algorithm for relational data.
Result: no improvement compared to the first approach.
Conclusion: handling of missing values has to be incorporated into the decision tree algorithm itself.
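
The class-conditional imputation can be sketched as follows (again an illustration with pandas, not the authors' code); it requires known labels, which is exactly why it cannot be applied to real test data:

    import pandas as pd

    def fill_by_class_mode(df, label_col):
        """Replace each missing value with the most common value of that
        attribute among rows of the same class."""
        def fill_group(group):
            return group.apply(
                lambda col: col.fillna(col.mode().iloc[0])
                if not col.mode().empty else col)
        return df.groupby(label_col, group_keys=False).apply(fill_group)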

28 Experimental results. Adult database
Suitable for propositional learning: one table with 6 numerical and 8 nominal attributes, containing information from the 1994 census.
Task: determine whether a person makes over 50k a year.
Class distribution for the adult database:

                         Training          Test              Total
                         >50k    <=50k     >50k    <=50k
    With missing values  7841    24720     3846    12435     48842
    W/o missing values   7508    22654     3700    11360     45222

Results:
- after removing missing values and using the original train/test split: 82.2%
- filling missing values with the Naïve Bayes approach: 83%
- C4.5 result: 84.46%

29 Summary
- The algorithm is a promising alternative to existing algorithms such as Progol, FOIL, and TILDE.
- The running time is comparable with the best existing approaches.
- If equipped with principled approaches to handling missing values, it is an effective algorithm for learning from real-world relational data.
- The approach is an extension of propositional learning and can be successfully applied to propositional learning.
Questions:
- Why can't we split the data based on the value of an attribute in an arbitrary table right away?
- Is there a less restrictive and simpler way of representing data splits than selection graphs?
- The running time for computing the first nodes of the decision tree is much lower than for the rest of the nodes. Is this unavoidable? Can we implement the same idea more efficiently?

30 Future work
- Incorporation of more sophisticated techniques for handling missing values
- Incorporation of more sophisticated pruning techniques or complexity regularizations
- More extensive evaluation of MRDTL on real-world data sets
- Development of ontology-guided multi-relational decision tree learning algorithms to generate classifiers at multiple levels of abstraction [Zhang et al., 2002]
- Development of variants of MRDTL for classification tasks where the classes are not disjoint, based on the recently developed propositional decision tree counterparts of such algorithms [Caragea et al., 2002]
- Development of variants of MRDTL that can learn from heterogeneous, distributed, autonomous data sources, based on recently developed techniques for distributed learning and ontology-based data integration

