A Hierarchy of Independence Assumptions for Multi-Relational Bayes Net Classifiers
School of Computing Science, Simon Fraser University, Vancouver, Canada

2/18 Outline
- Multi-Relational Classifiers
- Multi-Relational Independence Assumptions
- Classification Formulas
- Bayes Nets
- Evaluation

3/18 Database Tables

Tables for entities and relationships; these can be visualized as a network.

Student
| s-id | Intelligence | Ranking |
|------|--------------|---------|
| Jack | ???          | 1       |
| Kim  | 2            | 1       |
| Paul | 1            | 2       |

Professor
| p-id   | Popularity | Teaching-a |
|--------|------------|------------|
| Oliver | 3          | 1          |
| Jim    | 2          | 1          |

Course
| c-id | Rating | Difficulty |
|------|--------|------------|
| 101  | 3      | 1          |
| 102  | 2      | 2          |

Registration
| s-id | c-id | Grade | Satisfaction |
|------|------|-------|--------------|
| Jack | 101  | A     | 1            |
| Jack | 102  | B     | 2            |
| Kim  | 102  | A     | 1            |
| Paul | 101  | B     | 1            |

Link-based classification:
- Target table: Student
- Target entity: Jack
- Target attribute (class): Intelligence

[Figure: the database visualized as a network; e.g. Jack (Ranking = 1) is linked through a Registration edge to a course with Difficulty = 1.]

4/18 Extended Database Tables

Joining each Registration row with the attributes of its student and course (source tables as on the previous slide) gives the extended table:

| s-id | c-id | Grade | Satisfaction | Intelligence | Ranking | Rating | Difficulty |
|------|------|-------|--------------|--------------|---------|--------|------------|
| Jack | 101  | A     | 1            | ???          | 1       | 3      | 1          |
| Jack | 102  | B     | 2            | ???          | 1       | 2      | 2          |
| Kim  | 102  | A     | 1            | 2            | 1       | 2      | 2          |
| Paul | 101  | B     | 1            | 1            | 2       | 3      | 1          |
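To make the join concrete, here is a minimal Python sketch (not from the slides) that builds the extended table from the three source tables:

```python
# Build the extended table: each Registration row is joined with the
# attributes of its linked student and course (None marks the unknown class label).
student = {"Jack": {"Intelligence": None, "Ranking": 1},
           "Kim":  {"Intelligence": 2,    "Ranking": 1},
           "Paul": {"Intelligence": 1,    "Ranking": 2}}
course  = {101: {"Rating": 3, "Difficulty": 1},
           102: {"Rating": 2, "Difficulty": 2}}
registration = [("Jack", 101, "A", 1), ("Jack", 102, "B", 2),
                ("Kim",  102, "A", 1), ("Paul", 101, "B", 1)]

# Natural join: Registration joined with Student and Course on the id columns.
extended = [{"s-id": s, "c-id": c, "Grade": g, "Satisfaction": sat,
             **student[s], **course[c]}
            for (s, c, g, sat) in registration]

for row in extended:
    print(row)
```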

5/18 Multi-Relational Classifiers

1. Aggregate relational features (propositionalization).
   Example: use the average grade.
   Disadvantages: loses information; slow to learn (up to several CPU days).
2. Count relational features, log-linear models.
   Example: use the number of A's, the number of B's, ...; then ln P(class) = Σ_i x_i w_i − Z.
   Disadvantage: slow learning.
3. Log-linear models with independence assumptions.
   + Fast to learn.
   − Independence assumptions may be only approximately true.
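To make the count-based log-linear formula concrete, here is a minimal sketch; the feature names, weights, and class labels below are invented for illustration:

```python
import math

# Hypothetical count features for one student: how many A's and B's they received.
counts = {"num_A": 2, "num_B": 1}
# Hypothetical learned weights per class.
weights = {"high": {"num_A": 0.8,  "num_B": -0.2},
           "low":  {"num_A": -0.5, "num_B": 0.4}}

def unnormalized_log_score(cls):
    # ln P(class) = sum_i x_i * w_i  -  Z; Z is computed below by normalizing.
    return sum(counts[f] * weights[cls][f] for f in counts)

scores = {c: unnormalized_log_score(c) for c in weights}
Z = math.log(sum(math.exp(s) for s in scores.values()))
probs = {c: math.exp(s - Z) for c, s in scores.items()}
print(probs)
```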

6/18 Independence Assumptions

7/18 Independence Assumptions: Naïve Bayes

Naive Bayes: non-class attributes are independent of each other, given the target class label.

(Extended table as on slide 4.) Legend: given the class column Intelligence (blue on the slide), the remaining attribute columns (yellow) are independent of each other.
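As a sketch of how the naive Bayes assumption is used for prediction (all probabilities below are invented placeholders, not values from the slides):

```python
# Naive Bayes: P(class | attrs) is proportional to P(class) * product_i P(attr_i | class).
prior = {1: 0.4, 2: 0.6}                       # P(Intelligence = c)
cond = {                                       # P(attribute value | Intelligence = c)
    "Ranking=1": {1: 0.3, 2: 0.7},
    "Grade=A":   {1: 0.2, 2: 0.8},
}

def predict(observed):
    scores = dict(prior)
    for attr in observed:                      # multiply in one factor per attribute
        for c in scores:
            scores[c] *= cond[attr][c]
    total = sum(scores.values())               # normalize to a posterior
    return {c: s / total for c, s in scores.items()}

print(predict(["Ranking=1", "Grade=A"]))       # posterior over Intelligence
```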

8/18 Path Independence

Path Independence: links/paths are independent of each other, given the attributes of the linked entities.
Naive Bayes: non-class attributes are independent of each other, given the target class label.

(Extended table as on slide 4.) Legend: given the attributes of the linked entities (blue on the slide), the rows of the extended table (yellow) are independent of each other.

9/18 Influence Independence

Path Independence: links/paths are independent of each other, given the attributes of the linked entities.
Influence Independence: attributes of the target entity are independent of attributes of related entities, given the target class label.
Naive Bayes: non-class attributes are independent of each other, given the target class label.
Path-Class Independence: the existence of a link/path is independent of the class label.

(Extended table as on slide 4.) Legend: given the class label (blue on the slide), the target-entity attribute columns (yellow) are independent of the related-entity attribute columns (orange).

10/18 Classification Formulas

Log-linear prediction formulas can be rigorously derived from the independence assumptions.

Path Independence: predict the class maximizing
  log P(class | target attributes) + Σ_tables Σ_rows [ log P(class | information in row) − log P(class | target attributes) ]

PI + Influence Independence: predict the class maximizing
  log P(class | target attributes) + Σ_tables Σ_rows [ log P(class | information in row) − log P(class) ]
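A minimal sketch of scoring one candidate class under these formulas; in the actual system the log-probabilities would come from the single-table base classifier, and the numbers below are placeholders:

```python
import math

def pi_score(log_p_class_given_target, row_log_probs,
             influence_independence=False, log_prior=None):
    """Score one candidate class.

    log_p_class_given_target: log P(class | target attributes)
    row_log_probs: one log P(class | information in row) per row of each
        extended table linked to the target entity
    influence_independence: if True, use the PI + II formula, which subtracts
        the log prior log P(class) in each row term instead
    """
    baseline = log_prior if influence_independence else log_p_class_given_target
    return log_p_class_given_target + sum(lp - baseline for lp in row_log_probs)

# Placeholder numbers: one candidate class, two linked rows.
print(pi_score(math.log(0.6), [math.log(0.7), math.log(0.5)]))
print(pi_score(math.log(0.6), [math.log(0.7), math.log(0.5)],
               influence_independence=True, log_prior=math.log(0.4)))
```

The class predicted is the one whose score is maximal; under Path Independence each linked row contributes a log-probability ratio, so unrelated rows simply add no evidence.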

11/18 Relationship to Previous Formulas

| Assumption                  | Previous work with classification formula |
|-----------------------------|-------------------------------------------|
| Path Independence           | None; our new model. |
| PI + Influence Independence | Heterogeneous Naive Bayes Classifier, Manjunath et al., ICPR. |
| PI + II + Naive Bayes       | Exists + Naive Bayes (single relation only), Getoor, Segal, Taskar, Koller 2001. |
| PI + II + NB + Path-Class   | Multi-relational Bayesian Classifier, Chen, Han et al., Decision Support Systems 2009. |

12/18 Evaluation

13/18 Data Sets and Base Classifier

Standard databases (KDD Cup, UC Irvine):
- Hepatitis (tables include Biopsy, Patient, Out-Hosp, In-Hosp, Interferon)
- Mondial (Country, Borders, Continent, Country2, Economy, Government)
- Financial (Loan, Account, Order, Transaction, Disposition, District, Card, Client)
- MovieLens (schema not shown)

[Figure: schema diagrams of the Hepatitis, Mondial, and Financial databases.]

Classifier: we can plug in any single-table probabilistic base classifier that has a classification formula. We use Bayes nets.

14/18 What is a Bayes net?

Qualitative part: a directed acyclic graph (DAG). Nodes are random variables; edges represent direct influence.
Quantitative part: a set of conditional probability distributions, e.g. the table P(A | E, B) for the family of Alarm.

[Figure: the Earthquake-Burglary-Radio-Alarm-Call network with the conditional probability table P(A | E, B); the table's entries are not preserved in the transcript. Figure from N. Friedman.]

This is a compact representation of joint probability distributions via conditional independence. Together, the two parts define a unique distribution in a factored form.
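As a sketch of the factored form for this network (all CPT entries below are invented placeholders; the slide's values were lost in the transcript):

```python
# Factorization for the Earthquake/Burglary/Radio/Alarm/Call network:
# P(E,B,R,A,C) = P(E) P(B) P(R|E) P(A|E,B) P(C|A).
P_E = {True: 0.01, False: 0.99}
P_B = {True: 0.02, False: 0.98}
P_R_given_E = {True: {True: 0.9, False: 0.1},
               False: {True: 0.05, False: 0.95}}
P_A_given_EB = {(True, True): 0.95, (True, False): 0.3,
                (False, True): 0.8, (False, False): 0.01}
P_C_given_A = {True: 0.7, False: 0.05}

def joint(e, b, r, a, c):
    # One factor per node, each conditioned only on its parents in the DAG.
    p_a = P_A_given_EB[(e, b)] if a else 1 - P_A_given_EB[(e, b)]
    p_c = P_C_given_A[a] if c else 1 - P_C_given_A[a]
    return P_E[e] * P_B[b] * P_R_given_E[e][r] * p_a * p_c

print(joint(e=False, b=True, r=False, a=True, c=True))
```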

15/18 Independence-Based Learning is Fast

[Chart: training time in seconds for each classifier, ordered from the weakest to the strongest independence assumption.]

16/18 Independence-Based Models are Accurate

[Chart: predictive accuracy for each classifier, ordered from the weakest to the strongest independence assumption.]
Similar results hold for F-measure and Area Under Curve.

17/18 Conclusion

- Several plausible independence assumptions and classification formulas have been investigated in previous work; we organize them in a unifying hierarchy.
- New assumption: multi-relational path independence. It is the most general, and is implicit in the other models.
- Big advantage: fast, scalable, simple learning. Plug in any single-table probabilistic classifier.
- Limitation: no pruning or weighting of the different tables. Logistic regression can be used to learn weights (Bina, Schulte et al. 2013).

Reference: Bina, B.; Schulte, O.; Crawford, B.; Qian, Z. & Xiong, Y. "Simple decision forests for multi-relational classification", Decision Support Systems, 2013.

18/18 Thank you! Any questions?