Learning the Structure of Related Tasks. Presented by Lihan He, Machine Learning Reading Group, Duke University, 02/03/2006. Based on work by A. Niculescu-Mizil and R. Caruana.

Outline
- Introduction
- Learning a single Bayesian network from data
- Learning from related tasks
- Experimental results
- Conclusions

Introduction
Graphical model: nodes represent random variables; edges represent dependencies.
- Undirected graphical model: Markov network.
- Directed graphical model: Bayesian network. Edges represent causal relationships between nodes; the graph must be a directed acyclic graph (DAG), with no directed cycles allowed. A Bayesian network is specified as B = {G, θ}.
[Figure: an example DAG over nodes x_1, ..., x_4.]

Introduction
Goal: simultaneously learn Bayesian network structures for multiple tasks.
- The tasks are related; their structures are likely to be similar, but not identical. Example: gene expression data.
Approach:
1) Learn a single structure from data.
2) Generalize to multi-task learning by placing a joint prior over the structures.

Learning a single Bayesian network from data
A Bayesian network B = {G, θ} contains a set of n random variables X = {X_1, X_2, ..., X_n}. The joint probability P(X) factorizes according to the structure G:

  P(X_1, ..., X_n) = \prod_{i=1}^{n} P(X_i \mid Pa(X_i)),

where Pa(X_i) denotes the parents of X_i in G. Given a dataset D = {x^(1), x^(2), ..., x^(m)}, where each sample x^(j) = (x_1, x_2, ..., x_n) assigns a value to every variable, we can learn the structure G and the parameters θ from D.
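To make the factorization concrete, here is a minimal Python sketch (not from the slides; the chain structure and all CPT numbers are invented for illustration):

```python
# A three-node chain x1 -> x2 -> x3, so the factorization is
# P(x1, x2, x3) = P(x1) * P(x2 | x1) * P(x3 | x2).

# CPTs keyed by (value, parent value); all numbers are made up.
p_x1 = {0: 0.6, 1: 0.4}
p_x2_given_x1 = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}
p_x3_given_x2 = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.5, (1, 1): 0.5}

def joint(x1, x2, x3):
    """Joint probability computed via the factorization over the DAG."""
    return p_x1[x1] * p_x2_given_x1[(x2, x1)] * p_x3_given_x2[(x3, x2)]

print(joint(0, 0, 0))  # 0.6 * 0.7 * 0.9 = 0.378
```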

Learning a single Bayesian network from data
Model selection: find the structure G with the highest P(G | D) among all possible G.
Searching over all possible G exhaustively is infeasible:
- n = 4: there are 543 possible DAGs;
- n = 10: there are O(10^18) possible DAGs.
Question: how do we search for the best structure in this huge space of possible DAGs?
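The counts above follow Robinson's recurrence for the number of labeled DAGs (a standard result, not derived on the slides); a short sketch that reproduces them:

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def num_dags(n):
    """Number of labeled DAGs on n nodes (Robinson's recurrence).

    Inclusion-exclusion over the number k of nodes with no incoming edges:
    a(n) = sum_{k=1..n} (-1)^(k+1) * C(n, k) * 2^(k(n-k)) * a(n-k), a(0) = 1.
    """
    if n == 0:
        return 1
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
               for k in range(1, n + 1))

print(num_dags(4))   # 543
print(num_dags(10))  # about 4.2 * 10**18
```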

Learning a single Bayesian network from data
Algorithm (greedy hill climbing with random restarts):
1) Randomly generate an initial DAG and evaluate its score.
2) Evaluate the scores of all neighbors of the current DAG.
3) While some neighbor has a higher score than the current DAG:
   move to the neighbor with the highest score;
   evaluate the scores of all neighbors of the new DAG.
4) Repeat (1)-(3) a number of times, starting from a different initial DAG each time.
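A minimal Python sketch of this search loop, assuming hypothetical score(dag) and neighbors(dag) functions (the slides do not specify their implementations):

```python
def hill_climb(initial_dag, score, neighbors):
    """Greedy hill climbing from one starting DAG."""
    current, current_score = initial_dag, score(initial_dag)
    while True:
        # Evaluate all neighbors and keep the best one.
        scored = [(score(g), g) for g in neighbors(current)]
        if not scored:
            return current, current_score
        best_score, best = max(scored, key=lambda t: t[0])
        if best_score <= current_score:  # local optimum reached
            return current, current_score
        current, current_score = best, best_score

def restarts(random_dag, score, neighbors, n_restarts=10):
    """Run hill climbing from several random initial DAGs; return the best."""
    results = [hill_climb(random_dag(), score, neighbors) for _ in range(n_restarts)]
    return max(results, key=lambda t: t[1])
```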

Learning a single Bayesian network from data
Neighbors of a structure G: the set of all DAGs that can be obtained by adding, removing, or reversing a single edge in G.
- Every neighbor must still satisfy the acyclicity constraint.
[Figure: an example DAG over x_1, ..., x_4 and several of its one-edge-change neighbors.]
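An illustrative sketch of neighbor generation (not from the slides): DAGs are represented as sets of directed edges (i, j), and candidates that violate acyclicity are filtered out.

```python
from itertools import permutations

def has_cycle(edges, n):
    """Detect a directed cycle via depth-first search (white/gray/black coloring)."""
    adj = {i: [] for i in range(n)}
    for i, j in edges:
        adj[i].append(j)
    WHITE, GRAY, BLACK = 0, 1, 2
    color = [WHITE] * n

    def dfs(u):
        color[u] = GRAY
        for v in adj[u]:
            if color[v] == GRAY or (color[v] == WHITE and dfs(v)):
                return True
        color[u] = BLACK
        return False

    return any(color[u] == WHITE and dfs(u) for u in range(n))

def neighbors(edges, n):
    """All DAGs reachable by adding, removing, or reversing one edge."""
    out = []
    for i, j in permutations(range(n), 2):
        if (i, j) in edges:
            out.append(edges - {(i, j)})             # remove edge
            out.append(edges - {(i, j)} | {(j, i)})  # reverse edge
        elif (j, i) not in edges:
            out.append(edges | {(i, j)})             # add edge
    return [g for g in out if not has_cycle(g, n)]   # enforce acyclicity

print(len(neighbors(frozenset({(0, 1), (1, 2)}), 3)))  # 5 (adding 2->0 would close a cycle)
```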

Learning from related tasks
Given k i.i.d. datasets D_1, D_2, ..., D_k, simultaneously learn the networks B_1 = {G_1, θ_1}, B_2 = {G_2, θ_2}, ..., B_k = {G_k, θ_k}.
The structures (G_1, G_2, ..., G_k) are similar, but not identical.

Learning from related tasks
One more assumption: the parameters of different networks are independent given their structures, i.e. P(θ_1, ..., θ_k | G_1, ..., G_k) = \prod_{l=1}^{k} P(θ_l | G_l). This is not strictly true, but it makes structure learning more efficient. Since the focus is on structure learning rather than parameter learning, the assumption is acceptable.
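Under this assumption (and assuming each dataset depends only on its own network and parameters), the marginal likelihood decomposes across tasks, which is what makes the joint structure search tractable; stated explicitly:

```latex
P(D_1,\dots,D_k \mid G_1,\dots,G_k)
  = \prod_{l=1}^{k} \int P(D_l \mid G_l,\theta_l)\, P(\theta_l \mid G_l)\, d\theta_l
  = \prod_{l=1}^{k} P(D_l \mid G_l)
```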

Learning from related tasks
Prior over structures:
- If the structures are not related, G_1, ..., G_k are independent a priori, and the structures are learned independently for each task.
- If the structures are identical, the problem reduces to learning one shared structure: learn a single network (over the pooled data) under the restriction that an added task-indicator node TSK is always a parent of all the other nodes. The common structure is then obtained by removing the node TSK and all the edges connected to it.

Learning from related tasks
Prior between independent and identical: penalize each edge (X_i, X_j) whose state differs between two DAGs. For two tasks the prior is, up to normalization,

  P(G_1, G_2) \propto (1 - \delta)^{d(G_1, G_2)},

where d(G_1, G_2) counts the edges that differ between the two DAGs:
- δ = 0: independent (no penalty for differences);
- δ = 1: identical (any difference has zero prior probability);
- 0 < δ < 1: interpolates between the two extremes.
For the k-task prior, the penalty applies over all pairs of structures:

  P(G_1, ..., G_k) \propto \prod_{1 \le l < m \le k} (1 - \delta)^{d(G_l, G_m)}.
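A small sketch of this prior (assumptions: graphs are represented as sets of directed edges, and an edge "differs" if the pair's state — absent, i→j, or j→i — disagrees between the two graphs; this bookkeeping is my reading of the penalty described above):

```python
from itertools import combinations
from math import log

def edge_state(edges, i, j):
    """State of the unordered pair {i, j}: 'ij', 'ji', or 'none'."""
    if (i, j) in edges:
        return "ij"
    if (j, i) in edges:
        return "ji"
    return "none"

def edge_diff(g1, g2, n):
    """d(G1, G2): number of node pairs whose edge state differs."""
    return sum(edge_state(g1, i, j) != edge_state(g2, i, j)
               for i, j in combinations(range(n), 2))

def log_prior(graphs, n, delta):
    """Unnormalized log prior; requires 0 <= delta < 1 so log(1 - delta) is finite."""
    return sum(edge_diff(gl, gm, n) * log(1.0 - delta)
               for gl, gm in combinations(graphs, 2))

g1 = {(0, 1), (1, 2)}
g2 = {(0, 1), (2, 1)}
print(edge_diff(g1, g2, 3))         # 1: the pair {1, 2} is reversed
print(log_prior([g1, g2], 3, 0.5))  # 1 * log(0.5)
```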

Learning from related tasks
Model selection: find the configuration with the highest P(G_1, ..., G_k | D_1, ..., D_k).
- Same idea as single-task structure learning: greedy local search.
- Question: what is a neighbor of (G_1, ..., G_k)?
Def 1: change one edge (add, remove, or reverse) in each DAG. Size of the neighborhood: O(n^{2k}).
Def 2: Def 1 plus one more constraint: all the edge changes happen between the same two nodes for all DAGs in (G_1, ..., G_k). Size of the neighborhood: O(n^2 3^k), since for each of the O(n^2) node pairs, every one of the k DAGs can put that pair into one of 3 states (no edge, X_i → X_j, or X_j → X_i).
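A sketch of the Def-2 neighborhood enumeration (illustrative; the edge-set representation and naming are mine, and acyclicity filtering is omitted for brevity):

```python
from itertools import combinations, product

def set_pair(edges, i, j, state):
    """Return a copy of `edges` with the {i, j} pair forced to `state`."""
    g = set(edges) - {(i, j), (j, i)}
    if state == "ij":
        g.add((i, j))
    elif state == "ji":
        g.add((j, i))
    return frozenset(g)

def def2_neighbors(graphs, n):
    """For each node pair, try every combination of 3 states across the k DAGs."""
    current = tuple(frozenset(g) for g in graphs)
    out = []
    for i, j in combinations(range(n), 2):
        for states in product(["none", "ij", "ji"], repeat=len(graphs)):
            candidate = tuple(set_pair(g, i, j, s) for g, s in zip(graphs, states))
            if candidate != current:  # skip the unchanged configuration
                out.append(candidate)
    return out  # acyclicity filtering omitted for brevity

graphs = [{(0, 1)}, {(0, 1), (1, 2)}]
print(len(def2_neighbors(graphs, 3)))  # 3 pairs * (3**2 - 1) = 24
```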

Learning from related tasks
Acceleration: at each iteration the algorithm must find the best score within a set of neighbors, but it is not necessary to score every element of the neighborhood. Partition the neighbors into subsets in which the first i tasks are specified and the remaining k - i tasks are not. If an upper bound on the score of such a partially specified subset is below the best fully specified score found so far, the entire subset can be skipped.
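Read this way, the search is a branch-and-bound over partial assignments; a minimal sketch (assumptions: hypothetical exact_score and upper_bound functions, which the slides do not define):

```python
EDGE_STATES = ("none", "ij", "ji")  # absent, i -> j, j -> i

def best_neighbor(k, exact_score, upper_bound, states=(), best=(float("-inf"), None)):
    """Depth-first search over per-task edge states, pruning hopeless subsets."""
    if len(states) == k:  # all k tasks specified: score the leaf exactly
        return max(best, (exact_score(states), states), key=lambda t: t[0])
    if upper_bound(states) <= best[0]:  # subset cannot beat the incumbent: prune
        return best
    for s in EDGE_STATES:  # specify the next task's edge state
        best = best_neighbor(k, exact_score, upper_bound, states + (s,), best)
    return best
```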

Results
- Starting from an original network, delete each edge with probability P_del to create 5 related tasks.
- 1000 data points; 10 trials.
- Compute the KL divergence and the editing distance between the learned structure and the true structure.
[Figures: KL divergence and editing distance of the learned structures.]

Learning from related tasks
[Results figures.]