K2 Algorithm Presentation KDD Lab, CIS Department, KSU


K2 Algorithm Presentation: Learning Bayes Networks from Data. Haipeng Guo, KDD Lab, CIS Department, KSU. Friday, April 21, 2000.

Presentation Outline
- Bayes Networks Introduction
- What's K2?
- Basic Model and the Score Function
- K2 algorithm
- Demo

Bayes Networks Introduction
A Bayes network B = (Bs, Bp). A Bayes network structure Bs is a directed acyclic graph in which nodes represent random domain variables and arcs between nodes represent direct probabilistic dependences (the absence of an arc encodes a conditional-independence assumption). Bs is augmented by conditional probabilities, Bp, to form a Bayes network B.

Bayes Networks Introduction
Example: Sprinkler
- Bs of the Bayes network (the structure): a directed acyclic graph over x1 = Season, x2 = Sprinkler, x3 = Rain, x4 = Ground_moist, x5 = Ground_state [shown as a figure in the original slide].

Bayes Networks Introduction
- Bp of the Bayes network (the conditional probabilities): one conditional probability table for each of Season, Sprinkler, Rain, Ground_moist, and Ground_state, given its parents in Bs [tables shown in the original slide].
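To make the (Bs, Bp) notation concrete, here is a small sketch of the sprinkler network written as plain Python dictionaries. The arcs follow the standard sprinkler example; the value names and all probability numbers are illustrative placeholders, not values taken from the slides.

# A minimal sketch of B = (Bs, Bp) for the sprinkler example.
# All probability values are illustrative placeholders, not from the slides.

# Bs: the DAG, given as a map from each variable to its list of parents.
Bs = {
    "Season":       [],
    "Sprinkler":    ["Season"],
    "Rain":         ["Season"],
    "Ground_moist": ["Sprinkler", "Rain"],
    "Ground_state": ["Ground_moist"],
}

# Bp: one conditional probability table per variable,
# keyed by the instantiation of its parents.
Bp = {
    "Season": {(): {"dry": 0.5, "wet": 0.5}},
    "Sprinkler": {
        ("dry",): {"on": 0.6, "off": 0.4},
        ("wet",): {"on": 0.1, "off": 0.9},
    },
    "Rain": {
        ("dry",): {"yes": 0.2, "no": 0.8},
        ("wet",): {"yes": 0.7, "no": 0.3},
    },
    # ... the tables for Ground_moist and Ground_state follow the same pattern.
}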

What's K2?
K2 is an algorithm for constructing a Bayes network from a database of records. It was introduced in "A Bayesian Method for the Induction of Probabilistic Networks from Data", Gregory F. Cooper and Edward Herskovits, Machine Learning 9, 1992.

Basic Model
The problem: to find the most probable Bayes network structure given a database.
D – a database of cases
Z – the set of variables represented by D
Bsi, Bsj – two Bayes network structures containing exactly those variables that are in Z

Basic Model
Since P(Bsi | D) / P(Bsj | D) = P(Bsi, D) / P(Bsj, D) (the P(D) terms cancel), by computing such ratios for pairs of Bayes network structures we can rank-order a set of structures by their posterior probabilities. Based on four assumptions, the paper introduces an efficient formula for computing P(Bs, D); let Bs represent an arbitrary Bayes network structure containing just the variables in Z.

Computing P(Bs, D)
Assumption 1: The database variables, which we denote as Z, are discrete.
Assumption 2: Cases occur independently, given a Bayes network model.
Assumption 3: There are no cases that have variables with missing values.
Assumption 4: The density function f(Bp | Bs) is uniform. Bp is a vector whose values denote the conditional-probability assignments associated with structure Bs.

Computing P(Bs, D)
Under assumptions 1–4,
P(Bs, D) = P(Bs) · ∏_{i=1..n} ∏_{j=1..qi} [ ((ri - 1)! / (Nij + ri - 1)!) · ∏_{k=1..ri} Nijk! ]
where
D – the dataset; it has m cases (records)
Z – a set of n discrete variables (x1, …, xn)
ri – a variable xi in Z has ri possible value assignments (vi1, …, viri)
Bs – a Bayes network structure containing just the variables in Z
πi – each variable xi in Bs has a set of parents, which we represent with a list of variables πi
qi – the number of unique instantiations of πi in D
wij – the jth unique instantiation of πi relative to D
Nijk – the number of cases in D in which variable xi has the value vik and πi is instantiated as wij
Nij – the sum of the Nijk over k, i.e. Nij = Σ_{k=1..ri} Nijk
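As an illustration of how the counts in this formula are used (not part of the original slides), the sketch below computes the inner factor for a single variable xi directly from a list of cases; the function name family_factor and the toy dataset are assumptions made for this example.

from collections import defaultdict
from math import factorial

def family_factor(data, i, parents, r_i):
    """Compute prod_j [ (r_i - 1)! / (N_ij + r_i - 1)! * prod_k N_ijk! ]
    for variable index i with the given parent indices, from a list of cases."""
    # counts[w][v] = number of cases where the parents take instantiation w
    # and x_i takes value v
    counts = defaultdict(lambda: defaultdict(int))
    for case in data:
        w = tuple(case[p] for p in parents)
        counts[w][case[i]] += 1

    result = 1.0
    for by_value in counts.values():              # j ranges over observed instantiations w_ij
        N_ij = sum(by_value.values())
        term = factorial(r_i - 1) / factorial(N_ij + r_i - 1)
        for N_ijk in by_value.values():           # k ranges over the values of x_i
            term *= factorial(N_ijk)
        result *= term
    return result

# Toy dataset with three binary variables x0, x1, x2 (one tuple per case).
data = [(0, 0, 1), (1, 1, 1), (0, 0, 0), (1, 1, 1), (0, 1, 0)]
print(family_factor(data, i=2, parents=[1], r_i=2))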

Decrease the computational complexity
Three more assumptions decrease the computational complexity to polynomial time:
<1> There is an ordering on the nodes such that if xi precedes xj, then we do not allow structures in which there is an arc from xj to xi.
<2> There exists a sufficiently tight limit on the number of parents of any node.
<3> P(πi → xi) and P(πj → xj) are independent when i ≠ j.
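For example (an illustration, not from the slides): with the ordering (x1, x2, x3) and parent bound u = 1, assumption <1> rules out any arc into x1 and any arc from x3 back to x1 or x2, so the candidate parents of x2 form a subset of {x1} and those of x3 a subset of {x1, x2}; assumption <2> then allows x3 to keep at most one of its candidates, and assumption <3> lets the best parent set be chosen for each node independently of the others.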

K2 algorithm: a heuristic search method
Use the following functions:
g(i, πi) = ∏_{j=1..qi} [ ((ri - 1)! / (Nij + ri - 1)!) · ∏_{k=1..ri} Nijk! ]
where the Nijk are computed relative to πi being the parents of xi and relative to the database D.
Pred(xi) = {x1, …, xi-1}; it returns the set of nodes that precede xi in the node ordering.
(A Python sketch of g and the K2 search follows the pseudocode below.)

K2 algorithm: a heuristic search method
{Input: A set of n nodes, an ordering on the nodes, an upper bound u on the number of parents a node may have, and a database D containing m cases}
{Output: For each node, a printout of the parents of the node}

K2 algorithm: a heuristic search method
procedure K2
  for i := 1 to n do
    πi := ∅;
    Pold := g(i, πi);
    OKToProceed := true;
    while OKToProceed and |πi| < u do
      let z be the node in Pred(xi) - πi that maximizes g(i, πi ∪ {z});
      Pnew := g(i, πi ∪ {z});
      if Pnew > Pold then
        Pold := Pnew;
        πi := πi ∪ {z};
      else OKToProceed := false;
    end {while}
    write("Node:", xi, "Parents of this node:", πi);
  end {for}
end {K2}
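The pseudocode above translates almost line for line into code. The following sketch (an illustration under the paper's assumptions, with made-up names such as log_g and a toy dataset) scores parent sets in log space using lgamma, which is the log[g(i, πi)] trick mentioned in the demo note at the end.

from collections import defaultdict
from math import lgamma

def log_g(data, i, parents, r):
    """log g(i, pi_i) = sum over parent instantiations j of
       log (r_i - 1)! - log (N_ij + r_i - 1)! + sum_k log N_ijk!"""
    counts = defaultdict(lambda: defaultdict(int))
    for case in data:
        w = tuple(case[p] for p in parents)
        counts[w][case[i]] += 1
    total = 0.0
    for by_value in counts.values():
        N_ij = sum(by_value.values())
        total += lgamma(r[i]) - lgamma(N_ij + r[i])   # log (r_i-1)! - log (N_ij+r_i-1)!
        total += sum(lgamma(N_ijk + 1) for N_ijk in by_value.values())  # sum_k log N_ijk!
    return total

def k2(data, order, r, u):
    """Greedy K2 search. `order` is the node ordering, `r[i]` the number of
    values of x_i, `u` the upper bound on the number of parents per node."""
    parents = {i: [] for i in order}
    for pos, i in enumerate(order):
        p_old = log_g(data, i, parents[i], r)
        ok_to_proceed = True
        pred = order[:pos]                            # Pred(x_i) under the ordering
        while ok_to_proceed and len(parents[i]) < u:
            candidates = [z for z in pred if z not in parents[i]]
            if not candidates:
                break
            # pick the candidate z that maximizes g(i, pi_i ∪ {z})
            z = max(candidates, key=lambda c: log_g(data, i, parents[i] + [c], r))
            p_new = log_g(data, i, parents[i] + [z], r)
            if p_new > p_old:
                p_old = p_new
                parents[i].append(z)
            else:
                ok_to_proceed = False
        print(f"Node: x{i}  parents: {['x%d' % p for p in parents[i]]}")
    return parents

# Tiny made-up dataset over three binary variables, ordered x0, x1, x2.
data = [(1, 1, 1), (0, 0, 0), (1, 1, 1), (0, 0, 1), (1, 1, 0), (0, 0, 0)]
k2(data, order=[0, 1, 2], r={0: 2, 1: 2, 2: 2}, u=2)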

Conditional probabilities
Let θijk denote the conditional probability P(xi = vik | πi = wij), that is, the probability that xi has value vik, for some k from 1 to ri, given that the parents of xi, represented by πi, are instantiated as wij. We call θijk a network conditional probability. Let ξ denote the four assumptions. The expected value of θijk is
E[θijk | D, Bs, ξ] = (Nijk + 1) / (Nij + ri)
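For example (numbers invented for illustration): if xi is binary (ri = 2) and, for the parent instantiation wij, the database contains Nij1 = 3 cases with xi = vi1 and Nij2 = 7 cases with xi = vi2 (so Nij = 10), then E[θij1 | D, Bs, ξ] = (3 + 1) / (10 + 2) = 1/3.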

Demo Example
Input: the dataset is generated from the following structure over three variables x1, x2, x3 [structure shown as a figure in the original slide].

Demo Example
Note: use log[g(i, πi)] instead of g(i, πi) to save running time.