Structure Refinement in First Order Conditional Influence Language
Sriraam Natarajan, Weng-Keen Wong, Prasad Tadepalli
School of EECS, Oregon State University

First-order Conditional Influence Language (FOCIL)

Weighted Mean {
  If {task(t), doc(d), role(d,r,t)} then
    t.id, r.id Qinf (Mean) d.folder
  If {doc(s), doc(d), source(s,d)} then
    s.folder Qinf (Mean) d.folder
}

[Figure: "unrolled" network for folder prediction. The instantiations of rule 1 (t1.id, t2.id, r1.id, r2.id) and of rule 2 (s1.folder, s2.folder) all influence d.folder, combined with rule weights W1 and W2.]

[Figure: prior network vs. learned network. The prior network additionally includes t.creationDate, t.lastAccessed, and s.lastAccessed (nodes t1.cd, t2.cd, t1.la, t2.la, s1.la, s2.la) as influents of d.folder; the learned network retains only t.id, r.id, and s.folder.]

Scoring metric
- Conditional BIC score: CBIC = -2 * CLL + d_m * log N, where CLL is the conditional log-likelihood, d_m the number of model parameters, and N the number of examples.
- Different instantiations of the same rule share parameters.
- Conditional likelihood: EM – maximize the joint likelihood.
- CBIC score with the penalty scaled down.
- Greedy search with random restarts.

Folder prediction

[Table: rank and score of four search variants – Exhaustive-R, HC+RR-R, Exhaustive-I, HC+RR-I – on the folder data set (R: relevant attributes only, I: with irrelevant attributes) and on a synthetic data set; the numeric entries are not recoverable from the transcript.]

Conclusions
- Data is expensive – exploit prior knowledge in the structure search.
- Derived the CBIC score for our setting.
- Learned the "true" network on the synthetic data set.
- Folder data set: learned the best network with only relevant attributes.
- Folder data set with irrelevant attributes: CBIC_learned < CBIC_best.

Future work
- Different scoring metrics: BDeu, bias/variance.
- Choose the best combining rule that fits the data.
- Structure refinement in large real-world domains.

Issues
- What is the correct complexity penalty in the presence of multi-valued variables? Counting the number of parameters may not be the right solution.
- What is the right scoring metric in a relational setting for classification?
- Can the search space be intelligently pruned?

Prior program (all candidate attributes, including the irrelevant ones):

Weighted Mean {
  If {task(t), doc(d), role(d,r,t)} then
    t.id, r.id, t.creationDate, t.lastAccessed Qinf (Mean) d.folder
  If {doc(s), doc(d), source(s,d)} then
    s.folder, s.lastAccessed Qinf (Mean) d.folder
}

Learned program (relevant attributes only):

Weighted Mean {
  If {task(t), doc(d), role(d,r,t)} then
    t.id, r.id Qinf (Mean) d.folder
  If {doc(s), doc(d), source(s,d)} then
    s.folder Qinf (Mean) d.folder
}
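A note on the combining rules above: each FOCIL rule may fire many times (one instantiation per matching task, role, or source document), and the Mean combining rule averages the distributions of those instantiations before Weighted Mean takes a convex combination across rules. Below is a minimal sketch of that two-level combination, assuming each instantiation already yields a categorical distribution over d.folder; the function names and numbers are illustrative, not from the poster.

import numpy as np

def mean_combine(dists):
    # Mean combining rule: average the distributions produced by the
    # instantiations of a single FOCIL rule (which share parameters).
    return np.mean(np.stack(dists), axis=0)

def weighted_mean_combine(rule_dists, weights):
    # Weighted Mean across rules: Mean-combine each rule's
    # instantiations, then take a convex combination with the
    # learned rule weights W1, W2, ...
    per_rule = np.stack([mean_combine(d) for d in rule_dists])
    w = np.asarray(weights, dtype=float)
    return (w / w.sum()) @ per_rule

# Two rules predicting P(d.folder) over 3 folders: rule 1 fires twice
# (two task/role pairs), rule 2 fires once (one source document).
rule1 = [np.array([0.7, 0.2, 0.1]), np.array([0.5, 0.3, 0.2])]
rule2 = [np.array([0.1, 0.8, 0.1])]
print(weighted_mean_combine([rule1, rule2], weights=[0.6, 0.4]))

Because every instantiation of a rule shares one parameter vector, the unrolled network can grow with the data without growing the parameter count, which is also why the usual BIC parameter count needs care in this setting.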
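The scoring metric is a direct formula. Here is a transcription into code, assuming the conditional log-likelihood, parameter count, and example count are computed elsewhere; the penalty_scale argument stands in for the poster's "penalty scaled down", whose exact value the transcript does not give.

import math

def conditional_bic(cll, num_params, num_examples, penalty_scale=1.0):
    # CBIC = -2 * CLL + d_m * log N (lower is better).
    #   cll          : conditional log-likelihood of targets given inputs
    #   num_params   : d_m, free parameters (shared across instantiations)
    #   num_examples : N, number of training examples
    # penalty_scale < 1 gives the "penalty scaled down" variant.
    return -2.0 * cll + penalty_scale * num_params * math.log(num_examples)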
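The search procedure named on the poster is greedy search with random restarts, scored by the (scaled) CBIC. The following is a generic sketch of that loop; the neighbors, score, and perturb callbacks are assumptions standing in for whatever refinement operators the authors actually used.

def hill_climb_with_restarts(initial, neighbors, score, perturb, restarts=10):
    # Greedy hill climbing with random restarts; lower score (CBIC)
    # is better.  neighbors(s) enumerates one-step refinements of a
    # structure (e.g. add or drop an influent attribute); perturb(s)
    # produces a randomized restart point.
    best, best_score = initial, score(initial)
    for _ in range(restarts):
        current = perturb(best)
        current_score = score(current)
        improved = True
        while improved:
            improved = False
            for cand in neighbors(current):
                cand_score = score(cand)
                if cand_score < current_score:  # take first improving move
                    current, current_score = cand, cand_score
                    improved = True
                    break
        if current_score < best_score:
            best, best_score = current, current_score
    return best, best_score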