Parameter and Structure Learning Dhruv Batra, 10-708 Recitation 10/02/2008.

Parameter and Structure Learning Dhruv Batra, Recitation 10/02/2008

Overview
Parameter Learning
–Classical view, estimation task
–Estimators, properties of estimators
–MLE, why MLE?
–MLE in BNs, decomposability
Structure Learning
–Structure score, decomposable scores
–TAN, Chow-Liu
–HW2 implementation steps

Note
Plagiarism alert
–Some slides taken from others
–Credits/references at the end

Parameter Learning
Classical statistics view / point estimation
–Parameters are unknown but not random
–Point estimation = "find the right parameter"
–Estimate parameters (or functions of parameters) of the model from data
Estimators
–Any statistic
–A function of the data alone
Say you have a dataset
–You need to estimate its mean
–Is the constant 5 an estimator?
–What would you do?
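
A quick illustration of the point (mine, not from the original deck): an estimator is any function of the data alone, so even the constant 5 qualifies; it is simply a bad estimator compared to the sample mean.

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=1.0, size=100)   # a sample whose true mean is 3

def constant_estimator(x):
    return 5.0                 # ignores the data entirely; still a statistic, hence an estimator

def sample_mean(x):
    return float(np.mean(x))   # the natural estimator of the population mean

print(constant_estimator(data))   # 5.0 no matter what the data say
print(sample_mean(data))          # close to 3.0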

Properties of estimators
An estimator produces an estimate that depends on the sample points (x1, x2, …, xn), so the estimate is a function of the sample. Since the sample points are random variables, the estimate is itself a random variable with its own probability distribution. We would like an estimator to have several desirable properties:
–Consistency
–Unbiasedness
–Minimum variance
In general, no single estimator can have all of these properties.
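
For reference, the standard definitions (not spelled out on the slide), for an estimator \hat{\theta}_n of a parameter \theta:
–Unbiasedness: E[\hat{\theta}_n] = \theta for every value of \theta.
–Consistency: \hat{\theta}_n \to \theta in probability as n \to \infty.
–Minimum variance: Var(\hat{\theta}_n) is as small as possible among, say, unbiased estimators, e.g. it attains the Cramer-Rao lower bound 1 / (n I(\theta)).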

So why MLE?
MLE has some nice properties:
–MLEs are often simple and easy to compute.
–MLEs have asymptotic optimality properties (consistency and efficiency).
–MLEs are invariant under reparameterization.
–and more...

Let’s try
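
The worked exercise on this slide did not survive the transcript; a standard warm-up at this point is the Bernoulli MLE. Given m independent coin flips with h heads:

L(\theta) = \theta^h (1 - \theta)^{m - h}
\log L(\theta) = h \log\theta + (m - h)\log(1 - \theta)
\frac{d}{d\theta}\log L(\theta) = \frac{h}{\theta} - \frac{m - h}{1 - \theta} = 0 \;\Rightarrow\; \hat{\theta}_{MLE} = \frac{h}{m}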

Back to BNs
MLE in BNs:
–Data: x(1), …, x(m)
–Model: a DAG G
–Parameters: the CPTs P(Xi | PaXi)
–Learn the parameters from the data

Learning the CPTs
Given data x(1), …, x(m), for each discrete variable Xi:
\hat{\theta}_{x_i \mid pa_i} = \frac{\text{Count}(x_i, pa_i)}{\text{Count}(pa_i)}
i.e., the MLE of each CPT entry is the empirical frequency of Xi = xi among the samples where PaXi = pa_i.
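
A minimal sketch of the counting computation (my own, not the course's starter code), assuming discrete data in an (m, n) integer array and a parents list per variable:

import numpy as np

def mle_cpts(data, parents, card):
    """MLE of the CPTs of a discrete BN by counting.

    data:    (m, n) integer array; data[s, i] is the value of variable i in sample s
    parents: parents[i] is the list of indices of Pa(X_i)
    card:    card[i] is the number of values X_i can take
    Returns a list of dicts: cpts[i] maps a parent assignment (tuple) to the
    length-card[i] probability vector estimating P(X_i | pa_i).
    """
    m, n = data.shape
    cpts = []
    for i in range(n):
        counts = {}
        for s in range(m):
            pa = tuple(data[s, p] for p in parents[i])
            if pa not in counts:
                counts[pa] = np.zeros(card[i])
            counts[pa][data[s, i]] += 1
        # normalize the counts within each parent configuration
        cpts.append({pa: c / c.sum() for pa, c in counts.items()})
    return cpts

# Toy example: X0 -> X1, both binary
data = np.array([[0, 0], [0, 1], [1, 1], [1, 1]])
cpts = mle_cpts(data, parents=[[], [0]], card=[2, 2])
print(cpts[1][(1,)])   # MLE of P(X1 | X0 = 1): [0., 1.]
print(cpts[1][(0,)])   # MLE of P(X1 | X0 = 0): [0.5, 0.5]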

Example: Learning MLE parameters

Maximum likelihood estimation (MLE) of BN parameters – example
Given the structure, the log likelihood of the data decomposes into a sum of local terms. (The slide's example network is over Flu, Allergy, Sinus, and Nose.)
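
Reconstructing the lost equation, and assuming the usual structure for this running example (Flu and Allergy are parents of Sinus; Sinus is the parent of Nose):

\log P(D \mid \theta, G) = \sum_{j=1}^{m} \left[ \log\theta_{f(j)} + \log\theta_{a(j)} + \log\theta_{s(j) \mid f(j), a(j)} + \log\theta_{n(j) \mid s(j)} \right]

One term per CPT, so maximizing over \theta splits into independent per-CPT problems.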

Decomposability
–Likelihood decomposition
–Local likelihood function
What's the difference? Global parameter independence!
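
In symbols (a standard statement of what the slide's equations showed): the likelihood factors into one local likelihood per variable,

L(\theta : D) = \prod_i L_i(\theta_{X_i \mid Pa_{X_i}} : D), \quad \text{where} \quad L_i(\theta_{X_i \mid Pa_{X_i}} : D) = \prod_{j=1}^{m} P(x_i(j) \mid pa_i(j), \theta_{X_i \mid Pa_{X_i}})

and global parameter independence is what lets us maximize each local term separately.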

Taking derivatives of MLE of BN parameters – General case
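
The derivation itself did not survive the transcript; the standard argument maximizes each local log likelihood subject to the normalization constraint, via a Lagrange multiplier:

\max_{\theta} \; \sum_{x_i} \text{Count}(x_i, pa_i) \log \theta_{x_i \mid pa_i} \quad \text{s.t.} \quad \sum_{x_i} \theta_{x_i \mid pa_i} = 1

\frac{\partial}{\partial \theta_{x_i \mid pa_i}} \left[ \sum_{x_i} \text{Count}(x_i, pa_i)\log\theta_{x_i \mid pa_i} + \lambda\Big(1 - \sum_{x_i}\theta_{x_i \mid pa_i}\Big) \right] = \frac{\text{Count}(x_i, pa_i)}{\theta_{x_i \mid pa_i}} - \lambda = 0

\Rightarrow \; \hat{\theta}_{x_i \mid pa_i} = \frac{\text{Count}(x_i, pa_i)}{\text{Count}(pa_i)}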

Structure Learning
Constraint based
–Check independences, learn a PDAG
–HW1
Score based
–Assign a score to every candidate structure
–Maximize the score

Score Based
What's a good score function? How about our old friend, the log likelihood?
So here's our score function:
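
The equation itself was lost in the transcript; the standard choice at this point is the log likelihood of the data under the MLE parameters for each candidate structure:

\text{score}(G : D) = \log P(D \mid \hat{\theta}_G, G) = \max_{\theta} \log P(D \mid \theta, G)

For m samples this can also be written in terms of empirical mutual information and entropy, \text{score}(G : D) = m \sum_i \hat{I}(X_i; Pa_{X_i}^G) - m \sum_i \hat{H}(X_i), which is the form the Chow-Liu algorithm exploits.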

Score Based
[Defn]: A score is decomposable if it can be written as a sum of local, per-family terms: \text{score}(G : D) = \sum_i \text{FamScore}(X_i, Pa_{X_i}^G : D).
Why do we care about decomposable scores? A local change to the graph (adding, deleting, or reversing one edge) changes only a few terms, so search stays cheap.
The log-likelihood based score decomposes!
But maximizing likelihood alone always favors denser graphs, so we need regularization.
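
A common regularized score (standard, not recoverable from the transcript) is BIC:

\text{score}_{BIC}(G : D) = \log P(D \mid \hat{\theta}_G, G) - \frac{\log m}{2} \, \text{Dim}(G)

where \text{Dim}(G) is the number of independent parameters; the penalty keeps the likelihood term from always preferring fully connected graphs, and the score remains decomposable.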

Score Based
Chow-Liu: restrict attention to tree structures. For trees, maximizing the likelihood score reduces to finding the maximum-weight spanning tree with edge weights given by the empirical mutual information \hat{I}(X_i; X_j) (see the sketch below).
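
A minimal sketch of Chow-Liu (my own, not the HW2 starter code), assuming discrete data in a numpy array: estimate pairwise mutual information from counts, then run a maximum spanning tree.

import numpy as np

def empirical_mi(data, i, j):
    """Empirical mutual information I(X_i; X_j) from a discrete (m, n) data array."""
    xi, xj = data[:, i], data[:, j]
    mi = 0.0
    for a in np.unique(xi):
        p_a = np.mean(xi == a)
        for b in np.unique(xj):
            p_b = np.mean(xj == b)
            p_ab = np.mean((xi == a) & (xj == b))
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

def chow_liu_tree(data):
    """Edges of the maximum-mutual-information spanning tree (Prim's algorithm)."""
    n = data.shape[1]
    w = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            w[i, j] = w[j, i] = empirical_mi(data, i, j)
    in_tree, edges = {0}, []
    while len(in_tree) < n:
        # pick the heaviest edge crossing the cut between the tree and the rest
        i, j = max(((u, v) for u in in_tree for v in range(n) if v not in in_tree),
                   key=lambda e: w[e[0], e[1]])
        edges.append((i, j))
        in_tree.add(j)
    return edges

# Toy example: X1 is a noisy copy of X0, X2 is independent noise
rng = np.random.default_rng(0)
x0 = rng.integers(0, 2, 1000)
x1 = np.where(rng.random(1000) < 0.9, x0, 1 - x0)
x2 = rng.integers(0, 2, 1000)
print(chow_liu_tree(np.column_stack([x0, x1, x2])))   # expect (0, 1) as the first edge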

Score Based Chow-Liu modification for TAN (HW2)
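
The details of the modification were lost in the transcript; in the standard TAN construction (Friedman, Geiger, and Goldszmidt), the class variable C becomes an additional parent of every feature, and the tree over the features is built exactly as in Chow-Liu but with conditional mutual information as the edge weights:

\hat{I}(X_i; X_j \mid C) = \sum_{x_i, x_j, c} \hat{P}(x_i, x_j, c) \log \frac{\hat{P}(x_i, x_j \mid c)}{\hat{P}(x_i \mid c) \, \hat{P}(x_j \mid c)}

Consult the HW2 handout for the exact variant required.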

Slide and other credits
–Zoubin Ghahramani, guest lectures in Andrew Moore's tutorials
–Lecture slides by Carlos Guestrin