580.691 Learning Theory, Reza Shadmehr. Distribution of the ML estimates of model parameters; signal-dependent noise models.

Review: maximum likelihood estimate of parameters and noise. The "true" underlying process: $\mathbf{y} = X\mathbf{w}^* + \boldsymbol{\varepsilon}$, with $\boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2 I)$. What we measured: the inputs $X$ (an $n \times m$ matrix) and the outputs $\mathbf{y}$ (an $n \times 1$ vector). Our model of the process: $\mathbf{y} = X\mathbf{w} + \boldsymbol{\varepsilon}$. Our ML estimate, given $X$: $\hat{\mathbf{w}}_{ML} = (X^T X)^{-1} X^T \mathbf{y}$ and $\hat{\sigma}^2_{ML} = \frac{1}{n}\lVert\mathbf{y} - X\hat{\mathbf{w}}_{ML}\rVert^2$. Log-likelihood function to maximize: $\log p(\mathbf{y} \mid X, \mathbf{w}, \sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\lVert\mathbf{y} - X\mathbf{w}\rVert^2$.
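
To make the review concrete, here is a minimal NumPy sketch (mine, not part of the original slides) that simulates the linear-Gaussian process, computes the ML (least-squares) estimates of $\mathbf{w}$ and $\sigma^2$, and evaluates the log-likelihood at the fit; the sample size, true parameters, and variable names are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 100, 3                          # n data points, m parameters
sigma = 0.5                            # true noise standard deviation
w_true = np.array([1.0, -2.0, 0.5])

X = rng.normal(size=(n, m))            # measured inputs (design matrix)
y = X @ w_true + rng.normal(scale=sigma, size=n)   # measured noisy outputs

# ML (= least-squares) estimates of w and sigma^2
w_ml = np.linalg.solve(X.T @ X, X.T @ y)
sigma2_ml = np.mean((y - X @ w_ml) ** 2)

# Log-likelihood of the data under the fitted model
loglik = -0.5 * n * np.log(2 * np.pi * sigma2_ml) \
         - 0.5 * np.sum((y - X @ w_ml) ** 2) / sigma2_ml
print(w_ml, sigma2_ml, loglik)
```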

Variance of scalar and vector random variables. For a scalar random variable $x$, $\operatorname{var}(x) = E[(x - E[x])^2]$. For a vector random variable $\mathbf{x}$, the covariance is $\operatorname{cov}(\mathbf{x}) = E[(\mathbf{x} - E[\mathbf{x}])(\mathbf{x} - E[\mathbf{x}])^T]$. Covariances of vector random variables produce symmetric, positive semi-definite matrices.

Multivariate Normal distribution. We say that a vector $\mathbf{x}$ has a multivariate Gaussian distribution with mean vector $\boldsymbol{\mu}$ and variance-covariance matrix $\Sigma$ when its probability density is $p(\mathbf{x}) = (2\pi)^{-d/2}\lvert\Sigma\rvert^{-1/2}\exp\!\left(-\tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right)$, where $d$ is the dimension of $\mathbf{x}$; each element of $\mathbf{x}$ then has a univariate Normal distribution.
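
As a sanity check on the density formula, the short sketch below (my own, not from the slides) evaluates the multivariate Gaussian density by hand and compares it against scipy.stats.multivariate_normal; the mean, covariance, and evaluation point are arbitrary examples.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mvn_pdf(x, mu, Sigma):
    """Density of a d-dimensional Gaussian N(mu, Sigma) evaluated at x."""
    d = len(mu)
    diff = x - mu
    norm_const = (2 * np.pi) ** (-d / 2) * np.linalg.det(Sigma) ** (-0.5)
    return norm_const * np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff))

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.5, -1.5])
print(mvn_pdf(x, mu, Sigma))                           # hand-rolled density
print(multivariate_normal(mean=mu, cov=Sigma).pdf(x))  # should agree
```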

Bias of the parameter estimates for a given X. Suppose the outputs $\mathbf{y}$ were actually produced by the "true" underlying process $\mathbf{y} = X\mathbf{w}^* + \boldsymbol{\varepsilon}$; what we measured is $X$ and $\mathbf{y}$; our model of the process is $\mathbf{y} = X\mathbf{w} + \boldsymbol{\varepsilon}$. Given a constant $X$, the underlying process would give us a different $\mathbf{y}$ every time we run it. If on each run we find an ML $\hat{\mathbf{w}}$ and $\hat{\sigma}$, how would $\hat{\mathbf{w}}$ and $\hat{\sigma}$ vary with respect to $\mathbf{w}^*$ and $\sigma$? If there were no noise, then $\mathbf{y} = X\mathbf{w}^*$ and the ML estimate would be $\hat{\mathbf{w}}_{ML} = (X^T X)^{-1} X^T \mathbf{y} = (X^T X)^{-1} X^T X \mathbf{w}^* = \mathbf{w}^*$: in the absence of noise, the ML estimate would recover $\mathbf{w}^*$ exactly.

Bias of the parameter estimates for a given X. How does the ML estimate behave in the presence of noise in $\mathbf{y}$? The "true" underlying process is $\mathbf{y} = X\mathbf{w}^* + \boldsymbol{\varepsilon}$, where $\boldsymbol{\varepsilon}$ is an $n \times 1$ vector of noise with $\boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2 I)$; what we measured is $X$ and $\mathbf{y}$; our model of the process is $\mathbf{y} = X\mathbf{w} + \boldsymbol{\varepsilon}$. ML estimate: $\hat{\mathbf{w}}_{ML} = (X^T X)^{-1} X^T \mathbf{y} = \mathbf{w}^* + (X^T X)^{-1} X^T \boldsymbol{\varepsilon}$. Because $\boldsymbol{\varepsilon}$ is normally distributed, $\hat{\mathbf{w}}_{ML}$ is also normally distributed. In other words, $\hat{\mathbf{w}}_{ML}$ is a Gaussian random vector whose spread around $\mathbf{w}^*$ is determined by the noise and by $X$.

Bias and variance of an estimator. Parameters of a distribution can be estimated (e.g., via ML). Here we assess the "goodness" of such an estimate by quantifying its bias and variance. Given some data samples and an estimator $\hat{\theta}$ of a parameter $\theta$: the bias of the estimator is the expectation of its deviation from the true value of the parameter, $b(\hat{\theta}) = E[\hat{\theta}] - \theta$; the variance of the estimator is the anticipated uncertainty in the estimate due to the particular selection of the samples, $\operatorname{var}(\hat{\theta}) = E[(\hat{\theta} - E[\hat{\theta}])^2]$. Note that the bias of an estimator is similar to structural errors, while the variance of the estimator is similar to approximation errors.
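
A tiny simulation (my own construction) that measures the bias and variance of a familiar estimator, the sample mean of a Gaussian; the true parameters and the number of repetitions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma, n = 3.0, 2.0, 10            # true mean/std and sample size

# Draw many independent data sets and apply the estimator (the sample mean)
estimates = rng.normal(mu, sigma, size=(100_000, n)).mean(axis=1)

bias = estimates.mean() - mu           # ~0: the sample mean is unbiased
variance = estimates.var()             # ~sigma^2 / n
print(bias, variance, sigma**2 / n)
```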

Bias of the ML parameter estimates for a given X. Since $E[\hat{\mathbf{w}}_{ML}] = \mathbf{w}^* + (X^T X)^{-1} X^T E[\boldsymbol{\varepsilon}] = \mathbf{w}^*$, the estimate $\hat{\mathbf{w}}_{ML}$ is unbiased. But what about its variance?

Variance of the parameter estimates for a given X. For a given $X$, the ML (or least-squares) estimate of our parameters has a normal distribution: $\hat{\mathbf{w}}_{ML} = \mathbf{w}^* + A\boldsymbol{\varepsilon}$, where $A = (X^T X)^{-1} X^T$ is a matrix of constants and $\boldsymbol{\varepsilon}$ is a vector of random variables. Assume $\boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma^2 I)$. Then $\operatorname{cov}(\hat{\mathbf{w}}_{ML}) = A(\sigma^2 I)A^T = \sigma^2 (X^T X)^{-1}$, an $m \times m$ matrix.

Variance of the parameter estimates for a given X. More formally: for a constant matrix $A$ and a random vector $\mathbf{z}$, $\operatorname{cov}(A\mathbf{z}) = A\operatorname{cov}(\mathbf{z})A^T$. Applying this with $A = (X^T X)^{-1} X^T$ and $\operatorname{cov}(\boldsymbol{\varepsilon}) = \sigma^2 I$ gives $\hat{\mathbf{w}}_{ML} \sim N\!\left(\mathbf{w}^*,\; \sigma^2 (X^T X)^{-1}\right)$.
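
The claim that, for a fixed $X$, the ML estimate is distributed as $N(\mathbf{w}^*, \sigma^2 (X^T X)^{-1})$ is easy to verify by simulation. A sketch of that check (my own; the sizes and parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, sigma = 50, 2, 0.3
w_true = np.array([2.0, -1.0])
X = rng.normal(size=(n, m))            # X is held fixed across repetitions

W = np.empty((5000, m))
for k in range(5000):                  # rerun the noisy process many times
    y = X @ w_true + rng.normal(scale=sigma, size=n)
    W[k] = np.linalg.solve(X.T @ X, X.T @ y)

print(W.mean(axis=0))                          # ~ w_true (unbiased)
print(np.cov(W.T))                             # empirical covariance of w_ML
print(sigma**2 * np.linalg.inv(X.T @ X))       # theory: sigma^2 (X^T X)^-1
```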

Example: for a particular $X$ and noise level, the ML estimates are distributed as an $m$-dimensional Gaussian ($m$ is the dimension of $\mathbf{w}$), with probability density $p(\hat{\mathbf{w}}) = (2\pi)^{-m/2}\lvert\sigma^2 (X^T X)^{-1}\rvert^{-1/2}\exp\!\left(-\tfrac{1}{2}(\hat{\mathbf{w}} - \mathbf{w}^*)^T \left(\sigma^2 (X^T X)^{-1}\right)^{-1} (\hat{\mathbf{w}} - \mathbf{w}^*)\right)$; the determinant of the covariance appears in the normalizing constant. (The example slides plotted this density.)

Bias of the ML estimate of $\sigma^2$. Preliminaries: (a) compute the expected value of the residuals. The residuals are $\hat{\boldsymbol{\varepsilon}} = \mathbf{y} - X\hat{\mathbf{w}}_{ML} = (I - H)\boldsymbol{\varepsilon}$, where $H = X(X^T X)^{-1} X^T$ is an $n \times n$ matrix ($n$ is the number of data points); since $E[\boldsymbol{\varepsilon}] = \mathbf{0}$, we have $E[\hat{\boldsymbol{\varepsilon}}] = \mathbf{0}$. Preliminaries: (b) compute the variance of the residuals.

Bias of the ML estimate of $\sigma^2$. Preliminaries: (b) compute the variance of the residuals. Since $\hat{\boldsymbol{\varepsilon}} = (I - H)\boldsymbol{\varepsilon}$ and $I - H$ is symmetric and idempotent, $\operatorname{cov}(\hat{\boldsymbol{\varepsilon}}) = (I - H)(\sigma^2 I)(I - H)^T = \sigma^2 (I - H)$.

Bias of the ML estimate of $\sigma^2$. Preliminaries: (c) useful properties of the trace operator: $\operatorname{tr}(A + B) = \operatorname{tr}(A) + \operatorname{tr}(B)$, $\operatorname{tr}(AB) = \operatorname{tr}(BA)$, and for a mean-zero random vector $\mathbf{z}$, $E[\mathbf{z}^T \mathbf{z}] = \operatorname{tr}(\operatorname{cov}(\mathbf{z}))$.

Bias of the ML estimate of $\sigma^2$. Remember that the residuals are mean zero, so $E[\hat{\sigma}^2_{ML}] = \tfrac{1}{n} E[\hat{\boldsymbol{\varepsilon}}^T \hat{\boldsymbol{\varepsilon}}] = \tfrac{1}{n}\operatorname{tr}\!\left(\operatorname{cov}(\hat{\boldsymbol{\varepsilon}})\right)$.
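
Putting preliminaries (a) through (c) together gives the expected value of the ML noise-variance estimate. The following is a compact reconstruction of the computation in my own notation (with the hat matrix $H = X(X^T X)^{-1} X^T$), not the slides' original typesetting:

```latex
\hat{\boldsymbol{\varepsilon}} = \mathbf{y} - X\hat{\mathbf{w}}_{ML} = (I - H)\,\boldsymbol{\varepsilon},
\qquad H = X (X^T X)^{-1} X^T
\\
E[\hat{\boldsymbol{\varepsilon}}] = \mathbf{0},
\qquad \operatorname{cov}(\hat{\boldsymbol{\varepsilon}}) = \sigma^2 (I - H)
\\
E[\hat{\sigma}^2_{ML}]
  = \tfrac{1}{n}\, E[\hat{\boldsymbol{\varepsilon}}^T \hat{\boldsymbol{\varepsilon}}]
  = \tfrac{1}{n}\, \operatorname{tr}\!\big(\operatorname{cov}(\hat{\boldsymbol{\varepsilon}})\big)
  = \tfrac{\sigma^2}{n}\, \operatorname{tr}(I - H)
  = \tfrac{\sigma^2}{n}\,(n - m)
```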

Bias of the ML estimate of $\sigma^2$. With $I - H$ an $n \times n$ matrix and $X$ an $n \times m$ matrix ($n$ is the number of data points, $m$ the number of parameters in $\mathbf{w}$), $\operatorname{tr}(I - H) = n - \operatorname{tr}\!\left((X^T X)^{-1} X^T X\right) = n - m$, so $E[\hat{\sigma}^2_{ML}] = \frac{n - m}{n}\sigma^2$. So the ML estimate of $\sigma^2$ is biased: it tends to underestimate the actual variance of the noise. The bias becomes smaller as the number of data points increases relative to the number of unknown parameters.
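
A quick numerical check of this bias (my own sketch; the values of $n$, $m$, and $\sigma$ are arbitrary): averaging the ML variance estimate over many repetitions of the noise should land near $(n - m)\sigma^2 / n$ rather than $\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, sigma = 20, 5, 1.0
X = rng.normal(size=(n, m))
w_true = rng.normal(size=m)

est = []
for _ in range(20_000):
    y = X @ w_true + rng.normal(scale=sigma, size=n)
    w_ml = np.linalg.solve(X.T @ X, X.T @ y)
    est.append(np.mean((y - X @ w_ml) ** 2))   # ML estimate of sigma^2

print(np.mean(est))              # ~ (n - m)/n * sigma^2 = 0.75, not 1.0
print((n - m) / n * sigma**2)
```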

When the noise in each data sample has an independent variance (Midterm 2005). Suppose each output is generated as $y^{(i)} = \mathbf{x}^{(i)T}\mathbf{w} + \varepsilon_i$ with $\varepsilon_i \sim N(0, \sigma_i^2)$, i.e., every data point has its own noise variance.

When the noise in each data sample has an independent variance. Maximizing the likelihood now gives $\hat{\mathbf{w}} = (X^T \Sigma^{-1} X)^{-1} X^T \Sigma^{-1} \mathbf{y}$ with $\Sigma = \operatorname{diag}(\sigma_1^2, \dots, \sigma_n^2)$. This is the weighted least-squares solution, where each data point is weighted by the inverse of its noise variance.
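
A small sketch of the weighted least-squares solution described above, assuming the per-point noise variances are known; the function name and the use of an explicit diagonal weight matrix are my own choices.

```python
import numpy as np

def weighted_least_squares(X, y, noise_var):
    """w = (X^T S^-1 X)^-1 X^T S^-1 y, with S = diag(noise_var).

    Each data point is weighted by the inverse of its noise variance,
    so noisier points pull less on the fit.
    """
    S_inv = np.diag(1.0 / np.asarray(noise_var))
    return np.linalg.solve(X.T @ S_inv @ X, X.T @ S_inv @ y)
```

When all the variances are equal, this reduces to the ordinary least-squares estimate $(X^T X)^{-1} X^T \mathbf{y}$.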

Example: sensory noise may be proportional to the "signal". When the standard deviation of the noise grows with the size of the measured quantity, larger observations are noisier, and the weighted least-squares solution gives them correspondingly less weight.
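
As an illustration, the sketch below (not from the slides) simulates data whose noise standard deviation grows with the signal and compares ordinary with weighted least squares; the proportionality constants, sample size, and two-parameter model are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x = np.linspace(0.1, 5.0, n)
X = np.column_stack([np.ones(n), x])        # intercept + slope model
w_true = np.array([0.2, 1.5])

signal = X @ w_true
noise_sd = 0.05 + 0.2 * np.abs(signal)      # noise std grows with the signal
y = signal + rng.normal(scale=noise_sd)

# Ordinary least squares ignores the unequal noise
w_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Weighted least squares: weight each point by 1 / its noise variance
S_inv = np.diag(1.0 / noise_sd**2)
w_wls = np.linalg.solve(X.T @ S_inv @ X, X.T @ S_inv @ y)

print(w_true, w_ols, w_wls)                 # WLS is typically closer to w_true
```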

Summary ($n$ = number of data points, $m$ = number of parameters in $\mathbf{w}$): for the linear-Gaussian model, $\hat{\mathbf{w}}_{ML} = (X^T X)^{-1} X^T \mathbf{y} \sim N\!\left(\mathbf{w}^*,\; \sigma^2 (X^T X)^{-1}\right)$, so the parameter estimate is unbiased; the noise-variance estimate satisfies $E[\hat{\sigma}^2_{ML}] = \frac{n - m}{n}\sigma^2$ and is therefore biased low, with the bias shrinking as $n$ grows relative to $m$. Dividing the summed squared residuals by $n - m$ instead of $n$ gives an unbiased estimate of $\sigma^2$.