Maximum Likelihood Estimation


Maximum Likelihood Estimation Multivariate Normal distribution

The Method of Maximum Likelihood

Suppose that the data $x_1, \dots, x_n$ has joint density function $f(x_1, \dots, x_n; \theta_1, \dots, \theta_p)$, where $\vec\theta = (\theta_1, \dots, \theta_p)$ is a vector of unknown parameters assumed to lie in $\Omega$ (a subset of p-dimensional space). We want to estimate the parameters $\theta_1, \dots, \theta_p$.

Definition: The Likelihood Function

Suppose that the data $x_1, \dots, x_n$ has joint density function $f(x_1, \dots, x_n; \theta_1, \dots, \theta_p)$. Then, given the data, the likelihood function is defined to be

$$L(\theta_1, \dots, \theta_p) = f(x_1, \dots, x_n; \theta_1, \dots, \theta_p)$$

Note: the domain of $L(\theta_1, \dots, \theta_p)$ is the set $\Omega$.

Definition: Maximum Likelihood Estimators

Suppose that the data $x_1, \dots, x_n$ has joint density function $f(x_1, \dots, x_n; \theta_1, \dots, \theta_p)$, so that the likelihood function is $L(\theta_1, \dots, \theta_p) = f(x_1, \dots, x_n; \theta_1, \dots, \theta_p)$. The maximum likelihood estimators of the parameters $\theta_1, \dots, \theta_p$ are the values that maximize $L(\theta_1, \dots, \theta_p)$.

That is, the maximum likelihood estimators of $\theta_1, \dots, \theta_p$ are the values $\hat\theta_1, \dots, \hat\theta_p$ such that

$$L(\hat\theta_1, \dots, \hat\theta_p) = \max_{(\theta_1, \dots, \theta_p) \in \Omega} L(\theta_1, \dots, \theta_p)$$

Note: maximizing $L(\theta_1, \dots, \theta_p)$ is equivalent to maximizing the log-likelihood function $l(\theta_1, \dots, \theta_p) = \ln L(\theta_1, \dots, \theta_p)$.
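
Because the log-likelihood has the same maximizers, numerical MLE is normally carried out on the log scale. Below is a minimal sketch, not from the slides, using scipy to maximize the log-likelihood of a univariate normal sample; the simulated data, seed, and starting values are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=200)   # illustrative sample

def neg_log_likelihood(params, data):
    mu, log_sigma = params                     # optimize log(sigma) so sigma > 0
    sigma = np.exp(log_sigma)
    return -np.sum(norm.logpdf(data, loc=mu, scale=sigma))

# Minimizing the negative log-likelihood = maximizing the likelihood
result = minimize(neg_log_likelihood, x0=[0.0, 0.0], args=(x,))
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
print(mu_hat, sigma_hat)   # close to the sample mean and standard deviation
```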

The Multivariate Normal Distribution: Maximum Likelihood Estimation

Let $\vec x_1, \dots, \vec x_n$ denote an independent sample from the p-variate normal distribution with mean vector $\vec\mu$ and covariance matrix $\Sigma$.

Note: the density of each observation is

$$f(\vec x) = \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}} \exp\left[-\tfrac{1}{2}(\vec x - \vec\mu)'\Sigma^{-1}(\vec x - \vec\mu)\right]$$
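
As a quick numerical companion (not part of the slides), the density formula above can be checked against scipy's implementation; the mean vector, covariance matrix, and evaluation point below are arbitrary illustrations:

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, 2.0])                  # illustrative mean vector
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])             # illustrative covariance matrix
x = np.array([1.5, 1.8])                   # illustrative evaluation point

# Density via the explicit formula above ...
p = len(mu)
diff = x - mu
quad = diff @ np.linalg.solve(Sigma, diff)
f_manual = np.exp(-0.5 * quad) / ((2 * np.pi) ** (p / 2) * np.linalg.det(Sigma) ** 0.5)

# ... agrees with scipy's multivariate normal pdf
f_scipy = multivariate_normal(mean=mu, cov=Sigma).pdf(x)
print(f_manual, f_scipy)
```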

The $n \times p$ matrix

$$X = \begin{bmatrix} \vec x_1' \\ \vec x_2' \\ \vdots \\ \vec x_n' \end{bmatrix} = \begin{bmatrix} x_{11} & \cdots & x_{1p} \\ \vdots & & \vdots \\ x_{n1} & \cdots & x_{np} \end{bmatrix}$$

whose rows are the observations is called the data matrix.

The $np \times 1$ vector

$$\vec x = \begin{bmatrix} \vec x_1 \\ \vec x_2 \\ \vdots \\ \vec x_n \end{bmatrix}$$

formed by stacking the observations is called the data vector.

The mean vector

The vector

$$\bar{\vec x} = \frac{1}{n}\sum_{i=1}^{n} \vec x_i$$

is called the sample mean vector. Note: its j-th component is $\bar x_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}$.

Also, in terms of the data matrix, $\bar{\vec x} = \frac{1}{n} X' \vec 1_n$, where $\vec 1_n$ is the $n \times 1$ vector of ones.

In terms of the data vector, $\bar{\vec x} = A\vec x$, where $A = \frac{1}{n}\begin{bmatrix} I_p & I_p & \cdots & I_p \end{bmatrix}$ is a $p \times np$ matrix.

Graphical representation of sample mean vector The sample mean vector is the centroid of the data vectors.

The Sample Covariance matrix

The sample covariance matrix:

$$S = \frac{1}{n-1}\sum_{i=1}^{n} (\vec x_i - \bar{\vec x})(\vec x_i - \bar{\vec x})' = \begin{bmatrix} s_{11} & \cdots & s_{1p} \\ \vdots & & \vdots \\ s_{p1} & \cdots & s_{pp} \end{bmatrix}$$

where

$$s_{jk} = \frac{1}{n-1}\sum_{i=1}^{n} (x_{ij} - \bar x_j)(x_{ik} - \bar x_k)$$

There are different ways of representing the sample covariance matrix, e.g.

$$S = \frac{1}{n-1}\left(\sum_{i=1}^{n} \vec x_i \vec x_i' - n\,\bar{\vec x}\,\bar{\vec x}'\right) = \frac{1}{n-1}\left(X'X - n\,\bar{\vec x}\,\bar{\vec x}'\right)$$
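
A short sketch (illustrative data, not from the slides) confirming that the two representations of $S$ agree, and that both match numpy's np.cov:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))        # illustrative 50 x 3 data matrix

n = X.shape[0]
xbar = X.mean(axis=0)               # sample mean vector (centroid of the rows)

# Definition: S = (1/(n-1)) * sum (x_i - xbar)(x_i - xbar)'
centered = X - xbar
S_def = centered.T @ centered / (n - 1)

# Equivalent representation: S = (X'X - n xbar xbar') / (n-1)
S_alt = (X.T @ X - n * np.outer(xbar, xbar)) / (n - 1)

print(np.allclose(S_def, S_alt), np.allclose(S_def, np.cov(X, rowvar=False)))
```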

Maximum Likelihood Estimation Multivariate Normal distribution

Let $\vec x_1, \dots, \vec x_n$ denote an independent sample from the p-variate normal distribution with mean vector $\vec\mu$ and covariance matrix $\Sigma$. Then the joint density function of $\vec x_1, \dots, \vec x_n$ is:

$$f(\vec x_1, \dots, \vec x_n) = \prod_{i=1}^{n} \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}} \exp\left[-\tfrac{1}{2}(\vec x_i - \vec\mu)'\Sigma^{-1}(\vec x_i - \vec\mu)\right] = \frac{1}{(2\pi)^{np/2}|\Sigma|^{n/2}} \exp\left[-\tfrac{1}{2}\sum_{i=1}^{n}(\vec x_i - \vec\mu)'\Sigma^{-1}(\vec x_i - \vec\mu)\right]$$

The likelihood function is:

$$L(\vec\mu, \Sigma) = \frac{1}{(2\pi)^{np/2}|\Sigma|^{n/2}} \exp\left[-\tfrac{1}{2}\sum_{i=1}^{n}(\vec x_i - \vec\mu)'\Sigma^{-1}(\vec x_i - \vec\mu)\right]$$

and the log-likelihood function is:

$$l(\vec\mu, \Sigma) = \ln L(\vec\mu, \Sigma) = -\frac{np}{2}\ln(2\pi) - \frac{n}{2}\ln|\Sigma| - \frac{1}{2}\sum_{i=1}^{n}(\vec x_i - \vec\mu)'\Sigma^{-1}(\vec x_i - \vec\mu)$$

To find the maximum likelihood estimators of $\vec\mu$ and $\Sigma$ we need to find the values $\hat{\vec\mu}$ and $\hat\Sigma$ that maximize $L(\vec\mu, \Sigma)$, or equivalently maximize $l(\vec\mu, \Sigma)$.

Note:

$$\sum_{i=1}^{n}(\vec x_i - \vec\mu)'\Sigma^{-1}(\vec x_i - \vec\mu) = \sum_{i=1}^{n}(\vec x_i - \bar{\vec x} + \bar{\vec x} - \vec\mu)'\Sigma^{-1}(\vec x_i - \bar{\vec x} + \bar{\vec x} - \vec\mu)$$

thus, since the cross terms vanish ($\sum_{i=1}^{n}(\vec x_i - \bar{\vec x}) = \vec 0$),

$$\sum_{i=1}^{n}(\vec x_i - \vec\mu)'\Sigma^{-1}(\vec x_i - \vec\mu) = \sum_{i=1}^{n}(\vec x_i - \bar{\vec x})'\Sigma^{-1}(\vec x_i - \bar{\vec x}) + n(\bar{\vec x} - \vec\mu)'\Sigma^{-1}(\bar{\vec x} - \vec\mu)$$

hence

$$l(\vec\mu, \Sigma) = -\frac{np}{2}\ln(2\pi) - \frac{n}{2}\ln|\Sigma| - \frac{1}{2}\operatorname{tr}\left[\Sigma^{-1}\sum_{i=1}^{n}(\vec x_i - \bar{\vec x})(\vec x_i - \bar{\vec x})'\right] - \frac{n}{2}(\bar{\vec x} - \vec\mu)'\Sigma^{-1}(\bar{\vec x} - \vec\mu)$$
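
A quick numerical check of this decomposition, with arbitrary illustrative data, $\vec\mu$, and $\Sigma$ (none of which come from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 2))            # illustrative data
mu = np.array([0.3, -0.2])              # arbitrary mu for the check
Sigma_inv = np.linalg.inv(np.array([[1.5, 0.4], [0.4, 1.0]]))

n = X.shape[0]
xbar = X.mean(axis=0)

lhs = sum(d @ Sigma_inv @ d for d in (X - mu))
rhs = sum(d @ Sigma_inv @ d for d in (X - xbar)) \
      + n * (xbar - mu) @ Sigma_inv @ (xbar - mu)
print(np.isclose(lhs, rhs))             # True: the cross terms cancel
```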

Now, since $\Sigma^{-1}$ is positive definite, $(\bar{\vec x} - \vec\mu)'\Sigma^{-1}(\bar{\vec x} - \vec\mu) \ge 0$, with equality if and only if $\vec\mu = \bar{\vec x}$. Hence, whatever the value of $\Sigma$, $l(\vec\mu, \Sigma)$ is maximized over $\vec\mu$ by $\hat{\vec\mu} = \bar{\vec x}$.

Now it remains to maximize

$$l(\bar{\vec x}, \Sigma) = -\frac{np}{2}\ln(2\pi) - \frac{n}{2}\ln|\Sigma| - \frac{1}{2}\operatorname{tr}\left[\Sigma^{-1}\sum_{i=1}^{n}(\vec x_i - \bar{\vec x})(\vec x_i - \bar{\vec x})'\right]$$

over $\Sigma$; the maximum is achieved at

$$\hat\Sigma = \frac{1}{n}\sum_{i=1}^{n}(\vec x_i - \bar{\vec x})(\vec x_i - \bar{\vec x})'$$

Summary: the maximum likelihood estimators of $\vec\mu$ and $\Sigma$ are

$$\hat{\vec\mu} = \bar{\vec x} \quad\text{and}\quad \hat\Sigma = \frac{1}{n}\sum_{i=1}^{n}(\vec x_i - \bar{\vec x})(\vec x_i - \bar{\vec x})' = \frac{n-1}{n}S$$
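
A minimal sketch of these closed-form MLEs in numpy, using a simulated sample (the true parameters, seed, and sample size are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
true_mu = np.array([1.0, -1.0])
true_Sigma = np.array([[2.0, 0.8], [0.8, 1.0]])
X = rng.multivariate_normal(true_mu, true_Sigma, size=500)   # illustrative sample

n = X.shape[0]
mu_hat = X.mean(axis=0)                      # MLE of mu: the sample mean vector
centered = X - mu_hat
Sigma_hat = centered.T @ centered / n        # MLE of Sigma: divisor n, not n-1

S = np.cov(X, rowvar=False)                  # sample covariance (divisor n-1)
print(np.allclose(Sigma_hat, (n - 1) / n * S))   # True: Sigma_hat = (n-1)/n * S
```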

Sampling distribution of the MLEs

Note: the joint density function of $\vec x_1, \dots, \vec x_n$ is:

$$f(\vec x_1, \dots, \vec x_n) = \frac{1}{(2\pi)^{np/2}|\Sigma|^{n/2}} \exp\left[-\tfrac{1}{2}\sum_{i=1}^{n}(\vec x_i - \vec\mu)'\Sigma^{-1}(\vec x_i - \vec\mu)\right]$$

This distribution is np-variate normal with mean vector $\vec 1_n \otimes \vec\mu = (\vec\mu', \dots, \vec\mu')'$ and covariance matrix $I_n \otimes \Sigma$ (block diagonal, with $\Sigma$ in each diagonal block).

Thus the distribution of $\bar{\vec x} = \frac{1}{n}\sum_{i=1}^{n}\vec x_i$ is p-variate normal with mean vector $\vec\mu$ and covariance matrix $\frac{1}{n}\Sigma$.

Summary: the sampling distribution of $\bar{\vec x}$ is p-variate normal with mean vector $\vec\mu$ and covariance matrix $\frac{1}{n}\Sigma$, i.e. $\bar{\vec x} \sim N_p\!\left(\vec\mu, \tfrac{1}{n}\Sigma\right)$.
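
A Monte Carlo sanity check of this sampling distribution, under illustrative parameter choices (not from the slides):

```python
import numpy as np

rng = np.random.default_rng(4)
mu = np.array([0.0, 2.0])
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
n, reps = 25, 20000

# Draw many samples of size n and record the sample mean vector of each
xbars = np.array([rng.multivariate_normal(mu, Sigma, size=n).mean(axis=0)
                  for _ in range(reps)])

print(xbars.mean(axis=0))                 # approx mu
print(np.cov(xbars, rowvar=False) * n)    # approx Sigma, since Cov(xbar) = Sigma/n
```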

The sampling distribution of the sample covariance matrix $S$ and of $\hat\Sigma = \frac{n-1}{n}S$

The Wishart distribution: a multivariate generalization of the $\chi^2$ distribution

Definition: the p-variate Wishart distribution

Let $\vec z_1, \dots, \vec z_k$ be k independent random p-vectors, each having a p-variate normal distribution with mean vector $\vec 0$ and covariance matrix $\Sigma$, and let

$$U = \sum_{i=1}^{k} \vec z_i \vec z_i'$$

Then U is said to have the p-variate Wishart distribution with k degrees of freedom, written $U \sim W_p(\Sigma, k)$.

The density of the p-variate Wishart distribution

Suppose $U \sim W_p(\Sigma, k)$ with $k \ge p$. Then the density of U is:

$$f(U) = \frac{|U|^{(k-p-1)/2} \exp\left[-\tfrac{1}{2}\operatorname{tr}(\Sigma^{-1}U)\right]}{2^{kp/2}\,|\Sigma|^{k/2}\,\Gamma_p(k/2)}$$

where $\Gamma_p(\cdot)$ is the multivariate gamma function. It can easily be checked that when $p = 1$ and $\Sigma = 1$ the Wishart distribution becomes the $\chi^2$ distribution with k degrees of freedom.
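
The $p = 1$, $\Sigma = 1$ reduction can be verified numerically with scipy, which implements both distributions (the degrees of freedom and evaluation grid below are arbitrary):

```python
import numpy as np
from scipy.stats import wishart, chi2

k = 5
x = np.linspace(0.1, 15, 200)

# For p = 1 and Sigma = 1, the Wishart density matches the chi-square density
w_pdf = wishart(df=k, scale=1.0).pdf(x)
c_pdf = chi2(df=k).pdf(x)
print(np.allclose(w_pdf, c_pdf))   # True
```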

Theorem: Suppose $U \sim W_p(\Sigma, k)$ and A is a $q \times p$ matrix of rank q. Then $AUA' \sim W_q(A\Sigma A', k)$.

Corollary 1: for any fixed p-vector $\vec a$, $\vec a'U\vec a \sim (\vec a'\Sigma\vec a)\,\chi^2_k$.

Corollary 2: each diagonal element satisfies $u_{ii} \sim \sigma_{ii}\,\chi^2_k$.

Proof: write $U = \sum_{i=1}^{k}\vec z_i\vec z_i'$ with $\vec z_1, \dots, \vec z_k$ independent $N_p(\vec 0, \Sigma)$; then $AUA' = \sum_{i=1}^{k}(A\vec z_i)(A\vec z_i)'$, where the $A\vec z_i$ are independent $N_q(\vec 0, A\Sigma A')$.

Theorem: Suppose $U_1 \sim W_p(\Sigma, k_1)$ and $U_2 \sim W_p(\Sigma, k_2)$ are independent; then $U_1 + U_2 \sim W_p(\Sigma, k_1 + k_2)$.

Theorem: Suppose $\vec z_1, \dots, \vec z_k$ are independent and each $\vec z_i \sim N_p(\vec 0, \Sigma)$; then $U = \sum_{i=1}^{k}\vec z_i\vec z_i' \sim W_p(\Sigma, k)$.

Theorem: Let $\vec x_1, \dots, \vec x_n$ be a sample from $N_p(\vec\mu, \Sigma)$; then $\bar{\vec x} \sim N_p\!\left(\vec\mu, \tfrac{1}{n}\Sigma\right)$.

Theorem: Let $\vec x_1, \dots, \vec x_n$ be a sample from $N_p(\vec\mu, \Sigma)$; then $(n-1)S = \sum_{i=1}^{n}(\vec x_i - \bar{\vec x})(\vec x_i - \bar{\vec x})' \sim W_p(\Sigma, n-1)$.
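
Since $E[W_p(\Sigma, k)] = k\Sigma$, the second theorem can be spot-checked by simulation; the parameters, seed, and replication count below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.5], [0.5, 2.0]])
n, reps = 10, 20000

# E[W_p(Sigma, n-1)] = (n-1) * Sigma, so the average of (n-1)S should be close
U_mean = np.zeros((2, 2))
for _ in range(reps):
    X = rng.multivariate_normal(mu, Sigma, size=n)
    U_mean += (n - 1) * np.cov(X, rowvar=False) / reps

print(U_mean)            # approx (n-1) * Sigma
print((n - 1) * Sigma)
```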

Theorem: Let $\vec x_1, \dots, \vec x_n$ be a sample from $N_p(\vec\mu, \Sigma)$; then $\bar{\vec x}$ is independent of $S$.

Proof: Let $H$ be an $n \times n$ orthogonal matrix whose first row is $\frac{1}{\sqrt n}\vec 1_n'$. Then define the transformed data vector $\vec y = H^*\vec x$, where $H^* = H \otimes I_p$.

Note: $H^* = H \otimes I_p$ is also orthogonal, since $H^{*\prime}H^* = (H' \otimes I_p)(H \otimes I_p) = (H'H) \otimes I_p = I_{np}$.

Properties of the Kronecker product:

$$(A \otimes B)(C \otimes D) = (AC) \otimes (BD), \qquad (A \otimes B)' = A' \otimes B', \qquad (A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$$
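
These identities are easy to verify numerically with numpy's kron; the random matrices below are illustrative (and invertible with probability one):

```python
import numpy as np

rng = np.random.default_rng(6)
A, B = rng.normal(size=(2, 2)), rng.normal(size=(3, 3))
C, D = rng.normal(size=(2, 2)), rng.normal(size=(3, 3))

print(np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D)))
print(np.allclose(np.kron(A, B).T, np.kron(A.T, B.T)))
print(np.allclose(np.linalg.inv(np.kron(A, B)),
                  np.kron(np.linalg.inv(A), np.linalg.inv(B))))
```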

Thus the distribution of the data vector $\vec x$ is np-variate normal with mean vector $\vec 1_n \otimes \vec\mu$ and covariance matrix $I_n \otimes \Sigma$.

Thus the joint distribution of $\vec y = H^*\vec x$ is np-variate normal with mean vector

$$H^*(\vec 1_n \otimes \vec\mu) = (H \otimes I_p)(\vec 1_n \otimes \vec\mu) = (H\vec 1_n) \otimes \vec\mu = (\sqrt n\,\vec\mu', \vec 0', \dots, \vec 0')'$$

since the first row of $H$ is $\frac{1}{\sqrt n}\vec 1_n'$ and the remaining rows are orthogonal to $\vec 1_n$,

and covariance matrix

$$H^*(I_n \otimes \Sigma)H^{*\prime} = (H \otimes I_p)(I_n \otimes \Sigma)(H' \otimes I_p) = (HH') \otimes \Sigma = I_n \otimes \Sigma$$

Hence, writing $\vec y = (\vec y_1', \dots, \vec y_n')'$, the blocks $\vec y_1 = \sqrt n\,\bar{\vec x}, \vec y_2, \dots, \vec y_n$ are independent, with $\vec y_2, \dots, \vec y_n \sim N_p(\vec 0, \Sigma)$ and $(n-1)S = \sum_{i=2}^{n}\vec y_i\vec y_i'$. This proves that $\bar{\vec x}$ is independent of $S$, and that $(n-1)S \sim W_p(\Sigma, n-1)$.

Summary: Sampling distribution of the MLEs for the multivariate normal distribution

Let $\vec x_1, \dots, \vec x_n$ be a sample from $N_p(\vec\mu, \Sigma)$; then

$$\hat{\vec\mu} = \bar{\vec x} \sim N_p\!\left(\vec\mu, \tfrac{1}{n}\Sigma\right) \quad\text{and}\quad n\hat\Sigma = (n-1)S \sim W_p(\Sigma, n-1)$$

with $\bar{\vec x}$ and $S$ independent.
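
As a final sanity check (illustrative, not from the slides), simulation shows that a component of the sample mean and an element of $S$ are uncorrelated across replications, consistent with, though of course not a proof of, independence:

```python
import numpy as np

rng = np.random.default_rng(7)
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.6], [0.6, 1.0]])
n, reps = 8, 50000

xbar1, s11 = np.empty(reps), np.empty(reps)
for r in range(reps):
    X = rng.multivariate_normal(mu, Sigma, size=n)
    xbar1[r] = X[:, 0].mean()                 # first component of xbar
    s11[r] = np.cov(X, rowvar=False)[0, 0]    # first diagonal element of S

# For normal samples, xbar and S are independent, so this is approx 0
print(np.corrcoef(xbar1, s11)[0, 1])
```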