Presentation on theme: "Brief Review Probability and Statistics. Probability distributions Continuous distributions."— Presentation transcript:

1 Brief Review Probability and Statistics

2 Probability distributions Continuous distributions

3 Defn (density function) Let x denote a continuous random variable. Then f(x) is called the density function of x if: 1) f(x) ≥ 0, 2) ∫ f(x) dx = 1 (integrating over −∞ < x < ∞), and 3) P[a ≤ x ≤ b] = ∫ from a to b of f(x) dx.

4 Defn (Joint density function) Let x = (x1, x2, x3, ..., xn) denote a vector of continuous random variables. Then f(x) = f(x1, x2, x3, ..., xn) is called the joint density function of x = (x1, x2, x3, ..., xn) if: 1) f(x) ≥ 0, 2) ∫ ⋯ ∫ f(x1, ..., xn) dx1 ⋯ dxn = 1, and 3) P[x ∈ A] = ∫ ⋯ ∫ over A of f(x1, ..., xn) dx1 ⋯ dxn.

5 Note:

6 Defn (Marginal density function) The marginal density of x1 = (x1, x2, x3, ..., xp) (p < n) is defined by: f1(x1) = ∫ ⋯ ∫ f(x) dx2 = ∫ ⋯ ∫ f(x1, ..., xn) dxp+1 ⋯ dxn, where x2 = (xp+1, xp+2, xp+3, ..., xn). The marginal density of x2 = (xp+1, xp+2, xp+3, ..., xn) is defined by: f2(x2) = ∫ ⋯ ∫ f(x) dx1 = ∫ ⋯ ∫ f(x1, ..., xn) dx1 ⋯ dxp, where x1 = (x1, x2, x3, ..., xp).

7 Defn (Conditional density function) The conditional density of x1 given x2 (defined on the previous slide) (p < n) is defined by: f1|2(x1|x2) = f(x1, x2) / f2(x2). The conditional density of x2 given x1 is defined by: f2|1(x2|x1) = f(x1, x2) / f1(x1).

8 Marginal densities describe how the subvector xi behaves ignoring xj. Conditional densities describe how the subvector xi behaves when the subvector xj is held fixed.

9 Defn (Independence) The two sub-vectors x1 and x2 are called independent if: f(x) = f(x1, x2) = f1(x1) f2(x2) = product of marginals, or equivalently if the conditional density of xi given xj satisfies: fi|j(xi|xj) = fi(xi) = marginal density of xi.

10 Example (p-variate Normal) The random vector x (p × 1) is said to have the p-variate Normal distribution with mean vector μ (p × 1) and covariance matrix Σ (p × p) (written x ~ Np(μ, Σ)) if: f(x) = (2π)^(-p/2) |Σ|^(-1/2) exp{ -(1/2)(x − μ)'Σ⁻¹(x − μ) }.

11 Example (bivariate Normal) The random vector x = (x1, x2)' is said to have the bivariate Normal distribution with mean vector μ = (μ1, μ2)' and 2 × 2 covariance matrix Σ.
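As a computational companion to this example (not part of the original slides), here is a minimal Python sketch, assuming illustrative values for μ and Σ, that evaluates and samples a bivariate Normal with scipy:

```python
# Sketch with assumed (illustrative) mean vector and covariance matrix.
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, 2.0])                # assumed mean vector (mu1, mu2)
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])           # assumed 2 x 2 covariance matrix (positive definite)

rv = multivariate_normal(mean=mu, cov=Sigma)
print(rv.pdf([1.0, 2.0]))                # density f(x1, x2) evaluated at the mean
print(rv.rvs(size=5, random_state=0))    # five random draws from N2(mu, Sigma)
```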

12

13

14

15 Theorem (Transformations) Let x = (x1, x2, x3, ..., xn) denote a vector of continuous random variables with joint density function f(x1, x2, x3, ..., xn) = f(x). Let y1 = φ1(x1, x2, x3, ..., xn), y2 = φ2(x1, x2, x3, ..., xn), ..., yn = φn(x1, x2, x3, ..., xn) define a 1-1 transformation of x into y.

16 Then the joint density of y is g(y) given by: g(y) = f(x)|J|, where J = det[∂xi/∂yj] = the Jacobian of the transformation (with x written as a function of y).

17 Corollary (Linear Transformations) Let x = (x1, x2, x3, ..., xn) denote a vector of continuous random variables with joint density function f(x1, x2, x3, ..., xn) = f(x). Let y1 = a11 x1 + a12 x2 + a13 x3 + ... + a1n xn, y2 = a21 x1 + a22 x2 + a23 x3 + ... + a2n xn, ..., yn = an1 x1 + an2 x2 + an3 x3 + ... + ann xn define a 1-1 transformation of x into y.

18 Then the joint density of y is g(y) given by: g(y) = f(A⁻¹y) / |det A|, where A = (aij) is the matrix of coefficients.

19 Corollary (Linear Transformations for Normal Random variables) Let x = (x1, x2, x3, ..., xn) denote a vector of continuous random variables having an n-variate Normal distribution with mean vector μ and covariance matrix Σ, i.e. x ~ Nn(μ, Σ). Let y1 = a11 x1 + a12 x2 + ... + a1n xn, y2 = a21 x1 + a22 x2 + ... + a2n xn, ..., yn = an1 x1 + an2 x2 + ... + ann xn define a 1-1 transformation of x into y. Then y = (y1, y2, y3, ..., yn) ~ Nn(Aμ, AΣA').
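A hedged Monte Carlo sketch (not from the slides; A, μ and Σ are illustrative assumptions) of the corollary y = Ax ~ Nn(Aμ, AΣA'):

```python
# Check empirically that y = A x has mean A mu and covariance A Sigma A'.
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -1.0, 0.5])
Sigma = np.array([[1.0, 0.3, 0.0],
                  [0.3, 2.0, 0.4],
                  [0.0, 0.4, 1.5]])
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, -1.0],
              [2.0, 0.0, 1.0]])          # non-singular, so the transformation is 1-1

x = rng.multivariate_normal(mu, Sigma, size=200_000)
y = x @ A.T                              # each row is y = A x

print(np.allclose(y.mean(axis=0), A @ mu, atol=0.05))                   # ~ A mu
print(np.allclose(np.cov(y, rowvar=False), A @ Sigma @ A.T, atol=0.1))  # ~ A Sigma A'
```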

20 Defn (Expectation) Let x = (x1, x2, x3, ..., xn) denote a vector of continuous random variables with joint density function f(x) = f(x1, x2, x3, ..., xn). Let U = h(x) = h(x1, x2, x3, ..., xn). Then E[U] = ∫ ⋯ ∫ h(x1, ..., xn) f(x1, ..., xn) dx1 ⋯ dxn.

21 Defn (Conditional Expectation) Let x = (x1, x2, x3, ..., xn) = (x1, x2) denote a vector of continuous random variables with joint density function f(x) = f(x1, x2, x3, ..., xn) = f(x1, x2). Let U = h(x1) = h(x1, x2, x3, ..., xp). Then the conditional expectation of U given x2 is E[U|x2] = ∫ ⋯ ∫ h(x1, ..., xp) f1|2(x1|x2) dx1 ⋯ dxp.

22 Defn (Variance) Let x = (x1, x2, x3, ..., xn) denote a vector of continuous random variables with joint density function f(x) = f(x1, x2, x3, ..., xn). Let U = h(x) = h(x1, x2, x3, ..., xn). Then Var[U] = E[(U − E[U])²] = E[U²] − (E[U])².

23 Defn (Conditional Variance) Let x = (x1, x2, x3, ..., xn) = (x1, x2) denote a vector of continuous random variables with joint density function f(x) = f(x1, x2, x3, ..., xn) = f(x1, x2). Let U = h(x1) = h(x1, x2, x3, ..., xp). Then the conditional variance of U given x2 is Var[U|x2] = E[(U − E[U|x2])² | x2].

24 Defn (Covariance, Correlation) Let x = (x1, x2, x3, ..., xn) denote a vector of continuous random variables with joint density function f(x) = f(x1, x2, x3, ..., xn). Let U = h(x) = h(x1, x2, x3, ..., xn) and V = g(x) = g(x1, x2, x3, ..., xn). Then the covariance of U and V is Cov[U, V] = E[(U − E[U])(V − E[V])] = E[UV] − E[U]E[V], and the correlation of U and V is ρUV = Cov[U, V] / sqrt(Var[U] Var[V]).

25 Properties Expectation Variance Covariance Correlation

26 1. E[a 1 x 1 + a 2 x 2 + a 3 x 3 +... + a n x n ] = a 1 E[x 1 ] + a 2 E[x 2 ] + a 3 E[x 3 ] +... + a n E[x n ] or E[a'x] = a'E[x]

27 2.E[UV] = E[h(x 1 )g(x 2 )] = E[U]E[V] = E[h(x 1 )]E[g(x 2 )] if x 1 and x 2 are independent

28 3. Var[a1 x1 + a2 x2 + a3 x3 + ... + an xn] = the double sum over i and j of ai aj Cov[xi, xj], or Var[a'x] = a'Σa.

29 4. Cov[a1 x1 + a2 x2 + ... + an xn, b1 x1 + b2 x2 + ... + bn xn] = the double sum over i and j of ai bj Cov[xi, xj], or Cov[a'x, b'x] = a'Σb.
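A short simulation sketch (illustrative a, b and Σ, not from the slides) of properties 3 and 4:

```python
# Verify Var[a'x] = a' Sigma a and Cov[a'x, b'x] = a' Sigma b by simulation.
import numpy as np

rng = np.random.default_rng(1)
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 2.0, 0.3],
                  [0.2, 0.3, 1.0]])
a = np.array([1.0, -2.0, 0.5])
b = np.array([0.0, 1.0, 1.0])

x = rng.multivariate_normal(np.zeros(3), Sigma, size=500_000)
u, v = x @ a, x @ b

print(np.var(u), a @ Sigma @ a)           # Var[a'x] vs a' Sigma a
print(np.cov(u, v)[0, 1], a @ Sigma @ b)  # Cov[a'x, b'x] vs a' Sigma b
```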

30 5. 6.

31 Multivariate distributions

32 The Normal distribution

33 1. The Normal distribution – parameters μ and σ (or σ²). Comment: If μ = 0 and σ = 1 the distribution is called the standard normal distribution. Normal distribution with μ = 50 and σ = 15. Normal distribution with μ = 70 and σ = 20.

34 The probability density of the normal distribution: f(x) = (1/(σ√(2π))) exp{ −(x − μ)²/(2σ²) }. If a random variable, X, has a normal distribution with mean μ and variance σ² then we will write: X ~ N(μ, σ²).

35 The multivariate Normal distribution

36 Let x = (x1, x2, ..., xp)' = a random vector. Let μ = (μ1, μ2, ..., μp)' = a vector of constants (the mean vector).

37 Let Σ = a p × p positive definite matrix.

38 Definition The matrix A is positive semi-definite if x'Ax ≥ 0 for all x. Further, the matrix A is positive definite if x'Ax > 0 for all x ≠ 0.

39 Suppose that the joint density of the random vector x = [x1, x2, …, xp]' is f(x) = (2π)^(-p/2) |Σ|^(-1/2) exp{ -(1/2)(x − μ)'Σ⁻¹(x − μ) }. Then the random vector [x1, x2, …, xp]' is said to have a p-variate normal distribution with mean vector μ and covariance matrix Σ. We will write: x ~ Np(μ, Σ).

40 Example: the Bivariate Normal distribution with mean vector μ = (μ1, μ2)' and 2 × 2 covariance matrix Σ.

41 Now and

42

43 Hence where

44 Note: the density is constant when the exponent (a quadratic form in x1, x2) is constant. This is true when x1, x2 lie on an ellipse centered at (μ1, μ2).

45

46 Surface Plots of the bivariate Normal distribution

47 Contour Plots of the bivariate Normal distribution
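A sketch (with assumed μ and Σ, not the parameter values pictured on the slides) of how such contour plots can be produced:

```python
# Contour plot of a bivariate Normal density on a grid.
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal

rv = multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 2.0]])   # assumed mu, Sigma

x1, x2 = np.meshgrid(np.linspace(-4, 4, 200), np.linspace(-5, 5, 200))
density = rv.pdf(np.dstack((x1, x2)))    # f(x1, x2) on the grid

plt.contour(x1, x2, density)             # elliptical contours centred at mu
plt.xlabel("x1"); plt.ylabel("x2")
plt.show()
```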

48 Scatter Plots of data from the bivariate Normal distribution

49 Trivariate Normal distribution – contour map (axes x1, x2, x3; the mean vector is marked).

50 Trivariate Normal distribution (plot; axes x1, x2, x3).

51 [Trivariate Normal distribution plot; axes x1, x2, x3]

52 [Trivariate Normal distribution plot; axes x1, x2, x3]

53 Example: In the following study, data were collected for a sample of n = 183 females on the variables Age, Height (Ht), Weight (Wt), Birth control pill use (Bpl – 1 = no pill, 2 = pill) and the following blood chemistry measurements: Cholesterol (Chl), Albumin (Alb), Calcium (Ca) and Uric Acid (UA). The data are tabulated on the next page:

54 The data :

55

56 Alb, Chl, Bp

57 Marginal and Conditional distributions

58 Theorem: (Woodbury) Proof:

59 Example: Solution:

60 Theorem: (Inverse of a partitioned symmetric matrix)

61 Proof:

62

63 Theorem: (Determinant of a partitioned symmetric matrix) Proof:

64 Theorem: (Marginal distributions for the Multivariate Normal distribution) Let x = (x1', x2')' have a p-variate Normal distribution with mean vector μ = (μ1', μ2')' and covariance matrix Σ partitioned into blocks Σ11, Σ12, Σ21, Σ22. Then the marginal distribution of xi is a qi-variate Normal distribution (q1 = q, q2 = p − q) with mean vector μi and covariance matrix Σii.

65 Theorem: (Conditional distributions for the Multivariate Normal distribution) Let x = (x1', x2')' have a p-variate Normal distribution with mean vector μ = (μ1', μ2')' and covariance matrix Σ partitioned as above. Then the conditional distribution of xi given xj is a qi-variate Normal distribution with mean vector μi|j = μi + ΣijΣjj⁻¹(xj − μj) and covariance matrix Σi|j = Σii − ΣijΣjj⁻¹Σji.
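A small sketch (assuming the partitioned formulas quoted above; the numeric values are illustrative, not the slides' example) that computes the conditional mean vector and covariance matrix of the second block given the first:

```python
# Conditional mean and covariance of x_{q+1},...,x_p given x_1,...,x_q = x1_obs.
import numpy as np

def mvn_conditional(mu, Sigma, q, x1_obs):
    mu1, mu2 = mu[:q], mu[q:]
    S11, S12 = Sigma[:q, :q], Sigma[:q, q:]
    S21, S22 = Sigma[q:, :q], Sigma[q:, q:]
    B = S21 @ np.linalg.inv(S11)         # matrix of regression coefficients
    cond_mean = mu2 + B @ (x1_obs - mu1)
    cond_cov = S22 - B @ S12             # matrix of partial variances and covariances
    return cond_mean, cond_cov

mu = np.array([1.0, 2.0, 0.0])           # illustrative 3-variate example
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
print(mvn_conditional(mu, Sigma, q=1, x1_obs=np.array([1.5])))
```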

66 Proof: (of Previous two theorems) is where, The joint density of and

67 where, and

68 also, and

69 ,

70

71 The marginal distribution of is

72 The conditional distribution of given is:

73 Σ2|1 = Σ22 − Σ21Σ11⁻¹Σ12 is called the matrix of partial variances and covariances. Its (i, j) element σij·1,…,q is called the partial covariance (variance if i = j) between xi and xj given x1, …, xq. ρij·1,…,q = σij·1,…,q / sqrt(σii·1,…,q σjj·1,…,q) is called the partial correlation between xi and xj given x1, …, xq.

74 Σ21Σ11⁻¹ is called the matrix of regression coefficients for predicting xq+1, xq+2, …, xp from x1, …, xq. The mean vector of xq+1, xq+2, …, xp given x1, …, xq is: μ2|1 = μ2 + Σ21Σ11⁻¹(x1 − μ1).

75 Example: Suppose that x = (x1, x2, x3, x4)' is 4-variate normal with mean vector μ and covariance matrix Σ as given.

76 The marginal distribution of is bivariate normal with The marginal distribution of is trivariate normal with

77 Find the conditional distribution of (x3, x4) given (x1, x2).

78

79 The matrix of regression coefficients for predicting x 3, x 4 from x 1, x 2.

80

81 The Chi-square distribution

82 The Chi-square distribution: the Chi-square (χ²) distribution with ν d.f.

83 Graph: The χ² distribution (ν = 4), (ν = 5), (ν = 6).

84 Basic Properties of the Chi-Square distribution: 1. If z has a Standard Normal distribution then z² has a χ² distribution with 1 degree of freedom. 2. If z1, z2, …, zν are independent random variables each having a Standard Normal distribution then z1² + z2² + … + zν² has a χ² distribution with ν degrees of freedom. 3. Let X and Y be independent random variables having χ² distributions with ν1 and ν2 degrees of freedom respectively; then X + Y has a χ² distribution with ν1 + ν2 degrees of freedom.

85 continued 4. Let x1, x2, …, xn be independent random variables having χ² distributions with ν1, ν2, …, νn degrees of freedom respectively; then x1 + x2 + … + xn has a χ² distribution with ν1 + … + νn degrees of freedom. 5. Suppose X and Y are independent random variables with X and X + Y having χ² distributions with ν1 and ν (ν > ν1) degrees of freedom respectively; then Y has a χ² distribution with ν − ν1 degrees of freedom.
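A quick empirical sketch of property 2 (the sum of ν squared standard Normals behaves like χ² with ν d.f.); the simulation settings are assumptions for illustration:

```python
# Sum of nu squared standard Normals: mean ~ nu, variance ~ 2 nu, cdf ~ chi2(nu).
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2)
nu = 5
u = (rng.standard_normal((200_000, nu)) ** 2).sum(axis=1)

print(u.mean(), nu)                              # ~ nu
print(u.var(), 2 * nu)                           # ~ 2 nu
print(np.mean(u <= 11.07), chi2.cdf(11.07, nu))  # empirical vs exact probability
```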

86 The non-central Chi-squared distribution: If z1, z2, …, zν are independent random variables each having a Normal distribution with mean μi and variance σ² = 1, then u = z1² + z2² + … + zν² has a non-central χ² distribution with ν degrees of freedom and non-centrality parameter λ = μ1² + μ2² + … + μν².

87 Mean and Variance of the non-central χ² distribution: If U has a non-central χ² distribution with ν degrees of freedom and non-centrality parameter λ, then E[U] = ν + λ and Var[U] = 2(ν + 2λ). If U has a central χ² distribution with ν degrees of freedom then λ is zero, thus E[U] = ν and Var[U] = 2ν.
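A sketch of these moments using scipy's non-central chi-square (ncx2), under the convention λ = Σ μi² used above; the means μi are assumed values:

```python
# Non-central chi-square: E[U] = nu + lambda, Var[U] = 2(nu + 2 lambda).
import numpy as np
from scipy.stats import ncx2

rng = np.random.default_rng(3)
mus = np.array([1.0, -0.5, 2.0, 0.0])            # assumed means of the z_i
nu, lam = len(mus), float(np.sum(mus ** 2))      # lambda = sum of mu_i^2

u = ((rng.standard_normal((200_000, nu)) + mus) ** 2).sum(axis=1)
print(u.mean(), ncx2.mean(nu, lam), nu + lam)
print(u.var(), ncx2.var(nu, lam), 2 * (nu + 2 * lam))
```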

88 Distribution of Linear and Quadratic Forms

89 Suppose x ~ Nn(μ, Σ). Consider the random variable U = x'Ax. Questions: 1. What is the distribution of U? (many statistics have this form) 2. When is this distribution simple? 3. When we have two such statistics, when are they independent?

90 Simplest Case: x1, …, xn independent Standard Normal (μ = 0, Σ = I) and A = I, so U = x1² + … + xn². Then the distribution of U is the central χ² distribution with ν = n degrees of freedom.

91 Now consider the distribution of other quadratic forms where

92 with

93

94 also

95 with

96 if and only if A is symmetric idempotent of rank r. Proof Since A is symmetric idempotent of rank r, there exists an orthogonal matrix P such that A = PDP' or P'AP = D. Since A is idempotent the eigenvalues of A are 0 or 1 and the number of 1's is r, the rank of A.

97

98

99 if and only if A is symmetric idempotent of rank r. Proof Similar to previous theorem

100 with if and only if the following two conditions are satisfied: 1. AΣ is idempotent of rank r. 2. ΣA is idempotent of rank r.

101

102

103 with if and only if the following two conditions are satisfied: 1. AΣ is idempotent of rank r. 2. ΣA is idempotent of rank r.
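An illustrative check (my own sketch, not the slides' proof) with Σ = I and A the symmetric idempotent centering matrix of rank n − 1, so the quadratic form should behave like χ² with n − 1 d.f.:

```python
# z ~ N_n(0, I), A = I - J/n (symmetric idempotent, rank n-1): z'Az ~ chi-square(n-1).
import numpy as np

rng = np.random.default_rng(4)
n = 6
A = np.eye(n) - np.ones((n, n)) / n
print(np.allclose(A @ A, A), np.linalg.matrix_rank(A))   # idempotent, rank n-1

z = rng.standard_normal((200_000, n))
u = np.einsum("ij,jk,ik->i", z, A, z)    # z' A z for each simulated z
print(u.mean(), n - 1)                   # ~ r = n - 1
print(u.var(), 2 * (n - 1))              # ~ 2 r
```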

104 Application: Let y1, y2, …, yn be a sample from the Normal distribution with mean μ and variance σ². Then U = Σ(yi − ȳ)²/σ² has a χ² distribution with ν = n − 1 d.f. and λ = 0 (central).

105 Proof

106

107

108 Hence U = Σ(yi − ȳ)²/σ² has a χ² distribution with ν = n − 1 d.f. and non-centrality parameter λ = 0.
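A brief simulation sketch of this application (μ and σ are assumed illustrative values):

```python
# Sum of squared deviations about the sample mean, divided by sigma^2,
# behaves like a central chi-square with n - 1 degrees of freedom.
import numpy as np

rng = np.random.default_rng(5)
n, mu, sigma = 10, 50.0, 4.0
y = rng.normal(mu, sigma, size=(100_000, n))
u = ((y - y.mean(axis=1, keepdims=True)) ** 2).sum(axis=1) / sigma ** 2

print(u.mean(), n - 1)                   # ~ n - 1
print(u.var(), 2 * (n - 1))              # ~ 2(n - 1)
```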

109 Independence of Linear and Quadratic Forms

110 Theorem Proof Since A is symmetric there exists an orthogonal matrix P such that P'AP = D where D is diagonal. Note: since CA = 0, rank(A) = r < n and some of the eigenvalues (diagonal elements of D) are zero.

111

112

113 Theorem Proof Exercise. Similar to the previous theorem.

114 Application: Let y1, y2, …, yn be a sample from the Normal distribution with mean μ and variance σ². Then ȳ and s² = Σ(yi − ȳ)²/(n − 1) are independent. Proof

115 Q.E.D.

116 Theorem (Independence of quadratic forms) Proof Let Σ = ΛΛ' (where Λ is non-singular).

117 Theorem Expected Value and Variance of quadratic forms

118 Summary

119 Example – One-way Anova: y11, y12, y13, … y1n a sample from N(μ1, σ²); y21, y22, y23, … y2n a sample from N(μ2, σ²); …; yk1, yk2, yk3, … ykn a sample from N(μk, σ²).

120 Thus

121 Now let

122 Thus

123 Statistical Inference Making decisions from data

124 There are two main areas of Statistical Inference: Estimation – deciding on the value of a parameter – Point estimation – Confidence Interval, Confidence region estimation; Hypothesis testing – deciding if a statement (hypothesis) about a parameter is True or False.

125 The general statistical model Most data fits this situation

126 Defn (The Classical Statistical Model) The data vector x = (x1, x2, x3, ..., xn). The model: Let f(x|θ) = f(x1, x2, ..., xn|θ1, θ2, ..., θp) denote the joint density of the data vector x = (x1, x2, x3, ..., xn) of observations, where the unknown parameter vector θ ∈ Ω (a subset of p-dimensional space).

127 An Example The data vector x = (x1, x2, x3, ..., xn), a sample from the normal distribution with mean μ and variance σ². The model: Then f(x|μ, σ²) = f(x1, x2, ..., xn|μ, σ²), the joint density of x = (x1, x2, x3, ..., xn), takes on the form: f(x|μ, σ²) = (2πσ²)^(-n/2) exp{ −Σ(xi − μ)²/(2σ²) }, where the unknown parameter vector θ = (μ, σ²) ∈ Ω = {(x, y) | −∞ < x < ∞, 0 ≤ y < ∞}.

128 Defn (Sufficient Statistics) Let x have joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Then S = (S1(x), S2(x), S3(x), ..., Sk(x)) is called a set of sufficient statistics for the parameter vector θ if the conditional distribution of x given S = (S1(x), S2(x), S3(x), ..., Sk(x)) is not functionally dependent on the parameter vector θ. A set of sufficient statistics contains all of the information concerning the unknown parameter vector θ.

129 A Simple Example illustrating Sufficiency Suppose that we observe a Success-Failure experiment n = 3 times. Let θ denote the probability of Success. Suppose that the data collected are x1, x2, x3, where xi takes on the value 1 if the ith trial is a Success and 0 if the ith trial is a Failure.

130 The following table gives the possible values of (x1, x2, x3). The data can be generated in two equivalent ways: 1. Generating (x1, x2, x3) directly from f(x1, x2, x3|θ), or 2. Generating S from g(S|θ) and then generating (x1, x2, x3) from f(x1, x2, x3|S). Since the second step does not involve θ, no additional information will be obtained by knowing (x1, x2, x3) once S is determined.
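A small sketch (my own illustration of the table's point) showing that the conditional distribution of (x1, x2, x3) given S is the same for two different values of θ:

```python
# For three Bernoulli(theta) trials, the conditional distribution of (x1, x2, x3)
# given S = x1 + x2 + x3 does not depend on theta.
from fractions import Fraction
from itertools import product

def conditional_given_S(theta):
    joint = {x: theta ** sum(x) * (1 - theta) ** (3 - sum(x))
             for x in product([0, 1], repeat=3)}
    cond = {}
    for s in range(4):
        outcomes = [x for x in joint if sum(x) == s]
        g = sum(joint[x] for x in outcomes)          # g(S = s | theta)
        cond[s] = {x: joint[x] / g for x in outcomes}
    return cond

# Same conditional distributions for theta = 3/10 and theta = 4/5:
print(conditional_given_S(Fraction(3, 10)) == conditional_given_S(Fraction(4, 5)))
```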

131 The Sufficiency Principle Any decision regarding the parameter θ should be based on a set of Sufficient statistics S1(x), S2(x), ..., Sk(x) and not otherwise on the value of x.

132 A useful approach in developing a statistical procedure 1.Find sufficient statistics 2.Develop estimators, tests of hypotheses etc. using only these statistics

133 Defn (Minimal Sufficient Statistics) Let x have joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Then S = (S1(x), S2(x), S3(x), ..., Sk(x)) is a set of Minimal Sufficient statistics for the parameter vector θ if S = (S1(x), S2(x), S3(x), ..., Sk(x)) is a set of Sufficient statistics and can be calculated from any other set of Sufficient statistics.

134 Theorem (The Factorization Criterion) Let x have joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Then S = (S1(x), S2(x), S3(x), ..., Sk(x)) is a set of Sufficient statistics for the parameter vector θ if f(x|θ) = h(x)g(S, θ) = h(x)g(S1(x), S2(x), S3(x), ..., Sk(x), θ). This is useful for finding Sufficient statistics, i.e. if you can factor out the θ-dependence with a set of statistics then these statistics are a set of Sufficient statistics.
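As a concrete illustration (not on the slides), for a sample x1, ..., xn from a Normal(μ, σ²) distribution the joint density can be written as f(x|μ, σ²) = (2πσ²)^(-n/2) exp{ −(1/(2σ²)) Σ xi² + (μ/σ²) Σ xi − nμ²/(2σ²) } · 1, which has the form h(x) g(S1(x), S2(x), μ, σ²) with h(x) = 1, S1(x) = Σ xi and S2(x) = Σ xi². By the Factorization Criterion, (Σ xi, Σ xi²) is therefore a set of sufficient statistics for (μ, σ²).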

135 Defn (Completeness) Let x have joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Then S = (S1(x), S2(x), S3(x), ..., Sk(x)) is a set of Complete Sufficient statistics for the parameter vector θ if S = (S1(x), S2(x), S3(x), ..., Sk(x)) is a set of Sufficient statistics and whenever E[φ(S1(x), S2(x), S3(x), ..., Sk(x))] = 0 then P[φ(S1(x), S2(x), S3(x), ..., Sk(x)) = 0] = 1.

136 Defn (The Exponential Family) Let x have joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Then f(x|θ) is said to be a member of the exponential family of distributions if: f(x|θ) = g(θ) exp{ p1(θ)S1(x) + p2(θ)S2(x) + ... + pk(θ)Sk(x) } h(x), for ai < xi < bi (i = 1, ..., n) and θ ∈ Ω, where

137 1) −∞ < ai < bi < ∞ are not dependent on θ. 2) Ω contains a nondegenerate k-dimensional rectangle. 3) g(θ), ai, bi and pi(θ) are not dependent on x. 4) h(x), ai, bi and Si(x) are not dependent on θ.

138 If in addition: 5) The Si(x) are functionally independent for i = 1, 2, ..., k. 6) ∂[Si(x)]/∂xj exists and is continuous for all i = 1, 2, ..., k and j = 1, 2, ..., n. 7) pi(θ) is a continuous function of θ for all i = 1, 2, ..., k. 8) R = {[p1(θ), p2(θ), ..., pk(θ)] | θ ∈ Ω} contains a nondegenerate k-dimensional rectangle. Then the set of statistics S1(x), S2(x), ..., Sk(x) form a Minimal Complete set of Sufficient statistics.

139 Defn (The Likelihood function) Let x have joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Then for a given value of the observation vector x, the Likelihood function, Lx(θ), is defined by: Lx(θ) = f(x|θ) with θ ∈ Ω. The log-Likelihood function lx(θ) is defined by: lx(θ) = ln Lx(θ) = ln f(x|θ) with θ ∈ Ω.

140 The Likelihood Principle Any decision regarding the parameter θ should be based on the likelihood function Lx(θ) and not otherwise on the value of x. If two data sets result in the same likelihood function, the decision regarding θ should be the same.

141 Some statisticians find it useful to plot the likelihood function Lx(θ) given the value of x. It summarizes the information contained in x regarding the parameter vector θ.

142 An Example The data vector x = (x1, x2, x3, ..., xn), a sample from the normal distribution with mean μ and variance σ². The joint distribution of x: Then f(x|μ, σ²) = f(x1, x2, ..., xn|μ, σ²), the joint density of x = (x1, x2, x3, ..., xn), takes on the form: f(x|μ, σ²) = (2πσ²)^(-n/2) exp{ −Σ(xi − μ)²/(2σ²) }, where the unknown parameter vector θ = (μ, σ²) ∈ Ω = {(x, y) | −∞ < x < ∞, 0 ≤ y < ∞}.

143 The Likelihood function Assume the data vector x = (x1, x2, x3, ..., xn) is known. The Likelihood function is then L(μ, σ) = f(x|μ, σ) = f(x1, x2, ..., xn|μ, σ²),

144 or

145 hence Now consider the following data: (n = 10)

146 [Plot of the likelihood function L(μ, σ) for the n = 10 data; axis marked at 0, 20, 50, 70]

147 [Plot of the likelihood function L(μ, σ) for the n = 10 data; axis marked at 0, 20, 50, 70]

148 Now consider the following data: (n = 100)

149 [Plot of the likelihood function L(μ, σ) for the n = 100 data; axis marked at 0, 20, 50, 70]

150 [Plot of the likelihood function L(μ, σ) for the n = 100 data; axis marked at 0, 20, 50, 70]
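A sketch of how such likelihood surfaces can be computed on a grid; the data vector below is an assumed illustration, not the n = 10 or n = 100 data from the slides:

```python
# Evaluate the Normal log-likelihood l(mu, sigma) on a (mu, sigma) grid.
import numpy as np

x = np.array([61.0, 52.3, 47.1, 58.9, 55.0, 49.7, 63.2, 50.8, 57.4, 53.6])
n = x.size

MU, SIG = np.meshgrid(np.linspace(20, 70, 200), np.linspace(1, 25, 200))
loglik = (-n * np.log(SIG) - 0.5 * n * np.log(2 * np.pi)
          - ((x[:, None, None] - MU) ** 2).sum(axis=0) / (2 * SIG ** 2))

i, j = np.unravel_index(np.argmax(loglik), loglik.shape)
print(MU[i, j], SIG[i, j])               # grid point maximizing the likelihood
```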

151 The Sufficiency Principle Any decision regarding the parameter θ should be based on a set of Sufficient statistics S1(x), S2(x), ..., Sk(x) and not otherwise on the value of x. If two data sets result in the same values for the set of Sufficient statistics, the decision regarding θ should be the same.

152 Theorem (Birnbaum – Equivalency of the Likelihood Principle and Sufficiency Principle) Lx1(θ) ∝ Lx2(θ) if and only if S1(x1) = S1(x2), ..., and Sk(x1) = Sk(x2).

153 The following table gives possible values of (x 1, x 2, x 3 ). The Likelihood function

154 Estimation Theory Point Estimation

155 Defn (Estimator) Let x = (x1, x2, x3, ..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Then an estimator of the parameter τ(θ) = τ(θ1, θ2, ..., θk) is any function T(x) = T(x1, x2, x3, ..., xn) of the observation vector.

156 Defn (Mean Square Error) Let x = (x1, x2, x3, ..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Let T(x) be an estimator of the parameter τ(θ). Then the Mean Square Error of T(x) is defined to be: MSEθ[T(x)] = E{[T(x) − τ(θ)]² | θ}.

157 Defn (Uniformly Better) Let x = (x1, x2, x3, ..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Let T(x) and T*(x) be estimators of the parameter τ(θ). Then T(x) is said to be uniformly better than T*(x) if: MSEθ[T(x)] ≤ MSEθ[T*(x)] for all θ ∈ Ω.

158 Defn (Unbiased) Let x = (x1, x2, x3, ..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Let T(x) be an estimator of the parameter τ(θ). Then T(x) is said to be an unbiased estimator of the parameter τ(θ) if: E[T(x) | θ] = τ(θ) for all θ ∈ Ω.

159 Theorem (Cramér–Rao Lower Bound) Let x = (x1, x2, x3, ..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Suppose that the following regularity conditions hold: i) ∂f(x|θ)/∂θi exists for all x and for all θ ∈ Ω. ii) iii) iv)

160 Let M denote the p × p matrix with ijth element Mij = E[(∂ ln f(x|θ)/∂θi)(∂ ln f(x|θ)/∂θj)]. Then V = M⁻¹ is the lower bound for the covariance matrix of unbiased estimators of θ. That is, var(c'θ̂) = c'var(θ̂)c ≥ c'M⁻¹c = c'Vc, where θ̂ is a vector of unbiased estimators of θ.
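For example (a standard illustration, not on the slides): for a sample of size n from a Normal(μ, σ²) distribution with σ² known, M reduces to the scalar n/σ², so every unbiased estimator of μ has variance at least σ²/n; the sample mean attains this bound.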

161 Defn (Uniformly Minimum Variance Unbiased Estimator) Let x = (x1, x2, x3, ..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Then T*(x) is said to be the UMVU (Uniformly Minimum Variance Unbiased) estimator of τ(θ) if: 1) E[T*(x)] = τ(θ) for all θ ∈ Ω. 2) Var[T*(x)] ≤ Var[T(x)] for all θ ∈ Ω whenever E[T(x)] = τ(θ).

162 Theorem (Rao-Blackwell) Let x = (x1, x2, x3, ..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Let S1(x), S2(x), ..., Sk(x) denote a set of sufficient statistics. Let T(x) be any unbiased estimator of τ(θ). Then T*[S1(x), S2(x), ..., Sk(x)] = E[T(x)|S1(x), S2(x), ..., Sk(x)] is an unbiased estimator of τ(θ) such that: Var[T*(S1(x), S2(x), ..., Sk(x))] ≤ Var[T(x)] for all θ ∈ Ω.

163 Theorem (Lehmann–Scheffé) Let x = (x1, x2, x3, ..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Let S1(x), S2(x), ..., Sk(x) denote a set of complete sufficient statistics. Let T*[S1(x), S2(x), ..., Sk(x)] be an unbiased estimator of τ(θ). Then T*(S1(x), S2(x), ..., Sk(x)) is the UMVU estimator of τ(θ).

164 Defn (Consistency) Let x = (x1, x2, x3, ..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Let Tn(x) be an estimator of τ(θ). Then Tn(x) is called a consistent estimator of τ(θ) if for any ε > 0: lim (n → ∞) P[|Tn(x) − τ(θ)| > ε] = 0.

165 Defn (M. S. E. Consistency) Let x = (x1, x2, x3, ..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Let Tn(x) be an estimator of τ(θ). Then Tn(x) is called an M. S. E. consistent estimator of τ(θ) if: lim (n → ∞) E{[Tn(x) − τ(θ)]²} = 0.

166 Methods for Finding Estimators 1.The Method of Moments 2.Maximum Likelihood Estimation

167 Methods for finding estimators 1.Method of Moments 2.Maximum Likelihood Estimation

168 Method of Moments Let x1, …, xn denote a sample from the density function f(x; θ1, …, θp) = f(x; θ). The kth moment of the distribution being sampled is defined to be: μk' = E[x^k] = ∫ x^k f(x; θ) dx.

169 The kth sample moment is defined to be: mk = (1/n)(x1^k + x2^k + … + xn^k). To find the method of moments estimators of θ1, …, θp we set up the equations: μk'(θ1, …, θp) = mk for k = 1, 2, …, p.

170 We then solve these equations for θ1, …, θp. The solutions, θ̂1, …, θ̂p, are called the method of moments estimators.
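A minimal sketch of the method of moments for a Normal(μ, σ²) sample, matching the first two moments (the data below are simulated under assumed parameter values):

```python
# Method-of-moments estimates for a Normal sample: mu-hat = m1, sigma^2-hat = m2 - m1^2.
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(loc=50, scale=15, size=500)   # assumed data-generating values

m1 = x.mean()                                # first sample moment
m2 = (x ** 2).mean()                         # second sample moment
mu_hat, sigma2_hat = m1, m2 - m1 ** 2        # solve mu = m1, mu^2 + sigma^2 = m2

print(mu_hat, sigma2_hat)
```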

171 The Method of Maximum Likelihood Suppose that the data x1, …, xn has joint density function f(x1, …, xn; θ1, …, θp) where θ = (θ1, …, θp) are unknown parameters assumed to lie in Ω (a subset of p-dimensional space). We want to estimate the parameters θ1, …, θp.

172 Definition: Maximum Likelihood Estimation Suppose that the data x1, …, xn has joint density function f(x1, …, xn; θ1, …, θp). Then the Likelihood function is defined to be L(θ) = L(θ1, …, θp) = f(x1, …, xn; θ1, …, θp). The Maximum Likelihood estimators of the parameters θ1, …, θp are the values that maximize L(θ) = L(θ1, …, θp).

173 The Maximum Likelihood estimators of the parameters θ1, …, θp are the values θ̂1, …, θ̂p such that L(θ̂1, …, θ̂p) = max over (θ1, …, θp) in Ω of L(θ1, …, θp). Note: maximizing L(θ) is equivalent to maximizing the log-likelihood function l(θ) = ln L(θ).
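A hedged numerical sketch of maximum likelihood for a Normal sample, minimizing the negative log-likelihood with scipy; the data are simulated under assumed parameter values, and the closed-form answers are the sample mean and the (biased) sample variance:

```python
# Numerical MLE for (mu, sigma) of a Normal sample.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(7)
x = rng.normal(loc=50, scale=15, size=200)

def neg_loglik(params):
    mu, log_sigma = params                   # optimize log(sigma) so that sigma > 0
    return -norm.logpdf(x, loc=mu, scale=np.exp(log_sigma)).sum()

res = minimize(neg_loglik, x0=[40.0, 3.0], method="Nelder-Mead")
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(mu_hat, x.mean())                      # should agree closely
print(sigma_hat ** 2, x.var())               # np.var is the biased (MLE) variance
```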

174 Hypothesis Testing

175 Defn (Test of size α) Let x = (x1, x2, x3, ..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Let ω be any subset of Ω. Consider testing the Null Hypothesis H0: θ ∈ ω against the alternative hypothesis H1: θ ∉ ω.

176 Let A denote the acceptance region for the test (all values x = (x1, x2, x3, ..., xn) such that the decision to accept H0 is made) and let C denote the critical region for the test (all values x = (x1, x2, x3, ..., xn) such that the decision to reject H0 is made). Then the test is said to be of size α if: max over θ ∈ ω of P[x ∈ C | θ] = α.

177 Defn (Power) Let x = (x1, x2, x3, ..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Consider testing the Null Hypothesis H0: θ ∈ ω against the alternative hypothesis H1: θ ∉ ω, where ω is any subset of Ω. Then the Power of the test for θ ∉ ω is defined to be: Power(θ) = P[x ∈ C | θ].

178 Defn (Uniformly Most Powerful (UMP) test of size α) Let x = (x1, x2, x3, ..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Consider testing the Null Hypothesis H0: θ ∈ ω against the alternative hypothesis H1: θ ∉ ω, where ω is any subset of Ω. Let C denote the critical region for the test. Then the test is called the UMP test of size α if:

179 Let x = (x1, x2, x3, ..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Consider testing the Null Hypothesis H0: θ ∈ ω against the alternative hypothesis H1: θ ∉ ω, where ω is any subset of Ω. Let C denote the critical region for the test. Then the test is called the UMP test of size α if:

180 max over θ ∈ ω of P[x ∈ C | θ] = α, and for any other critical region C* such that: max over θ ∈ ω of P[x ∈ C* | θ] ≤ α, then: P[x ∈ C | θ] ≥ P[x ∈ C* | θ] for all θ ∉ ω.

181 Theorem (Neyman–Pearson Lemma) Let x = (x1, x2, x3, ..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω = {θ0, θ1}. Consider testing the Null Hypothesis H0: θ = θ0 against the alternative hypothesis H1: θ = θ1. Then the UMP test of size α has critical region: C = { x : f(x|θ1) / f(x|θ0) ≥ K }, where K is chosen so that P[x ∈ C | θ0] = α.

182 Defn (Likelihood Ratio Test of size α) Let x = (x1, x2, x3, ..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Consider testing the Null Hypothesis H0: θ ∈ ω against the alternative hypothesis H1: θ ∉ ω, where ω is any subset of Ω. Then the Likelihood Ratio (LR) test of size α has critical region: C = { x : λ(x) = [max over θ ∈ ω of Lx(θ)] / [max over θ ∈ Ω of Lx(θ)] ≤ K }, where K is chosen so that the test has size α.

183 Theorem (Asymptotic distribution of the Likelihood Ratio test criterion) Let x = (x1, x2, x3, ..., xn) denote the vector of observations having joint density f(x|θ) where the unknown parameter vector θ ∈ Ω. Consider testing the Null Hypothesis H0: θ ∈ ω against the alternative hypothesis H1: θ ∉ ω, where ω is any subset of Ω. Then, under proper regularity conditions on f(x|θ), U = −2 ln λ(x) possesses an asymptotic Chi-square distribution under H0 with degrees of freedom equal to the difference between the number of independent parameters in Ω and ω.
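A simple Monte Carlo sketch of this theorem in the special case of testing H0: μ = μ0 for a Normal(μ, 1) sample, where −2 ln λ(x) = n(x̄ − μ0)²; the sample size and number of replications are assumptions for illustration:

```python
# Under H0, -2 ln(lambda) should follow (here, exactly) a chi-square with 1 d.f.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(8)
n, mu0, reps = 30, 0.0, 100_000
xbar = rng.normal(mu0, 1.0, size=(reps, n)).mean(axis=1)
lrt = n * (xbar - mu0) ** 2                  # -2 ln(lambda) for each replication

print(np.mean(lrt > chi2.ppf(0.95, df=1)))   # ~ 0.05, the size of the test
```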

