Software Statistical Methods

Statistical Inference

The field of statistical inference consists of those methods used to make decisions or to draw conclusions about a population. These methods utilize the information contained in a sample from the population in drawing conclusions. Statistical inference may be divided into two major areas: parameter estimation and hypothesis testing.

Random Sampling

Definition: A population consists of the totality of the observations with which we are concerned. A sample is a subset of observations selected from a population.
Definition: A statistic is any function of the observations in a random sample. Since a statistic is a random variable, it has a probability distribution. We call the probability distribution of a statistic a sampling distribution.

In general, if $X$ is a random variable with probability distribution $f(x)$, characterized by the unknown parameter $\theta$, and if $X_1, X_2, \ldots, X_n$ is a random sample of size $n$ from $X$, then the statistic $\hat{\Theta} = h(X_1, X_2, \ldots, X_n)$ is called a point estimator of $\theta$. Note that $\hat{\Theta}$ is a random variable, because it is a function of random variables. After the sample has been selected, $\hat{\Theta}$ takes on a particular numerical value $\hat{\theta}$ called the point estimate of $\theta$.

Definition: A point estimate of some population parameter $\theta$ is a single numerical value $\hat{\theta}$ of a statistic $\hat{\Theta}$.
Estimation problems occur frequently in engineering. We often need to estimate:
- The mean $\mu$ of a single population
- The variance $\sigma^2$ (or standard deviation $\sigma$) of a single population
- The proportion $p$ of items in a population that belong to a class of interest
- The difference in means of two populations, $\mu_1 - \mu_2$
- The difference in two population proportions, $p_1 - p_2$

Reasonable point estimates of these parameters are as follows:
- For $\mu$, the estimate is $\hat{\mu} = \bar{x}$, the sample mean.
- For $\sigma^2$, the estimate is $\hat{\sigma}^2 = s^2$, the sample variance.
- For $p$, the estimate is $\hat{p} = x/n$, the sample proportion, where $x$ is the number of items in a random sample of size $n$ that belong to the class of interest.
- For $\mu_1 - \mu_2$, the estimate is $\hat{\mu}_1 - \hat{\mu}_2 = \bar{x}_1 - \bar{x}_2$, the difference between the sample means of two independent random samples.
- For $p_1 - p_2$, the estimate is $\hat{p}_1 - \hat{p}_2$, the difference between two sample proportions computed from two independent random samples.
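The point estimates listed above can be computed directly. The following is a minimal sketch using numpy; the sample values are illustrative, not taken from the text:

```python
import numpy as np

# Hypothetical samples (illustrative values only)
x1 = np.array([12.1, 11.8, 12.4, 12.0, 11.9, 12.3])
x2 = np.array([11.5, 11.9, 11.7, 11.6, 12.0, 11.8])

mu_hat = x1.mean()                    # estimate of mu: the sample mean
var_hat = x1.var(ddof=1)              # estimate of sigma^2: sample variance s^2 (n-1 divisor)
p_hat = np.sum(x1 > 12.0) / len(x1)   # sample proportion of items in the class "x > 12.0"
diff_means = x1.mean() - x2.mean()    # estimate of mu_1 - mu_2
```

Note the `ddof=1` argument: numpy's default variance divides by $n$, while the sample variance $s^2$ divides by $n-1$.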
Properties of Estimators

Unbiased Estimators: The point estimator $\hat{\Theta}$ is an unbiased estimator for the parameter $\theta$ if $E(\hat{\Theta}) = \theta$. If the estimator is not unbiased, then the difference $E(\hat{\Theta}) - \theta$ is called the bias of the estimator $\hat{\Theta}$.

Definition: The probability distribution of a statistic is called a sampling distribution.
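Unbiasedness can be checked by simulation: averaging an estimator over many repeated samples should recover the parameter. A sketch, under assumed parameter values chosen for illustration; it also shows that the $n$-divisor variance is biased low by the factor $(n-1)/n$:

```python
import numpy as np

# Simulation: draw many samples of size n from a Normal(mu, sigma) population
# and average each estimator across the replications.
rng = np.random.default_rng(0)
mu, sigma, n, reps = 5.0, 2.0, 10, 200_000

samples = rng.normal(mu, sigma, size=(reps, n))
mean_of_xbar = samples.mean(axis=1).mean()           # E(Xbar) = mu, so near 5.0
mean_of_s2 = samples.var(axis=1, ddof=1).mean()      # E(S^2) = sigma^2, so near 4.0
mean_of_biased = samples.var(axis=1, ddof=0).mean()  # n-divisor variance: near (n-1)/n * 4 = 3.6
```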
Theorem: If $X_1, X_2, \ldots, X_n$ is a random sample of size $n$ taken from a population (either finite or infinite) with mean $\mu$ and finite variance $\sigma^2$, and if $\bar{X}$ is the sample mean, then the limiting form of the distribution of

$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$

as $n \to \infty$ is the standard normal distribution.

If $\bar{x}$ is the sample mean of a random sample of size $n$ from a population with known variance $\sigma^2$, a $100(1-\alpha)$ percent confidence interval on $\mu$ is given by

$$\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \;\le\; \mu \;\le\; \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

where $z_{\alpha/2}$ is the upper $100\,\alpha/2$ percentage point of the standard normal distribution.
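The theorem can be illustrated by simulation: even for a skewed population, the standardized sample mean behaves like a standard normal for moderate $n$. A sketch with an exponential population (its mean and standard deviation are both 1); the constants are illustrative:

```python
import numpy as np

# Standardize the means of many samples of size n drawn from Exp(1),
# a skewed population with mu = 1 and sigma = 1.
rng = np.random.default_rng(1)
mu, sigma, n, reps = 1.0, 1.0, 50, 100_000

xbars = rng.exponential(mu, size=(reps, n)).mean(axis=1)
z = (xbars - mu) / (sigma / np.sqrt(n))

# For a standard normal, about 95% of values fall within +/-1.96
coverage = np.mean(np.abs(z) < 1.96)
```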
Definition: If $\bar{x}$ is used as an estimate of $\mu$, we can be $100(1-\alpha)$ percent confident that the error $|\bar{x} - \mu|$ will not exceed a specified amount $E$ when the sample size is

$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^{2}$$

The $100(1-\alpha)$ percent upper-confidence interval for $\mu$ is

$$\mu \;\le\; \bar{x} + z_{\alpha}\,\frac{\sigma}{\sqrt{n}}$$

and the $100(1-\alpha)$ percent lower-confidence interval for $\mu$ is

$$\bar{x} - z_{\alpha}\,\frac{\sigma}{\sqrt{n}} \;\le\; \mu$$

Inference on the Mean of a Population, Variance Unknown
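The two-sided $z$-interval and the sample-size formula can be sketched as follows, assuming $\sigma$ is known; the numerical values ($\bar{x}$, $\sigma$, $E$) are hypothetical:

```python
import math

# Illustrative inputs: sample mean, known sigma, sample size, z_{0.025} ~ 1.96
xbar, sigma, n, z_alpha2 = 64.46, 1.0, 10, 1.96

# Two-sided 95% confidence interval on mu
half_width = z_alpha2 * sigma / math.sqrt(n)
ci = (xbar - half_width, xbar + half_width)

# Sample size so that the error |xbar - mu| <= E = 0.5 with 95% confidence;
# round up, since n must be an integer at least as large as the formula's value
E = 0.5
n_required = math.ceil((z_alpha2 * sigma / E) ** 2)  # ceil(3.92^2) = 16
```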
Definition (Cont.): Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal distribution with unknown mean $\mu$ and unknown variance $\sigma^2$. The quantity

$$T = \frac{\bar{X} - \mu}{S / \sqrt{n}}$$

has a $t$ distribution with $n-1$ degrees of freedom.

If $s^2$ is the sample variance from a random sample of $n$ observations from a normal distribution with unknown variance $\sigma^2$, then a $100(1-\alpha)$ percent confidence interval on $\sigma^2$ is

$$\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}} \;\le\; \sigma^2 \;\le\; \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}$$

where $\chi^2_{\alpha/2,\,n-1}$ and $\chi^2_{1-\alpha/2,\,n-1}$ are the upper and lower $100\,\alpha/2$ percentage points of the chi-square distribution with $n-1$ degrees of freedom, respectively. The $100(1-\alpha)$ percent lower-confidence interval on $\sigma^2$ is

$$\frac{(n-1)s^2}{\chi^2_{\alpha,\,n-1}} \;\le\; \sigma^2$$
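Both intervals can be computed with scipy's percentage-point functions. A minimal sketch on a hypothetical sample:

```python
import numpy as np
from scipy import stats

# Hypothetical sample of n = 10 observations
x = np.array([19.8, 18.5, 17.6, 16.7, 15.8, 15.4, 14.1, 13.6, 11.9, 11.4])
n, alpha = len(x), 0.05
xbar, s2 = x.mean(), x.var(ddof=1)

# t-interval for mu with n-1 degrees of freedom (sigma unknown)
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
ci_mu = (xbar - t_crit * np.sqrt(s2 / n), xbar + t_crit * np.sqrt(s2 / n))

# Chi-square interval for sigma^2: the upper percentage point gives the
# lower limit, and vice versa
chi2_hi = stats.chi2.ppf(1 - alpha / 2, df=n - 1)
chi2_lo = stats.chi2.ppf(alpha / 2, df=n - 1)
ci_var = ((n - 1) * s2 / chi2_hi, (n - 1) * s2 / chi2_lo)
```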
Simple Linear Regression

Definition: The least squares estimates of the intercept and slope in the simple linear regression model are

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \qquad \text{and} \qquad \hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}$$

where

$$S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \qquad \text{and} \qquad S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$$
Definition: In simple linear regression the estimated standard error of the slope is

$$se(\hat{\beta}_1) = \sqrt{\frac{\hat{\sigma}^2}{S_{xx}}}$$

and the estimated standard error of the intercept is

$$se(\hat{\beta}_0) = \sqrt{\hat{\sigma}^2 \left[\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right]}$$

where $\hat{\sigma}^2 = SS_E / (n-2)$ is the residual mean square.
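The estimates and their standard errors can be sketched directly from these formulas; the $(x, y)$ data below are hypothetical, chosen so the true slope is near 2:

```python
import numpy as np

# Hypothetical data roughly following y = 2x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])
n = len(x)

Sxx = np.sum((x - x.mean()) ** 2)
Sxy = np.sum((x - x.mean()) * (y - y.mean()))
b1 = Sxy / Sxx                  # slope estimate
b0 = y.mean() - b1 * x.mean()   # intercept estimate

# Residual mean square: sigma^2-hat = SSE / (n - 2)
resid = y - (b0 + b1 * x)
sigma2_hat = np.sum(resid ** 2) / (n - 2)

se_b1 = np.sqrt(sigma2_hat / Sxx)
se_b0 = np.sqrt(sigma2_hat * (1.0 / n + x.mean() ** 2 / Sxx))
```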
Parameter Estimation

The regression line is fitted to the data points by finding the line that is "closest" to the data points in some sense. Consider the vertical deviations between the line and the data points,

$$y_i - (\beta_0 + \beta_1 x_i), \qquad 1 \le i \le n$$

The least squares fit minimizes the sum of the squares of these vertical deviations,

$$Q = \sum_{i=1}^{n} \bigl( y_i - (\beta_0 + \beta_1 x_i) \bigr)^2$$
Parameter Estimation (Cont.)

The parameter estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ are easily found by taking partial derivatives of $Q$ with respect to $\beta_0$ and $\beta_1$ and setting the resulting expressions equal to zero. Since

$$\frac{\partial Q}{\partial \beta_0} = -2 \sum_{i=1}^{n} \bigl( y_i - \beta_0 - \beta_1 x_i \bigr)$$

and

$$\frac{\partial Q}{\partial \beta_1} = -2 \sum_{i=1}^{n} x_i \bigl( y_i - \beta_0 - \beta_1 x_i \bigr)$$
Parameter Estimation (Cont.)

The parameter estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ are thus the solutions to the normal equations

$$n\hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$$

and

$$\hat{\beta}_0 \sum_{i=1}^{n} x_i + \hat{\beta}_1 \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$$

The normal equations can be solved to give

$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} \qquad \text{and} \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$
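The normal equations above form a $2 \times 2$ linear system in $\hat{\beta}_0$ and $\hat{\beta}_1$, so one way to verify the closed-form solution is to solve that system directly and compare. A sketch with hypothetical data:

```python
import numpy as np

# Hypothetical data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.1, 4.2, 4.8])
n = len(x)

# Normal equations as a 2x2 system:
#   n*b0      + b1*sum(x)   = sum(y)
#   b0*sum(x) + b1*sum(x^2) = sum(x*y)
A = np.array([[n, x.sum()], [x.sum(), np.sum(x ** 2)]])
rhs = np.array([y.sum(), np.sum(x * y)])
b0, b1 = np.linalg.solve(A, rhs)

# Closed-form solution: b1 = Sxy / Sxx, b0 = ybar - b1 * xbar
b1_closed = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0_closed = y.mean() - b1_closed * x.mean()
```

Both routes give the same estimates, which is a useful sanity check when implementing the formulas by hand.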
Inference Procedures

For a particular value $x^*$ of the explanatory variable, the true regression line specifies the expected value of the dependent variable or, in other words, the expected response. Thus, if the random variable $Y \mid x^*$ measures the value of the dependent variable when the explanatory variable is equal to $x^*$, then

$$E(Y \mid x^*) = \beta_0 + \beta_1 x^*$$

It is useful to be able to construct confidence intervals for this expected value.
Inference Procedures (Cont.)

The point estimate of the average response at $x^*$ is

$$\hat{\beta}_0 + \hat{\beta}_1 x^*$$

This is an observation from the random variable $\hat{\beta}_0 + \hat{\beta}_1 x^*$, where $\hat{\beta}_0$ and $\hat{\beta}_1$ are the least squares estimators.
Inference Procedures (Cont.)

Since it is a linear combination of normal random variables, the random variable $\hat{\beta}_0 + \hat{\beta}_1 x^*$ is also normally distributed, and it can be shown that it has an expectation of

$$E(\hat{\beta}_0 + \hat{\beta}_1 x^*) = \beta_0 + \beta_1 x^*$$

and a variance of

$$\operatorname{Var}(\hat{\beta}_0 + \hat{\beta}_1 x^*) = \sigma^2 \left( \frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}} \right)$$
Inference on the Expected Value of the Dependent Variable

A confidence level $1-\alpha$ two-sided confidence interval for $\beta_0 + \beta_1 x^*$, the expected value of the dependent variable for a particular value $x^*$ of the explanatory variable, is

$$\hat{\beta}_0 + \hat{\beta}_1 x^* \;\pm\; t_{\alpha/2,\,n-2}\; \hat{\sigma} \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}}$$

where $\hat{\sigma}^2 = SS_E / (n-2)$.
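This interval can be sketched end to end: fit the line, estimate $\hat{\sigma}^2$, and apply the $t_{\alpha/2,\,n-2}$ point. The data and the value $x^* = 4.5$ are hypothetical:

```python
import numpy as np
from scipy import stats

# Hypothetical data, roughly linear
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([1.1, 2.3, 2.8, 4.2, 4.9, 6.1, 7.2, 7.8])
n, alpha, x_star = len(x), 0.05, 4.5

# Least squares fit and residual mean square
Sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
b0 = y.mean() - b1 * x.mean()
sigma2_hat = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)

# CI for the expected response at x_star, widening as x_star moves from xbar
fit = b0 + b1 * x_star
half = stats.t.ppf(1 - alpha / 2, df=n - 2) * np.sqrt(
    sigma2_hat * (1.0 / n + (x_star - x.mean()) ** 2 / Sxx))
ci = (fit - half, fit + half)
```

Note that the variance term $(x^* - \bar{x})^2 / S_{xx}$ makes the interval narrowest at $x^* = \bar{x}$ and wider toward the edges of the data.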