
1 Resampling techniques. Outline: Why resampling? Jackknife. Cross-validation. Bootstrap. Examples of applications of the bootstrap.

2 Why resampling? The purpose of statistics is to estimate some parameter(s) and to assess the reliability of those estimates. Since estimators are functions of the sample points, they are random variables. If we could find the distribution of this random variable (the sampling distribution of the statistic), we could estimate the reliability of the estimator. Unfortunately, apart from the simplest cases, the sampling distribution is not easy to derive. There are several techniques to approximate these distributions, including Edgeworth series, Laplace approximations, saddle-point approximations and others. These approximations give an analytical form for the approximate distribution. With the advent of computers, more computationally intensive methods are emerging, and in many cases they work satisfactorily. If we have the sampling distribution of a statistic, we can estimate the variance of the estimator, construct interval estimates, and even test hypotheses. Examples of the simplest cases where the sampling distribution is known include:
1) The sample mean, when the sample is from a normal distribution: it is normally distributed with mean equal to the population mean and variance equal to the population variance divided by the sample size, if the population variance is known. If the population variance is not known, the variance of the sample mean is estimated by the sample variance divided by n.
2) The sample variance: a multiple of it has a χ² distribution. Again, this is valid if the population distribution is normal.
3) The sample mean divided by the square root of the sample variance: a multiple of it has a t distribution, again in the normal case.
4) For independent samples: the ratio of two sample variances is a multiple of an F-distributed random variable.
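A minimal simulation sketch of case 1 above (not from the slides; the parameter values are illustrative): for a normal population, the empirical variance of many sample means matches the theoretical value σ²/n.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, n_rep = 5.0, 2.0, 25, 10_000

# Draw many samples of size n and record each sample mean.
means = rng.normal(mu, sigma, size=(n_rep, n)).mean(axis=1)

print("empirical mean of sample means:", means.mean())      # close to mu
print("empirical variance of sample means:", means.var())   # close to sigma**2 / n
print("theoretical variance:", sigma**2 / n)
```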

3 Jackknife. The jackknife is used for bias removal. Recall that the mean-square error of an estimator equals the square of its bias plus its variance. If the bias is much larger than the variance, then under some circumstances the jackknife can be used. Description of the jackknife: assume we have a sample of size n. We estimate some sample statistic using all the data, $t_n$. Then, removing one point at a time, we estimate $t_{n-1,i}$, where the subscripts indicate the size of the sample and the index of the removed sample point. The new estimator is derived as

$$t_{jack} = n\,t_n - (n-1)\,\bar{t}_{n-1}, \qquad \bar{t}_{n-1} = \frac{1}{n}\sum_{i=1}^{n} t_{n-1,i}.$$

If the order of the bias of the statistic $t_n$ is $O(n^{-1})$, then after the jackknife the order of the bias becomes $O(n^{-2})$. The variance is estimated using

$$\widehat{\operatorname{var}}(t_{jack}) = \frac{n-1}{n}\sum_{i=1}^{n}\bigl(t_{n-1,i} - \bar{t}_{n-1}\bigr)^2.$$

This procedure can be applied iteratively, i.e. the jackknife can be applied again to the new estimator. The first application of the jackknife can reduce the bias without changing the variance of the estimator, but its second and higher-order applications can in general increase the variance of the estimator.
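A minimal sketch of the jackknife described above (assumptions: NumPy only; the biased variance estimator, which divides by n and has bias of order 1/n, is used as the statistic to correct).

```python
import numpy as np

def jackknife(x, stat):
    """Return the jackknife bias-corrected estimate and variance of stat(x)."""
    n = len(x)
    t_n = stat(x)
    # Leave-one-out estimates t_{n-1,i}
    t_loo = np.array([stat(np.delete(x, i)) for i in range(n)])
    t_bar = t_loo.mean()
    t_jack = n * t_n - (n - 1) * t_bar                      # bias-corrected estimate
    var_jack = (n - 1) / n * np.sum((t_loo - t_bar) ** 2)   # jackknife variance
    return t_jack, var_jack

rng = np.random.default_rng(1)
x = rng.normal(size=30)
biased_var = lambda v: np.mean((v - v.mean()) ** 2)   # divides by n, biased
print(jackknife(x, biased_var))
```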

4 Cross-validation. Cross-validation is a resampling technique for dealing with overfitting. Consider the least-squares technique. Assume we have a sample of size n, $y=(y_1,y_2,\ldots,y_n)$, and we want to estimate parameters $\theta=(\theta_1,\theta_2,\ldots,\theta_m)$. Assume further that the mean value of the observations is a function of these parameters (we may not know the form of this function). We postulate that the function has a form g, and we find the values of the parameters using the least-squares technique by minimising

$$h(\theta)=\sum_{i=1}^{n}\bigl(y_i-g(X_i,\theta)\bigr)^2,$$

where X is a fixed matrix or a matrix of random variables. After this we have values of the parameters and therefore the form of the function. The form of the function g defines the model we want to use, and we may have several candidate forms. Obviously, with more parameters the fit will be "better". The question is what would happen if we observed new values. Using the estimated parameter values we can compute squared differences for new observations. Say we have new observations $(y_{n+1},\ldots,y_{n+l})$. Can our function predict them? Which function predicts better? To answer these questions we calculate

$$PE=\sum_{j=1}^{l}\bigl(y_{n+j}-g(X_{n+j},\hat{\theta})\bigr)^2,$$

where PE is the prediction error. The function g that gives the smallest PE has the higher predictive power. A function that gives a smaller h but a larger PE is called an overfitted function.
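A minimal sketch of the h versus PE comparison (an assumed example, not from the slides): polynomial models of increasing degree are fitted by least squares; the in-sample criterion h always improves with more parameters, while the prediction error PE on fresh observations eventually worsens.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 30)
y = 1 + 2 * x + rng.normal(scale=0.3, size=x.size)            # training sample
x_new = rng.uniform(0, 1, 20)
y_new = 1 + 2 * x_new + rng.normal(scale=0.3, size=20)        # new observations

for degree in (1, 5, 12):
    theta = np.polyfit(x, y, degree)                          # least-squares fit
    h = np.sum((y - np.polyval(theta, x)) ** 2)               # in-sample criterion h
    pe = np.sum((y_new - np.polyval(theta, x_new)) ** 2)      # prediction error PE
    print(f"degree {degree:2d}: h = {h:6.3f}, PE = {pe:6.3f}")
```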

5 Cross-validation: Cont. When we choose the function using the current sample, how can we avoid overfitting? Cross-validation is an approach to this problem. Description of cross-validation for a sample of size n (a code sketch follows this slide):
1) Divide the sample into K roughly equal-size parts.
2) For the kth part, estimate the parameters using the other K-1 parts, excluding the kth part. Calculate the prediction error for the kth part.
3) Repeat this for all k=1,2,...,K and combine all the prediction errors to get the cross-validation prediction error.
If K=n we have the leave-one-out cross-validation technique. Denote the estimate at the kth step by $\hat{\theta}_k$ (in vector form), let the kth subset of the sample be $A_k$, and let the number of points in this subset be $N_k$. Then the prediction error per observation is

$$CV=\frac{1}{n}\sum_{k=1}^{K}\sum_{i\in A_k}\bigl(y_i-g(X_i,\hat{\theta}_k)\bigr)^2.$$

We then choose the function that gives the smallest prediction error, expecting that on future observations this function will again predict best. This technique is widely used in modern statistical analysis. It is not restricted to the least-squares technique: instead of least squares we could use any other criterion appropriate to the distribution of the observations, and it can in principle be applied to various maximum-likelihood and other estimators. Cross-validation is useful for model selection, i.e. if we have several models we use cross-validation to select one of them.
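A minimal sketch of K-fold cross-validation for choosing among polynomial models (an assumed example; the model family and degrees are illustrative, not from the slides).

```python
import numpy as np

def cv_prediction_error(x, y, degree, K=5, seed=0):
    """Per-observation K-fold cross-validation prediction error."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, K)                 # K roughly equal parts A_k
    total = 0.0
    for k in range(K):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        theta_k = np.polyfit(x[train], y[train], degree)   # fit on the other K-1 parts
        resid = y[test] - np.polyval(theta_k, x[test])      # predict the kth part
        total += np.sum(resid ** 2)
    return total / len(y)

rng = np.random.default_rng(3)
x = np.linspace(0, 1, 60)
y = 1 + 2 * x + rng.normal(scale=0.3, size=x.size)
for degree in (1, 3, 8):
    print(degree, cv_prediction_error(x, y, degree))
```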

6 Bootstrap. The bootstrap is a computationally very expensive technique. In its simplest form it works as follows. We have a sample of size n and want to estimate some parameter $\theta$; the estimator for this parameter gives t. To each sample point we assign a probability (usually 1/n, i.e. all sample points have equal probability). Then from this sample we draw, with replacement, another random sample of size n and estimate $\theta$; denote the estimate at the jth resampling stage by $t_j^{*}$. The bootstrap estimator for $\theta$ and its variance are calculated as

$$t^{*}=\frac{1}{B}\sum_{j=1}^{B}t_j^{*}, \qquad \widehat{\operatorname{var}}(t^{*})=\frac{1}{B-1}\sum_{j=1}^{B}\bigl(t_j^{*}-t^{*}\bigr)^2.$$

This is a very simple application of bootstrap resampling. For parameter estimation the number of bootstrap resamples B is usually chosen to be around 200. Let us analyse how the bootstrap works in one simple case. Consider a random variable X with sample space $x=(x_1,\ldots,x_M)$, where each point has probability $f_j$, i.e. $f=(f_1,\ldots,f_M)$ represents the distribution of the population. A sample of size n will have relative frequencies $\hat{f}=(\hat{f}_1,\ldots,\hat{f}_M)$ for the sample points.
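A minimal sketch of the non-parametric bootstrap described above (assumptions: B = 200 resamples, the sample median as the statistic).

```python
import numpy as np

def bootstrap(x, stat, B=200, seed=0):
    """Return the bootstrap replicates t_j*, their mean and variance."""
    rng = np.random.default_rng(seed)
    n = len(x)
    # Each resample draws n points with replacement, each point with probability 1/n.
    t_star = np.array([stat(rng.choice(x, size=n, replace=True)) for _ in range(B)])
    return t_star, t_star.mean(), t_star.var(ddof=1)

rng = np.random.default_rng(4)
x = rng.exponential(size=50)
t_star, t_boot, var_boot = bootstrap(x, np.median)
print("bootstrap estimate:", t_boot, "bootstrap variance:", var_boot)
```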

7 Bootstrap: Cont. The distribution of the sample frequencies $\hat{f}$ conditional on f is multinomial:

$$n\hat{f}\mid f \sim \operatorname{Mult}(n, f).$$

The multinomial distribution is the extension of the binomial distribution and is expressed as

$$P(n_1,\ldots,n_M)=\frac{n!}{n_1!\cdots n_M!}\,f_1^{n_1}\cdots f_M^{n_M},\qquad \sum_{j=1}^{M} n_j=n.$$

The limiting distribution of $\sqrt{n}(\hat{f}-f)$ is multinormal. If we resample from the given sample, we should consider the conditional distribution of the resample frequencies $\hat{f}^{*}$, which is also multinomial:

$$n\hat{f}^{*}\mid \hat{f} \sim \operatorname{Mult}(n, \hat{f}).$$

The limiting distribution of $\sqrt{n}(\hat{f}^{*}-\hat{f})$ is the same as that of $\sqrt{n}(\hat{f}-f)$ for the original sample. Since these two distributions converge to the same limit, well-behaved functions of them will also have the same limiting distributions. Thus, if we use the bootstrap to derive the distribution of a sample statistic, we can expect it to converge in the limit to the distribution of that statistic, i.e. the following two quantities have the same limiting distribution:

$$\sqrt{n}\bigl(t(\hat{f})-t(f)\bigr) \quad\text{and}\quad \sqrt{n}\bigl(t(\hat{f}^{*})-t(\hat{f})\bigr).$$
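A minimal illustration (an assumed example, not from the slides) of the key fact used above: the counts of each distinct value in a with-replacement resample are exactly a multinomial draw with probabilities given by the sample relative frequencies.

```python
import numpy as np

rng = np.random.default_rng(5)
x = np.array([1, 1, 2, 3, 3, 3, 4, 5, 5, 5])          # original sample, n = 10
values, counts = np.unique(x, return_counts=True)
f_hat = counts / counts.sum()                          # sample relative frequencies

# Route 1: resample the data and count occurrences of each distinct value.
resample = rng.choice(x, size=len(x), replace=True)
counts_resample = np.array([(resample == v).sum() for v in values])

# Route 2: draw the counts directly from Mult(n, f_hat).
counts_multinomial = rng.multinomial(len(x), f_hat)

print(counts_resample, counts_multinomial)             # two draws from the same distribution
```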

8 Bootstrap: Cont. If we could enumerate all possible resamples from our sample, we could build the "ideal" bootstrap distribution. In practice, even with modern computers, this is impossible to achieve, so Monte Carlo simulation is used instead. It usually works as follows:
1) Draw a random sample of size n, with replacement, from the given sample.
2) Estimate the parameter to get the estimate $t_j^{*}$.
3) Repeat B times and build the frequency and cumulative distributions of $t^{*}$.

9 Bootstrap: Cont. How do we build the cumulative distribution (which approximates the distribution function)? Consider a sample of size n, $x=(x_1,x_2,\ldots,x_n)$. The empirical cumulative distribution is

$$\hat{F}(t)=\frac{1}{n}\sum_{i=1}^{n}I(x_i\le t),$$

where I denotes the indicator function: I(A)=1 if the event A holds and 0 otherwise. Another way of building the cumulative distribution is to sort the data first, so that $x_{(1)}\le x_{(2)}\le\cdots\le x_{(n)}$, and then set $\hat{F}(x_{(i)})=i/n$. We can also build a histogram that approximates the density of the distribution. First we divide an interval containing our data into equal subintervals of length $\Delta t$. If the centre of the ith subinterval is $t_i$, the histogram can be calculated using

$$\hat{p}(t_i)=\frac{\#\{x_j:\ |x_j-t_i|\le \Delta t/2\}}{n\,\Delta t}.$$

Once we have the distribution of the statistic we can use it for various purposes. Bootstrap estimation of the parameter and its variance is one possible application; we can also use this distribution for hypothesis testing, interval estimation, etc. For pure parameter estimation we need around 200 resamples, whereas for interval estimation we might need around 2000, because interval estimation and hypothesis testing require a more accurate distribution.
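A minimal sketch (helper names are assumed, not from the slides) of building the empirical cumulative distribution and histogram density from bootstrap replicates, then using them for a rough percentile interval.

```python
import numpy as np

def ecdf(t_star, t):
    """F_hat(t) = fraction of replicates less than or equal to t."""
    return np.mean(t_star <= t)

def histogram_density(t_star, n_bins=20):
    """Bin centres t_i and density estimates p_hat(t_i)."""
    counts, edges = np.histogram(t_star, bins=n_bins)
    dt = edges[1] - edges[0]
    centres = edges[:-1] + dt / 2
    return centres, counts / (len(t_star) * dt)

rng = np.random.default_rng(6)
x = rng.exponential(size=50)
B = 2000                                    # more resamples for distribution work
t_star = np.array([np.median(rng.choice(x, size=len(x))) for _ in range(B)])
print("P(t* <= 0.7) ~", ecdf(t_star, 0.7))
print("approx. 95% percentile interval:", np.percentile(t_star, [2.5, 97.5]))
```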

10 Bootstrap: Cont. Since the resampling above made no assumption about the population distribution, this bootstrap is called the non-parametric bootstrap. If we have some idea about the population distribution, we can use it in resampling: instead of drawing randomly from our sample, we draw from the assumed population distribution. For example, if we know that the population distribution is normal, we can estimate its parameters from our sample (the sample mean and variance), approximate the population distribution with this fitted distribution, and use it to draw new samples. As can be expected, if the assumption about the population distribution is correct, the parametric bootstrap will perform better; if it is not correct, the non-parametric bootstrap will outperform its parametric counterpart.
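A minimal sketch of the parametric bootstrap under a normality assumption (assumed example: the statistic is the median, B = 200 resamples).

```python
import numpy as np

def parametric_bootstrap_normal(x, stat, B=200, seed=0):
    rng = np.random.default_rng(seed)
    mu_hat, sigma_hat = x.mean(), x.std(ddof=1)      # fit the assumed normal model
    # Draw each resample from the fitted N(mu_hat, sigma_hat) rather than from x.
    t_star = np.array([stat(rng.normal(mu_hat, sigma_hat, size=len(x)))
                       for _ in range(B)])
    return t_star.mean(), t_star.var(ddof=1)

rng = np.random.default_rng(7)
x = rng.normal(loc=10, scale=2, size=40)
print(parametric_bootstrap_normal(x, np.median))
```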

11 Bootstrap: Some simple applications
Linear model $y=X\theta+\varepsilon$ (a code sketch follows this slide):
1) Estimate the parameters $\hat{\theta}$.
2) Calculate the fitted values $\hat{y}=X\hat{\theta}$.
3) Calculate the residuals $r=y-\hat{y}$.
4) Draw n random representatives from r (call them $r_{random}$), add them to the fitted values, and calculate new "observations".
5) Estimate new parameters and save them.
6) Go to step 4.
Generalised linear models: the procedure is as in the linear-model case, with small modifications.
1) Residuals are calculated using a form appropriate to the model.
2) When calculating the new "observations", make sure they are similar to the original observations; e.g. in the binomial case make sure the values are 0 or 1.
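A minimal sketch of the residual bootstrap for a linear model as outlined above (assumptions: a simple straight-line model, B = 200 resamples, NumPy least squares).

```python
import numpy as np

rng = np.random.default_rng(8)
n, B = 50, 200
x = np.linspace(0, 1, n)
X = np.column_stack([np.ones(n), x])                    # design matrix
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.3, size=n)

theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)       # 1) estimate parameters
fitted = X @ theta_hat                                  # 2) fitted values
resid = y - fitted                                      # 3) residuals

boot_thetas = []
for _ in range(B):
    r_random = rng.choice(resid, size=n, replace=True)  # 4) resample residuals
    y_new = fitted + r_random                           #    new "observations"
    theta_b, *_ = np.linalg.lstsq(X, y_new, rcond=None) # 5) re-estimate and save
    boot_thetas.append(theta_b)

boot_thetas = np.array(boot_thetas)
print("bootstrap std. errors of (intercept, slope):",
      boot_thetas.std(axis=0, ddof=1))
```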

