Resampling techniques: Why resampling? Jackknife. Cross-validation. Bootstrap. Examples of application of the bootstrap.


Why resampling?

The purpose of statistics is to estimate some parameter(s) together with the reliability of those estimates. Since estimators are functions of the sample points, they are themselves random variables. If we could find the distribution of this random variable (the sampling distribution of the statistic), we could assess the reliability of the estimator. Unfortunately, apart from the simplest cases, the sampling distribution is not easy to derive. There are several techniques for approximating these distributions, including Edgeworth series, Laplace approximations, saddle-point approximations and others. These approximations give an analytical form for the approximate distribution. With the advent of computers, more computationally intensive methods have emerged, and in many cases they work satisfactorily. If we have the sampling distribution of the statistic, we can estimate the variance of the estimator, construct interval estimates, and even test hypotheses.

Examples of the simplest cases where the sampling distribution is known:
1) The sample mean, when the sample comes from a normal distribution with known variance, is normally distributed with mean equal to the population mean and variance equal to the population variance divided by the sample size. If the population variance is not known, the variance of the sample mean is estimated by the sample variance divided by n.
2) The sample variance is distributed as a multiple of the χ² distribution; again this is valid when the population distribution is normal.
3) The sample mean, centred at the population mean and divided by the square root of the sample variance, is distributed as a multiple of the t distribution, again in the normal case.
4) For two independent samples, the ratio of the sample variances is distributed as a multiple of the F distribution.
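As a quick illustration of case 1, here is a minimal simulation sketch (assuming NumPy is available) comparing the empirical distribution of the sample mean with the theoretical N(μ, σ²/n) result:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, n_rep = 5.0, 2.0, 25, 10_000

# Draw many samples of size n and record the sample mean of each
means = rng.normal(mu, sigma, size=(n_rep, n)).mean(axis=1)

# Theory: the sample mean is distributed as N(mu, sigma^2 / n)
print("empirical  mean, variance:", means.mean(), means.var(ddof=1))
print("theoretical mean, variance:", mu, sigma ** 2 / n)
```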

Jackknife

The jackknife is used for bias removal. The mean-square error of an estimator equals the square of its bias plus its variance, so if the bias is much larger than the variance, then under some circumstances the jackknife can help.

Description of the jackknife: assume we have a sample of size n. We estimate the statistic of interest using all the data, giving t_n. Then, removing one point at a time, we estimate t_{n-1,i}, where the subscripts indicate the size of the reduced sample and the index of the removed point. The new (bias-corrected) estimator is

  t_jack = n t_n - ((n-1)/n) Σ_{i=1..n} t_{n-1,i}.

If the bias of the statistic t_n is of order O(n^-1), then after the jackknife the bias becomes of order O(n^-2). The variance is estimated using

  var_jack = ((n-1)/n) Σ_{i=1..n} (t_{n-1,i} - t_bar)^2,  where t_bar = (1/n) Σ_{i=1..n} t_{n-1,i}.

This procedure can be applied iteratively, i.e. the jackknife can be applied again to the new estimator. The first application of the jackknife reduces the bias without changing the variance of the estimator, but second and higher-order applications can in general increase the variance of the estimator.
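A minimal sketch of the jackknife in Python (NumPy assumed; the function name jackknife is illustrative), applied here to the plug-in variance estimator, whose O(n^-1) bias the jackknife reduces:

```python
import numpy as np

def jackknife(x, stat):
    """Jackknife bias correction and variance estimate for the statistic `stat`."""
    n = len(x)
    t_n = stat(x)                                                  # estimate from the full sample
    t_loo = np.array([stat(np.delete(x, i)) for i in range(n)])    # leave-one-out estimates t_{n-1,i}
    t_bar = t_loo.mean()
    t_jack = n * t_n - (n - 1) * t_bar                             # bias-corrected estimator
    var_jack = (n - 1) / n * np.sum((t_loo - t_bar) ** 2)          # jackknife variance estimate
    return t_jack, var_jack

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=30)
# np.var with default ddof=0 is the biased plug-in estimator of the variance
print(jackknife(x, lambda s: np.var(s)))
```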

Cross-validation

Cross-validation is a resampling technique for avoiding overfitting. Consider the least-squares technique. Assume we have a sample of size n, y = (y_1, y_2, ..., y_n), and we want to estimate parameters θ = (θ_1, θ_2, ..., θ_m). Assume further that the mean value of the observations is a function of these parameters (we may not know the form of this function). We postulate that the function has a form g and find the parameter values by least squares, minimising

  h(θ) = Σ_{i=1..n} (y_i - g(x_i, θ))^2,

where X (with rows x_i) is a matrix of fixed or random explanatory variables. After this we have values of the parameters and therefore the form of the function. The form of g defines the model we want to use, and we may consider several candidate forms. Obviously, with more parameters the fit will be "better". The question is what would happen if we observed new values. Say we have new observations (y_{n+1}, ..., y_{n+l}). Can our function predict them? Which candidate function predicts better? To answer these questions we calculate the squared differences for the new observations,

  PE = (1/l) Σ_{j=1..l} (y_{n+j} - g(x_{n+j}, θ_hat))^2,

where PE is the prediction error. The function g that gives the smallest PE has the higher predictive power. A function that gives a smaller h but a larger PE is called an overfitted function.
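A small illustration of the difference between h and PE, a sketch assuming NumPy, with g taken to be a polynomial of varying degree fitted by least squares:

```python
import numpy as np

rng = np.random.default_rng(2)
n, l = 30, 1000
x_train, x_new = rng.uniform(-1, 1, n), rng.uniform(-1, 1, l)
true_mean = lambda x: 1.0 + 2.0 * x
y_train = true_mean(x_train) + rng.normal(0, 0.3, n)
y_new = true_mean(x_new) + rng.normal(0, 0.3, l)          # "future" observations

for degree in (1, 3, 10):
    coef = np.polyfit(x_train, y_train, degree)            # least-squares fit of g
    h = np.mean((y_train - np.polyval(coef, x_train)) ** 2)   # fit criterion h
    pe = np.mean((y_new - np.polyval(coef, x_new)) ** 2)      # prediction error PE
    print(f"degree {degree:2d}:  h = {h:.4f}   PE = {pe:.4f}")
```

A larger degree always decreases h on the training sample, but beyond some point PE starts to grow: that is the overfitted regime.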

Cross-validation: Cont.

When we choose the function using the current sample, how can we avoid overfitting? Cross-validation is an approach to this problem.

Description of cross-validation. We have a sample of size n.
1) Divide the sample into K roughly equal-sized parts.
2) For the kth part, estimate the parameters using the other K-1 parts (excluding the kth part) and calculate the prediction error on the kth part.
3) Repeat for all k = 1, 2, ..., K and combine the prediction errors to get the cross-validation prediction error.

If K = n we have the leave-one-out cross-validation technique. Denote the estimate at the kth step by θ_hat^(k) (in vector form), let the kth subset of the sample be A_k and let the number of points in this subset be N_k. Then the prediction error per observation is

  CV = (1/n) Σ_{k=1..K} Σ_{i in A_k} (y_i - g(x_i, θ_hat^(k)))^2.

We then choose the function that gives the smallest prediction error, and we can expect that this function will also give the smallest prediction error on future observations. This technique is widely used in modern statistical analysis, and it is not restricted to least squares: instead of least squares we could use any other criterion appropriate to the distribution of the observations, and in principle it can be applied to various maximum-likelihood and other estimators. Cross-validation is useful for model selection, i.e. if we have several models, cross-validation lets us select one of them.
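A minimal sketch of K-fold cross-validation for polynomial least squares (NumPy assumed; the helper cv_error is illustrative):

```python
import numpy as np

def cv_error(x, y, degree, K=5, seed=0):
    """K-fold cross-validation prediction error per observation."""
    n = len(x)
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, K)                       # K roughly equal-sized parts A_k
    sse = 0.0
    for k in range(K):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        coef = np.polyfit(x[train], y[train], degree)    # fit on the other K-1 parts
        sse += np.sum((y[test] - np.polyval(coef, x[test])) ** 2)
    return sse / n

rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, 50)
y = 1.0 + 2.0 * x + rng.normal(0, 0.3, 50)
for d in (1, 3, 10):
    print(f"degree {d:2d}: CV prediction error = {cv_error(x, y, d):.4f}")
```

Setting K = n gives the leave-one-out variant.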

Bootstrap

The bootstrap is one of the computationally very expensive techniques. In a very simple form it works as follows. We have a sample of size n and want to estimate some parameter θ; the estimator applied to this sample gives t. To each sample point we assign a probability (usually 1/n, i.e. all sample points have equal probability). Then from this sample we draw, with replacement, another random sample of size n and estimate θ from it. Denote the estimate of the parameter at the jth resampling stage by t_j^*. The bootstrap estimator of θ and its variance are calculated as

  t^* = (1/B) Σ_{j=1..B} t_j^*,   var(t^*) = (1/(B-1)) Σ_{j=1..B} (t_j^* - t^*)^2,

where B is the number of resamples. This is a very simple form of application of bootstrap resampling. For parameter estimation B is usually chosen to be around 200.

Let us analyse how the bootstrap works in one simple case. Consider a random variable X with sample space x = (x_1, ..., x_M), where each point x_j has probability f_j, i.e. f = (f_1, ..., f_M) represents the distribution of the population. A sample of size n will have relative frequencies f_hat_j = n_j/n for each sample point, where n_j is the number of times x_j occurs in the sample.
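A minimal non-parametric bootstrap sketch in Python (NumPy assumed; the function name bootstrap is illustrative), estimating the median and its variance with B = 200 resamples:

```python
import numpy as np

def bootstrap(x, stat, B=200, seed=0):
    """Non-parametric bootstrap: resample x with replacement B times."""
    rng = np.random.default_rng(seed)
    n = len(x)
    t_star = np.array([stat(rng.choice(x, size=n, replace=True)) for _ in range(B)])
    return t_star.mean(), t_star.var(ddof=1), t_star     # estimate, variance, replicates

rng = np.random.default_rng(4)
x = rng.exponential(scale=2.0, size=40)
t_boot, var_boot, _ = bootstrap(x, np.median)
print("bootstrap estimate of the median:", t_boot, "  bootstrap variance:", var_boot)
```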

Bootstrap: Cont.

The distribution of the observed counts (n_1, ..., n_M), conditional on f, is multinomial. The multinomial distribution is the extension of the binomial distribution and is expressed as

  P(n_1, ..., n_M | f) = (n! / (n_1! ... n_M!)) f_1^{n_1} ... f_M^{n_M},  with Σ_j n_j = n.

The limiting distribution of √n (f_hat - f) is multivariate normal. If we resample from the given sample, then we should consider the conditional distribution of the resampled frequencies f_hat^* given f_hat, which is also multinomial, and the limiting distribution of √n (f_hat^* - f_hat) is the same as that of the original sample. Since these two distributions converge to the same limit, well-behaved functions of them will also have the same limiting distributions. Thus, if we use the bootstrap to derive the distribution of a sample statistic, we can expect that in the limit it converges to the sampling distribution of that statistic; i.e. √n (t(f_hat) - t(f)) and √n (t(f_hat^*) - t(f_hat)) have the same limiting distribution.

Bootstrap: Cont.

If we could enumerate all possible resamples from our sample, we could build the "ideal" bootstrap distribution. In practice, even with modern computers, this is impossible to achieve; instead, Monte Carlo simulation is used. Usually it works as follows:
1) Draw a random sample of size n with replacement from the given sample.
2) Estimate the parameter, giving t_j.
3) Repeat B times and build the frequency and cumulative distributions of t.

Bootstrap: Cont.

How do we build the cumulative distribution (it approximates our distribution function)? Consider a sample of size n, x = (x_1, x_2, ..., x_n). The cumulative distribution is

  F_hat(t) = (1/n) Σ_{i=1..n} I(x_i ≤ t),

where I denotes the indicator function: I(A) = 1 if the condition A holds and 0 otherwise. Another way of building the cumulative distribution is to sort the data first, so that x_(1) ≤ x_(2) ≤ ... ≤ x_(n), and then set

  F_hat(x_(i)) = i/n.

We can also build a histogram that approximates the density of the distribution. First we cover the range of the data with equal intervals of length Δt. Assume the centre of the ith interval is t_i; then the histogram is calculated using

  f_hat(t_i) = #{ x_j : |x_j - t_i| ≤ Δt/2 } / (n Δt).

Once we have the distribution of the statistic we can use it for various purposes. Bootstrap estimation of the parameter and its variance is one possible application; we can also use this distribution for hypothesis testing, interval estimation, etc. For pure parameter estimation we need around 200 resamples; for interval estimation we might need around 2000. The reason is that for interval estimation and hypothesis testing we need a more accurate approximation of the distribution (especially in its tails).
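A sketch (NumPy assumed) that builds the cumulative distribution and a histogram-type density approximation from bootstrap replicates of a statistic, here the median:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.exponential(scale=2.0, size=40)

# Bootstrap replicates of the median (B = 2000, enough for interval estimation)
B = 2000
t = np.array([np.median(rng.choice(x, size=len(x), replace=True)) for _ in range(B)])

# Cumulative distribution from the sorted replicates: F(t_(i)) = i / B
t_sorted = np.sort(t)
F = np.arange(1, B + 1) / B

# Histogram approximation of the density: counts / (B * bin width)
counts, edges = np.histogram(t, bins=30)
dt = edges[1] - edges[0]
density = counts / (B * dt)

print("central 95% interval for the median:", np.quantile(t, [0.025, 0.975]))
```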

Bootstrap: Cont.

Since in the resampling we did not use any assumption about the population distribution, this bootstrap is called the non-parametric bootstrap. If we have some idea about the population distribution, we can use it in the resampling, i.e. instead of drawing directly from our sample we draw from the assumed population distribution. For example, if we know that the population distribution is normal, we can estimate its parameters from our sample (the sample mean and variance), approximate the population distribution with this fitted distribution, and use it to draw new samples. As can be expected, if the assumption about the population distribution is correct, the parametric bootstrap performs better; if it is not correct, the non-parametric bootstrap will outperform its parametric counterpart.
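A minimal parametric-bootstrap sketch under a normality assumption (NumPy assumed): the population is approximated by a normal distribution with the sample mean and variance, and new samples are drawn from that fitted distribution:

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(10.0, 3.0, size=40)                    # observed sample

mu_hat, sigma_hat = x.mean(), x.std(ddof=1)           # fit the assumed normal model
B, n = 200, len(x)

# Parametric bootstrap: resample from the fitted N(mu_hat, sigma_hat^2)
t_param = np.array([np.median(rng.normal(mu_hat, sigma_hat, n)) for _ in range(B)])
# Non-parametric bootstrap: resample from the data with replacement
t_nonpar = np.array([np.median(rng.choice(x, n, replace=True)) for _ in range(B)])

print("parametric bootstrap variance of the median:    ", t_param.var(ddof=1))
print("non-parametric bootstrap variance of the median:", t_nonpar.var(ddof=1))
```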

Bootstrap: Some simple applications

Linear model:
1) Estimate the parameters of the model.
2) Calculate the fitted values using the estimated parameters.
3) Calculate the residuals as the differences between the observations and the fitted values.
4) Draw n values at random, with replacement, from the residuals r (call them r_random), add them to the fitted values and calculate new "observations".
5) Estimate new parameters from these "observations" and save them.
6) Go to step 4 and repeat. (A code sketch of these linear-model steps follows this slide.)

Generalised linear models: the procedure is as in the linear-model case with small modifications.
1) Residuals are calculated using an appropriate definition for the model (e.g. Pearson or deviance residuals).
2) When calculating new "observations", make sure that they are similar to the original observations; e.g. in the binomial case make sure the values are 0 or 1.
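A minimal residual-bootstrap sketch for a simple linear model (NumPy assumed), following the numbered linear-model steps above:

```python
import numpy as np

rng = np.random.default_rng(7)
n, B = 50, 200
x = rng.uniform(0, 10, n)
y = 1.5 + 0.8 * x + rng.normal(0, 1.0, n)

X = np.column_stack([np.ones(n), x])                  # design matrix with intercept
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]       # 1) estimate parameters
fitted = X @ beta_hat                                 # 2) fitted values
resid = y - fitted                                    # 3) residuals

betas = np.empty((B, 2))
for b in range(B):                                    # 4)-6) resample residuals, refit, save
    y_star = fitted + rng.choice(resid, size=n, replace=True)
    betas[b] = np.linalg.lstsq(X, y_star, rcond=None)[0]

print("bootstrap standard errors (intercept, slope):", betas.std(axis=0, ddof=1))
```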