#10 The Central Limit Theorem

Opinionated Lessons in Statistics by Bill Press
#10 The Central Limit Theorem
Professor William H. Press, Department of Computer Science, the University of Texas at Austin

The Central Limit Theorem is the reason that the Normal (Gaussian) distribution is uniquely important. We need to understand where it does and doesn't apply.

Let $S = \frac{1}{N}\sum_i X_i$ with $\langle X_i \rangle = 0$. (We can always subtract off the means, then add them back later.)

Then the characteristic function of $S$ is the product of the rescaled characteristic functions of the $X_i$:

$$
\phi_S(t) = \prod_i \phi_{X_i}\!\left(\frac{t}{N}\right)
= \prod_i \left(1 - \frac{1}{2}\,\frac{\sigma_i^2 t^2}{N^2} + \cdots\right)
= \exp\!\left[\sum_i \ln\!\left(1 - \frac{1}{2}\,\frac{\sigma_i^2 t^2}{N^2} + \cdots\right)\right]
\approx \exp\!\left(-\frac{t^2}{2}\,\frac{\sum_i \sigma_i^2}{N^2}\right)
$$

Whoa! Each $\phi_{X_i}$ had better have a convergent Taylor series around zero! (Cauchy doesn't, e.g.) The neglected higher-order terms decrease with $N$, but how fast?

So $S$ is normally distributed,

$$
p_S(\cdot) \sim \mathrm{Normal}\!\left(0,\ \tfrac{1}{N^2}\textstyle\sum_i \sigma_i^2\right)
$$
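A quick numerical sanity check of this result (an editor's sketch in Python/NumPy, not part of the original slides; the sample sizes are illustrative): average N exponential draws, which are far from Gaussian, and compare the spread of the averages with the Normal(0, sigma^2/N) that the CLT predicts.

```python
import numpy as np

rng = np.random.default_rng(0)
N, trials = 1000, 50_000        # illustrative sizes

# Each row holds N i.i.d. Exponential(1) draws (mean 1, variance 1), far from Gaussian.
x = rng.exponential(1.0, size=(trials, N))
S = x.mean(axis=1) - 1.0        # subtract off the mean, as on the slide

# CLT prediction: S ~ Normal(0, sigma^2 / N) with sigma^2 = 1
print("sample std of S       :", S.std(ddof=1))
print("CLT prediction        :", np.sqrt(1.0 / N))
print("sample 97.5% quantile :", np.quantile(S, 0.975))
print("Normal 97.5% quantile :", 1.96 * np.sqrt(1.0 / N))
```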

Intuitively, the product of a lot of arbitrary functions that each start at 1 with zero derivative falls off rapidly away from zero. Because the product falls off so fast, it loses all memory of the details of its individual factors except the starting value 1 and the fact of the zero derivative. In characteristic-function space, that is basically the CLT.
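To see this in characteristic-function space concretely, here is a minimal sketch of mine (not from the slides): the characteristic function of a zero-mean Uniform(-1, 1) variable is sin(t)/t, and the product of N rescaled copies already lies very close to the Gaussian characteristic function with variance sigma^2/N.

```python
import numpy as np

def phi_uniform(t):
    # Characteristic function of Uniform(-1, 1): sin(t)/t.
    # np.sinc(x) = sin(pi*x)/(pi*x), so sin(t)/t = np.sinc(t/pi); it also handles t = 0.
    return np.sinc(t / np.pi)

N = 50                    # number of terms in the average
sigma2 = 1.0 / 3.0        # variance of Uniform(-1, 1)
t = np.linspace(-10.0, 10.0, 401)

phi_S = phi_uniform(t / N) ** N                  # product of N identical rescaled factors
phi_gauss = np.exp(-0.5 * (sigma2 / N) * t**2)   # Gaussian characteristic function

print("max |phi_S - phi_gauss| =", np.abs(phi_S - phi_gauss).max())  # tiny
```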

The CLT is usually stated for the sum of r.v.'s, not the average, so start from

$$
p_S(\cdot) \sim \mathrm{Normal}\!\left(0,\ \tfrac{1}{N^2}\textstyle\sum_i \sigma_i^2\right)
$$

Now, since $NS = \sum_i X_i$ and $\mathrm{Var}(NS) = N^2\,\mathrm{Var}(S)$, it follows that the simple sum of a large number of r.v.'s is normally distributed, with variance equal to the sum of the variances:

$$
p_{\sum_i X_i}(\cdot) \sim \mathrm{Normal}\!\left(0,\ \textstyle\sum_i \sigma_i^2\right)
$$

...if $N$ is large enough, and if the higher moments are well-enough behaved, and if the Taylor series expansion exists! Also beware of borderline cases where the assumptions technically hold, but convergence to Normal is slow and/or highly nonuniform. (This can affect p-values for tail tests, as we will soon see.)
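The warning about slow convergence is easy to see numerically. A hedged sketch of mine (not from the slides; the distribution and sizes are just one convenient example): sums of 30 lognormal variables have all their moments, so the CLT assumptions hold, yet a Normal tail probability at two standard deviations is noticeably off.

```python
import numpy as np

rng = np.random.default_rng(1)
N, trials = 30, 200_000

# Lognormal(0, 1): every moment exists, so the CLT "applies",
# but the strong right skew makes the sum's tail converge slowly.
x = rng.lognormal(mean=0.0, sigma=1.0, size=(trials, N))
sums = x.sum(axis=1)

m = np.exp(0.5)                        # E[X] for lognormal(0, 1)
v = (np.e - 1.0) * np.e                # Var[X] for lognormal(0, 1)
z = (sums - N * m) / np.sqrt(N * v)    # standardized sums

# A Normal tail test would call P(Z > 2) about 2.3%; the simulated right tail is fatter.
print("empirical P(Z > 2):", (z > 2).mean())
print("Normal    P(Z > 2):", 0.0228)
```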

Since Gaussians are so universal, let's learn to estimate the parameters $\mu$ and $\sigma$ of a Gaussian from a set of points drawn from it. For now, we'll just find the maximum of the posterior distribution of $(\mu, \sigma)$, given some data, for a uniform prior. This is called "maximum a posteriori" (MAP) by Bayesians, and "maximum likelihood" (MLE) by frequentists.

The data are: $x_i,\ i = 1 \ldots N$

The statistical model is:

$$
P(x \mid \mu, \sigma) = \prod_i \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x_i - \mu)^2 / 2\sigma^2}
$$

With a uniform prior, the posterior estimate is:

$$
P(\mu, \sigma \mid x) \propto \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{\!N} e^{-\sum_i (x_i - \mu)^2 / 2\sigma^2}
$$

Now find the MAP (MLE) by setting the derivatives of the log posterior to zero:

$$
0 = \frac{\partial}{\partial \mu}\ \Rightarrow\ \frac{1}{\sigma^2}\left(\sum_i x_i - N\mu\right) = 0 \ \Rightarrow\ \mu = \frac{1}{N}\sum_i x_i
$$

$$
0 = \frac{\partial}{\partial \sigma}\ \Rightarrow\ \frac{1}{\sigma^3}\left[-N\sigma^2 + \sum_i (x_i - \mu)^2\right] = 0 \ \Rightarrow\ \sigma^2 = \frac{1}{N}\sum_i (x_i - \mu)^2
$$

Ha! The MAP mean is the sample mean, and the MAP variance is the sample variance, in the form that divides by $N$ rather than the Bessel-corrected $N - 1$.
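A minimal numerical check of these formulas (an editor's illustration with made-up data, not part of the slides): draw points from a known Gaussian, form the MAP/MLE estimates above, and compare with NumPy's variance with and without Bessel's correction.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=3.0, scale=2.0, size=500)    # hypothetical data: mu = 3, sigma = 2
N = len(x)

mu_hat = x.sum() / N                                   # MAP/MLE mean = sample mean
sigma2_map = ((x - mu_hat) ** 2).sum() / N             # MAP/MLE variance: divide by N
sigma2_bessel = ((x - mu_hat) ** 2).sum() / (N - 1)    # Bessel-corrected: divide by N - 1

print(mu_hat, sigma2_map, sigma2_bessel)
print(np.var(x), np.var(x, ddof=1))    # NumPy's ddof=0 and ddof=1 match the two variances
```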

It won't surprise you that I did the algebra by computer, in Mathematica. (The slide shows the Mathematica worksheet.)
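The same algebra goes through in open-source tools as well; this is a rough SymPy sketch of mine (not the Mathematica worksheet on the slide), differentiating the log posterior symbolically.

```python
import sympy as sp

mu = sp.Symbol("mu")
sigma = sp.Symbol("sigma", positive=True)
N = sp.Symbol("N", positive=True, integer=True)
i = sp.Symbol("i", integer=True)
x = sp.IndexedBase("x")

# Log posterior with a uniform prior, dropping additive constants:
# -N*log(sigma) - sum_i (x_i - mu)^2 / (2 sigma^2)
logP = -N * sp.log(sigma) - sp.Sum((x[i] - mu) ** 2, (i, 1, N)) / (2 * sigma**2)

dmu = sp.diff(logP, mu)        # vanishes when mu = (1/N) * sum_i x_i
dsigma = sp.diff(logP, sigma)  # vanishes when sigma^2 = (1/N) * sum_i (x_i - mu)^2
print(dmu)
print(dsigma)
```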