Slide 1 Statistics for HEP Roger Barlow Manchester University Lecture 3: Estimation

Slide 2 About Estimation. Probability calculus goes from theory to data: given these distribution parameters, what can we say about the data? Statistical inference goes from data to theory: given this data, what can we say about the properties or parameters or correctness of the distribution functions?

Slide 3 What is an estimator? An estimator is a procedure giving a value for a parameter or property of the distribution as a function of the actual data values
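For instance (an illustration added here, not on the slide), two familiar estimators built from a sample {x_1, …, x_N} are the sample mean and the sample variance:

\[
\hat\mu = \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i ,
\qquad
\hat\sigma^2 = \frac{1}{N-1}\sum_{i=1}^{N} \bigl(x_i - \bar{x}\bigr)^2 .
\]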

Slide 4 Minimum Variance Bound. What is a good estimator? A perfect estimator is consistent, unbiased and efficient (its variance is the minimum possible). One often has to work with less-than-perfect estimators.
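For reference (standard form, not transcribed from the slide), the minimum variance bound for an unbiased estimator of a parameter a is

\[
V(\hat a) \;\ge\; \frac{1}{\bigl\langle -\,\partial^2 \ln L / \partial a^2 \bigr\rangle}
= \frac{1}{\bigl\langle (\partial \ln L / \partial a)^2 \bigr\rangle},
\]

and an estimator whose variance attains this bound is called efficient.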

Slide 5 The Likelihood Function. Set of data {x_1, x_2, x_3, … x_N}. Each x may be multidimensional – never mind. The probability depends on some parameter a; a may be multidimensional – never mind. Total probability (density): P(x_1;a) P(x_2;a) P(x_3;a) … P(x_N;a) = L(x_1, x_2, x_3, … x_N; a), the likelihood.

Slide 6 Maximum Likelihood Estimation. Given data {x_1, x_2, x_3, … x_N}, estimate a by maximising the likelihood L(x_1, x_2, x_3, … x_N; a). In practice one usually maximises ln L, as it is easier to calculate and handle: just add up the ln P(x_i). ML has lots of nice properties. [Sketch: ln L as a function of a, with its maximum at the estimate â.]
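A minimal sketch (not from the lecture) of a numerical maximum-likelihood fit: an exponential lifetime tau fitted to simulated decay times; all numbers are invented.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
t = rng.exponential(scale=2.0, size=1000)      # toy decay times, true tau = 2

def nll(tau):
    # -ln L with P(t; tau) = (1/tau) exp(-t/tau)
    return -np.sum(-np.log(tau) - t / tau)

res = minimize_scalar(nll, bounds=(0.1, 10.0), method="bounded")
print("ML estimate of tau:", res.x)            # analytically, this is just the mean of t
```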

Slide 7 Properties of ML estimation. It is consistent (no big deal). It is biased for small N – may need to worry. It is efficient for large N – saturates the minimum variance bound. It is invariant: if you switch to using u(a), then û = u(â). [Sketch: ln L plotted against a and against u, with maxima at â and û = u(â).]

Slide 8 More about ML. It is not 'right', just sensible. It does not give the 'most likely value of a'; it gives the value of a for which this data is most likely. Numerical methods are often needed: maximisation/minimisation in more than one variable is not easy. Use MINUIT, but remember the minus sign (MINUIT minimises, so give it -ln L).
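A minimal sketch of the same fit through MINUIT, assuming the iminuit Python interface (function names and starting values are illustrative, not from the slide):

```python
import numpy as np
from iminuit import Minuit

rng = np.random.default_rng(1)
t = rng.exponential(scale=2.0, size=1000)      # toy decay times, true tau = 2

def nll(tau):
    # note the minus sign: MINUIT minimises, so hand it -ln L
    return -np.sum(-np.log(tau) - t / tau)

m = Minuit(nll, tau=1.0)
m.errordef = Minuit.LIKELIHOOD                 # 0.5, appropriate for -ln L
m.limits["tau"] = (0.01, None)
m.migrad()                                     # run the minimiser
print(m.values["tau"], m.errors["tau"])
```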

Slide 9 ML does not give goodness-of-fit. ML will not complain if your assumed P(x;a) is rubbish; the value of L tells you nothing. Example: a fit of P(x) = a_1 x + a_0 will give a_1 = 0, i.e. a constant P, with L = a_0^N – just like the value you get from fitting a constant.

Slide 10 Least Squares. Measurements of y at various x, with errors σ and a prediction f(x;a). Each point's probability is Gaussian, so ln L is, up to a constant, -χ²/2: to maximise ln L, minimise χ². [Sketch: data points with error bars and the fitted curve y(x).] So ML 'proves' least squares. But what 'proves' ML? Nothing.
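In standard form (the slide's equations are not reproduced in the transcript), this reads:

\[
P(y_i) = \frac{1}{\sigma_i\sqrt{2\pi}}\,
\exp\!\left[-\frac{\bigl(y_i - f(x_i;a)\bigr)^2}{2\sigma_i^2}\right],
\qquad
\ln L = -\tfrac{1}{2}\sum_i \frac{\bigl(y_i - f(x_i;a)\bigr)^2}{\sigma_i^2} + \text{const}
      = -\tfrac{1}{2}\chi^2 + \text{const}.
\]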

Slide 11 Least Squares: the really nice thing. You should get χ² ≈ 1 per data point. Minimising χ² makes it smaller – the effect is about 1 unit of χ² for each variable adjusted (the dimensionality of the multi-dimensional Gaussian is decreased by 1). Number of degrees of freedom = N data points - N parameters. This provides a 'goodness of agreement' figure which allows a credibility check.
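A minimal sketch (not from the lecture) of this credibility check for a straight-line fit; the data values are invented:

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import chi2

x = np.array([1., 2., 3., 4., 5., 6.])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.8])
sigma = np.full_like(y, 0.3)

def f(x, m, c):
    return m * x + c

popt, pcov = curve_fit(f, x, y, sigma=sigma, absolute_sigma=True)
resid = (y - f(x, *popt)) / sigma
chisq = np.sum(resid**2)
ndf = len(x) - len(popt)                 # N data points - N parameters
print(chisq / ndf, chi2.sf(chisq, ndf))  # chi2 per degree of freedom, and the p-value
```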

Slide 12 Chi Squared Results. A large χ² comes from: 1. bad measurements, 2. bad theory, 3. underestimated errors, 4. bad luck. A small χ² comes from: 1. overestimated errors, 2. good luck.

Slide 13 Fitting Histograms. Often you put the {x_i} into bins; the data are then {n_j}, with n_j given by a Poisson distribution of mean f(x_j) = P(x_j) Δx. Four techniques: full ML, binned ML, proper χ², simple χ².

Slide 14 What you maximise/minimise: full ML, binned ML, proper χ², simple χ² (standard forms given below).
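The formulae themselves are not reproduced in the transcript; in their standard form, writing f_j = f(x_j;a) for the predicted content of bin j, they are:

\[
\begin{aligned}
\text{Full ML:}\quad & \ln L = \sum_i \ln P(x_i;a) \\
\text{Binned ML:}\quad & \ln L = \sum_j \bigl( n_j \ln f_j - f_j \bigr) \;+\; \text{const} \\
\text{Proper } \chi^2 :\quad & \chi^2 = \sum_j \frac{(n_j - f_j)^2}{f_j} \\
\text{Simple } \chi^2 :\quad & \chi^2 = \sum_j \frac{(n_j - f_j)^2}{n_j}
\end{aligned}
\]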

Slide 15 Which to use? Full ML: uses all the information but may be cumbersome, and does not give any goodness-of-fit; use it if you have only a handful of events. Binned ML: less cumbersome; you lose information if the bin size is large; can use χ² as goodness-of-fit afterwards. Proper χ²: even less cumbersome and gives goodness-of-fit directly; should have n_j large so that Poisson → Gaussian. Simple χ²: minimising becomes linear; must have n_j large.

Slide 16 Consumer tests show: binned ML and unbinned ML give similar results unless the bin size exceeds the feature size. Both χ² methods get biased and less efficient if the bin contents are small, due to the asymmetry of the Poisson distribution. Simple χ² suffers more, as it is sensitive to fluctuations, and dies when bin contents are zero. (A toy comparison is sketched below.)
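A minimal toy sketch (not from the lecture) comparing binned ML with simple χ² when bin contents are small; all numbers are invented:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
nbins, mu_true, ntoys = 20, 2.0, 2000          # small expected bin contents
ml_est, chi2_est = [], []

for _ in range(ntoys):
    n = rng.poisson(mu_true, size=nbins)       # observed bin contents
    # binned ML for a flat expectation mu: maximise sum_j (n_j ln mu - mu)
    nll = lambda mu: -np.sum(n * np.log(mu) - mu)
    ml_est.append(minimize_scalar(nll, bounds=(0.1, 10), method="bounded").x)
    # simple chi2: sum_j (n_j - mu)^2 / n_j, with zero bins simply dropped
    nz = n[n > 0]
    c2 = lambda mu: np.sum((nz - mu) ** 2 / nz)
    chi2_est.append(minimize_scalar(c2, bounds=(0.1, 10), method="bounded").x)

print(np.mean(ml_est), np.mean(chi2_est))      # the simple chi2 estimate comes out biased low
```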

Slide 17 Orthogonal Polynomials. Fit a cubic. Standard polynomial: f(x) = c_0 + c_1 x + c_2 x^2 + c_3 x^3. Least squares [minimising Σ_i (y_i - f(x_i))^2] gives a 4×4 set of simultaneous equations for the c_i. Invert and solve? Think first!
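The slide's matrix equation is not in the transcript; for an unweighted fit the normal equations take the standard form

\[
\sum_i
\begin{pmatrix}
1 & x_i & x_i^2 & x_i^3 \\
x_i & x_i^2 & x_i^3 & x_i^4 \\
x_i^2 & x_i^3 & x_i^4 & x_i^5 \\
x_i^3 & x_i^4 & x_i^5 & x_i^6
\end{pmatrix}
\begin{pmatrix} c_0 \\ c_1 \\ c_2 \\ c_3 \end{pmatrix}
=
\sum_i
\begin{pmatrix} y_i \\ x_i y_i \\ x_i^2 y_i \\ x_i^3 y_i \end{pmatrix},
\]

a matrix whose nearly degenerate columns make direct inversion numerically delicate, and whose solution gives strongly correlated c_i.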

Slide 18 Define Orthogonal Polynomial. P_0(x) = 1; P_1(x) = x + a_01 P_0(x); P_2(x) = x^2 + a_12 P_1(x) + a_02 P_0(x); P_3(x) = x^3 + a_23 P_2(x) + a_13 P_1(x) + a_03 P_0(x). Orthogonality: Σ_r P_i(x_r) P_j(x_r) = 0 unless i = j, which requires a_ij = -(Σ_r x_r^j P_i(x_r)) / Σ_r P_i(x_r)^2.

Slide 19 Use Orthogonal Polynomial. f(x) = c'_0 P_0(x) + c'_1 P_1(x) + c'_2 P_2(x) + c'_3 P_3(x). Least squares minimisation gives c'_i = Σ_r y_r P_i(x_r) / Σ_r P_i(x_r)^2. Special bonus: these coefficients are UNCORRELATED. Simple example: fit y = mx + c, or y = m(x - x̄) + c'.
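A minimal sketch (not from the lecture) of building these polynomials on the measured x values and fitting a cubic with uncorrelated coefficients; the data are invented:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0.0, 5.0, 12)
y = 1.0 + 0.5 * x - 0.2 * x**2 + 0.05 * x**3 + rng.normal(0.0, 0.1, x.size)

# Build P_j(x_r) = x_r^j + sum_{i<j} a_ij P_i(x_r), with a_ij chosen so that
# sum_r P_i(x_r) P_j(x_r) = 0 for i != j (orthogonality over the data points).
P = [np.ones_like(x)]
for j in range(1, 4):
    pj = x**j
    for Pi in P:
        a_ij = -np.sum(x**j * Pi) / np.sum(Pi**2)
        pj = pj + a_ij * Pi
    P.append(pj)

# Least-squares coefficients c'_i = sum_r y_r P_i(x_r) / sum_r P_i(x_r)^2,
# uncorrelated by construction.
c = [np.sum(y * Pi) / np.sum(Pi**2) for Pi in P]
fit = sum(ci * Pi for ci, Pi in zip(c, P))
print(c)
```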

Slide 20 Optimal Observables. Function of the form P(x) = f(x) + a g(x), e.g. signal + background, tau polarisation, extra couplings. A measurement x contains information about a, and it depends on f(x)/g(x) ONLY, so work with O(x) = f(x)/g(x): write the expectation of O as a function of a, and use the measured mean of O to extract â. [Sketch: f, g and O as functions of x.]
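One standard way to write that relation (an assumption about what the slide's missing equations showed) is

\[
\langle O \rangle(a) \;=\;
\frac{\int O(x)\,\bigl[f(x) + a\,g(x)\bigr]\,dx}
     {\int \bigl[f(x) + a\,g(x)\bigr]\,dx},
\qquad
\frac{1}{N}\sum_{i=1}^{N} O(x_i) \;=\; \langle O \rangle(\hat a)
\;\;\Rightarrow\;\; \hat a .
\]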

Slide 21 Why this is magic. It is efficient: it saturates the MVB, as good as ML. x can be multidimensional, while O is one variable. In practice, calibrate ⟨O⟩ and â using Monte Carlo. If a is multidimensional there is an O for each component. If the form is quadratic, then using the mean ⟨O⟩ is not as good as ML, but close.

Slide 22 Extended Maximum Likelihood. Allow the normalisation of P(x;a) to float: this predicts the number of events as well as their distribution. L needs to be modified; the extra term stops the normalisation shooting up to infinity.
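The modified likelihood is not reproduced in the transcript; in its standard form, with P(x;a) now normalised to the predicted number of events, it reads

\[
\ln L \;=\; \sum_{i=1}^{N} \ln P(x_i;a) \;-\; \int P(x;a)\,dx ,
\]

where the second term (the predicted total) is the extra piece that stops the normalisation running away.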

Slide 23 Using EML. If the shape and size of P can vary independently, you get the same answer as ML, and the predicted N equals the actual N. If not, then the estimates are better using EML. Be careful of the errors when computing ratios and suchlike.