
Slide 1: Statistical Data Analysis: Lecture 10 (G. Cowan)
1. Probability, Bayes’ theorem
2. Random variables and probability densities
3. Expectation values, error propagation
4. Catalogue of pdfs
5. The Monte Carlo method
6. Statistical tests: general concepts
7. Test statistics, multivariate methods
8. Goodness-of-fit tests
9. Parameter estimation, maximum likelihood
10. More maximum likelihood
11. Method of least squares
12. Interval estimation, setting limits
13. Nuisance parameters, systematic uncertainties
14. Examples of Bayesian approach

Slide 2: Information inequality for n parameters
Suppose we have estimated n parameters θ = (θ_1, ..., θ_n). The (inverse) minimum variance bound is given by the Fisher information matrix:

    I_ij(θ) = −E[ ∂² ln L / ∂θ_i ∂θ_j ]

The information inequality then states that V − I⁻¹ is a positive semi-definite matrix, where V_ij = cov[θ̂_i, θ̂_j] is the covariance matrix of the estimators. Therefore, for the diagonal elements,

    σ²(θ̂_i) = V_ii ≥ (I⁻¹)_ii

One often uses I⁻¹ as an approximation for the covariance matrix, estimated e.g. from the matrix of 2nd derivatives of ln L at its maximum.
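
As an aside (not from the slides), a minimal Python sketch of that last point: estimate a single parameter by maximum likelihood and approximate its variance from the numerical second derivative of −ln L at the maximum, using a simple exponential model as a stand-in.

```python
import numpy as np

# Minimal sketch: one-parameter ML fit (exponential mean tau) with the
# variance approximated by 1 / (d^2(-ln L)/d tau^2) at the maximum.
rng = np.random.default_rng(seed=1)
t = rng.exponential(scale=1.0, size=2000)        # toy data, true tau = 1

def neg_log_like(tau):
    # -ln L up to a constant for f(t; tau) = (1/tau) exp(-t/tau)
    return np.sum(np.log(tau) + t / tau)

tau_hat = t.mean()                               # analytic MLE for this model

eps = 1e-4                                       # finite-difference step
second_deriv = (neg_log_like(tau_hat + eps)
                - 2.0 * neg_log_like(tau_hat)
                + neg_log_like(tau_hat - eps)) / eps**2
var_hat = 1.0 / second_deriv                     # (I^{-1}) estimate

print(f"tau_hat = {tau_hat:.4f} +/- {np.sqrt(var_hat):.4f}")
```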

Slide 3: Example of ML with 2 parameters
Consider a scattering angle distribution with x = cos θ and a pdf of the form

    f(x; α, β) ∝ 1 + α x + β x²,

or, if the data are restricted to x_min < x < x_max, we always need to normalize so that

    ∫ from x_min to x_max of f(x; α, β) dx = 1

Example: α = 0.5, β = 0.5, x_min = −0.95, x_max = 0.95; generate n = 2000 events with Monte Carlo.

Slide 4: Example of ML with 2 parameters: fit result
Finding the maximum of ln L(α, β) numerically (with MINUIT) gives the estimates α̂ and β̂, with (co)variances from the MINUIT routine HESSE.
N.B. There is no binning of the data for the fit, but one can compare to a histogram for goodness of fit (e.g. ‘visual’ or χ²).
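
A rough, self-contained Python sketch of this two-parameter example (an addition, not the slide’s code): scipy.optimize.minimize stands in for MINUIT, the BFGS inverse Hessian plays the role of the covariance matrix from HESSE, and the pdf is assumed to be the 1 + αx + βx² form above.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize

# Sketch of the two-parameter ML fit, assuming f(x; a, b) ~ 1 + a*x + b*x^2
# normalized on x_min < x < x_max.
alpha_true, beta_true = 0.5, 0.5
x_min, x_max = -0.95, 0.95
rng = np.random.default_rng(seed=2)

def pdf(x, a, b):
    norm, _ = quad(lambda u: 1 + a * u + b * u**2, x_min, x_max)
    return (1 + a * x + b * x**2) / norm

def generate(n, a, b):
    # accept-reject generation of n events from pdf(x; a, b)
    grid = np.linspace(x_min, x_max, 201)
    f_max = 1.1 * pdf(grid, a, b).max()
    out = []
    while len(out) < n:
        x = rng.uniform(x_min, x_max)
        if rng.uniform(0.0, f_max) < pdf(x, a, b):
            out.append(x)
    return np.array(out)

x_data = generate(2000, alpha_true, beta_true)

def neg_log_like(p):
    vals = pdf(x_data, p[0], p[1])
    if np.any(vals <= 0):
        return 1e10           # keep the fit away from negative-pdf regions
    return -np.sum(np.log(vals))

res = minimize(neg_log_like, x0=[0.0, 0.0], method="BFGS")
cov = res.hess_inv            # approximate covariance matrix of the estimates
print("alpha_hat, beta_hat:", res.x)
print("standard deviations:", np.sqrt(np.diag(cov)))
print("correlation:", cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1]))
```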

Slide 5: Two-parameter fit: MC study
Repeat the ML fit with 500 simulated experiments, all with n = 2000 events: the estimates average to approximately the true values; the (co)variances are close to the previous estimates; and the marginal pdfs of the estimators are approximately Gaussian.
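
The idea of such an MC study in a minimal, self-contained form (an added sketch; it uses a simpler one-parameter exponential example so the block stays short, but repeating the two-parameter fit above works the same way):

```python
import numpy as np

# MC study of an estimator: repeat the "experiment" 500 times and look at
# the distribution of the estimates (here tau_hat = sample mean for an
# exponential model, so each fit is trivial).
rng = np.random.default_rng(seed=3)
tau_true, n_events, n_exp = 1.0, 2000, 500

tau_hats = np.array([rng.exponential(tau_true, n_events).mean()
                     for _ in range(n_exp)])

print("mean of estimates:", tau_hats.mean())        # ~ tau_true (unbiased)
print("std  of estimates:", tau_hats.std(ddof=1))   # ~ tau_true / sqrt(n_events)
```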

Slide 6: The ln L_max − 1/2 contour
For large n, ln L takes on a quadratic form near its maximum, and the contour defined by ln L = ln L_max − 1/2 is an ellipse.
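
For reference, one standard way to write the large-n quadratic approximation and the resulting contour (an added note, not verbatim from the slide; ρ is the correlation coefficient of the estimators):

```latex
\ln L(\alpha,\beta) \;\approx\; \ln L_{\max}
 \;-\; \frac{1}{2(1-\rho^{2})}
 \left[
   \left(\frac{\alpha-\hat{\alpha}}{\sigma_{\hat{\alpha}}}\right)^{2}
 + \left(\frac{\beta-\hat{\beta}}{\sigma_{\hat{\beta}}}\right)^{2}
 - 2\rho\,\frac{(\alpha-\hat{\alpha})(\beta-\hat{\beta})}
               {\sigma_{\hat{\alpha}}\sigma_{\hat{\beta}}}
 \right]
% setting ln L = ln L_max - 1/2 gives the ellipse
\left(\frac{\alpha-\hat{\alpha}}{\sigma_{\hat{\alpha}}}\right)^{2}
+ \left(\frac{\beta-\hat{\beta}}{\sigma_{\hat{\beta}}}\right)^{2}
- 2\rho\,\frac{(\alpha-\hat{\alpha})(\beta-\hat{\beta})}
              {\sigma_{\hat{\alpha}}\sigma_{\hat{\beta}}}
 \;=\; 1-\rho^{2} .
```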

Slide 7: (Co)variances from the ln L contour
The (α, β) plane for the first MC data set:
→ Tangent lines to the contour give the standard deviations.
→ The angle of the ellipse φ is related to the correlation.
Correlations between estimators result in an increase in their standard deviations (statistical errors).
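
One common way to express the relation between the ellipse angle φ and the correlation coefficient ρ (an added note, not verbatim from the slide):

```latex
\tan 2\phi \;=\; \frac{2\rho\,\sigma_{\hat{\alpha}}\sigma_{\hat{\beta}}}
                      {\sigma_{\hat{\alpha}}^{2}-\sigma_{\hat{\beta}}^{2}} .
```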

Slide 8: Extended ML
Sometimes we regard n not as fixed, but as a Poisson random variable with mean ν. The result of the experiment is then defined as: n, x_1, ..., x_n. The (extended) likelihood function is

    L(ν, θ) = (ν^n / n!) e^{−ν} ∏ from i=1 to n of f(x_i; θ)

Suppose the theory gives ν = ν(θ); then the log-likelihood is

    ln L(θ) = −ν(θ) + Σ from i=1 to n of ln[ ν(θ) f(x_i; θ) ] + C,

where C represents terms not depending on θ.

Slide 9: Extended ML (2)
Extended ML uses more information → smaller errors for the estimates θ̂.
Example: the expected number of events ν depends on the total cross section σ(θ), which is predicted as a function of the parameters of a theory, as is the distribution of a variable x.
If ν does not depend on θ but remains a free parameter, extended ML gives ν̂ = n and the same θ̂ as ordinary ML (see below).
Important e.g. for anomalous couplings in e⁺e⁻ → W⁺W⁻.
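
A one-line check of that statement (added for completeness), starting from the extended log-likelihood above with ν treated as a free parameter:

```latex
\frac{\partial \ln L}{\partial \nu} \;=\; \frac{n}{\nu} - 1 \;=\; 0
\quad\Longrightarrow\quad \hat{\nu} = n .
```

The θ-dependent part, Σ ln f(x_i; θ), is then just the ordinary log-likelihood, so θ̂ is the same as in the ordinary ML fit.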

Slide 10: Extended ML example
Consider two types of events (e.g., signal and background), each of which predicts a given pdf for the variable x: f_s(x) and f_b(x). We observe a mixture of the two event types with signal fraction θ, expected total number ν, and observed total number n.
Let μ_s = θν and μ_b = (1 − θ)ν; the goal is to estimate μ_s and μ_b. The extended log-likelihood becomes

    ln L(μ_s, μ_b) = −(μ_s + μ_b) + Σ from i=1 to n of ln[ μ_s f_s(x_i) + μ_b f_b(x_i) ] + C

Slide 11: Extended ML example (2)
Maximize the log-likelihood in terms of μ_s and μ_b.
Monte Carlo example with a combination of an exponential (background) and a Gaussian (signal peak): here the errors reflect the total Poisson fluctuation as well as the fluctuation in the proportion of signal to background.
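
A self-contained Python sketch of this kind of extended ML fit (the particular pdf shapes, ranges and true values are illustrative assumptions, not those used on the slide):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import expon, norm

# Extended ML fit for mu_s, mu_b: Gaussian signal on an exponential
# background; shapes and true values are illustrative assumptions.
rng = np.random.default_rng(seed=4)
mu_s_true, mu_b_true = 200.0, 1000.0
x_lo, x_hi = 0.0, 5.0

def f_s(x):   # signal pdf (Gaussian; truncation to [x_lo, x_hi] negligible here)
    return norm.pdf(x, loc=2.5, scale=0.3)

def f_b(x):   # background pdf (exponential, normalized on [x_lo, x_hi])
    return expon.pdf(x, scale=1.0) / (expon.cdf(x_hi, scale=1.0) - expon.cdf(x_lo, scale=1.0))

# One toy experiment: Poisson-fluctuated numbers of each event type
n_s = rng.poisson(mu_s_true)
n_b = rng.poisson(mu_b_true)
x_sig = rng.normal(2.5, 0.3, size=n_s)
x_bkg = rng.exponential(1.0, size=n_b)
x_bkg = x_bkg[(x_bkg > x_lo) & (x_bkg < x_hi)]   # crude truncation
x_data = np.concatenate([x_sig, x_bkg])

def neg_log_like(p):
    mu_s, mu_b = p
    dens = mu_s * f_s(x_data) + mu_b * f_b(x_data)
    if np.any(dens <= 0):
        return 1e12           # guard against negative total density
    return (mu_s + mu_b) - np.sum(np.log(dens))

res = minimize(neg_log_like, x0=[100.0, 900.0], method="BFGS")
cov = res.hess_inv
print("mu_s_hat, mu_b_hat:", res.x)
print("errors:", np.sqrt(np.diag(cov)))
```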

Slide 12: Extended ML example: an unphysical estimate
A downwards fluctuation of the data in the peak region can lead to even fewer events than would be expected from background alone. The estimate for μ_s is then pushed negative (unphysical). We can let this happen as long as the (total) pdf stays positive everywhere.

Slide 13: Unphysical estimators (2)
Repeat the entire MC experiment many times, allowing unphysical estimates: here the unphysical estimator is unbiased and should nevertheless be reported, since the average of a large number of unbiased estimates converges to the true value (cf. PDG).

Slide 14: ML with binned data
Often we put the data into a histogram: n = (n_1, ..., n_N), with n_tot = Σ n_i. The hypothesis is

    ν_i(θ) = n_tot ∫ over bin i of f(x; θ) dx,

where ν_i = E[n_i] is the expected number of entries in bin i. If we model the data as multinomial (n_tot constant), then the log-likelihood function is

    ln L(θ) = Σ from i=1 to N of n_i ln ν_i(θ) + C

Slide 15: ML example with binned data
The previous example with the exponential pdf, now with the data put into a histogram. In the limit of zero bin width we recover the usual unbinned ML. If the n_i are treated as independent Poisson variables, we get the extended log-likelihood

    ln L = Σ from i=1 to N of ( n_i ln ν_i − ν_i ) + C
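
A short Python sketch of such a binned fit for an exponential model (the bin edges, range and sample size are illustrative assumptions; the total expected number of entries is fixed to the observed number, which is its ML value when treated as free, cf. slide 9):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Binned Poisson ML fit of f(t; tau) = (1/tau) exp(-t/tau):
# maximize ln L = sum_i [ n_i ln nu_i - nu_i ] over tau.
rng = np.random.default_rng(seed=5)
tau_true, n_events = 1.0, 2000
t = rng.exponential(tau_true, size=n_events)

edges = np.linspace(0.0, 5.0, 26)                 # 25 bins on [0, 5]
counts, _ = np.histogram(t, bins=edges)
n_in_range = counts.sum()                         # events inside the histogram range

def expected(tau):
    # nu_i = nu_tot * (probability of bin i for the range-truncated pdf),
    # with nu_tot fixed at its ML value n_in_range
    cdf = 1.0 - np.exp(-edges / tau)
    return n_in_range * np.diff(cdf) / (1.0 - np.exp(-edges[-1] / tau))

def neg_log_like(tau):
    nu = expected(tau)
    return np.sum(nu - counts * np.log(nu))        # -ln L up to a constant

res = minimize_scalar(neg_log_like, bounds=(0.1, 5.0), method="bounded")
print("tau_hat =", res.x)
```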

Slide 16: Relationship between ML and Bayesian estimators
In Bayesian statistics, both θ and x are random variables. Recall the Bayesian method: use subjective probability for hypotheses (θ); before the experiment, knowledge is summarized by the prior pdf π(θ); use Bayes’ theorem to update the prior in light of the data:

    p(θ | x) = L(x | θ) π(θ) / ∫ L(x | θ′) π(θ′) dθ′

Here p(θ | x) is the posterior pdf (the conditional pdf for θ given x).

Slide 17: ML and Bayesian estimators (2)
Purist Bayesian: p(θ | x) contains all knowledge about θ.
Pragmatist Bayesian: p(θ | x) could be a complicated function → summarize it using an estimator, e.g. the mode of p(θ | x) (could also use e.g. the expectation value).
What do we use for π(θ)? There is no golden rule (it is subjective!); one often represents ‘prior ignorance’ by π(θ) = constant, in which case the posterior is proportional to the likelihood and its mode coincides with the ML estimator.
But... we could have used a different parameter, e.g. λ = 1/θ, and if the prior π_θ(θ) is constant, then π_λ(λ) is not! ‘Complete prior ignorance’ is not well defined.
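
The change-of-variables step spelled out (added for completeness): if λ = 1/θ and π_θ(θ) = constant, then

```latex
\pi_{\lambda}(\lambda)
  \;=\; \pi_{\theta}\bigl(\theta(\lambda)\bigr)\,
        \left|\frac{d\theta}{d\lambda}\right|
  \;=\; \mathrm{const}\times\frac{1}{\lambda^{2}}
  \;\neq\; \mathrm{const} .
```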

Slide 18: Wrapping up lecture 10
We’ve now seen several examples of the method of Maximum Likelihood:
→ the multiparameter case,
→ variable sample size (extended ML),
→ histogram-based data,
and we’ve seen the connection between ML and Bayesian parameter estimation.
Next we will consider a special case of ML with Gaussian data and show how this leads to the method of Least Squares.

Slide 19: Extra slides

Slide 20: Priors from formal rules
Because of the difficulty of encoding a vague degree of belief in a prior, one often attempts to derive the prior from formal rules, e.g. to satisfy certain invariance principles or to provide maximum information gain for a certain set of measurements. Such priors are often called “objective priors” and form the basis of Objective Bayesian Statistics. The priors do not reflect a degree of belief (but might represent possible extreme cases). In a Subjective Bayesian analysis, using objective priors can be an important part of the sensitivity analysis.

Slide 21: Priors from formal rules (cont.)
In an Objective Bayesian analysis, one can use the intervals in a frequentist way, i.e., regard Bayes’ theorem as a recipe to produce an interval with certain coverage properties. For a review see:
Formal priors have not been widely used in HEP, but there is recent interest in this direction; see e.g. L. Demortier, S. Jain and H. Prosper, “Reference priors for high energy physics”, arXiv:1002.1111 (Feb 2010).

Slide 22: Jeffreys’ prior
According to Jeffreys’ rule, take the prior

    π(θ) ∝ √(det I(θ)),

where I(θ) is the Fisher information matrix. One can show that this leads to inference that is invariant under a transformation of parameters. For a Gaussian mean, the Jeffreys’ prior is constant; for a Poisson mean μ it is proportional to 1/√μ.
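
A quick check of the Gaussian-mean statement (added; assumes n observations x_i with known σ):

```latex
\ln L(\mu) = -\frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(x_i-\mu)^{2} + \mathrm{const},
\qquad
I(\mu) = -E\!\left[\frac{\partial^{2}\ln L}{\partial\mu^{2}}\right] = \frac{n}{\sigma^{2}},
\qquad
\pi(\mu) \propto \sqrt{I(\mu)} = \mathrm{const} .
```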

Slide 23: Jeffreys’ prior for Poisson mean
Suppose n ~ Poisson(μ). To find the Jeffreys’ prior for μ, compute the Fisher information I(μ) and take π(μ) ∝ √I(μ) = 1/√μ (see the derivation below).
So, e.g., for μ = s + b, this means the prior π(s) ∝ 1/√(s + b), which depends on b. But this is not designed as a degree of belief about s.
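
The derivation behind that result, in standard steps (added for completeness):

```latex
\ln L(\mu) = n\ln\mu - \mu - \ln n! ,
\qquad
\frac{\partial^{2}\ln L}{\partial\mu^{2}} = -\frac{n}{\mu^{2}},
\qquad
I(\mu) = -E\!\left[\frac{\partial^{2}\ln L}{\partial\mu^{2}}\right]
       = \frac{E[n]}{\mu^{2}} = \frac{1}{\mu},
\quad\Longrightarrow\quad
\pi(\mu) \propto \sqrt{I(\mu)} = \frac{1}{\sqrt{\mu}} .
```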

