Practical Statistics for Particle Physicists Lecture 2 Harrison B. Prosper Florida State University European School of High-Energy Physics Parádfürdő, Hungary 5 – 18 June, 2013

Example 4: Higgs to γγ – Likelihood (CMS). [Figure: background model and signal model for the CMS Higgs to γγ data.]

[Figures: the 7 TeV and 8 TeV datasets.]

Outline
Lecture 1: Descriptive Statistics; Probability & Likelihood
Lecture 2: The Frequentist Approach; The Bayesian Approach
Lecture 3: Analysis Example

The Frequentist Approach: Confidence Intervals

The Frequentist Principle
The Frequentist Principle (Neyman, 1937): construct statements so that a fraction f ≥ p of them are guaranteed to be true over an ensemble of statements. The fraction f is called the coverage probability and p is called the confidence level (C.L.).
Note: the confidence level is a property of the ensemble to which the statements belong. Consequently, the confidence level may change if the ensemble changes.

Confidence Intervals – 1
Consider an experiment that observes D events with expected signal s and no background. Neyman devised a way to make statements of the form s ∈ [l(D), u(D)] with the guarantee that at least a fraction p of them will be true. But, since we do not know the true value of s, this property must hold whatever the true value of s.
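To make the coverage idea concrete, here is a minimal simulation sketch (not from the lecture): it estimates, for a few assumed true values of s, the fraction of rough D ± √D intervals that contain s.

```python
# Sketch: coverage probability of the rough interval [D - sqrt(D), D + sqrt(D)]
# for a Poisson count with no background (the true mean s is known only to the simulation).
import numpy as np

rng = np.random.default_rng(seed=1)

def coverage(s_true, n_trials=100_000):
    D = rng.poisson(s_true, size=n_trials)            # ensemble of repeated experiments
    lower, upper = D - np.sqrt(D), D + np.sqrt(D)     # rough interval for each experiment
    return np.mean((lower <= s_true) & (s_true <= upper))

for s in (2.0, 5.0, 20.0, 100.0):
    print(f"s = {s:6.1f}   coverage = {coverage(s):.3f}")
```

The coverage approaches 68% for large s but can deviate noticeably for small s, which is the motivation for the exact constructions discussed next.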

Confidence Intervals – 2
For each value of s, find a region in the observation space with probability f ≥ CL. [Figure: the observation space and the parameter space.]

Confidence Intervals – 3
Central intervals (Neyman): equal probabilities on either side.
Feldman–Cousins intervals: contain the largest values of the ratio p(D | s) / p(D | D).
Mode-centered intervals: contain the largest probabilities p(D | s).
By construction, all these intervals satisfy the frequentist principle: coverage probability ≥ confidence level.
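For this simple Poisson problem the central interval can be constructed numerically; the sketch below (illustrative helper names, not code from the lecture) solves the two tail conditions that define the equal-probability endpoints.

```python
# Sketch: central (equal-tailed) interval for a Poisson mean s given an observed
# count D and no background.  The endpoints satisfy
#   P(X >= D | s_lo) = alpha/2   and   P(X <= D | s_hi) = alpha/2,   alpha = 1 - CL.
from scipy.stats import poisson
from scipy.optimize import brentq

def central_interval(D, cl=0.68):
    alpha = 1.0 - cl
    s_lo = 0.0
    if D > 0:
        s_lo = brentq(lambda s: poisson.sf(D - 1, s) - alpha / 2, 1e-10, 10 * D + 10)
    s_hi = brentq(lambda s: poisson.cdf(D, s) - alpha / 2, 1e-10, 10 * D + 50)
    return s_lo, s_hi

print(central_interval(17))   # compare with the rough interval 17 ± √17
```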

Confidence Intervals – 4, 5, 6
[Figures: comparison of the central, Feldman–Cousins, and mode-centered intervals with the rough D ± √D interval.]

The Frequentist Approach: The Profile Likelihood

Maximum Likelihood – 1
Example: Top Quark Discovery (1995), D0 results:
D = 17 events
B = 3.8 ± 0.6 events, where B = Q/k and δB = √Q/k

Maximum Likelihood – 2
Knowns: D = 17 events, B = 3.8 ± 0.6 background events.
Unknowns: b, the expected background count; s, the expected signal count.
Find the maximum likelihood estimates (MLE) of s and b by maximizing the likelihood p(D, Q | s, b) = Poisson(D | s + b) Poisson(Q | kb); the result (see Profile Likelihood – 4) is ŝ = D − B and b̂ = B.
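A numerical sketch of this fit (not code from the lecture), assuming the two-Poisson model written above; variable names are illustrative, and the fitted values can be compared with the closed-form results ŝ = D − B and b̂ = B.

```python
# Sketch: numerical MLE for the model D ~ Poisson(s + b), Q ~ Poisson(k b),
# where Q and k are recovered from B = Q/k and dB = sqrt(Q)/k.
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

D, B, dB = 17, 3.8, 0.6
Q = (B / dB) ** 2            # ~ 40.1 effective count in the background measurement
k = Q / B                    # ~ 10.6 scale factor of the subsidiary measurement

def log_poisson(n, mu):
    # log Poisson(n | mu); written with gammaln so n need not be an integer
    return n * np.log(mu) - mu - gammaln(n + 1.0)

def nll(params):
    s, b = params
    if s < 0.0 or b <= 0.0:
        return np.inf
    return -(log_poisson(D, s + b) + log_poisson(Q, k * b))

fit = minimize(nll, x0=[10.0, 4.0], method="Nelder-Mead")
s_hat, b_hat = fit.x
print(f"s_hat = {s_hat:.2f} (D - B = {D - B:.2f}),  b_hat = {b_hat:.2f} (B = {B})")
```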

Maximum Likelihood – 3
The Good:
Maximum likelihood estimates (MLE) are consistent: the RMS goes to zero as more and more data are acquired.
If an unbiased estimate for a parameter exists, the maximum likelihood procedure will find it.
Given the MLE ŝ for s, the MLE for y = g(s) is just ŷ = g(ŝ).
The Bad (according to some!):
In general, MLEs are biased. (Exercise 7: Show this. Hint: consider a Taylor expansion about the MLE.)
The Ugly:
Correcting for bias, however, can waste data and sometimes yield absurdities.

Example: The Seriously Ugly
The moment generating function of a probability distribution P(k) is the average M(t) = ⟨e^{tk}⟩ = Σ_k e^{tk} P(k). For the binomial, this is M(t) = (p e^t + 1 − p)^n (Exercise 8a: Show this), which is useful for calculating moments, e.g., the second moment M₂ = ⟨k²⟩ = (np)² + np − np².

Example: The Seriously Ugly
Given that k events out of n pass a set of cuts, the MLE of the event selection efficiency is p̂ = k/n, and the obvious estimate of p² is k²/n². But k²/n² is a biased estimate of p² (Exercise 8b: Show this). The best unbiased estimate of p² is k(k − 1) / [n(n − 1)] (Exercise 8c: Show this). Note: for a single success in n trials, p̂ = 1/n, but the estimate of p² is zero!
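A quick simulation sketch (not part of the lecture) that checks these statements by averaging both estimators over many binomial experiments:

```python
# Sketch: bias of k^2/n^2 versus k(k-1)/[n(n-1)] as estimates of p^2
# for a binomial with n trials and true efficiency p.
import numpy as np

rng = np.random.default_rng(seed=2)
n, p, n_trials = 10, 0.3, 1_000_000

k = rng.binomial(n, p, size=n_trials)
naive    = (k / n) ** 2                    # biased: its mean is p^2 + p(1 - p)/n
unbiased = k * (k - 1) / (n * (n - 1))     # unbiased estimate of p^2

print(f"true p^2              = {p**2:.5f}")
print(f"mean of k^2/n^2       = {naive.mean():.5f}")
print(f"mean of k(k-1)/n(n-1) = {unbiased.mean():.5f}")
```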

The Profile Likelihood – 1
In order to make an inference about the signal s, the two-parameter problem, with likelihood p(D | s, b), must be reduced to one involving s only by getting rid of all nuisance parameters, such as b. In principle, this must be done while respecting the frequentist principle: coverage probability ≥ confidence level. This is very difficult to do exactly.

The Profile Likelihood – 2
In practice, we replace all nuisance parameters by their conditional maximum likelihood estimates (CMLE), which yields a function called the profile likelihood, p_PL(D | s). In the top quark discovery example, we find an estimate of b as a function of s, b̂(s). Then, in the likelihood p(D | s, b), b is replaced by this estimate. Since this is an approximation, the frequentist principle is not guaranteed to be satisfied exactly.

The Profile Likelihood – 3
Wilks' Theorem (1938): if certain conditions are met, and p_max is the value of the likelihood p(D | s, b) at its maximum, the quantity y(s) = −2 ln[ p_PL(D | s) / p_max ] has a density that is asymptotically χ². Therefore, by setting y(s) = 1 and solving for s, we can compute approximate 68% confidence intervals. This is what Minuit (now TMinuit) has been doing for 40 years!

The Profile Likelihood – 4
For the two-Poisson model, the CMLE of b at fixed s is the positive root of a quadratic, b̂(s) = [g + √(g² + 4(1 + k)Qs)] / [2(1 + k)] with g = D + Q − (1 + k)s; the mode (peak) of the likelihood is at ŝ = D − B, b̂ = B.

The Profile Likelihood – 5
By setting y(s) = 1 and solving for s, we obtain an approximate 68% C.L. interval for the signal. (Exercise 9: Show this.)
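A sketch of the profile likelihood scan for this example, assuming the same two-Poisson model as above (helper names are illustrative); the approximate 68% interval is read off where y(s) crosses 1.

```python
# Sketch: profile likelihood y(s) = -2 ln[ p_PL(D|s) / p_max ] for the model
# D ~ Poisson(s + b), Q ~ Poisson(k b), and the approximate 68% interval from y(s) = 1.
import numpy as np
from scipy.special import gammaln

D, B, dB = 17, 3.8, 0.6
Q = (B / dB) ** 2
k = Q / B

def log_poisson(n, mu):
    return n * np.log(mu) - mu - gammaln(n + 1.0)

def b_hat(s):
    # conditional MLE of b at fixed s (positive root of the quadratic)
    g = D + Q - (1.0 + k) * s
    return (g + np.sqrt(g * g + 4.0 * (1.0 + k) * Q * s)) / (2.0 * (1.0 + k))

def log_profile(s):
    b = b_hat(s)
    return log_poisson(D, s + b) + log_poisson(Q, k * b)

s_grid = np.linspace(0.0, 35.0, 3501)
y = -2.0 * (np.array([log_profile(s) for s in s_grid]) - log_profile(D - B))
inside = s_grid[y <= 1.0]
print(f"s_hat = {D - B:.1f}, approximate 68% interval: [{inside.min():.2f}, {inside.max():.2f}]")
```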

The Frequentist Approach: Hypothesis Tests

Hypothesis Tests
Fisher's Approach: null hypothesis (H₀), background-only. The p-value is the probability, under H₀, of a result at least as extreme as the one observed, p-value = P(x ≥ x_obs | H₀). The null hypothesis is rejected if the p-value is judged to be small enough. Note: this can be calculated only if p(x | H₀) is known.

Example – Top Quark Discovery
Background, B = 3.8 events (ignoring its uncertainty); D is the observed count. The p-value is P(X ≥ D | B) = Σ_{k=D}^∞ Poisson(k | B), which for D = 17 is equivalent to 4.9σ.
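This calculation can be reproduced in a few lines (a sketch, using the numbers quoted on the slide):

```python
# Sketch: p-value for the background-only hypothesis, ignoring the uncertainty on B:
# p = P(X >= D | B) for X ~ Poisson(B), converted to a one-sided Gaussian significance.
from scipy.stats import poisson, norm

D, B = 17, 3.8
p_value = poisson.sf(D - 1, B)    # P(X >= 17 | mean 3.8)
Z = norm.isf(p_value)             # one-sided significance, ~4.9 sigma
print(f"p-value = {p_value:.2e},  Z = {Z:.2f} sigma")
```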

Hypothesis Tests – 2
Neyman's Approach: null hypothesis (H₀) + alternative hypothesis (H₁). Neyman argued forcefully that it is necessary to consider alternative hypotheses H₁. The significance (or size) of the test is α, the probability of rejecting H₀ when H₀ is true; a fixed significance α is chosen before the data are analyzed.

The Neyman–Pearson Test
In Neyman's approach, hypothesis tests are a contest between the significance of the test and its power, i.e., the probability to accept a true alternative (the probability of rejecting H₀ when H₁ is true).

The Neyman–Pearson Test
Power curve: power vs. significance of the test. [Figure: two power curves; the blue analysis is the more powerful below the cross-over point and the green one is the more powerful above it.] Note: in general, no analysis is uniformly the most powerful.
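As an illustration of this trade-off (a sketch, not the particular analyses plotted on the slide), a power curve for a simple counting test can be traced out by scanning the cut on the observed count:

```python
# Sketch: power vs. significance for a counting test.
# H0: X ~ Poisson(b);  H1: X ~ Poisson(s + b);  reject H0 when X >= c.
from scipy.stats import poisson

b, s = 3.8, 14.0
for c in range(1, 21):
    alpha = poisson.sf(c - 1, b)        # significance (size) of the test
    power = poisson.sf(c - 1, s + b)    # probability to reject H0 when H1 is true
    print(f"cut X >= {c:2d}:  alpha = {alpha:.2e},  power = {power:.3f}")
```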

The Bayesian Approach

Definition: A method is Bayesian if
1. it is based on the degree-of-belief interpretation of probability, and
2. it uses Bayes' theorem, p(θ, ω | D) = p(D | θ, ω) π(θ, ω) / p(D), for all inferences.
Notation: D, the observed data; θ, the parameter of interest; ω, the nuisance parameters; π, the prior density.

The Bayesian Approach – 2
Bayesian analysis is just applied probability theory. Therefore, the method for eliminating nuisance parameters is "simply" to integrate them out, p(θ | D) = ∫ p(θ, ω | D) dω, a procedure called marginalization. The integral is a weighted average of the likelihood.

The Bayesian Approach: An Example

Example – Top Quark Discovery – 1
Top discovery data: D = 17 events, B = 3.8 ± 0.6 events.
Calculations:
1. Compute the posterior density p(s | D).
2. Compute the Bayes factor B₁₀ = p(D | H₁) / p(D | H₀).
3. Compute the p-value, including the effect of the background uncertainty.

Example – Top Quark Discovery – 2
Step 1: Construct a probability model for the observations, p(D, Q | s, b) = Poisson(D | s + b) Poisson(Q | kb), then put in the data, D = 17 events and B = 3.8 ± 0.6 background events with B = Q/k and δB = √Q/k, to arrive at the likelihood.

Example – Top Quark Discovery – 3
Step 2: Write down Bayes' theorem, p(s, b | D) = p(D | s, b) π(s, b) / p(D), and specify the prior, π(s, b) = π(b | s) π(s). It is useful to compute the marginal likelihood p(D | s) = ∫ p(D | s, b) π(b | s) db, sometimes referred to as the evidence for s.

Example – Top Quark Discovery – 4
The prior: what do π(b | s) and π(s) represent? They encode what we know, or assume, about the mean background and signal in the absence of new observations. We shall assume that s and b are non-negative. After a century of argument, the consensus today is that there is no unique way to represent such vague information.

Example – Top Quark Discovery – 5
For simplicity, we take π(b | s) = 1. We may now eliminate b from the problem by integration, p(D | s, H₁) = ∫₀^∞ p(D, Q | s, b) db, an integral that can be done in closed form (Exercise 10: Show this), and where we have introduced the symbol H₁ to denote the background + signal hypothesis.
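A numerical sketch of this marginalization for the two-Poisson model with π(b | s) = 1 (helper names are illustrative). The overall normalization convention may differ from the slides by a constant factor, which cancels in the posterior density and in the Bayes factor.

```python
# Sketch: marginal likelihood p(D | s, H1) ∝ ∫ Poisson(D | s+b) Poisson(Q | k b) db
# with a flat prior for b, evaluated by numerical integration.
import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln

D, B, dB = 17, 3.8, 0.6
Q = (B / dB) ** 2
k = Q / B

def log_poisson(n, mu):
    return n * np.log(mu) - mu - gammaln(n + 1.0)

def marginal_likelihood(s):
    integrand = lambda b: np.exp(log_poisson(D, s + b) + log_poisson(Q, k * b))
    return quad(integrand, 1e-12, 30.0)[0]   # integrand is negligible beyond b ~ 30

for s in (0.0, 5.0, 10.0, 14.0, 20.0):
    print(f"s = {s:5.1f}   p(D | s, H1) ∝ {marginal_likelihood(s):.3e}")
```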

[Figure: p(17 | s, H₁) as a function of the expected signal s.]

Example – Top Quark Discovery – 7
The posterior density: given the marginal likelihood p(D | s, H₁), we can compute p(s | D, H₁) = p(D | s, H₁) π(s | H₁) / p(D | H₁), where p(D | H₁) = ∫₀^∞ p(D | s, H₁) π(s | H₁) ds.

Example – Top Quark Discovery – 8
Assuming a flat prior for the signal, π(s | H₁) = 1, the posterior density is given by p(s | D, H₁) = p(D | s, H₁) / ∫₀^∞ p(D | s, H₁) ds, from which we can compute the 68% central credible interval. (Exercise 11: Derive an expression for p(s | D, H₁) assuming a gamma prior Gamma(qs, U + 1) for π(s | H₁).)
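A numerical sketch of this posterior and its 68% central credible interval, assuming the two-Poisson model and flat priors as above (helper names are illustrative):

```python
# Sketch: posterior density p(s | D, H1) with flat priors for s and b,
# and the 68% central credible interval from its quantiles.
import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln

D, B, dB = 17, 3.8, 0.6
Q = (B / dB) ** 2
k = Q / B

def log_poisson(n, mu):
    return n * np.log(mu) - mu - gammaln(n + 1.0)

def marginal_likelihood(s):
    f = lambda b: np.exp(log_poisson(D, s + b) + log_poisson(Q, k * b))
    return quad(f, 1e-12, 30.0)[0]

s_grid = np.linspace(1e-6, 40.0, 2001)
ds = s_grid[1] - s_grid[0]
like = np.array([marginal_likelihood(s) for s in s_grid])
post = like / (like.sum() * ds)               # normalized posterior, flat prior pi(s | H1) = 1

cdf = np.cumsum(post) * ds
lo = s_grid[np.searchsorted(cdf, 0.16)]
hi = s_grid[np.searchsorted(cdf, 0.84)]
print(f"68% central credible interval: [{lo:.1f}, {hi:.1f}]")
```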

The Bayesian Approach: Hypothesis Testing

Bayesian Hypothesis Testing – 1
Conceptually, Bayesian hypothesis testing proceeds in exactly the same way as any other Bayesian calculation: compute the posterior density (posterior ∝ prior × likelihood), marginalize it with respect to all parameters except those that label the hypotheses, and of course get your p(H | D) — your "pHD"!

Bayesian Hypothesis Testing – 2
However, just like your PhD, it is usually more convenient, and instructive, to arrive at p(H | D) in stages.
1. Factorize the priors: π(θ, ω, H) = π(θ, ω | H) π(H).
2. Then, for each hypothesis H, compute the function p(D | H) = ∫ p(D | θ, ω, H) π(θ, ω | H) dθ dω.
3. Then, compute the probability of each hypothesis: p(H | D) = p(D | H) π(H) / Σ_H p(D | H) π(H).

Bayesian Hypothesis Testing – 3
It is clear, however, that to compute p(H | D) it is necessary to specify the priors π(H). Unfortunately, consensus on these numbers is highly unlikely! Instead of asking for the probability of an hypothesis, p(H | D), we could compare probabilities:
p(H₁ | D) / p(H₀ | D) = [ p(D | H₁) / p(D | H₀) ] × [ π(H₁) / π(H₀) ].
The ratio in the first bracket is called the Bayes factor, B₁₀.

Bayesian Hypothesis Testing – 4
In principle, in order to compute this number we need to specify a proper prior for the signal, that is, a prior that integrates to one. For simplicity, let's assume π(s | H₁) = δ(s − 14). This yields
p(D | H₀) = 3.86 × 10⁻⁶
p(D | H₁) = p(D | 14, H₁) = 9.28 × 10⁻².

Bayesian Hypothesis Testing – 5
Since p(D | H₀) = 3.86 × 10⁻⁶ and p(D | H₁) = 9.28 × 10⁻², we conclude that the hypothesis with s = 14 events is favored over that with s = 0 by about 24,000 to 1. The Bayes factor can be mapped to a measure akin to "n-sigma", Z = √(2 ln B₁₀). (Exercise 12: Compute Z for the D0 results.)
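A numerical sketch of the Bayes factor for π(s | H₁) = δ(s − 14), assuming the same two-Poisson model (any constant normalization of the marginal likelihood cancels in the ratio); the value obtained this way is of the same order as the number quoted above, with the exact value depending on how the background information is encoded.

```python
# Sketch: Bayes factor B10 = p(D | H1) / p(D | H0) with pi(s | H1) = delta(s - 14),
# and the rough significance mapping Z = sqrt(2 ln B10).
import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln

D, B, dB = 17, 3.8, 0.6
Q = (B / dB) ** 2
k = Q / B

def log_poisson(n, mu):
    return n * np.log(mu) - mu - gammaln(n + 1.0)

def marginal_likelihood(s):
    f = lambda b: np.exp(log_poisson(D, s + b) + log_poisson(Q, k * b))
    return quad(f, 1e-12, 30.0)[0]

B10 = marginal_likelihood(14.0) / marginal_likelihood(0.0)
Z = np.sqrt(2.0 * np.log(B10))
print(f"B10 = {B10:.3g},  Z = {Z:.2f} sigma")
```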

Hypothesis Testing – A Hybrid Approach
Background, B = 3.8 ± 0.6 events; D is the observed count. The p-value is now computed after marginalizing over the background uncertainty. This is equivalent to 4.4σ, which may be compared with the 4.5σ obtained with B₁₀. (Exercise 13: Verify this calculation.)
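A sketch of this hybrid calculation, assuming the background uncertainty is encoded by the gamma density that the subsidiary measurement implies for b under a flat prior (helper names are illustrative):

```python
# Sketch: "hybrid" p-value, averaging the Poisson tail probability P(X >= D | b)
# over the background density b ~ Gamma(Q + 1, scale = 1/k).
from scipy import stats
from scipy.integrate import quad

D, B, dB = 17, 3.8, 0.6
Q = (B / dB) ** 2
k = Q / B

def integrand(b):
    return stats.poisson.sf(D - 1, b) * stats.gamma.pdf(b, a=Q + 1.0, scale=1.0 / k)

p_value = quad(integrand, 1e-12, 30.0)[0]
Z = stats.norm.isf(p_value)
print(f"p-value = {p_value:.2e},  Z = {Z:.2f} sigma")   # the slide quotes about 4.4 sigma
```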

Summary
Frequentist Approach
1) Models statistical uncertainties with probabilities.
2) Uses the likelihood.
3) Ideally, respects the frequentist principle.
4) In practice, eliminates nuisance parameters through the approximate procedure of profiling.
Bayesian Approach
1) Models all uncertainties with probabilities.
2) Uses the likelihood and prior probabilities.
3) Eliminates nuisance parameters through marginalization.