Presentation on theme: "G. Cowan, Statistical Data Analysis: Lecture 6" — Presentation transcript:

1 G. Cowan, Lectures on Statistical Data Analysis — Statistical Data Analysis: Lecture 6

1. Probability, Bayes' theorem
2. Random variables and probability densities
3. Expectation values, error propagation
4. Catalogue of pdfs
5. The Monte Carlo method
6. Statistical tests: general concepts
7. Test statistics, multivariate methods
8. Goodness-of-fit tests
9. Parameter estimation, maximum likelihood
10. More maximum likelihood
11. Method of least squares
12. Interval estimation, setting limits
13. Nuisance parameters, systematic uncertainties
14. Examples of Bayesian approach

2 Example setting for statistical tests: the Large Hadron Collider

Counter-rotating proton beams in a 27 km circumference ring; pp centre-of-mass energy 14 TeV. Detectors at 4 pp collision points:
ATLAS (general purpose)
CMS (general purpose)
LHCb (b physics)
ALICE (heavy-ion physics)

3 The ATLAS detector

2100 physicists, 37 countries, 167 universities/labs
25 m diameter, 46 m length, 7000 tonnes, ~10^8 electronic channels

4 A simulated SUSY event in ATLAS

Signature: high-pT muons, high-pT jets of hadrons, and missing transverse energy.

5 Background events

This event from Standard Model ttbar production also has high-pT jets and muons, and some missing transverse energy → it can easily mimic a SUSY event.

6 Statistical tests (in a particle physics context)

Suppose the result of a measurement for an individual event is a collection of numbers:
x1 = number of muons, x2 = mean pT of jets, x3 = missing energy, ...
The vector x = (x1, ..., xn) follows some n-dimensional joint pdf, which depends on the type of event produced (signal, one of the backgrounds, etc.). For each reaction we consider we will have a hypothesis for the pdf of x, e.g., g(x|H0), g(x|H1), ...

Often call H0 the signal hypothesis (the event type we want); H1, H2, ... are background hypotheses.

7 Selecting events

Suppose we have a data sample with two kinds of events, corresponding to hypotheses H0 and H1, and we want to select those of type H0. Each event is a point in x-space. What 'decision boundary' should we use to accept/reject events as belonging to event type H0?

Perhaps select events with 'cuts', i.e., thresholds on the individual variables, accepting events in the resulting rectangular region.
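As a minimal sketch (not from the lecture), a cut-based selection on two hypothetical variables might look as follows; the distributions, thresholds, and variable meanings are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2D events: column 0 = number of muons (smeared),
# column 1 = missing energy in arbitrary units.
signal = rng.normal(loc=[2.0, 50.0], scale=[0.5, 10.0], size=(1000, 2))
background = rng.normal(loc=[0.5, 20.0], scale=[0.5, 10.0], size=(1000, 2))

def pass_cuts(events, x1_min=1.0, x2_min=35.0):
    """Accept an event as H0 (signal-like) if every variable exceeds its cut."""
    return (events[:, 0] > x1_min) & (events[:, 1] > x2_min)

eff_s = pass_cuts(signal).mean()      # fraction of signal accepted
eff_b = pass_cuts(background).mean()  # fraction of background accepted
```

Tightening the cuts lowers the background efficiency, but at the cost of signal efficiency; the trade-off is made quantitative on the later slides.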

8 Other ways to select events

Or maybe use some other sort of decision boundary, linear or nonlinear.

How can we do this in an 'optimal' way? What are the difficulties in a high-dimensional space?

9 Test statistics

Construct a 'test statistic' of lower dimension (e.g. a scalar), t(x1, ..., xn). We can work out the pdfs g(t|H0) and g(t|H1).

The goal is to compactify the data without losing the ability to discriminate between hypotheses. The decision boundary is now a single 'cut' on t, which effectively divides the sample space into two regions, where we accept or reject H0.

10 Significance level and power of a test

Accepting H0 for t < t_cut, the probability to reject H0 if it is true (error of the 1st kind) is

α = ∫_{t ≥ t_cut} g(t|H0) dt   (significance level)

and the probability to accept H0 if H1 is true (error of the 2nd kind) is

β = ∫_{t < t_cut} g(t|H1) dt   (1 − β = power).
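A minimal numerical illustration, assuming (hypothetically) Gaussian pdfs g(t|H0) = N(0,1) and g(t|H1) = N(3,1), with H0 accepted for t < t_cut:

```python
from math import erf, sqrt

def norm_cdf(x, mu, sigma):
    """Cumulative distribution of a Gaussian, via the error function."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

t_cut = 1.5
alpha = 1.0 - norm_cdf(t_cut, 0.0, 1.0)  # P(reject H0 | H0): significance level
beta = norm_cdf(t_cut, 3.0, 1.0)         # P(accept H0 | H1): error of 2nd kind
power = 1.0 - beta
```

Moving t_cut trades α against β: a tighter cut (smaller t_cut) lowers β but raises α.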

11 Efficiency of event selection

Probability to accept an event which is signal (signal efficiency):

ε_s = ∫_{accept} g(t|s) dt

Probability to accept an event which is background (background efficiency):

ε_b = ∫_{accept} g(t|b) dt

12 Purity of event selection

Suppose only one background type b; the overall fractions of signal and background events are π_s and π_b (prior probabilities). Suppose we select events with t < t_cut. What is the 'purity' of our selected sample? Here purity means the probability to be signal given that the event was accepted. Using Bayes' theorem we find:

P(s | accepted) = ε_s π_s / (ε_s π_s + ε_b π_b)

So the purity depends on the prior probabilities as well as on the signal and background efficiencies.
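The Bayes'-theorem formula for purity is one line of code; the numbers below (a rare signal with 1% prior and a selection with hypothetical efficiencies) are invented to show how a small prior drags the purity down even for a good selection:

```python
def purity(eff_s, eff_b, pi_s, pi_b):
    """P(signal | accepted) = eff_s*pi_s / (eff_s*pi_s + eff_b*pi_b)."""
    return eff_s * pi_s / (eff_s * pi_s + eff_b * pi_b)

# Hypothetical: 90% signal efficiency, 1% background efficiency,
# but signal is only 1% of all events before selection.
p = purity(eff_s=0.9, eff_b=0.01, pi_s=0.01, pi_b=0.99)
```

Despite a 90:1 efficiency ratio, fewer than half of the accepted events are signal here, because background dominates the prior.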

13 Constructing a test statistic

How can we select events in an 'optimal' way? The Neyman-Pearson lemma (proof in Brandt, Ch. 8) states: to get the lowest ε_b for a given ε_s (highest power for a given significance level), choose the acceptance region such that

g(x|H0) / g(x|H1) > c,

where c is a constant which determines ε_s. Equivalently, the optimal scalar test statistic is

t(x) = g(x|H0) / g(x|H1).

N.B. any monotonic function of this is just as good.
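A sketch of the likelihood-ratio statistic for a case where the pdfs are known, here (hypothetically) two unit-width Gaussians; for equal widths the log of the ratio is linear in x, illustrating the remark that any monotonic function is just as good:

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def likelihood_ratio(x, mu0=0.0, mu1=3.0, sigma=1.0):
    """Neyman-Pearson optimal statistic t(x) = g(x|H0) / g(x|H1)."""
    return gauss_pdf(x, mu0, sigma) / gauss_pdf(x, mu1, sigma)

# For equal-sigma Gaussians, log t = 4.5 - 3x here, so t decreases
# monotonically in x and a cut on t is equivalent to a cut on x.
x = np.array([-1.0, 0.0, 1.0, 2.0, 3.0])
t = likelihood_ratio(x)
```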

14 Purity vs. efficiency — optimal trade-off

Consider selecting n events: expected numbers s from signal, b from background → n ~ Poisson(s + b).

Suppose b is known and the goal is to estimate s with minimum relative statistical error. Take as estimator: ŝ = n − b.

The variance of a Poisson variable equals its mean, therefore

σ_ŝ / s = √(s + b) / s.

So we should maximize s / √(s + b), which is equivalent to maximizing (signal efficiency) × (purity).
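The optimal cut can be found numerically by scanning t_cut and maximizing s/√(s + b). The setup below is hypothetical: expected yields S and B before cuts, and Gaussian test-statistic pdfs N(0,1) for signal and N(3,1) for background, accepting t < t_cut:

```python
from math import erf, sqrt

def norm_cdf(x, mu, sigma):
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

S, B = 100.0, 10000.0  # hypothetical expected signal and background yields

def significance(t_cut):
    s = S * norm_cdf(t_cut, 0.0, 1.0)  # accepted signal
    b = B * norm_cdf(t_cut, 3.0, 1.0)  # accepted background
    return s / sqrt(s + b) if s + b > 0 else 0.0

# Scan the cut on a grid and keep the value maximizing s / sqrt(s + b).
cuts = [i * 0.01 for i in range(-300, 601)]
best_cut = max(cuts, key=significance)
```

The optimum is neither the loosest nor the tightest cut: loosening admits too much background, tightening throws away too much signal.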

15 Why Neyman-Pearson doesn't always help

The problem is that we usually don't have explicit formulae for the pdfs g(x|H0), g(x|H1). Instead we may have Monte Carlo models for signal and background processes, so we can produce simulated data and enter each event into an n-dimensional histogram. Use e.g. M bins for each of the n dimensions, for a total of M^n cells.

But n is potentially large → a prohibitively large number of cells to populate with Monte Carlo data.

Compromise: make an Ansatz for the form of the test statistic with fewer parameters; determine them (e.g. using MC) to give the best discrimination between signal and background.
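To see how quickly M^n becomes prohibitive, take an illustrative (hypothetical) binning of 20 bins in each of 10 dimensions:

```python
# Number of histogram cells with M bins per dimension and n dimensions.
M, n = 20, 10    # hypothetical binning and dimensionality
cells = M ** n   # ~1e13 cells: hopeless to populate with Monte Carlo events
```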

16 Multivariate methods

Many new (and some old) methods:
Fisher discriminant
Neural networks
Kernel density methods
Support Vector Machines
Decision trees
Boosting
Bagging

New software for HEP, e.g.:
TMVA — Höcker, Stelzer, Tegenfeldt, Voss, Voss, physics/0703039
StatPatternRecognition — I. Narsky, physics/0507143

17 Linear test statistic

Ansatz: t(x) = Σ_i a_i x_i

Choose the parameters a_1, ..., a_n so that the pdfs g(t|H0), g(t|H1) have maximum 'separation': a large distance between the mean values τ0, τ1 and small widths Σ0, Σ1.

→ Fisher: maximize J(a) = (τ0 − τ1)² / (Σ0² + Σ1²)

18 Determining coefficients for maximum separation

We have, for each hypothesis k = 0, 1, the means and covariances of x:

(μ_k)_i = E[x_i | H_k],   (V_k)_ij = cov[x_i, x_j | H_k].

In terms of the mean and variance of t(x) this becomes

τ_k = E[t | H_k] = Σ_i a_i (μ_k)_i,   Σ_k² = V[t | H_k] = Σ_{i,j} a_i a_j (V_k)_ij.

19 Determining the coefficients (2)

The numerator of J(a) is

(τ0 − τ1)² = Σ_{i,j} a_i a_j B_ij,   with B_ij = (μ0 − μ1)_i (μ0 − μ1)_j   ('between' classes)

and the denominator is

Σ0² + Σ1² = Σ_{i,j} a_i a_j W_ij,   with W_ij = (V0 + V1)_ij   ('within' classes)

→ maximize J(a) = (aᵀ B a) / (aᵀ W a).

20 Fisher discriminant

Setting ∂J/∂a_i = 0 gives Fisher's linear discriminant function:

t(x) = aᵀ x,   with a ∝ W⁻¹ (μ0 − μ1).

This corresponds to a linear decision boundary.
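A sketch of estimating Fisher's coefficients from training samples, using a ∝ W⁻¹(μ0 − μ1) with W the sum of the class covariance matrices; the two Gaussian training samples below are hypothetical stand-ins for Monte Carlo events:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 2D training samples for H0 (signal) and H1 (background),
# equal covariance, well-separated means.
cov = [[1.0, 0.5], [0.5, 1.0]]
x0 = rng.multivariate_normal([1.0, 1.0], cov, size=5000)
x1 = rng.multivariate_normal([-1.0, -1.0], cov, size=5000)

mu0, mu1 = x0.mean(axis=0), x1.mean(axis=0)
W = np.cov(x0, rowvar=False) + np.cov(x1, rowvar=False)  # 'within-class' matrix

# Fisher coefficients (defined only up to an arbitrary scale factor).
a = np.linalg.solve(W, mu0 - mu1)

t0 = x0 @ a  # discriminant values, H0 sample
t1 = x1 @ a  # discriminant values, H1 sample
```

The sample means here replace the expectation values, as the least-squares slide notes; a single cut on t then gives the linear decision boundary.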

21 Fisher discriminant: comment on least squares

We obtain equivalent separation between hypotheses if we multiply the a_i by a common scale factor and add an arbitrary offset a_0:

t(x) = a_0 + Σ_i a_i x_i.

Thus we can fix the mean values τ0 and τ1 under the null and alternative hypotheses to arbitrary values, e.g., 0 and 1. Then maximizing J(a) is equivalent to minimizing Σ0² + Σ1².

Maximizing Fisher's J(a) → 'least squares'.

In practice, expectation values are replaced by averages using samples of training data, e.g., from Monte Carlo models.

22 Fisher discriminant for Gaussian data

Suppose g(x|H_k) is multivariate Gaussian with mean values μ0, μ1 and covariance matrices V0 = V1 = V for both. We can write the Fisher discriminant (with an offset) as

t(x) = a_0 + (μ0 − μ1)ᵀ V⁻¹ x.

Then the likelihood ratio becomes

g(x|H0) / g(x|H1) = exp[ (μ0 − μ1)ᵀ V⁻¹ x − ½ (μ0 + μ1)ᵀ V⁻¹ (μ0 − μ1) ].

23 Fisher discriminant for Gaussian data (2)

That is, t(x) equals ln[ g(x|H0) / g(x|H1) ] up to a constant, i.e., it is a monotonic function of the likelihood ratio. So for this case the Fisher discriminant is equivalent to using the likelihood ratio, and thus gives maximum purity for a given efficiency.

For non-Gaussian data this no longer holds, but the linear discriminant function may be the simplest practical solution. One often tries to transform the data so as to better approximate a Gaussian before constructing the Fisher discriminant.

24 (figure-only slide, no transcript text)

25 (figure-only slide, no transcript text)

26 Fisher discriminant and Gaussian data (3)

Multivariate Gaussian data with equal covariance matrices also gives a simple expression for posterior probabilities, e.g.,

P(H0 | x) = π0 g(x|H0) / ( π0 g(x|H0) + π1 g(x|H1) ).

For a particular choice of the offset a_0 (absorbing the priors and the constant term of the log-likelihood ratio) this can be written

P(H0 | x) = 1 / (1 + e^(−t(x))) = s(t(x)),

which is the logistic sigmoid function s(a) = 1 / (1 + e^(−a)). (We will use this later in connection with Neural Networks.)
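A small numerical check of this identity in 1D, with hypothetical equal-width Gaussians and equal priors: the Bayes posterior computed directly agrees with the sigmoid of the log-likelihood ratio, which for equal widths is linear in x:

```python
import numpy as np

def sigmoid(a):
    """Logistic sigmoid s(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

def gauss_pdf(x, mu, sigma=1.0):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Hypothetical 1D case: equal sigma, equal priors (so a_0 = 0 here).
mu0, mu1, x = 1.0, -1.0, 0.3

# Direct Bayes posterior...
post = gauss_pdf(x, mu0) / (gauss_pdf(x, mu0) + gauss_pdf(x, mu1))

# ...equals the sigmoid of the log-likelihood ratio (here ln LR = 2x).
a = np.log(gauss_pdf(x, mu0) / gauss_pdf(x, mu1))
post_sigmoid = sigmoid(a)
```

This is the same functional form a neural network's output node uses, which is why the sigmoid reappears in the next lecture.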

27 Wrapping up lecture 6

We looked at statistical tests and related issues: discriminating between event types (hypotheses), determining selection efficiency, sample purity, etc.

We discussed a method to construct a test statistic using a linear function of the data: the Fisher discriminant.

Next we will discuss nonlinear test variables such as neural networks.

