Statistical Data Analysis and Simulation
Jorge Andre Swieca School, Campos do Jordão, January 2003
João R. T. de Mello Neto


2 Questions
What is probability? How can it be quantified? What is the probability of something happening? What is the value of a given parameter? What is the uncertainty in a given parameter? Is this fit acceptable? What is the likelihood that a given signal is physics and not background? How does one separate signal from background?

3 Chance
"The conception of chance enters into the very first steps of scientific activity, in virtue of the fact that no observation is absolutely correct." Max Born, Natural Philosophy of Cause and Chance, p. 47
"Chance is a devil and a god at the same time." Machado de Assis

4 Lectures
Basics: random variables, probability, distributions; random numbers, minimization techniques; maximum likelihood and chi-square methods; goodness of fit, limits; applications: pattern recognition in the LHCb muon system, sigma particle fitting in E791, Bayesian coin, …

5 Basics: random variables, probabilities and distributions
First lecture

6 References
G. Cowan, Statistical Data Analysis, Oxford, 1998.
R. Barlow, Statistics: A Guide to the Use of Statistical Methods in the Physical Sciences, J. Wiley & Sons, 1989.
W. L. Martinez and A. R. Martinez, Computational Statistics Handbook with MATLAB, Chapman & Hall, 2002.

7 Random variables
Random experiment: the outcome cannot be predicted with certainty. Statistics: model and analyze the outcomes.
Sample space S = set of all possible outcomes, e.g. a die, X = {1, 2, 3, 4, 5, 6} (discrete random variable), or the period of a pendulum (continuous random variable).
Sources of randomness: errors in the measuring process; fundamental unpredictability.

8 Probability
Quantifies the degree of randomness. Definition in terms of set theory: S is composed of elements A (subsets of S), and P(A) is a real number that satisfies three axioms:
for every A, $P(A) \ge 0$;
if $A \cap B = \emptyset$ (disjoint), $P(A \cup B) = P(A) + P(B)$;
$P(S) = 1$.
Consequences: $P(\bar{A}) = 1 - P(A)$; $P(A \cup \bar{A}) = 1$; $0 \le P(A) \le 1$; $P(\emptyset) = 0$; if $A \subset B$, $P(A) \le P(B)$; $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
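A minimal Python sketch (not part of the lectures) that checks the axioms and some of their consequences on a fair die, assuming all six outcomes are equally likely. The events A and B are arbitrary illustrative choices.

```python
from fractions import Fraction

# Sample space of a fair die; every outcome is assumed equally likely.
S = frozenset(range(1, 7))

def P(event):
    """Probability of an event (a subset of S) under the equal-likelihood assumption."""
    return Fraction(len(set(event) & S), len(S))

A = {2, 4, 6}   # "even number" (illustrative event)
B = {4, 5, 6}   # "at least 4" (illustrative event)
A_bar = S - A   # complement of A

# The three axioms
assert all(P({x}) >= 0 for x in S)
assert P({1} | {2}) == P({1}) + P({2})   # disjoint events add
assert P(S) == 1

# Some consequences
assert P(A_bar) == 1 - P(A)
assert P(set()) == 0
assert P({4, 6}) <= P(A)                   # {4, 6} is a subset of A
assert P(A | B) == P(A) + P(B) - P(A & B)  # inclusion-exclusion
print(P(A | B))  # 2/3
```

Exact fractions are used so that the equalities hold without floating-point tolerances.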

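9 Intuitive approach
Conditional probability P(A|B): probability of event A given B.
[Figure: Venn diagram of the sample space S with events A and B; 10 equally likely outcomes, 3 in A, 2 in A ∩ B]
$$P(A|B) = \frac{\text{events in } A \text{ and } B}{\text{events in } B} = \frac{(\text{events in } A \text{ and } B)/\text{total}}{(\text{events in } B)/\text{total}} = \frac{P(A \cap B)}{P(B)}$$
$$P(B|A) = \frac{P(B \cap A)}{P(A)} = \frac{2/10}{3/10} = \frac{2}{3}$$
$$P(A \cap B) = P(B|A)\,P(A) = \frac{2}{3} \times \frac{3}{10} = \frac{2}{10} = P(A|B)\,P(B)$$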
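The counting argument above can be sketched in Python. The slide fixes only the counts for A (3 outcomes) and A ∩ B (2 outcomes); the size of B (4 outcomes here) and the outcome labels are hypothetical choices made only for this illustration.

```python
from fractions import Fraction

# Ten equally likely outcomes labelled 0..9. Hypothetical layout: A holds 3 points,
# 2 of which are also in B (as on the slide); |B| = 4 is an assumption.
S = set(range(10))
A = {0, 1, 2}
B = {1, 2, 3, 4}

def P(event):
    return Fraction(len(event), len(S))

def P_given(event, condition):
    """P(event | condition) = (# outcomes in both) / (# outcomes in condition)."""
    return Fraction(len(event & condition), len(condition))

p_b_given_a = P_given(B, A)
print(p_b_given_a)                     # 2/3
assert p_b_given_a == P(A & B) / P(A)  # same result from the probability definition
# Product rule, written both ways
assert P(A & B) == P_given(B, A) * P(A) == P_given(A, B) * P(B)
print(P(A & B))                        # 1/5, i.e. 2/10
```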

10 Intuitive approach
Independent probabilities: $P(A|B) = P(A)$, equivalently $P(A \cap B) = P(A)\,P(B)$.
[Figure: two Venn diagrams of S with events A and B, labelled "independent" and "not independent"]
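A short Python check of the independence condition on a fair die (an illustrative choice of events, not taken from the slide figure): "even" is independent of {1, 2, 3, 4} but not of {4, 5, 6}.

```python
from fractions import Fraction

S = set(range(1, 7))  # fair die, equally likely outcomes (assumption)

def P(event):
    return Fraction(len(event), len(S))

def independent(A, B):
    """A and B are independent if P(A ∩ B) = P(A) P(B), i.e. P(A|B) = P(A) when P(B) > 0."""
    return P(A & B) == P(A) * P(B)

even = {2, 4, 6}
print(independent(even, {1, 2, 3, 4}))  # True:  P(even | B) = 2/4 = P(even)
print(independent(even, {4, 5, 6}))     # False: P(even | B) = 2/3, not 1/2
```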

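11 Bayes' theorem
$$P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}$$
If the $A_i$ are disjoint and together make up S:
$$P(B) = \sum_i P(B|A_i)\,P(A_i), \qquad P(A_i|B) = \frac{P(B|A_i)\,P(A_i)}{\sum_j P(B|A_j)\,P(A_j)}$$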

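12 Cherenkov counter
Particles: 90% π, 10% K. The counter gives a signal for π with 95% efficiency and a false signal for K 6% of the time.
$$P(\pi|\text{signal}) = \frac{0.95 \times 0.90}{0.95 \times 0.90 + 0.06 \times 0.10} = 99.3\%, \qquad P(K|\text{signal}) = 0.7\%$$
$$P(K|\text{no signal}) = \frac{0.94 \times 0.10}{0.94 \times 0.10 + 0.05 \times 0.90} = 67.6\%, \qquad P(\pi|\text{no signal}) = 32.4\%$$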
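Bayes' theorem over disjoint hypotheses can be coded in a few lines. This is a minimal sketch using the Cherenkov numbers from the slide (90% π, 10% K, 95% efficiency, 6% false signals); the function name `posterior` is our own.

```python
def posterior(priors, likelihoods):
    """Bayes' theorem over disjoint hypotheses A_i:
    P(A_i|B) = P(B|A_i) P(A_i) / sum_j P(B|A_j) P(A_j)."""
    joint = {h: likelihoods[h] * priors[h] for h in priors}
    total = sum(joint.values())  # P(B), by the law of total probability
    return {h: j / total for h, j in joint.items()}

priors = {"pi": 0.90, "K": 0.10}
p_signal = {"pi": 0.95, "K": 0.06}  # efficiency for pi, false-signal rate for K
p_no_signal = {h: 1 - p for h, p in p_signal.items()}

given_signal = posterior(priors, p_signal)
given_no_signal = posterior(priors, p_no_signal)
for label, post in [("signal", given_signal), ("no signal", given_no_signal)]:
    print(label, {h: f"{100 * p:.1f}%" for h, p in post.items()})
# signal {'pi': '99.3%', 'K': '0.7%'}
# no signal {'pi': '32.4%', 'K': '67.6%'}
```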

13 AIDS positive
“About 0.01 percent of men with no known risk behaviour are infected with HIV (base rate). If such a man has the virus, there is a 99.9 percent chance that the test result will be positive (sensitivity). If a man is not infected, there is a 99.99 percent chance that the test result will be negative (specificity).”
What is the chance that a man who tests positive actually has the virus?
$$P(\text{HIV}|+) = \frac{0.999 \times 0.0001}{0.999 \times 0.0001 + 0.0001 \times 0.9999} \approx 0.5$$
Reckoning with Risk, G. Gigerenzer, 2002

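14 AIDS positive: natural frequencies
10000 men (no known risk behaviour)
  1 HIV: 1 positive, 0 negative
  9999 no HIV: 1 positive, 9998 negative
Of the 2 men who test positive, only 1 has the virus.
Many similar examples, e.g. mammography screening, where only about 1 out of 10 positives is a true positive! (Gigerenzer, 2002)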
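A small Python sketch of the same calculation done both ways, first with conditional probabilities (Bayes' theorem) and then with natural frequencies for 10000 men, using the base rate, sensitivity and specificity quoted above.

```python
base_rate = 0.0001      # P(HIV)
sensitivity = 0.999     # P(positive | HIV)
specificity = 0.9999    # P(negative | no HIV)

# Bayes' theorem with conditional probabilities
p_pos = sensitivity * base_rate + (1 - specificity) * (1 - base_rate)
p_hiv_given_pos = sensitivity * base_rate / p_pos
print(round(p_hiv_given_pos, 3))  # 0.5

# The same reasoning with natural frequencies for 10000 men
n = 10000
infected = round(n * base_rate)                        # 1
true_pos = round(infected * sensitivity)               # 1
false_pos = round((n - infected) * (1 - specificity))  # 1
print(true_pos, "of", true_pos + false_pos, "positives are infected")  # 1 of 2
```

The frequency version rounds to whole people, which is what makes it easy to read but also why it only reproduces the probability approximately.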

15 Probability
What is the meaning of P(A)?
Frequentist: limit of relative frequencies
$$P(A) = \lim_{n \to \infty} \frac{\text{number of occurrences of } A \text{ in } n \text{ measurements}}{n}$$
S: possible outcomes of a (repeatable) experiment; A: occurrence of a given outcome (event). Consistent with the probability axioms; the usual interpretation in standard textbooks; appropriate to particle physics (many repeatable events); more problematic for unique phenomena (the Big Bang, rain tomorrow).
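The frequentist definition can be illustrated by simulation. This Python sketch assumes a fair die and takes A = "the throw gives a six", so the relative frequency should settle near 1/6 as n grows; the seed is arbitrary and only makes runs reproducible.

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

def relative_frequency(n):
    """Fraction of n simulated fair-die throws that give a six (event A)."""
    hits = sum(1 for _ in range(n) if random.randint(1, 6) == 6)
    return hits / n

for n in (10, 1000, 100000):
    print(n, relative_frequency(n))
# The printed fractions approach P(A) = 1/6 ≈ 0.167 as n grows.
```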

16 Probability
Bayesian (subjective): the elements of S are hypotheses or propositions (true or false), and P(A) = degree of belief that hypothesis A is true. Hypothesis: a measurement will yield a given outcome a certain fraction of the time, so subjective probabilities include the frequentist interpretation. A statement such as "P = 95% that $m_1 \le m_e \le m_2$" is a Bayesian interpretation! Bayesian statistics: interpretation of Bayes' theorem.

17 Probability
A: a given theory is correct; B: data will yield a particular result.
$$P(\text{theory}|\text{data}) = \frac{P(\text{data}|\text{theory})\,P(\text{theory})}{P(\text{data})}$$
P(theory|data): posterior; P(data|theory): likelihood; P(theory): prior.
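To make the prior/likelihood/posterior roles concrete, here is a hypothetical Python example (not the lecture's "Bayesian coin" application): three candidate theories for the probability p of heads, equal prior degrees of belief, and data of 7 heads in 10 tosses with a binomial likelihood.

```python
from math import comb

# Hypothetical example: three "theories" for the heads probability p of a coin,
# equal prior degrees of belief, and data = 7 heads in 10 tosses.
theories = [0.3, 0.5, 0.7]
prior = {p: 1 / len(theories) for p in theories}
heads, tosses = 7, 10

def likelihood(p):
    """P(data | theory): binomial probability of the observed number of heads."""
    return comb(tosses, heads) * p**heads * (1 - p) ** (tosses - heads)

evidence = sum(likelihood(p) * prior[p] for p in theories)  # P(data)
posterior = {p: likelihood(p) * prior[p] / evidence for p in theories}
for p, post in posterior.items():
    print(f"p = {p}: prior {prior[p]:.3f} -> posterior {post:.3f}")
```

P(data) acts only as a normalization, so the posteriors always sum to one.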

18 Distributions
x: continuous random variable; f(x): probability density function (p.d.f.).
Probability to observe x in the interval [x, x + dx] = $f(x)\,dx$
Cumulative distribution function: $F(x) = \int_{-\infty}^{x} f(x')\,dx'$
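A Python sketch relating the p.d.f. and the cumulative distribution. The exponential p.d.f. $f(x) = e^{-x}$ for $x \ge 0$ is an assumption chosen only because its cumulative distribution, $1 - e^{-x}$, is known exactly; the integral is done with the trapezoidal rule.

```python
import math

def f(x):
    """Example p.d.f. (assumption for illustration): exponential, f(x) = exp(-x) for x >= 0."""
    return math.exp(-x) if x >= 0 else 0.0

def cdf(x, n=10_000):
    """F(x) = integral of f from -inf to x, by the trapezoidal rule on [0, x] (f = 0 below 0)."""
    if x <= 0:
        return 0.0
    h = x / n
    s = 0.5 * (f(0.0) + f(x)) + sum(f(i * h) for i in range(1, n))
    return s * h

for x in (0.5, 1.0, 3.0):
    print(x, cdf(x), 1 - math.exp(-x))  # numerical vs exact F(x) = 1 - exp(-x)

# The probability to find x in a small interval [x, x + dx] is approximately f(x) dx
x, dx = 1.0, 1e-4
print(cdf(x + dx) - cdf(x), f(x) * dx)
```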

19 Distributions
Joint p.d.f. $f(x,y)$: $P(A \cap B)$ = prob. of x in [x, x + dx] and y in [y, y + dy] = $f(x,y)\,dx\,dy$

20 Distributions

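21 Expectation value: $E[x] = \int x\,f(x)\,dx = \mu$
Population variance: $V[x] = E[(x - \mu)^2] = \sigma^2$
Covariance: $\mathrm{cov}[x,y] = E[(x - \mu_x)(y - \mu_y)] = V_{xy}$
Correlation coefficient: $\rho_{xy} = V_{xy}/(\sigma_x \sigma_y)$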
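These definitions can be evaluated directly for a discrete case, where the integrals become sums. The joint distribution below is hypothetical, picked so that x and y are positively correlated.

```python
import math

# Hypothetical discrete joint distribution P(x, y) (the four probabilities sum to 1).
joint = {(0, 0): 0.4, (1, 0): 0.1, (0, 1): 0.1, (1, 1): 0.4}

def E(g):
    """Expectation value E[g(x, y)] = sum over (x, y) of g(x, y) P(x, y)."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

mu_x, mu_y = E(lambda x, y: x), E(lambda x, y: y)
var_x = E(lambda x, y: (x - mu_x) ** 2)
var_y = E(lambda x, y: (y - mu_y) ** 2)
cov_xy = E(lambda x, y: (x - mu_x) * (y - mu_y))
rho = cov_xy / math.sqrt(var_x * var_y)
print(mu_x, var_x, cov_xy, rho)
```

A useful cross-check is the identity $\mathrm{cov}[x,y] = E[xy] - \mu_x \mu_y$.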

22 Distributions

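23 Binomial
A process with a given number N of identical trials, each with two possible outcomes: success (probability p) or failure (probability 1 − p). What is the probability of n successes (and N − n failures)?
Probability for a particular sequence: $p^n (1-p)^{N-n}$
Order does not matter; number of sequences: $\frac{N!}{n!\,(N-n)!}$
$$f(n;N,p) = \frac{N!}{n!\,(N-n)!}\,p^n (1-p)^{N-n}$$
This is a probability, not a probability density.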
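The two steps of the argument (probability of one sequence, times the number of sequences) can be checked by brute force in Python. N = 5 and p = 0.3 are illustrative values.

```python
from itertools import product
from math import comb

def binomial(n, N, p):
    """f(n; N, p): probability of n successes in N independent trials."""
    return comb(N, n) * p**n * (1 - p) ** (N - n)

def binomial_brute(n, N, p):
    """Enumerate all 2**N success/failure sequences and add up those with n successes."""
    total = 0.0
    for seq in product((0, 1), repeat=N):
        if sum(seq) == n:
            total += p**n * (1 - p) ** (N - n)  # same probability for every such sequence
    return total

N, p = 5, 0.3
print([round(binomial(n, N, p), 4) for n in range(N + 1)])
```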

24 binomial

25

26 [Figure: a track crossing chambers C1 to C5]
Individual chamber efficiency: 0.95; a track needs at least 3 points.
3 chambers: $f(3;3,0.95) = 0.95^3 = 0.857$
4 chambers: $f(3;4,0.95) + f(4;4,0.95) = 0.171 + 0.815 = 0.986$
5 chambers: $f(3;5,0.95) + f(4;5,0.95) + f(5;5,0.95) = 0.021 + 0.204 + 0.774 = 0.999$
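The chamber numbers follow from summing binomial probabilities. A minimal Python sketch, assuming (as the binomial model does) that the chambers fire independently with the same efficiency; `track_efficiency` is our own name.

```python
from math import comb

def binomial(n, N, p):
    return comb(N, n) * p**n * (1 - p) ** (N - n)

def track_efficiency(chambers, eff=0.95, min_hits=3):
    """Probability that at least min_hits chambers fire, each independently with efficiency eff."""
    return sum(binomial(n, chambers, eff) for n in range(min_hits, chambers + 1))

for chambers in (3, 4, 5):
    print(chambers, round(track_efficiency(chambers), 3))
# 3 0.857
# 4 0.986
# 5 0.999
```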

27 Poisson
Limit of the binomial: N large, p very small, Np → ν. Describes particular events when there is no idea of the number of trials: sharp events occurring in a continuum, e.g. a Geiger counter near a radioactive source, or the number of flashes of lightning in a storm.

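28 Poisson
Proof: ν events expected in some interval. Split the interval into N sections; the probability that a given section contains an event is p = ν/N. Probability of n events in N sections:
$$f(n;N,\nu/N) = \frac{N!}{n!\,(N-n)!}\left(\frac{\nu}{N}\right)^n \left(1 - \frac{\nu}{N}\right)^{N-n}$$
For N → ∞ with n finite, $N!/(N-n)! \to N^n$ and $(1 - \nu/N)^{N-n} \to e^{-\nu}$, so
$$f(n;\nu) = \frac{\nu^n}{n!}\,e^{-\nu}$$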
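A numerical illustration of the limit: the binomial probability with p = ν/N approaches the Poisson value as the number of sections N grows. The values ν = 2 and n = 3 are arbitrary choices for this sketch.

```python
from math import comb, exp, factorial

def binomial(n, N, p):
    return comb(N, n) * p**n * (1 - p) ** (N - n)

def poisson(n, nu):
    return nu**n * exp(-nu) / factorial(n)

nu, n = 2.0, 3
for N in (10, 100, 10_000):
    # Split the interval into N sections, each holding an event with probability nu/N
    print(N, binomial(n, N, nu / N))
print("Poisson", poisson(n, nu))
```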

29 Poisson

30 Poisson
Fatal horse kicks: the number of Prussian soldiers kicked to death by horses. In ten different army corps over 20 years (200 corps × years) there were 122 deaths.
Number of deaths per corps per year: $\nu = 122/200 = 0.610$
No deaths: $f(0; 0.61) = 0.5434$; number of (corps × years) with no deaths: 200 × 0.5434 = 108.7
One death: $f(1; 0.61) = 0.3315$; number of (corps × years) with one death: 200 × 0.3315 = 66.3

deaths per corps × year   actual number   Poisson
0                         109             108.7
1                         65              66.3
2                         22              20.2
3                         3               4.1
4                         1               0.6
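The Poisson column can be reproduced from the observed counts alone. A minimal Python sketch using the numbers from the slide: the mean ν is estimated from the data, then each Poisson probability is multiplied by the 200 corps-years.

```python
from math import exp, factorial

deaths = [0, 1, 2, 3, 4]
observed = [109, 65, 22, 3, 1]  # corps-years with that many deaths (numbers from the slide)

n_cells = sum(observed)                                      # 200 corps-years
nu = sum(d * o for d, o in zip(deaths, observed)) / n_cells  # 122 / 200 = 0.61

# Poisson prediction for the number of corps-years with d deaths
expected = [n_cells * nu**d * exp(-nu) / factorial(d) for d in deaths]
for d, o, e in zip(deaths, observed, expected):
    print(f"{d}  {o:4d}  {e:6.1f}")
```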

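31 Gaussian
$$f(x;\mu,\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
Standard Gaussian ($\mu = 0$, $\sigma = 1$): $\varphi(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$
Cumulative: $\Phi(x) = \int_{-\infty}^{x} \varphi(x')\,dx'$, evaluated numerically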
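In practice the standard Gaussian cumulative is usually evaluated through the error function, $\Phi(x) = [1 + \mathrm{erf}(x/\sqrt{2})]/2$, which Python's standard library provides as `math.erf`. This sketch prints the familiar 1σ, 2σ and 3σ coverage probabilities.

```python
import math

def gauss_pdf(x, mu=0.0, sigma=1.0):
    """Gaussian p.d.f. f(x; mu, sigma)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)

def std_gauss_cdf(x):
    """Phi(x) for the standard Gaussian, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

for k in (1, 2, 3):
    print(k, round(std_gauss_cdf(k) - std_gauss_cdf(-k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```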

32 Gaussian

33

34

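35 In N dimensions:
$$f(\mathbf{x};\boldsymbol{\mu},V) = \frac{1}{(2\pi)^{N/2}\,|V|^{1/2}} \exp\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T V^{-1} (\mathbf{x}-\boldsymbol{\mu})\right]$$
where x and μ are column vectors and V is a symmetric N×N (covariance) matrix.
In 2 dimensions:
$$f(x_1,x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x_1-\mu_1}{\sigma_1}\right)^2 + \left(\frac{x_2-\mu_2}{\sigma_2}\right)^2 - 2\rho\,\frac{(x_1-\mu_1)(x_2-\mu_2)}{\sigma_1\sigma_2}\right]\right\}$$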
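A Python check that the N-dimensional formula with N = 2 and $V = \begin{pmatrix}\sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2\end{pmatrix}$ reduces to the explicit two-dimensional form. The parameter values and the evaluation point are arbitrary illustrative choices.

```python
import math

def gauss_nd_2(x, mu, V):
    """N-dimensional Gaussian formula specialised to N = 2 (V: symmetric 2x2 covariance matrix)."""
    N = 2
    (a, b), (c, d) = V
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]  # inverse of a 2x2 matrix
    dx = [x[0] - mu[0], x[1] - mu[1]]
    q = sum(dx[i] * inv[i][j] * dx[j] for i in range(N) for j in range(N))  # (x-mu)^T V^-1 (x-mu)
    return math.exp(-0.5 * q) / ((2 * math.pi) ** (N / 2) * math.sqrt(det))

def gauss_2d(x1, x2, mu1, mu2, s1, s2, rho):
    """Explicit two-dimensional form with sigma_1, sigma_2 and correlation coefficient rho."""
    z1, z2 = (x1 - mu1) / s1, (x2 - mu2) / s2
    q = (z1**2 + z2**2 - 2 * rho * z1 * z2) / (1 - rho**2)
    return math.exp(-0.5 * q) / (2 * math.pi * s1 * s2 * math.sqrt(1 - rho**2))

mu1, mu2, s1, s2, rho = 1.0, -0.5, 2.0, 0.5, 0.3  # illustrative values
V = [[s1**2, rho * s1 * s2], [rho * s1 * s2, s2**2]]
p_general = gauss_nd_2((0.2, 0.1), (mu1, mu2), V)
p_explicit = gauss_2d(0.2, 0.1, mu1, mu2, s1, s2, rho)
print(p_general, p_explicit)
```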

36 Gaussian

37 Central limit theorem
The sum of N independent continuous random variables $x_i$ with means $\mu_i$ and variances $\sigma_i^2$ becomes, for N → ∞, a Gaussian random variable with
$$\mu = \sum_i \mu_i, \qquad \sigma^2 = \sum_i \sigma_i^2,$$
regardless of the form of the individual p.d.f.s of the $x_i$. This is the formal justification for treating measurement errors as Gaussian random variables: the total error is the sum of a large number of small contributions.

38 Central limit theorem
Actually used: algorithm R632, CERN library
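A classic illustration of the central limit theorem is to add 12 uniform(0, 1) numbers and subtract 6: each term has mean 1/2 and variance 1/12, so the sum has mean 0 and variance 1 and is approximately Gaussian. This Python sketch shows that idea only; we do not claim it is how the CERN library algorithm R632 works.

```python
import random
import statistics

random.seed(2003)  # reproducible run

def approx_gauss():
    """Sum of 12 uniform(0, 1) variables minus 6: mean 12 * 1/2 - 6 = 0, variance 12 * 1/12 = 1."""
    return sum(random.random() for _ in range(12)) - 6.0

samples = [approx_gauss() for _ in range(100_000)]
print(statistics.fmean(samples), statistics.pvariance(samples))
inside = sum(abs(s) < 1 for s in samples) / len(samples)
print(inside)  # close to the Gaussian value 0.6827 for |x| < 1 sigma
```

The values are confined to [-6, 6], so the far Gaussian tails are not reproduced; that is rarely a concern for the bulk of a distribution but matters for tail studies.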

