
1 Probability Review 1 CS479/679 Pattern Recognition Dr. George Bebis

2 Why Bother About Probabilities? Accounting for uncertainty is a crucial component of decision making (e.g., classification) because our measurements are ambiguous. Probability theory is the proper mechanism for accounting for uncertainty. It also lets us take a-priori knowledge into account, for example: "If the fish was caught in the Atlantic ocean, then it is more likely to be salmon than sea-bass."

3 Definitions Random experiment – an experiment whose result is not certain in advance (e.g., throwing a die). Outcome – the result of a random experiment. Sample space – the set of all possible outcomes (e.g., {1,2,3,4,5,6}). Event – a subset of the sample space (e.g., obtaining an odd number when throwing a die: {1,3,5}).

4 Intuitive Formulation of Probability Intuitively, the probability of an event α can be defined as: P(α) ≈ N(α)/n, where N(α) is the number of times that event α happens in n trials. This assumes that all outcomes are equally likely (Laplacian definition).
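
A minimal Python sketch of this frequency definition (assuming a fair six-sided die, as in the earlier example): estimate the probability of the event "odd number" by its relative frequency over many trials.

    import random

    # Estimate P(odd) for a fair die as N(odd)/n for large n.
    n = 100_000
    count = sum(1 for _ in range(n) if random.randint(1, 6) % 2 == 1)
    print(count / n)  # close to the true value 3/6 = 0.5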

5 Axioms of Probability
A) P(A) >= 0 for any event A
B) P(S) = 1
C) If A1, A2, …, An are mutually exclusive events (Ai ∩ Aj = ∅ for i ≠ j), then P(A1 ∪ A2 ∪ … ∪ An) = P(A1) + P(A2) + … + P(An)
[Figure: Venn diagram of mutually exclusive events A1, A2, A3, A4 inside the sample space S]

6 Prior (Unconditional) Probability This is the probability of an event in the absence of any evidence. P(Cavity)=0.1 means that "in the absence of any other information, there is a 10% chance that the patient has a cavity".

7 Posterior (Conditional) Probability This is the probability of an event given some evidence. P(Cavity/Toothache)=0.8 means that "there is an 80% chance that the patient has a cavity given that he has a toothache".

8 Posterior (Conditional) Probability (cont’d) Conditional probabilities can be defined in terms of unconditional probabilities: P(A/B) = P(A,B)/P(B) (assuming P(B) > 0). Conditional probabilities lead to the chain rule: P(A,B)=P(A/B)P(B)=P(B/A)P(A)
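
As a sketch in Python, applying this definition to the Cavity/Toothache joint probabilities that appear on slide 27 recovers exactly the 80% figure from slide 7:

    # Joint probabilities taken from the table on slide 27.
    p_cavity_and_toothache = 0.04
    p_no_cavity_and_toothache = 0.01

    # P(Toothache), then P(Cavity/Toothache) = P(Cavity,Toothache)/P(Toothache).
    p_toothache = p_cavity_and_toothache + p_no_cavity_and_toothache
    print(p_cavity_and_toothache / p_toothache)  # 0.8, as on slide 7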

9 Law of Total Probability If A1, A2, …, An is a partition of the sample space S into mutually exclusive events and B is any event, then:
P(B) = P(B/A1)P(A1) + P(B/A2)P(A2) + … + P(B/An)P(An)
Special case (partition {A, ¬A}): P(B) = P(B/A)P(A) + P(B/¬A)P(¬A)
[Figure: the event B overlapping the partition A1, …, An of S]

10 Bayes’ Theorem Conditional probabilities lead to Bayes’ rule:
P(A/B) = P(B/A)P(A) / P(B)
where P(B) = P(B/A)P(A) + P(B/¬A)P(¬A) (law of total probability).

11 Example Consider the probability of Disease given Symptom:
P(Disease/Symptom) = P(Symptom/Disease)P(Disease) / P(Symptom)
where: P(Symptom) = P(Symptom/Disease)P(Disease) + P(Symptom/¬Disease)P(¬Disease)

12 Example (cont’d) Meningitis causes a stiff neck 50% of the time, i.e., P(S/M)=0.5. A patient comes in with a stiff neck; what is the probability that he has meningitis? We need to know the following: – The prior probability of a patient having meningitis: P(M)=1/50,000 – The prior probability of a patient having a stiff neck: P(S)=1/20 Then P(M/S) = P(S/M)P(M)/P(S) = (0.5 × 1/50,000)/(1/20) = 0.0002
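
A one-line check of this Bayes' rule computation in Python:

    # Bayes' rule with the numbers from the slide.
    p_s_given_m = 0.5      # P(S/M): stiff neck given meningitis
    p_m = 1 / 50_000       # P(M): prior probability of meningitis
    p_s = 1 / 20           # P(S): prior probability of a stiff neck
    print(p_s_given_m * p_m / p_s)  # 0.0002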

13 General Form of Bayes’ Rule If A1, A2, …, An is a partition of mutually exclusive events and B is any event, then Bayes’ rule is given by:
P(Ai/B) = P(B/Ai)P(Ai) / P(B)
where P(B) = P(B/A1)P(A1) + P(B/A2)P(A2) + … + P(B/An)P(An)

14 Independence Two events A and B are independent iff: P(A,B)=P(A)P(B). Using the formula above, we can show: P(A/B)=P(A) and P(B/A)=P(B). A and B are conditionally independent given C iff: P(A/B,C)=P(A/C), e.g., P(WetGrass/Season,Rain)=P(WetGrass/Rain)

15 Random Variables In many experiments, it is easier to deal with a summary variable than with the original probability structure. Example: in an opinion poll, we ask 50 people whether they agree or disagree with a certain issue. – Suppose we record a "1" for agree and a "0" for disagree. – The sample space for this experiment has 2^50 elements. Suppose we are only interested in the number of people who agree. – Define the variable X = number of "1"s recorded out of 50. – The new sample space has only 51 elements.

16 Random Variables (cont’d) A random variable (r.v.) is a function that assigns a value to the outcome of a random experiment, i.e., X maps each outcome s in the sample space S to a number X(s). [Figure: outcomes s in S mapped by X to values X(s) on the real line]

17 Random Variables (cont’d) How do we compute probabilities using random variables? – Suppose the sample space is S = {s1, s2, …, sn}. – Suppose the range of the random variable X is {x1, x2, …, xm}. – We observe X = xj iff the outcome of the random experiment is an s in S such that X(s) = xj, i.e., P(X = xj) = P({s in S : X(s) = xj}).

18 Example Consider the experiment of throwing a pair of dice and define, for example, X = the sum of the two dice. Then X takes the values 2, 3, …, 12, and P(X=7) = 6/36, since six of the 36 equally likely outcomes sum to 7. [The slide tabulates P(X=x) for each value x.]
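
A short sketch computing this pmf by enumerating the 36 outcomes (assuming X is the sum of the two dice):

    from collections import Counter
    from itertools import product

    # pmf of X = sum of two fair dice over the 36 equally likely outcomes.
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    pmf = {x: c / 36 for x, c in sorted(counts.items())}
    print(pmf[7])  # 6/36 = 0.1666...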

19 Discrete/Continuous Random Variables A discrete r.v. can assume only discrete values. A continuous r.v. can assume a continuous range of values (e.g., sensor readings).

20 Probability mass function (pmf) The pmf of a discrete r.v. X assigns a probability to each possible value x of X: p(x) = P(X = x). Notation: given two r.v.'s X and Y, their pmfs are denoted pX(x) and pY(y); for convenience, we will drop the subscripts and denote them p(x) and p(y). However, keep in mind that these are different functions!

21 Probability density function (pdf) The pdf of a continuous r.v. X shows the probability of X being "close" to some number x. Notation: given two r.v.'s X and Y, their pdfs are denoted pX(x) and pY(y); for convenience, we will drop the subscripts and denote them p(x) and p(y). However, keep in mind that these are different functions!

22 Probability Distribution Function (PDF) Definition: F(x) = P(X <= x). Some properties of the PDF: – (1) 0 <= F(x) <= 1, with F(-∞) = 0 and F(+∞) = 1 – (2) F(x) is a non-decreasing function of x. If X is discrete, its PDF can be computed as follows: F(x) = sum of p(xk) over all values xk <= x.
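
A minimal sketch of the discrete case in Python, using the pmf of a single fair die:

    # pmf of a fair die; F(x) sums p(xk) over all xk <= x.
    pmf = {x: 1 / 6 for x in range(1, 7)}

    def F(x):
        return sum(p for xk, p in pmf.items() if xk <= x)

    print(F(3))  # 0.5
    print(F(6))  # 1.0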

23 Probability Distribution Function (PDF) (cont’d) [Figure: a discrete pmf and the corresponding staircase-shaped PDF F(x), which jumps by p(xk) at each value xk]

24 Probability Distribution Function (PDF) (cont’d) If X is continuous, its PDF can be computed as follows: F(x) = integral of p(t) dt from -∞ to x. The pdf can be obtained from the PDF by differentiation: p(x) = dF(x)/dx.
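
A numerical sketch of the continuous case for a standard Gaussian, checking that a finite-difference derivative of the PDF matches the pdf:

    import math

    def F(x):  # Gaussian PDF, expressed via the error function
        return 0.5 * (1 + math.erf(x / math.sqrt(2)))

    def p(x):  # Gaussian pdf
        return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

    x, h = 0.7, 1e-5
    print((F(x + h) - F(x - h)) / (2 * h), p(x))  # nearly equal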

25 Example [Figure: the Gaussian pdf (bell-shaped curve) centered at 0 and the corresponding S-shaped PDF]

26 Joint pmf (discrete case) For n random variables, the joint pmf assigns a probability to each possible combination of values: p(x1,x2,…,xn) = P(X1=x1, X2=x2, …, Xn=xn). Notation: the joint pmf/pdf of the r.v.'s X1, X2, …, Xn and Y1, Y2, …, Yn are denoted pX1X2…Xn(x1,x2,…,xn) and pY1Y2…Yn(y1,y2,…,yn); for convenience, we will drop the subscripts and denote them p(x1,x2,…,xn) and p(y1,y2,…,yn). Keep in mind, however, that these are two different functions.

27 Joint pmf (discrete r.v.) (cont’d) Specifying the joint pmf requires a large number of values: k^n for n random variables, where each one can assume one of k discrete values. – This can be simplified if we assume independence or conditional independence. P(Cavity, Toothache) is a 2 x 2 matrix of joint probabilities:

                  Toothache   Not Toothache
    Cavity          0.04          0.06
    Not Cavity      0.01          0.89

The entries sum to 1.0.
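
A sketch in Python: marginalizing this table over Toothache recovers the prior P(Cavity)=0.1 used on slide 6.

    # The P(Cavity, Toothache) table as a dict of joint probabilities.
    joint = {
        ("cavity", "toothache"): 0.04,
        ("cavity", "no toothache"): 0.06,
        ("no cavity", "toothache"): 0.01,
        ("no cavity", "no toothache"): 0.89,
    }

    # P(Cavity): sum the joint over all values of Toothache.
    print(sum(p for (c, t), p in joint.items() if c == "cavity"))  # 0.1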

28 Joint pdf (continuous r.v.) For n random variables, the joint pdf gives the probability of being "close" to a given combination of values: P(x1 < X1 <= x1+dx1, …, xn < Xn <= xn+dxn) ≈ p(x1,x2,…,xn) dx1 dx2 … dxn

29 Conditional pdf The conditional pdf can be derived from the joint pdf: p(y/x) = p(x,y)/p(x) (for p(x) > 0). Conditional pdfs lead to the chain rule: p(x,y) = p(y/x)p(x) = p(x/y)p(y). General case: p(x1,x2,…,xn) = p(x1/x2,…,xn) p(x2/x3,…,xn) … p(xn-1/xn) p(xn)

30 Independence Knowledge about independence between r.v.'s is very useful because it simplifies things. For example, if X and Y are independent, then the 2D joint pdf factors into the product of two 1D pdfs: p(x,y) = p(x)p(y)

31 Law of Total Probability The law of total probability: p(x) = integral of p(x/y)p(y) dy (continuous case), or p(x) = sum over y of p(x/y)p(y) (discrete case).

32 Marginalization Given the joint pmf/pdf, we can compute the pmf/pdf of any subset of variables using marginalization: – Examples: p(x) = sum over y of p(x,y) (discrete) or p(x) = integral of p(x,y) dy (continuous); similarly, p(x1,x3) = sum over x2 of p(x1,x2,x3).

33 The joint pmf/pdf is very useful: as the previous slides show, any conditional or marginal distribution of the variables can be computed from it.

34 Normal (Gaussian) Distribution
1-dimensional: p(x) = (1 / (sqrt(2π) σ)) exp(-(x-μ)^2 / (2σ^2))
d-dimensional: p(x) = (1 / ((2π)^(d/2) |Σ|^(1/2))) exp(-(1/2) (x-μ)^T Σ^(-1) (x-μ))
where μ is the d x 1 mean vector and Σ is the d x d symmetric covariance matrix.
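
A sketch evaluating the d-dimensional density with NumPy (the mean and covariance below are illustrative values, not from the slides):

    import numpy as np

    def gaussian_pdf(x, mu, sigma):
        # d-dimensional normal density at x.
        d = len(mu)
        diff = x - mu
        norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma))
        return np.exp(-0.5 * diff @ np.linalg.inv(sigma) @ diff) / norm

    mu = np.array([0.0, 0.0])
    sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
    print(gaussian_pdf(np.array([1.0, -1.0]), mu, sigma))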

35 Normal (Gaussian) Distribution (cont’d) Parameters and shape of the Gaussian distribution: – The number of parameters is d + d(d+1)/2 (the d entries of μ plus the distinct entries of the symmetric matrix Σ). – The shape of the distribution is determined by Σ.

36 Normal (Gaussian) Distribution (cont’d) Mahalanobis distance: r^2 = (x-μ)^T Σ^(-1) (x-μ). If the variables are independent, Σ is diagonal and the multivariate normal distribution becomes a product of d univariate normals: p(x) = product over i of (1 / (sqrt(2π) σi)) exp(-(xi-μi)^2 / (2σi^2))
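
Computing the Mahalanobis distance with NumPy (illustrative values again):

    import numpy as np

    x = np.array([2.0, 1.0])
    mu = np.array([0.0, 0.0])
    sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
    r2 = (x - mu) @ np.linalg.inv(sigma) @ (x - mu)  # squared distance
    print(np.sqrt(r2))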

37 Expected Value The expected value of a discrete r.v. X is: E(X) = μ = sum over x of x p(x)

38 Expected Value (cont’d) The sample mean and expected value are related by: (1/n)(x1 + x2 + … + xn) → E(X) as the number of samples n grows. The expected value of a continuous r.v. is given by: E(X) = integral of x p(x) dx
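
A sketch of this relation: the sample mean of fair-die throws approaches E(X) = 3.5 as n grows.

    import random

    for n in (10, 1_000, 100_000):
        samples = [random.randint(1, 6) for _ in range(n)]
        print(n, sum(samples) / n)  # tends toward 3.5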

39 Variance and Standard Deviation The variance measures the spread of a r.v. X around its mean: Var(X) = σ^2 = E[(X-μ)^2]. The standard deviation is its square root: σ = sqrt(Var(X)).

40 Covariance The covariance measures how two r.v.'s X and Y vary together: Cov(X,Y) = E[(X-μX)(Y-μY)]

41 Covariance Matrix – 2 variables

    Σ = | Var(X)      Cov(X,Y) |
        | Cov(X,Y)    Var(Y)   |

and Cov(X,Y) = Cov(Y,X), i.e., Σ is a symmetric matrix. Cov(X,Y) can be computed using: Cov(X,Y) = E(XY) - μX μY

42 Covariance Matrix – n variables For n r.v.'s X1, …, Xn, the covariance matrix Σ has entries Σij = Cov(Xi,Xj), with Σii = Var(Xi); since Cov(Xi,Xj) = Cov(Xj,Xi), Σ is a symmetric matrix.
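
Estimating a covariance matrix from data with NumPy (random data used for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.normal(size=(3, 1000))   # 3 variables, 1000 samples each
    sigma = np.cov(data)                # rows of data are the variables
    print(sigma.shape)                  # (3, 3)
    print(np.allclose(sigma, sigma.T))  # True: Σ is symmetric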

43 Uncorrelated r.v.’s Two r.v.'s X and Y are uncorrelated iff Cov(X,Y) = 0. If all pairs of variables are uncorrelated, all off-diagonal entries of Σ vanish, i.e., Σ is a diagonal matrix.

44 Properties of covariance matrix Σ is symmetric and positive semi-definite; its eigenvalues are real and non-negative, and its eigenvectors can be chosen to be orthonormal. (Note: we will review these concepts later in case you do not remember them.)

45 Covariance Matrix Decomposition Σ can be decomposed as Σ = Φ Λ Φ^T, where the columns of Φ are the eigenvectors of Σ, Λ is the diagonal matrix of the corresponding eigenvalues, and Φ^(-1) = Φ^T (i.e., Φ is orthonormal).
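
Verifying this decomposition with NumPy:

    import numpy as np

    sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
    eigvals, phi = np.linalg.eigh(sigma)   # eigh: for symmetric matrices
    lam = np.diag(eigvals)
    print(np.allclose(sigma, phi @ lam @ phi.T))   # True: Σ = Φ Λ Φ^T
    print(np.allclose(np.linalg.inv(phi), phi.T))  # True: Φ^-1 = Φ^T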

46 Linear Transformations If a random vector X has mean μ and covariance matrix Σ, then under a linear transformation Y = A^T X the mean and covariance transform as μY = A^T μ and ΣY = A^T Σ A.

47 Linear Transformations (cont’d) Whitening transform: choosing Aw = Φ Λ^(-1/2), with Σ = Φ Λ Φ^T as on the previous slide, gives ΣY = Aw^T Σ Aw = I, i.e., the transformed variables are uncorrelated and have unit variance.
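
A sketch of the whitening transform with NumPy: after the transform, the sample covariance is approximately the identity.

    import numpy as np

    rng = np.random.default_rng(0)
    sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
    x = rng.multivariate_normal([0, 0], sigma, size=10_000)  # rows: samples
    eigvals, phi = np.linalg.eigh(sigma)
    a_w = phi @ np.diag(eigvals ** -0.5)   # Aw = Φ Λ^(-1/2)
    y = x @ a_w                            # apply y = Aw^T x to each sample
    print(np.cov(y, rowvar=False))         # approximately the identity matrix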

