Probability and Information

Presentation on theme: "Probability and Information" — Presentation transcript:

1 Probability and Information
A brief review

2 Probability
Probability provides a way of summarizing the uncertainty that comes from our laziness and ignorance.
Probability as a degree of belief in the truth of a sentence: 1 - true, 0 - false, 0 < P < 1 - intermediate degrees of belief in the truth of the sentence
Degree of truth (fuzzy logic) vs. degree of belief

3 Prior and posterior probability
All probability statements must indicate the evidence with respect to which the probability is being assessed.
Prior or unconditional probability: assessed before any evidence is obtained
Posterior or conditional probability: assessed given some evidence

4 Basic probability notation
Prior probability: proposition - P(Sunny); random variable - P(Weather = Sunny); each random variable has a domain
Probability distribution: P(Weather) = <0.7, 0.2, 0.08, 0.02>
Conditional probability: P(A|B) = P(A^B) / P(B)
Product rule: P(A^B) = P(A|B) P(B)
Probabilistic inference does not work like logical inference.
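As a minimal sketch (not part of the slides), the notation above can be written directly in Python; the domain labels for Weather are assumed, since the slide gives only the four numbers.

```python
# Minimal sketch of the notation above; the Weather domain labels are assumed
# (the slide gives only the numbers <0.7, 0.2, 0.08, 0.02>).
P_weather = {"sunny": 0.7, "rain": 0.2, "cloudy": 0.08, "snow": 0.02}
assert abs(sum(P_weather.values()) - 1.0) < 1e-9  # a distribution sums to 1

def conditional(p_a_and_b, p_b):
    """P(A|B) = P(A^B) / P(B)."""
    return p_a_and_b / p_b

def product_rule(p_a_given_b, p_b):
    """P(A^B) = P(A|B) * P(B)."""
    return p_a_given_b * p_b

# The two rules are inverses of each other: the product rule recovers the joint entry.
p_a_given_b = conditional(0.06, 0.1)             # P(A|B) = 0.6
assert abs(product_rule(p_a_given_b, 0.1) - 0.06) < 1e-9
```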

5 The axioms of probability
All probabilities are between 0 and 1.
Necessarily true (valid) propositions have probability 1; necessarily false (unsatisfiable) propositions have probability 0.
The probability of a disjunction: P(A v B) = P(A) + P(B) - P(A^B)
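A quick numeric check of the axioms, as a sketch over an assumed two-variable sample space (the numbers are illustrative, not from the slide):

```python
# Toy sample space over two Boolean propositions A and B; the probabilities
# are assumed values used only to check the axioms stated above.
worlds = {
    (True, True): 0.20,
    (True, False): 0.30,
    (False, True): 0.10,
    (False, False): 0.40,
}

p_a = sum(p for (a, _), p in worlds.items() if a)
p_b = sum(p for (_, b), p in worlds.items() if b)
p_a_and_b = worlds[(True, True)]
p_a_or_b = sum(p for (a, b), p in worlds.items() if a or b)

assert all(0.0 <= p <= 1.0 for p in worlds.values())      # axiom 1: probabilities in [0, 1]
assert abs(sum(worlds.values()) - 1.0) < 1e-9              # total probability is 1
assert abs(p_a_or_b - (p_a + p_b - p_a_and_b)) < 1e-9      # P(AvB) = P(A) + P(B) - P(A^B)
```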

6 The joint probability distribution
The Joint completely specifies an agent's probability assignments to all propositions in the domain.
A probabilistic model consists of a set of random variables (X1, ..., Xn).
An atomic event is an assignment of particular values to all the variables.

7 Joint
An example with two Boolean variables, Cavity and Toothache
Observations: the atomic events are mutually exclusive and collectively exhaustive
What are P(Cavity), P(Cavity v Toothache), P(Cavity|Toothache)?
P(Cavity) = the sum of the joint entries in which Cavity is true
P(Cavity v Toothache) = P(C) + P(T) - P(C^T)
P(C|T) = P(C^T) / P(T)
It is impractical to specify all the entries of the Joint over n Boolean variables.
If we have the Joint, we can read off any probability we need.
Alternative: sidestep the Joint and work directly with conditional probabilities using Bayes' rule.
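The slide's joint table is not reproduced in this transcript, so the entries below are assumptions (the usual textbook numbers for this example); the sketch only shows how the three quantities are read off the Joint.

```python
# The joint table itself is not in the transcript; these entries are assumed
# (standard textbook values) purely to illustrate reading probabilities off the Joint.
joint = {
    ("cavity", "toothache"): 0.04,
    ("cavity", "no toothache"): 0.06,
    ("no cavity", "toothache"): 0.01,
    ("no cavity", "no toothache"): 0.89,
}

p_cavity = sum(p for (c, _), p in joint.items() if c == "cavity")        # 0.10
p_toothache = sum(p for (_, t), p in joint.items() if t == "toothache")  # 0.05
p_c_and_t = joint[("cavity", "toothache")]                               # 0.04

p_c_or_t = p_cavity + p_toothache - p_c_and_t    # P(Cavity v Toothache) = 0.11
p_c_given_t = p_c_and_t / p_toothache            # P(Cavity | Toothache) = 0.80

print(p_cavity, p_c_or_t, p_c_given_t)
```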

8 Bayes' rule
Deriving the rule via the product rule: P(B|A) = P(A|B) P(B) / P(A)
A more general case: P(X|Y) = P(Y|X) P(X) / P(Y)
Bayes' rule conditionalized on evidence E: P(X|Y,E) = P(Y|X,E) P(X|E) / P(Y|E). Can you prove this?
Applying the rule to medical diagnosis (page 426): meningitis P(M) = 1/50,000, stiff neck P(S) = 1/20, M causes S with P(S|M) = 0.5. What is P(M|S)?
Relative likelihood: comparing the relative likelihood of meningitis and whiplash given a stiff neck, P(M|S) / P(W|S) = P(S|M) P(M) / (P(S|W) P(W))
Avoiding direct assessment of the prior P(S): since P(M|S) + P(!M|S) = 1, P(S) can be derived using the product rule.
Normalization: P(Y|X) = α P(X|Y) P(Y), where α = 1/P(X)
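The meningitis numbers above give a small worked example. In the normalization variant, P(S|!M) is not on the slide, so the value used below is an assumption for illustration only.

```python
# Worked example with the slide's numbers: P(M|S) = P(S|M) P(M) / P(S).
p_m = 1 / 50_000          # prior: meningitis
p_s = 1 / 20              # prior: stiff neck
p_s_given_m = 0.5         # likelihood: meningitis causes a stiff neck half the time

p_m_given_s = p_s_given_m * p_m / p_s
print(p_m_given_s)        # 0.0002, i.e. 1 stiff neck in 5000 indicates meningitis

# Normalization variant, avoiding direct assessment of P(S):
# P(M|S) = alpha * P(S|M) P(M), with alpha = 1 / (P(S|M) P(M) + P(S|!M) P(!M)).
# P(S|!M) is not given on the slide; the value below is an assumption.
p_s_given_not_m = 0.05
alpha = 1 / (p_s_given_m * p_m + p_s_given_not_m * (1 - p_m))
print(alpha * p_s_given_m * p_m)   # posterior P(M|S) computed via normalization
```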

9 Independence
Independent events A, B: P(B|A) = P(B), P(A|B) = P(A), P(A,B) = P(A) P(B) - is this true in general? (No; it holds only when A and B are independent.)
Conditional independence: P(X|Y,Z) = P(X|Z)
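As a sketch (the helper name is hypothetical), independence can be tested by checking whether every joint entry factorizes into the product of its marginals:

```python
# Hypothetical helper: test whether P(A,B) = P(A) P(B) holds for every entry
# of a joint distribution over two Boolean variables.
def is_independent(joint, tol=1e-9):
    p_a = {a: sum(p for (x, _), p in joint.items() if x == a) for a in (True, False)}
    p_b = {b: sum(p for (_, y), p in joint.items() if y == b) for b in (True, False)}
    return all(abs(joint[(a, b)] - p_a[a] * p_b[b]) <= tol
               for a in (True, False) for b in (True, False))

# Independent case: the joint is built as a product of two marginals.
factored = {(a, b): (0.7 if a else 0.3) * (0.4 if b else 0.6)
            for a in (True, False) for b in (True, False)}
print(is_independent(factored))    # True

# Dependent case: A and B always agree, so the joint does not factorize.
coupled = {(True, True): 0.5, (True, False): 0.0,
           (False, True): 0.0, (False, False): 0.5}
print(is_independent(coupled))     # False
```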

10 Information & Entropy
Entropy measures the homogeneity of a collection of examples.
In information theory, entropy is defined as the average number of bits needed to encode the class of an arbitrary example.
With two classes P and N in S, with p and n instances respectively, and t = p + n:
I(S) = -(p/t) log2(p/t) - (n/t) log2(n/t)
E.g., p = 9, n = 5, S = [9,5]: I(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940
I([14,0]) = 0; I([7,7]) = 1
In general, I([s1, s2, ..., sk]) = -Σ (si/s) log2(si/s), where s = s1 + ... + sk
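A small sketch of the entropy formula above, reproducing the slide's three examples (the helper name is hypothetical):

```python
from math import log2

def entropy(counts):
    """I([s1, ..., sk]) = -sum (si/s) log2(si/s); empty classes contribute 0."""
    s = sum(counts)
    return sum(-(si / s) * log2(si / s) for si in counts if si > 0)

print(f"{entropy([9, 5]):.3f}")   # 0.940, the example on the slide
print(entropy([14, 0]))           # 0.0 - a pure collection
print(entropy([7, 7]))            # 1.0 - maximally mixed
```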

11 Entropy curve
For p/(p+n) between 0 and 1, the 2-class entropy is:
0 when p/(p+n) is 0
monotonically increasing between 0 and 0.5
1 when p/(p+n) is 0.5
monotonically decreasing between 0.5 and 1
0 when p/(p+n) is 1
When the data is pure, no bits need to be sent.
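A short sketch that traces out this curve numerically (a sample of points rather than a plot); the helper name is hypothetical:

```python
from math import log2

def two_class_entropy(q):
    """Entropy of a two-class collection with fraction q = p/(p+n) of positives."""
    if q in (0.0, 1.0):
        return 0.0          # pure data needs no bits
    return -q * log2(q) - (1 - q) * log2(1 - q)

# Sampling q from 0 to 1 shows the shape described above: 0 at the ends,
# rising to a maximum of 1 at q = 0.5, and symmetric about that point.
for i in range(11):
    q = i / 10
    print(f"{q:.1f}  {two_class_entropy(q):.3f}")
```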

