
Clustering and Probability (Chap 7)

Review from Last Lecture Defined the K-means problem for formalizing the notion of clustering. Discussed the K-means algorithm. Noted that the K-means algorithm was “quite good” in discovering “concepts” from data (based on features). Noted the important distinction between “attributes” and “features”.

Example of K-means - 1

            Measure 1   Measure 2
Patient 1       1           1
Patient 2       2           1
Patient 3       3           4
Patient 4       4           5

Let the initial centroids be C1 = (1,1) and C2 = (2,1).

Example of K-means - 2

            Measure 1   Measure 2   Dist to C1 (1,1)   Dist to C2 (2,1)   Nearest
Patient 1       1           1            0.00               1.00            C1
Patient 2       2           1            1.00               0.00            C2
Patient 3       3           4            3.61               3.16            C2
Patient 4       4           5            5.00               4.47            C2

Update: C1 = (1,1) (unchanged); C2 = ((2+3+4)/3, (1+4+5)/3) = (3, 3.33)

Example of K-means - 3

            Measure 1   Measure 2   Dist to C1 (1,1)   Dist to C2 (3,3.33)   Nearest
Patient 1       1           1            0.00                3.07              C1
Patient 2       2           1            1.00                2.54              C1
Patient 3       3           4            3.61                0.67              C2
Patient 4       4           5            5.00                1.95              C2

Update: C1 = ((1+2)/2, (1+1)/2) = (1.5, 1); C2 = ((3+4)/2, (4+5)/2) = (3.5, 4.5)

Example of K-means - 4

            Measure 1   Measure 2   Dist to C1 (1.5,1)   Dist to C2 (3.5,4.5)   Nearest
Patient 1       1           1            0.50                 4.30               C1
Patient 2       2           1            0.50                 3.81               C1
Patient 3       3           4            3.35                 0.71               C2
Patient 4       4           5            4.72                 0.71               C2

The assignments are unchanged, so the centroids stay C1 = (1.5, 1) and C2 = (3.5, 4.5): the algorithm has converged.
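Below is a minimal Python sketch of the algorithm traced above, assuming Euclidean distance and stopping when the centroids stop moving; the function name kmeans and its interface are illustrative choices, not code from the course.

```python
import math  # math.dist requires Python 3.8+

def kmeans(points, centroids, max_iters=100):
    """Lloyd's K-means: assign each point to its nearest centroid,
    then recompute each centroid as the mean of its cluster."""
    for _ in range(max_iters):
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [math.dist(p, c) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        new_centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
        if new_centroids == centroids:  # assignments stable: converged
            break
        centroids = new_centroids
    return centroids, clusters

patients = [(1, 1), (2, 1), (3, 4), (4, 5)]
print(kmeans(patients, [(1, 1), (2, 1)]))
# centroids end at (1.5, 1.0) and (3.5, 4.5), matching the trace above
```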

Example: 2 Clusters

[Figure: four points A(-1,2), B(1,2), C(-1,-2), D(1,-2) plotted symmetrically around the origin (0,0).]

K-means Problem: the optimal solution is centroids (0,2) and (0,-2), giving the clusters {A,B} and {C,D}. K-means Algorithm: if the initial centroids are (-1,0) and (1,0), then {A,C} and {B,D} end up as the two clusters, as the run below illustrates.
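Reusing the kmeans sketch from the previous slide (this fragment is not self-contained on its own), a quick illustrative run shows the sensitivity to initialization:

```python
pts = [(-1, 2), (1, 2), (-1, -2), (1, -2)]  # A, B, C, D
print(kmeans(pts, [(0, 2), (0, -2)]))       # optimal: {A,B} and {C,D}
print(kmeans(pts, [(-1, 0), (1, 0)]))       # stuck at {A,C} and {B,D}
```

The second run converges immediately to the suboptimal clustering: (-1,0) and (1,0) are already the means of {A,C} and {B,D}, so no point ever switches clusters.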

Several other issues regarding clustering:
- How do you select the initial centroids?
- How do you select the right number of clusters?
- How do you deal with non-Euclidean distance/similarity measures?
- Other approaches (hierarchical, spectral, etc.)
- The curse of high dimensionality.

Question

S-Length   S-Width   P-Length   P-Width   Flower
Small      Medium    Small      Medium    A (Setosa)
Medium     Medium    Medium     Large     O (Versicolor)
Medium     Small     Large      Large     I (Virginica)
Large      Large     Medium     Small     A (Setosa)
Large      Small     Medium     Small     ?

What should the "prediction" be for the last flower?

Prediction and Probability

When we make predictions we should attach "probabilities" to the prediction. Examples:
- 20% chance it will rain tomorrow.
- 50% chance that the tumor is malignant.
- 60% chance that the stock market will fall by the end of the week.
- 30% chance that the next president of the United States will be a Democrat.
- 0.1% chance that the user will click on a banner ad.

How do we assign probabilities to complex events? Using smart data algorithms... and counting.

Probability Basics

Probability is a deep topic, but in most cases the rules are straightforward to apply. Terminology:
- Experiment
- Sample space
- Events
- Probability
- Rules of probability
- Conditional probability
- Bayes rule

Probability: Sample Space

Consider an experiment and let S be the space of possible outcomes. Examples:
- The experiment is tossing a coin: S = {h, t}.
- The experiment is rolling a pair of dice: S = {(1,1), (1,2), ..., (6,6)}.
- The experiment is a race between three cars, 1, 2 and 3: S = {(1,2,3), (1,3,2), (2,1,3), (2,3,1), (3,1,2), (3,2,1)}.
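These sample spaces are small enough to enumerate directly; a short illustrative aside in Python (not from the slides):

```python
from itertools import product, permutations

coin = ["h", "t"]                             # S = {h, t}
dice = list(product(range(1, 7), repeat=2))   # 36 ordered pairs of rolls
race = list(permutations([1, 2, 3]))          # 6 possible finishing orders
print(len(coin), len(dice), len(race))        # -> 2 36 6
```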

Probabilities

Let the sample space be S = {1, 2, ..., m}. Consider numbers p_1, ..., p_m with each p_i >= 0 and p_1 + p_2 + ... + p_m = 1; p_i is the probability that the outcome of the experiment is i.

Suppose we toss a fair coin. The sample space is S = {h, t}. Then p_h = 0.5 and p_t = 0.5.

Probability

Experiment: will it rain or not in Sydney? S = {rain, no-rain}.
- P_rain = 138/365 ≈ 0.38; P_no-rain = 227/365 ≈ 0.62

Assigning probabilities (or rather, how to assign them) is a deep philosophical problem. What is the probability that "the green object standing outside my house is a burglar dressed in green"?

Probability

An event A is a set of possible outcomes of the experiment; thus A is a subset of S. Let A be the event of getting a seven when we roll a pair of dice.
- A = {(1,6), (6,1), (2,5), (5,2), (4,3), (3,4)}
- P(A) = 6/36 = 1/6

In general, when all outcomes are equally likely, P(A) = |A| / |S|.
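A brute-force check of P(A) = |A|/|S| for the "sum is seven" event (illustrative only):

```python
from itertools import product

S = list(product(range(1, 7), repeat=2))    # all 36 rolls of a pair of dice
A = [roll for roll in S if sum(roll) == 7]  # the event "the sum is seven"
print(len(A), len(S), len(A) / len(S))      # -> 6 36 0.1666...
```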

Probability

The sample space S and events are "sets".
- P(S) = 1; P(∅) = 0
- Addition: P(A ∪ B) = P(A) + P(B) − P(A ∩ B). Often A and B are disjoint, and then P(A ∪ B) = P(A) + P(B).
- Complement: P(Aᶜ) = 1 − P(A)

Example

Suppose the probability of rain today is 0.4, the probability of rain tomorrow is 0.4, and the probability of rain on both days is 0.1. What is the probability that it does not rain on either day?

S = {(R,N), (R,R), (N,N), (N,R)}. Let A be the event that it will rain today and B the event that it will rain tomorrow. Then:
- A = {(R,N), (R,R)}; B = {(N,R), (R,R)}
- Rain today or tomorrow (at least one day): P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 0.4 + 0.4 − 0.1 = 0.7
- No rain on either day: 1 − 0.7 = 0.3
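The same arithmetic checked exactly with Python's fractions module (an illustrative aside):

```python
from fractions import Fraction

p_today = p_tomorrow = Fraction(4, 10)     # P(A), P(B)
p_both = Fraction(1, 10)                   # P(A ∩ B)
p_either = p_today + p_tomorrow - p_both   # addition rule -> 7/10
p_neither = 1 - p_either                   # complement rule -> 3/10
print(p_either, p_neither)
```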

Conditional Probability

One of the most important concepts in all of data mining and machine learning.

P(A|B) = P(AB)/P(B), assuming P(B) ≠ 0: the conditional probability of A given that B has occurred.

Probability it will rain tomorrow (B) given that it has rained today (A):
- P(B|A) = P(AB)/P(A) = 0.1/0.4 = 1/4 = 0.25

In general, P(A|B) is not equal to P(B|A).
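The same one-line computation in Python (variable names are illustrative):

```python
p_rain_both = 0.1                    # P(AB): rain today and tomorrow
p_rain_today = 0.4                   # P(A): rain today
print(p_rain_both / p_rain_today)    # P(B|A) = 0.25
```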

We need conditional probability to answer....

S-Length   S-Width   P-Length   P-Width   Flower
Small      Medium    Small      Medium    A (Setosa)
Medium     Medium    Medium     Large     O (Versicolor)
Medium     Small     Large      Large     I (Virginica)
Large      Large     Medium     Small     A (Setosa)
Large      Small     Medium     Small     ?

What should the "prediction" be for the last flower?

Bayes Rule

P(A|B) = P(AB)/P(B); P(B|A) = P(BA)/P(A). Now P(AB) = P(BA), thus P(A|B)P(B) = P(B|A)P(A), and so P(A|B) = P(B|A)P(A)/P(B).
- This is called Bayes rule.
- It is the basis of almost all prediction.
- Recent theories hypothesize that human memory and action are Bayes rule in action.

Bayes Rule

P(A|B) = P(B|A)P(A)/P(B), where P(A) is the prior and P(A|B) is the posterior.

Bayes Rule: Example

The ASX market goes up on 60% of the days of a year; 40% of the time it stays the same or goes down. On a day the ASX is up, there is a 50% chance that the Shanghai index is up; on other days there is a 30% chance that Shanghai goes up. Suppose the Shanghai market is up. What is the probability that the ASX was up?

Define A1 as "ASX is up" and A2 as "ASX is not up"; define S1 as "Shanghai is up" and S2 as "Shanghai is not up". We want to calculate P(A1|S1).
- P(A1) = 0.6; P(A2) = 0.4; P(S1|A1) = 0.5; P(S1|A2) = 0.3
- P(S2|A1) = 1 − P(S1|A1) = 0.5; P(S2|A2) = 1 − P(S1|A2) = 0.7

Bayes Rule: Example

We want to calculate P(A1|S1).
- P(A1) = 0.6; P(A2) = 0.4; P(S1|A1) = 0.5; P(S1|A2) = 0.3
- P(S2|A1) = 1 − P(S1|A1) = 0.5; P(S2|A2) = 1 − P(S1|A2) = 0.7

P(A1|S1) = P(S1|A1)P(A1)/P(S1). How do we calculate P(S1)?

Bayes Rule: Example

P(S1) = P(S1,A1) + P(S1,A2) [Key step]
      = P(S1|A1)P(A1) + P(S1|A2)P(A2)
      = 0.5 × 0.6 + 0.3 × 0.4 = 0.42

Finally, P(A1|S1) = P(S1|A1)P(A1)/P(S1) = (0.5 × 0.6)/0.42 ≈ 0.71
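A short sketch checking the total-probability step and the final Bayes-rule step (variable names are illustrative):

```python
p_a1, p_a2 = 0.6, 0.4                    # P(ASX up), P(ASX not up)
p_s1_a1, p_s1_a2 = 0.5, 0.3              # P(Shanghai up | ASX up / not up)
p_s1 = p_s1_a1 * p_a1 + p_s1_a2 * p_a2   # total probability, ≈ 0.42
print(p_s1, p_s1_a1 * p_a1 / p_s1)       # Bayes rule, ≈ 0.714
```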

Example: Iris Flower

F = Flower; SL = Sepal Length; SW = Sepal Width; PL = Petal Length; PW = Petal Width.

For the query Data = (SL=Large, SW=Small, PL=Medium, PW=Small), compute P(F=A|Data), P(F=O|Data) and P(F=I|Data), and choose the flower with the maximum posterior. By Bayes rule, P(F=f|Data) = P(Data|F=f)P(F=f)/P(Data), and since P(Data) is the same for every f, it suffices to compare P(Data|F=f)P(F=f).

Example: Iris Flower

So how do we compute P(Data|F=A)? This is a non-trivial question [subject to much research]. The direct approach is to count how many times "Data" appears in the "database" when F=A. But "Data" here is a 4-dimensional data vector, and each component takes 3 values (small, medium, large), so the number of combinations is 3^4 = 81, far more than the handful of rows we have to count from.

Example: Iris Flower

Conditional Independence:

P(Data|F=A) = P(SL=Large, SW=Small, PL=Medium, PW=Small | F=A)
            ≈ P(SL=Large|F=A) P(SW=Small|F=A) P(PL=Medium|F=A) P(PW=Small|F=A)

The above is an assumption made to ease the computation. Surprisingly, evidence suggests that it works reasonably well in practice. This prediction method (which exploits conditional independence) is called the "Naïve Bayes classifier."
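A minimal Naïve Bayes sketch over the small flower table (the rows are the reconstructed table from earlier; add-1 Laplace smoothing is added here, which the slides do not mention, because with only four rows many counts are zero):

```python
from collections import Counter

# Training rows from the (reconstructed) table: (SL, SW, PL, PW) -> Flower
rows = [
    (("Small", "Medium", "Small", "Medium"), "A"),
    (("Medium", "Medium", "Medium", "Large"), "O"),
    (("Medium", "Small", "Large", "Large"), "I"),
    (("Large", "Large", "Medium", "Small"), "A"),
]
VALUES = 3  # each attribute takes one of Small / Medium / Large

def naive_bayes(query):
    """Pick the flower f maximizing P(F=f) * prod_i P(feature_i | F=f)."""
    class_counts = Counter(label for _, label in rows)
    scores = {}
    for label, n in class_counts.items():
        score = n / len(rows)  # prior P(F = label)
        for i, value in enumerate(query):
            match = sum(1 for feats, lab in rows
                        if lab == label and feats[i] == value)
            # add-1 (Laplace) smoothing so unseen values don't zero the product
            score *= (match + 1) / (n + VALUES)
        scores[label] = score  # proportional to P(F = label | query)
    return max(scores, key=scores.get), scores

print(naive_bayes(("Large", "Small", "Medium", "Small")))
# -> ('A', {...}): Setosa has the largest posterior under these estimates
```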