EMAT Data Analysis WEEK 2


EMAT 20205 Data Analysis WEEK 2 Nello Cristianini

Axioms of Probability The probability law (assigning a number to each event E) must satisfy the following axioms: Nonnegativity: P(E) ≥ 0 for every event E. Additivity: if E and F are two disjoint events, then the probability of their union satisfies: P(E ∪ F) = P(E) + P(F). Normalization: the probability of the entire sample space is equal to 1: P(W) = 1

Some comments… The maximum value for the probability of an event is 1 (the probability of the entire sample space) This means that the event is CERTAIN P(W)=1 means: the outcome will be one of the possible outcomes (obviously) (e.g.: the dice roll will certainly give outcome 1 or 2 or 3 or 4 or 5 or 6)

Comments… An event E is IMPOSSIBLE if it has zero probability P(E)=0 An event is CERTAIN if it has probability P(E)=1 The interesting things happen in between …

Comments on Additivity… Additivity: if E and F are two disjoint events, then the probability of their union satisfies: P(E ∪ F) = P(E) + P(F) Probability of E or F is P(E) + P(F) E.g. in dice roll: probability of 1 or 2 is P(1)+P(2)

Consequences If we use a sample space W={O1, O2, O3, O4,…,On} the probabilities of the outcomes Oi must satisfy P(O1)+P(O2)+…+P(On)=1 We will write this sum as: Σi P(Oi) = 1

Consequences From this axiom we can see that: the probability of the empty event is 0 (so: there MUST be an outcome, think of dice roll example)

Probability Law We have seen 3 axioms that must be satisfied by the probability assignment to the outcomes (sample space) and some of their consequences BUT: who gives us the probabilities ? They are largely an arbitrary design choice (although we will see practical methods)

Example Think again of the case of the dice roll. Given our knowledge of physics, and the symmetry of a dice, we see no reason why a certain outcome should be more likely than another. So we want: P(1)=P(2)=P(3)=P(4)=P(5)=P(6) The normalization axiom gives P(…)=1/6 for each of them We can then use these probabilities and the axioms to compute probabilities of more complex events

Example Coin toss. Again: no reason to prefer one outcome over another, so: P(H)=P(T)=1/2 Unless …

Frequency information… Unless we actually know that specific coin (or dice) and we know the exact frequency of the outcomes in the last 1000 or so experiments Possibly the coin is not fair, and we observe 80% head, 20% tail outcomes … We can incorporate this in the model, assigning P(H)=0.8 P(T)=0.2 In the first case we have used our knowledge of the situation; in the second case we have estimated the probabilities by using frequencies

Probabilistic Model of Coin Toss Sample space is: W={H,T} Possible events are all subsets: {H,T}, {H}, {T}, ∅ (the empty set) Fair coin → P({H})=P({T})=0.5 P({H,T})=P({H})+P({T})=1 P(∅)=0 So we have assigned a probability to EACH possible event based on the probabilities of the outcomes, in a way that satisfies all axioms

Model: Toss of Three Coins Sample space (8 possible outcomes): W={HHH,HHT,HTH, HTT, TTT, THH, THT, TTH} We assume they are all equally likely, so we assign to each of them probability 1/8 The probability law should assign probabilities to EVERY POSSIBLE EVENT

Tossing Three Coins A possible event: 2 heads occur How many outcomes are in this event ? {HHT, HTH, THH} 3 disjoint events, their union has probability equal to the sum of their probabilities: P({HHT, HTH, THH}) = P({HHT})+P({HTH})+P({THH}) = 1/8+1/8+1/8 = 3/8

Tossing Three Coins We can calculate similarly the probability of all possible events, and this gives a probability law that satisfies the axioms. We can see that obtaining 3 heads has probability 1/8, less than observing 2 heads (3/8), and so on …

Probability law for finite sample spaces For finite sample spaces, we specify the probability law by just assigning probabilities to the individual outcomes Often the outcomes are equiprobable, then P(E)=number of outcomes in E / total number of outcomes
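As a minimal MATLAB sketch of this rule (the sample space and event below are my own illustrative choices, not from the slides):
% Sample space of one dice roll, all outcomes equally likely
S = 1:6;
% Event E: the outcome is even
E = [2 4 6];
% P(E) = number of outcomes in E / total number of outcomes
P_E = length(E) / length(S)    % 0.5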

Continuous Sample Space In the case of the dart and target, things are different … If each outcome is a point, its probability cannot be bigger than zero, else the total probability will exceed one Solution: outcomes must be (infinitesimally) small areas, not points Do not worry too much about this for now

Properties of Probability Law Assume area of set = probability of event!

Using Probabilistic Models Say we want to model an uncertain situation (e.g. an experiment) We first decide a sample space and a probability law. This step is somewhat arbitrary, and fully specifies the model. Then operating within the model we derive the probabilities of the events of interest, or other properties. This is fully unambiguous.

Example We want to choose a day in 2009 when to organize a picnic We want to avoid: rain, cold and traffic These are three possible events (day=rain; day=cold; day=traffic) not mutually exclusive …

(Diagram: the days of a generic month, each labelled with its values of R/nR, C/nC, T/nT.) Assume this is a generic month. A random day will have values for R, C, T … we can compute the probability for R (rain), or for nT (not traffic); but also for R AND T …

(Same diagram.) Event: RAIN

(Same diagram.) Event: COLD

(Same diagram.) Event: TRAFFIC

Unions and Intersections of Events We may want to calculate the probability of randomly selecting a day that is both not-rainy and not-cold Today we talk of probabilities of COMBINATIONS of events

Intersection of Events Probability that BOTH events occur simultaneously We DEFINE A NEW EVENT consisting of the outcomes that are in both events E and F and we calculate its probability New event: G = E ∩ F The probability of both events occurring is P(E ∩ F) = P(G)

Intersection of Events The probability of this event is the sum of the probabilities of the outcomes that are both in E and in F (e.g.: fraction of days that are both R and T) Two events are mutually exclusive (or disjoint) if their intersection is empty (e.g.: R and nR are disjoint)

(Diagram: the rain and cold sets of days, overlapping.) Event: Rain AND cold

Union of Events We want to calculate the probability that at least one of the events E and F occurs This is the probability of the union event G = E ∪ F The probability of G is the sum of the probabilities of the outcomes that are in either E or F (e.g. number of days that are either R or C)

(Same diagram.) Event: Rain OR cold

Other combinations … We can consider the probability of being in E and not in F by considering the probability of being in E and in F^c

Dice Example… Event E = {1,2,3} outcome is small (3 or less) Event F = {2,4,6} outcome is an even number Probability of being either even OR small ? Probability of being even AND small ?

Table of day counts for the month (out of 30 days):
R,C,T: 6 (6/30)      nR,C,T: 0 (0/30)
R,nC,T: 16 (16/30)   nR,nC,T: 1 (1/30)
R,C,nT: 0 (0/30)     nR,C,nT: 6 (6/30)
R,nC,nT: 1 (1/30)    nR,nC,nT: 0 (0/30)

Important … Calculate the joint probabilities from the table … P(R,C,nT)=0/30 P(R,C,T)=6/30 P(R,C)=P(R,C,nT)+P(R,C,T)=6/30

Conditional Probability What is the probability of rain in this month? (count all rainy days and divide by 30) P ( R )=#R / #Days What is the probability of rain given that it is cold ?

Conditional Probability Outcomes of experiment: days Being a cold day is an event Being a rainy day is an event Probability of being cold AND rainy ? Cold AND NOT rainy ? NOW: Is it more likely to be cold in rainy days ? What about: COLD ‘given that’ it is RAINY ?

(Same table of day counts as before.) P(d is cold) = 12/30 P(d is rainy) = 23/30 P(d is rainy and cold) = 6/30 We want: P(d is rainy | d is cold)

Is it more likely to have rain in cold days ? P(rain)=23/30 What is the rain probability IN THE COLD DAYS ? Probability of rain given cold is … P(rain|cold)= P(rain AND cold)/P(cold) P(rain|cold)= 6/12=0.5
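The same calculation can be written directly from the counts in the table; a rough MATLAB sketch (the variable names are mine):
days        = 30;            % days in the month
coldDays    = 12;            % cold days, from the table
rainAndCold = 6;             % days that are both rainy and cold
P_cold          = coldDays / days;        % 12/30
P_rain_and_cold = rainAndCold / days;     % 6/30
P_rain_given_cold = P_rain_and_cold / P_cold    % = 0.5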

Definition We define the conditional probability of E given F: P(E|F) = P(E ∩ F) / P(F) Given that F is true, what is the probability of E ? In a way, we restrict to the case when only F exists; F is the universe here …

Conditional probability We can consider the conditional probability P(E|F) as a new probability law defined on a new universe, F P(F|F)=1 All other axioms also remain valid …

Properties of Conditional Probability It satisfies all the axioms to be a probability law

Properties of Conditional Probability Definition: P(E|F) = P(E ∩ F) / P(F) This can be seen as a new probability law in the restricted universe F For finite sample spaces with equally likely outcomes: P(E|F) = #(E ∩ F) / #F

Independent Events We define 2 independent events as follows: Independent events: P(E|F)=P(E)

Independent Events 2 independent events: rain and monday 2 dependent events: rain and january 2 dependent (?) events: traffic and Monday 2 independent events: january and monday In theory (not sure about our finite dataset)

Bayes Theorem (Diagram of the month, as before.) Calculation: P(cold)=12/30=2/5 P(traffic)=6/30=1/5 P(cold AND traffic)=6/30=1/5 P(cold|traffic)=1 P(traffic|cold)=1/2

Bayes Theorem P(cold|traffic)P(traffic) = P(cold AND traffic): 1 * 1/5 = 1/5 P(traffic|cold)P(cold) = P(traffic AND cold): 1/2 * 2/5 = 1/5 Hence P(cold|traffic)P(traffic) = P(traffic|cold)P(cold)

Bayes Theorem P(cold|traffic)P(traffic)= P(traffic|cold)P(cold) P(cold|traffic)= P(traffic|cold)P(cold)/P(traffic)
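A quick numerical check of this identity, using the numbers from the calculation slide above (a sketch; the variable names are mine):
P_cold    = 12/30;
P_traffic = 6/30;
P_traffic_given_cold = 1/2;    % from the calculation slide
% Bayes: P(cold|traffic) = P(traffic|cold) * P(cold) / P(traffic)
P_cold_given_traffic = P_traffic_given_cold * P_cold / P_traffic    % = 1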

Independent Events P(E|F)=P(E) → E independent of F

Independent Events Since it was: P(E|F) = P(E ∩ F)/P(F) And we are assuming: P(E|F) = P(E) it follows that for independent events: P(E ∩ F) = P(E)P(F)

Independent Events If E and F are independent, so are E and F^c

Independence of 3 events… E,F,G are independent if every subset of these 3 events is independent… E,F are independent E,G are independent F,G are independent And: P(E,F,G)=P(E)P(F)P(G)

Independent Events We can decompose joint probabilities: P(E,F,G)=P(E)P(F)P(G) if they are independent Otherwise, we should write: P(E,F,G)=P(E|F,G)P(F|G)P(G)

Bernoulli Trials Toss a coin N times … Probability of starting with H = ½ Probability of starting with HH = ½ * ½ = ¼ … Probability of N consecutive H = (½)^N

MATLAB INTERLUDE INTERSECT Set intersection. INTERSECT(A,B) when A and B are vectors returns the values common to both A and B. The result will be sorted. A and B can be cell arrays of strings.

MATLAB INTERLUDE UNION Set union. UNION(A,B) when A and B are vectors returns the combined values from A and B but with no repetitions. The result will be sorted.

MATLAB INTERLUDE FIND Find indices of nonzero elements. I = FIND(X) returns the linear indices corresponding to the nonzero entries of the array X. X may be a logical expression. So you can find elements in a set with a given property, and make a new set…

MATLAB INTERLUDE LENGTH Length of vector. LENGTH(X) returns the length of vector X. It is equivalent to MAX(SIZE(X)) for non-empty arrays and 0 for empty ones.

MATLAB INTERLUDE You can use these set commands to count the elements in various sets, and hence to compute probabilities…
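For example, a sketch of the earlier dice question (E = small outcomes, F = even outcomes) using these commands; the variable names are my own:
S = 1:6;                  % sample space of one dice roll
E = [1 2 3];              % 'small' outcomes
F = [2 4 6];              % even outcomes
P_E_or_F  = length(union(E, F)) / length(S)       % P(E or F)  = 5/6
P_E_and_F = length(intersect(E, F)) / length(S)   % P(E and F) = 1/6
% FIND can build an event from a property, e.g. the even outcomes again:
G = S(find(mod(S, 2) == 0));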

Topics Modeling with Random Variables Discrete Random Variables Events and Probability Mass Function Examples of RV: Bernoulli Binomial Geometric The concept of Expectation…

RANDOM VARIABLES We have studied Probabilistic Models in general, the notions of outcome, sample space and event. Now an important special case: in many probabilistic models the outcomes are NUMBERS, or can be associated to numbers

RANDOM VARIABLES Examples of numerical outcome: How many people showed up today ? How many are sitting next to a statistics major? How many days of rain in january ? Temperature on a given day ? OR we can ASSOCIATE numerical values to non-numerical outcomes …

RANDOM VARIABLES Associating numerical values to non-numerical outcomes … HOMEWORK EXPERIMENT Outcome: the homework Sample space: set of all possible answers you COULD have given Associated numerical value: the GRADE

RANDOM VARIABLES Easier model: multiple choice quiz 10 questions, 3 choices each (A,B,C) Experiment: give the test to a student Outcome: a string of 10 symbols Sample space: set of all possible 10 symbols strings Numeric value: the grade assigned to each string (some form of distance to ‘correct string’)

RANDOM VARIABLES We call RANDOM VARIABLE a real-valued function of the outcome of an experiment Given an experiment, and the corresponding set of possible outcomes, a random variable associates a particular number with each outcome

RANDOM VARIABLES Example: Sample space = {AAA, AAB, AAC, ….} Random variable: AAA → 3, AAB → 2, AAC → 3, … This could be a model of grading a test

RANDOM VARIABLES Why are RANDOM VARIABLES important ? They allow us to model uncertain situations in a quantitative way, we will talk about: the EXPECTED temperature on january 25, or the EXPECTED number of students that will pass the test, etc. … We can also talk about expected deviations from this estimate …

RANDOM VARIABLES (continuous vs discrete) A random variable is called discrete if its range (the set of values it can take) is finite or COUNTABLY infinite It is called continuous – for example - if its range is the real axis (but we will not deal with this case today)

RANDOM VARIABLES Examples of discrete random variables: Number of things (number of ‘tails’ in 1000 coin tosses) Number of minutes this class will last Roll of 2 dice, the sum or product of the outputs is a discrete random variable

RANDOM VARIABLES The 2- dice example Let us call: A=* B=** C=*** D=**** E=***** F=******

RANDOM VARIABLES Let us consider the following random variable N associated to one dice: N(A)=1 N(B)=2 N( C)=3 N(D)=4 N(E)=5 N(F)=6

RANDOM VARIABLES Sample space of the 2 dice experiment: AA,AB,AC,AD,AE,AF, BA,BB,BC,BD,BE,BF, CA,CB,CC,CD,CE,CF, DA,DB,DC,DD,DE,DF, EA,EB,EC,ED,EE,EF, FA,FB,FC,FD,FE,FF,

RANDOM VARIABLES Sum random variable: AA → 1+1=2 = S(AA) AB → 1+2=3 = S(AB) … FF → 6+6=12 = S(FF) Range of the random variable: {2,3,4,5,6,7,8,9,10,11,12}

RANDOM VARIABLES Similarly we can define the random variable PRODUCT, etc … So after the same experiment (rolling 2 dice) we may define different random variables (sum, absolute difference, product, max, min, etc … of the two individual outcomes …) Whatever attaches a numeric value to the OUTCOME of the experiment is a RANDOM VARIABLE

RANDOM VARIABLES important concepts A discrete random variable is a real valued function of the outcome of the experiment that can take a finite or countably infinite number of values A function of a discrete random variable defines another random variable We will define MEAN and VARIANCE of a random variable We will define independence and all other concepts we defined in the previous classes

RANDOM VARIABLES For discrete random variables we will define PROBABILITY MASS FUNCTIONS, that are probability laws that assign a probability to each possible numerical value the random variable can assume It will be analogous to what was done so far …

RANDOM VARIABLES: notation We will denote by uppercase letters (X) the random variable, by lowercase letters (x) the actual value it assumes in a given experiment So we will talk about the probability that X=x, for example … and we will write it: P({X=x})

RANDOM VARIABLES Look at the website of the course, where we publish the statistics about the past homeworks Random variable: GRADE, G A particular grade: “g” For example we can talk about P({G=27})

RANDOM VARIABLES Easier model: multiple choice quiz 10 questions, 3 choices each (A,B,C) Experiment: give the test to a student Outcome: a string of 10 symbols Sample space: set of all possible 10 symbols strings Numeric value: the grade assigned to each string (some form of distance to ‘correct string’)

RANDOM VARIABLES We call RANDOM VARIABLE a real-valued function of the outcome of an experiment Given an experiment, and the corresponding set of possible outcomes, a random variable associates a particular number with each outcome

RANDOM VARIABLES important concepts A discrete random variable is a real valued function of the outcome of the experiment that can take a finite or countably infinite number of values A function of a discrete random variable defines another random variable We will define MEAN and VARIANCE of a random variable We will define independence and all other concepts we defined in the previous classes

RANDOM VARIABLES For discrete random variables we will define PROBABILITY MASS FUNCTIONS, that are probability laws that assign a probability to each possible numerical value the random variable can assume It will be analogous to what was done so far …

RANDOM VARIABLES: notation We will denote by uppercase letters (X) the random variable, by lowercase letters (x) the actual value it assumes in a given experiment So we will talk about the probability that X=x, for example … and we will write it: P({X=x})

Probability Mass Function (PMF) The most important way to characterize a random variable is through the probabilities of the values that it can take For the random variable X, these are given by the PMF of X, denoted pX. If x is any possible value of X, the probability mass of x, pX(x) is the probability of the event {X=x}, consisting of all outcomes that give rise to a value of X equal to x pX(x)=P({X=x})

PMF Example: experiment = tossing 2 fair coins Random Variable X = number of heads obtained (range = {0,1,2}) Compute the PMF of X pX(x)= ¼ if x=0 ½ if x=1 ¼ if x=2 0 otherwise (=impossible)

PMF Event x=0 → corresponding outcome TT Event x=1 → corresponding outcomes HT or TH Event x=2 → corresponding outcome HH Each outcome has probability ¼ → hence the probabilities given before … (grouping outcomes based on the value of the random variable = a way to define events)

PMF Some properties: since the events corresponding to each value of the random variable must be disjoint, and form a partition of the sample space, from the probability axioms we obtain: Σx pX(x) = 1

PMF By a similar argument, we have for any set S of possible values of X: P(X ∈ S) = Σx in S pX(x) In the coin example before, we can say: the probability of at least 1 head is ¾ (prob of 1 head + prob of 2 heads)

PMF

PMF Some properties: since the events corresponding to each value of the random variable must be disjoint, and form a partition of the sample space, from the probability axioms we obtain: Σx pX(x) = 1

PMF By a similar argument, we have for any set S of possible values of X: P(X ∈ S) = Σx in S pX(x) In the coin example before, we can say: the probability of at least 1 head is ¾ (prob of 1 head + prob of 2 heads)

Functions of Random Variables One can generate new random variables as functions of random variables

CALCULATION OF PMF OF A RANDOM VARIABLE X For each possible value x of X: Collect all the possible outcomes that give rise to the event {X=x} Add their probabilities to obtain pX(x) THIS IS IMPORTANT !!
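A sketch of this procedure in MATLAB for the two-coin example above (the representation of the outcomes and the variable names are mine):
outcomes = ['HH'; 'HT'; 'TH'; 'TT'];    % sample space, one outcome per row
pOutcome = [1/4; 1/4; 1/4; 1/4];        % equally likely outcomes
X = sum(outcomes == 'H', 2);            % value of the random variable for each outcome
for x = 0:2
    pX = sum(pOutcome(find(X == x)));   % collect the outcomes with X=x, add their probabilities
    fprintf('pX(%d) = %.2f\n', x, pX);  % prints 0.25, 0.50, 0.25
end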

Example Probability of having a HW grade of 30 or larger ? Prob G=30 + prob G=31 + … + prob G=40 Each probability: count the number of outcomes, divide by the total sample space size

Expectation The PMF of a random variable provides us with several numbers: the probabilities of all possible values of X We would like to summarize this in few numbers that represent the PMF One such number is the EXPECTATION

Expectation Expected value of X: weighted average of all possible values of X (using probabilities as weights)

Expectation Suppose you roll a dice many times, and each time you receive as many dollars as the outcome of the dice-roll … How much money would you ‘expect’ for each roll ? We need to specify these terms …

Expectation Suppose you roll the dice K times, and Ki is the number of times the outcome is “i” Sample space = {1,2,3,4,5,6} The total amount of money you receive is: 1*K1 + 2*K2 + … + 6*K6 = Σi i*Ki

Expectation The total amount in K rolls is: Σi i*Ki So the amount per roll is: (1/K) Σi i*Ki = Σi i*(Ki/K)

Expectation If we have been rolling the dice many times (= K is v. large), we can approximate the probability of an outcome with its frequency: pi = Ki/K Then we can write the expected amount of money as: Σi i*pi

Expectation We define the expected value (expectation, or mean) of a random variable X, with PMF pX, by E[X] = Σx x*pX(x)
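A minimal MATLAB sketch of this definition for the fair dice roll (the values are my own illustrative choice):
x  = 1:6;              % possible values of X
pX = ones(1, 6) / 6;   % PMF of a fair dice roll
EX = sum(x .* pX)      % weighted average = 3.5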

Expectation Remark: we can consider this as the ‘center of gravity’ of the distribution

Variance Other important quantity to describe PMF. Expectation: we know the ‘average’ behavior of the random variable But: how often does the random variable deviate from the average behavior ?

Variance Let us create a NEW random variable describing the deviation of X from its mean E[X], and let us study it … What is the expected value of the random variable (X-E[X])^2 ?

Variance New random variable: (X-E[X])^2 Its expectation: E[(X-E[X])^2] = Var(X) is called ‘the variance of X’ It is always nonnegative Provides a measure of dispersion of X around its mean

Variance

Variance Another related measure of dispersion is the standard deviation of X, defined as the square root of the variance From a practical viewpoint, the STD is easier to use because it has the same units as X (i.e.: if X is in meters, STD will be in meters, Var(X) in square meters)

Calculation of Variance Can just study the expectation of the R.V. Z=(X-E[X])^2 X=… Z=… Var(X)=E[Z]=…

Expected Value of Functions of Random Variables Let X be a random variable with PMF pX(x), and let g(X) be a function of X The expected value of the random variable g(X) is: E[g(X)] = Σx g(x)*pX(x)

Variance So the variance can be calculated as: Var(X) = E[(X-E[X])^2] = Σx (x-E[X])^2 * pX(x)

Properties of Mean and Variance Let X be a random variable and let us consider the linear function: Y=aX+b where a,b are given scalars. Then: E[Y]=aE[X]+b Var(Y)=a^2 * Var(X) THIS holds ONLY if g(X) is linear !!

A useful relation (variance as a function of moments) Var(X)=E[(X-E[X])^2] Var(X)=E[X^2]-(E[X])^2 Proof: SEE LATER SLIDES FOR THE FULL PROOF … Use the relation for E[g(X)] given above

Variance The variance can be calculated as: Var(X) = Σx (x-E[X])^2 * pX(x)

Properties of Mean and Variance Let X be a random variable and let us consider the linear function: Y=aX+b where a,b are given scalars. Then: E[Y]=aE[X]+b Var(Y)=a^2 * Var(X) THIS holds ONLY if g(X) is linear !!

A useful relation… Var(X)=E[(X-E[X])^2] Var(X)=E[X^2]-(E[X])^2 Proof: either as HW or with the TAs… Use the relation for E[g(X)] given above

Variance Calculation Var(X)=E[(X-E[X])^2] = E[X^2]-(E[X])^2 We will use this a lot
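A sketch checking that the two expressions agree, again for the fair dice roll (an illustrative example, not from the slides):
x  = 1:6;
pX = ones(1, 6) / 6;
EX = sum(x .* pX);                        % 3.5
VarDef    = sum((x - EX).^2 .* pX);       % E[(X-E[X])^2]   = 35/12
VarMoment = sum(x.^2 .* pX) - EX^2;       % E[X^2]-(E[X])^2 = 35/12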

Covariance of 2 RVs In probability theory and statistics, covariance is a measure of how much two variables change together (variance is a special case of the covariance when the two variables are identical). If the two variables tend to vary together (that is, when one of them is above its expected value, the other variable tends to be above its expected value too), then the covariance between the two variables will be positive. On the other hand, if when one of them is above its expected value the other tends to be below its expected value, then the covariance between the two variables will be negative. [from wikipedia]

Covariance of 2 RVs The covariance between two real-valued random variables X and Y, with expected values E(X)=m E(Y)=n is defined as Cov(X, Y) = E[(X - m) (Y - n)]

In Matlab COV Covariance matrix. COV(X), if X is a vector, returns the variance. For matrices, where each row is an observation, and each column a variable, COV(X) is the covariance matrix. DIAG(COV(X)) is a vector of variances for each column, and SQRT(DIAG(COV(X))) is a vector of standard deviations. COV(X,Y), where X and Y are matrices with the same number of elements, is equivalent to COV([X(:) Y(:)]).

Correlation Coefficient From wikipedia

Correlation Coefficient Between 2 Random Variables CORRCOEF Correlation coefficients. R=CORRCOEF(X) calculates a matrix R of correlation coefficients for an array X, in which each row is an observation and each column is a variable. R=CORRCOEF(X,Y), where X and Y are column vectors, is the same as R=CORRCOEF([X Y]). If C is the covariance matrix, C = COV(X), then CORRCOEF(X) is the matrix whose (i,j)'th element is C(i,j)/SQRT(C(i,i)*C(j,j)).
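A possible usage sketch of these two commands on some toy data (the data values are made up for illustration):
X = [1 2 3 4 5]';
Y = [2 4 5 4 6]';
C = cov([X Y])         % 2x2 covariance matrix; diagonal = variances
R = corrcoef([X Y])    % 2x2 matrix of correlation coefficients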

EXTRA MATERIAL BELOW THIS POINT What follows is extra material for reference, not covered in class 1 of week 2 (it refers to class 2 of week 2)

Bernoulli Random Variable Consider the toss of a (generally not fair) coin, probability H = p; prob T = 1-p The BERNOULLI random variable is a RV that takes the two values 0 or 1 depending on whether the outcome is H or T (remember: RV is a function of the outcome) X=1 if outcome is H; X=0 if outcome is T

Bernoulli Random Variable The PMF of this Bernoulli RV is: pX(x) = p if x=1, 1-p if x=0 Very important RV in modeling any generic situation with just 2 outcomes, e.g. the outcome of the football match on Sunday, …

Binomial Random Variable Experiment = N coin tosses, each one with prob(H)=p; prob(T)=1-p The random variable X is the number of heads in the n-toss sequence We refer to X as a BINOMIAL RANDOM VARIABLE WITH PARAMETERS n AND p

Binomial Random Variable The PMF of X consists of the binomial probabilities we have seen some time ago: pX(k) = (n choose k) p^k (1-p)^(n-k), for k=0,1,…,n Two parts: p^k (1-p)^(n-k) is the probability of a sequence with k heads and n-k tails; (n choose k) is the number of sequences with k heads and n-k tails

Binomial Random Variable The normalization property can be written as Σk=0..n (n choose k) p^k (1-p)^(n-k) = 1 We will study this more in the future …
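A sketch of this PMF and its normalization in MATLAB, using NCHOOSEK for the binomial coefficient (n and p are illustrative values of my choosing):
n = 10;  p = 1/3;
pX = zeros(1, n + 1);
for k = 0:n
    pX(k + 1) = nchoosek(n, k) * p^k * (1 - p)^(n - k);
end
sum(pX)    % normalization: equals 1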

Geometric Random Variable We repeatedly toss the same coin as before. RV: the number of tosses until the first head comes up … TTTTTTTH TTH H TTTTTTTTTTTTTTTTTTTTTTTTTTTTH

Geometric Random Variable PMF, two parts: the probability of the ‘prefix’ of k-1 tails, and the probability of the final H: pX(k) = (1-p)^(k-1) p, for k=1,2,… Normalization: Σk=1..∞ (1-p)^(k-1) p = 1

Geometric Random Variable This can model the process of you trying to connect with the modem to an internet service provider … (how many fails before 1 success ?)

Poisson Random Variable

Functions of Random Variables One can generate new random variables as functions of random variables

Expectation The PMF of a random variable provides us with several numbers: the probabilities of all possible values of X We would like to summarize this in few numbers that represent the PMF One such number is the EXPECTATION

Expectation Expected value of X: weighted average of all possible values of X (using probabilities as weights) Next time we will develop this and other concepts…

Conclusion Random Variables Probability Mass Functions How to calculate PMFs Bernoulli Binomial Geometric Poisson ?

Probability Mass Function (PMF) The most important way to characterize a random variable is through the probabilities of the values that it can take For the random variable X, these are given by the PMF of X, denoted pX. If x is any possible value of X, the probability mass of x, pX(x) is the probability of the event {X=x}, consisting of all outcomes that give rise to a value of X equal to x pX(x)=P({X=x})

PMF Example: experiment = tossing 2 fair coins Random Variable X = number of heads obtained (range = {0,1,2}) Each outcome has probability ¼ → hence the probabilities are Event x=0 → corresponding outcome TT Event x=1 → corresponding outcomes HT or TH Event x=2 → corresponding outcome HH

PMF Compute the PMF of X pX(x)= ¼ if x=0 ½ if x=1 ¼ if x=2 0 otherwise (=impossible)

PMF

PMF Some properties: since the events corresponding to each value of the random variable must be disjoint, and form a partition of the sample space, from the probability axioms we obtain: Σx pX(x) = 1

PMF By a similar argument, we have for any set S of possible values of X: P(X ∈ S) = Σx in S pX(x) In the coin example before, we can say: the probability of at least 1 head is ¾ (prob of 1 head + prob of 2 heads)

Functions of Random Variables One can generate new random variables as functions of random variables

Bernoulli Random Variable Consider the toss of a (generally not fair) coin, probability H = p; prob T = 1-p The BERNOULLI random variable is a RV that takes the two values 0 or 1 depending on whether the outcome is H or T (remember: RV is a function of the outcome) X=1 if outcome is H; X=0 if outcome is T

Bernoulli Random Variable The PMF of this Bernoulli RV is: pX(x) = p if x=1, 1-p if x=0 Very important RV in modeling any generic situation with just 2 outcomes, e.g. the outcome of the football match on Sunday, …

Mean and Variance E[X] = 1*p + 0*(1-p) = p E[X^2] = 1^2*p + 0^2*(1-p) = p Var(X) = E[X^2]-(E[X])^2 = p-p^2 = p(1-p)
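A rough simulation check of these formulas (p and the number of samples are arbitrary choices of mine):
p = 0.3;
X = double(rand(1, 100000) < p);   % 1 with probability p, 0 otherwise
mean(X)    % close to p = 0.3
var(X)     % close to p*(1-p) = 0.21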

Uniform Distribution: dice roll … see later slides …

Binomial Random Variable Experiment = N coin tosses, each one with prob(H)=p; prob(T)=1-p The random variable X is the number of heads in the n-toss sequence We refer to X as a BINOMIAL RANDOM VARIABLE WITH PARAMETERS n AND p

Binomial Random Variable The PMF of X consists of the binomial probabilities we have seen some time ago: pX(k) = (n choose k) p^k (1-p)^(n-k), for k=0,1,…,n Two parts: p^k (1-p)^(n-k) is the probability of a sequence with k heads and n-k tails; (n choose k) is the number of sequences with k heads and n-k tails

Binomial Random Variable The normalization property can be written as Σk=0..n (n choose k) p^k (1-p)^(n-k) = 1 We will study this more in the future …

Geometric Random Variable We repeatedly toss the same coin as before. RV: the number of tosses until the first head comes up … TTTTTTTH TTH H TTTTTTTTTTTTTTTTTTTTTTTTTTTTH

Geometric Random Variable PMF, two parts: the probability of the ‘prefix’ of k-1 tails, and the probability of the final H: pX(k) = (1-p)^(k-1) p, for k=1,2,… Normalization: Σk=1..∞ (1-p)^(k-1) p = 1

Bernoulli Random Variable Consider the toss of a (generally not fair) coin, probability H = p; prob T = 1-p The BERNOULLI random variable is a RV that takes the two values 0 or 1 depending on whether the outcome is H or T (remember: RV is a function of the outcome) X=1 if outcome is H; X=0 if outcome is T

Bernoulli Random Variable The PMF of this Bernoulli RV is: pX(x) = p if x=1, 1-p if x=0 Very important RV in modeling any generic situation with just 2 outcomes, e.g. the outcome of the football match on Sunday, …

Mean and Variance E[X] = 1*p + 0*(1-p) = p E[X^2] = 1^2*p + 0^2*(1-p) = p Var(X) = E[X^2]-(E[X])^2 = p-p^2 = p(1-p)

Two Important Series Σk=1..n k = n(n+1)/2 and Σk=1..n k^2 = n(n+1)(2n+1)/6 We do not derive them here. We will apply these to calculations of variance…

Uniform Distribution: dice roll Discrete Uniform PMF over [a,b] (case of the dice rolls): pX(k) = 1/(b-a+1) for k = a, a+1, …, b (and 0 otherwise)

Uniform The expectation is: E[X] = (a+b)/2 This can be seen directly, since the PMF is symmetric around (a+b)/2. Or use the series given before... Dice example: 1+2+3+4+5+6=21 Direct computation of the expectation: 21/6=3.5 The formula says: (1+6)/2=3.5

Variance of Discrete Uniform We first study the case where a=1; b=n [the general case will reduce to this] We will use the relation: Var(X)=E[X^2]-(E[X])^2 Can verify this by induction or just believe it

Variance of Discrete Uniform Using the series above: E[X] = (n+1)/2 and E[X^2] = (n+1)(2n+1)/6, so Var(X) = (n+1)(2n+1)/6 - ((n+1)/2)^2 = (n^2-1)/12 Notice: we are still working with the special case a=1; b=n

Variance of Discrete Uniform Now we can study the general case: by SHIFTING a distribution, its variance does not change (so we can study the [a,b] case by studying the variance of the [1,b-a+1] case) So: setting n=b-a+1 in the previous equation gives the general case: Var(X) = ((b-a+1)^2-1)/12

Variance of Discrete Uniform Example: I get 1 $ for each point on the dice, I can expect 3.5 dollars at each roll, and a Standard Deviation of sqrt(35/12)~1.7
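A quick MATLAB check of these numbers using the formulas above (n = 6 for the dice):
n = 6;
EX   = (1 + n) / 2       % 3.5
VarX = (n^2 - 1) / 12    % 35/12, about 2.92
sqrt(VarX)               % about 1.7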

Binomial Random Variable Experiment = N coin tosses, each one with prob(H)=p; prob(T)=1-p The random variable X is the number of heads in the n-toss sequence We refer to X as a BINOMIAL RANDOM VARIABLE WITH PARAMETERS n AND p

Binomial Random Variable The PMF of X consists of the binomial probabilities we have seen some time ago … Two parts: probability of a sequence with k heads and n-k tails Number of sequences with k heads and n-k tails

Binomial Random Variable The normalization property can be written as We will study this more in the future …

QUESTION There are 94 students Each has probability 1/3 to get an A The number of students that get an A is a random variable What is its mean ? (how many are expected to get an A)

Mean of the Binomial If we want the mean of the binomial, we first need to learn how to handle JOINT PMFs of MULTIPLE RANDOM VARIABLES

JOINT PMFs of MULTIPLE RANDOM VARIABLES Consider 2 discrete random variables, X and Y associated with the same experiment The probabilities of the values that X and Y can take, are captured by the JOINT PMF of X and Y, written: pX,Y pX,Y(x,y)=P(X=x,Y=y)

JOINT PMF of 2 RV (if we consider the pair X,Y as a random variable, all ideas transfer …) If A is an event (a set of pairs (x,y) that have a certain property) then P((X,Y) ∈ A) = Σ(x,y) ∈ A pX,Y(x,y)

students Consider the random variable Xi that is 1 if student “i” gets an A, and 0 otherwise The total number of A’s is X = X1 + X2 + … + Xn By linearity of expectation, with n students and probability p, E[X] = E[X1]+…+E[Xn] = np (here n=94, p=1/3, so about 31.3 students are expected to get an A)

Conclusion Mean of Random Variables Variance of Random Variables Properties, relations for variance and moments Bernoulli Discrete Uniform, … General Methods for variance calculation

topics Some probability distributions Some real applications: decision making; modeling clashes between ants Modeling the distribution of ‘ping’ times …

Marginalization For a fixed value y, pY(y) = Σx pX,Y(x,y) Using the definition of conditional probability, we have: pX|Y(x|y) = pX,Y(x,y) / pY(y)

Random Variables Joint probability Conditional probability Independence

Joint Probability It is common for several random variables to be defined on the same sample space. If X and Y are random variables, the function f(x,y) = Pr{X = x and Y = y} is the joint probability mass function of X and Y.

Independent Random Variables We define two random variables X and Y to be independent if for all x and y, the events X = x and Y = y are independent or, equivalently, if for all x and y, we have Pr{X = x and Y = y} = Pr{X = x} Pr{Y = y}.

Functions of Random Variables Given a set of random variables defined over the same sample space, one can define new random variables as sums, products, or other functions of the original variables.

Expected value of a random variable The simplest and most useful summary of the distribution of a random variable is the "average" of the values it takes on. The expected value (or, synonymously, expectation or mean) of a discrete random variable X is

Expectation of joint RVs Given random variables X and Y, and given their joint PMF P{X=x and Y=y}, what is the expectation of their product, E[XY] ? Easy if they are independent …

Expectation of Joint Independent RVs If X and Y are independent: E[XY] = E[X]*E[Y]

In general… In general, when n random variables X1, X2, . . . , Xn are mutually independent, E[X1 X2 ··· Xn] = E[X1] E[X2] ··· E[Xn]

More about independent RVs… When X and Y are independent random variables, Var[X + Y] = Var[X] + Var[Y] (whereas for ANY random variables the expectation of the sum is the sum of their expectations, that is, E[X + Y] = E[X] + E[Y])

The Geometric Distribution A coin flip is an instance of a Bernoulli trial, which is defined as an experiment with only two possible outcomes: success, which occurs with probability p, and failure, which occurs with probability q = 1 - p. When we speak of Bernoulli trials collectively, we mean that the trials are mutually independent and that each has the same probability p for success. Two important distributions arise from Bernoulli trials: the geometric distribution and the binomial distribution.

Geometric Distribution Take a sequence of Bernoulli trials, each with a probability p of success and a probability q = 1 - p of failure. How many trials occur before we obtain a success?

Geometric Distribution Let the random variable X be the number of trials needed to obtain a success. Then X has values in the range {1, 2, . . .}, and Pr{X = k} = q^(k-1) p (for k ≥ 1), since we have k - 1 failures before the one success. A probability distribution satisfying this equation is said to be a geometric distribution.

Geometric Distribution This is the geometric distribution (picture taken from Cormen, Leiserson and Rivest’s book on Algorithms) In this case, the coin has probability p = 1/3 of success and a probability q = 1 - p of failure

Geometric distribution Expectation: we can use the relation Σk=1..∞ k x^(k-1) = 1/(1-x)^2 That holds when the summation is infinite and |x| < 1

Geometric Distribution The expectation of the distribution is 1/p = 3.

Geometric Distribution The variance, which can be calculated similarly, is Var[X] = q/p^2 Example: repeatedly roll two dice until we obtain either a seven or an eleven. Of the 36 possible outcomes, 6 yield a seven and 2 yield an eleven. Thus, the probability of success is p = 8/36 = 2/9, and we must roll 1/p = 9/2 = 4.5 times on average to obtain a seven or eleven. NEXT WEEK we will implement things like this ….
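A rough sketch of how that simulation might look in MATLAB (an assumption of mine, using RANDI; the names and trial count are illustrative):
nTrials = 10000;
rolls = zeros(1, nTrials);
for t = 1:nTrials
    k = 0;  s = 0;
    while s ~= 7 && s ~= 11
        s = randi(6) + randi(6);   % sum of two dice
        k = k + 1;
    end
    rolls(t) = k;                  % number of rolls needed in this trial
end
mean(rolls)    % roughly 1/p = 4.5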

BINOMIAL DISTRIBUTION How many successes occur during n Bernoulli trials, where a success occurs with probability p and a failure with probability q = 1 - p?

Binomial Distribution Define the random variable X to be the number of successes in n trials. Then X has values in the range {0, 1, . . . , n}, and for k = 0, . . . , n, Pr{X = k} = (n choose k) p^k q^(n-k), since there are (n choose k) ways to pick which k of the n trials are successes, and the probability that each occurs is p^k q^(n-k). A probability distribution satisfying this equation is said to be a binomial distribution.

Binomial Distribution Let Xi be the random variable describing the number of successes in the ith trial. Then E[Xi] = p*1 + q*0 = p, and by linearity of expectation, the expected number of successes for n trials is E[X] = E[X1] + … + E[Xn] = np

Binomial Distribution Similarly we can do for the variance, exploiting the relation Var[X] = E[X^2] - (E[X])^2 Since Xi only takes on the values 0 and 1, we have E[Xi^2] = E[Xi] = p And hence Var[Xi] = p - p^2 = pq Then we can use independence to move from Var[Xi] to the variance of the binomial: Var[X] = Var[X1] + … + Var[Xn] = npq

Binomial Distribution The binomial distribution increases as k runs from 0 to n until it reaches the mean np, and then it decreases. Picture from Cormen, Leiserson and Rivest’s book

Binomial Distribution

Conclusion Conditional PMF in RVs Independence Expectation and Variance for RVs Geometric distribution Binomial distribution → next: we will implement all of these ideas…

EXTRA MATERIAL (NOT COVERED IN CLASS)

Cards ♠ ♣ ♥ ♦ Ace 2 3 4 5 6 7 8 9 10 Jack Queen King

Counting … Probability of Generating a growing sequence of cards … (1,2,3,4,5,6,7,8,9,…) Probability of starting with a 1 * probability of having a 2 * …* probability of having a king…

COUNTING METHODS How many ways to obtain K heads and N-K tails in N coin tosses ? How many ways to have a 4-of-a-kind ?

Basic Counting Two experiments are performed. The first one can have any one of N possible outcomes, the second one any of M possible outcomes → there are N*M possible outcomes for the two experiments considered together

Basic Counting How many different arrangements of the letters A,B,C are possible ? ABC ACB BAC BCA CAB CBA Each arrangement is known as a PERMUTATION. There are 6 possible permutations of a set of 3 objects There are N! permutations of a set of N objects N! = N(N-1)(N-2)…3*2*1

Combinations How many different groups of M objects can I form from a total of N objects ? (e.g. how many groups of 5 cards can I form from a deck of 52 ?) (there are 52 ways to select the first; 51 to select the second; … but we are counting each group each time we see one of its possible orderings… we need to correct for this …) (52*51*50*49*48)/(5*4*3*2*1)

Combinations Ways of choosing k elements out of a set of n elements: n! / (k!(n-k)!)
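In MATLAB this binomial coefficient is computed by NCHOOSEK; for instance (the two examples echo questions asked nearby):
nchoosek(10, 3)    % 120 ways to choose 3 elements out of 10
nchoosek(52, 5)    % 2598960 possible 5-card hands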

Combinations and Permutations How many ways to put N balls in K boxes ? Example: OOO11O1O1OOO11 → a ‘1’ is a boundary between boxes (we use G = K-1 of them), an ‘O’ is a ball (we use N of them) There are (N+G)! orderings; correct for permutations of the 1s and of the Os: (N+G)!/(N!G!) If we set M = N+G, this is M!/((M-G)!G!) Same as before… In the example: G=6; K=7; N=8; M=14

COUNTING METHODS Combinations VS permutations How many sets of 3 numbers out of 10 ? How many ordered sets of 3 numbers ?

Pascal’s Triangle

Pascal’s Triangle

Binomial Coefficient and Pascal’s Triangle A number in the triangle can be found by nCr (n Choose r) where n is the number of the row and r is the element in that row. For example, in row 3, 1 is the zeroth element, 3 is element number 1, the next three is the 2nd element, and the last 1 is the 3rd element. The formula for nCr is: n! / (r!(n-r)!)

Examples How many ways to select 5 cards from the deck ? How many ways to have 4 equal cards in a set of 5 ? Probability of selecting 5 cards containing a poker (four of a kind) ?

Poker Probabilities Deck of 52 cards, ranked: ace, king, queen, jack, 10,9,8,7,6,5,4,3,2 (and ace again: it can be either high or low) 4 suits: spades, hearts, diamonds and clubs 5 card draw; 5 cards make up a poker hand The highest hand wins Hands are ranked as follows:

Poker Probabilities Royal flush → 10, J, Q, K, A of the same suit Four of a kind → 4 cards of the same RANK Full house → 3 cards of the same rank + 2 cards of another rank Flush → 5 cards of the same suit …

Poker Probabilities How many poker hands ? 2,598,960

Poker Probabilities How many combinations of royal flush? 4 (probability: 0.00000154) How many combinations of 4-of-a-kind ? 624
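A sketch of where the 4-of-a-kind count comes from (the counting argument is standard; the MATLAB names are mine): choose the rank (13 ways), take all 4 cards of that rank, then any one of the remaining 48 cards as the fifth card.
nFourOfAKind = 13 * 48                           % = 624
pFourOfAKind = nFourOfAKind / nchoosek(52, 5)    % about 0.00024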

Consider a number of experiments with poker cards Write down SAMPLE SPACE Count possible outcomes for each experiment (see book, or handouts) ♠ ♣ ♥ ♦ Ace 2 3 4 5 6 7 8 9 10 Jack Queen King

Kind of questions… Probability of having King of ♣ at first draw ? Probability of having 4 kings ? Probability of having any set of 4 equal cards ? When we ask to write sample space for 5-cards experiment, we do not mean to list all of the outcomes (they are about 2.5 million), just to show you know what the sample space is: e.g. {all hands of 5 cards}, or {{2S, 2C,2D, 2H,3S},…{KS,KC,KD,KH, AC},…}

How to do the homework… Always: write down probabilistic model Use one of the 3 formulae we have for COUNTING number of events of a certain type, or of outcomes Use definitions like: P(event)= # outcomes in event / #possible outcomes

Combinations Ways of choosing k elements out of a set of n elements: n! / (k!(n-k)!) HOW MANY COMMITTEES OF 5 PEOPLE CAN WE MAKE OUT OF A CLASS OF 10 PEOPLE ?

Poker Probabilities How many poker hands ? 2,598,960

Poker Probabilities How many combinations of royal flush? 4 (probability: 0.00000154) How many combinations of 4-of-a-kind ? 624