Probability and Statistics Review Thursday Sep 11.

The Big Picture
[Diagram: Model → Data via probability; Data → Model via estimation/learning]
But how do we specify a model?

Graphical Models
How do we specify the model?
– What are the variables of interest?
– What are their ranges?
– How likely are their combinations?
You need to specify a joint probability distribution
– But in a compact way: exploit local structure in the domain
Today we will cover some concepts that formalize the above statements.

Probability Review
– Events and event spaces
– Random variables
– Joint probability distributions: marginalization, conditioning, chain rule, Bayes rule, law of total probability, etc.
– Structural properties: independence, conditional independence
– Examples
– Moments

Sample Space and Events
Ω: sample space, the set of possible results of an experiment
– If you toss a coin twice, Ω = {HH, HT, TH, TT}
Event: a subset of Ω
– First toss is head = {HH, HT}
S: event space, a set of events
– Closed under finite union and complements
– Entails other binary operations: intersection, difference, etc.
– Contains the empty event ∅ and Ω

Probability Measure
Defined over (Ω, S) s.t.
– P(α) ≥ 0 for all α in S
– P(Ω) = 1
– If α, β are disjoint, then P(α ∪ β) = P(α) + P(β)
We can deduce other properties from the above axioms
– Ex: P(α ∪ β) = P(α) + P(β) − P(α ∩ β) for non-disjoint events
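A minimal Python sketch (not from the slides) of a finite probability space with the uniform measure, built on the two-coin-toss Ω above; it checks inclusion-exclusion numerically:

```python
from itertools import product

# Finite probability space for two coin tosses, with the uniform measure.
Omega = {"".join(t) for t in product("HT", repeat=2)}   # {HH, HT, TH, TT}

def P(event):
    return len(event) / len(Omega)   # uniform measure on a finite space

A = {w for w in Omega if w[0] == "H"}    # first toss is head
B = {w for w in Omega if w[1] == "H"}    # second toss is head
# Inclusion-exclusion for non-disjoint events:
print(P(A | B), P(A) + P(B) - P(A & B))  # both 0.75
```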

Visualization
[Figure: Venn diagram of overlapping events in the sample space]
We can go on and define conditional probability using the above visualization.

Conditional Probability
P(F|H) = fraction of worlds in which H is true that also have F true
P(F|H) = P(F ∩ H) / P(H)
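To make the "fraction of worlds" reading concrete, here is a hedged Python sketch; the four worlds and their probabilities are invented for illustration:

```python
# The "fraction of worlds" view of P(F|H); worlds and probabilities made up.
worlds = [
    {"H": True,  "F": True},
    {"H": True,  "F": False},
    {"H": False, "F": True},
    {"H": False, "F": False},
]
probs = [0.3, 0.2, 0.1, 0.4]  # must sum to 1

p_H = sum(p for w, p in zip(worlds, probs) if w["H"])
p_HF = sum(p for w, p in zip(worlds, probs) if w["H"] and w["F"])
print(p_HF / p_H)  # P(F|H) = P(F, H) / P(H) = 0.3 / 0.5 = 0.6
```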

Rule of Total Probability
[Figure: event A intersecting a partition B1, …, B7 of the sample space]
P(A) = Σ_i P(A ∩ B_i) = Σ_i P(A | B_i) P(B_i)

From Events to Random Variables
For almost all of the semester we will be dealing with RVs
– A concise way of specifying attributes of outcomes
Modeling students (Grade and Intelligence):
– Ω = all possible students
– What are the events?
  Grade_A = all students with grade A
  Grade_B = all students with grade B
  Intelligence_High = … with high intelligence
– Very cumbersome
We need “functions” that map from Ω to an attribute space.

Random Variables
[Figure: Ω mapped to attribute values by I:Intelligence ∈ {high, low} and G:Grade ∈ {A+, A, B}]

Random Variables
[Figure: Ω mapped to attribute values by I:Intelligence ∈ {high, low} and G:Grade ∈ {A+, A, B}]
P(I = high) = P({all students whose intelligence is high})
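A small Python sketch of the idea that a random variable is just a function from Ω to an attribute space; the student names, probabilities, and the map I are all hypothetical:

```python
# A random variable is a function from outcomes to attribute values.
omega = ["alice", "bob", "carol", "dave"]
p = {"alice": 0.25, "bob": 0.25, "carol": 0.25, "dave": 0.25}

def I(student):  # Intelligence, a map Omega -> {"high", "low"}
    return {"alice": "high", "bob": "low", "carol": "high", "dave": "low"}[student]

# P(I = high) = P({all students whose intelligence is high})
p_high = sum(p[s] for s in omega if I(s) == "high")
print(p_high)  # 0.5
```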

Probability Review
– Events and event spaces
– Random variables
– Joint probability distributions: marginalization, conditioning, chain rule, Bayes rule, law of total probability, etc.
– Structural properties: independence, conditional independence
– Examples
– Moments

Joint Probability Distribution
Random variables encode attributes
Not all possible combinations of attributes are equally likely
Joint probability distributions quantify this
– P(X = x, Y = y) = P(x, y)
– How probable is it to observe these two attributes together?
– Generalizes to N RVs
How can we manipulate joint probability distributions?
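One convenient concrete representation (an illustration, not prescribed by the slides) is a table whose entries sum to one; here is a sketch with made-up numbers, rows indexed by I ∈ {high, low} and columns by G ∈ {A+, A, B}:

```python
import numpy as np

# A joint distribution over two discrete RVs as a table; entries sum to 1.
# Rows: I in {high, low}; columns: G in {A+, A, B}. Values are illustrative.
P = np.array([[0.20, 0.30, 0.05],
              [0.05, 0.15, 0.25]])
assert np.isclose(P.sum(), 1.0)
print(P[0, 0])  # P(I=high, G=A+)
```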

Chain Rule
Always true:
P(x, y, z) = P(x) P(y|x) P(z|x, y) = P(z) P(y|z) P(x|y, z) = …
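A quick numeric check of the two-variable chain rule P(x, y) = P(x) P(y|x) on a toy table (illustrative values):

```python
import numpy as np

# Verify P(x, y) = P(x) * P(y|x) on a toy joint table.
P = np.array([[0.20, 0.30, 0.05],
              [0.05, 0.15, 0.25]])   # illustrative joint P(x, y)
Px = P.sum(axis=1)                   # P(x)
Py_given_x = P / Px[:, None]         # P(y|x), one row per value of x
assert np.allclose(Px[:, None] * Py_given_x, P)
```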

Conditional Probability
P(α | β) = P(α ∩ β) / P(β)   (events)
But we will always write it this way:
P(x | y) = P(x, y) / P(y)

Marginalization
We know P(X, Y); what is P(X = x)?
P(X = x) = Σ_y P(X = x, Y = y)
We can use the law of total probability. Why?
[Figure: event A and the partition B1, …, B7, as before]
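In code, marginalizing a tabular joint is just summing out an axis; a sketch with the same illustrative table:

```python
import numpy as np

# Marginalization: P(X=x) = sum_y P(X=x, Y=y).
P = np.array([[0.20, 0.30, 0.05],
              [0.05, 0.15, 0.25]])   # illustrative joint P(x, y)
Px = P.sum(axis=1)                   # sum out Y
Py = P.sum(axis=0)                   # sum out X
print(Px, Py)                        # [0.55 0.45] [0.25 0.45 0.3]
```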

Marginalization Cont.
Another example:
P(X = x) = Σ_y Σ_z P(X = x, Y = y, Z = z)

Bayes Rule
We know that P(smart) = 0.7
If we also know that the student’s grade is A+, how does this affect our belief about their intelligence?
P(smart | A+) = P(A+ | smart) P(smart) / P(A+)
Where does this come from?
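A sketch of this computation in Python. P(smart) = 0.7 comes from the slide; the two likelihoods are assumed values chosen only for illustration:

```python
# Bayes rule on the slide's example. P(smart) is given; the likelihoods
# P(A+|smart) and P(A+|not smart) are assumed for illustration.
p_smart = 0.7
p_Aplus_given_smart = 0.4        # assumption
p_Aplus_given_not_smart = 0.1    # assumption

p_Aplus = (p_Aplus_given_smart * p_smart
           + p_Aplus_given_not_smart * (1 - p_smart))   # total probability
p_smart_given_Aplus = p_Aplus_given_smart * p_smart / p_Aplus
print(p_smart_given_Aplus)  # ~0.903: seeing A+ raises our belief from 0.7
```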

Bayes Rule cont.
You can condition on more variables:
P(x | y, z) = P(y | x, z) P(x | z) / P(y | z)

Probability Review
– Events and event spaces
– Random variables
– Joint probability distributions: marginalization, conditioning, chain rule, Bayes rule, law of total probability, etc.
– Structural properties: independence, conditional independence
– Examples
– Moments

Independence
X is independent of Y means that knowing Y does not change our belief about X
– P(X | Y = y) = P(X)
– P(X = x, Y = y) = P(X = x) P(Y = y)
– Why is this true?
The above should hold for all x, y
It is symmetric and written as X ⊥ Y
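A toy numeric check: a joint table is independent exactly when it equals the outer product of its marginals. The values below are illustrative and chosen to factorize:

```python
import numpy as np

# X ⊥ Y iff P(x, y) = P(x) P(y) for all x, y.
P = np.array([[0.12, 0.28],
              [0.18, 0.42]])           # illustrative joint P(x, y)
Px = P.sum(axis=1)
Py = P.sum(axis=0)
print(np.allclose(P, np.outer(Px, Py)))  # True: this table factorizes
```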

CI: Conditional Independence
RVs are rarely independent, but we can still leverage local structural properties like CI.
X ⊥ Y | Z if, once Z is observed, knowing the value of Y does not change our belief about X
The following should hold for all x, y, z:
– P(X = x | Z = z, Y = y) = P(X = x | Z = z)
– P(Y = y | Z = z, X = x) = P(Y = y | Z = z)
– P(X = x, Y = y | Z = z) = P(X = x | Z = z) P(Y = y | Z = z)
We call these factors: a very useful concept!
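The same factorization test, applied per slice of Z, checks conditional independence on a toy table (values illustrative, chosen so CI holds):

```python
import numpy as np

# Check X ⊥ Y | Z on a 3-D table P[z, x, y]:
# CI holds iff P(x, y | z) = P(x | z) P(y | z) for every z.
P = np.array([[[0.08, 0.12], [0.12, 0.18]],    # z = 0 slice
              [[0.20, 0.05], [0.20, 0.05]]])   # z = 1 slice
for z in range(P.shape[0]):
    Pxy_z = P[z] / P[z].sum()          # P(x, y | z)
    Px_z = Pxy_z.sum(axis=1)           # P(x | z)
    Py_z = Pxy_z.sum(axis=0)           # P(y | z)
    print(z, np.allclose(Pxy_z, np.outer(Px_z, Py_z)))  # True, True
```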

Properties of CI
Symmetry:
– (X ⊥ Y | Z) ⇒ (Y ⊥ X | Z)
Decomposition:
– (X ⊥ Y,W | Z) ⇒ (X ⊥ Y | Z)
Weak union:
– (X ⊥ Y,W | Z) ⇒ (X ⊥ Y | Z,W)
Contraction:
– (X ⊥ W | Y,Z) & (X ⊥ Y | Z) ⇒ (X ⊥ Y,W | Z)
Intersection:
– (X ⊥ Y | W,Z) & (X ⊥ W | Y,Z) ⇒ (X ⊥ Y,W | Z)
– Only for positive distributions! P(α) > 0 for all α ≠ ∅
You will have more fun in HW1!

Probability Review
– Events and event spaces
– Random variables
– Joint probability distributions: marginalization, conditioning, chain rule, Bayes rule, law of total probability, etc.
– Structural properties: independence, conditional independence
– Examples
– Moments

Monty Hall Problem
You're given the choice of three doors: behind one door is a car; behind the others, goats.
You pick a door, say No. 1.
The host, who knows what's behind the doors, opens another door, say No. 3, which has a goat.
Do you want to pick door No. 2 instead?

[Figure: the three cases. Car behind your door: host reveals Goat A or Goat B. Goat A behind your door: host must reveal Goat B. Goat B behind your door: host must reveal Goat A.]

Monty Hall Problem: Bayes Rule
C_i: the car is behind door i, i = 1, 2, 3
H_ij: the host opens door j after you pick door i
P(H_ij | C_k) = 0 if j = i or j = k (the host never opens your door or the car’s door), 1/2 if i = k (two goat doors to choose from), 1 otherwise

Monty Hall Problem: Bayes Rule cont.
WLOG, i = 1, j = 3:
P(C_1 | H_13) = P(H_13 | C_1) P(C_1) / P(H_13) = (1/2 · 1/3) / P(H_13)

Monty Hall Problem: Bayes Rule cont.
P(H_13) = P(H_13 | C_1) P(C_1) + P(H_13 | C_2) P(C_2) + P(H_13 | C_3) P(C_3)
        = 1/2 · 1/3 + 1 · 1/3 + 0 · 1/3 = 1/2
So P(C_1 | H_13) = (1/6) / (1/2) = 1/3

P(C_2 | H_13) = 2/3 > P(C_1 | H_13) = 1/3 ⇒ You should switch!
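A Monte Carlo sanity check of this result (a sketch, not part of the slides); switching should win about 2/3 of the time:

```python
import random

def play(switch, trials=100_000):
    """Simulate Monty Hall games; return the fraction of wins."""
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)
        pick = random.randrange(3)
        # Host opens a door that is neither the pick nor the car.
        host = next(d for d in range(3) if d != pick and d != car)
        if switch:
            pick = next(d for d in range(3) if d != pick and d != host)
        wins += (pick == car)
    return wins / trials

print(play(switch=False))  # ~1/3
print(play(switch=True))   # ~2/3
```

When the first pick hides the car the host has two goat doors to choose from; the sketch takes the first deterministically, which does not affect the win probabilities.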

Moments
Mean (Expectation): μ = E[X]
– Discrete RVs: E[X] = Σ_x x P(X = x)
– Continuous RVs: E[X] = ∫ x p(x) dx
Variance: Var(X) = E[(X − μ)²]
– Discrete RVs: Var(X) = Σ_x (x − μ)² P(X = x)
– Continuous RVs: Var(X) = ∫ (x − μ)² p(x) dx
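For a discrete RV given as a pmf table, both moments are one-liners; a sketch with illustrative values:

```python
import numpy as np

# Mean and variance of a discrete RV from its pmf.
x = np.array([0.0, 1.0, 2.0])
p = np.array([0.2, 0.5, 0.3])          # must sum to 1
mean = np.sum(x * p)                   # E[X] = sum_x x P(x)
var = np.sum((x - mean) ** 2 * p)      # Var(X) = sum_x (x - mean)^2 P(x)
print(mean, var)                       # 1.1 0.49
```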

Properties of Moments
Mean
– E[aX + b] = aE[X] + b
– E[X + Y] = E[X] + E[Y]
– If X and Y are independent, E[XY] = E[X] E[Y]
Variance
– Var(aX + b) = a² Var(X)
– If X and Y are independent, Var(X + Y) = Var(X) + Var(Y)
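These identities are easy to confirm empirically; a sketch using two independent, illustrative distributions:

```python
import numpy as np

# Empirical check of the moment identities on independent samples.
rng = np.random.default_rng(0)
X = rng.normal(2.0, 1.0, size=1_000_000)
Y = rng.exponential(3.0, size=1_000_000)   # independent of X

print(np.mean(3 * X + 1), 3 * np.mean(X) + 1)    # E[aX+b] = aE[X] + b
print(np.mean(X * Y), np.mean(X) * np.mean(Y))   # E[XY] = E[X]E[Y]
print(np.var(3 * X + 1), 9 * np.var(X))          # Var(aX+b) = a^2 Var(X)
print(np.var(X + Y), np.var(X) + np.var(Y))      # Var(X+Y) = Var(X)+Var(Y)
```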

The Big Picture
[Diagram: Model → Data via probability; Data → Model via estimation/learning]

Statistical Inference
Given observations from a model:
– What (conditional) independence assumptions hold? → Structure learning
– If you know the family of the model (e.g., multinomial), what are the values of the parameters? MLE, Bayesian estimation → Parameter learning

MLE
Maximum Likelihood Estimation
– Example on board: given N coin tosses, what is the coin bias θ?
– θ̂ = N_h / (N_h + N_t)
Sufficient Statistics (SS)
– A useful concept that we will make use of later
– In solving the above estimation problem, we only cared about N_h and N_t; these are the SS of this model
– All coin tosses that have the same SS will result in the same value of θ̂
– Why is this useful?
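A sketch of the estimator; the simulated coin and its true bias 0.3 are illustrative. Note the estimate depends on the tosses only through the sufficient statistics (N_h, N_t):

```python
import numpy as np

# MLE for the coin bias: theta_hat = N_h / (N_h + N_t).
rng = np.random.default_rng(1)
tosses = rng.random(1000) < 0.3     # simulate a coin with true bias 0.3
N_h = int(tosses.sum())             # number of heads
N_t = len(tosses) - N_h             # number of tails
theta_hat = N_h / (N_h + N_t)
print(N_h, N_t, theta_hat)          # theta_hat should be close to 0.3
```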

Statistical Inference
Given observations from a model:
– What (conditional) independence assumptions hold? → Structure learning
– If you know the family of the model (e.g., multinomial), what are the values of the parameters? MLE, Bayesian estimation → Parameter learning
We need some concepts from information theory.

Information Theory
P(X) encodes our uncertainty about X
Some variables are more uncertain than others
How can we quantify this intuition?
Entropy: average number of bits required to encode X
[Figure: a peaked distribution P(X) next to a flatter distribution P(Y); Y is the more uncertain variable]

Information Theory cont.
Entropy: average number of bits required to encode X
H(X) = −Σ_x P(x) log₂ P(x)
We can define conditional entropy similarly:
H(X | Y) = Σ_y P(y) H(X | Y = y)
We can also define a chain rule for entropies (not surprising):
H(X, Y) = H(X) + H(Y | X)
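A short Python sketch of entropy and the chain rule on toy distributions (values illustrative):

```python
import numpy as np

def entropy(p):
    """Entropy in bits, H = -sum_x P(x) log2 P(x); ignores zero entries."""
    p = np.asarray(p)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

print(entropy([0.5, 0.5]))   # 1.0 bit: a fair coin
print(entropy([0.9, 0.1]))   # ~0.469 bits: peaked, less uncertain

# Chain rule check on a toy joint P[x, y]: H(X, Y) = H(X) + H(Y|X).
P = np.array([[0.2, 0.3], [0.4, 0.1]])
Px = P.sum(axis=1)
H_Y_given_X = sum(Px[x] * entropy(P[x] / Px[x]) for x in range(2))
print(np.isclose(entropy(P.ravel()), entropy(Px) + H_Y_given_X))  # True
```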

Mutual Information: MI
Remember independence? If X ⊥ Y then knowing Y won’t change our belief about X
Mutual information can help quantify this! (not the only way though)
I(X; Y) = Σ_{x,y} P(x, y) log₂ [ P(x, y) / (P(x) P(y)) ] = H(X) − H(X | Y)
MI is symmetric
I(X; Y) = 0 iff X and Y are independent!
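The same kind of table computation gives MI; the joint below is illustrative and deliberately dependent:

```python
import numpy as np

# I(X;Y) = sum_{x,y} P(x,y) log2[ P(x,y) / (P(x)P(y)) ].
P = np.array([[0.30, 0.10],
              [0.10, 0.50]])                 # illustrative joint P(x, y)
Px = P.sum(axis=1)
Py = P.sum(axis=0)
I = np.sum(P * np.log2(P / np.outer(Px, Py)))
print(I)   # ~0.256 > 0, so X and Y are dependent; I = 0 iff independent
```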

Continuous Random Variables
What if X is continuous?
Probability density function (pdf) instead of probability mass function (pmf)
A pdf p(x) is a nonnegative function that describes the probability density at each value of the input variable x.

PDF
Properties of a pdf:
– p(x) ≥ 0 and ∫ p(x) dx = 1
– Actual probabilities are obtained by integrating the pdf
– E.g., the probability of X being between 0 and 1 is P(0 ≤ X ≤ 1) = ∫_0^1 p(x) dx
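A numerical version of this integral; the standard normal density is an illustrative choice of pdf:

```python
import numpy as np

def pdf(x):
    """Standard normal density (an illustrative pdf)."""
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

# P(0 <= X <= 1) via the midpoint rule on a fine grid.
dx = 1e-4
xs = np.arange(0.0, 1.0, dx) + dx / 2
prob = np.sum(pdf(xs)) * dx
print(prob)   # ~0.3413 for the standard normal
```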

Cumulative Distribution Function
F(x) = P(X ≤ x)
– Discrete RVs: F(x) = Σ_{x′ ≤ x} P(X = x′)
– Continuous RVs: F(x) = ∫_{−∞}^x p(t) dt
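For a discrete RV, the CDF is a cumulative sum of the pmf; a sketch with illustrative values:

```python
import numpy as np

# CDF of a discrete RV via a cumulative sum of its pmf.
x = np.array([0, 1, 2])
p = np.array([0.2, 0.5, 0.3])
F = np.cumsum(p)                 # F(x) = P(X <= x)
print(dict(zip(x, F)))           # {0: 0.2, 1: 0.7, 2: 1.0}
```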

Acknowledgment
Andrew Moore tutorial:
Monty Hall problem: