Review of Probability and Estimators Arun Das, Jason Rebello


Review of Probability and Estimators Arun Das, Jason Rebello 16/05/2017 Jason Rebello | Waterloo Autonomous Vehicles Lab

Probability Review | Why is probability used in Robotics?
The state of the robot (position, velocity) and the state of its environment are unknown, and only noisy sensors (GPS, IMU) are available. Probability helps to fuse sensory information and provides a distribution over possible states of the robot and environment. The dynamics are often stochastic, so we cannot optimize for a particular outcome, only for a good distribution over outcomes. Probability provides a framework to reason in this setting. Result: the ability to find good control policies for stochastic dynamics and environments.

Probability Axioms |
Probability for any event A in the set of all possible outcomes Ω: 0 ≤ P(A) ≤ 1.
Probability over the set of all possible outcomes: P(Ω) = 1.
Probability of the union of events: P(A ∪ B) = P(A) + P(B) - P(A ∩ B).
If the events are mutually exclusive: P(A ∪ B) = P(A) + P(B).

Random Variables |
A random variable assigns a value to each possible outcome of a probabilistic experiment. Example: toss 2 dice; the random variable X is the sum of the numbers on the dice.
Discrete: takes distinct values, described by a probability mass function. E.g. X = 1 for heads, 0 for tails; Y = year a random student was born (2000, 2001, 2002, ...).
Continuous: takes any value in some interval, described by a probability density function. E.g. weight of a random animal.
Notes: Suppose that to each point of a sample space we assign a number. We then have a function defined on the sample space, called a random variable (or stochastic variable), usually denoted by a capital letter such as X or Y. A probability mass function (pmf) gives the probability that a discrete random variable is exactly equal to some value, and is often the primary means of defining a discrete distribution. A probability density function (pdf) of a continuous random variable is a function whose value at a given point in the sample space can be interpreted as the relative likelihood that the random variable equals that value.
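To make the discrete case concrete, here is a minimal sketch (Python is assumed here; the slides themselves contain no code) that tabulates the probability mass function of the example above, X = sum of two fair dice:

```python
# Tabulate the pmf of X = sum of two fair dice (standard library only).
from fractions import Fraction
from itertools import product

pmf = {}
for d1, d2 in product(range(1, 7), repeat=2):         # all 36 equally likely outcomes
    s = d1 + d2
    pmf[s] = pmf.get(s, Fraction(0)) + Fraction(1, 36)

print(pmf[7])             # 1/6: six of the 36 outcomes sum to 7
print(sum(pmf.values()))  # 1: a valid pmf sums to one over its support
```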

Random Variables | Expected Value
The expected value is the long-run average value over many repetitions of the experiment; it is the probability-weighted average of all possible values: E[X] = ∑x x P(X = x).
E.g. for a fair die: E[X] = (1/6)(1) + (1/6)(2) + (1/6)(3) + (1/6)(4) + (1/6)(5) + (1/6)(6) = 3.5.
Properties:
E[c] = c, where c is a constant.
E[X + Y] = E[X] + E[Y] and E[aX] = aE[X] (the expectation operator is linear).
E[X | Y = y] = ∑x x P(X = x | Y = y) (conditional expectation).
E[X] ≤ E[Y] if X ≤ Y (monotonicity).
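A short sketch (again assuming Python, not part of the original slides) that computes the expectation of a fair die from its pmf and checks the linearity property E[aX + b] = aE[X] + b:

```python
# Expected value of a fair die, E[X] = sum_x x * P(X = x), plus a linearity check.
pmf = {x: 1/6 for x in range(1, 7)}

def expect(f, pmf):
    """Probability-weighted average of f(X) under the given pmf."""
    return sum(f(x) * p for x, p in pmf.items())

E_X = expect(lambda x: x, pmf)                  # 3.5
E_3X_plus_2 = expect(lambda x: 3 * x + 2, pmf)  # 12.5
print(E_X, E_3X_plus_2, 3 * E_X + 2)            # linearity: E[3X + 2] == 3 E[X] + 2
```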

Random Variables | Variance
Variance measures the dispersion around the mean value:
Var[X] = σ² = E[(X - μ)²] = E[X²] - E[X]².
E.g. for a fair die: E[X²] = (1/6)(1² + 2² + 3² + 4² + 5² + 6²) = 91/6 and E[X] = 7/2, so Var[X] = 91/6 - (7/2)² = 35/12 ≈ 2.92.
Properties:
Var[aX + b] = a² Var[X] (variance is not linear).
Var[X + Y] = Var[X] + Var[Y] if X and Y are independent.
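The same die, now checking Var[X] = E[X²] - E[X]² numerically (a sketch under the same Python assumption):

```python
# Variance of a fair die via Var[X] = E[X^2] - E[X]^2.
pmf = {x: 1/6 for x in range(1, 7)}

E_X  = sum(x * p for x, p in pmf.items())      # 3.5
E_X2 = sum(x * x * p for x, p in pmf.items())  # 91/6
var  = E_X2 - E_X ** 2                         # 35/12 ≈ 2.9167
print(E_X, E_X2, var)
```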

Joint and Conditional Probability |
Joint probability: P(X, Y) = P(X = x and Y = y).
Joint probability of independent random variables: P(X, Y) = P(X) P(Y).
Conditional probability: P(X | Y) = P(X, Y) / P(Y).
Conditional probability of independent random variables: P(X | Y) = P(X).
Notes: A joint probability measures the likelihood of two events occurring together at the same point in time, i.e. the probability of event Y occurring at the same time as event X.
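To ground the definitions, a small sketch (Python assumed, with invented numbers purely for illustration) that stores a joint distribution over two binary variables and recovers a marginal and a conditional from it:

```python
# Joint distribution P(X, Y) over two binary variables (illustrative numbers only).
joint = {('x0', 'y0'): 0.1, ('x0', 'y1'): 0.3,
         ('x1', 'y0'): 0.2, ('x1', 'y1'): 0.4}

# Marginal P(Y = y1): sum the joint probabilities over X.
p_y1 = sum(p for (x, y), p in joint.items() if y == 'y1')

# Conditional P(X = x1 | Y = y1) = P(X = x1, Y = y1) / P(Y = y1).
p_x1_given_y1 = joint[('x1', 'y1')] / p_y1

print(p_y1, p_x1_given_y1)   # 0.7, ~0.571
```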

Mutually Exclusive, Independent Events | What is the difference between mutually exclusive events and independent events?

Mutually Exclusive, Independent Events |
Events are mutually exclusive if the occurrence of one event excludes the occurrence of the other. E.g. tossing a coin: the result can be heads or tails but not both. Then P(A ∪ B) = P(A) + P(B) and P(A, B) = 0.
Events are independent if the occurrence of one event does not influence the occurrence of the other. E.g. tossing two coins: the result of the first flip does not affect the result of the second. Then P(A ∪ B) = P(A) + P(B) - P(A)P(B) and P(A, B) = P(A)P(B).
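A quick Monte Carlo sanity check of the two union formulas (a sketch assuming Python; the coin events and sample size are chosen here for illustration):

```python
# Check P(A or B) = P(A) + P(B) - P(A)P(B) for independent events
# A = "first coin is heads", B = "second coin is heads".
import random

random.seed(0)
N = 100_000
hits_A = hits_B = hits_AB = hits_AuB = 0
for _ in range(N):
    a = random.random() < 0.5
    b = random.random() < 0.5
    hits_A += a
    hits_B += b
    hits_AB += a and b
    hits_AuB += a or b

p_A, p_B = hits_A / N, hits_B / N
print(hits_AuB / N, p_A + p_B - p_A * p_B)  # both ≈ 0.75
print(hits_AB / N, p_A * p_B)               # both ≈ 0.25
```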

Law of Total Probability and Marginals |
The distribution of the marginal variables (the marginal distribution) is obtained by marginalizing over the distribution of the variables being discarded; the discarded variables are said to have been marginalized out. Given two random variables X and Y whose joint distribution is known, the marginal distribution of X is simply the probability distribution of X averaged over information about Y: P(X = x) = ∑y P(X = x, Y = y) = ∑y P(X = x | Y = y) P(Y = y). It is the probability distribution of X when the value of Y is not known.

Probability Example |
Calculate the marginal probability of a person being hit by a car when crossing without paying attention to the traffic light. Assume P(L=red) = 0.2, P(L=yellow) = 0.1, P(L=green) = 0.7, and P(hit | colour) + P(not hit | colour) = 1 for each light colour.
Joint probability: P(hit, L=red) = P(hit | L=red) P(L=red) = 0.01 * 0.2 = 0.002.
Marginal by the law of total probability:
P(hit) = ∑colour P(hit, colour) = ∑colour P(hit | colour) P(colour)
= P(hit | red) P(red) + P(hit | yellow) P(yellow) + P(hit | green) P(green)
= 0.01*0.2 + 0.1*0.1 + 0.8*0.7 = 0.572.
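The same computation as a short script (Python assumed; the conditional probabilities 0.01, 0.1 and 0.8 are the ones implied by the slide's numbers):

```python
# Marginal P(hit) via the law of total probability.
p_light = {'red': 0.2, 'yellow': 0.1, 'green': 0.7}
p_hit_given_light = {'red': 0.01, 'yellow': 0.1, 'green': 0.8}

p_hit = sum(p_hit_given_light[c] * p_light[c] for c in p_light)
print(p_hit)   # 0.572
```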

Parameter Inference |
Experiment: we flip a coin 10 times and observe the outcome shown on the slide (from the later slides, a sequence with 3 tails and 7 heads). What is the probability that the next coin flip is T?
Options: ≅ 0.3, ≅ 0.38, ≅ 0.5, ≅ 0.76.

Parameter Inference | Why is 0.3 a good estimate?
Every flip is random, so every sequence of flips is random. We have a parameter that gives the probability that a flip comes up tails, and the sequence is modeled by the parameters θ1, ..., θ10 (one per flip). Find the θi's such that the probability of the observed sequence is as high as possible, i.e. maximize the likelihood of our observation (Maximum Likelihood).

Parameter Inference | Assumptions
Assumption 1 (Independence): the coin flips do not affect each other.
Assumption 2 (Identically Distributed): the coin flips are qualitatively the same.
Independent and Identically Distributed (i.i.d.): each random variable has the same probability distribution as the others and all are mutually independent, so the whole sequence is described by a single parameter θ.

Parameter Inference | Maximizing the likelihood
Find the critical point of the likelihood θ³(1-θ)⁷. Monotonic functions preserve critical points, so use the log to make things simpler:
argmaxθ ln[θ³(1-θ)⁷] = argmaxθ [ |T| ln θ + |H| ln(1-θ) ].
Taking the derivative and setting it to 0: |T|/θ - |H|/(1-θ) = 0, so θ = |T| / (|T| + |H|) = 3/10 = 0.3.
This justifies the answer of 0.3 to the original question.
Notes: We prefer natural logs (that is, logarithms base e) because coefficients on the natural-log scale are directly interpretable as approximate proportional differences: with a coefficient of 0.06, a difference of 1 in x corresponds to an approximately 6% difference in y, and so forth.
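A small numerical check (Python assumed; the counts |T| = 3, |H| = 7 come from the slide) that the log-likelihood is indeed maximized at θ = 0.3:

```python
import math

n_T, n_H = 3, 7   # tails and heads observed in the 10 flips

def log_likelihood(theta):
    """ln[theta^|T| * (1 - theta)^|H|] for the observed counts."""
    return n_T * math.log(theta) + n_H * math.log(1 - theta)

theta_closed = n_T / (n_T + n_H)             # closed form |T| / (|T| + |H|) = 0.3

grid = [i / 1000 for i in range(1, 1000)]    # theta in (0, 1), step 0.001
theta_grid = max(grid, key=log_likelihood)   # independent numerical check

print(theta_closed, theta_grid)   # 0.3 0.3
```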

Parameter Inference | Maximum Likelihood Estimation
Suppose there is a sample x1, ..., xn of n independent and identically distributed observations coming from a distribution with an unknown probability density function f(x | θ).
Joint density function: f(x1, ..., xn | θ) = f(x1 | θ) f(x2 | θ) ... f(xn | θ).
Likelihood (consider the observed values to be fixed parameters and allow θ to vary freely): L(θ; x1, ..., xn) = ∏i f(xi | θ).
It is more convenient to work with the natural log of the likelihood function.
Log-likelihood: ln L(θ; x1, ..., xn) = ∑i ln f(xi | θ).
Average log-likelihood: ℓ(θ) = (1/n) ∑i ln f(xi | θ).
Maximum likelihood estimator: θMLE = argmaxθ ℓ(θ).
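As an illustration of this general recipe (a sketch: Python is assumed, and an exponential model with made-up data is chosen here only as an example, since the slide does not fix a distribution), the average log-likelihood is maximized numerically over the parameter and compared with the known closed form:

```python
import math

# Example data, assumed i.i.d. from an Exponential(rate) distribution with density
# f(x | rate) = rate * exp(-rate * x); the numbers are made up for illustration.
data = [0.8, 1.3, 0.2, 2.1, 0.7, 1.6]

def avg_log_likelihood(rate, xs):
    """(1/n) * sum_i ln f(x_i | rate)."""
    return sum(math.log(rate) - rate * x for x in xs) / len(xs)

grid = [i / 100 for i in range(1, 1000)]                         # candidate rates in (0, 10)
rate_mle = max(grid, key=lambda r: avg_log_likelihood(r, data))  # numerical maximizer

print(rate_mle, 1 / (sum(data) / len(data)))  # grid estimate vs. closed form 1 / sample mean
```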

Parameter Inference | Problems with MLE
Let's assume we tossed the coin twice and got a sequence with no tails (two heads). The probability of seeing a tails in the next toss is then θMLE = |T| / (|T| + |H|) = 0/2 = 0, since no tails were observed. MLE is a point estimator and is prone to overfitting. How do we solve this? Assume a prior on θ.

Parameter Inference | Bayes Rule
P(θ | D) = P(D | θ) P(θ) / P(D)
P(D | θ): Likelihood. P(θ): Prior. P(D): Evidence. P(θ | D): Posterior.

Parameter Inference | Choosing the Prior
Use a Beta(a, b) prior over θ: (a-1) and (b-1) are the number of T and H we think we would see if we made (a+b-2) coin flips. [The slide shows Beta density plots for (a, b) = (1.0, 1.0), (2.0, 3.0), (8.0, 4.0) and (0.1, 0.1).]
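A sketch (assuming Python with SciPy available; the (a, b) pairs are the ones shown on the slide) that evaluates the corresponding Beta prior densities on a grid, which is enough to reproduce the slide's plots:

```python
import numpy as np
from scipy.stats import beta

thetas = np.linspace(0.01, 0.99, 99)
for a, b in [(1.0, 1.0), (2.0, 3.0), (8.0, 4.0), (0.1, 0.1)]:
    density = beta.pdf(thetas, a, b)   # prior density over theta for this (a, b)
    # Grid point of highest prior density (Beta(1, 1) is flat, so any point ties).
    print(f"a={a}, b={b}: argmax over grid = {thetas[np.argmax(density)]:.2f}")
```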

Parameter Inference | Maximum A Posteriori Estimation
θMAP = argmaxθ P(θ | D) = argmaxθ [ Likelihood P(D | θ) × Prior P(θ) ] / Evidence P(D) = argmaxθ P(D | θ) P(θ), since the evidence does not depend on θ.
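A sketch (Python assumed) comparing the MLE with a MAP estimate under the Beta(a, b) prior; the closed-form Bernoulli/Beta MAP expression used here is standard but is an addition, not taken from the slide:

```python
n_T, n_H = 0, 2   # the two-flip, no-tails example from the "Problems with MLE" slide
a, b = 2.0, 2.0   # Beta prior pseudo-counts (values assumed here for illustration)

theta_mle = n_T / (n_T + n_H)                        # 0.0: overfits the tiny sample
theta_map = (n_T + a - 1) / (n_T + n_H + a + b - 2)  # 0.25: pulled toward the prior
print(theta_mle, theta_map)
```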

Homework | Determine the Maximum Likelihood Estimator for the mean and variance of a Gaussian distribution.