Probability Course web page: vision.cis.udel.edu/cv March 19, 2003  Lecture 15

Announcements Read Forsyth & Ponce, Sections 1.2, 1.4, and 7.4 on cameras and sampling, for Friday

Outline Random variables (discrete and continuous); joint and conditional probability; probabilistic inference

Discrete Random Variables A discrete random variable X has a domain of values {x_1, …, x_n} that it can assume, each with a particular probability in the range [0, 1]. For example, let X = Weather. Then the domain might be {sun, rain, clouds, snow}. Use A, B to denote Boolean random variables whose domain is {true, false}

Discrete Probability P (X = x_i) is the probability that X has the value x_i. Can use P (x_i) where the random variable is clear. For Boolean variables: P (A) ≡ P (A = true) and P (¬A) ≡ P (A = false). The probability distribution P (X) is the vector of probabilities over X’s domain: P (X) = (P (X = x_1), …, P (X = x_n))

Meaning of Probability Probability can be interpreted as –The strength of our belief or certainty that a random variable has a particular value in the absence of evidence (sometimes called prior probability) –The frequency with which the random variable will have that value if it is repeatedly measured. Because a random variable must take one of the values in its domain, probability distributions sum to 1: Σ_i P (X = x_i) = 1

Example: Distribution on Weather So for our example, we might have… –The probabilities for the individual values: P (sun) = 0.7, P (rain) = 0.2, P (clouds) = 0.08, P (snow) = 0.02 –The probability distribution P (Weather) = (0.7, 0.2, 0.08, 0.02), whose first entry is P (Weather = sun)
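
A minimal sketch (not from the original slides): the Weather distribution above written as a Python dictionary, with a check that it sums to 1.

```python
# Discrete distribution P(Weather) from the slide above
p_weather = {"sun": 0.7, "rain": 0.2, "clouds": 0.08, "snow": 0.02}

# A valid distribution assigns each value a probability in [0, 1] and sums to 1
assert abs(sum(p_weather.values()) - 1.0) < 1e-9
print(p_weather["sun"])  # P(Weather = sun) = 0.7
```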

Joint Probability The probability that multiple events occur: P (X = x, Y = y) or P (x, y) –E.g., P (Weather = sun, Temperature = warm) We can thus define the joint probability distribution P (X, Y) –This is an M_1 × … × M_n table for n random variables with M_i values in their domains –Table entries sum to 1

Example: Joint Probability Distribution P (Weather, Temperature) (rows: Weather; columns: Temperature)
P (sun, warm) = 0.5       P (sun, cold) = 0.2
P (rain, warm) = 0.05     P (rain, cold) = 0.15
P (clouds, warm) = 0.03   P (clouds, cold) = 0.05
P (snow, warm) = 0.001    P (snow, cold) = 0.019
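
A minimal sketch (not from the original slides), assuming NumPy is available: the joint table above stored as a 2-D array whose entries sum to 1.

```python
import numpy as np

# Joint table P(Weather, Temperature); rows: sun, rain, clouds, snow; columns: warm, cold
joint = np.array([[0.5,   0.2],
                  [0.05,  0.15],
                  [0.03,  0.05],
                  [0.001, 0.019]])

# The entries of a joint distribution sum to 1
print(joint.sum())  # 1.0 (up to floating-point rounding)
```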

Continuous Random Variables When the variable X has a continuous domain of possible values [x_low, x_high], it doesn’t make sense to talk about the probability of a particular value. Instead, we define a probability density function (PDF) p(x) such that P (X ∈ [a, b]) = ∫_a^b p(x) dx for any interval [a, b] in the domain

Probability Density Function: Properties p(x) is non-negative. By analogy with discrete probability distributions, the density integrates to 1 over the domain: ∫ p(x) dx = 1. For n-dimensional joint distributions, the PDF p(x) is a surface evaluated over vectors x. Most definitions are analogous to the discrete versions: just substitute integration for summation
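
A small illustrative sketch (not from the slides), assuming NumPy: numerically checking that a 1-D standard normal density integrates to approximately 1.

```python
import numpy as np

# Evaluate the standard normal density on a fine grid
x = np.linspace(-8.0, 8.0, 10001)
p = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

# The area under a PDF should be (approximately) 1
print(np.trapz(p, x))  # ~1.0
```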

Example: the Normal Distribution Here the PDF is defined by an n-D Gaussian function with mean μ and covariance matrix Σ: p(x) = exp(−(1/2)(x − μ)^T Σ^(−1) (x − μ)) / ((2π)^(n/2) |Σ|^(1/2)). [Figure: representations of the joint PDF for 2-D Gaussians with random variables X, Y]
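
A minimal sketch (not from the slides) of evaluating this density with NumPy; the mean and covariance values below are illustrative assumptions.

```python
import numpy as np

def gaussian_pdf(x, mu, Sigma):
    """Evaluate the n-D Gaussian density at point x."""
    n = len(mu)
    diff = x - mu
    norm = 1.0 / np.sqrt((2.0 * np.pi) ** n * np.linalg.det(Sigma))
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)

# Illustrative 2-D example with correlated X, Y
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
print(gaussian_pdf(np.array([0.2, -0.1]), mu, Sigma))
```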

Histograms Definition: count of instances per bin. Example: the random variable brightness is really continuous, but is discretized to [0, 255]. [Figure: original image and its brightness histogram, courtesy of MathWorks]

Histograms as PDF Representations Dividing every bin count by the total count captures the frequency of occurrence in that range –E.g., P (brightness ∈ [x, x + dx])
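
An illustrative sketch (not from the slides), assuming NumPy; the random pixel values stand in for a real image’s brightness values.

```python
import numpy as np

# Stand-in for an image's brightness values, discretized to [0, 255]
pixels = np.random.randint(0, 256, size=10000)

counts, edges = np.histogram(pixels, bins=256, range=(0, 256))
probs = counts / counts.sum()   # P(brightness falls in each bin)
print(probs.sum())              # 1.0
```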

Marginalization Summing a discrete joint distribution over all possible values of one random variable effectively removes that variable: P (X = x) = Σ_y P (X = x, Y = y). For a two-variable joint distribution, this means summing all rows or all columns

Example: Marginalization of P (Weather, Temperature)
P (sun, warm) = 0.5       P (sun, cold) = 0.2       P (sun) = 0.7
P (rain, warm) = 0.05     P (rain, cold) = 0.15     P (rain) = 0.2
P (clouds, warm) = 0.03   P (clouds, cold) = 0.05   P (clouds) = 0.08
P (snow, warm) = 0.001    P (snow, cold) = 0.019    P (snow) = 0.02
P (warm) = 0.581          P (cold) = 0.419
Summing each row gives P (Weather); summing each column gives P (Temperature)
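
A minimal sketch (not from the slides) of this marginalization with NumPy: summing rows gives P (Weather), summing columns gives P (Temperature).

```python
import numpy as np

# Joint table P(Weather, Temperature); rows: sun, rain, clouds, snow; columns: warm, cold
joint = np.array([[0.5,   0.2],
                  [0.05,  0.15],
                  [0.03,  0.05],
                  [0.001, 0.019]])

p_weather = joint.sum(axis=1)      # sum over Temperature -> [0.7, 0.2, 0.08, 0.02]
p_temperature = joint.sum(axis=0)  # sum over Weather     -> [0.581, 0.419]
print(p_weather, p_temperature)
```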

Conditional Probability The conditional probability P (X = x | Y = y) quantifies the change in our beliefs given knowledge of some other event. This “after the evidence” probability is sometimes called the posterior probability on X, and it is defined with the product rule: P (X = x, Y = y) = P (X = x | Y = y) P (Y = y) In terms of joint probability distributions, this is: P (X, Y) = P (X | Y) P (Y) Independence: P (X, Y) = P (X) P (Y), which implies that P (X | Y) = P (X) (remember that these are different distributions)

Example: Conditional Probability Distribution P (Temperature | Weather) Divide the joint distribution by the marginal –E.g., P (warm | sun) = P (sun, warm)/P (sun) Recall P (Temperature) = (0.581, 0.419)
P (warm | sun) = 0.5/0.7 = 0.71         P (cold | sun) = 0.2/0.7 = 0.29
P (warm | rain) = 0.05/0.2 = 0.25       P (cold | rain) = 0.15/0.2 = 0.75
P (warm | clouds) = 0.03/0.08 = 0.375   P (cold | clouds) = 0.05/0.08 = 0.625
P (warm | snow) = 0.001/0.02 = 0.05     P (cold | snow) = 0.019/0.02 = 0.95
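
A minimal sketch (not from the slides): dividing each row of the joint table by the corresponding marginal P (Weather) reproduces the conditional distributions above.

```python
import numpy as np

# Joint table P(Weather, Temperature); rows: sun, rain, clouds, snow; columns: warm, cold
joint = np.array([[0.5,   0.2],
                  [0.05,  0.15],
                  [0.03,  0.05],
                  [0.001, 0.019]])

p_weather = joint.sum(axis=1, keepdims=True)  # P(Weather) as a column vector
p_temp_given_weather = joint / p_weather      # each row now sums to 1
print(p_temp_given_weather)
# approximately [[0.714 0.286], [0.25 0.75], [0.375 0.625], [0.05 0.95]]
```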

Conditioning By the relationship of conditional probability to joint probability P (Y, X) = P (Y | X) P (X), we can write marginalization a different way: P (Y = y) = Σ_x P (Y = y | X = x) P (X = x)

Bayes’ Rule Equating P (X, Y) and P (Y, X) and applying the definition of conditional probability, we have: P (X | Y) P (Y) = P (Y | X) P (X), and so P (X | Y) = P (Y | X) P (X) / P (Y), where P (X | Y) is the posterior on X, P (Y | X) is the likelihood, P (X) is the prior on X, and P (Y) is the evidence

Bayes’ Rule By conditioning, the evidence P (Y) is just a normalizing factor that ensures that the posterior sums to 1: P (Y = y) = Σ_x P (Y = y | X = x) P (X = x). Thus, only the likelihood and prior matter: P (X | Y) ∝ P (Y | X) P (X)
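
A minimal sketch (not from the slides) of this normalization, using the weather prior and the likelihoods P (cold | Weather) from the temperature example later in the lecture.

```python
import numpy as np

prior = np.array([0.7, 0.2, 0.08, 0.02])         # P(Weather): sun, rain, clouds, snow
likelihood = np.array([0.29, 0.75, 0.625, 0.95]) # P(cold | Weather)

unnormalized = likelihood * prior                # P(cold | Weather) P(Weather)
posterior = unnormalized / unnormalized.sum()    # dividing by the evidence P(cold)
print(posterior, posterior.sum())                # the posterior sums to 1
```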

Bayes’ Rule for Inference Suppose X represents possible hypotheses about some aspect of the world (e.g., the weather today), and Y represents some relevant data we have measured (e.g., thermometer temperature). Inference is the process of reasoning about how likely different values of X are, conditioned on Y. Bayes’ rule can be a useful inference tool when there is an imbalance in our knowledge or it is difficult to quantify hypothesis probabilities directly. Inferring hypothesis values is often called parameter estimation

Maximum a Posteriori (MAP) Inference Choose the parameter value x_MAP for the hypothesis X that maximizes the posterior probability given the observed data Y = y: x_MAP = argmax_x P (x | y). For discrete distributions, this means calculating the posterior probability P (x | y) for all different values of X. For continuous distributions, we may be able to employ differential techniques, such as looking for values where the derivative of the posterior is 0

Maximum Likelihood (ML) Inference MAP with a uniform prior (either we don’t know it or believe it to be unimportant): x_ML = argmax_x P (y | x)

Example: Estimating the Weather from the Temperature Suppose we want to know what the weather will be like on the basis of a thermometer reading. Say we don’t have direct knowledge of P (Weather | Temperature), but the thermometer reading indicates a chill, and we do know something about P (Temperature | Weather)

Example: Weather Estimation from the Temperature MAP: From logging past occurrences, we think P (Weather) = (0.7, 0.2, 0.08, 0.02). This leads us to infer that it is sunny, but not by much over rainy. ML: Ignoring the weather prior, snow is most likely. How much better our estimate is than the other possibilities says something about how good it is: we would be much more certain of sunniness for a warm thermometer reading.
P (sun | cold) ∝ P (cold | sun) P (sun) = 0.29 * 0.7 = 0.203
P (rain | cold) ∝ P (cold | rain) P (rain) = 0.75 * 0.2 = 0.15
P (clouds | cold) ∝ P (cold | clouds) P (clouds) = 0.625 * 0.08 = 0.05
P (snow | cold) ∝ P (cold | snow) P (snow) = 0.95 * 0.02 = 0.019
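
A minimal sketch (not from the slides) of the MAP and ML decisions above with NumPy:

```python
import numpy as np

weather = ["sun", "rain", "clouds", "snow"]
prior = np.array([0.7, 0.2, 0.08, 0.02])        # P(Weather)
lik_cold = np.array([0.29, 0.75, 0.625, 0.95])  # P(cold | Weather)

posterior_unnorm = lik_cold * prior             # proportional to P(Weather | cold)
print("MAP:", weather[int(np.argmax(posterior_unnorm))])  # sun (0.203 vs. 0.15 for rain)
print("ML: ", weather[int(np.argmax(lik_cold))])          # snow (largest likelihood)
```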