Probability Distributions and Frequentist Statistics

"A single death is a tragedy, a million deaths is a statistic" (Joseph Stalin)


Can we answer that?

[Figure: an urn containing N balls in total, M red and N - M blue, with a first and a second draw indicated.]

On the first draw, P(R1|I) = M/N. What is the probability that the second draw is red, P(R2|I)?

The Red and the Blue

The urn holds N balls in total: M red and N - M blue. The second draw can be red in two mutually exclusive ways, R2 = (R1, R2) + (B1, R2). Using the sum and product rules:

P(R2|I) = P(R1, R2|I) + P(B1, R2|I)
        = P(R1|I) P(R2|R1, I) + P(B1|I) P(R2|B1, I)
        = (M/N) (M-1)/(N-1) + ((N-M)/N) M/(N-1)
        = M/N
        = P(R1|I)

The same argument gives P(R3|I) = M/N, and so on. The outcome of the first draw is a "nuisance" parameter: we marginalize, i.e. sum (or integrate) over all its possible values.
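The urn result above can be checked by simulation. This is a minimal sketch (the function name and parameters are illustrative, not from the slides): draw two balls without replacement many times and count how often the second ball is red; marginalizing over the first draw predicts P(R2) = M/N.

```python
import random

def second_draw_red_prob(M, N, trials=200_000, seed=1):
    """Estimate P(R2): draw two balls without replacement from an urn
    with M red and N - M blue balls; count how often the second is red."""
    rng = random.Random(seed)
    red_second = 0
    for _ in range(trials):
        urn = ["R"] * M + ["B"] * (N - M)
        rng.shuffle(urn)
        if urn[1] == "R":          # look only at the second draw
            red_second += 1
    return red_second / trials

# Theory: marginalizing over the first draw gives P(R2) = M/N = 0.3 here.
est = second_draw_red_prob(M=3, N=10)
```

The estimate should agree with M/N = 0.3 to within Monte Carlo error.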

Marginalization

Joint probabilities of cloud and rain:

              Rain    No rain
  Clouds      1/6     1/3
  No clouds   0       1/2
  Total       1/6     5/6

The chance of rain (1/6) is obtained by summing the joint probabilities over the cloud possibilities; likewise the chance of no rain (5/6).

Marginalization

Where {A_i} is a set of mutually exclusive and exhaustive possibilities, marginalization (the "integrating out" of nuisance parameters) takes the form:

P(θ|D, I) = Σ_i P(θ, A_i|D, I)

In the limit of a continuously variable parameter A (rather than the discrete case above), the sum becomes an integral over a probability density:

P(θ|D, I) = ∫ dA P(θ, A|D, I)

This technique is often required in inference. For example, we may be interested in the frequency of a sinusoidal signal in noisy data, but not in its amplitude (a nuisance parameter).
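Discrete marginalization is just a sum over the nuisance variable. A minimal sketch using the rain/cloud numbers from the previous slide (the dictionary layout and function name are illustrative):

```python
# Joint probabilities P(rain state, cloud state), as read from the slide's table.
joint = {
    ("rain", "clouds"): 1/6, ("no rain", "clouds"): 1/3,
    ("rain", "no clouds"): 0.0, ("no rain", "no clouds"): 1/2,
}

def marginal(joint, which):
    """Sum the joint over the nuisance variable: P(a) = sum_i P(a, A_i)."""
    out = {}
    for (rain, cloud), p in joint.items():
        key = rain if which == "rain" else cloud
        out[key] = out.get(key, 0.0) + p
    return out

p_rain = marginal(joint, "rain")   # marginal chance of rain vs. no rain
```

Summing over clouds recovers P(rain) = 1/6 and P(no rain) = 5/6, matching the table's totals.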

Probability Distributions

We denote the probability distribution over all possible values of a variable x by p(x).

Discrete: the probability mass function p(x) = P(X = x).
Continuous: the probability density function f(x) = lim_{δx→0} P(x < X ≤ x + δx) / δx.
Cumulative: F(x) = P(X ≤ x).

Properties of Probability Distributions

The expectation value of a function g(X) is the weighted average:

⟨g(X)⟩ = Σ_x g(x) p(x)    (discrete)
⟨g(X)⟩ = ∫ g(x) f(x) dx   (continuous)

If it exists, the r-th moment of a random variable X about the origin (x = 0) is:

μ′_r = ⟨X^r⟩ = Σ_x x^r p(x)    (discrete)
μ′_r = ⟨X^r⟩ = ∫ x^r f(x) dx   (continuous)

The mean μ = μ′_1 = ⟨X⟩ is the first moment about the origin.

Properties of Probability Distributions

The r-th central moment of a random variable X about the mean (origin = μ) is:

μ_r = ⟨(X − μ)^r⟩ = Σ_x (x − μ)^r p(x)    (discrete)
μ_r = ⟨(X − μ)^r⟩ = ∫ (x − μ)^r f(x) dx   (continuous)

First central moment: μ_1 = ⟨X − μ⟩ = 0.

Second central moment, the variance:

Var(X) = σ_x² = ⟨(X − μ)²⟩
       = ⟨X² − 2μX + μ²⟩
       = ⟨X²⟩ − 2μ⟨X⟩ + μ²
       = ⟨X²⟩ − 2μ² + μ²
       = ⟨X²⟩ − μ²

Therefore the variance σ_x² = ⟨X²⟩ − ⟨X⟩².
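The two variance expressions above (the defining central moment and the shortcut ⟨X²⟩ − ⟨X⟩²) can be checked numerically for any discrete PMF. A minimal sketch (function name and the die example are illustrative):

```python
def moments(pmf):
    """pmf: dict {x: p(x)}. Return the mean and the variance computed
    two ways: as the second central moment, and as <X^2> - <X>^2."""
    mean = sum(x * p for x, p in pmf.items())
    ex2 = sum(x**2 * p for x, p in pmf.items())
    var_central = sum((x - mean)**2 * p for x, p in pmf.items())
    var_shortcut = ex2 - mean**2
    return mean, var_central, var_shortcut

# Fair six-sided die: mean 3.5, variance 35/12.
pmf = {x: 1/6 for x in range(1, 7)}
mu, v1, v2 = moments(pmf)
```

Both routes give the same variance, as the algebraic expansion on the slide requires.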

Properties of Probability Distributions

Third central moment: μ_3 = ⟨(X − μ)³⟩, related to the skewness.
Fourth central moment: μ_4 = ⟨(X − μ)⁴⟩, related to the kurtosis.

The median and the mode both provide estimates of the central tendency of a distribution, and in many cases are more robust against outliers than the mean.

Example: Mean and Median Filtering

[Figure: an image degraded by salt noise, shown alongside the outputs of a mean filter and a median filter.]
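The robustness of the median against outliers, which the image example illustrates, shows up already in one dimension. A minimal 1-D sketch (not the slide's image pipeline; the helper and window size are illustrative): a flat signal with a single "salt" spike is cleaned completely by a median filter but only smeared by a mean filter.

```python
import statistics

def filter_1d(signal, window, stat):
    """Slide a window over the signal, replacing each sample by stat(window)."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(stat(signal[lo:hi]))
    return out

# Flat signal corrupted by one salt spike at index 3.
noisy = [10.0] * 7
noisy[3] = 255.0
mean_filtered = filter_1d(noisy, 3, statistics.mean)
median_filtered = filter_1d(noisy, 3, statistics.median)
# The median removes the spike entirely; the mean only spreads it around.
```

This is why median filtering is the standard remedy for salt-and-pepper noise.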

The Uniform Distribution

A flat distribution, with its peak value normalized so that the area under the curve equals 1.

[Figures: the uniform PDF and the cumulative uniform PDF.]

Commonly used as an ignorance prior to express impartiality (a lack of bias) about the value of a quantity over a given interval. Round-off error and quantization error are uniformly distributed.

The Binomial Distribution

Binomial statistics apply when there are exactly two mutually exclusive outcomes of a trial (labelled "success" and "failure"). The binomial distribution gives the probability of observing k successes in n trials, with the probability of success on a single trial denoted by p (p is assumed fixed for all trials).

[Figures: binomial PMFs for fixed n with varying p, and for fixed p with varying n.]

Among the most useful discrete distributions in statistics. The multinomial distribution generalizes it to the case of more than two outcomes.
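The binomial PMF P(k) = C(n, k) p^k (1 − p)^(n−k) is straightforward to evaluate with the standard library. A minimal sketch (the function name is illustrative):

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of k successes in n trials, success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Sanity checks: the PMF sums to 1 over k = 0..n, and for 10 fair coin
# flips, P(exactly 5 heads) = C(10, 5) / 2**10 = 252/1024.
total = sum(binomial_pmf(k, 10, 0.5) for k in range(11))
p_five = binomial_pmf(5, 10, 0.5)
```

`math.comb` keeps the binomial coefficient exact, avoiding the overflow and rounding issues of a factorial-based formula.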

The Negative Binomial Distribution

Closely related to the binomial distribution, the negative binomial applies under the same circumstances, but the variable of interest is the number of trials n needed to obtain a given number of successes (rather than the number of successes in a fixed number of trials). For Bernoulli trials each with success probability p, the negative binomial distribution gives the probability of observing k failures and n − k successes, with a success on the last trial.
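One common parametrization counts the number of failures k before the r-th success: the last trial is the r-th success, so the k failures fall somewhere among the first k + r − 1 trials. A minimal sketch under that convention (the function name and parameter letters are illustrative):

```python
from math import comb

def neg_binomial_pmf(k, r, p):
    """Probability of k failures before the r-th success, with success
    probability p per trial: C(k + r - 1, k) * p**r * (1 - p)**k."""
    return comb(k + r - 1, k) * p**r * (1 - p)**k

# Sanity checks: the PMF sums to 1 over k, and with r = 1 it reduces to
# the geometric distribution, e.g. P(0 failures before 1st success) = p.
total = sum(neg_binomial_pmf(k, 3, 0.5) for k in range(200))
p_geo = neg_binomial_pmf(0, 1, 0.5)
```

Setting r = 1 recovers the geometric distribution, the waiting time to the first success.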

The Poisson Distribution

Another crucial discrete distribution, the Poisson expresses the probability of a number of events k (e.g. failures, arrivals, occurrences...) occurring in a fixed period of time (or fixed region of space), provided these events occur with a known mean rate λ (events per unit time) and independently of the time since the last event.

The Poisson distribution is the limiting case of a binomial distribution in which the probability of success p goes to zero while the number of trials n grows such that λ = np remains finite.

Examples: photons received from a star in a time interval; meteorite impacts over an area; pedestrians crossing at an intersection, etc.
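The binomial-to-Poisson limit can be verified numerically: hold λ = np fixed, let n grow, and the two PMFs converge pointwise. A minimal sketch (function names are illustrative):

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """Poisson probability of k events with mean rate lam."""
    return lam**k * exp(-lam) / factorial(k)

# Hold lam = n * p = 3 fixed while n is large: binomial -> Poisson.
lam = 3.0
n = 100_000
gap = max(abs(binomial_pmf(k, n, lam / n) - poisson_pmf(k, lam))
          for k in range(20))
```

With n this large the largest pointwise discrepancy is tiny, illustrating the limit stated above.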

The Normal (Gaussian) Distribution

The normal or Gaussian distribution is probably the best-known statistical distribution. Given mean μ and standard deviation σ it has the PDF:

f(x) = (1 / (σ √(2π))) exp(−(x − μ)² / (2σ²))

A Gaussian with mean zero and standard deviation one is known as the standard normal distribution. It is a continuous distribution, and is the limiting case of a binomial as the number of trials (and successes) becomes very large. Its pivotal role in statistics is partly due to the Central Limit Theorem (see later).
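The Gaussian PDF above translates directly into code. A minimal sketch (the function name is illustrative):

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Gaussian density with mean mu and standard deviation sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

# The standard normal peaks at 1/sqrt(2*pi) ~ 0.3989 and is symmetric
# about its mean.
peak = normal_pdf(0.0)
left, right = normal_pdf(-1.5), normal_pdf(1.5)
```

The symmetry and the peak height follow immediately from the formula, since the exponent depends only on (x − μ)².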

Examples: Gaussian Distributions

[Figure: the human IQ distribution, an example of an approximately Gaussian distribution.]

The Power Law Distribution

Power-law distributions are ubiquitous in science, occurring in phenomena as diverse as city sizes, incomes, word frequencies, and earthquake magnitudes. A power law implies that small occurrences are extremely common, whereas large instances are extremely rare. This "law" takes a number of forms (it is sometimes referred to as the Zipf or Pareto distribution).

[Figures: a simple illustrative power-law PDF on linear and log-log scales, for k = 0.5, 1.0, 2.0.]
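The slide's specific formula is not reproduced here; as an illustration, assume the Pareto form p(x) = k x_min^k / x^(k+1) for x ≥ x_min. Its CDF inverts in closed form, so inverse-transform sampling is one line (the function name is illustrative):

```python
import random

def sample_pareto(rng, k, x_min=1.0):
    """Inverse-transform sample from the (assumed) Pareto density
    p(x) = k * x_min**k / x**(k + 1), x >= x_min: solve
    u = CDF(x) = 1 - (x_min / x)**k for x."""
    u = rng.random()
    return x_min * (1.0 - u) ** (-1.0 / k)

rng = random.Random(42)
samples = [sample_pareto(rng, k=2.0) for _ in range(100_000)]
# Heavy tail check via the survival function P(X > x) = (x_min / x)**k:
# for k = 2, P(X > 2) should be 1/4.
frac_gt_2 = sum(s > 2.0 for s in samples) / len(samples)
```

Small values dominate the sample while large excursions remain, which is exactly the "common small, rare large" behavior described above.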

Example Power Laws from Nature

Physics Example: Cosmic Ray Spectrum

The Exponential Distribution

The exponential distribution is a continuous probability distribution with an exponential falloff controlled by the rate parameter λ: larger values of λ give a more rapid falloff. Its PDF is:

f(x) = λ e^(−λx),  x ≥ 0

It is used to model the times between independent events that happen at a constant average rate (e.g. lifetimes, waiting times).
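Exponential waiting times have mean 1/λ and variance 1/λ², which a quick simulation confirms. A minimal sketch using the standard library's exponential sampler (the variable names are illustrative):

```python
import random
import statistics

lam = 2.0                            # rate: average events per unit time
rng = random.Random(7)
# random.Random.expovariate draws from f(t) = lam * exp(-lam * t).
waits = [rng.expovariate(lam) for _ in range(100_000)]

mean_wait = statistics.mean(waits)       # theory: 1 / lam = 0.5
var_wait = statistics.pvariance(waits)   # theory: 1 / lam**2 = 0.25
```

The sample mean and variance should land close to 0.5 and 0.25 respectively.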

The Gamma Distribution

The gamma distribution is a continuous distribution characterized by two parameters, usually designated the shape parameter k and the scale parameter θ. When k = 1 it coincides with the exponential distribution, and it is also closely related to the Poisson and chi-squared distributions.

Gamma PDF:

f(x; k, θ) = x^(k−1) e^(−x/θ) / (Γ(k) θ^k),  x ≥ 0

where the gamma function is defined:

Γ(k) = ∫₀^∞ t^(k−1) e^(−t) dt

The gamma distribution gives a flexible class of PDFs for nonnegative phenomena, often used in modeling waiting times. It is the conjugate prior for the Poisson distribution.
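The k = 1 special case mentioned above is easy to verify: with shape k = 1 and scale θ = 1/λ, the gamma PDF collapses to the exponential PDF. A minimal sketch (function names are illustrative; `math.gamma` supplies Γ):

```python
from math import gamma, exp

def gamma_pdf(x, k, theta):
    """Gamma density with shape k and scale theta."""
    return x**(k - 1) * exp(-x / theta) / (gamma(k) * theta**k)

def exponential_pdf(x, lam):
    """Exponential density with rate lam."""
    return lam * exp(-lam * x)

# With k = 1 and theta = 1/lam, gamma(x; 1, 1/lam) = lam * exp(-lam * x).
diffs = [gamma_pdf(x, 1.0, 0.5) - exponential_pdf(x, 2.0)
         for x in (0.1, 1.0, 3.0)]
```

The two densities agree pointwise, confirming the stated special case.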

The Beta Distribution

The family of beta probability distributions is defined on the fixed interval [0, 1] and parameterized by two positive shape parameters, α and β. In Bayesian statistics it is frequently encountered as a prior for the binomial distribution.

Beta PDF:

f(x; α, β) = x^(α−1) (1 − x)^(β−1) / B(α, β),  0 ≤ x ≤ 1

where the beta function is defined:

B(α, β) = ∫₀¹ t^(α−1) (1 − t)^(β−1) dt = Γ(α) Γ(β) / Γ(α + β)

The family of beta distributions allows a wide variety of shapes over a fixed interval. If the likelihood is binomial, a beta prior leads to a beta posterior. The beta function plays the role of a normalization constant, ensuring that the PDF integrates to 1.
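The beta-binomial conjugacy noted above makes the Bayesian update trivial: a Beta(α, β) prior combined with a binomial likelihood of s successes and f failures gives a Beta(α + s, β + f) posterior. A minimal sketch (the function name and the coin example are illustrative):

```python
def beta_binomial_update(alpha, beta, successes, failures):
    """Beta(alpha, beta) prior + binomial data -> Beta posterior:
    conjugacy means we just add the observed counts to the shapes."""
    return alpha + successes, beta + failures

# Flat Beta(1, 1) prior over the coin bias, then observe 7 heads in 10 flips.
a, b = beta_binomial_update(1.0, 1.0, successes=7, failures=3)
posterior_mean = a / (a + b)     # Beta(8, 4) mean = 8/12
```

No integration is needed: the normalizing beta function takes care of itself, which is exactly why the beta prior is so convenient for binomial likelihoods.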

Central Limit Theorem: An Experimental Demonstration

Central Limit Theorem: A Bayesian Demonstration

Let Y be the sum of two independent random variables X1 and X2, with densities P(x1|I) = f1(x1) and P(x2|I) = f2(x2). Marginalizing over X1 and X2:

P(y|I) = ∫∫ dx1 dx2 P(y, x1, x2|I)
       = ∫∫ dx1 dx2 P(x1|I) P(x2|I) P(y|x1, x2, I)

using the product rule and the independence of X1 and X2. Because y = x1 + x2:

P(y|x1, x2, I) = δ(y − x1 − x2)

Therefore:

P(y|I) = ∫ dx1 f1(x1) ∫ dx2 f2(x2) δ(y − x1 − x2) = ∫ dx1 f1(x1) f2(y − x1)

This is a convolution integral.
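The convolution result can be seen concretely: for X1, X2 ~ Uniform(0, 1), convolving the two flat densities gives a triangular density, f(y) = y on [0, 1] and 2 − y on [1, 2]. A minimal simulation sketch (the variable names are illustrative):

```python
import random

# Y = X1 + X2 with X1, X2 ~ Uniform(0, 1): the convolution of two flat
# densities is triangular, f(y) = y on [0, 1] and 2 - y on [1, 2].
rng = random.Random(3)
ys = [rng.random() + rng.random() for _ in range(200_000)]

# The left triangle has area 1/2, so P(Y <= 1) should be 0.5, and by
# symmetry the sample mean should sit near 1.
frac_left = sum(y <= 1.0 for y in ys) / len(ys)
```

Already at two terms the sum's density is noticeably more "peaked" than its flat ingredients; repeated convolution drives it toward the Gaussian, which is the Central Limit Theorem in action.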

Central Limit Theorem: Convolution Demonstration