Data Analysis Class 4: Probability distributions and densities.

Random variables
– Binary (e.g. heads = 1, tails = 0; plane crashes = 1, does not crash = 0)
– Discrete (e.g. number of heads in a series of coin tosses; number of plane crashes in a given time span)
– Continuous (e.g. time to next plane crash; height of a person)
– Vectorial
– …

Probability distribution
For binary/discrete random variables X: P(X=x) = ?
Specifying this for all x gives a probability distribution.
Condition: P(X=x) ≥ 0 for all x, and Σ_x P(X=x) = 1

Probability distribution
For example, for a coin flip: P(X=1) = 0.4 and P(X=0) = 0.6
Note: P(X=1) + P(X=0) = 1
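The coin-flip example can be checked and simulated with a short sketch. The helper names (`is_distribution`, `sample`) and the fixed seed are illustrative choices, not part of the lecture.

```python
import random

def is_distribution(pmf, tol=1e-12):
    """Return True if all probabilities are non-negative and sum to 1."""
    return all(p >= 0 for p in pmf.values()) and abs(sum(pmf.values()) - 1.0) <= tol

def sample(pmf, rng):
    """Draw one outcome from a discrete distribution given as {outcome: probability}."""
    outcomes = list(pmf)
    weights = [pmf[x] for x in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]

coin = {1: 0.4, 0: 0.6}  # the coin flip from the slide

rng = random.Random(0)  # fixed seed, an arbitrary choice for reproducibility
draws = [sample(coin, rng) for _ in range(10_000)]
frequency_of_heads = sum(draws) / len(draws)

print("valid distribution:", is_distribution(coin))
print("observed frequency of X=1:", frequency_of_heads)
```

With many draws, the observed frequency of X=1 should settle close to 0.4.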

Probability density function
For continuous random variables X: P(X=x), defined in the same way, is 0 (there are infinitely many other possible outcomes).
So we'll define P(X=x) in a different way, such that P(X=x)dx is the probability that X falls in the interval [x, x+dx].
Note: this probability is indeed infinitesimally small.
Condition: P(X=x) ≥ 0 for all x, and ∫ P(X=x) dx = 1

Probability density function
E.g. height of random people
Gaussian distribution: P(X=x) = 1/(σ√(2π)) · exp(−(x−μ)²/(2σ²))
A bell curve centred around μ and with width proportional to σ
Note: ∫ P(X=x) dx = 1
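A minimal numerical sketch of the Gaussian density, assuming the standard form P(X=x) = 1/(σ√(2π)) · exp(−(x−μ)²/(2σ²)). The values μ = 175 and σ = 10 (heights in cm) are hypothetical, and the midpoint-rule integrator is just one simple choice.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Gaussian density with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

mu, sigma = 175.0, 10.0  # hypothetical height distribution in cm

def density(x):
    return gaussian_pdf(x, mu, sigma)

# Integrating over mu +/- 10 sigma captures essentially all of the probability mass.
total = integrate(density, mu - 10 * sigma, mu + 10 * sigma)
# Probability of a height within one sigma of the mean.
within_one_sigma = integrate(density, mu - sigma, mu + sigma)

print("total probability:", total)
print("P(mu - sigma < X < mu + sigma):", within_one_sigma)
```

The total should be very close to 1, and roughly 68% of the mass lies within one σ of μ.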

Some probability distributions
– Bernoulli distribution (binary)
– Geometric distribution (discrete)
– Binomial distribution (discrete)
– Gaussian distribution (continuous)
– Exponential distribution (continuous)
– Poisson distribution (discrete)

Bernoulli distribution
X is a binary random variable (success/1 versus failure/0): P(X=1) = p, P(X=0) = 1 − p
For example:
– Biased coin
– Whether a given plane crashes

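Geometric distribution
X is the number of Bernoulli experiments up to and including the first success, where the success probability is p:
P(X=x) = (1−p)^(x−1) · p, for x = 1, 2, …
Note: Σ_{x≥1} (1−p)^(x−1) · p = p / (1 − (1−p)) = 1, as required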
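A small sketch of the geometric distribution, assuming the usual form P(X=x) = (1−p)^(x−1) · p for x = 1, 2, … (x−1 failures followed by one success). The value p = 0.2 and the seed are arbitrary illustrative choices.

```python
import random

def geometric_pmf(x, p):
    """P(X = x): x - 1 failures followed by one success, for x = 1, 2, ..."""
    return (1 - p) ** (x - 1) * p

def sample_geometric(p, rng):
    """Repeat Bernoulli experiments until the first success; return the trial count."""
    trials = 1
    while rng.random() >= p:  # failure occurs with probability 1 - p
        trials += 1
    return trials

p = 0.2
# The infinite sum is truncated; the neglected tail is (1 - p)**200, which is tiny.
partial_sum = sum(geometric_pmf(x, p) for x in range(1, 201))

rng = random.Random(1)  # fixed seed for reproducibility
samples = [sample_geometric(p, rng) for _ in range(20_000)]
observed = samples.count(3) / len(samples)

print("sum of P(X=x) for x = 1..200:", partial_sum)
print("P(X=3):", geometric_pmf(3, p), "observed frequency:", observed)
```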

Binomial distribution
X is the number of successes (with success probability p) in n Bernoulli experiments:
P(X=x) = C(n,x) · p^x · (1−p)^(n−x), for x = 0, 1, …, n, where C(n,x) = n! / (x!(n−x)!)
Again: sums to 1…
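As a quick check that the binomial probabilities sum to 1, here is a sketch assuming the standard form P(X=x) = C(n,x) · p^x · (1−p)^(n−x). The values n = 10 and p = 0.3 are arbitrary; `math.comb` requires Python 3.8 or later.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x): probability of exactly x successes in n Bernoulli experiments."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.3
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

print("sum over x = 0..n:", sum(pmf))
for x, prob in enumerate(pmf):
    print(x, round(prob, 4))
```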

Gaussian distribution
E.g. height of random people
Gaussian density function: P(X=x) = 1/(σ√(2π)) · exp(−(x−μ)²/(2σ²))
A bell curve centred around μ and with width proportional to σ
Note: ∫ P(X=x) dx = 1

Exponential density function
Time to the first future plane crash (i.e. X>0)
Assume a non-zero interval Δx
Probability of a crash in Δx is p = λΔx for some λ
Then the probability that the first crash is at time x, i.e. in interval number x/Δx, is (geometric distribution): (1 − λΔx)^(x/Δx − 1) · λΔx
Only valid for small enough Δx (then the probability of >1 crashes in Δx becomes negligible)
Limit for Δx → 0: (1 − λΔx)^(x/Δx − 1) → e^(−λx), so the probability becomes λ e^(−λx) dx

Exponential probability density
Thus (with P the probability density function): P(X=x) = λ e^(−λx)
Exponentially decaying…
From this, the cumulative exponential distribution function: P(X≤x) = ∫_0^x λ e^(−λu) du = 1 − e^(−λx)
Note: P(X≤x) → 1 for x → ∞
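The limiting argument can be checked numerically. This sketch assumes the geometric form (1 − λΔx)^(x/Δx − 1) · λΔx for the probability that the first crash falls in the interval ending at x, and compares it, divided by Δx, with the density λ e^(−λx). The values λ = 0.5 and x = 2 are arbitrary.

```python
import math

def first_crash_probability(x, lam, dx):
    """Geometric probability that the first crash occurs in interval number x/dx."""
    k = round(x / dx)  # index of the interval that ends at time x
    return (1 - lam * dx) ** (k - 1) * lam * dx

def exponential_pdf(x, lam):
    """Exponential density: lam * exp(-lam * x) for x > 0."""
    return lam * math.exp(-lam * x)

def exponential_cdf(x, lam):
    """Cumulative exponential distribution: P(X <= x) = 1 - exp(-lam * x)."""
    return 1 - math.exp(-lam * x)

lam, x = 0.5, 2.0
for dx in (0.1, 0.01, 0.001):
    # Dividing the interval probability by dx turns it into an approximate density.
    approx_density = first_crash_probability(x, lam, dx) / dx
    print(dx, approx_density, exponential_pdf(x, lam))

print("P(X <= 2):", exponential_cdf(x, lam))
```

As Δx shrinks, the second column should approach the third.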

Poisson distribution
Distribution over the number of plane crashes in a unit time interval
Limit of the Binomial distribution:
– Binomial: n trials, probability p per trial
– Poisson: n/Δx trials, probability pΔx per trial, in the limit for Δx → 0 (work it out!)
Result: P(X=x) = λ^x e^(−λ) / x!, for x = 0, 1, 2, …, with λ = np
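The binomial-to-Poisson limit can be explored numerically before working it out by hand. This sketch assumes the standard Poisson form P(X=x) = λ^x e^(−λ) / x! with λ = np. The values n = 4 and p = 0.5 are arbitrary, and `max_gap` is an illustrative helper.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) for x successes in n trials with success probability p."""
    if x > n:
        return 0.0
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson distribution with parameter lam."""
    return lam ** x * math.exp(-lam) / math.factorial(x)

n, p = 4, 0.5
lam = n * p  # the expected number of successes stays fixed as dx shrinks

def max_gap(dx):
    """Largest difference between Binomial(n/dx, p*dx) and Poisson(lam) over x = 0..10."""
    trials = round(n / dx)
    prob = p * dx
    return max(abs(binomial_pmf(x, trials, prob) - poisson_pmf(x, lam)) for x in range(11))

for dx in (1, 0.1, 0.01, 0.001):
    print(dx, max_gap(dx))

print("Poisson sum over x = 0..49:", sum(poisson_pmf(x, lam) for x in range(50)))
```

The gap should shrink as Δx decreases, and the Poisson probabilities should sum to 1.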

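Summary
Type | Random variable X | Distribution / density function | Parameters
Bernoulli | Binary | P(X=1) = p, P(X=0) = 1−p | p
Geometric | Positive integer | (1−p)^(x−1) · p | p
Binomial | 0,1,…,n | C(n,x) · p^x · (1−p)^(n−x) | n, p
Gaussian | Real number | 1/(σ√(2π)) · exp(−(x−μ)²/(2σ²)) | μ, σ
Exponential | Positive real | λ e^(−λx) | λ
Poisson | Non-negative integer | λ^x e^(−λ) / x! | λ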

Properties of distributions
Mean: E[X] = Σ_x x · P(X=x) (for a density: ∫ x · P(X=x) dx)
Variance: Var[X] = E[(X − E[X])²]
Standard deviation = square root of variance
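For a discrete distribution these properties can be computed directly. The sketch below assumes the usual definitions, mean = Σ x · P(X=x) and variance = Σ (x − mean)² · P(X=x), and applies them to the coin flip with P(X=1) = 0.4. The function names are illustrative.

```python
import math

def mean(pmf):
    """Mean of a discrete distribution given as {outcome: probability}."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Variance: expected squared deviation from the mean."""
    m = mean(pmf)
    return sum((x - m) ** 2 * p for x, p in pmf.items())

def standard_deviation(pmf):
    """Standard deviation: square root of the variance."""
    return math.sqrt(variance(pmf))

coin = {1: 0.4, 0: 0.6}

print("mean:", mean(coin))
print("variance:", variance(coin))
print("standard deviation:", standard_deviation(coin))
```

For this coin the mean is p = 0.4 and the variance is p(1−p) = 0.24.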

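Summary
Type | Distribution / density function | Mean | Variance
Bernoulli | P(X=1) = p, P(X=0) = 1−p | p | p(1−p)
Geometric | (1−p)^(x−1) · p | 1/p | (1−p)/p²
Binomial | C(n,x) · p^x · (1−p)^(n−x) | np | np(1−p)
Gaussian | 1/(σ√(2π)) · exp(−(x−μ)²/(2σ²)) | μ | σ²
Exponential | λ e^(−λx) | 1/λ | 1/λ²
Poisson | λ^x e^(−λ) / x! | λ | λ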

Lab session
– Compute the conditional probability density and expectation for the exponential density, conditional on X>t. (report)
– Complete the tables in these lecture notes with a uniform distribution and a uniform density. (report)
– Compute the cumulative distributions of all distributions discussed (or find them on Wikipedia!)
– Plot the exponential distribution for 3 different values of lambda, as well as the cumulative exponential distribution. (report)
– Plot the Poisson distribution for 3 different values of lambda, as well as the cumulative Poisson distribution.
– Randomly sample n=10 passengers, do this N=1000 times, and plot a histogram of how many of these 10 passengers are in third class in each of the 1000 randomisations. Which distribution does this follow? (report)
– Make a histogram of the temperatures in all January months. Which distribution does this follow? (report)