Probability distribution functions

Outline: normal distribution; lognormal distribution; mean, median and mode; tails; extreme value distributions.
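Normal (Gaussian) distribution

Normal density function, with mean $\mu$ and standard deviation $\sigma$:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

[Figure: normal density function.] What does the figure tell us about the values of the CDF?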
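The link between the density and the CDF can be checked numerically. Below is a minimal Python sketch (standard library only, not part of the original MATLAB material). `normal_pdf` implements the normal density exp(-(x - μ)²/(2σ²))/(σ√(2π)), `normal_cdf` writes the CDF using the error function, and a trapezoidal integral of the density reproduces the CDF. The function names are hypothetical, chosen for illustration.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Normal CDF via the error function: 0.5 * (1 + erf(z / sqrt(2)))."""
    z = (x - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def trapezoid(f, a, b, n=10000):
    """Trapezoidal-rule approximation of the integral of f from a to b."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


# The CDF is the area under the density to the left of x
# (the lower limit -8 stands in for minus infinity).
print(normal_cdf(1.0), trapezoid(normal_pdf, -8.0, 1.0))

# By symmetry of the density, CDF(mu) = 0.5 and CDF(mu - d) = 1 - CDF(mu + d).
print(normal_cdf(0.0), normal_cdf(-1.0) + normal_cdf(1.0))

# Probability of falling within k standard deviations of the mean.
for k in (1, 2, 3):
    print(f"P(|X - mu| < {k} sigma) = {normal_cdf(k) - normal_cdf(-k):.4f}")
```

The last loop prints 0.6827, 0.9545 and 0.9973, the familiar 68-95-99.7 rule.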
More on the normal distribution

P = normcdf(X,MU,SIGMA) returns the cdf of the normal distribution with mean MU and standard deviation SIGMA, evaluated at the values in X. The size of P is the common size of X, MU and SIGMA. For the standard normal (MU = 0, SIGMA = 1, the defaults):

normcdf(1) = 0.8413
1-normcdf(6) = 9.8659e-010

If X is normally distributed, Y = aX + b is also normally distributed. What would be the mean and standard deviation of Y?

Notation: $X \sim N(\mu, \sigma^2)$.
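As a cross-check of the numbers on this slide, here is a Python sketch that uses the standard library's `statistics.NormalDist` (Python 3.8+) in place of MATLAB's `normcdf`. The far tail is computed with `math.erfc`, which avoids the cancellation error in `1 - cdf(6)`. The last part answers the question for one illustrative choice of a and b: Y = aX + b has mean aμ + b and standard deviation |a|σ.

```python
import math
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Counterpart of normcdf(1)
p1 = std_normal.cdf(1.0)

# Counterpart of 1-normcdf(6); erfc keeps full precision in the far tail,
# whereas 1 - cdf(6) subtracts two nearly equal numbers.
tail6 = 0.5 * math.erfc(6.0 / math.sqrt(2.0))

print(f"normcdf(1)   ~ {p1:.4f}")
print(f"1-normcdf(6) ~ {tail6:.4e}")

# Linear transformation Y = a*X + b of X ~ N(mu, sigma^2):
# mean(Y) = a*mu + b, std(Y) = |a|*sigma. NormalDist applies exactly these rules
# when multiplied by or added to a constant. (a, b, mu, sigma are illustrative values.)
a, b = -2.0, 3.0
X = NormalDist(mu=1.0, sigma=0.5)
Y = X * a + b
print(f"Y: mean = {Y.mean}, std = {Y.stdev}")
```

Note that the standard deviation uses |a|: a negative a flips the distribution but cannot make its spread negative.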
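Estimating mean and standard deviation

Given a sample $x_1,\dots,x_n$ from a normally distributed variable, the sample mean $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ is the best linear unbiased estimator of the true mean. For the variance, the equation

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$$

gives the best unbiased estimator, but its square root $s$ is not an unbiased estimate of the standard deviation: on average it underestimates $\sigma$. With samples of size 5 from the standard normal ($\sigma = 1$):

x=randn(5,10000);   % 10000 samples (columns) of size 5
s=std(x);           % sample standard deviation of each column
mean(s)             % ans = 0.9463, below the true value 1
s2=s.^2;
mean(s2)            % ans = 1.0106, close to the true variance 1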
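A Python re-run of the same experiment (standard library only): `random.gauss` stands in for `randn`, and `statistics.stdev`, like MATLAB's `std`, divides by n - 1. The helper `c4` is the known exact ratio E[s]/σ for normal samples of size n, c4(n) = √(2/(n-1)) Γ(n/2)/Γ((n-1)/2), which is about 0.94 for n = 5. The simulated averages will differ slightly from the MATLAB numbers because the random draws differ; the seed and function names are illustrative.

```python
import math
import random
import statistics


def c4(n):
    """Exact E[s] / sigma for samples of size n from a normal distribution."""
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)


def simulate_std_bias(n=5, trials=10000, seed=0):
    """Average s and s^2 over many normal samples of size n (true sigma = 1)."""
    rng = random.Random(seed)
    s_values = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        s_values.append(statistics.stdev(sample))  # n - 1 in the denominator, like std
    mean_s = statistics.fmean(s_values)
    mean_s2 = statistics.fmean(s * s for s in s_values)
    return mean_s, mean_s2


mean_s, mean_s2 = simulate_std_bias()
print(f"mean(s)  = {mean_s:.4f}  (exact expectation {c4(5):.4f})")
print(f"mean(s2) = {mean_s2:.4f}  (exact expectation 1)")
```

Dividing s by c4(n) gives an unbiased estimate of σ for normal data.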
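Lognormal distribution

If ln(X) is normally distributed, X has a lognormal distribution. Equivalently, if X is normally distributed, exp(X) is lognormally distributed.

Notation: if $\ln X \sim N(\mu, \sigma^2)$, then $\mu$ and $\sigma$ are the mean and standard deviation of $\ln X$, not of $X$.

Probability density function (PDF):

$$f(x) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left(-\frac{(\ln x-\mu)^2}{2\sigma^2}\right), \quad x > 0$$

Mean and variance:

$$E[X] = e^{\mu+\sigma^2/2}, \qquad \operatorname{Var}[X] = \left(e^{\sigma^2}-1\right)e^{2\mu+\sigma^2}$$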
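A Python sketch (standard library only) that checks the lognormal mean and variance formulas, E[X] = exp(μ + σ²/2) and Var[X] = (exp(σ²) - 1) exp(2μ + σ²), against simulation. It draws lognormal values two ways: by exponentiating normal draws, and with `random.lognormvariate(mu, sigma)`, which, like MATLAB's `lognrnd`, takes μ and σ on the log scale. The seed and sample size are arbitrary choices.

```python
import math
import random
import statistics


def lognormal_mean(mu, sigma):
    """E[X] = exp(mu + sigma^2 / 2) for ln X ~ N(mu, sigma^2)."""
    return math.exp(mu + sigma ** 2 / 2)


def lognormal_var(mu, sigma):
    """Var[X] = (exp(sigma^2) - 1) * exp(2 mu + sigma^2)."""
    return (math.exp(sigma ** 2) - 1) * math.exp(2 * mu + sigma ** 2)


mu, sigma = 0.0, 1.0
rng = random.Random(1)

# exp of a normal sample is a lognormal sample ...
via_exp = [math.exp(rng.gauss(mu, sigma)) for _ in range(100000)]
# ... which is what random.lognormvariate draws directly.
direct = [rng.lognormvariate(mu, sigma) for _ in range(100000)]

print(f"mean: formula {lognormal_mean(mu, sigma):.4f}, "
      f"simulated {statistics.fmean(via_exp):.4f} / {statistics.fmean(direct):.4f}")
print(f"variance: formula {lognormal_var(mu, sigma):.4f}")
```

With μ = 0 and σ = 1 the formulas give 1.6487 and 4.6708, the values used on the tails slide.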
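Mean, mode and median

The mode is the highest point of the PDF; the median is the value below which 50% of samples fall; the mean is the expected value. For a skewed distribution such as the lognormal, the three differ.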
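For the lognormal with log-scale parameters μ and σ these three measures have closed forms: mode e^(μ-σ²), median e^μ, mean e^(μ+σ²/2), so mode < median < mean whenever σ > 0. The Python sketch below (standard library only; function names are illustrative) checks the mode by locating the highest point of the PDF on a grid, and the median by evaluating the CDF, which for the lognormal is the normal CDF of ln x.

```python
import math
from statistics import NormalDist


def lognormal_pdf(x, mu, sigma):
    """Lognormal density for x > 0."""
    return math.exp(-(math.log(x) - mu) ** 2 / (2 * sigma ** 2)) / (
        x * sigma * math.sqrt(2 * math.pi)
    )


def lognormal_cdf(x, mu, sigma):
    """P(X <= x) = P(ln X <= ln x), a normal CDF evaluated at ln x."""
    return NormalDist(mu, sigma).cdf(math.log(x))


mu, sigma = 0.0, 1.0
mode = math.exp(mu - sigma ** 2)      # highest point of the PDF
median = math.exp(mu)                 # 50% of samples fall below it
mean = math.exp(mu + sigma ** 2 / 2)  # expected value

# Numerical check of the mode: search the grid 0.0001, 0.0002, ..., 4.9999.
grid = [i / 10000 for i in range(1, 50000)]
numeric_mode = max(grid, key=lambda x: lognormal_pdf(x, mu, sigma))

print(f"mode = {mode:.4f} (grid search {numeric_mode:.4f})")
print(f"median = {median:.4f}, CDF at median = {lognormal_cdf(median, mu, sigma):.4f}")
print(f"mean = {mean:.4f}")
```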
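Light and heavy tails

The normal distribution has a light tail: a (one-sided) six-sigma margin is exceeded with probability 1-normcdf(6) = 9.9e-10, so it is equivalent to 0.999999999 (nine nines) safety.

The lognormal distribution is heavy tailed. For the standard lognormal ($\mu = 0$, $\sigma = 1$ on the log scale), the mean plus six standard deviations is exceeded with a probability of about 0.4% (CDF = 0.9963):

m=exp(0.5)               % mean, m = 1.6487
v=exp(1)*(exp(1)-1)      % variance, v = 4.6708
sig=sqrt(v)              % standard deviation, sig = 2.1612
sig6=m+6*sig             % sig6 = 14.6159
logncdf(sig6,0,1)        % ans = 0.9963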
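The same arithmetic in Python (standard library only), with `NormalDist().cdf(math.log(x))` standing in for `logncdf(x,0,1)`. It places the two exceedance probabilities side by side: about 1e-9 for the normal versus a few in a thousand for the lognormal.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()

# Normal: probability of exceeding the mean by more than 6 standard deviations.
p_normal = 0.5 * math.erfc(6 / math.sqrt(2))

# Standard lognormal (mu = 0, sigma = 1 on the log scale): mean and std of X itself.
m = math.exp(0.5)
v = math.exp(1) * (math.exp(1) - 1)
sig = math.sqrt(v)
sig6 = m + 6 * sig

# logncdf(sig6, 0, 1) is the standard normal CDF evaluated at ln(sig6).
cdf_sig6 = std_normal.cdf(math.log(sig6))
p_lognormal = 1 - cdf_sig6

print(f"sig6 = {sig6:.4f}, logncdf(sig6,0,1) = {cdf_sig6:.4f}")
print(f"P(exceed mean + 6 std): normal {p_normal:.2e}, lognormal {p_lognormal:.2e}")
```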
Fitting distribution to data

Distributions are typically fitted by matching a candidate CDF to the empirical CDF of the data.
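One common way to fit to the CDF is a probability plot: sort the data, assign each point an empirical CDF value (plotting position), map those values through the inverse CDF of the candidate distribution, and fit a straight line. For a normal candidate the intercept estimates μ and the slope estimates σ. The Python sketch below (standard library only) uses the plotting position (i - 0.5)/n, which is one of several conventions and an assumption here, and ordinary least squares; `fit_normal_probability_plot` is a hypothetical name.

```python
import random
from statistics import NormalDist, fmean


def fit_normal_probability_plot(data):
    """Fit N(mu, sigma^2) by least squares on a normal probability plot."""
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()
    # Standard-normal quantile of each point's empirical CDF value (i - 0.5) / n.
    zs = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    # Least-squares line x = mu + sigma * z.
    z_bar, x_bar = fmean(zs), fmean(xs)
    sxz = sum((z - z_bar) * (x - x_bar) for z, x in zip(zs, xs))
    szz = sum((z - z_bar) ** 2 for z in zs)
    sigma_hat = sxz / szz
    mu_hat = x_bar - sigma_hat * z_bar
    return mu_hat, sigma_hat


rng = random.Random(2)
sample = [rng.gauss(10.0, 2.0) for _ in range(2000)]
mu_hat, sigma_hat = fit_normal_probability_plot(sample)
print(f"mu ~ {mu_hat:.3f}, sigma ~ {sigma_hat:.3f}")
```

For other families the same idea applies after a transformation; for a lognormal candidate, for example, one would fit the logarithms of the data.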
Empirical CDF

[F,X] = ecdf(Y) calculates the Kaplan-Meier estimate of the cumulative distribution function (cdf), also known as the empirical cdf. Y is a vector of data values. F is a vector of values of the empirical cdf evaluated at X.

[F,X,FLO,FUP] = ecdf(Y) also returns lower and upper confidence bounds for the cdf. These bounds are calculated using Greenwood's formula and are not simultaneous confidence bounds.

ecdf(...) without output arguments produces a plot of the empirical cdf. Use the data cursor to read precise values from the plot.
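With no censored observations, the Kaplan-Meier estimate reduces to the plain empirical CDF, F(x) = (number of data values ≤ x)/n, and Greenwood's variance reduces to F(1 - F)/n. The Python sketch below (standard library only; `ecdf_with_bounds` is a hypothetical name, not MATLAB's function) computes both and forms pointwise 95% bounds F ± 1.96·√(F(1 - F)/n), clipped to [0, 1] as a choice made here. MATLAB's exact output layout may differ.

```python
import math
from collections import Counter
from statistics import NormalDist


def ecdf_with_bounds(data, level=0.95):
    """Empirical CDF at each distinct value, with pointwise normal-approximation bounds."""
    n = len(data)
    counts = Counter(data)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    xs, fs, lo, hi = [], [], [], []
    running = 0
    for x in sorted(counts):
        running += counts[x]
        f = running / n                   # fraction of data <= x
        se = math.sqrt(f * (1 - f) / n)   # Greenwood standard error without censoring
        xs.append(x)
        fs.append(f)
        lo.append(max(0.0, f - z * se))
        hi.append(min(1.0, f + z * se))
    return xs, fs, lo, hi


xs, fs, lo, hi = ecdf_with_bounds([3.1, 0.4, 2.2, 2.2, 5.0])
for row in zip(xs, fs, lo, hi):
    print(row)
```

Tied values (2.2 appears twice) produce a single jump of 2/n, as in the step plot drawn by ecdf.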
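Example

x=lognrnd(0,1,1,20);      % 20 samples from the lognormal with mu = 0, sigma = 1
ecdf(x)
hold on
x=lognrnd(0,1,1,10000);   % 10000 samples: the empirical cdf approaches the true cdf
ecdf(x)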
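The point of the example is that the empirical cdf converges to the true cdf as the sample grows. A Python sketch (standard library only, with `random.lognormvariate` in place of `lognrnd`) measures this with the largest vertical gap between the two curves, the Kolmogorov-Smirnov distance; the function names and seed are illustrative.

```python
import math
import random
from statistics import NormalDist


def lognormal_cdf(x, mu=0.0, sigma=1.0):
    """True lognormal CDF: normal CDF of ln x."""
    return NormalDist(mu, sigma).cdf(math.log(x))


def max_ecdf_error(sample):
    """Largest vertical gap between the empirical CDF of the sample and the true CDF."""
    xs = sorted(sample)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs, start=1):
        f = lognormal_cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x; check both sides of the jump.
        gap = max(gap, i / n - f, f - (i - 1) / n)
    return gap


rng = random.Random(3)
small = [rng.lognormvariate(0.0, 1.0) for _ in range(20)]
large = [rng.lognormvariate(0.0, 1.0) for _ in range(10000)]
print(f"max gap, n = 20:    {max_ecdf_error(small):.4f}")
print(f"max gap, n = 10000: {max_ecdf_error(large):.4f}")
```

The gap typically shrinks roughly in proportion to 1/√n.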
Extreme value distributions

For any distribution with finite variance, the mean of a sample tends to be normally distributed as the sample size increases (with what mean and standard deviation?). Similarly, after suitable shifting and scaling, the minimum (or maximum) of a sample tends toward a limiting distribution of its own. Even though there are infinitely many distributions, there are only three types of extreme value distribution:
– Type I (Gumbel), e.g. for samples from the normal distribution
– Type II (Frechet), e.g. maximum daily rainfall
– Type III (Weibull), e.g. weakest-link failure
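The answer to the question in parentheses: for a parent distribution with mean μ and standard deviation σ, the mean of n samples is approximately N(μ, σ²/n), so its standard deviation is σ/√n. A Python sketch (standard library only) checks this for a uniform parent on [0, 1], which is far from normal; n, the number of trials and the seed are arbitrary illustrative choices.

```python
import math
import random
import statistics

n = 10          # size of each sample
trials = 20000  # number of samples
rng = random.Random(4)

# Uniform(0, 1) parent: mean 1/2, standard deviation sqrt(1/12).
means = [statistics.fmean(rng.random() for _ in range(n)) for _ in range(trials)]

print(f"mean of sample means: {statistics.fmean(means):.4f} (theory 0.5)")
print(f"std of sample means:  {statistics.stdev(means):.4f} "
      f"(theory {math.sqrt(1 / 12) / math.sqrt(n):.4f})")
```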
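Example

x=5-0.3*randn(10,1000);   % 1000 samples (columns) of size 10 from a normal with mean 5, std 0.3
minx=min(x);              % minimum of each sample (column)
hist(minx)
figure                    % new window, so the ecdf plot does not replace the histogram
ecdf(minx)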
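A Python sketch (standard library only) of the same experiment with 10000 samples instead of 1000, followed by a method-of-moments fit of the Type I (Gumbel) distribution for minima, whose mean is μ - γβ and variance π²β²/6 (γ ≈ 0.5772 is Euler's constant). Moment matching is a choice made here for simplicity, not the slides' method, and for samples of size 10 the Gumbel form is only an approximation to the true distribution of the minimum. The average minimum should be near 5 - 0.3 × 1.539 ≈ 4.54, since the expected minimum of 10 standard normal draws is about -1.539.

```python
import math
import random
import statistics

rng = random.Random(5)
samples, size = 10000, 10

# Counterpart of x=5-0.3*randn(10,1000); minx=min(x); with more samples:
# each entry of minx is the minimum of 10 draws from N(5, 0.3^2).
minx = [min(5 - 0.3 * rng.gauss(0.0, 1.0) for _ in range(size)) for _ in range(samples)]

m, s = statistics.fmean(minx), statistics.stdev(minx)

# Method-of-moments fit of a Type I (Gumbel) distribution for minima:
# mean = mu - gamma * beta and variance = (pi * beta)^2 / 6.
EULER_GAMMA = 0.5772156649015329
beta = s * math.sqrt(6) / math.pi
mu = m + EULER_GAMMA * beta

print(f"sample minima: mean {m:.4f}, std {s:.4f}")
print(f"Gumbel (minimum) fit: location {mu:.4f}, scale {beta:.4f}")
```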
Gumbel distribution

PDF and CDF; mean, median, mode and variance.
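For reference, the standard Gumbel formulas, with location μ, scale β > 0, standardized variable z = (x - μ)/β and Euler's constant γ ≈ 0.5772. The maximum form is listed first; the minimum form, suited to minima such as those in the previous example, is its mirror image. Which form and symbols the original slide used is not recoverable, so the notation here is an assumption.

```latex
\begin{aligned}
&\text{Maxima:} && f(x) = \frac{1}{\beta}\, e^{-(z + e^{-z})}, \quad F(x) = e^{-e^{-z}},\\
&&& \text{mean } \mu + \gamma\beta,\ \text{median } \mu - \beta\ln(\ln 2),\ \text{mode } \mu,\ \text{variance } \tfrac{\pi^2}{6}\beta^2,\\
&\text{Minima:} && f(x) = \frac{1}{\beta}\, e^{\,z - e^{z}}, \quad F(x) = 1 - e^{-e^{z}},\\
&&& \text{mean } \mu - \gamma\beta,\ \text{median } \mu + \beta\ln(\ln 2),\ \text{mode } \mu,\ \text{variance } \tfrac{\pi^2}{6}\beta^2.
\end{aligned}
```

Since ln(ln 2) ≈ -0.3665, the median lies between the mode and the mean in both forms.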
Weibull distribution

Probability density function, with shape parameter k and scale parameter λ:

$$f(x) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^k}, \quad x \ge 0$$

Used to describe the distribution of strength or fatigue life of brittle materials (weakest-link behavior). If it describes time to failure, then k < 1 indicates that the failure rate decreases with time, k = 1 indicates a constant rate, and k > 1 indicates an increasing rate. It is also useful for other phenomena, such as wind speed distributions. A third parameter can be added by replacing x with x - c.
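A Python sketch (standard library only) of the two-parameter Weibull with shape k and scale λ (`lam`; the scale symbol is an assumption, since only k appears on the original slide). The failure rate (hazard) h(x) = f(x)/P(X > x) = (k/λ)(x/λ)^(k-1) makes the three cases explicit: it falls with x for k < 1, equals 1/λ for k = 1 (the exponential distribution), and rises for k > 1. `weibull3_pdf` adds the third parameter by shifting x to x - c. All function names are hypothetical.

```python
import math


def weibull_pdf(x, k, lam):
    """Two-parameter Weibull density with shape k and scale lam (x >= 0)."""
    if x < 0:
        return 0.0
    if x == 0 and k < 1:
        return math.inf  # the density is unbounded at 0 when k < 1
    return (k / lam) * (x / lam) ** (k - 1) * math.exp(-((x / lam) ** k))


def weibull_survival(x, k, lam):
    """P(X > x) = exp(-(x/lam)^k) for x >= 0."""
    return math.exp(-((x / lam) ** k)) if x >= 0 else 1.0


def weibull_hazard(x, k, lam):
    """Failure rate h(x) = f(x) / P(X > x) = (k/lam) * (x/lam)^(k-1), for x > 0."""
    return (k / lam) * (x / lam) ** (k - 1)


def weibull3_pdf(x, k, lam, c):
    """Three-parameter version: replace x by x - c (c is a location or threshold)."""
    return weibull_pdf(x - c, k, lam)


for k in (0.5, 1.0, 2.0):
    rates = [weibull_hazard(t, k, 1.0) for t in (0.5, 1.0, 2.0)]
    print(f"k = {k}: failure rate at t = 0.5, 1, 2 -> " + ", ".join(f"{r:.3f}" for r in rates))
```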