1 STATISTICS Univariate Distributions
Professor Ke-Sheng Cheng Department of Bioenvironmental Systems Engineering National Taiwan University

2 Probability density functions of discrete random variables
Discrete uniform distribution
Bernoulli distribution
Binomial distribution
Negative binomial distribution
Geometric distribution
Hypergeometric distribution
Poisson distribution
3/27/2017 Laboratory for Remote Sensing Hydrology and Spatial Modeling, Dept of Bioenvironmental Systems Engineering, National Taiwan Univ.

3 Discrete uniform distribution
N ranges over the possible integers.

4 Bernoulli distribution
1-p is often denoted by q.

5 Binomial distribution
The binomial distribution represents the probability of having exactly x successes in n independent and identical Bernoulli trials.
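As a minimal sketch of the statement above, the following R lines compare the textbook form of the binomial probability, choose(n, x) p^x q^(n-x), with R's built-in dbinom. The values n = 10 and p = 0.3 are illustrative assumptions, not taken from the slides.

```r
# Binomial probabilities: P(X = x) = choose(n, x) * p^x * q^(n - x)
n <- 10     # number of Bernoulli trials (illustrative value)
p <- 0.3    # success probability in each trial (illustrative value)
x <- 0:n    # all possible numbers of successes

by_formula <- choose(n, x) * p^x * (1 - p)^(n - x)
by_dbinom  <- dbinom(x, size = n, prob = p)

all.equal(by_formula, by_dbinom)   # the two computations agree
sum(by_dbinom)                     # the probabilities sum to 1
```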

6 Negative binomial distribution
The negative binomial distribution represents the probability that the r-th success occurs on the x-th of a sequence of independent and identical Bernoulli trials. Unlike the binomial distribution, for which the number of trials is fixed, here the number of successes is fixed and the number of trials varies from experiment to experiment. The negative binomial random variable represents the number of trials needed to achieve the r-th success.
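One practical caveat when checking this in R: dnbinom counts the number of failures before the r-th success, whereas the random variable described here counts trials. The sketch below, with illustrative values r = 3 and p = 0.4 (assumptions, not from the slides), shows the shift by r that reconciles the two conventions.

```r
# Number of trials X needed for the r-th success:
#   P(X = x) = choose(x - 1, r - 1) * p^r * q^(x - r),  x = r, r + 1, ...
r <- 3      # required number of successes (illustrative value)
p <- 0.4    # success probability (illustrative value)
x <- r:(r + 20)

by_formula <- choose(x - 1, r - 1) * p^r * (1 - p)^(x - r)

# dnbinom() is parameterised by the number of FAILURES, i.e. x - r
by_dnbinom <- dnbinom(x - r, size = r, prob = p)

all.equal(by_formula, by_dnbinom)
```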

8 Geometric distribution
The geometric distribution represents the probability that the first success occurs on the x-th of a sequence of independent and identical Bernoulli trials.
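A short check of this definition, assuming the trial-counting form P(X = x) = p q^(x-1) for x = 1, 2, ... and an illustrative p = 0.4. As with the negative binomial, R's dgeom counts failures before the first success, so the argument is x - 1; the geometric case is also the negative binomial with r = 1.

```r
# First success on trial x: P(X = x) = p * q^(x - 1),  x = 1, 2, ...
p <- 0.4    # success probability (illustrative value)
x <- 1:15

by_formula <- p * (1 - p)^(x - 1)
by_dgeom   <- dgeom(x - 1, prob = p)             # x - 1 failures first
by_nbinom  <- dnbinom(x - 1, size = 1, prob = p) # negative binomial, r = 1

all.equal(by_formula, by_dgeom)
all.equal(by_dgeom, by_nbinom)
```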

9 Hypergeometric distribution
where M is a positive integer, K is a nonnegative integer that is at most M, and n is a positive integer that is at most M.

10 Let X denote the number of defective products in a sample of size n when sampling without replacement from a box containing M products, K of which are defective.
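The defective-products setting can be sketched in R as follows. The box size M = 20, the K = 5 defectives and the sample size of 4 are illustrative assumptions; the formula used is the standard hypergeometric form, and dhyper takes the numbers of defective and non-defective items as its m and n arguments.

```r
# P(X = x) = choose(K, x) * choose(M - K, n - x) / choose(M, n)
M      <- 20   # products in the box (illustrative value)
K      <- 5    # defective products in the box (illustrative value)
n_draw <- 4    # sample size, drawn without replacement (illustrative value)
x      <- 0:min(n_draw, K)

by_formula <- choose(K, x) * choose(M - K, n_draw - x) / choose(M, n_draw)

# dhyper(x, m, n, k): m = defectives, n = non-defectives, k = sample size
by_dhyper <- dhyper(x, m = K, n = M - K, k = n_draw)

all.equal(by_formula, by_dhyper)
sum(by_dhyper)   # probabilities over the support sum to 1
```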

11 Poisson distribution
The Poisson distribution provides a realistic model for many random phenomena for which the number of occurrences within a given scope (time, length, area, volume) is of interest. Examples include the number of fatal traffic accidents per day in Taipei, the number of meteorites that collide with a satellite during a single orbit, the number of defects per unit of some material, and the number of flaws per unit length of some wire.

13 Assume that we are observing the occurrence of a certain happening in time, space, region or length. Also assume that there exists a positive quantity which satisfies the following properties: 1.

14 2. 3.
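The text of the three properties is not present in this transcript. For orientation only, the assumptions usually stated for a Poisson process are sketched below; the symbol $\lambda$ for the positive quantity (the mean rate of occurrence) and the exact wording are assumptions and may differ from the original slides.

```latex
\begin{align*}
&\text{1. } P(\text{exactly one happening in a small interval of length } h) = \lambda h + o(h), \\
&\text{2. } P(\text{two or more happenings in a small interval of length } h) = o(h), \\
&\text{3. the numbers of happenings in non-overlapping intervals are independent.}
\end{align*}
% Under these assumptions the number of happenings X in an interval of length t satisfies
\[
P(X = x) = \frac{e^{-\lambda t} (\lambda t)^{x}}{x!}, \qquad x = 0, 1, 2, \ldots
\]
```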

15 The probability of success (occurrence) in each trial.

20 Comparison of Poisson and Binomial distributions

21 Example
Suppose that the average number of telephone calls arriving at the switchboard of a company is 30 calls per hour.
(1) What is the probability that no calls will arrive in a 3-minute period?
(2) What is the probability that more than five calls will arrive in a 5-minute interval?
Assume that the number of calls arriving during any time period has a Poisson distribution.
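Under the stated Poisson assumption, both questions can be evaluated directly in R. This is a sketch of one way to do the arithmetic, converting the hourly rate to a per-minute rate; the variable names are illustrative.

```r
# 30 calls per hour is 0.5 calls per minute
rate_per_min <- 30 / 60

# (1) No calls in 3 minutes: Poisson with mean 0.5 * 3 = 1.5
p_none_3min <- dpois(0, lambda = rate_per_min * 3)

# (2) More than five calls in 5 minutes: Poisson with mean 0.5 * 5 = 2.5
p_gt5_5min <- ppois(5, lambda = rate_per_min * 5, lower.tail = FALSE)

c(no_calls_3min = p_none_3min, more_than_5_in_5min = p_gt5_5min)
```

The first probability equals exp(-1.5), roughly 0.223, and the second is roughly 0.042.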

22 Assuming time is measured in minutes
Poisson distribution is NOT an appropriate choice.

23 Assuming time is measured in seconds
Poisson distribution is an appropriate choice.
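The calculations behind these two slides are not in the transcript. One possible reading, offered here only as an assumption, is that each time unit is treated as a single Bernoulli trial whose success probability is the mean rate times the unit length. With a rate of 0.5 calls per minute (30 calls per hour), that probability is 0.5 per minute, which is not small, but only 1/120 per second. The sketch compares the probability of no calls in 3 minutes under each discretisation with the Poisson value.

```r
rate_per_min <- 30 / 60   # 0.5 calls per minute (30 calls per hour)

# Poisson model: no calls in 3 minutes
p0_poisson <- dpois(0, lambda = rate_per_min * 3)

# One Bernoulli trial per minute: 3 trials with p = 0.5
p0_minutes <- dbinom(0, size = 3, prob = rate_per_min)

# One Bernoulli trial per second: 180 trials with p = 0.5 / 60
p0_seconds <- dbinom(0, size = 180, prob = rate_per_min / 60)

round(c(poisson = p0_poisson, minutes = p0_minutes, seconds = p0_seconds), 4)
```

On the per-second scale the binomial value is close to the Poisson value, while on the per-minute scale it is not, which is consistent with the small-interval requirement of the first property.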

24 The first property provides the basis for transferring the mean rate of occurrence between different observation scales. The “small time interval of length h” can be measured in different observation scales: for each scale, the length of the interval is expressed in units of that scale, and the mean rate of occurrence is the rate that applies when that observation scale is used.

25 If the first property holds for various observation scales, then the probability of exactly one happening in a small time interval h can be approximated by the mean rate of occurrence multiplied by the length of the interval, both expressed in the same observation scale. The probability of more than one happening in the time interval h is negligible.

26 Probability that more than five calls will arrive in a 5-minute interval
Occurrences of events that can be characterized by the Poisson distribution are known as a Poisson process.

27 Probability density functions of continuous random variables
Uniform or rectangular distribution
Normal distribution (also known as the Gaussian distribution)
Exponential distribution (or negative exponential distribution)
Gamma distribution (Pearson Type III)
Chi-squared distribution
Lognormal distribution

28 Uniform or rectangular distribution

29 PDF of U(a,b)

30 Normal distribution (Gaussian distribution)

31 Z

34 X~N(μ1, σ1) Z~N(0,1) Y~N(μ2, σ2)

35 Commonly used values of normal distributions
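The table of values for this slide is not in the transcript. As a hedged stand-in, the standard figures usually quoted for the normal distribution can be reproduced in R; the choice of which probabilities and quantiles to show is an assumption.

```r
# Probability within 1, 2 and 3 standard deviations of the mean
z <- c(1, 2, 3)
within_k_sd <- pnorm(z) - pnorm(-z)
round(within_k_sd, 4)        # about 0.6827, 0.9545, 0.9973

# Standard normal quantiles used for common two-sided intervals
probs <- c(0.90, 0.95, 0.975, 0.995)
z_quantiles <- qnorm(probs)
round(z_quantiles, 4)        # about 1.2816, 1.6449, 1.9600, 2.5758
```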

36 Exponential distribution (negative exponential distribution)
Mean rate of occurrence in a Poisson process.

38 Gamma distribution
The rate parameter of the gamma density represents the mean rate of occurrence in a Poisson process. It is equivalent to the rate parameter of the exponential density.

39 The exponential distribution is a special case of the gamma distribution with shape parameter equal to 1.
The sum of n independent identically distributed exponential random variables with a common rate parameter has a gamma distribution with shape parameter n and the same rate parameter.
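Both statements can be checked numerically. The sketch below uses R's shape and rate parameterisation of the gamma density; the rate 0.5, the value n = 5 and the simulation size are illustrative assumptions. The first check is exact, the second is a simulation that compares sample moments with the gamma mean n/rate and variance n/rate^2.

```r
rate <- 0.5                       # common rate parameter (illustrative value)
x <- seq(0.1, 10, by = 0.1)

# Exponential density equals the gamma density with shape 1
all.equal(dgamma(x, shape = 1, rate = rate), dexp(x, rate = rate))

# Sum of n iid exponentials, compared with gamma(shape = n, rate) moments
set.seed(1)
n <- 5
sums <- replicate(20000, sum(rexp(n, rate = rate)))
c(sample_mean = mean(sums), gamma_mean = n / rate)
c(sample_var  = var(sums),  gamma_var  = n / rate^2)
```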

40 Pearson Type III distribution (PT3)
The density is specified through the mean, standard deviation and skewness coefficient of X. It reduces to the gamma distribution when the location parameter is 0.

41 The Pearson type III distribution is widely applied in stochastic hydrology.
Total rainfall depths of storm events can be characterized by the Pearson type III distribution. Annual maximum rainfall depths are also often characterized by the Pearson type III or log-Pearson type III distribution.

42 Chi-squared distribution
The chi-squared distribution is a special case of the gamma distribution with shape parameter k/2 and scale parameter 2 (rate 1/2), where k is the number of degrees of freedom.
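The special-case relation can be verified numerically: a chi-squared density with k degrees of freedom coincides with a gamma density of shape k/2 and rate 1/2. The value k = 4 and the evaluation grid are illustrative assumptions.

```r
k <- 4                             # degrees of freedom (illustrative value)
x <- seq(0.1, 20, by = 0.1)

chisq_density <- dchisq(x, df = k)
gamma_density <- dgamma(x, shape = k / 2, rate = 1 / 2)

all.equal(chisq_density, gamma_density)
```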

44 Log-Normal Distribution Log-Pearson Type III Distribution (LPT3)
A random variable X is said to have a log-normal distribution if Log(X) is distributed with a normal density. A random variable X is said to have a Log-Pearson type III distribution if Log(X) has a Pearson type III distribution.

45 Lognormal distribution

46 Approximations between random variables
Approximation of binomial distribution by Poisson distribution
Approximation of binomial distribution by normal distribution
Approximation of Poisson distribution by normal distribution

47 Approximation of binomial distribution by Poisson distribution

48 Approximation of binomial distribution by normal distribution
Let X have a binomial distribution with parameters n and p. If $n \to \infty$, then for fixed a<b,
$$P\left(a < \frac{X - np}{\sqrt{npq}} \le b\right) \to \Phi(b) - \Phi(a),$$
where $\Phi$ is the cumulative distribution function of the standard normal distribution. Equivalently, as n approaches infinity, X can be approximated by a normal distribution with mean np and variance npq.

49 Approximation of Poisson distribution by normal distribution
Let X have a Poisson distribution with parameter $\lambda$. If $\lambda \to \infty$, then for fixed a<b,
$$P\left(a < \frac{X - \lambda}{\sqrt{\lambda}} \le b\right) \to \Phi(b) - \Phi(a).$$
Equivalently, as $\lambda$ approaches infinity, X can be approximated by a normal distribution with mean $\lambda$ and variance $\lambda$.

51 Example
Suppose that two fair dice are tossed 600 times. Let X denote the number of times that a total of 7 dots occurs. What is the probability that ?
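The event whose probability is asked for is missing from the transcript, so the sketch below uses a purely hypothetical event, 90 ≤ X ≤ 110, to show how the normal approximation would be applied. Here X is binomial with n = 600 and p = 6/36, since six of the 36 equally likely outcomes give a total of 7. The continuity correction of 0.5 is a common refinement and is an assumption, not something the slides confirm.

```r
n <- 600
p <- 6 / 36                       # probability of a total of 7 with two fair dice
mu    <- n * p                    # mean np = 100
sigma <- sqrt(n * p * (1 - p))    # standard deviation sqrt(npq)

a <- 90                           # hypothetical lower bound (not from the slides)
b <- 110                          # hypothetical upper bound (not from the slides)

# Exact binomial probability of a <= X <= b
exact <- pbinom(b, size = n, prob = p) - pbinom(a - 1, size = n, prob = p)

# Normal approximation with continuity correction
approx <- pnorm((b + 0.5 - mu) / sigma) - pnorm((a - 0.5 - mu) / sigma)

c(exact = exact, normal_approx = approx)
```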

53 Transformation of random variables
[Theorem] Let X be a continuous RV with density $f_X$. Let Y=g(X), where g is strictly monotonic and differentiable. The density for Y, denoted by $f_Y$, is given by
$$f_Y(y) = f_X\left(g^{-1}(y)\right)\left|\frac{d\,g^{-1}(y)}{dy}\right|.$$

54 Proof: Assume that Y=g(X) is a strictly monotonic increasing function of X.
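The steps of the proof are not in the transcript. A sketch of the standard argument for the increasing case, written with $F_X$ and $F_Y$ for the cumulative distribution functions (notation assumed here), is:

```latex
\begin{align*}
F_Y(y) &= P(Y \le y) = P\bigl(g(X) \le y\bigr)
        = P\bigl(X \le g^{-1}(y)\bigr) = F_X\bigl(g^{-1}(y)\bigr), \\
f_Y(y) &= \frac{d}{dy} F_Y(y)
        = f_X\bigl(g^{-1}(y)\bigr)\,\frac{d\,g^{-1}(y)}{dy}.
\end{align*}
% For a strictly decreasing g, F_Y(y) = 1 - F_X(g^{-1}(y)) and the derivative of
% g^{-1} is negative, so both cases are covered by taking the absolute value:
\[
f_Y(y) = f_X\bigl(g^{-1}(y)\bigr)\left|\frac{d\,g^{-1}(y)}{dy}\right|.
\]
```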

55 Example Let X be a gamma random variable with
Y is also a gamma random variable with scale parameter 1/ and shape parameter .

56 Definition of the location parameter

57 Example of location parameter

58 Definition of the scale parameter

59 Example of scale parameter

60 Simulation
Given a random variable X with CDF $F_X(x)$, there are situations in which we want to obtain a set of n random numbers (i.e., a random sample of size n) from $F_X(\cdot)$. Advances in computer technology have made it possible to generate such random numbers using computers. Work of this nature is termed “simulation”, or more precisely “stochastic simulation”.

62 Pseudo-random number generation
Pseudorandom number generation (PRNG) is the technique of generating a sequence of numbers that appears to be a random sample of random variables uniformly distributed over (0,1).

63 A commonly applied approach of PRNG starts with an initial seed $x_0$ and the following recursive algorithm (Ross, 2002):
$$x_n = a\,x_{n-1} \ \text{modulo}\ m,$$
where a and m are given positive integers, and the above equation means that $a\,x_{n-1}$ is divided by m and the remainder is taken as the value of $x_n$.

64 The quantity $x_n/m$ is then taken as an approximation to the value of a uniform (0,1) random variable.
Such an algorithm deterministically generates a sequence of values that eventually repeats itself. Consequently, the constants a and m should be chosen to satisfy the following criteria:
1. For any initial seed, the resultant sequence has the “appearance” of being a sequence of independent uniform (0,1) random variables.
2. For any initial seed, the number of random variables that can be generated before repetition begins is large.
3. The values can be computed efficiently on a digital computer.

65 A guideline for selection of a and m is that m be chosen to be a large prime number that can be fitted to the computer word size. For a 32-bit word computer, $m = 2^{31} - 1$ and $a = 7^5 = 16807$ result in desired properties (Ross, 2002).
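The multiplicative congruential recursion can be sketched in a few lines of R. The function name lcg and its arguments are hypothetical, and the constants m = 2^31 - 1 and a = 7^5 = 16807 are assumed here as the widely quoted choice for 32-bit machines. The arithmetic is done in double precision, which stays exact because a times the state remains below 2^53.

```r
# Multiplicative congruential generator: x_n = (a * x_{n-1}) mod m
# lcg() is a hypothetical helper name; a and m are assumed constants.
lcg <- function(n, seed, a = 16807, m = 2^31 - 1) {
  x <- numeric(n)
  state <- seed
  for (i in seq_len(n)) {
    state <- (a * state) %% m   # exact in doubles: a * state < 2^53
    x[i] <- state
  }
  x
}

x <- lcg(10000, seed = 1)
u <- x / (2^31 - 1)    # scaled to (0, 1) as approximate uniform values

head(x, 2)             # first two states of the sequence
range(u)               # all values lie strictly between 0 and 1
```

For production work R's own generators (for example via runif) are preferable; this sketch is only meant to make the recursion concrete.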

66 Simulating a continuous random variable
Probability integral transformation

67 The cumulative distribution function of a continuous random variable is a monotonic increasing function.

68 Example
Generate a random sample of a random variable V that has a uniform density over (0, 1). Convert each generated value of V to a value of X using the above V-to-X transformation.
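As an illustration of the V-to-X transformation, the sketch below assumes an exponential target distribution, for which the CDF F(x) = 1 - exp(-rate * x) can be inverted in closed form. The target distribution, the rate of 2 and the sample size are assumptions chosen for the example.

```r
set.seed(123)
rate <- 2                   # hypothetical rate of the exponential target
v <- runif(1000)            # random sample of V ~ U(0, 1)

# Invert F(x) = 1 - exp(-rate * x):  x = -log(1 - v) / rate
x <- -log(1 - v) / rate

# R's quantile function performs the same inversion
all.equal(x, qexp(v, rate = rate))

mean(x)                     # should be close to 1 / rate
```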

69 Random number generation in R
R commands for stochastic simulation (for the normal distribution):
pnorm – cumulative probability
qnorm – quantile function
rnorm – generating a random sample of a specific sample size
dnorm – probability density function
For other distributions, simply change the distribution names. For example, use (punif, qunif, runif, and dunif) for the uniform distribution and (ppois, qpois, rpois, and dpois) for the Poisson distribution.
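A brief sketch of the naming pattern, using the uniform and Poisson families mentioned above. The parameter values and the seed are illustrative assumptions.

```r
# d* = density or mass, p* = cumulative probability,
# q* = quantile, r* = random sample
dens_u <- dunif(0.3, min = 0, max = 1)   # density of U(0, 1) at 0.3
cdf_u  <- punif(0.3, min = 0, max = 1)   # P(U <= 0.3)
q_u    <- qunif(0.3, min = 0, max = 1)   # 0.3 quantile of U(0, 1)

cdf_pois <- ppois(2, lambda = 3)             # P(N <= 2) for Poisson(3)
pmf_sum  <- sum(dpois(0:2, lambda = 3))      # same value from the pmf
med_pois <- qpois(0.5, lambda = 3)           # median of Poisson(3)

set.seed(42)
draws <- rpois(10, lambda = 3)               # ten random Poisson(3) counts
```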

70 Generating random numbers of discrete distribution in R
Discrete uniform distribution
R does not provide default functions for random number generation for the discrete uniform distribution. However, the following functions can be used for the discrete uniform distribution between 1 and k.
# random numbers: n draws from 1..k with replacement
rdu <- function(n, k) sample(1:k, n, replace = TRUE)
# density: 1/k at the integers 1..k, 0 elsewhere
ddu <- function(x, k) ifelse(x >= 1 & x <= k & round(x) == x, 1/k, 0)
# CDF
pdu <- function(x, k) ifelse(x < 1, 0, ifelse(x <= k, floor(x)/k, 1))
# quantile: NA for p outside (0, 1]; return() inside ifelse() would exit for the whole vector
qdu <- function(p, k) ifelse(p <= 0 | p > 1, NA, ceiling(p * k))

71 Similar, yet more flexible, functions are defined as follows
# density on the integers min..max
dunifdisc <- function(x, min = 0, max = 1) ifelse(x >= min & x <= max & round(x) == x, 1/(max - min + 1), 0)
> dunifdisc(23, 21, 40)
> dunifdisc(c(0, 1))
# CDF
punifdisc <- function(q, min = 0, max = 1) ifelse(q < min, 0, ifelse(q > max, 1, floor(q - min + 1)/(max - min + 1)))
> punifdisc(0.2)
> punifdisc(5, 2, 19)
# quantile: smallest support value whose CDF is at least p (stays within min..max, also for p = 1)
qunifdisc <- function(p, min = 0, max = 1) pmax(ceiling(p * (max - min + 1)) + min - 1, min)
> qunifdisc( ,2,19)
> qunifdisc(0.2)
# random numbers
runifdisc <- function(n, min = 0, max = 1) sample(min:max, n, replace = TRUE)
> runifdisc(30, 2, 19)
> runifdisc(30)

72 Binomial distribution

73 Negative binomial distribution

74 Geometric distribution

75 Hypergeometric distribution

76 Poisson distribution
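The bodies of these five slides are not in the transcript. As a hedged sketch, base R's random number functions for the same distributions can be used as follows; all parameter values and the seed are illustrative assumptions. Note that rnbinom and rgeom return numbers of failures, so they are shifted here to give numbers of trials.

```r
set.seed(2017)
n_sim <- 1000

x_binom  <- rbinom(n_sim, size = 10, prob = 0.3)      # successes in 10 trials
x_nbinom <- rnbinom(n_sim, size = 3, prob = 0.4) + 3  # trials until 3rd success
x_geom   <- rgeom(n_sim, prob = 0.4) + 1              # trials until 1st success
x_hyper  <- rhyper(n_sim, m = 5, n = 15, k = 4)       # defectives in a sample of 4
x_pois   <- rpois(n_sim, lambda = 2.5)                # Poisson counts

# Sample means for a quick comparison with the theoretical means
sapply(list(binom = x_binom, nbinom = x_nbinom, geom = x_geom,
            hyper = x_hyper, pois = x_pois), mean)
```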

77 An example of stochastic simulation
The travel time from your home (or dormitory) to NTU campus may involve a few factors:
Walking to bus stop (stop for traffic lights, crowdedness on the streets, etc.)
Transportation by bus
Stop by 7-11 or Starbucks for breakfast (long queue)
Walking to campus

78 Gamma distribution with mean 30 minutes and standard deviation 10 minutes.
Exponential distribution with a mean of 20 minutes.
All Xi’s are independently distributed.
If you leave home at 8:00 a.m. for a class session at 9:10, what is the probability of being late for the class?
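Only two of the component distributions are present in this transcript, so the full problem cannot be reproduced. The sketch below shows the Monte Carlo approach under the loud assumption that the total travel time is just the sum of those two components: a gamma time with mean 30 and standard deviation 10 minutes, and an exponential time with mean 20 minutes. Being late means the total exceeds the 70 minutes between 8:00 and 9:10. The gamma parameters follow from shape = (mean/sd)^2 and rate = mean/sd^2.

```r
set.seed(1)
n_sim <- 100000

# Gamma component: mean 30, sd 10  ->  shape 9, rate 0.3
shape <- (30 / 10)^2
rate  <- 30 / 10^2
x_gamma <- rgamma(n_sim, shape = shape, rate = rate)

# Exponential component: mean 20  ->  rate 1/20
x_exp <- rexp(n_sim, rate = 1 / 20)

# ASSUMPTION: only these two components are included in the total
total  <- x_gamma + x_exp
p_late <- mean(total > 70)   # estimated probability of exceeding 70 minutes
p_late
```

With the remaining components added to total, the same two final lines would give the estimate for the full problem.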

