STATISTICS Sampling and Sampling Distributions Professor Ke-Sheng Cheng Department of Bioenvironmental Systems Engineering National Taiwan University.

1 STATISTICS Sampling and Sampling Distributions Professor Ke-Sheng Cheng Department of Bioenvironmental Systems Engineering National Taiwan University

2 Random sample Let the random variables $X_1, X_2, \dots, X_n$ have a joint density that factors as follows: $f_{X_1, X_2, \dots, X_n}(x_1, x_2, \dots, x_n) = f(x_1) f(x_2) \cdots f(x_n)$, where $f(\cdot)$ is the common density of each $X_i$. Then $(X_1, X_2, \dots, X_n)$ is defined to be a random sample of size n from a population with density $f(\cdot)$. 1/31/ Laboratory for Remote Sensing Hydrology and Spatial Modeling, Dept of Bioenvironmental Systems Engineering, National Taiwan Univ.

3 If $X_1, X_2, \dots, X_n$ is a random sample of size n from $f(\cdot)$, then $X_1, X_2, \dots, X_n$ are stochastically independent. Histogram: a frequency (or relative frequency) plot of observed data is called a frequency histogram (or relative frequency histogram).

4 Frequency Histogram

5 Cumulative frequency

6 Relative cumulative frequency

7 Statistic A statistic is a function of observable random variables, which is itself an observable random variable and does not contain any unknown parameters. A statistic must be observable because we intend to use it to make inferences about the density functions of the random variables.

8 For example, if a random variable X has a probability density function $f(x; \mu, \sigma^2)$ where $\mu$ and $\sigma^2$ are unknown, then a quantity such as $X - \mu$ is not a statistic. If a quantity is not observable, it cannot be used to make inferences about the parameters of the density function.

9 An observation of a random sample of size n can be regarded as n independent observations of a random variable.

10 One of the central problems in statistics is to find suitable statistics to represent parameters of the probability distribution function of a random variable. Sample statistics (observable) → Population parameters (unknown)

11 Sample moments Let $X_1, X_2, \dots, X_n$ be a random sample from the density $f(\cdot)$. Then the r-th sample moment about 0 is defined as $M'_r = \frac{1}{n} \sum_{i=1}^{n} X_i^r$.

12 In particular, if r = 1, we have the sample mean $\bar{X}_n$; that is, $\bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i$. Also, the r-th sample moment about the sample mean is defined as $M_r = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X}_n)^r$.
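
The two definitions above can be checked numerically. The following is a minimal sketch in Python; the function names and the four data values are made up for illustration and do not come from the slides.

```python
from typing import Sequence


def sample_moment_about_zero(xs: Sequence[float], r: int) -> float:
    """r-th sample moment about 0: (1/n) * sum(x_i ** r)."""
    n = len(xs)
    return sum(x ** r for x in xs) / n


def sample_moment_about_mean(xs: Sequence[float], r: int) -> float:
    """r-th sample moment about the sample mean: (1/n) * sum((x_i - xbar) ** r)."""
    n = len(xs)
    xbar = sample_moment_about_zero(xs, 1)  # r = 1 gives the sample mean
    return sum((x - xbar) ** r for x in xs) / n


data = [2.0, 4.0, 4.0, 6.0]  # hypothetical observations, for illustration only
print(sample_moment_about_zero(data, 1))  # sample mean
print(sample_moment_about_zero(data, 2))  # second moment about 0
print(sample_moment_about_mean(data, 2))  # second moment about the mean
```

Note that the second moment about the sample mean divides by n, so it differs from the usual sample variance, which divides by n - 1.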

13 Theorem – Let $X_1, X_2, \dots, X_n$ be a random sample from the density $f(\cdot)$. The expected value of the r-th sample moment about 0 is equal to the r-th population moment; i.e., $E[M'_r] = \mu'_r$, where $\mu'_r = E[X^r]$. Also, $\mathrm{Var}[M'_r] = \frac{1}{n} \left[ \mu'_{2r} - (\mu'_r)^2 \right]$.

14 Special case: r=1 $E[\bar{X}_n] = \mu$ and $\mathrm{Var}[\bar{X}_n] = \sigma^2 / n$.

15 Sample statistics Let $X_1, X_2, \dots, X_n$ be a random sample from the distribution of a random variable X. The sample mean and sample variance are respectively defined to be $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$ and $S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$.

18 Estimating the mean Given a random sample from a probability density function $f(\cdot)$ with unknown mean $\mu$ and finite variance $\sigma^2$, we want to estimate the mean using the random sample. Using only a finite number of values of X (a random sample of size n), can any reliable inferences be made about E(X), the average of an infinite number of values of X? Will the estimate be more reliable if the size of the random sample is larger?

19 R-program demonstration

23 Mean of sample means w.r.t. sample size

24 Mean of sample standard deviations w.r.t. sample size

25 Standard deviation of sample means w.r.t. sample size: Y = f(x) = ? What is the theoretical basis?
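
The original R program is not part of this transcript. As a stand-in, here is a minimal Python sketch of the same kind of experiment, assuming a normal population with hypothetical parameters mu = 10 and sigma = 2. It compares the simulated standard deviation of the sample means with sigma / sqrt(n), which is what $\mathrm{Var}[\bar{X}_n] = \sigma^2 / n$ predicts.

```python
import random
import statistics


def sd_of_sample_means(n: int, n_samples: int, mu: float, sigma: float,
                       rng: random.Random) -> float:
    """Draw n_samples random samples of size n from N(mu, sigma^2) and
    return the standard deviation of their sample means."""
    means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(n_samples)
    ]
    return statistics.stdev(means)


rng = random.Random(0)   # fixed seed so the run is repeatable
mu, sigma = 10.0, 2.0    # hypothetical population parameters (assumption)
results = {}
for n in (5, 20, 80, 320):
    results[n] = sd_of_sample_means(n, 2000, mu, sigma, rng)
    # Columns: sample size, simulated sd of the mean, theoretical sigma/sqrt(n)
    print(n, round(results[n], 4), round(sigma / n ** 0.5, 4))
```

Each fourfold increase in n should roughly halve the standard deviation of the sample mean, which suggests the curve Y = sigma / sqrt(x).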

26 Histograms of sample mean and standard deviation, ns=30

27 Histograms of sample mean and standard deviation, ns=5000

28 Weak Law of Large Numbers (WLLN) Let $f(\cdot)$ be a density with mean $\mu$ and variance $\sigma^2$, and let $\bar{X}_n$ be the sample mean of a random sample of size n from $f(\cdot)$. Let $\varepsilon$ and $\delta$ be any two specified numbers satisfying $\varepsilon > 0$ and $0 < \delta < 1$. If n is any integer greater than $\sigma^2 / (\varepsilon^2 \delta)$, then $P[\,|\bar{X}_n - \mu| < \varepsilon\,] \ge 1 - \delta$.
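
A short sketch of why the bound holds, assuming the argument rests on Chebyshev's inequality (the surviving slide text does not say so explicitly) and on $\mathrm{Var}[\bar{X}_n] = \sigma^2 / n$ for a random sample:

```latex
\begin{align*}
P\left[\,|\bar{X}_n - \mu| \ge \varepsilon\,\right]
  &\le \frac{\mathrm{Var}[\bar{X}_n]}{\varepsilon^2}
   = \frac{\sigma^2}{n\,\varepsilon^2}, \\
P\left[\,|\bar{X}_n - \mu| < \varepsilon\,\right]
  &\ge 1 - \frac{\sigma^2}{n\,\varepsilon^2}
   > 1 - \delta
  \quad \text{whenever } n > \frac{\sigma^2}{\varepsilon^2 \delta}.
\end{align*}
```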

29 Recall the theorem

31 (Example) Suppose that some distribution with an unknown mean has variance equal to 1. How large a random sample must be taken so that the probability is at least 0.95 that the sample mean lies within 0.5 of the population mean?
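
One way to work this example is to apply the WLLN bound directly, with variance 1, tolerance 0.5, and delta = 1 - 0.95 = 0.05. The sketch below uses a hypothetical helper name and exact fractions to avoid floating-point rounding at the boundary.

```python
import math
from fractions import Fraction


def wlln_sample_size(variance: Fraction, eps: Fraction, delta: Fraction) -> int:
    """Smallest integer n with n > variance / (eps**2 * delta)."""
    bound = variance / (eps ** 2 * delta)
    return math.floor(bound) + 1


# variance = 1, eps = 0.5, delta = 0.05
n = wlln_sample_size(Fraction(1), Fraction(1, 2), Fraction(5, 100))
print(n)  # the bound is exactly 80, so the smallest integer above it is 81
```

The bound is distribution-free and therefore conservative; knowing the shape of the distribution would usually permit a smaller sample.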

32 (Example) How large a random sample must be taken in order that you are 99% certain that $\bar{X}_n$ is within $0.5\sigma$ of $\mu$?
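
A worked sketch using the WLLN bound, with $\varepsilon = 0.5\sigma$ and $\delta = 0.01$. The unknown $\sigma$ cancels, so the answer does not depend on it:

```latex
\[
n > \frac{\sigma^2}{\varepsilon^2 \delta}
  = \frac{\sigma^2}{(0.5\,\sigma)^2 (0.01)}
  = \frac{1}{(0.25)(0.01)}
  = 400 .
\]
```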

33 Raingauge network design Assume there are already some raingauge stations in a catchment, and we are interested in determining the optimal number of stations that should exist to achieve a desired accuracy in the estimation of mean rainfall. Two approaches – (1) Standard deviation of the sample mean should not exceed a certain portion of the population mean. – (2) A probability bound on the estimation error, from the weak law of large numbers.

34 Criterion 1 Standard deviation of the sample mean should not exceed a certain portion of the population mean.
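
Criterion 1 can be turned into a station count. Writing the allowed portion as p (a symbol introduced here, not taken from the slide), the condition sigma / sqrt(n) <= p * mu rearranges to n >= (Cv / p)^2, where Cv = sigma / mu is the coefficient of variation. A minimal sketch, with a hypothetical function name and made-up inputs:

```python
import math
from fractions import Fraction


def stations_criterion_1(cv: Fraction, p: Fraction) -> int:
    """Smallest n with sigma / sqrt(n) <= p * mu, where cv = sigma / mu.

    Rearranged: n >= (cv / p) ** 2.
    """
    return math.ceil((cv / p) ** 2)


# Hypothetical values: Cv = 0.35 across existing gauges, 10 % allowed error.
n_required = stations_criterion_1(Fraction(35, 100), Fraction(10, 100))
print(n_required)  # (0.35 / 0.10) ** 2 = 12.25, rounded up to 13
```

In practice mu and sigma are unknown and would be replaced by estimates from the existing stations, which is itself an approximation.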

35 Criterion 2 From the weak law of large numbers, $P[\,|\bar{X}_n - \mu| < \varepsilon\,] \ge 1 - \delta$ if $n > \sigma^2 / (\varepsilon^2 \delta)$. What assumptions have we made for such approaches to network design? What are the practical considerations in monitoring network design? Data independence.
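
To compare the two criteria, suppose (as an assumption, not something stated on the slide) that the tolerance in Criterion 2 is expressed as a fraction p of the mean, $\varepsilon = p\mu$, and let $C_v = \sigma / \mu$. Then:

```latex
\begin{align*}
\text{Criterion 1:} \quad & \frac{\sigma}{\sqrt{n}} \le p\,\mu
  \;\Longleftrightarrow\; n \ge \frac{C_v^2}{p^2}, \\
\text{Criterion 2:} \quad & n > \frac{\sigma^2}{\varepsilon^2 \delta}
  = \frac{\sigma^2}{p^2 \mu^2 \delta}
  = \frac{1}{\delta} \cdot \frac{C_v^2}{p^2}.
\end{align*}
```

Under this assumption, Criterion 2 asks for roughly $1/\delta$ times as many stations as Criterion 1. Both rely on independent observations, which nearby raingauges rarely provide.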

36 The Central Limit Theorem Let $f(\cdot)$ be a density with mean $\mu$ and finite variance $\sigma^2$. Let $\bar{X}_n$ be the sample mean of a random sample of size n from $f(\cdot)$. Then the distribution of $Z_n = \frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}}$ approaches the standard normal distribution as n approaches infinity.

37 The importance of the CLT is the fact that the mean of a random sample from any distribution with finite variance $\sigma^2$ and mean $\mu$ is, for large n, approximately distributed as a normal random variable with mean $\mu$ and variance $\sigma^2 / n$.

38 R-program demonstration - Central Limit Theorem

42 Figure panels: n=2, n=10, n=25, n=50, n=100
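
The R program behind these panels is not in the transcript. The Python sketch below is a substitute under stated assumptions: the population is Exponential(1), chosen here only because it is clearly skewed, and the sample sizes match the panel labels. It tracks the skewness of the standardized sample mean, which is 0 for a normal distribution.

```python
import random
import statistics
from typing import List


def standardized_means(n: int, n_samples: int, rng: random.Random) -> List[float]:
    """Z_n = (Xbar_n - mu) / (sigma / sqrt(n)) for samples of size n
    from an Exponential(1) population, for which mu = sigma = 1."""
    mu, sigma = 1.0, 1.0
    out = []
    for _ in range(n_samples):
        xbar = statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        out.append((xbar - mu) / (sigma / n ** 0.5))
    return out


def skewness(values: List[float]) -> float:
    """Moment-based sample skewness: m3 / m2 ** 1.5."""
    m = statistics.fmean(values)
    m2 = statistics.fmean((v - m) ** 2 for v in values)
    m3 = statistics.fmean((v - m) ** 3 for v in values)
    return m3 / m2 ** 1.5


rng = random.Random(1)  # fixed seed so the run is repeatable
skew = {}
for n in (2, 10, 25, 50, 100):
    skew[n] = skewness(standardized_means(n, 4000, rng))
    print(n, round(skew[n], 3))
```

For this population the theoretical skewness of the sample mean is 2 / sqrt(n), so the printed values should shrink toward 0 as n grows.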

43 Sampling distributions Given random samples from certain probability densities, we are often interested in knowing the probability densities of sample statistics. – Poisson distribution – Exponential distribution – Normal distribution – Chi-square distribution – Standard normal and chi-square distributions – Student's t-distribution

44 Poisson distribution

49 Exponential distribution

50 Normal distribution

55 Chi-square distribution

57 Standard normal and chi-square distributions
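
The body of this slide did not survive extraction. Assuming it covers the standard link between the two distributions (the sum of squares of k independent standard normal variables is chi-square with k degrees of freedom, with mean k and variance 2k), a minimal simulation sketch in Python is:

```python
import random
import statistics
from typing import List


def chi_square_draws(k: int, n_draws: int, rng: random.Random) -> List[float]:
    """Each draw is the sum of squares of k independent N(0, 1) variables."""
    return [
        sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))
        for _ in range(n_draws)
    ]


rng = random.Random(2)  # fixed seed so the run is repeatable
k = 5                   # degrees of freedom, chosen arbitrarily for the demo
draws = chi_square_draws(k, 20000, rng)
mean_est = statistics.fmean(draws)
var_est = statistics.variance(draws)
# Compare the simulated mean and variance with the theoretical k and 2k.
print(round(mean_est, 2), k)
print(round(var_est, 2), 2 * k)
```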

59 Student's t-distribution: Student's t-distribution with k degrees of freedom
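
The defining formulas are missing from the transcript. For reference, the standard textbook construction and density (not recovered from the slide itself) are:

```latex
\[
T = \frac{Z}{\sqrt{U / k}}, \qquad
Z \sim N(0, 1), \quad U \sim \chi^2_k, \quad Z \text{ and } U \text{ independent},
\]
\[
f_T(t) = \frac{\Gamma\!\left(\frac{k+1}{2}\right)}
              {\sqrt{k\pi}\;\Gamma\!\left(\frac{k}{2}\right)}
         \left(1 + \frac{t^2}{k}\right)^{-\frac{k+1}{2}},
\qquad -\infty < t < \infty .
\]
```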

60 The "Student's" distribution was published in 1908 by W. S. Gosset. Gosset, however, was employed at a brewery that forbade the publication of research by its staff members. To circumvent this restriction, Gosset used the name "Student", and consequently the distribution was named "Student's t-distribution".

61 Order statistics