Ka-fu Wong © 2004 ECON1003: Analysis of Economic Data Lesson1-1 Lesson 1: Analysis of Economic Data is difficult but intuitive.

2 Ka-fu Wong © 2004 ECON1003: Analysis of Economic Data Lesson1-1 Lesson 1: Analysis of Economic Data is difficult but intuitive

3 Outline: the capture-recapture experiment, estimators, simulations, what statistics is, sampling, and how to estimate the unemployment rate.

4 Capture/Re-capture
Goals:
1. Illustrate how to estimate the size of a population when counting every individual is prohibitively costly.
2. Illustrate how easy and intuitive statistics can be. Statistics need not be completely deep, murky, and mysterious. Our common sense can help us negotiate our way through the course.

5 Counting the stones: We are interested in the number of black stones in the box. We only need a reasonable estimate of that number, allowing for errors of counting or estimation.

6 Two examples. Example #1: The box contains only a small number of stones. Example #2: The box contains so many stones that counting them would take days.

7 History and examples of the capture/recapture method: Capture-recapture methods were originally developed in wildlife biology to monitor bird, fish, and insect populations, where counting all individuals is prohibitive. More recently, these methods have been used extensively in disease and event monitoring. http://www.pitt.edu/~yuc2/cr/history.htm

8 The fish example: Estimating the number of fish in a lake or pond. C fish are caught, tagged, and returned to the lake. Later, R fish are caught and checked for tags; say T of them have tags. The numbers C, R, and T are used to estimate the fish population.

9 Stones in a box
The objective is to estimate the number of fish (represented by black stones) in a box.
1. Capture one handful of fish (black stones). Count them and call the count C.
2. Mark the fish by replacing the black stones with red stones, and put them back into the box.
3. Capture another handful of fish (stones). Count the total number of fish or stones (R) and the number of marked fish or red stones (T).
Based on this information, how can we obtain a reasonable estimate of the number of fish or stones in the box?

10 Stones in a box
The marked share of the whole box should be close to the marked share of the second handful, so C/N ≈ T/R. Hence a simple estimate of N is CR/T.
C = the number of fish or stones captured in the first round.
R = the total number of fish or stones captured in the second round.
T = the number of marked fish or red stones captured in the second round.
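The estimate CR/T can be written as a one-line function. This is a minimal Python sketch, not the author's code; the function name simple_estimate and the counts in the example are hypothetical.

```python
def simple_estimate(c, r, t):
    """Estimate the population size N from C/N ≈ T/R, i.e. N ≈ C*R/T."""
    if t == 0:
        # With no marked stones in the second handful the ratio is undefined.
        raise ValueError("T = 0: CR/T is undefined")
    return c * r / t


# Hypothetical counts: 40 stones marked, 50 recaptured, 4 of those marked.
print(simple_estimate(40, 50, 4))  # 500.0
```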

11 Simulations to see the properties of the proposed estimator: How good is the proposed estimator? To study its properties, I have used MATLAB to simulate our capture-recapture experiment with different numbers of captures (C) and different numbers of recaptures (R), relative to the total number of fish in the pond. Throughout, N = 500 and each setting is simulated 1000 times.

12 Definition: Estimator
An estimator is a formula or rule that takes a set of data and returns an estimate of the population quantity (also known as the population parameter) we are interested in: θ(x_1, x_2, ..., x_n).
Example: an estimator for the population mean. If we are interested in the population mean, a very intuitive estimator based on a sample (x_1, x_2, ..., x_n) is
θ(x_1, x_2, ..., x_n) = (x_1 + x_2 + ... + x_n)/n

13 Simulating the properties of a sample mean estimator
Suppose we want to study the properties of the following two estimators of the population mean:
θ(x_1, x_2, ..., x_n) = (x_1 + x_2 + ... + x_n)/n
versus
θ(x_1, x_2, ..., x_n) = (x_1 + x_2 + ... + x_n + 1)/n
With some basic computing skills, we may perform Monte Carlo simulations to compare their properties.

14 Simulating the properties of a sample mean estimator
1. We need to define a population. Suppose the population consists of 10 balls numbered from 1 to 10 in a bag. We know that the population mean is (1+2+3+4+5+6+7+8+9+10)/10 = 5.5.
2. We need to define the sampling process. Suppose we draw a sample of size 5 with replacement. For the sample, compute the two sample mean estimates of the population mean.
3. We need to decide on the number of repetitions. Suppose we repeat the process 10,000 times.
4. After repeating the sampling process 10,000 times, we have 10,000 sample means for each estimator, each of which is an estimate of the population mean based on its own sample.

15 Simulating the properties of a sample mean estimator: The above simulation was performed in MATLAB. The means of the 10,000 sample means of the two estimators are 5.4990 and 5.6990. The gap of 0.2 is exactly 1/n with n = 5, the constant that the second estimator adds to every sample mean.
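The slides report a MATLAB run. The same four-step design can be sketched in Python as follows; the function name, the seed, and the use of the standard random module are assumptions made for illustration, so the printed means will be close to, but not identical to, the figures on the slide.

```python
import random


def compare_mean_estimators(reps=10_000, n=5, seed=0):
    """Monte Carlo comparison of sum(x)/n and (sum(x) + 1)/n."""
    rng = random.Random(seed)            # fixed seed (an assumption) for repeatability
    population = list(range(1, 11))      # balls numbered 1..10, population mean 5.5
    total_a = 0.0
    total_b = 0.0
    for _ in range(reps):
        sample = rng.choices(population, k=n)   # draw n balls with replacement
        total_a += sum(sample) / n              # first estimator
        total_b += (sum(sample) + 1) / n        # second estimator
    # Average of the estimates over all repetitions, one value per estimator.
    return total_a / reps, total_b / reps


mean_a, mean_b = compare_mean_estimators()
print(mean_a, mean_b)
```

Whatever the seed, the second average exceeds the first by 1/n, which is why the first estimator centres on the population mean and the second does not.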

16 Which estimator is more desirable?

17 Simulation design (via MATLAB)
Individual simulation experiment:
1. Create 500 “black” fish, labelled 1 to 500.
2. Capture a random sample of C fish and mark them by converting their labels to zero (i.e., red fish).
3. Capture another random sample of R fish. Count the number of marked fish in the sample and call it T.
4. Compute the estimate as CR/T. If T = 0 the estimate is undefined, so experiments with T = 0 are dropped.
Repeat this experiment 1000 times to obtain up to 1000 estimates, then compute the mean and standard deviation of these estimates.
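The design above can be sketched in Python. This is not the author's MATLAB program: the function name, the seed, sampling without replacement within each capture, and the use of the population standard deviation are all assumptions, so a run need not reproduce the tabulated figures.

```python
import random
import statistics


def simulate_simple(n=500, c=120, r=120, reps=1000, seed=0):
    """Return (S, mean, std) of the CR/T estimates over `reps` experiments."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        marked = set(rng.sample(range(n), c))    # first capture: mark C fish
        recaptured = rng.sample(range(n), r)     # second capture: R fish
        t = sum(1 for fish in recaptured if fish in marked)
        if t == 0:
            continue                             # drop experiments with T = 0
        estimates.append(c * r / t)
    # S counts the experiments that were kept (at least one marked fish).
    return len(estimates), statistics.mean(estimates), statistics.pstdev(estimates)


print(simulate_simple(c=40, r=40))
```

When C equals N every fish is marked, so T = R and each estimate is exactly N; this gives a quick sanity check on the code.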

18 Properties of our estimator: increasing C and R
N     C     R     S      Mean     Std
500   40    40    971    640.76   401.57
500   60    60    1000   579.22   321.54
500   80    80    1000   533.61   154.67
500   100   100   1000   522.85   104.29
500   120   120   1000   513.82   77.41
500   140   140   1000   507.04   60.98
500   250   250   1000   500.64   22.93
500   500   500   1000   500.00   0.00
N = total number of fish in the pond. C = number of captured fish. R = number of recaptured fish. S = number of simulations with at least one marked fish in the recapture.

19 Properties of our estimator: constant C and increasing R
N     C     R     S      Mean     Std
500   120   40    1000   507.86   75.07
500   120   60    1000   513.40   79.55
500   120   80    1000   508.19   73.56
500   120   100   1000   511.24   74.55
500   120   120   1000   510.93   75.41
500   120   140   1000   511.21   75.63
500   120   250   1000   510.49   74.04
500   120   500   1000   507.47   77.32
N = total number of fish in the pond. C = number of captured fish. R = number of recaptured fish. S = number of simulations with at least one marked fish in the recapture.

20 Properties of our estimator: increasing C and constant R
N     C     R     S      Mean     Std
500   40    120   961    646.59   405.72
500   60    120   1000   582.17   327.97
500   80    120   1000   533.28   142.23
500   100   120   1000   512.28   95.40
500   120   120   1000   508.78   78.75
500   140   120   1000   507.50   60.61
500   250   120   1000   500.86   22.38
500   500   120   1000   500.00   0.00
N = total number of fish in the pond. C = number of captured fish. R = number of recaptured fish. S = number of simulations with at least one marked fish in the recapture.

21 Conclusion from the simulations
The proposed estimator generally overestimates the number of fish in the pond: the estimate tends to be larger than the true number. That is, there is a bias.
Holding R constant, increasing the number of captures (C) helps:
  Bias is reduced, i.e., the mean is closer to the true population size.
  The estimator is more precise, i.e., its standard deviation is smaller.
Holding C constant, increasing the number of recaptures (R) does not help:
  Bias is more or less unchanged.
  The precision of the estimator is more or less unchanged.

22 Additional issues: Our proposed estimator is good enough, but it can be improved. Alternative estimators have been developed to reduce or eliminate the bias in estimating N. For instance, Seber (1982, p. 60) suggests the estimator (C+1)(R+1)/(T+1) - 1 (note that our proposed formula is CR/T). Seber, G. (1982): The Estimation of Animal Abundance and Related Parameters, second edition, Charles.

23 Simulations to see the properties of the modified estimator: How good is the modified estimator? To study its properties, we repeat the above simulation exercise with the new formula (C+1)(R+1)/(T+1) - 1.
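A Python sketch of the same exercise with the modified formula follows. It is an illustration under assumptions, not the author's MATLAB code: the function names and seed are hypothetical, each capture samples without replacement, and experiments with T = 0 are kept because the modified formula is still defined there.

```python
import random
import statistics


def modified_estimate(c, r, t):
    """Modified estimate of N: (C+1)(R+1)/(T+1) - 1."""
    return (c + 1) * (r + 1) / (t + 1) - 1


def simulate_modified(n=500, c=120, r=120, reps=1000, seed=0):
    """Return (mean, std) of the modified estimates over `reps` experiments."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        marked = set(rng.sample(range(n), c))                  # mark C fish
        t = sum(fish in marked for fish in rng.sample(range(n), r))
        estimates.append(modified_estimate(c, r, t))           # T = 0 is allowed here
    return statistics.mean(estimates), statistics.pstdev(estimates)


print(modified_estimate(40, 50, 4))   # compare with CR/T = 500 for the same counts
print(simulate_modified(c=40, r=40))
```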

24 Properties of modified estimator: increasing C and R
N     C     R     S      Mean     Std
500   40    40    1000   488.60   271.05
500   60    60    1000   504.39   202.16
500   80    80    1000   498.88   121.47
500   100   100   1000   501.72   91.20
500   120   120   1000   498.10   72.01
500   140   140   1000   501.14   58.44
500   250   250   1000   498.60   21.72
500   500   500   1000   500.00   0.00
N = total number of fish in the pond. C = number of captured fish. R = number of recaptured fish. S = number of simulations with non-zero marked fish in the recapture.

25 Properties of modified estimator: constant C and increasing R
N     C     R     S      Mean     Std
500   120   40    1000   498.55   67.38
500   120   60    1000   500.05   71.54
500   120   80    1000   495.58   69.22
500   120   100   1000   497.01   71.14
500   120   120   1000   498.45   71.05
500   120   140   1000   495.17   67.46
500   120   250   1000   500.41   75.29
500   120   500   1000   496.73   74.27
N = total number of fish in the pond. C = number of captured fish. R = number of recaptured fish. S = number of simulations with non-zero marked fish in the recapture.

26 Properties of modified estimator: increasing C and constant R
N     C     R     S      Mean     Std
500   40    120   1000   491.84   291.00
500   60    120   1000   499.33   216.81
500   80    120   1000   496.51   117.05
500   100   120   1000   493.50   87.53
500   120   120   1000   503.24   73.65
500   140   120   1000   498.59   56.30
500   250   120   1000   499.76   22.58
500   500   120   1000   500.00   0.00
N = total number of fish in the pond. C = number of captured fish. R = number of recaptured fish. S = number of simulations with non-zero marked fish in the recapture.

27 Conclusion from the simulations
The modified estimator performs better than the original estimator:
  There is no apparent bias.
  The estimator is more precise.
Holding R constant, increasing the number of captures (C) helps:
  The estimator is more precise, i.e., its standard deviation is smaller.
Holding C constant, increasing the number of recaptures (R) does not help:
  The precision of the estimator is more or less unchanged.

28 What is Meant by Statistics? Statistics is the science of 1. collecting, 2. organizing, 3. presenting, 4. analyzing, and 5. interpreting numerical data to assist in making more effective decisions.

29 Who Uses Statistics? Statistical techniques are used extensively by economists, marketers, accountants, quality-control staff, consumers, professional sports people, hospital administrators, educators, politicians, physicians, etc.

30 Who Uses Statistics?
As economists:
  We must verify our models with data.
  We need to provide forecasts of the economy (e.g., GDP growth).
  We need quantitative estimates of
    how individual decisions are influenced by policy variables (such as unemployment benefits or education subsidies), in order to forecast the impact of public policies;
    how macro policies (e.g., government expenditure) will affect output.

31 Who Uses Statistics? In the business community, managers must make decisions based on what will happen to such things as demand, costs, and profits. These decisions are an effort to shape the future of the organization. If the managers make no effort to look at the past and extrapolate into the future, the likelihood of achieving success is slim.

32 Why do we need to understand Statistics?
  We are constantly deluged with statistics in the media (newspapers, magazines, journals, textbooks, etc.).
  We need a means to condense large quantities of information into a few facts or figures.
  We need to predict what will likely occur given what has occurred in the past.
  We need to generalize what we have learned in specific situations to the more general case.

33 We are users of statistics
  We do not want to become professors of statistics.
  We do not want to develop advanced statistical theory.
  We are users of statistics. To be effective users, we need a good grasp of basic statistical theory, and we need to practice using the tools.
  This course will give you the basics, enough for you to move on to your next Econometrics class.

34 Populations and Samples: A population is a collection of all possible individuals, objects, or measurements of interest. A sample is a portion, or part, of the population of interest.

35 Populations and Samples (figure with labels: Population, Sample)

36 Sampling a Population of Existing Units
Random sampling: a procedure for selecting a subset of the population units in such a way that every unit in the population has an equal chance of selection.
Sampling with replacement: when a unit is selected as part of the sample, its value is recorded and the unit is placed back into the population for possible reselection.
Sampling without replacement: units are not placed back into the population after selection.
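The two sampling schemes map directly onto two functions in Python's standard random module. The population of ten numbered units and the seed below are hypothetical, chosen only for illustration.

```python
import random

rng = random.Random(0)             # fixed seed (an assumption) for repeatability
population = list(range(1, 11))    # ten hypothetical units, numbered 1..10

# With replacement: a selected unit goes back, so it may be drawn again.
with_replacement = rng.choices(population, k=5)

# Without replacement: a selected unit stays out, so no unit repeats.
without_replacement = rng.sample(population, k=5)

print(with_replacement)
print(without_replacement)
```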

37 Approximate Random Samples
Frame: a list of all population units. Required for random sampling, but not for approximate random sampling methods like systematic and voluntary response sampling.
Systematic sample: every k-th element of the population is selected for the sample.
Voluntary response sample: sample units are self-selected (as in radio/TV surveys).
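A systematic sample is easy to express with list slicing. In this Python sketch the function name is hypothetical, and the random starting position among the first k units is an added assumption (a common convention), not something stated on the slide.

```python
import random


def systematic_sample(units, k, seed=0):
    """Select every k-th unit, starting from a random position among the first k."""
    start = random.Random(seed).randrange(k)   # assumed random start in 0..k-1
    return units[start::k]


# Hypothetical sequence of 100 units in arrival order; take every 10th.
print(systematic_sample(list(range(100)), 10))
```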

38 How to estimate the unemployment rate
First, survey a large number of individuals (say, 1000), asking each:
1. Are you aged 15 or over? If not, you are definitely not in the labor force.
2. If you are 15 or over: did you work for pay or profit during the seven days before enumeration, or do you have a formal job attachment? If yes, you are counted as employed.
3. If not employed: were you available for work during the seven days before enumeration, and did you seek work during the 30 days before enumeration? If yes to both questions, you are counted as unemployed.

39 How to estimate the unemployment rate: The unemployment rate is computed as #unemployed / (#unemployed + #employed). Note that the estimate is based on a random subset (which we call a sample) of the individuals in an economy, not on all individuals in the economy.
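The survey questions and the rate formula can be sketched together in Python. The Respondent fields, the function names, and the six survey records are hypothetical; treating a person who is neither employed nor unemployed as outside the labor force is an assumption consistent with the formula, which counts only the employed and the unemployed.

```python
from dataclasses import dataclass


@dataclass
class Respondent:
    age: int
    worked_last_7_days: bool         # worked for pay or profit, or has a formal job attachment
    available_last_7_days: bool      # available for work in the past seven days
    sought_work_last_30_days: bool   # sought work in the past 30 days


def classify(p):
    """Apply the survey questions in order and return a labor-force status."""
    if p.age < 15:
        return "not in labor force"
    if p.worked_last_7_days:
        return "employed"
    if p.available_last_7_days and p.sought_work_last_30_days:
        return "unemployed"
    return "not in labor force"      # assumption: neither employed nor unemployed


def unemployment_rate(respondents):
    """#unemployed / (#unemployed + #employed) over the surveyed sample."""
    labels = [classify(p) for p in respondents]
    unemployed = labels.count("unemployed")
    employed = labels.count("employed")
    return unemployed / (unemployed + employed)


survey = [
    Respondent(34, True, False, False),    # employed
    Respondent(51, True, False, False),    # employed
    Respondent(28, True, False, False),    # employed
    Respondent(22, False, True, True),     # unemployed
    Respondent(12, False, False, False),   # under 15
    Respondent(70, False, False, False),   # not available, not seeking
]
print(unemployment_rate(survey))  # 0.25
```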

40 Simulation: An estimation of the unemployment rate. The process of estimating the unemployment rate may be simulated at home or in a classroom with a bag of black and white stones (as in the game of Go). Suppose black stones stand for unemployed individuals and white stones for employed individuals. A random selection of 20 individuals is like randomly grabbing 20 stones from the bag. We ask each selected individual whether they are white (employed) or black (unemployed). The unemployment rate may then be computed using the same formula, #unemployed / (#unemployed + #employed), which here is the number of black stones divided by 20.
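The stone experiment can also be run in Python. In this sketch the bag composition (30 black and 470 white stones), the function name, and the seed are hypothetical; only the draw of 20 stones comes from the slide.

```python
import random


def sample_unemployment_rate(n_black=30, n_white=470, draw=20, seed=0):
    """Grab `draw` stones without replacement and return the share that is black."""
    rng = random.Random(seed)
    bag = ["black"] * n_black + ["white"] * n_white   # black = unemployed, white = employed
    grabbed = rng.sample(bag, draw)
    return grabbed.count("black") / draw


# True rate in this hypothetical bag is 30/500 = 0.06; each grab gives an estimate of it.
print([sample_unemployment_rate(seed=s) for s in range(5)])
```

Repeating the grab with different seeds shows how much the estimate moves from sample to sample when only 20 stones are drawn.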

41 What to take away today: Statistics can be easy and intuitive. Statistics need not be completely deep, murky, and mysterious. Our common sense can help us negotiate our way through the course.

42 - END - Lesson 1: Analysis of Economic Data is difficult but intuitive

