# Economics 105: Statistics Review #1 due next Tuesday in class Go over GH 8 No GH’s due until next Thur! GH 9 and 10 due next Thur. Do go to lab this week.


Announcements

- Review #1 is due next Tuesday in class.
- Go over GH 8.
- No GHs are due until next Thursday! GH 9 and 10 are due next Thursday.
- Do go to lab this week. It is due 2 weeks after your lab (so you'll have 2 labs due that week, assuming you don't complete it ahead of time).

Sampling

- Start with the population, which is the set of all possible persons, firms, countries, etc. for the particular frame of reference.
- For each research question, define the relevant population:
  - What is the average income in the United States?
  - What is the average height in Krakozhia?
  - Who will win the presidential election in November?
  - What is the average number of volunteer hours per student?
  - What percent of left-handed people have blue eyes?
- A sample is the subset of the population selected for analysis.
  - It must be representative of the population to avoid biased estimates.
- The U.S. census is taken every 10 years, according to the Constitution.
  - The first one was in 1790 (3.9 million residents; today 312 million).
  - http://www.archives.gov/exhibits/charters/constitution_transcript.html
  - http://www.ipums.org/
  - http://usa.ipums.org/usa-action/variables/group

Simple Random Sampling

- The most straightforward way to achieve representativeness is simple random sampling, where each person has an equal, and independent, chance of being selected.
- Also called i.i.d. sampling, for independent and identically distributed (since every draw comes from the same population).
- Say we want to know how many magazines a household currently purchases.
  - Choose 1000 names from ________?
  - Suppose it is a good idea. Now we contact them. If they're not available, we just scratch them from our list, or we go to the next name on the list until we find someone who is available. Any problems?
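As a rough illustration of the magazine example, the sketch below draws 1000 names from a sampling frame. The frame itself (`households`) and the helper name `simple_random_sample` are hypothetical, invented only for this example. Note that `random.sample` draws without replacement, so the draws are only approximately independent; for a frame much larger than the sample the difference is small.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Return n distinct units; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

# Hypothetical sampling frame: a list of 10,000 household identifiers.
households = [f"household_{i}" for i in range(10_000)]

sample = simple_random_sample(households, 1000, seed=105)
print(len(sample), "households drawn;", len(set(sample)), "distinct")
```

Replacing unavailable households with "the next name on the list" is not part of this procedure: the replacements are chosen by availability, not by the random draw.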

Systematic Sample

- Partition the population into n groups with k members each (k = N/n).
- Randomly choose one item from the first group of k.
- Take every kth item after that.
- Faster and easier than a simple random sample.
- Telephone book, class roster, items from an assembly line, etc.
- Greater chance of selection bias if there's a pattern in the population.
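The three steps above can be sketched in a few lines. This is a minimal illustration, assuming the population is available as an ordered list; the function name `systematic_sample` and the class roster are hypothetical. When N is not an exact multiple of n, the sketch rounds k down, which is one common convention rather than the only one.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start in the first group of k, then take every kth item."""
    N = len(frame)
    if not 0 < n <= N:
        raise ValueError("sample size must be between 1 and the population size")
    k = N // n                      # group size, k = N/n rounded down
    rng = random.Random(seed)
    start = rng.randrange(k)        # random position within the first group
    return [frame[start + i * k] for i in range(n)]

# Hypothetical class roster of 100 students, numbered 1..100.
roster = list(range(1, 101))
picks = systematic_sample(roster, 10, seed=1)
print(picks)
```

Only the starting point is random, so if the roster has a pattern with period k (say, every 10th seat is in the front row), every selected unit shares that feature.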

Stratified Random Sampling

- Hypothetical research question: What % of students will vote in the election?
- You only have time and money to survey 100 students. You do so, but get only 2 political science majors in your sample. Problems?
- Solution: Stratified Random Sampling
  - If a subgroup, or stratum, of the population is particularly relevant to the research question, one may break the population down into strata and take a simple random sample from each stratum.
  - Each person can belong to only one stratum.
  - Ensures a reasonable sample size for the subpopulation of interest or concern.
  - Can stratify on more than one characteristic, for example major and gender.
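A minimal sketch of the idea, assuming each student record carries the characteristic used for stratification. The student list, the share of political science majors, and the per-stratum sample sizes are all made up for illustration, and `stratified_sample` is a hypothetical helper name.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Take a simple random sample of the requested size within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)   # each unit lands in exactly one stratum
    return {
        name: rng.sample(members, min(n_per_stratum[name], len(members)))
        for name, members in strata.items()
    }

# Hypothetical population: 2000 students, roughly 3% political science majors.
students = [(i, "political science" if i % 33 == 0 else "other") for i in range(2000)]

sample = stratified_sample(
    students,
    stratum_of=lambda s: s[1],
    n_per_stratum={"political science": 20, "other": 80},
    seed=105,
)
print({name: len(members) for name, members in sample.items()})
```

Because political science majors are deliberately over-represented here (20 of 100 sampled versus about 3% of the population), an estimate for all students would need to weight each stratum by its population share.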

Cluster Sampling

- Hypothetical survey of rural families spread over a wide area
- Hypothetical survey of homeless individuals in a large city
- Problems?
  - An accurate list of population members is hard to obtain
  - In-person interviews are too costly
  - Mail surveys might lead to really high non-response
- Solution: Cluster Sampling
  - Divide the population into geographically small units, or clusters
  - For example, political wards or residential blocks for a city
  - Then take a simple random sample of clusters
  - Each person or household in a chosen cluster is then contacted: that is, a complete census of chosen clusters, or sometimes a simple random sample of units within chosen clusters
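Both variants described above (a census of each chosen cluster, or a simple random sample within it) can be sketched as follows. The city blocks and household names are invented for the example, and `cluster_sample` with its `within` parameter is a hypothetical helper, not a standard library function.

```python
import random

def cluster_sample(clusters, n_clusters, within=None, seed=None):
    """Randomly choose clusters, then take everyone (within=None) or a
    simple random sample of `within` units from each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # stage 1: sample clusters
    result = {}
    for name in chosen:
        members = clusters[name]
        if within is None or within >= len(members):
            result[name] = list(members)                 # complete census of the cluster
        else:
            result[name] = rng.sample(members, within)   # stage 2: sample within cluster
    return result

# Hypothetical city: 50 residential blocks with 20 households each.
blocks = {f"block_{b}": [f"block_{b}_hh_{h}" for h in range(20)] for b in range(50)}

census_of_clusters = cluster_sample(blocks, n_clusters=5, seed=1)
two_stage = cluster_sample(blocks, n_clusters=5, within=4, seed=1)
print(sorted(census_of_clusters), [len(v) for v in two_stage.values()])
```

Only a list of clusters is needed up front; a list of households is required just for the blocks that were actually chosen.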


Sources of Error from a Survey

- Sampling errors
  - come from having info on only a subset of the population
  - statistical theory is used to quantify them
- Non-sampling errors
  - can occur even with a complete census of the population
  - possible sources:
    - Population sampled is not the relevant one, or the list is incomplete (coverage error, sample selection bias)
    - Measurement error
      - Inaccurate or dishonest answers
      - Halo effect
      - Poor wording of questions
    - Non-response (to the whole survey or to some questions)
  - try to minimize at the outset and check up on some answers

Sample Statistics

[Figure: population parameter versus sample statistic]

Sample Statistics

- Denote an i.i.d. sample by $X_1, X_2, X_3, \dots, X_n$.
- What exactly is an $X_i$?
- Actual outcomes are $x_1, x_2, x_3, \dots, x_n$.
- How many samples could we take? How many samples do we actually take?
- A sample statistic is formed by taking some function of the random variables $X_1, X_2, X_3, \dots, X_n$: $A = f(X_1, X_2, X_3, \dots, X_n)$.
- Examples include the sample mean, $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$.
- The point estimate of the population parameter is a single number rather than a range.

Sampling Distribution

- The sample statistic $A$ is a random variable! Why?
- Thus, a sample statistic has a probability distribution, known as a sampling distribution.
- Example: Let $S = \{0, 1, 2, 3, 4, 5, 6\}$. Graph the sampling distribution of $\bar{X}$ for $n = 2$.
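One way to work the example is to enumerate every possible sample. The sketch below assumes each value in S is equally likely and that the two draws are independent (sampling with replacement), which the slide does not state explicitly; it uses exact fractions so no rounding is involved, and prints a crude text histogram in place of a graph.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

S = [0, 1, 2, 3, 4, 5, 6]
n = 2

# Every ordered sample of size n is equally likely under the i.i.d. assumption.
counts = Counter(Fraction(sum(draw), n) for draw in product(S, repeat=n))
total = len(S) ** n
dist = {mean: Fraction(count, total) for mean, count in sorted(counts.items())}

for mean, prob in dist.items():
    print(f"{float(mean):4.1f}  {str(prob):>5}  {'#' * counts[mean]}")
```

The sample mean takes 13 distinct values from 0 to 6 in steps of 0.5, and the probabilities rise to a peak at 3 and fall off symmetrically, even though each individual draw is uniform on S.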

Mean & Variance

- If $X_1, X_2, \dots, X_n$ are a set of random variables from an i.i.d. sample of size $n$, what are the mean and variance of the sample average $\bar{X}$?
- Things we would like to know:
  1. $E[\bar{X}]$
  2. $\mathrm{Var}[\bar{X}]$
  3. What does the p.d.f. of $X$ look like? What does the p.d.f. of $\bar{X}$ look like?
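The first two questions have a short answer. The derivation below is a sketch that assumes the population has mean $\mu$ and finite variance $\sigma^2$ (symbols chosen here for illustration), and uses independence only in the variance step.

```latex
\begin{align*}
E[\bar{X}]
  &= E\!\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right]
   = \frac{1}{n}\sum_{i=1}^{n} E[X_i]
   = \frac{1}{n}\, n\mu
   = \mu, \\[4pt]
\operatorname{Var}[\bar{X}]
  &= \operatorname{Var}\!\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right]
   = \frac{1}{n^{2}}\sum_{i=1}^{n} \operatorname{Var}[X_i]
   \quad\text{(independence: no covariance terms)} \\
  &= \frac{1}{n^{2}}\, n\sigma^{2}
   = \frac{\sigma^{2}}{n}.
\end{align*}
```

So the sample average is centered on the population mean, and its standard deviation, $\sigma/\sqrt{n}$, shrinks as the sample size grows.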

Central Limit Theorem

- Rough statement of CLT: “Sample means are eventually, approximately normally distributed.”
- Formal statement of CLT: Let $X_1, X_2, X_3, \dots, X_n$, where $X_i$ is a random variable denoting the outcome of the $i$th observation, be an i.i.d. sample from ANY population distribution with mean $\mu$ and variance $\sigma^2$. Then, as $n$ becomes large, $\bar{X}$ is approximately distributed $N\!\left(\mu, \frac{\sigma^2}{n}\right)$.
- Graphically: page 236 in BLK, 10th edition, has a nice visual.
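A small simulation can make the statement concrete. The sketch below assumes an exponential population with mean 1 and variance 1, chosen only because it is clearly non-normal; the sample size, number of replications, and the helper name `sample_means` are likewise arbitrary choices for illustration.

```python
import random
import statistics

def sample_means(n, reps, seed=0):
    """Draw `reps` i.i.d. samples of size n from Exponential(1); return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

n, reps = 30, 20_000
means = sample_means(n, reps)

mu, sigma = 1.0, 1.0                      # population mean and s.d. of Exponential(1)
se = sigma / n ** 0.5                     # theoretical s.d. of the sample mean
inside = sum(abs(m - mu) <= 1.96 * se for m in means) / reps

print("mean of sample means    :", round(statistics.fmean(means), 4))
print("variance of sample means:", round(statistics.pvariance(means), 4), "vs", round(se**2, 4))
print("share within 1.96 s.e.  :", round(inside, 4))
```

If the normal approximation is working, the share of sample means within 1.96 standard errors of $\mu$ should be close to 0.95, even though individual exponential draws are strongly skewed.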

CLT!

Point and Interval Estimates

- A point estimate is a single number; a confidence interval provides additional information about variability.

[Figure: a point estimate lying between the lower confidence limit and the upper confidence limit, with the width of the confidence interval marked]

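Point Estimates

We can estimate a population parameter with a sample statistic (a point estimate).

| | Population parameter | Sample statistic (point estimate) |
|---|---|---|
| Mean | $\mu$ | $\bar{X}$ |
| Proportion | $\pi$ | $p$ |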

Confidence Level, $(1-\alpha)$

- Suppose the confidence level is 95%.
- Also written $(1 - \alpha) = 0.95$.
- A relative frequency interpretation:
  - In repeated samples, 95% of all the confidence intervals that can be constructed are expected to contain the unknown true parameter.
- A specific interval either will contain or will not contain the true parameter.
  - No probability is involved in a specific interval.
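The relative frequency interpretation can be checked by simulation. This sketch assumes a normal population with known standard deviation and uses the interval $\bar{X} \pm Z\,\sigma/\sqrt{n}$; the values of `mu`, `sigma`, and `n`, and the function name `coverage`, are hypothetical choices made for the example.

```python
import random
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, level=0.95, reps=10_000, seed=0):
    """Share of repeated-sample intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # two-sided critical value
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1                                # this interval covers mu
    return hits / reps

rate = coverage(mu=50, sigma=10, n=25)
print("share of intervals containing mu:", rate)
```

Each simulated interval either covers 50 or it does not; the 95% describes the long-run share of intervals that do, not any single interval.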

Confidence Intervals

- Population Mean
  - σ Known
  - σ Unknown
- Population Proportion

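Confidence Intervals for μ

- First, assume $\sigma^2$ is known and $X \sim N(\mu, \sigma^2)$, so $\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right)$. Things are different when these are not true.
- Random sample of $n$ observations.
- We will use $\bar{X}$ to make inferences about $\mu$.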

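Confidence Interval for μ (σ Known)

- Assumptions
  - Population standard deviation σ is known
  - Population is normally distributed
- Confidence interval estimate:

$$\bar{X} \pm Z\,\frac{\sigma}{\sqrt{n}}$$

where $Z$ is the standard normal critical value for the chosen confidence level.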
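A minimal sketch of the calculation under the two assumptions above. The function name `z_confidence_interval` and the numbers in the usage example (a sample mean of 2.20 from 11 observations with σ = 0.35) are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, level=0.95):
    """Interval xbar +/- Z * sigma / sqrt(n) for a mean with known sigma."""
    alpha = 1 - level
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value leaving alpha/2 in each tail
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin

# Hypothetical data summary: n = 11 observations, sample mean 2.20, known sigma 0.35.
low, high = z_confidence_interval(xbar=2.20, sigma=0.35, n=11)
print(f"95% CI: ({low:.4f}, {high:.4f})")
```

The width of the interval, `high - low`, is twice the margin, so it narrows with larger n and widens with a higher confidence level.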

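Finding the Critical Value, Z

- Consider a 95% confidence interval: the critical values are $Z = -1.96$ and $Z = 1.96$.

[Figure: normal curve with two horizontal axes. In Z units, the lower confidence limit is at Z = -1.96, the point estimate at 0, and the upper confidence limit at Z = 1.96. In X units, the same positions are labeled lower confidence limit, point estimate, and upper confidence limit.]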
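The value 1.96 can be recovered from the standard normal distribution rather than read from a table. The sketch below uses Python's `statistics.NormalDist`; the helper name `critical_z` is hypothetical, and the extra confidence levels are included only for comparison.

```python
from statistics import NormalDist

std_normal = NormalDist()          # mean 0, standard deviation 1

def critical_z(level):
    """Z such that the central area under the standard normal curve equals `level`."""
    alpha = 1 - level
    return std_normal.inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    z = critical_z(level)
    central_area = std_normal.cdf(z) - std_normal.cdf(-z)   # should reproduce `level`
    print(f"level {level:.2f}: Z = {z:.3f}, area between -Z and Z = {central_area:.4f}")
```

For the 95% level, α/2 = 0.025 sits in each tail, which is why the lookup is at the 0.975 quantile rather than at 0.95.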

