SAMPLING: Process of Selecting your Observations


1 SAMPLING: Process of Selecting your Observations
(Masoud Hemmasi, Ph.D.)

2 SAMPLING: Process of Selecting your Observations
QUESTION: During presidential election campaigns, in a typical poll, how many of the roughly 100 million potential voters would you say are contacted?
See: History and Evolution of Political Polling

3 Types of Probability Sampling:
Simple (Unrestricted) Random Sampling
Complex (Restricted) Probability Sampling, which sometimes offers more efficient alternatives to Simple Random Sampling:
a. Systematic Sampling
b. Stratified Random Sampling
c. Cluster Sampling
d. Convenience Sampling (a nonprobability technique, as slide 10 notes)
e. Double Sampling

4 Types of Probability Sampling:
Simple Random (or Unrestricted) Sampling A sampling procedure in which every element in the population has a known and equal chance of being selected as a subject (e.g., drawing names out of a hat). Advantage: has the least bias and offers the most generalizability. Disadvantage: At times, can be inefficient/expensive.

5 Systematic Sampling If a sample size of n is desired from a population containing N elements, we might sample one element for every N/n elements in the population. First, we randomly select one of the first N/n elements from the population list. We then select every (N/n)th element that follows in the population list. For example, with N = 1,000 and n = 10, we pick one of the first 100 elements at random and then every 100th element after it. This method has the properties of a simple random sample, especially if the list of the population elements is a random ordering.

6 Systematic Sampling Advantage: The sample usually will be easier to identify than it would be if simple random sampling were used. Example: Selecting every 100th listing in a telephone book after the first randomly selected listing
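
The systematic procedure described on the last two slides can be sketched in a few lines of Python. This is only an illustration: the function name `systematic_sample`, the list `listings`, and the choice to round the interval N/n down to a whole number are assumptions made for the example, not part of the original slides.

```python
import random

def systematic_sample(population, n, rng=None):
    """Pick a random start among the first k elements, then every k-th element."""
    rng = rng or random.Random()
    N = len(population)
    k = N // n  # sampling interval N/n, rounded down (an assumption of this sketch)
    if k < 1:
        raise ValueError("sample size n cannot exceed population size N")
    start = rng.randrange(k)  # random position within the first k elements
    return [population[start + i * k] for i in range(n)]

# Hypothetical telephone book with 1,000 numbered listings.
listings = list(range(1, 1001))
sample = systematic_sample(listings, 10, random.Random(0))
print(sample)
```

With 1,000 listings and n = 10 the interval is 100, which mirrors the "every 100th listing" telephone book example.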

7 Stratified Random Sampling
The population is first divided into groups called strata with respect to salient/relevant characteristics (e.g., gender, age, race, department, location, industry, etc.) Each element in the population belongs to one and only one stratum. Best results are obtained when the elements within each stratum are as much alike as possible (i.e. a homogeneous group). A simple random sample is taken from each stratum. Advantage: If strata are homogeneous, this method is as “precise” as simple random sampling but with a smaller total sample size.
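
As a rough sketch of the stratified procedure, the Python below groups elements into strata and draws a simple random sample from each one. The names `stratified_sample`, `stratum_of`, and the `employees` data are hypothetical, and drawing the same number of elements from every stratum is just one possible allocation rule.

```python
import random
from collections import defaultdict

def stratified_sample(elements, stratum_of, n_per_stratum, rng=None):
    """Divide elements into strata, then take a simple random sample from each."""
    rng = rng or random.Random()
    strata = defaultdict(list)
    for element in elements:
        # Each element belongs to one and only one stratum.
        strata[stratum_of(element)].append(element)
    sample = []
    for label in sorted(strata):
        members = strata[label]
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample

# Hypothetical population: 30 employees, stratified by department.
employees = [(f"emp{i}", "Sales" if i % 3 else "IT") for i in range(30)]
picked = stratified_sample(employees, lambda e: e[1], 4, random.Random(1))
print(picked)
```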

8 Cluster Sampling The population is first divided into separate groups called clusters. Ideally, each cluster would be a small-scale version (representative) of the population. A simple random sample of the clusters is then taken. All elements within each selected cluster will make up the final sample. Example: A primary application is area sampling, where clusters are city blocks or other well-defined areas (neighborhoods, precincts, school districts, etc.).

9 Cluster Sampling Advantage: The close proximity of elements can be cost and time effective (i.e. many sample observations can be obtained in a short time). Disadvantage: This method generally requires a larger total sample size than simple or stratified random sampling.
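
A minimal Python sketch of cluster sampling, under the assumption that the population is already available as a mapping from cluster name to its elements. The names `cluster_sample` and `blocks` and the household labels are invented for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, rng=None):
    """Randomly select whole clusters; every element of a chosen cluster is kept."""
    rng = rng or random.Random()
    chosen = rng.sample(sorted(clusters), n_clusters)  # simple random sample of clusters
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical area sample: 8 city blocks with 5 households each.
blocks = {f"block{b}": [f"house{b}-{h}" for h in range(5)] for b in range(8)}
selected = cluster_sample(blocks, 3, random.Random(2))
households = [h for members in selected.values() for h in members]
print(households)
```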

10 Convenience Sampling It is a nonprobability sampling technique. Items are included in the sample without known probabilities of being selected. The sample is identified primarily by convenience. Example: A professor conducting research might use student volunteers to constitute a sample. Advantage: Sample selection and data collection are relatively easy. Disadvantage: It is impossible to determine how representative of the population the sample is.

11 Sample Size Determination

12 Standard Deviation—What does it measure?
Variations/differences in scores among members of a group with respect to a given characteristic (e.g., test scores for a class, income). Standard deviation represents roughly the average distance of a group of numbers from their mean. Hint: You can think of it as the average deviation from the norm/typical.
How do we calculate it?
For a Population: \(\sigma_x = \sqrt{\sum (X - \mu_x)^2 / N}\)
For a Sample: \(S_x = \sqrt{\sum (X - \bar{x})^2 / (n - 1)}\)

13 Income levels for a particular class like this:
Xs = Incomes of students in an MBA Class: $6,000 $6,000 $15,000 $16,000 $39,000 $38,000 $50,000 $70,000
ΣX = $240,000
Average = \(\bar{x}\) = $240,000 / 8 = $30,000
Group labels on the slide: Grad Assistants, Part-Time Employed, Part-Time Employed

14 X X - x (X - x )2 6,000 -24,000 576,000,000 15,000 -15,000 225,000,000 16,000 -14,000 196,000,000 39,000 9,000 81,000,000 38,000 8,000 64,000,000 50,000 20,000 400,000,000 70,000 40,000 1,600,000,000 Sum ( å x ) 240,000 3,718,000,000 Average = x 30,000 Variance = 2 3,718,000,000 / 8 = 464,750,000 Std. Dev. =  $21,558.06

15 SAMPLING: Process of Selecting Your Observations
[Figure: normal frequency distribution of light bulb life. x-axis: X = Hours; y-axis: Freq.]
Suppose the frequency distribution of the life of light bulbs is normal, where X = life of light bulbs (e.g., bulbs lasted 108 hrs each), with \(\mu_x = 100\) hrs and \(\sigma_x = 5\) hrs.
What can we say about the expected life of a randomly selected bulb (\(x_i\))?
Life of a randomly drawn light bulb: \(100 - 5Z \le x_i \le 100 + 5Z\)
Z = 1 for 68% confidence, Z = 1.96 for 95% confidence, Z = 3 for 99% confidence (a rounded value: Z = 3 covers about 99.7%, and the exact two-sided 99% value is 2.576)
Formula: \(X = \mu_x \pm Z\,\sigma_x\) (where Z is an index that reflects the level of confidence/certainty with which we wish to estimate X.)
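
To make the interval concrete, here is a small Python sketch that applies mean ± Z × standard deviation to the light bulb figures on this slide (mean life 100 hrs, standard deviation 5 hrs). The function name `interval` is made up for the example, and the result is only meaningful if bulb life really is normally distributed.

```python
def interval(mean, std_dev, z):
    """Return (lower, upper) bounds of mean ± z * std_dev."""
    return (mean - z * std_dev, mean + z * std_dev)

# Light bulb example: mean life 100 hrs, standard deviation 5 hrs.
low_68, high_68 = interval(100, 5, 1)     # about 68% of bulbs
low_95, high_95 = interval(100, 5, 1.96)  # about 95% of bulbs
print(low_68, high_68, low_95, high_95)
```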

16 Income Distribution for a hypothetical population
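Population incomes: $0 $1 $2 $3 $4 $5 $6 $7 $8 $9
True Population Mean = \(\mu_x = \sum x_i / N\) = 45 / 10 = $4.5
Population Standard Deviation: \(\sigma_x = \sqrt{\sum (x_i - \mu_x)^2 / N} = \sqrt{82.5 / 10} = 2.87\)
Income of a randomly drawn person (\(X_i\)) = ?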

17 SAMPLING: Process of Selecting your Observations
This formula: X = x + Z x is ONLY applicable when the population distribution is NORMAL What is the Distribution of our hypothetical population?

18 Distribution of the Hypothetical Population
[Figure: frequency distribution of the hypothetical population. x-axis: income X, $0 to $9; y-axis: frequency, 1 to 10; each income value occurs once. Uniform Distribution.]

19 SAMPLING: Process of Selecting your Observations
X = x + Z x NOTE that X is the X of a sample of size n = 1 What is the generic formula for mean (X) of samples of any size (any n)? That is, what if instead of a single observation/case (X), we draw a random sample of a particular size from the population? Can we say something about the mean of that sample--X? If (and only if) we know that our sample mean ( x ) comes from a normally distributed population, the same formula can be modified and applied. Std. Error Rather than X = mx + Z x use X = x + Z x But, what does this statement mean?

20 45 Possible Samples of size n = 2, thus 45 possible sample means.
Sampling Distribution = Frequency distribution of sample means.
Sampling Distribution for Samples of Size n = 2 (from our earlier population): 45 possible samples of size n = 2, thus 45 possible sample means. The distribution of these 45 sample means is called the Sampling Distribution! See next slide!
Standard Error (\(\sigma_{\bar{x}}\)) is the standard deviation of these \(\bar{x}\)s.
Mean of all the 45 sample means = \(\mu_{\bar{x}} = \mu_x = 4.5\) (i.e., the same as the mean of the original population).
So, the earlier statement means: if these sample means are normally distributed, we can use the related formula.

21 Sampling Distribution of Samples of Size n=2
#     SAMPLE      MEAN
1     $0 & $1     0.5
2     $0 & $2     1.0
3     $0 & $3     1.5
.
10    $1 & $2     1.5
11    $1 & $3     2.0
.
44    $7 & $9     8.0
45    $8 & $9     8.5
Callouts: \(\bar{x}\) = ($0+$1)/2 = $0.50; \(\bar{x}\) = ($0+$3)/2 = $1.50 and ($1+$2)/2 = $1.50
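
The full list of 45 samples can be generated rather than written out. The Python sketch below assumes, as the count of 45 implies, that samples are unordered pairs drawn without replacement from the ten incomes $0 to $9. Variable names are illustrative.

```python
from itertools import combinations
from statistics import mean, pstdev

population = list(range(10))                   # incomes $0 through $9
samples = list(combinations(population, 2))    # all unordered pairs, no repeats
sample_means = [mean(s) for s in samples]

mean_of_means = mean(sample_means)             # mean of the sampling distribution
standard_error = pstdev(sample_means)          # std. dev. of the 45 sample means

print(len(samples), mean_of_means, round(standard_error, 2))
```

Because the pairs are drawn without replacement from a population of only ten, the standard deviation of these 45 means comes out somewhat below the population standard deviation divided by √2.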

22 What is SAMPLING DISTRIBUTION?
What do we mean by “only if our sample mean (\(\bar{x}\)) comes from a normally distributed population?” We mean if our sampling distribution is normal.
What is SAMPLING DISTRIBUTION? It is the frequency distribution of the means (\(\bar{x}\)s) of all possible samples of a particular size (n) drawn from a population.
Mean of a Sampling Distribution (mean of all sample means) = \(\mu_{\bar{x}} = \mu_x\)
Std. Dev. of Sample Means = Standard Error = \(\sigma_{\bar{x}}\)

23 SAMPLING: Process of Selecting Your Observations
So, if we know that the distribution of our Sample Means (i.e., the Sampling Distribution) is NORMAL, as shown below:
[Figure: normal sampling distribution of \(\bar{x}\). y-axis: Freq; centered at \(\mu_{\bar{x}} = \mu_x\), with \(\sigma_{\bar{x}}\) = Standard Error = \(\sigma_x / \sqrt{n}\).]
We will be able to say the following about the mean (\(\bar{x}\)) of a randomly selected sample: \(\bar{x} = \mu_{\bar{x}} \pm Z\,\sigma_{\bar{x}}\)
Since \(\mu_{\bar{x}} = \mu_x\), substitute \(\mu_x\) for \(\mu_{\bar{x}}\): \(\bar{x} = \mu_x \pm Z\,\sigma_{\bar{x}}\)

24 SAMPLING: Process of Selecting your Observations
QUESTION: What is the primary purpose of sampling?
Answer: To use sample characteristics (e.g., \(\bar{x}\)) as estimates of population characteristics (e.g., \(\mu_x\)).
What is the significance of this formula? \(\bar{x} = \mu_x \pm Z\,\sigma_{\bar{x}}\)
Answer: It shows the relationship between \(\mu_x\) and \(\bar{x}\). So, if \(\bar{x}\) comes from a normal distribution, we can rewrite the formula to estimate \(\mu_x\) based on the value of \(\bar{x}\): \(\mu_x = \bar{x} \pm Z\,\sigma_{\bar{x}}\)
Question: But, is the sampling distribution (i.e., the distribution of \(\bar{x}\)) always normal (so that we can use the above formula)? Let’s see it!

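25 Distribution of Sample Means (\(\bar{x}\)s) for Different Population Distributions
[Figure: population distributions (n = 1) alongside the sampling distributions drawn from them. Think of the n = 1 panels as the distribution of life of all individual light bulbs (X). Think of the other panels as the distribution of average life of samples of n light bulbs (\(\bar{x}\)).]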

26 SAMPLING: Process of Selecting your Observations
Conclusion? As n increases, the sampling distribution (i.e., the distribution of \(\bar{x}\)s) will more and more resemble a normal distribution, so that for n > 30 (the usual rule of thumb) it can be treated as normal, regardless of the distribution of the original population.
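
One way to see this conclusion is by simulation. The Python sketch below repeatedly draws samples (with replacement, an assumption of this sketch) from the uniform population of incomes $0 to $9 and summarizes the resulting sample means for several sample sizes. The function name `simulate_means`, the seed, and the number of draws are arbitrary choices.

```python
import random
from statistics import mean, pstdev

rng = random.Random(42)
population = list(range(10))  # uniform population: incomes $0 through $9

def simulate_means(n, draws=5000):
    """Means of `draws` random samples of size n, drawn with replacement."""
    return [sum(rng.choices(population, k=n)) / n for _ in range(draws)]

for n in (1, 2, 5, 30):
    means = simulate_means(n)
    # The centre stays near 4.5 while the spread shrinks roughly like 2.87 / sqrt(n).
    print(n, round(mean(means), 2), round(pstdev(means), 2))
```

Plotting a histogram of `means` for each n would show the flat shape at n = 1 giving way to a bell shape as n grows.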

27 Income Distribution for a hypothetical population
Population incomes: $0 $1 $2 $3 $4 $5 $6 $7 $8 $9
True Population Mean = \(\mu_x = \sum x_i / N\) = 45 / 10 = $4.5
Population Standard Deviation: \(\sigma_x = \sqrt{\sum (x_i - \mu_x)^2 / N} = \sqrt{82.5 / 10} = 2.87\)
This was our hypothetical population.

28 Distribution of the Hypothetical Population
[Figure: frequency distribution of the hypothetical population. x-axis: income X, $0 to $9; y-axis: frequency, 1 to 10; each income value occurs once.]
And this was its Frequency Distribution (Uniform). Now let’s see its Sampling Distributions for different n sizes.

29 Sampling Distribution of Samples of size n = 1
(Distribution of the Means of All Possible Samples of Size n = 1 From Our Original Population)
10 possible samples of size n = 1, thus 10 possible sample means.
[Figure: x-axis: \(\bar{x}\), $0 to $9; y-axis: frequency, 1 to 10; each * marks the estimate of the mean from one sample of size 1, and each value occurs once. True population mean \(\mu_x\) = $4.50.]
Sampling Distribution is not always normal.

30 Sampling Distribution of Samples of Size n = 2
45 Possible Samples of size n = 2, thus 45 possible sample means. Let’s examine the shape of this sampling distribution!

31 Sampling Distribution of Samples of Size n=2
#     SAMPLE      MEAN
1     $0 & $1     0.5
2     $0 & $2     1.0
3     $0 & $3     1.5
.
10    $1 & $2     1.5
11    $1 & $3     2.0
.
44    $7 & $9     8.0
45    $8 & $9     8.5
Callouts: \(\bar{x}\) = ($0+$1)/2 = $0.50; \(\bar{x}\) = ($0+$3)/2 = $1.50 and ($1+$2)/2 = $1.50

32 Sampling Distributions of Larger Samples
Let’s now try means of even larger samples (n = 3, 4, 5, and 6).
[Figure: sampling distributions of \(\bar{x}\) for n = 3 and n = 4.]

33 Sampling Distributions of Samples of size 5 and 6
[Figure: sampling distributions of \(\bar{x}\) for n = 5 and n = 6.]
NOTE: With larger sample sizes: (a) the sampling distribution will more closely resemble a normal distribution (applicability of the formula?) and (b) the Standard Error will become smaller (accuracy of the estimates?).

34 Distribution of Sample Means (Xs) for Different Population Distributions

35 CONCLUSION? SAMPLING: Process of Selecting your Observations
With every increase in the sample size, the distribution of sample means improves such that:
\(\sigma_{\bar{x}}\) becomes smaller, the range of \(\bar{x}\) values becomes narrower and, thus, \(\bar{x}\) becomes a more accurate estimate of \(\mu_x\). Extreme Case: What would be the largest possible sample (say, in our example)? How many such samples? What will be the sample mean?
The sampling distribution (i.e., distribution of means) increasingly follows a normal distribution, so that for n > 30 (the usual rule of thumb) it can be treated as normal, regardless of the distribution shape of the original population.

36 SAMPLING: Process of Selecting Your Observations
Variable of interest X is normally distributed.
Distribution of Xs: Mean of Xs = \(\mu_x\); Std. Dev. of Xs = \(\sigma_x\)
Samples drawn: \(\bar{x}_1\) (n1 = 15), \(\bar{x}_2\) (n2 = 15), \(\bar{x}_3\) (n3 = 15)
Distribution of \(\bar{x}\)s (the Sampling Distribution): Mean of \(\bar{x}\)s = \(\mu_x\); Std. Error = \(\sigma_{\bar{x}} = \sigma_x / \sqrt{n}\)
Sampling distribution is also guaranteed to be normal, regardless of n, since Xs are normally distributed.

37 SAMPLING: Process of Selecting Your Observations
Variable of interest X is NOT normally distributed.
Distribution of Xs: Mean of Xs = \(\mu_x\); Std. Dev. of Xs = \(\sigma_x\)
Samples drawn: n1 > 30, n2 > 30, n3 > 30
Distribution of \(\bar{x}\) for all samples of the same size (Sampling Distribution): Mean of \(\bar{x}\) = \(\mu_{\bar{x}} = \mu_x\); Std. Error = \(\sigma_{\bar{x}} = \sigma_x / \sqrt{n}\)
Sampling distribution can be treated as normal only when n ≥ 30 is used.

38 So, for samples of n ≥ 30: \(\mu_x = \bar{x} \pm Z\,\sigma_x / \sqrt{n}\)
So, for samples of n ≥ 30: \(\mu_x = \bar{x} \pm Z\,\sigma_{\bar{x}}\)
Standard Error = \(\sigma_{\bar{x}} = \sigma_x / \sqrt{n}\)
SO, \(\mu_x = \bar{x} \pm Z\,\sigma_x / \sqrt{n}\)
Now, let’s examine the elements of this formula!

39 SAMPLING: Process of Selecting your Observations
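We are interested in estimating \(\mu_x\) from \(\bar{x}\). Estimation involves a margin of error, that is:
Actual Score = Estimate ± Margin of Error
\(\mu_x = \bar{x} \pm Z\,\sigma_x / \sqrt{n}\), where \(\mu_x\) is the Actual Score, \(\bar{x}\) is the Estimate, and \(Z\,\sigma_x / \sqrt{n}\) is the Margin of Error; let’s call it “E”.
So, when using random samples of size n > 30, the margin of error in estimation would be: \(E = Z\,\sigma_x / \sqrt{n}\)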

40 \(E^2 = Z^2 \sigma_x^2 / n\)
Square both sides of the equation: \(E^2 = Z^2 \sigma_x^2 / n\)
Rewrite it to solve for n: \(n = Z^2 \sigma_x^2 / E^2\)
\(\sigma_x\) (population Std. Dev.) is often unknown. \(S_x\) (Std. Dev. of a sample) is a reasonable estimate (substitute) for it. \(S_x\) can be estimated based on previous studies or a pilot study: \(n = Z^2 S_x^2 / E^2\)

41 SAMPLING: Process of Selecting your Observations
Sample size required for estimating a population mean* (\(\mu_x\)): \(n = Z^2 S_x^2 / E^2\)
n = Sample size required
E = Margin of error we are willing/able to tolerate in estimating the population characteristic (mean)
Z = An index reflecting the degree of confidence/certainty we wish to have in achieving the level of precision/accuracy represented by E above.
\(S_x\) = An estimate of Std. Dev. of the characteristic being estimated/studied.
* The case of n for estimating a population proportion will be covered later.

42 \(n = Z^2 S^2 / E^2\)
An example: Suppose you were to use a random sample to estimate the average IQ of adult males. Suppose you know, from a pilot study, that the Std. Dev. of males’ IQ is about 16 points. What size sample should you use if you wish to be 95% sure that your margin of error in estimating average IQ is no more than 3 points (that is, if you wish to be 95% sure that the estimate you will obtain from the sample would be within ±3 points of the actual/true average IQ of the adult male population)?
Z = ? S = ? E = ?
Z = 2, S = 16, E = 3
\(n = 2^2 (16)^2 / 3^2 = 1024 / 9 \approx 113.8\), round up = 114
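
The sample size formula for a mean is easy to wrap in a small helper. The Python sketch below is one possible version. The name `sample_size_for_mean` is invented, and exact fractions are used so that rounding up is not thrown off by floating-point error.

```python
import math
from fractions import Fraction

def sample_size_for_mean(z, s, e):
    """n = Z^2 * S^2 / E^2, rounded up to the next whole observation."""
    z, s, e = (Fraction(str(v)) for v in (z, s, e))  # exact arithmetic
    return math.ceil(z**2 * s**2 / e**2)

# IQ example from this slide: Z = 2, S = 16, E = 3.
n_iq = sample_size_for_mean(2, 16, 3)
print(n_iq)
```

Halving the tolerated margin of error E quadruples the required sample size, since E enters the formula squared.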

43 \(n = Z^2 S^2 / E^2\)
Assuming worst case scenario when S is unknown: If no information is available on S, you can assume maximum variability by setting S = ¼ of Range.
An Example: Suppose we were to use a random sample to estimate the average IQ of adult males. Further suppose that we have absolutely no basis for determining the Std. Dev. of males’ IQ. But, we know that the IQ of the overwhelming majority of adult males ranges between 80 and 120. What size sample should we use if we wish to be 99% sure that our margin of error in estimating the average IQ is no more than 2 points (that is, if we wish to be 99% sure that the estimate we will obtain from the sample would be within ±2 points of the actual/true average IQ of the adult male population)?
Range = 120 – 80 = 40; S = 40/4 = 10; Z = 3; E = 2
\(n = 3^2 (10)^2 / 2^2 = 900 / 4 = 225\)

44 \(n = Z^2 S_x^2 / E^2\)
Assessing Resulting Accuracy/Precision of the Estimates, Given a Particular Sample Size: Suppose we used a survey with lots of 7-point scale items, collected data from 225 respondents, and descriptive statistics on the data show that the typical Std. Dev. on most items/variables is in the 1.3 to 1.5 range. What can we say about the precision/accuracy of our results, say, with 95% confidence/certainty?
\(n = Z^2 S_x^2 / E^2\), so \(E^2 = Z^2 S_x^2 / n\) and \(E = Z\,S_x / \sqrt{n}\)
\(E = 2 (1.5) / \sqrt{225} = 3 / 15 = 0.2\)
We can be 95% certain that the sample mean for a typical variable is not off from the true population mean by more than two-tenths of a point (e.g., if the reported sample mean on a given variable is 4.7, we can be 95% sure that the actual population mean is between 4.5 and 4.9).
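
Going the other way, from a given sample size to the precision it buys, is a one-line calculation. The Python sketch below uses a hypothetical helper name, `margin_of_error`, and plugs in the survey figures from this slide, taking the upper end (1.5) of the reported standard deviation range.

```python
import math

def margin_of_error(z, s, n):
    """E = Z * S / sqrt(n): margin of error for a sample mean."""
    return z * s / math.sqrt(n)

# Survey example: Z = 2 (about 95%), S = 1.5, n = 225 respondents.
e = margin_of_error(2, 1.5, 225)
low, high = 4.7 - e, 4.7 + e   # interval around a reported sample mean of 4.7
print(e, low, high)
```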

45 SAMPLING: Process of Selecting your Observations
Sample size determination for estimating Proportions (p):
EXAMPLE: Projecting the percentage of people who would be voting for a particular candidate in a presidential election.
In such cases, dispersion is measured by pq (instead of variance, \(S^2\)), where p = proportion of the population that is expected to have the attribute under study, and q = (1 − p), the proportion of the population that is expected NOT to have that attribute.
So, the sample size formula will change to: \(n = Z^2 pq / E^2\), or: \(n = Z^2 p(1-p) / E^2\)
NOTE: If we have no basis for judging the expected value of p, we can assume maximum variability (i.e., err on the side of overestimating the required sample size) by setting p at p = 0.50 (see the example on the next slide).

46 SAMPLING: Process of Selecting your Observations
Sample size determination for Estimating Proportions:
EXAMPLE: Suppose you are to project the percentage of potential voters who would be expected to vote for the Republican candidate in the upcoming presidential election. Suppose you have no basis for estimating/guessing what the percentage could possibly be. Also, suppose that you want to be 99% confident/certain that your margin of error would be 3% (i.e., 99% certain that your projection/estimate will be within ±3% of the actual number). What size sample will you need?
Z = 3, p = 0.50, E = 0.03
\(n = Z^2 p(1-p) / E^2 = 3^2 (0.5)(0.5) / 0.03^2 = 9 (0.25) / 0.0009 = 2500\)
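
The proportion version of the formula can be sketched the same way in Python. The helper name `sample_size_for_proportion` is an assumption, p defaults to the worst case of 0.50, and exact fractions are used because a floating-point 0.03² can otherwise push an exact answer such as 2500 over the rounding boundary.

```python
import math
from fractions import Fraction

def sample_size_for_proportion(z, e, p=0.5):
    """n = Z^2 * p * (1 - p) / E^2, rounded up; p = 0.5 assumes maximum variability."""
    z, e, p = (Fraction(str(v)) for v in (z, e, p))  # exact arithmetic
    return math.ceil(z**2 * p * (1 - p) / e**2)

# Election poll example: Z = 3, E = 0.03, p unknown so p = 0.50.
n_poll = sample_size_for_proportion(3, 0.03)
print(n_poll)
```

Note that the population size does not appear in the formula, so the result is the same regardless of how many potential voters there are.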

47 Sample size determination for most practical situations Source: Krejcie, R. & Morgan D. (1970). Determining Sample Size for Research Activities, Educational and Psychological Measurement, 30, Where: N = Population Size S = Sample Size

48 SAMPLING: Process of Selecting your Observations
QUESTIONS OR COMMENTS ?

