
Statistical inference uses impersonal chance to draw conclusions about a population or process based on data drawn from a random sample or randomized experiment.

 When data are produced by random sampling or randomized experiment, a statistic is a random variable that obeys the laws of probability.

 A sampling distribution shows how a statistic would vary across repeated random samples of the same size drawn from the same population.  A sampling distribution, therefore, is the probability distribution of the results of an infinitely large number of such samples.

 A population distribution of a random variable is the distribution of its values for all members of the population.  Thus a population distribution is also the probability distribution of the random variable when we choose one individual (i.e. observation or subject) from the population at random.

 Recall that a sampling distribution is a conceptual ideal: it helps us to understand the logic of drawing random samples of size-n from the same population in order to obtain statistics by which we make inferences about a parameter.  Population distribution is likewise a conceptual ideal: it tells us that sample statistics are based on probabilities attached to the population from which random samples are drawn.

Counts & Sample Proportions

 Count: random variable X is a count of the occurrences of some outcome—of some ‘success’ versus a corresponding ‘failure’— in a fixed number of observations.  A count is a discrete random variable that describes categorical data (concerning success vs. failure).

 Sample proportion: if the number of observations is n, then the sample proportion of observations is X/n.  A sample proportion is also a discrete random variable that describes categorical data (concerning success vs. failure).

 Inferential statistics for counts & proportions are premised on a binomial setting.

The Binomial Setting
1. There is a fixed number n of observations.
2. The n observations are all independent.
3. Each observation falls into one of just two categories, which for convenience we call ‘success’ or ‘failure.’
4. The probability of a success, p, is the same for each observation.
5. Strictly speaking, the population must be at least 20 times greater than the sample for counts, 10 times greater for proportions.

Counts  The distribution of the count X of successes in the binomial setting is called the binomial distribution with parameters n & p (i.e. number of observations & probability of success on any one observation). X is B(n, p)

 Finding binomial probabilities: use the factorial formula, a binomial table, or software.  Binomial mean & standard deviation: μ = np, σ = √(np(1 − p)).
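These formulas can be checked with a short sketch. Python stands in here for the course's Stata; `binomial_pmf` and `binomial_mean_sd` are illustrative names, not part of any slide.

```python
from math import comb, sqrt

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p), using the binomial coefficient."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binomial_mean_sd(n, p):
    """Mean np and standard deviation sqrt(np(1 - p)) of B(n, p)."""
    return n * p, sqrt(n * p * (1 - p))

# A fair-coin check: 10 tosses, P(exactly 5 heads) is about 0.246
print(round(binomial_pmf(5, 10, 0.5), 3))  # 0.246
```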

 Example: An experimental study finds that, in a placebo group of 2000 men, 84 got heart attacks, but in a treatment group of another 2000, just 56 got heart attacks.  That is, 2000 independent observations of men have found that the count X of heart attacks is B(2000, 0.04), so that: mean = np = (2000)(.04) = 80; sd = √((2000)(.04)(.96)) = 8.76.

 Treatment group:
. bitesti 2000 56 .04
        N   Observed k   Expected k   Assumed p   Observed p
     2000           56           80     0.04000      0.02800
Pr(k <= 56) = 0.002 (one-sided test)
 So, it’s quite unlikely (p = .002) that there would be <= 56 heart attacks by chance: the treatment looks promising. What about the placebo group?

 Placebo group:
. bitesti 2000 84 .04
        N   Observed k   Expected k   Assumed p   Observed p
     2000           84           80     0.04000      0.04200
two-sided p = 0.70
 By contrast, it’s quite likely (p = .70) that the heart attack count in the placebo group would occur by chance. By comparison, then, the treatment looks promising.
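The bitesti p-values can be checked by summing exact binomial tails. This stdlib Python sketch (not the slides' Stata) computes the one-sided tails for both groups; the placebo group's two-sided p of about .70 is roughly twice its upper tail.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p), by direct summation of the pmf."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Treatment group: 56 heart attacks out of 2000, under the null p = 0.04
p_treatment = binom_cdf(56, 2000, 0.04)     # lower tail Pr(k <= 56), roughly .002
# Placebo group: 84 heart attacks; upper tail Pr(k >= 84), roughly one third
p_placebo = 1 - binom_cdf(83, 2000, 0.04)
print(round(p_treatment, 3), round(p_placebo, 2))
```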

Required Sample Size, Unbiased Estimator  Strictly speaking, the population must be at least 20 times greater than the sample for counts (10 times greater for proportions).  The formula for the binomial mean signifies that the sample count X is an unbiased estimator of the population mean np.

 Binomial test example (pages ): Corinne is a basketball player who makes 75% of her free throws. In a key game, she shoots 12 free throws but makes just 7 of them. What are the chances that she would make 7 or fewer free throws in any sample of 12?
. bitesti 12 7 .75
        N   Observed k   Expected k   Assumed p   Observed p
       12            7            9     0.75000      0.58333
Pr(k <= 7) = 0.1576 (one-sided test)
Note: ‘bitesti…, detail’ gives Pr(k == 7) = .103
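The free-throw numbers are easy to verify by direct summation; a small Python sketch (again a stand-in for the Stata command):

```python
from math import comb

n, p = 12, 0.75                     # 12 free throws, 75% success rate

def pmf(k):
    """P(X = k) for X ~ B(12, 0.75)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p_at_most_7 = sum(pmf(k) for k in range(8))   # Pr(k <= 7)
p_exactly_7 = pmf(7)                          # Pr(k == 7)
print(round(p_at_most_7, 4), round(p_exactly_7, 3))  # 0.1576 0.103
```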

 See Stata ‘help bitest’.

 We’ve just considered sample counts.  Next let’s consider sample proportions.

Sample Proportion  Count of successes in a sample divided by sample size-n.  Whereas a count takes whole-number values, a sample proportion is always between 0 & 1.

 This is another example of categorical data (success vs. failure).  Mean & standard deviation of a sample proportion: μ = p, σ = √(p(1 − p)/n).

 The population must be at least 10 times greater than the sample.  The formula for a proportion’s mean signifies that the sample proportion is an unbiased estimator of the population proportion p.

 Sample proportion example (pages ): A survey asked a nationwide sample of 2500 adults if they agreed or disagreed that “I like buying new clothes, but shopping is often frustrating & time-consuming.”  Suppose that 60% of all adults would agree with the question. What is the probability that the sample proportion who agree is at least 58%?

 Step 1: compute the mean & standard deviation: μ = p = 0.60; σ = √((0.60)(0.40)/2500) = 0.0098.

 Step 2: solve the problem: z = (0.58 − 0.60)/0.0098 = −2.04, so P(sample proportion ≥ 0.58) = P(Z ≥ −2.04) = 0.98.

 How to do it in Stata:
. prtesti 2500 .58 .60
One-sample test of proportion          x: Number of obs = 2500
P(Z > z) = 0.979
 That is, there is a 98% probability that the percent of respondents who agree is at least 58%: this is quite consistent with the broader evidence.
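The same two-step calculation can be reproduced outside Stata; this Python sketch gets the normal CDF from the standard library's error function:

```python
from math import sqrt, erf

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

p, n = 0.60, 2500                  # assumed population proportion & sample size
sd = sqrt(p * (1 - p) / n)         # standard deviation of the sample proportion
z = (0.58 - p) / sd                # standardize the 58% cutoff
prob = 1 - phi(z)                  # P(sample proportion >= 0.58)
print(round(sd, 4), round(prob, 2))   # 0.0098 0.98
```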

 See Stata ‘help prtest’.

 We’ve just considered sample proportions.  Next let’s consider sample means.

Sampling Distribution of a Sample Mean  This is an example of quantitative data.  A sample mean is just an average of observations (based on a variable’s expected value).  There are two reasons why sample means are so commonly used:

(1) Averages are less variable than individual observations. (2) Averages are more normally distributed than individual observations.

Sampling distribution of a sample mean  Sampling distribution of a sample mean: if a population has a normal distribution, then the sampling distribution of the sample mean of n independent observations also has a normal distribution.  General fact: any linear combination of independent normal random variables is normally distributed.

Standard deviation of a sample mean: ‘Standard error’  Divide the population standard deviation by the square root of sample size-n: σ/√n. This is the standard error.  Doing so anchors the standard deviation to the sample’s size-n: the sampling distribution of the sample mean has larger spread across relatively small samples & smaller spread across relatively large samples.
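In symbols, the standard error is σ/√n; a minimal sketch of the n-to-spread relationship (the helper name is illustrative):

```python
from math import sqrt

def standard_error(sigma, n):
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)

# Quadrupling the sample size only halves the spread
print(standard_error(10, 25), standard_error(10, 100))  # 2.0 1.0
```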

 Sampling distribution of a sample mean: If the population’s distribution is N(μ, σ), then the sampling distribution of a sample mean is N(μ, σ/√n).

 Why does the sampling distribution of the sample mean in relatively small samples have larger spread & in relatively large samples have smaller spread?

 Because the standard deviation of the sample mean is the population standard deviation divided by the square root of sample size-n.  So, if you want the sampling distribution of sample means (i.e. the estimate of the population mean) to be less variable, what’s the most basic thing to do?

 Make the sample size-n larger.  But there are major costs involved, not only in obtaining a larger sample per se, but also in the amount of increase needed.  Because the spread shrinks only with the square root of n, cutting the standard error in half requires quadrupling the sample size.

 What does dividing the population’s standard deviation by the square root of n imply?  It implies that we’re estimating the variability of the sampling distribution of sample means around the expected value of the population, for a sample of size n.

 In short, we’re using a sample to estimate the standard deviation of the sampling distribution of sample means.

 Here’s another principle—one that’s even more important to the sampling distribution of sample means than the Law of Large Numbers.

Central Limit Theorem  As the size of a random sample increases, the sampling distribution of the sample mean gets closer to a normal distribution.  This is true no matter what shape the population distribution has.

 The following graphs illustrate the Central Limit Theorem.  The first sample sizes are very small; the sample sizes become progressively larger.

 Note: the Central Limit Theorem applies to the sampling distribution of not only sample means but also sample sums.  Other statistics (e.g., standard deviations) have their own sampling distributions.

 The Central Limit Theorem allows us to use normal probability calculations to answer questions about sample means from many observations, even when the population distribution is not normal.  Thus, it justifies reliance of inferential statistics on the normal distribution.  N=30 (but perhaps up to 100 or more, depending on the population’s standard deviation) is a common benchmark threshold for the Central Limit Theorem— although a far larger sample is usually necessary for other statistical reasons. The larger the population’s standard deviation, the larger N must be.
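A quick simulation makes the theorem concrete. This Python sketch (parameter choices are illustrative) draws repeated samples from a strongly skewed exponential population and checks that the sample means center on μ with spread near σ/√n:

```python
import random
from math import sqrt

random.seed(1)
mu = sigma = 1.0            # exponential with rate 1: mean 1, sd 1, very skewed
n, reps = 30, 20000         # sample size & number of repeated samples

means = [sum(random.expovariate(1.0) for _ in range(n)) / n
         for _ in range(reps)]

grand_mean = sum(means) / reps
spread = sqrt(sum((m - grand_mean) ** 2 for m in means) / reps)
# grand_mean should land near mu = 1; spread near sigma/sqrt(n) = 0.18
print(round(grand_mean, 2), round(spread, 2), round(sigma / sqrt(n), 2))
```

A histogram of `means` would look roughly normal even though the population itself is far from normal.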

 Why not estimate a parameter on the basis of just one observation?

 First, because the sample mean is an unbiased estimator of the population mean & is less variable than a single observation.  Recall that averages are less variable than individual observations.  And recall that averages are more normally distributed than individual observations.

 Second, because a sample size of just one observation yields no measure of variability.  That is, we can’t estimate where the one observed value falls in a sampling distribution of values.

In summary, the sampling distribution of sample means is:  Normal if the population distribution is normal (i.e. a sample mean is a linear combination of independent normal random variables).  Approximately normal for large samples in any case (according to the Central Limit Theorem).

 How can we confirm these pronouncements?  By drawing simulated samples from the sampling distribution applet, or by simulating samples of varying sizes via a statistics software program (see Moore/McCabe, chapter 3, for review).

Let’s briefly review several principles of probability that are strategic to doing inferential statistics: (1) In random samples, the sample mean, the binomial count, & the sample proportion are unbiased estimators of the corresponding population parameters; & they can be made less variable by substantially increasing sample size-n. (2) The Law of Large Numbers: as sample size-n increases, the sample mean approaches the population mean (accuracy depends on the sample size-n, not on the proportion of the population that is sampled).

(3) Averages are less variable than individual observations & are more normally distributed than individual observations. (4) The sampling distribution of sample means is normal if the population distribution is normal. Put differently, the sample mean is a linear combination of independent normal random variables.

(5) The Central Limit Theorem: the sampling distribution of sample means is approximately normal for large samples, even if the underlying population distribution is not normal.

 These principles become additionally important because—by justifying the treatment of means drawn from relatively large samples as more or less normal distributions—they underpin two more fundamental elements of inferential statistics: confidence intervals & significance tests.

 What problems could bias your predictions, even if your sample is well designed?

Answer  Non-sampling problems such as undercoverage, non- response, response bias, & poorly worded questions.