1 Sociology 601, Class 4: September 10, 2009 Chapter 4: Distributions Probability distributions (4.1) The normal probability distribution (4.2) Sampling.

Presentation transcript:

1 Sociology 601, Class 4: September 10, 2009
Chapter 4: Distributions
- Probability distributions (4.1)
- The normal probability distribution (4.2)
- Sampling distributions (4.3, 4.4)

2 4.1: Probability distributions
We study probability to get an idea of how well sample statistics match up to their population parameters.
- probability: the proportion of times that a particular outcome would occur in a long run of repeated observations
  - example: you go to Monte Carlo and watch people play roulette. What is the probability of observing the number "23" in a single spin of a roulette wheel with 38 slots?
- probability distribution: a listing of possible outcomes for a variable, together with their probabilities

3 Probability distributions for discrete variables: formulas
Let y denote a possible outcome for variable Y, and let P(y) denote the probability of that outcome.
- then 0 ≤ P(y) ≤ 1 and Σ P(y) = 1, summing over all y
- the mean of a probability distribution: μ = Σ(y * P(y))
  - why do we use μ instead of Ȳ?
  - is this equation compatible with our formula for a sample mean?
- the variance of a probability distribution: σ² = Σ((y - μ)² * P(y))

4 Probability distributions for discrete variables: 3 flips of a coin
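The mean and variance formulas for a discrete distribution can be checked on the three-coin-flip case. The following is a minimal Python sketch, not the original slide's chart (which is not reproduced in this transcript). It assumes a fair coin and takes Y to be the number of heads; the variable names are illustrative.

```python
from itertools import product

# Assumption: a fair coin, and Y = number of heads in 3 flips.
outcomes = list(product([0, 1], repeat=3))   # 8 equally likely sequences

# Build P(y) by counting the sequences that give each number of heads.
dist = {}
for seq in outcomes:
    y = sum(seq)
    dist[y] = dist.get(y, 0) + 1 / len(outcomes)

mu = sum(y * p for y, p in dist.items())               # mean: sum of y * P(y)
var = sum((y - mu) ** 2 * p for y, p in dist.items())  # variance: sum of (y - mu)^2 * P(y)
sigma = var ** 0.5                                     # standard deviation

print(dist)            # P(y) for y = 0, 1, 2, 3 heads
print(mu, var, sigma)
```

Under these assumptions the probabilities are 1/8, 3/8, 3/8, 1/8, so the mean is 1.5 heads and the variance is 0.75.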

5 Probability distributions for discrete variables: example (p. 83) we will estimate parameters from this chart:

6 Calculating the mean, variance, and standard deviation of a probability distribution
Based on the previous chart:
Table columns: y, P(y), y*P(y), µ, y - µ, (y - µ)², (y - µ)²*P(y)
Summary quantities: µ, σ², σ

7 Probability distributions for continuous variables
So far we have described discrete probability distributions, where the variable takes on separate, countable values (a finite number of them in our examples). As the number of possible values for the variable increases, the probability distribution comes to resemble a continuous function. In such cases, we must find areas under curves (that is, integrate) to obtain:
o Population mean or standard deviation
o Probability for a certain range of the x-variable.

8 4.2: The normal probability distribution
Many social and natural variables have a distinctive continuous probability distribution when we measure them: a roughly "bell-shaped" curve, or a normal distribution.

9 Examples of normal probability distributions
Graph on board:
- Normal distribution for adult women's heights: μ = 64.3 inches, σ = 2.8 inches
- Normal distribution for adult men's heights: μ = 69.9 inches, σ = 3.0 inches

10 Standardizing scores
Standardizing a score means taking a raw score, a mean, and a standard deviation, and translating the score into a number of standard deviations from the mean.
formula: z = (y - μ) / σ
examples:
- if y = μ, then z = 0
- if y = μ + σ, then z = 1
- if y = μ + 2σ, then z = 2
- if y = μ - 2σ, then z = -2

11 Standardizing scores: Examples
Calculate a z-score for each example:
1. SAT score: y = 350, μ = 500, σ = [value missing]
2. SAT score: y = 520, μ = 500, σ = [value missing]
3. IQ score: y = 88, μ = 100, σ = 15
4. Woman's height: y = 71, μ = 65, σ = [value missing]
5. Psychological test: y = -2.58, μ = 0, σ = 1
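The standardizing formula is a one-line computation. Here is a small Python sketch; the function name `z_score` is illustrative, and the inputs are the IQ example and the women's-height parameters quoted in the slides.

```python
def z_score(y, mu, sigma):
    """Number of standard deviations that the raw score y lies from the mean mu."""
    return (y - mu) / sigma

# IQ example from the slides: y = 88, mu = 100, sigma = 15.
iq_z = z_score(88, 100, 15)

# A score two standard deviations above the mean should give z of about 2
# (women's heights from the slides: mu = 64.3, sigma = 2.8).
two_sd_z = z_score(64.3 + 2 * 2.8, 64.3, 2.8)

print(iq_z, two_sd_z)
```

The IQ score of 88 comes out 0.8 standard deviations below the mean, so z = -0.8.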

12 General properties of the normal curve
- The normal curve is symmetric about the mean.
- The normal curve is bell-shaped, with the highest probability density occurring at the mean.
  - for z from -1 to +1, the probability is about 0.68
  - for z from -2 to +2, the probability is about 0.95
  - for z from -3 to +3, the probability is about 0.997
- If a curve is not symmetrical, or if the probabilities within these z-score ranges differ from the values above, then it is not a normal curve.
- Any z-score is conceptually possible, because the normal curve never quite reaches zero.
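These probabilities can be reproduced numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (3.8 or later); the helper name `prob_within` is illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

def prob_within(k):
    """P(-k <= Z <= k) for a standard normal variable Z."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
```

The three results round to about 0.68, 0.95, and 0.997.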

13 Formula for a normal probability distribution
A normal probability distribution (e.g. approximately the probability distribution for the total of a roll of 100 dice) is based on the formula:
f(y) = (1 / (σ√(2π))) * e^(-(y - μ)² / (2σ²))
Note that μ and σ are both parameters of the formula. This function has no closed-form integral, so it is difficult to calculate by hand the probability that an observation will be between y1 and y2.
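Although the normal density has no closed-form integral, the area between two values can be approximated numerically. The following is a rough Python sketch using the trapezoid rule; the function names and the step count are illustrative choices, and the result is compared with the standard library's `NormalDist.cdf`. The parameters are the women's heights quoted in the slides.

```python
import math
from statistics import NormalDist

def normal_pdf(y, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((y - mu) ** 2) / (2.0 * sigma ** 2))

def prob_between(y1, y2, mu, sigma, steps=10_000):
    """Approximate P(y1 <= Y <= y2) with the trapezoid rule."""
    width = (y2 - y1) / steps
    total = 0.5 * (normal_pdf(y1, mu, sigma) + normal_pdf(y2, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(y1 + i * width, mu, sigma)
    return total * width

# Women's heights from the slides: mu = 64.3, sigma = 2.8 inches.
# The interval 61.5 to 67.1 is one standard deviation either side of the mean.
approx = prob_between(61.5, 67.1, 64.3, 2.8)
heights = NormalDist(64.3, 2.8)
exact = heights.cdf(67.1) - heights.cdf(61.5)

print(round(approx, 4), round(exact, 4))
```

Both numbers should agree closely and land near 0.68, the one-standard-deviation probability.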

14 A dilemma and a solution
The dilemma: the universe is filled with phenomena that have a probability distribution we can't calculate!
The solution: since this distribution recurs so often, it is worth the effort to painstakingly estimate the probabilities associated with each part of the normal distribution, list them by z-scores, then put all the results in a table for everybody to use. (see Appendix A, page 668)
- This is an important purpose of standardization.

15 Using Table A (page 668) to estimate areas under the normal curve
You are given a z-score and asked to find a p-value.
Example: z = 1.53, P(z > 1.53) = ?
1.) Move down to the row with the first decimal (1.5)
2.) Move across to the column with the second decimal (.03)
3.) Write the corresponding p-value in an inequality (P(z > 1.53) = .063, by chance alone)
For negative z-scores, use the same procedure but reverse the inequality. (P(z < -1.53) = .063, by chance alone)
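The table lookup can be cross-checked in software. This is a minimal Python sketch using `statistics.NormalDist`; it assumes the table reports upper-tail probabilities, as in the worked example.

```python
from statistics import NormalDist

z = 1.53
upper_tail = 1 - NormalDist().cdf(z)   # P(Z > 1.53)
lower_tail = NormalDist().cdf(-z)      # P(Z < -1.53), equal by symmetry

print(round(upper_tail, 3), round(lower_tail, 3))
```

Both tails round to .063, matching the value read from the table.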

16 Using Table A (page 668) to estimate areas under the normal curve
Practice these examples:
- what is P(z ≥ 1.19) by chance alone?
- what is P(z ≤ -.04) by chance alone?
- what is P(-1 ≤ z ≤ 1) by chance alone?
- what is P(z ≤ -1.96) or P(z ≥ 1.96) by chance alone?
- what is P(|z| ≥ 1.96) by chance alone?

17 Reading Stata output #1
Going between z-statistics and p-values using DISPLAY NORMPROB and DISPLAY INVNORM.
Note the differences between these results and page 668: normprob() reports the cumulative probability below z, while Table A lists the tail probability beyond z.
    display invnorm(.025)
    display invnorm(.975)
    * to verify that +/-1.96 are the z-scores you want
    display normprob(-1.96)
    display normprob(1.96)
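For readers without Stata, the same round trip can be sketched in Python. This is an analogue, not the Stata commands themselves: it assumes that `invnorm` corresponds to the inverse cumulative distribution function and `normprob` to the cumulative distribution function of the standard normal.

```python
from statistics import NormalDist

std_normal = NormalDist()

# Analogue of invnorm: z-scores that cut off 2.5% in each tail.
z_low = std_normal.inv_cdf(0.025)
z_high = std_normal.inv_cdf(0.975)

# Analogue of normprob: cumulative probability below each z-score.
p_low = std_normal.cdf(-1.96)
p_high = std_normal.cdf(1.96)

# Two-tailed probability P(|Z| >= 1.96).
two_tail = p_low + (1 - p_high)

print(round(z_low, 2), round(z_high, 2))
print(round(p_low, 3), round(p_high, 3), round(two_tail, 3))
```

The cumulative value for z = 1.96 is about .975, whereas an upper-tail table lists about .025 for the same z-score.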

18 Notes about working with the normal curve
- The table for deriving probabilities only works for normal distributions. If you have some other distribution, you can still calculate σ and z, but you can't match z to a p-value.
- Axis references are often confusing in statistics books:
  - the x-axis often lists values for what we call the y-variable
  - the y-axis often has no scale listed at all. It probably should have values for probability per unit of the y-variable.
- Tables are also confusing: some texts provide tables for P(Z < z), while some texts provide tables for P(Z > z).
- To save space, texts don't provide information for z < 0; it is assumed that you understand that the distribution is symmetrical.

19 4.3: Sampling distributions
Why would we care about a distribution of samples?
- We usually can't study a whole population, but we can study a sample.
- We can't know how well this sample reflects the population, but we can use probability theory to study how samples would tend to come out if we did know the characteristics of the population.

20 Definitions
- Sampling distribution: a probability distribution that gives the probabilities of the possible values of a sample statistic (e.g. a relative frequency distribution of many sample means).
- Standard error of a sampling distribution: a measure of the typical distance between a sample mean and the population mean.
- Standard deviation of a population: a measure of the typical distance between an observation and the population mean.

21 Equations
Mean of a sampling distribution: μ_Ȳ = μ
Standard error of a sampling distribution: σ_Ȳ = σ / √n
- Example: estimate the standard error of this sample: 1, 3, 5, 5, 5, 7, 9
- Is this estimate the true standard error?
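The example can be worked through in a few lines of Python. This sketch assumes the usual estimate, in which the unknown population standard deviation σ is replaced by the sample standard deviation s (with an n - 1 denominator); the variable names are illustrative.

```python
import math
import statistics

sample = [1, 3, 5, 5, 5, 7, 9]
n = len(sample)

mean = statistics.mean(sample)   # sample mean
s = statistics.stdev(sample)     # sample standard deviation (n - 1 denominator)
se_hat = s / math.sqrt(n)        # estimated standard error of the sample mean

print(mean, round(s, 3), round(se_hat, 3))
```

Here the sample mean is 5, s is about 2.58, and the estimated standard error is about 0.98. Because s only estimates σ, this is an estimate and not the true standard error.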

22 An advantage of large samples: The central limit theorem
As the sample size n grows, the sampling distribution of Ȳ approaches a normal distribution.
This is true even for variables that are not normally distributed in the population, such as age or income!
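A small simulation can illustrate the theorem. The sketch below is only an illustration under stated assumptions: the population is taken to be exponential with mean 1 and standard deviation 1 (a strongly skewed shape standing in for something like income), the sample size is 40, and the seed and number of replications are arbitrary.

```python
import random
import statistics

rng = random.Random(601)   # arbitrary fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n draws from an exponential population (mean 1, SD 1)."""
    return statistics.fmean(rng.expovariate(1.0) for _ in range(n))

n = 40
means = [sample_mean(n) for _ in range(5000)]

center = statistics.fmean(means)   # expected to be near the population mean, 1
spread = statistics.stdev(means)   # expected to be near 1 / sqrt(40), about 0.158
se = 1 / n ** 0.5                  # theoretical standard error, sigma / sqrt(n)

# Share of sample means within 1.96 standard errors of the population mean.
share = sum(abs(m - 1) <= 1.96 * se for m in means) / len(means)

print(round(center, 3), round(spread, 3), round(share, 3))
```

If the sampling distribution is close to normal, the share printed last should be close to 0.95 even though the population itself is far from normal.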

23

24

25 Why is the central limit theorem a big deal?
- When you use a sample statistic to guess a parameter, you will want to know how good your guess is.
- If the distribution of sample means about the population mean is normal, you can estimate how far off a given sample mean might be.
- With a moderate sample size, the sampling distribution is approximately normal, even if the underlying distribution is not!
- However, you still may not have a large enough sample to estimate the parameter with the precision you want.

26 Another advantage of large samples: The law of large numbers
The bigger the sample, the closer (on average) the sample statistic is to the parameter. In other words, as samples become larger, the variation between samples becomes smaller.
Note: the law of large numbers does not involve any sort of telos; earlier outcomes are not "corrected" by later ones. (Example of 4th coin toss)

27 The law of large numbers in action
Here is the complete sampling distribution of possible sample means for up to four coin tosses (score variable "heads" = 1 if heads, 0 if tails). Each outcome is listed with its sample mean.
n = 1
  (0)          0
  (1)          1
n = 2
  (0,0)        0
  (0,1)        .5
  (1,0)        .5
  (1,1)        1
n = 3
  (0,0,0)      0
  (0,0,1)      .33
  (0,1,0)      .33
  (0,1,1)      .67
  (1,0,0)      .33
  (1,0,1)      .67
  (1,1,0)      .67
  (1,1,1)      1
n = 4
  (0,0,0,0)    0
  (0,0,0,1)    .25
  (0,0,1,0)    .25
  (0,0,1,1)    .5
  (0,1,0,0)    .25
  (0,1,0,1)    .5
  (0,1,1,0)    .5
  (0,1,1,1)    .75
  (1,0,0,0)    .25
  (1,0,0,1)    .5
  (1,0,1,0)    .5
  (1,0,1,1)    .75
  (1,1,0,0)    .5
  (1,1,0,1)    .75
  (1,1,1,0)    .75
  (1,1,1,1)    1

28 The law of large numbers: the standard error of the sample mean shrinks as n increases
Recall the formula for the variance of a probability distribution: σ² = Σ((y - μ)² * P(y))
For n = 1, σ² = ((0 - .5)² * .5) + ((1 - .5)² * .5) = .25, σ = .5
For n = 2, σ₂² = .125, σ₂ = .35
For n = 4, σ₄² = .0625, σ₄ = .25
The standard error is the standard deviation of the sampling distribution of a statistic (here, the sample mean). This is not the same thing as the standard deviation of a single sample, or the standard deviation of a population. The sample standard deviation does not shrink as n increases.
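The coin-toss figures can be reproduced by brute-force enumeration. The following Python sketch assumes a fair coin scored 1 for heads and 0 for tails; the function names are illustrative.

```python
from itertools import product

def sampling_distribution(n):
    """Map each possible sample mean of n fair coin tosses to its probability."""
    dist = {}
    for tosses in product([0, 1], repeat=n):
        mean = sum(tosses) / n
        dist[mean] = dist.get(mean, 0) + 1 / 2 ** n
    return dist

def standard_error(n):
    """Standard deviation of the sampling distribution of the mean of n tosses."""
    dist = sampling_distribution(n)
    mu = sum(m * p for m, p in dist.items())
    var = sum((m - mu) ** 2 * p for m, p in dist.items())
    return var ** 0.5

for n in (1, 2, 3, 4):
    print(n, round(standard_error(n), 3))
```

The printed values fall from .5 at n = 1 to .25 at n = 4, which matches σ / √n with σ = .5.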

29 Summary: Why we work with samples
- On average, a statistic from a good random sample will have the same value as the corresponding population parameter.
- With a larger sample, the sample statistic will be closer to the population parameter on average.
- If the distribution of sample means is normal, one can make additional guesses about how close the sample statistic might be to the population parameter.
- We assume the distribution of sample means is approximately normal:
  - if n > 30 (a rule of thumb based on the central limit theorem), or
  - if the population is normally distributed