Statistics Review.

Measurement
Levels of Measurement: One must know the nature of one’s variables in order to understand which mathematical manipulations are appropriate (and, later, which statistical tests to use, since the tests rely on those manipulations).
Nominal Level of Measurement
Ordinal Level of Measurement
Interval Level of Measurement (“Continuous”)
Ratio Level of Measurement (“Continuous”)

Measurement
The special case of dichotomous variables: a dichotomous variable can take one of two values. For example:
Sex: 0 = Male, 1 = Female
Race: 0 = Other, 1 = Hispanic
Cars: 0 = Other, 1 = SUV
Are dichotomous variables nominal, ordinal, interval, or ratio?

Descriptive Statistics Descriptive Statistics are Used by Researchers to Report on Populations and Samples In Sociology: Summary descriptions of measurements (variables) taken about a group of people By Summarizing Information, Descriptive Statistics Speed Up and Simplify Comprehension of a Group’s Characteristics

Descriptive Statistics
Summarizing Data:
Central Tendency (or Groups’ “Middle Values” on a Variable): Mean, Median, Mode
Variation (or Summary of Differences Within Groups on a Variable): Variance, Standard Deviation

Mean
Most commonly called the “average.” Add up the values for each case and divide by the total number of cases.
Y-bar = (Y1 + Y2 + . . . + Yn) / n
Y-bar = Σ Yi / n

Mean
What’s up with all those symbols, man? Some symbolic conventions in this class:
Y = your variable (could be X or Q or even “Glitter”)
“-bar” or a line over the symbol of your variable = mean of that variable
Y1 = first case’s value on variable Y
“. . .” = ellipsis = continue sequentially
Yn = last case’s value on variable Y
n = number of cases in your sample
Σ = Greek letter “sigma” = sum or add up what follows
i = a typical case or each case in the sample (1 through n)

Mean
The mean is the “balance point.” Each IQ unit away from the mean is like 1 pound placed that far away on a scale. If the IQ mean equals 110:
93 is 17 units below the mean
106 is 4 units below the mean
110 is 0 units from the mean
131 is 21 units above the mean
The scale is balanced because 17 + 4 = 21.
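
The “add up and divide” definition and the balance-point idea can be checked in a few lines. This is a minimal Python sketch using the four IQ values from the balance-point slide; the variable names are illustrative only.

```python
# IQ values from the balance-point example (mean = 110).
iqs = [93, 106, 131, 110]

n = len(iqs)                 # number of cases
y_bar = sum(iqs) / n         # Y-bar = (Y1 + Y2 + ... + Yn) / n

# Each case's distance from the mean; negative values sit below the mean.
deviations = [y - y_bar for y in iqs]

print(y_bar)             # 110.0
print(deviations)        # [-17.0, -4.0, 21.0, 0.0]
print(sum(deviations))   # 0.0, so the scale balances
```

The deviations always sum to zero around the mean, which is the same fact the scale picture expresses.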

Mean
Means can be badly affected by outliers (data points with extreme values unlike the rest). Outliers can make the mean a bad measure of central tendency or common experience.
[Figure: Income in the U.S.; labels: All of Us, Bill Gates (outlier), Mean.]

Median The middle value when a variable’s values are ranked in order; the point that divides a distribution into two equal halves. When data are listed in order, the median is the point at which 50% of the cases are above and 50% below it. The 50th percentile.

Median Class A--IQs of 13 Students 89 93 97 98 102 106 109 110 115 119 128 131 140 Median = 109 (six cases above, six below)

Median
If the first student were to drop out of Class A, there would be a new median:
93 97 98 102 106 109 110 115 119 128 131 140
Median = (109 + 110) / 2 = 219 / 2 = 109.5 (six cases above, six below)
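
The odd and even cases for Class A can be reproduced with Python’s standard `statistics` module. This is only a sketch; the list names are illustrative.

```python
from statistics import median

class_a = [89, 93, 97, 98, 102, 106, 109, 110, 115, 119, 128, 131, 140]

# Odd number of cases (13): the median is the single middle value.
print(median(class_a))        # 109

# The first student (IQ 89) drops out, leaving an even number of cases (12):
# the median is the average of the two middle values, 109 and 110.
remaining = class_a[1:]
print(median(remaining))      # 109.5
```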

Median
The median is largely unaffected by outliers, so when data are skewed it is a better measure of central tendency than the mean and better describes the “typical person.”
[Figure: income distribution; labels: All of Us, Bill Gates (outlier).]

Median
If the recorded values for a variable form a symmetric distribution, the median and mean are identical. In skewed data, the mean lies further toward the skew than the median.
[Figure: a symmetric distribution and a skewed distribution, each with its mean and median marked.]
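
A small sketch of how one outlier moves the mean but barely moves the median. The incomes below are hypothetical values chosen for illustration; they are not from the slides.

```python
from statistics import mean, median

# Hypothetical annual incomes (not from the slides).
incomes = [30_000, 40_000, 50_000, 60_000, 70_000]
with_outlier = incomes + [1_000_000_000]   # one extreme earner joins

print(mean(incomes), median(incomes))
print(mean(with_outlier), median(with_outlier))
```

With these made-up numbers the median only moves from 50,000 to 55,000, while the mean jumps to more than 166 million.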

Mode
The most common data point is called the mode. The combined IQ scores for Classes A & B:
80 87 89 93 93 96 97 98 102 103 105 106 109 109 109 110 111 115 119 120 127 128 131 131 140 162
Here the mode is 109, which occurs three times. BTW, it is possible to have more than one mode! A la mode!!

Mode
It may give you the most likely experience rather than the “typical” or “central” experience. In symmetric, single-peaked distributions, the mean, median, and mode are the same. In skewed data, the mean and median lie further toward the skew than the mode.
[Figure: a symmetric distribution with mean, median, and mode together, and a skewed distribution with mode, median, and mean in that order toward the tail.]
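
As a quick check, the mode of the combined scores can be found with `statistics.multimode`, which returns every value tied for most frequent. The second list is a made-up example of a tie.

```python
from statistics import multimode

combined = [80, 87, 89, 93, 93, 96, 97, 98, 102, 103, 105, 106, 109,
            109, 109, 110, 111, 115, 119, 120, 127, 128, 131, 131, 140, 162]

print(multimode(combined))          # [109]

# A made-up list with a tie: both 1 and 2 occur twice, so there are two modes.
print(multimode([1, 1, 2, 2, 3]))   # [1, 2]
```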

Descriptive Statistics
Summarizing Data:
Central Tendency (or Groups’ “Middle Values”): Mean, Median, Mode
Variation (or Summary of Differences Within Groups): Variance, Standard Deviation

Variance and Standard Deviation Often we want to know how “spread out” cases of a variable are. If you find the “average deviation” from the mean, you can get an idea about “spread.” Variance and Standard Deviation help us measure “spread.” Variance is a number you get in the process of calculating standard deviation.

Variance and Standard Deviation
Variance and Standard Deviation (sd) are measures of the spread of the recorded values on a variable, measures of dispersion. The larger the variance and sd, the further the individual cases are from the mean. The smaller the variance and sd, the closer the individual scores are to the mean.
[Figure: two distributions, each with its mean marked.]

Variance and Standard Deviation
Variance and sd are numbers that at first seem complex to calculate. We create variance first. Calculating variance starts with a “deviation.” A deviation is the distance of a case’s score from the mean:
Yi – Y-bar
If the average person’s car costs $20,000 and mine cost $6,000, my deviation from the mean is -$14,000: 6K - 20K = -14K

Deviation
Class A--IQs of 13 Students:
102 115 128 109 131 89 98 106 140 119 93 97 110
Y-barA = 110.54
The deviation of 102 from 110.54 is? Deviation of 115?
102 - 110.54 = -8.54
115 - 110.54 = 4.46

Deviation Squared
Since we are looking for an “average spread,” we want to add these to get total deviations. But if we were to do that, we would get zero every time. Why? Because the mean is the balance point, the negative and positive deviations cancel out. We need a way to eliminate negative signs. Squaring the deviations will eliminate negative signs...
A deviation squared: (Yi – Y-bar)²
Back to the IQ example, a deviation squared
for 102 is: (102 - 110.54)² = (-8.54)² = 72.93
for 115 is: (115 - 110.54)² = (4.46)² = 19.89

Sum of Squares
If you were to add all the squared deviations together, you’d get what we call the “Sum of Squares.”
Sum of Squares (SS) = Σ (Yi – Y-bar)²
SS = (Y1 – Y-bar)² + (Y2 – Y-bar)² + . . . + (Yn – Y-bar)²

Variance
Class A, sum of squares:
Class A--IQs of 13 Students: 102 115 128 109 131 89 98 106 140 119 93 97 110
Y-bar = 110.54
SS = (102 – 110.54)² + (115 – 110.54)² + (128 – 110.54)² + (109 – 110.54)² + (131 – 110.54)² + (89 – 110.54)² + (98 – 110.54)² + (106 – 110.54)² + (140 – 110.54)² + (119 – 110.54)² + (93 – 110.54)² + (97 – 110.54)² + (110 – 110.54)² = 2891.23

Variance
The last step… Looking for “average spread,” now that we’ve added, we want to divide by the number of cases. The (approximate) average of the squared deviations is the variance.
SS / N = variance for a population.
SS / (n – 1) = variance for a sample.
Variance = Σ(Yi – Y-bar)² / (n – 1)

Variance
For Class A, Variance = 2891.23 / (n – 1) = 2891.23 / 12 = 240.94
How helpful is that???

Standard Deviation
To convert variance into something of meaning, let’s create standard deviation. Recall that we squared our deviations before. Let’s “unsquare” and get back to original units of IQ. The square root of the variance gives, roughly, the average deviation of the observations from the mean.
s.d. = √[ Σ(Yi – Y-bar)² / (n – 1) ]

Standard Deviation
For Class A, the standard deviation for IQ is: √240.94 = 15.52
The average of persons’ deviations from the mean IQ of 110.54 is roughly 15.52 IQ points.
Review:
1. Deviation: Yi – Y-bar
2. Deviation squared: (Yi – Y-bar)²
3. Sum of squares: Σ (Yi – Y-bar)²
4. Variance: Σ(Yi – Y-bar)² / (n – 1)
5. Standard deviation: √[ Σ(Yi – Y-bar)² / (n – 1) ]
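
The five review steps translate almost line for line into code. The following is a minimal Python sketch applied to the Class A IQ scores; the variable names are illustrative, and the sample (n – 1) formula is the one used.

```python
from math import sqrt

class_a = [102, 115, 128, 109, 131, 89, 98, 106, 140, 119, 93, 97, 110]

n = len(class_a)
y_bar = sum(class_a) / n

deviations = [y - y_bar for y in class_a]   # 1. Yi - Y-bar
squared = [d ** 2 for d in deviations]      # 2. (Yi - Y-bar)^2
ss = sum(squared)                           # 3. sum of squares
variance = ss / (n - 1)                     # 4. sample variance
sd = sqrt(variance)                         # 5. standard deviation

print(round(y_bar, 2), round(ss, 2), round(variance, 2), round(sd, 2))
```

The standard library’s `statistics.variance` and `statistics.stdev` apply the same n – 1 formula, so they can serve as a cross-check on a hand calculation.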

Standard Deviation
Larger s.d. = greater amounts of variation around the mean. For example:
[Figure: two bell curves, both with Y-bar = 25. Left: axis 19, 25, 31; s.d. = 3. Right: axis 13, 25, 37; s.d. = 6.]
s.d. = 0 only when all values are the same (only when you have a constant and not a “variable”).
If you were to “rescale” a variable, the s.d. would change by the same factor: if we changed units above so the mean equaled 250, the s.d. on the left would be 30, and on the right, 60.
Like the mean, the s.d. will be inflated by an outlier case value.
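
The rescaling claim is easy to try out. This sketch uses three hypothetical scores chosen to have mean 25 and s.d. 3; they are not the data behind the slide’s curves.

```python
from statistics import stdev

# Hypothetical scores with mean 25 and s.d. 3 (not the slide's actual data).
scores = [22, 25, 28]
rescaled = [10 * y for y in scores]    # change units so the mean becomes 250

print(stdev(scores), stdev(rescaled))  # 3.0 30.0
print(stdev([25, 25, 25]))             # 0.0: a constant has no spread
```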

Empirical Rule
Many naturally occurring variables have bell-shaped distributions. That is, their histograms take a symmetrical and unimodal shape. When this is true, the empirical rule holds approximately.
Empirical rule: If the histogram of data is approximately bell-shaped, then:
About 68% of the cases fall between Y-bar – s.d. and Y-bar + s.d.
About 95% of the cases fall between Y-bar – 2 s.d. and Y-bar + 2 s.d.
All or nearly all the cases fall between Y-bar – 3 s.d. and Y-bar + 3 s.d.

Empirical Rule
Example with M = 100 and s.d. = 15 (each s.d. is 15 points wide):
+ or – 1 s.d.: 85 to 115 (about 68% of cases)
+ or – 2 s.d.: 70 to 130 (about 95% of cases)
+ or – 3 s.d.: 55 to 145 (all or nearly all cases)
[Figure: “Body Pile: 100% of Cases,” a bell-shaped pile centered at M = 100.]
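
The three ranges are just the mean plus or minus 1, 2, and 3 standard deviations. A short sketch, with a helper name (`empirical_ranges`) invented for this example:

```python
def empirical_ranges(mean, sd):
    """Return the +/- 1, 2, and 3 s.d. ranges around the mean."""
    return {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}

iq_ranges = empirical_ranges(100, 15)
for k, (low, high) in iq_ranges.items():
    print(f"+/- {k} s.d.: {low} to {high}")
# +/- 1 s.d.: 85 to 115
# +/- 2 s.d.: 70 to 130
# +/- 3 s.d.: 55 to 145
```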

Normal Curve
The Normal Probability Distribution: No matter what the actual s.d. (σ) value is, the proportion of cases under the curve that corresponds with the mean (μ) +/- 1 s.d. is the same (68%). The same is true of mean +/- 2 s.d. (95%) and mean +/- 3 s.d. (almost all cases).
Because of the equivalence of all normal distributions, they are often described in terms of the Standard Normal Curve, where mean = 0 and s.d. = 1. Its scale is called “z”:
Z = # of standard deviations away from the mean
[Figure: two normal curves of different widths, each with 68% of cases between Z = -1 and Z = 1 on an axis from Z = -3 to 3.]

Normal Curve
Converting to z-scores: To compare different normal curves, it is helpful to know how to convert data values into z-scores. It is like having two rulers beneath each normal curve: one for data values, the second for z-scores.
IQ: μ = 100, σ = 15
Values:   55  70  85  100  115  130  145
Z-scores: -3  -2  -1    0    1    2    3

Normal Curve
Converting to z-scores, with IQ μ = 100 and σ = 15:
Z = (Y – μ) / σ
Z = (100 – 100) / 15 = 0
Z = (145 – 100) / 15 = 45/15 = 3
Z = (70 – 100) / 15 = -30/15 = -2
Z = (105 – 100) / 15 = 5/15 = .33
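
The conversion is a one-line function. This sketch repeats the four IQ conversions from the slide; `z_score` is a name chosen here for illustration.

```python
def z_score(y, mu, sigma):
    """Number of standard deviations that y lies from the mean."""
    return (y - mu) / sigma

for iq in (100, 145, 70, 105):
    print(iq, round(z_score(iq, mu=100, sigma=15), 2))
# 100 0.0
# 145 3.0
# 70 -2.0
# 105 0.33
```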

Sampling Distribution for the Mean
Remember how a sampling distribution of means is created? Take a sample of size 500 from the US. Record the mean self-esteem. If the population mean is 25, you might get this.
[Figure: sample means plotted one at a time on a self-esteem axis running from 15 to 40.]

Sampling Distribution for the Mean
Keep taking samples of size 500 and recording each mean. The sample means would stack up in a normal curve. A normal sampling distribution.
[Figure: normal curve of sample means over a self-esteem axis (15 to 40) and a z axis (-3 to 3), with 2.5% of sample means in each tail.]

Sampling Distribution for the Mean
The sample size affects the sampling distribution:
Standard error = population standard deviation / square root of sample size
σY-bar = σ / √n
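
To see the formula at work, one can simulate repeated sampling and compare the spread of the sample means with σ / √n. This is a sketch under stated assumptions: the population is taken to be normal with mean 25 (as in the slides) and s.d. 8 (an assumed value), and 1,000 repeated samples of size 500 are drawn.

```python
import random
from math import sqrt
from statistics import stdev

def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)

# Assumed population: mean 25 (from the slides), s.d. 8 (an assumption).
MU, SIGMA, N = 25, 8, 500

rng = random.Random(0)  # fixed seed so the run is repeatable
sample_means = [
    sum(rng.gauss(MU, SIGMA) for _ in range(N)) / N
    for _ in range(1000)
]

print(round(standard_error(SIGMA, N), 3))   # 0.358
print(round(stdev(sample_means), 3))        # close to the value above
```

Raising `N` shrinks both numbers together, which is the sense in which larger samples give sample means closer to the true mean.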

Sampling Distribution for the Mean
And if we increase our sample size (n)… Our repeated sample means will be closer to the true mean.
[Figure: a narrower sampling distribution on a z axis from -3 to 3, with 2.5% of sample means in each tail.]

Sampling Distribution for the Mean
Means will be closer to the true mean, and our standard error of the sampling distribution is smaller.

Sampling Distribution for the Mean
The range of particular middle percentages gets smaller.
[Figure: self-esteem axis (15 to 40) and z axis (-3 to 3) with the 95% range marked.]

Sampling Distribution for the Mean
…But we can say that 95% of the sample means in repeated sampling will always be in the range marked by -2 over to +2 standard errors.
[Figure: self-esteem axis (15 to 40) and z axis (-3 to 3) with the 95% range marked.]

Sampling Distribution for the Mean Ooops! Technically speaking, on a normal curve, 95% of cases always fall between +/- 1.96 standard deviations rather than 2. See comparison on next slide…

Sampling Distribution for the Mean
Empirical Rule vs. Actuality
Empirical Rule          Actuality
68%         1z          68%     0.99z
95%         2z          95%     1.96z
                        99%     2.58z
Almost all  3z          99.73%  3z

Sampling Distribution for the Mean
…But we can say that 95% of the sample means in repeated sampling will always be in the range marked by -1.96 over to +1.96 standard errors.
[Figure: self-esteem axis (15 to 40) and z axis (-3 to 3) with the 95% range marked between z = -1.96 and z = 1.96.]
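
The exact shares can be computed from the standard normal curve rather than read from a table. This sketch uses `statistics.NormalDist` from the Python standard library; `middle_share` is a helper name invented here.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, s.d. 1

def middle_share(z):
    """Share of cases within +/- z standard deviations of the mean."""
    return std_normal.cdf(z) - std_normal.cdf(-z)

for z in (1, 1.96, 2, 2.58, 3):
    print(z, round(100 * middle_share(z), 2))

# Going the other way: the z that bounds the middle 95%.
print(round(std_normal.inv_cdf(0.975), 2))   # 1.96
```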

Inferential Statistics for the Mean
The sampling distribution’s standard error is a measuring stick that we can use to indicate the range of a specified middle percentage of sample means in repeated sampling.
[Figure: sampling distribution centered at 25; +/- 1z covers 68%, +/- 1.96z covers 95%, and +/- 3z covers 99.7% of sample means.]

Inferential Statistics for the Mean
We use that measuring stick for two things:
Confidence Interval
Significance Test
[Figure: sampling distribution centered at μ; +/- 1z covers 68%, +/- 1.96z covers 95%, and +/- 3z covers 99.7% of sample means.]

Inferential Statistics for the Mean
But First! How many samples do we normally get to take?
[Figure: a sample drawn from a population.]

Inferential Statistics for the Mean
If we have only one sample to work with, how do we know anything about the sampling distribution????
[Figure: sampling distribution centered at 25 on a z axis marked -3, -1.96, -1, 0, 1, 1.96, 3.]

Inferential Statistics for the Mean
Like magic, we in fact use our sample’s standard deviation as an estimate of the population’s.
Standard error = estimate of population standard deviation / square root of sample size
s.e. = sample s.d. / √n
Remember why we used Σ(Yi – Y-bar)² / (n – 1)? Dividing by n – 1 makes the sample variance a better estimate of the population variance.

Inferential Statistics for the Mean
But what would happen if we got a sample whose mean did not equal the population mean?
[Figure: sampling distribution centered at 25, with possible sample means marked “?” away from the center.]

Inferential Statistics for the Mean
We’d center our sampling distribution over an inaccurate estimate of the population mean!!!
[Figure: the same curve shifted so that it is centered on a sample mean rather than on 25.]

Inferential Statistics for the Mean
And where we drew our “middle 95%” would change!
[Figure: population mean of 25 with the middle 95% of sample means marked.]

Inferential Statistics for the Mean
But Notice: 95% of Samples’ “95% Ranges” would contain the population mean!
[Figure: population mean of 25 with the middle 95% of sample means marked.]

Inferential Statistics for the Mean
We will not know the true population mean, but 95% of the time the 95% range generated by your estimate of the sampling distribution will contain the true population mean! We call this 95% range the 95% Confidence Interval.
[Figure: 95% ranges for different samples on a self-esteem axis from 15 to 40.]

Inferential Statistics for the Mean
If we want that range to contain the true population mean 99% of the time (99% confidence interval), we just construct a wider interval.
[Figure: 99% ranges for different samples on a self-esteem axis from 15 to 40.]

Inferential Statistics for the Mean
We use that measuring stick for two things:
Confidence Interval
Significance Test
[Figure: sampling distribution centered at μ; +/- 1z covers 68%, +/- 1.96z covers 95%, and +/- 3z covers 99.7% of sample means.]

Inferential Statistics for the Mean
Confidence Interval Example: I collected a sample of 2,500 with an average self-esteem score of 28 and a standard deviation of 8. What if we want a 99% confidence interval?
CI = Mean +/- z * s.e.
1. Find the standard error of the sampling distribution: s.d. / √n = 8 / √2500 = 8/50 = 0.16
2. Build the width of the interval. 99% corresponds with a z of 2.58: 2.58 * 0.16 = 0.41
3. Insert the mean to build the interval: 99% C.I. = 28 +/- 0.41
The interval: 27.59 to 28.41
We are 99% confident that the population mean falls between these values.

Inferential Statistics for the Mean
And if we wanted a 95% Confidence Interval instead? Same sample: 2,500 cases, an average self-esteem score of 28, and a standard deviation of 8.
CI = Mean +/- z * s.e.
1. Find the standard error of the sampling distribution: s.d. / √n = 8 / √2500 = 8/50 = 0.16
2. Build the width of the interval. 95% corresponds with a z of 1.96: 1.96 * 0.16 = 0.31
3. Insert the mean to build the interval: 95% C.I. = 28 +/- 0.31
The interval: 27.69 to 28.31
We are 95% confident that the population mean falls between these values.
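
Both intervals follow the same recipe, so they can be sketched as one function. The function name and signature below are illustrative; the z value is looked up with `statistics.NormalDist` instead of a printed table, so it is slightly more precise than 1.96 or 2.58.

```python
from math import sqrt
from statistics import NormalDist

def confidence_interval(sample_mean, sd, n, level):
    """Mean +/- z * s.e., using the sample s.d. to estimate the s.e."""
    se = sd / sqrt(n)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # e.g. 0.95 -> about 1.96
    margin = z * se
    return sample_mean - margin, sample_mean + margin

for level in (0.95, 0.99):
    low, high = confidence_interval(28, 8, 2500, level)
    print(level, round(low, 2), round(high, 2))
# 0.95 27.69 28.31
# 0.99 27.59 28.41
```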

Inferential Statistics for the Mean
By centering my sampling distribution’s +/- 1.96z range around my sample’s mean, I can identify a range that, if my sample is one of the middle 95%, would contain the population’s mean. Or, put loosely, I can be 95% confident that the population’s mean is somewhere in that range.

Inferential Statistics for the Mean
By centering my sampling distribution’s +/- 2.58z range around my sample’s mean, I can identify a range that, if my sample is one of the middle 99%, would contain the population’s mean. Or, put loosely, I can be 99% confident that the population’s mean is somewhere in that range.

Inferential Statistics for the Mean
Besides constructing a confidence interval, we can also do a significance test.

Inferential Statistics for the Mean
Let’s build a sampling distribution around our guess, 20: sample of size 100; s.d. = 10.
s.e. = 10/√100 = 10/10 = 1
[Figure: sampling distribution centered on the guess of 20, over a self-doubt axis and a z axis running from -3 to 5, with the sample mean (Y-bar) marked far to the right.]

Inferential Statistics for the Mean
Our sample mean appears to lie beyond a critical value of 1.96 (outer 5% of samples) or even 2.58 (outer 1% of samples).

Inferential Statistics for the Mean
How many z’s is our sample mean away from our guess?
Z = (Y-bar – μ) / s.e.
Z = (25 – 20) / 1
Z = 5

Inferential Statistics for the Mean
Indeed, our sample z-score is 5, well above 1.96 or 2.58. Reject the guess! Looking in a z-table: if the population mean really were 20, a sample mean this far above it would turn up only about .0000287% of the time.
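
The z-score and the tail percentage for this example can be reproduced as follows. This is a sketch: the variable names are illustrative, and the tail area comes from `statistics.NormalDist` rather than a printed z-table.

```python
from math import sqrt
from statistics import NormalDist

guess, sample_mean, sd, n = 20, 25, 10, 100

se = sd / sqrt(n)                    # 10 / 10 = 1
z = (sample_mean - guess) / se       # (25 - 20) / 1 = 5

# Upper-tail area beyond z; by symmetry this equals cdf(-z).
p_upper = NormalDist().cdf(-z)

print(z)                             # 5.0
print(f"{100 * p_upper:.7f}%")       # 0.0000287%
```

Note that this is the area in one tail only; a two-tailed test would double it.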

Significance Tests A mean is not the only statistic you can do this with. Other statistics, such as the difference between two groups’ means and other new statistics you will learn about, can have significance tests conducted on them. Just use the handy 7-step significance test format

Significance Tests
Seven-Step Significance Test Format for Normally Distributed Sampling Distributions: By slapping the sampling distribution for a statistic (sample’s measure of a variable) over a guess about what the population parameter (census measure of a variable) equals, Ho, we can find out whether our sample could have been drawn from a population whose parameter equals our guess.
1. Set α-level for one or two tails (usually α = .05)
2. Find Critical Z (usually = +/- 1.96)
3. Determine the null and alternative hypotheses:
   Ho: Population parameter = some value (usually parameter = 0)
   Ha: Population parameter ≠ some value (usually parameter ≠ 0)
4. Collect Data
5. Calculate Z: z = (Statistic – Parameter) / standard error
6. Make decision about the null hypothesis
7. Find P-value
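
The seven steps can be sketched as one small function for the two-tailed case. Everything named here (`z_significance_test`, its parameters, and the example numbers) is hypothetical and only meant to show how the steps connect.

```python
from statistics import NormalDist

def z_significance_test(statistic, null_value, se, alpha=0.05):
    """Two-tailed z test following the seven-step format."""
    std_normal = NormalDist()
    critical_z = std_normal.inv_cdf(1 - alpha / 2)      # steps 1-2
    # Step 3: Ho: parameter = null_value; Ha: parameter != null_value
    # Step 4 (collect data) happens before this function is called.
    z = (statistic - null_value) / se                   # step 5
    reject = abs(z) > critical_z                        # step 6
    p_value = 2 * std_normal.cdf(-abs(z))               # step 7
    return {"z": z, "critical_z": critical_z,
            "reject_null": reject, "p_value": p_value}

# Hypothetical numbers: a statistic of 2.1 with s.e. 1.0, tested against 0.
result = z_significance_test(2.1, 0, 1.0)
print(result["reject_null"], round(result["p_value"], 4))   # True 0.0357
```

With the same hypothetical numbers and `alpha=0.01`, the critical z rises to about 2.58 and the null hypothesis is no longer rejected.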

Statistics Review Congratulations! You are now statistically sophisticated.