Sociology 601: Midterm review, October 15, 2009


Basic information for the midterm
Date: Tuesday, October 20, 2009
Start time: 2 pm
Place: usual classroom, Art/Sociology 3221
Bring a sheet of notes, a calculator, and two pens or pencils.
Notify me if you anticipate any timing problems.

Review for midterm
terms
symbols
steps in a significance test
testing differences in groups
contingency tables and measures of association
equations

Important terms from chapter 1
Terms for statistical inference: population, sample, parameter, statistic
Key idea: You use a sample to make inferences about a population.

Important terms from chapter 2
2.1) Measurement: variable, interval scale, ordinal scale, nominal scale, discrete variable, continuous variable
2.2-2.4) Sampling: simple random sample, probability sampling, stratified sampling, cluster sampling, multistage sampling, sampling error
Key idea: Statistical inferences depend on measurement and sampling.

Important terms from chapter 3
3.1) Tabular and graphic description: frequency distribution, relative frequency distribution, histogram, bar graph
3.2-3.4) Measures of central tendency and variation: mean, median, mode, proportion, standard deviation, variance, interquartile range, quartile, quintile, percentile

Important terms from chapter 3
Key ideas:
1.) Statistical inferences are often made about a measure of central tendency.
2.) Measures of variation help us estimate certainty about an inference.

Important terms from chapter 4
probability distribution, sampling distribution, sample distribution, normal distribution, standard error, central limit theorem, z-score
Key ideas:
1.) If we know what the population is like, we can predict what a sample might be like.
2.) A sample statistic gives us a best guess of the population parameter.
3.) If we work carefully, a sample can tell us how confident to be about our sample statistic.

Important terms from chapter 5
point estimator, estimate, unbiased, efficient, confidence interval
Key ideas:
1.) We have a standard set of equations we use to make estimates.
2.) These equations are used because they have specific desirable properties.
3.) A confidence interval provides your best guess of a parameter.
4.) A confidence interval provides your best guess of how close your best guess (in part 3.)) will typically be to the parameter.

Important terms from chapter 6
6.1 – 6.3) Statistical inference: significance tests
assumptions, hypothesis, test statistic, p-value, conclusion
null hypothesis, one-sided test, two-sided test, z-statistic

Key idea from chapter 6
A significance test is a ritualized way to ask about a population parameter.
1.) Clearly state assumptions.
2.) Hypothesize a value for a population parameter.
3.) Calculate a sample statistic.
4.) Estimate how unlikely it is for the hypothesized population to produce such a sample statistic.
5.) Decide whether the hypothesis can be thrown out.

More important terms from chapter 6
6.4, 6.7) Decisions and types of errors in hypothesis tests: type I error, type II error, power
6.5-6.6) Small sample tests: t-statistic, binomial distribution, binomial test
Key ideas:
1.) Modeling decisions and population characteristics can affect the probability of a mistaken inference.
2.) Small sample tests have the same principles as large sample tests, but require different assumptions and techniques.

symbols

Significance tests, Step 1: assumptions
An assumption that the sample was drawn at random. This is pretty much a universal assumption for all significance tests.
An assumption about whether the variable has two outcome categories (proportion) or many intervals (mean).
An assumption that enables us to assume a normal sampling distribution. This assumption varies from test to test: some tests assume a normal population distribution, other tests assume different minimum sample sizes, and some tests do not make this assumption.
Declare the α level at the start, if you use one.

Significance Tests, Step 2: Hypothesis State the hypothesis as a null hypothesis. Remember that the null hypothesis is about the population from which you draw your sample. Write the equation for the null hypothesis. The null hypothesis can imply a one- or two-sided test. Be sure the statement and equation are consistent.

Significance Tests, Step 3: Test statistic For the test statistic, write: the equation, your work, and the answer. Full disclosure maximizes partial credit. I recommend four significant digits at each computational step, but present three as the answer.

Significance tests, Step 4: p-value Calculate an appropriate p-value for the test-statistic. Use the correct table for the type of test; Use the correct degrees of freedom if applicable; Use a correct p-value for a one- or two-sided test, as you declared in the hypothesis step.

Significance Tests, Step 5: Conclusion
Write a conclusion that includes:
the p-value;
your decision to reject H0 or not;
a statement of what your decision means;
a discussion of the substantive importance of your sample statistic.

test statistics and 95% confidence intervals

other important equations #1

other important equations #2
know how to calculate:
medians
z-scores from Y-scores, p-values from z-scores
z-scores from p-values, Y-scores from z-scores
t-scores from Y-scores, p-values from t-scores
t-scores from p-values, Y-scores from t-scores
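The normal-distribution conversions in this list can be sketched in a few lines of Python. This is only an illustration using the standard library's NormalDist; the function names and the example scores (mean 500, sd 100, Y = 650) are hypothetical and not from the course. The t-score conversions follow the same pattern but need a t table or statistical software.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1


def z_from_y(y, mean, sd):
    """Z-score from a Y-score: distance from the mean in standard deviations."""
    return (y - mean) / sd


def y_from_z(z, mean, sd):
    """Y-score from a z-score (the inverse of z_from_y)."""
    return mean + z * sd


def upper_tail_p(z):
    """One-sided p-value P(Z > z) from a z-score."""
    return 1.0 - std_normal.cdf(z)


def z_from_upper_tail_p(p):
    """Z-score that leaves probability p in the upper tail."""
    return std_normal.inv_cdf(1.0 - p)


# Hypothetical example: scores with mean 500 and standard deviation 100.
z = z_from_y(650, 500, 100)          # 1.5
p = upper_tail_p(z)                  # roughly 0.067
z_crit = z_from_upper_tail_p(0.025)  # roughly 1.96
```

For a two-sided test, double the smaller of the two tail probabilities.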

Useful STATA outputs: immediate test for sample mean using TTESTI

. * for example, in A&F problem 6.8, n=100 Ybar=508 sd=100 and mu0=500
. ttesti 100 508 100 500, level(95)

One-sample t test
------------------------------------------------------------------------------
         |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |     100         508          10         100    488.1578    527.8422

Degrees of freedom: 99

                        Ho: mean(x) = 500

   Ha: mean < 500         Ha: mean != 500          Ha: mean > 500
     t =   0.8000           t =   0.8000             t =   0.8000
 P < t =   0.7872       P > |t| =   0.4256       P > t =   0.2128
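As a cross-check on the ttesti output above, the standard error, t statistic, and p-values can be recomputed by hand. The sketch below is not how Stata computes them: it assumes a simple Simpson's-rule integration of the t density (the helper names t_pdf and t_upper_tail are made up for this example), which is accurate enough to match the printed four decimals.

```python
import math


def t_pdf(t, df):
    """Density of Student's t; log-gamma avoids overflow for large df."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))


def t_upper_tail(t, df, steps=2000):
    """P(T > t) via Simpson's rule on [0, |t|]; steps must be even."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3              # P(0 < T < |t|)
    upper = 0.5 - area                # P(T > |t|)
    return upper if t >= 0 else 1.0 - upper


# Inputs from the ttesti example: n=100, Ybar=508, sd=100, mu0=500
n, ybar, sd, mu0 = 100, 508.0, 100.0, 500.0
se = sd / math.sqrt(n)                   # standard error of the mean
t_stat = (ybar - mu0) / se               # test statistic
df = n - 1
p_upper = t_upper_tail(t_stat, df)       # Ha: mean > 500
p_lower = 1.0 - p_upper                  # Ha: mean < 500
p_two = 2 * min(p_upper, p_lower)        # Ha: mean != 500
```

The three p-values correspond to the three columns at the bottom of the Stata output.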

Useful STATA outputs: immediate test for sample proportion using PRTESTI

. * for proportion: in A&F problem 6.12, n=832 p=.53 and p0=.5
. prtesti 832 .53 .50, level(95)

One-sample test of proportion                      x: Number of obs =      832
------------------------------------------------------------------------------
    Variable |       Mean   Std. Err.                     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |        .53   .0173032                      .4960864    .5639136

                       Ho: proportion(x) = .5

      Ha: x < .5             Ha: x != .5              Ha: x > .5
      z =  1.731             z =  1.731               z =  1.731
  P < z = 0.9582         P > |z| = 0.0835         P > z = 0.0418
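The prtesti numbers above can be reproduced with the large-sample formulas. One point worth noticing in this sketch: the z statistic uses the standard error under the null hypothesis (built from p0), while the confidence interval uses the standard error built from the sample proportion. The variable names are illustrative only.

```python
import math
from statistics import NormalDist

# Inputs from the prtesti example: n=832, sample proportion .53, p0=.5
n, p_hat, p0 = 832, 0.53, 0.50

se_null = math.sqrt(p0 * (1 - p0) / n)          # SE under Ho, for the z statistic
se_sample = math.sqrt(p_hat * (1 - p_hat) / n)  # SE from the sample, for the CI

z = (p_hat - p0) / se_null
p_upper = 1 - NormalDist().cdf(z)               # Ha: proportion > .5
p_two = 2 * min(p_upper, 1 - p_upper)           # Ha: proportion != .5

z_crit = NormalDist().inv_cdf(0.975)            # about 1.96 for a 95% interval
ci = (p_hat - z_crit * se_sample, p_hat + z_crit * se_sample)
```

The "Std. Err." column in the Stata output is se_sample, not se_null.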

Useful STATA outputs: small sample comparison of proportions using bitesti

bitesti 12 2 .53

        N   Observed k   Expected k   Assumed p   Observed p
------------------------------------------------------------
       12            2         6.36     0.53000      0.16667

  Pr(k >= 2)            = 0.998312  (one-sided test)
  Pr(k <= 2)            = 0.011440  (one-sided test)
  Pr(k <= 2 or k >= 11) = 0.017159  (two-sided test)
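The exact binomial probabilities in the bitesti output can be checked by summing the binomial formula directly. In this sketch the two-sided value is computed as the total probability of all outcomes no more likely than the observed one; that rule is an assumption about how the two-sided figure is built, although it does reproduce "k <= 2 or k >= 11" here. The function names are made up for the example.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(K = k) for a binomial with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_test(n, k_obs, p):
    """Return (Pr(K >= k_obs), Pr(K <= k_obs), two-sided p-value)."""
    pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
    upper = sum(pmf[k_obs:])
    lower = sum(pmf[: k_obs + 1])
    # Two-sided: add up every outcome that is no more likely than the
    # observed count (small tolerance guards against rounding ties).
    cutoff = pmf[k_obs] * (1 + 1e-7)
    two_sided = sum(q for q in pmf if q <= cutoff)
    return upper, lower, two_sided


# Inputs from the bitesti example: N=12, observed k=2, assumed p=.53
upper, lower, two_sided = binomial_test(12, 2, 0.53)
expected_k = 12 * 0.53
```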

Useful STATA outputs: predicting the required sample size to estimate a population proportion using sampsi

sampsi .5 .53, alpha(.05) power(.5) onesample

Estimated sample size for one-sample comparison of proportion
  to hypothesized value

Test Ho: p = 0.5000, where p is the proportion in the population

Assumptions:
         alpha =   0.0500  (two-sided)
         power =   0.5000
 alternative p =   0.5300

Estimated required sample size:
             n =     1068

Other sampsi commands to know:
sampsi 12 13, sd(2.5) power(.5) onesample a(.01)
sampsi .5 .53, alpha(.05) n(100) onesample a(.05)
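The required sample size of 1068 can be recovered from the usual normal-approximation formula for a one-sample test of a proportion. The sketch below assumes that formula (it is not taken from the slides) and uses a hypothetical function name; with power = .5 the power term drops out because the corresponding z value is 0.

```python
import math
from statistics import NormalDist


def n_one_sample_proportion(p0, p1, alpha=0.05, power=0.5):
    """Sample size for a two-sided one-sample test of a proportion.

    Assumed formula:
        n = ((z_alpha * sqrt(p0*(1-p0)) + z_beta * sqrt(p1*(1-p1))) / (p1 - p0))**2
    rounded up to the next whole case.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = std_normal.inv_cdf(power)           # 0 when power = .5
    numerator = (z_alpha * math.sqrt(p0 * (1 - p0))
                 + z_beta * math.sqrt(p1 * (1 - p1)))
    return math.ceil((numerator / (p1 - p0)) ** 2)


# Inputs from the sampsi example: p0=.5, alternative p=.53, alpha=.05, power=.5
n_required = n_one_sample_proportion(0.50, 0.53, alpha=0.05, power=0.5)
```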

Useful STATA outputs: comparison of two means using ttesti

ttesti 4252 18.1 12.9 6764 32.6 18.2, unequal

Two-sample t test with unequal variances
------------------------------------------------------------------------------
         |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |    4252        18.1    .1978304        12.9    17.71215    18.48785
       y |    6764        32.6     .221294        18.2    32.16619    33.03381
combined |   11016    27.00323    .1697512     17.8166    26.67049    27.33597
    diff |               -14.5    .2968297               -15.08184   -13.91816

Satterthwaite's degrees of freedom: 10858.6

                 Ho: mean(x) - mean(y) = diff = 0

    Ha: diff < 0           Ha: diff != 0            Ha: diff > 0
    t = -48.8496           t = -48.8496             t = -48.8496
  P < t = 0.0000       P > |t| = 0.0000         P > t = 1.0000
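The unequal-variance comparison above can be verified from the six summary numbers. This sketch computes the standard error of the difference, the t statistic, and Satterthwaite's degrees of freedom; the function name is hypothetical. With t near -49 the p-values are 0 or 1 to four decimals, so they are not recomputed here.

```python
import math


def welch_summary(n1, mean1, sd1, n2, mean2, sd2):
    """Two-sample t test with unequal variances, from summary statistics."""
    v1 = sd1**2 / n1                      # squared standard error, group 1
    v2 = sd2**2 / n2                      # squared standard error, group 2
    se_diff = math.sqrt(v1 + v2)          # SE of the difference in means
    t_stat = (mean1 - mean2) / se_diff
    # Satterthwaite's approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return se_diff, t_stat, df


# Inputs from the ttesti example: group x, then group y
se_diff, t_stat, df = welch_summary(4252, 18.1, 12.9, 6764, 32.6, 18.2)
```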

Summary of the first half of the course
Terminology: Greek letters represent population parameters. Greek letters with a “hat” represent estimates of population parameters. Roman letters represent sample statistics.
The goal of statistical inference is to make statements about a population based on information from a sample.
Variables may have nominal, ordinal, and interval scales. The scales you use affect the power of your statistical test. The scales you use also affect the possibility of erroneous inferences.

Descriptive statistics you need to know How to interpret frequency distributions and relative frequency distributions. Measures of central tendency: mean, median, mode. (plus weighted means, effects of outliers on means) Measures of variation: range, variance, standard deviation, standard deviation in graphical form.

Old equations for descriptive statistics: .

sampling distributions A sampling distribution of the mean is a probability distribution for the sample mean of all possible samples of size n for a population. The central limit theorem and the law of large numbers state that with increasing sample size, sampling distributions become narrower and more like a normal distribution. Sampling distributions are a basis for statistical inference: One uses n and the variance of cases within the sample to estimate the typical difference between a sample mean and the population mean.
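A small simulation can make the idea of a sampling distribution concrete. The sketch below is purely illustrative: the Uniform(0, 1) population, the sample size of 25, the 2000 replications, and the seed are all arbitrary choices, not course material. It draws many samples, records each sample mean, and compares the spread of those means with the theoretical standard error sigma / sqrt(n).

```python
import random
import statistics

random.seed(601)  # fixed seed so the run is reproducible


def sample_mean(n):
    """Mean of one random sample of size n from a Uniform(0, 1) population."""
    return statistics.fmean(random.random() for _ in range(n))


n = 25
means = [sample_mean(n) for _ in range(2000)]  # approximate sampling distribution

observed_se = statistics.stdev(means)          # spread of the sample means
population_sd = (1 / 12) ** 0.5                # sd of Uniform(0, 1)
theoretical_se = population_sd / n ** 0.5      # sigma / sqrt(n)
center = statistics.fmean(means)               # should be near the population mean, 0.5
```

Rerunning with a larger n shrinks observed_se, which is the narrowing described above.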

statistical inference A point estimator is a sample statistic that predicts the value of a parameter. A hypothesis test uses a sample statistic to test a specific prediction about possible values of a parameter. A confidence interval predicts the distance of a point estimator from the population parameter (with a bit of difficult logic).

Tests for statistical significance Key question: could some pattern in the sample merely be a result of random sampling error, or does it reflect a true pattern in the underlying population? Key terms: assumptions, hypothesis, test statistic, p-value, conclusion Other key concepts: rejecting Ho, fixed decision rules, type I and type II errors, power of a test, t-distribution

Old equations for statistical inference .

Chapter 6: Significance Tests for Single Sample
Best test, by sample size and type of statistic:

sample size   mean                    proportion
large         z-test for Ybar - μ0    z-test for πhat - π0
small         t-test for Ybar - μ0    Fisher’s exact test

Equations for tests of statistical significance

Chapter 7: Comparing scores for two groups
Best test, by sample size, sample scheme, and type of statistic:

sample size   sample scheme   mean                  proportion
large         independent     z-test for μ2 - μ1    z-test for π2 - π1
small         independent     t-test for μ2 - μ1    Fisher’s exact test
large         dependent       z-test for D          McNemar test
small         dependent       t-test for D          Binomial test

Two Independent Groups: Large Samples, Means It is important to be able to recognize the parts of the equation, what they mean, and why they are used. Equal variance assumption? NO

Two Independent Groups: Large Samples, Proportions Equal variance assumption? YES (if proportions are equal then so are variances). df = N1 + N2 - 2

Two Independent Groups: Small Samples, Means 7.3 Difference of two small sample means: Equal variance assumption: SOMETIMES (for ease) NO (in computer programs)

Two Independent Groups: Small Samples, Proportions
Fisher’s exact test via Stata, SAS, or SPSS calculates the exact probability of all possible occurrences.

Dependent Samples: Means: Proportions:

Chapter 8: Analyzing associations Contingency tables and their terminologies: marginal distributions and joint distributions conditional distribution of R, given a value of E. (as counts or percentages in A & F) marginal, joint, and conditional probabilities. (as proportions in A & F) “Are two variables statistically independent?”

Descriptive statistics you need to know How to draw and interpret contingency tables (crosstabs) Frequency and probability/ percentage terms marginal conditional joint Measures of relationships: odds, odds ratios gamma and tau-b

Observed and expected cell counts
fo, the observed cell count, is the number of cases in a given cell.
fe, the expected cell count, is the number of cases we would predict in a cell if the variables were independent of each other.
fe = (row total × column total) / N
The equation for fe is a correction for rows or columns with small totals.

Chi-squared test of independence
Assumptions: 2 categorical variables, random sampling, fe >= 5
Ho: variables are statistically independent (crudely, the score for one variable is independent of the score for the other).
Test statistic: χ² = Σ (fo - fe)² / fe
p-value from χ² table, df = (r-1)(c-1)
Conclusion: reject or do not reject based on the p-value and the prior α-level, if necessary. Then, describe your conclusion.
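The expected counts and the chi-squared statistic can be worked through on a small table. The 2x2 counts below are hypothetical, chosen so the arithmetic is easy to follow, and the function name is made up for this sketch. A general p-value needs a chi-squared table or software; the shortcut shown at the end applies only when df = 1, where chi-squared is the square of a standard normal z.

```python
from statistics import NormalDist


def chi_squared(table):
    """Chi-squared statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n   # expected count under independence
            stat += (fo - fe) ** 2 / fe
    df = (len(table) - 1) * (len(table[0]) - 1)      # (r-1)(c-1)
    return stat, df


# Hypothetical 2x2 table of observed counts (not from the course)
observed = [[30, 20],
            [20, 30]]
stat, df = chi_squared(observed)      # every fe is 25, so stat = 4 * (25 / 25) = 4.0

# Shortcut valid only for df = 1
p_value = 2 * (1 - NormalDist().cdf(stat ** 0.5))
```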

Probabilities, odds, and odds ratios
Given a probability, you can calculate an odds and a log odds.
odds = p / (1-p); 50/50 gives 1.0; range 0 to ∞
log odds = log(p / (1-p)) = log(p) – log(1-p); 50/50 gives 0.0; range -∞ to +∞
odds ratio = [ p1 / (1-p1) ] / [ p2 / (1-p2) ]
Given an odds, you can calculate a probability.
p = odds / (1 + odds)
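These conversions are short enough to write out directly. The sketch below uses hypothetical function names and natural logarithms; the example probabilities are arbitrary.

```python
import math


def odds(p):
    """Odds from a probability: p / (1 - p)."""
    return p / (1 - p)


def log_odds(p):
    """Natural log of the odds; 0.0 when p = .5."""
    return math.log(odds(p))


def prob_from_odds(o):
    """Probability from an odds: odds / (1 + odds)."""
    return o / (1 + o)


def odds_ratio(p1, p2):
    """Ratio of the odds in group 1 to the odds in group 2."""
    return odds(p1) / odds(p2)


even_odds = odds(0.5)             # 1.0
even_log_odds = log_odds(0.5)     # 0.0
back_to_p = prob_from_odds(3.0)   # 0.75
ratio = odds_ratio(0.75, 0.5)     # 3.0
```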

Measures of association with ordinal data
concordant observations C: in a pair, one is higher on both x and y
discordant observations D: in a pair, one is higher on x and lower on y
ties: in a pair, same on x or same on y
gamma ignores ties
tau-b is a gamma that adjusts for “ties”
gamma often increases with more collapsed tables
tau-b and gamma both have standard errors in computer output
tau-b can be interpreted as a correlation coefficient
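Counting concordant and discordant pairs is easier to see in code. The sketch below uses the standard definition gamma = (C - D) / (C + D), which is stated here as background rather than taken from the slide text, and a hypothetical 2x2 table whose rows and columns are assumed to be ordered from low to high. Tau-b is left out because its tie adjustment needs additional terms.

```python
def concordant_discordant(table):
    """Count concordant (C) and discordant (D) pairs in an ordered table."""
    rows, cols = len(table), len(table[0])
    c = d = 0
    for i in range(rows):
        for j in range(cols):
            for k in range(i + 1, rows):      # second case is higher on the row variable
                for m in range(cols):
                    if m > j:                 # also higher on the column variable
                        c += table[i][j] * table[k][m]
                    elif m < j:               # lower on the column variable
                        d += table[i][j] * table[k][m]
                    # m == j is a tie on the column variable: ignored
    return c, d


def gamma(table):
    """Gamma = (C - D) / (C + D); pairs tied on either variable are ignored."""
    c, d = concordant_discordant(table)
    return (c - d) / (c + d)


# Hypothetical ordered 2x2 table (rows: low/high x, columns: low/high y)
counts = [[30, 20],
          [20, 30]]
pairs = concordant_discordant(counts)   # C = 30*30 = 900, D = 20*20 = 400
g = gamma(counts)                       # 500 / 1300
```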