1 Sociology 601: Midterm review, October 15, 2009
Basic information for the midterm
Date: Tuesday October 20, 2009
Start time: 2 pm
Place: usual classroom, Art/Sociology 3221
Bring a sheet of notes, a calculator, and two pens or pencils
Notify me if you anticipate any timing problems
Review for the midterm:
terms
symbols
steps in a significance test
testing differences in groups
contingency tables and measures of association
equations

2 Important terms from chapter 1
Terms for statistical inference: population, sample, parameter, statistic
Key idea: You use a sample to make inferences about a population.

3 Important terms from chapter 2
2.1) Measurement: variable, interval scale, ordinal scale, nominal scale, discrete variable, continuous variable
2.2) Sampling: simple random sample, probability sampling, stratified sampling, cluster sampling, multistage sampling, sampling error
Key idea: Statistical inferences depend on measurement and sampling.

4 Important terms from chapter 3
3.1) Tabular and graphic description: frequency distribution, relative frequency distribution, histogram, bar graph
3.2) Measures of central tendency and variation: mean, median, mode, proportion, standard deviation, variance, interquartile range, quartile, quintile, percentile

5 Important terms from chapter 3
Key ideas:
1.) Statistical inferences are often made about a measure of central tendency.
2.) Measures of variation help us estimate certainty about an inference.

6 Important terms from Chapter 4
Terms: probability distribution, sampling distribution, sample distribution, normal distribution, standard error, central limit theorem, z-score
Key ideas:
1.) If we know what the population is like, we can predict what a sample might be like.
2.) A sample statistic gives us a best guess of the population parameter.
3.) If we work carefully, a sample can tell us how confident to be about our sample statistic.

7 Important terms from chapter 5
Terms: point estimator, estimate, unbiased, efficient, confidence interval
Key ideas:
1.) We have a standard set of equations we use to make estimates.
2.) These equations are used because they have specific desirable properties.
3.) A confidence interval provides your best guess of a parameter.
4.) A confidence interval also provides your best guess of how close your best guess (in point 3) will typically be to the parameter.
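A hedged Stata sketch of getting such an interval from summary statistics alone, using the immediate command cii (older three-argument syntax for a mean) with, for concreteness, the n = 100, Ybar = 508, sd = 100 values from A&F problem 6.8 quoted later in this review:
. * 95% confidence interval for a mean, from n, mean, and sd only
. cii 100 508 100, level(95)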

8 Important terms from chapter 6
6.1 – 6.3) Statistical inference: significance tests
Terms: assumptions, hypothesis, test statistic, p-value, conclusion, null hypothesis, one-sided test, two-sided test, z-statistic

9 Key Idea from chapter 6
A significance test is a ritualized way to ask about a population parameter.
1.) Clearly state assumptions.
2.) Hypothesize a value for a population parameter.
3.) Calculate a sample statistic.
4.) Estimate how unlikely it is for the hypothesized population to produce such a sample statistic.
5.) Decide whether the null hypothesis can be rejected.

10 More important terms from chapter 6
6.4, 6.7) Decisions and types of errors in hypothesis tests: type I error, type II error, power
Small sample tests: t-statistic, binomial distribution, binomial test
Key ideas:
1.) Modeling decisions and population characteristics can affect the probability of a mistaken inference.
2.) Small sample tests have the same principles as large sample tests, but require different assumptions and techniques.

11 symbols
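A rough sketch of the notation used throughout the course (a reconstruction, not the original slide's list):
μ = population mean, σ = population standard deviation, π = population proportion
Ȳ = sample mean, s = sample standard deviation, π̂ = sample proportion
n = sample size, se = standard error, df = degrees of freedom
H0 = null hypothesis, Ha = alternative hypothesis, α = significance level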

12 Significance tests, Step 1: assumptions
An assumption that the sample was drawn at random. This is pretty much a universal assumption for all significance tests.
An assumption about whether the variable has two outcome categories (proportion) or many intervals (mean).
An assumption that enables us to assume a normal sampling distribution. This assumption varies from test to test: some tests assume a normal population distribution, other tests assume different minimum sample sizes, and some tests do not make this assumption at all.
Declare the α-level at the start, if you use one.

13 Significance Tests, Step 2: Hypothesis
State the hypothesis as a null hypothesis. Remember that the null hypothesis is about the population from which you draw your sample. Write the equation for the null hypothesis. The null hypothesis can imply a one- or two-sided test. Be sure the statement and equation are consistent.
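For a test about a mean, for example, the hypotheses might be written as follows (a generic sketch, with μ0 standing for the hypothesized value):
H_0: \mu = \mu_0
H_a: \mu \neq \mu_0 \quad \text{(two-sided)} \qquad \text{or} \qquad H_a: \mu > \mu_0 \quad \text{(one-sided)}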

14 Significance Tests, Step 3: Test statistic
For the test statistic, write: the equation, your work, and the answer. Full disclosure maximizes partial credit. I recommend four significant digits at each computational step, but present three as the answer.
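As a worked sketch using the A&F problem 6.8 numbers quoted later in this review (n = 100, Ybar = 508, sd = 100, μ0 = 500): t = (508 − 500) / (100 / √100) = 8 / 10 = 0.80. In Stata, display shows the arithmetic:
. display (508 - 500) / (100/sqrt(100))
.8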

15 Significance tests, Step 4: p-value
Calculate an appropriate p-value for the test statistic.
Use the correct table for the type of test;
use the correct degrees of freedom, if applicable;
use a correct p-value for a one- or two-sided test, as you declared in the hypothesis step.
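Continuing that sketch, Stata's probability functions can substitute for the printed tables; ttail() returns the upper-tail t probability and normal() the cumulative standard normal probability.
. * two-sided p-value for t = 0.80 with 99 degrees of freedom
. display 2*ttail(99, 0.80)
. * two-sided p-value for z = 0.80
. display 2*(1 - normal(0.80))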

16 Significance Tests, Step 5: Conclusion
Write a conclusion:
report the p-value and your decision to reject H0 or not;
state what your decision means;
discuss the substantive importance of your sample statistic.

17 test statistics and 95% confidence intervals
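A sketch of the standard large-sample 95% confidence intervals covered in chapter 5 (a reconstruction, not necessarily the slide's own equations):
\bar{Y} \pm 1.96\,\frac{s}{\sqrt{n}} \quad \text{(mean)}
\hat{\pi} \pm 1.96\,\sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}} \quad \text{(proportion)}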

18 other important equations #1

19 other important equations #2
Know how to calculate:
medians
z-scores from Y-scores, and p-values from z-scores
z-scores from p-values, and Y-scores from z-scores
t-scores from Y-scores, and p-values from t-scores
t-scores from p-values, and Y-scores from t-scores
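A hedged Stata sketch of the functions behind those conversions (the numeric arguments are arbitrary illustrations):
. * cumulative probability from a z-score
. display normal(1.96)
. * z-score from a cumulative probability
. display invnormal(0.975)
. * t-score that cuts off .025 in the upper tail, with 99 degrees of freedom
. display invttail(99, 0.025)
. * upper-tail p-value from a t-score of 2.0 with 99 degrees of freedom
. display ttail(99, 2.0)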

20 Useful Stata output: immediate test for a sample mean using ttesti
. * for example, in A&F problem 6.8, n=100 Ybar=508 sd=100 and mu0=500
. ttesti 100 508 100 500, level(95)
The one-sample t test output lists Obs, Mean, Std. Err., Std. Dev., and the 95% confidence interval for x, along with the degrees of freedom (99) and, for Ho: mean(x) = 500, t statistics and p-values for Ha: mean < 500, Ha: mean != 500, and Ha: mean > 500.

21 Useful Stata output: immediate test for a sample proportion using prtesti
. * for proportion: in A&F problem 6.12, n=832 p=.53 and p0=.5
. prtesti 832 .53 .5, level(95)
The one-sample test of proportion output lists the number of obs, the Mean (sample proportion), Std. Err., and the 95% confidence interval for x, and, for Ho: proportion(x) = .5, z statistics and p-values for Ha: x < .5, Ha: x != .5, and Ha: x > .5.

22 Useful Stata output: small sample comparison of proportions using bitesti
bitesti takes the arguments N (number of trials), observed k (observed successes), and assumed p; the output reports N, observed k, expected k, assumed p, and observed p, along with exact binomial p-values:
Pr(k >= 2) (one-sided test)
Pr(k <= 2) (one-sided test)
Pr(k <= 2 or k >= 11) (two-sided test)

23 Useful Stata output: predicting the required sample size to estimate a population proportion using sampsi
. sampsi , alpha(.05) power(.5) onesample
The output reports the estimated sample size for a one-sample comparison of a proportion to a hypothesized value, the test Ho that p equals the hypothesized value (where p is the proportion in the population), the assumptions (alpha, two-sided; power; alternative p), and the estimated required sample size n.
Other sampsi commands to know:
. sampsi 12 13, sd(2.5) power(.5) onesample a(.01)
. sampsi , alpha(.05) n(100) onesample a(.05)

24 Useful Stata output: comparison of two means using ttesti
. ttesti #obs1 #mean1 #sd1 #obs2 #mean2 #sd2, unequal
The two-sample t test with unequal variances reports Obs, Mean, Std. Err., Std. Dev., and 95% confidence intervals for x, y, the combined sample, and the difference; Satterthwaite's degrees of freedom; and, for Ho: mean(x) - mean(y) = diff = 0, t statistics and p-values for Ha: diff < 0, Ha: diff != 0, and Ha: diff > 0.

25 Summary of the first half of the course
Terminology:
Greek letters represent population parameters.
Greek letters with a “hat” represent estimates of population parameters.
Roman letters represent sample statistics.
The goal of statistical inference is to make statements about a population based on information from a sample.
Variables may have nominal, ordinal, and interval scales.
The scale you use affects the power of your statistical test.
The scale you use also affects the possibility of erroneous inferences.

26 Descriptive statistics you need to know
How to interpret frequency distributions and relative frequency distributions.
Measures of central tendency: mean, median, mode (plus weighted means and the effects of outliers on means).
Measures of variation: range, variance, standard deviation, and the standard deviation in graphical form.
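With raw data in memory, a minimal Stata sketch (the variable names y and x are placeholders):
. * mean, standard deviation, median, and other percentiles of an interval variable
. summarize y, detail
. * frequency and relative frequency distribution of a categorical variable
. tabulate x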

27 Old equations for descriptive statistics:
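A sketch of the standard definitions (a reconstruction, not necessarily the slide's own equations):
\bar{Y} = \frac{\sum_i Y_i}{n}, \qquad s^2 = \frac{\sum_i (Y_i - \bar{Y})^2}{n-1}, \qquad s = \sqrt{s^2}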

28 sampling distributions
A sampling distribution of the mean is a probability distribution for the sample mean of all possible samples of size n for a population. The central limit theorem and the law of large numbers state that with increasing sample size, sampling distributions become narrower and more like a normal distribution. Sampling distributions are a basis for statistical inference: One uses n and the variance of cases within the sample to estimate the typical difference between a sample mean and the population mean.
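In symbols (a sketch consistent with the description above), the standard error of the sample mean is
\sigma_{\bar{Y}} = \frac{\sigma}{\sqrt{n}}, \qquad \text{estimated by} \qquad \widehat{se} = \frac{s}{\sqrt{n}}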

29 statistical inference
A point estimator is a sample statistic that predicts the value of a parameter. A hypothesis test uses a sample statistic to test a specific prediction about possible values of a parameter. A confidence interval predicts the distance of a point estimator from the population parameter (with a bit of difficult logic).

30 Tests for statistical significance
Key question: could some pattern in the sample merely be a result of random sampling error, or does it reflect a true pattern in the underlying population? Key terms: assumptions, hypothesis, test statistic, p-value, conclusion Other key concepts: rejecting Ho, fixed decision rules, type I and type II errors, power of a test, t-distribution

31 Old equations for statistical inference

32 Chapter 6: Significance Tests for Single Sample
sample size | mean | proportion
large | z-test for Ȳ − μ0 | z-test for π̂ − π0
small | t-test for Ȳ − μ0 | Fisher’s exact test

33 Equations for tests of statistical significance
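A sketch of the single-sample test statistics from chapter 6 (a reconstruction, not necessarily the slide's own equations):
z \text{ (or } t\text{)} = \frac{\bar{Y} - \mu_0}{s/\sqrt{n}}, \qquad
z = \frac{\hat{\pi} - \pi_0}{\sqrt{\pi_0(1-\pi_0)/n}}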

34 Chapter 7: Comparing scores for two groups
sample size | sample scheme | mean | proportion
large | independent | z-test for μ2 − μ1 | z-test for π2 − π1
large | dependent | z-test for D̄ | McNemar test
small | independent | t-test for μ2 − μ1 | Fisher’s exact test
small | dependent | t-test for D̄ | binomial test

35 Two Independent Groups: Large Samples, Means
It is important to be able to recognize the parts of the equation, what they mean, and why they are used. Equal variance assumption? NO
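A sketch of the usual large-sample form of that equation (unpooled standard error, consistent with the "NO" above):
z = \frac{\bar{Y}_2 - \bar{Y}_1}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}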

36 Two Independent Groups: Large Samples, Proportions
Equal variance assumption? YES (if proportions are equal then so are variances). df = N1 + N2 - 2
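A sketch of the pooled form implied by the "YES" above, where π̂ is the proportion in the two samples combined:
z = \frac{\hat{\pi}_2 - \hat{\pi}_1}{\sqrt{\hat{\pi}(1-\hat{\pi})\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}, \qquad
\hat{\pi} = \frac{n_1\hat{\pi}_1 + n_2\hat{\pi}_2}{n_1+n_2}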

37 Two Independent Groups: Small Samples, Means
7.3 Difference of two small sample means:
Equal variance assumption? SOMETIMES (assumed for ease of hand calculation); NO (in computer programs, which can avoid it).
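When the equal-variance assumption is made, the pooled estimate and test statistic are (a sketch):
s = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}, \qquad
t = \frac{\bar{Y}_2 - \bar{Y}_1}{s\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}, \qquad df = n_1+n_2-2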

38 Two Independent Groups: Small Samples, Proportions
Fisher’s exact test, available in Stata, SAS, or SPSS, calculates the exact probability of every possible table with the observed marginal totals.
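A hedged Stata sketch using the immediate two-way table command (the cell counts are arbitrary illustrations):
. * 2x2 table entered row by row; the exact option requests Fisher's exact test
. tabi 3 7 \ 8 2, exact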

39 Dependent samples: means and proportions
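A sketch of the usual dependent-sample statistics (a reconstruction, not necessarily the slide's own equations): for means, analyze the within-pair differences D; for proportions, McNemar's statistic uses the off-diagonal counts n12 and n21.
t = \frac{\bar{D}}{s_D/\sqrt{n}}, \qquad z = \frac{n_{12} - n_{21}}{\sqrt{n_{12} + n_{21}}}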

40 Chapter 8: Analyzing associations
Contingency tables and their terminology:
marginal distributions and joint distributions
conditional distribution of R, given a value of E (as counts or percentages in A & F)
marginal, joint, and conditional probabilities (as proportions in A & F)
“Are two variables statistically independent?”

41 Descriptive statistics you need to know
How to draw and interpret contingency tables (crosstabs)
Frequency and probability/percentage terms: marginal, conditional, joint
Measures of relationships: odds, odds ratios, gamma, and tau-b

42 Observed and expected cell counts
fo, the observed cell count, is the number of cases in a given cell.
fe, the expected cell count, is the number of cases we would predict in a cell if the variables were independent of each other.
fe = (row total × column total) / N
The equation for fe corrects for rows or columns with small totals.
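A quick arithmetic sketch with made-up totals: for a cell whose row has 40 cases and whose column has 60 cases in a table of N = 200,
. display 40*60/200
12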

43 Chi-squared test of independence
Assumptions: 2 categorical variables, random sampling, fe >= 5
Ho: the variables are statistically independent (crudely, the score for one variable is independent of the score for the other).
Test statistic: χ² = Σ (fo − fe)² / fe
p-value: from the χ² table, with df = (r − 1)(c − 1)
Conclusion: reject or do not reject based on the p-value and your prior α-level, if necessary. Then describe your conclusion.
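A hedged Stata sketch (cell counts are arbitrary illustrations): the immediate command tabi runs the chi-squared test directly from a table of counts.
. * 2x3 table entered row by row; chi2 requests the Pearson chi-squared test
. tabi 30 18 12 \ 13 25 22, chi2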

44 Probabilities, odds, and odds ratios.
Given a probability, you can calculate an odds and a log odds.
odds = p / (1 − p); a 50/50 probability gives odds of 1.0; the range is 0 to ∞
log odds = log(p / (1 − p)) = log(p) − log(1 − p); a 50/50 probability gives a log odds of 0.0; the range is −∞ to +∞
odds ratio = [p1 / (1 − p1)] / [p2 / (1 − p2)]
Given an odds, you can calculate a probability: p = odds / (1 + odds)
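A worked sketch with a made-up probability: if p = .75, the odds are .75/.25 = 3, and converting back, 3/(1 + 3) = .75.
. display .75/(1 - .75)
3
. display 3/(1 + 3)
.75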

45 Measures of association with ordinal data
Concordant observations C: in a pair, one observation is higher on both x and y.
Discordant observations D: in a pair, one observation is higher on x and lower on y.
Ties: in a pair, the same value on x or the same value on y.
Gamma (ignores ties); tau-b is a gamma-like measure that adjusts for ties.
Gamma often increases with more collapsed tables.
Tau-b and gamma both have standard errors in computer output.
Tau-b can be interpreted as a correlation coefficient.
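A hedged Stata sketch: with two ordinal variables x and y (placeholder names) in memory, tabulate reports both measures along with their asymptotic standard errors.
. tabulate x y, gamma taub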

