GOVT 201: Statistics for Political Science Spring 2010


1 Final Exam: Review

2 Final Exam: Review Topics
1. Nominal, Ordinal, Interval Data
2. Frequency Distributions, Column and Row Percentages
3. Mode, Median, Mean
4. Deviation, Variance, Standard Deviation
5. Probability
6. Random Samples
7. t Score
8. F Ratio
9. Chi-Square Tests
10. Regression Analysis

3 The Nature of Social Science Research
Using Numbers in Political Research: Levels of Measurement
Numbers serve important functions for researchers, depending on the level of measurement employed.
Nominal: Refers to discrete, mutually exclusive categories; each case can fit into only one category at a time. Used to classify, categorize, or label. Examples: party affiliation; voter vs. non-voter.
Ordinal: Involves ranking or ordering cases by the degree to which they possess a certain characteristic. Examples: social class, measurements of attitudes.
Interval-Ratio: Measurements for all cases are expressed in the same units, with equal intervals between points on the scale and either a real or a theoretical zero point. Examples: income, temperature, SAT scores, weight.

4 The Nature of Social Science Research
Levels of Measurement: Limitations
Nominal: Cannot indicate grade, ranking, quality (better or worse), higher or lower, or more or less. It is simply a label.
Ordinal: Provides a ranking, but not the magnitude of the difference between numbers or points on a scale; the intervals between points are not known. Example: teeth cleaning, filling, root canal. We can rank them, but how big is the difference between each?

5 The Nature of Social Science Research
Levels of Measurement: Strengths
Interval-Ratio: Indicates not only the order of categories but also the exact differences between them, using constant units of measurement with equal intervals between them. Example: temperature, where the difference between 60° and 70° is the same 10° as the difference between 70° and 80°.

6 The Nature of Social Science Research
Levels of Measurement: Strengths
Difference between interval and ratio:
Interval: artificial zero point. Zero degrees is cold, but it is still a temperature.
Ratio: absolute or true zero point. Zero age: birth.
Interval-ratio variables can be natural or invented. Some variables are interval level in their natural form (weight, number of siblings you have, hours you watch TV per day); others become interval because we scale them.

7 Cross-Tabulation: Column and Row
Seat Belt Use by Gender with Total Percents (the table from slide 9, with one column and one row labeled)

8 Cross-Tabulation: Marginals
Seat Belt Use by Gender with Total Percents (the table from slide 9, with the row and column marginals labeled)

9 Cross-Tabulation: Seat Belt Use by Gender with Total Percents (Table 2.16)
SB Use             Male          Female        Total
All the Time       144 (14.4%)   355 (35.6%)   499 (50.1%)
Most of the Time    66 (6.6%)    110 (11.0%)   176 (17.7%)
Some of the Time    58 (5.8%)     66 (6.6%)    124 (12.4%)
Seldom              39 (3.9%)     44 (4.4%)     83 (8.3%)
Never               60 (6.0%)     55 (5.5%)    115 (11.5%)
Total              367 (36.8%)   630 (63.2%)   997 (100.0%)

10 Cross-Tabulation: Seat Belt Use by Gender with Row Percents (Table 2.17)
SB Use             Male          Female        Total
All the Time       144 (28.9%)   355 (71.1%)   499 (100.0%)
Most of the Time    66 (37.5%)   110 (62.5%)   176 (100.0%)
Some of the Time    58 (46.8%)    66 (53.2%)   124 (100.0%)
Seldom              39 (47.0%)    44 (53.0%)    83 (100.0%)
Never               60 (52.2%)    55 (47.8%)   115 (100.0%)
Total              367 (36.8%)   630 (63.2%)   997 (100.0%)

11 Cross-Tabulation: Seat Belt Use by Gender with Column Percents (Table 2.18)
SB Use             Male           Female         Total
All the Time       144 (39.2%)    355 (56.3%)    499 (50.1%)
Most of the Time    66 (18.0%)    110 (17.5%)    176 (17.7%)
Some of the Time    58 (15.8%)     66 (10.5%)    124 (12.4%)
Seldom              39 (10.6%)     44 (7.0%)      83 (8.3%)
Never               60 (16.3%)     55 (8.7%)     115 (11.5%)
Total              367 (100.0%)   630 (100.0%)   997 (100.0%)

12 Cross-Tabulation: Choosing among Total, Row and Column Percents
When deciding which percent to use, the rule of thumb is: when the IV is on the rows, use row percents; when the IV is on the columns, use column percents.
Determining the IV and DV: It is not always easy to determine which variable is the independent variable and which is the dependent variable in a cross-tab. When in doubt, use total percents.
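As a rough illustration of the rule of thumb above, the Python sketch below computes total, row, and column percents from the cell counts of the seat belt table. The three kinds of percents differ only in the denominator: N for total percents, the row total for row percents, and the column total for column percents. The function name `percents` is illustrative, and the female "Some of the Time" count (66) is inferred from the row total of 124 minus the 58 males.

```python
# Cell counts from the seat belt cross-tab: rows = SB use, columns = (Male, Female).
COUNTS = {
    "All the Time": (144, 355),
    "Most of the Time": (66, 110),
    "Some of the Time": (58, 66),
    "Seldom": (39, 44),
    "Never": (60, 55),
}


def percents(counts, base):
    """Return cell percentages (1 decimal) using 'total', 'row', or 'column' as the base."""
    n = sum(sum(row) for row in counts.values())
    col_totals = [sum(col) for col in zip(*counts.values())]
    result = {}
    for label, row in counts.items():
        if base == "total":
            denoms = [n] * len(row)          # every cell divided by N
        elif base == "row":
            denoms = [sum(row)] * len(row)   # every cell divided by its row total
        elif base == "column":
            denoms = col_totals              # every cell divided by its column total
        else:
            raise ValueError("base must be 'total', 'row', or 'column'")
        result[label] = [round(100 * f / d, 1) for f, d in zip(row, denoms)]
    return result


for base in ("total", "row", "column"):
    print(base, percents(COUNTS, base)["All the Time"])
```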

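13 Cross-Tabulation (Table 2.19)
Each cell shows f / row % / column % / total %.
                      Wife: Democrat               Wife: Republican             Total
Husband: Democrat     70 / 70.0% / 63.6% / 36.8%   30 / 30.0% / 37.5% / 15.8%   100 (52.6%)
Husband: Republican   40 / 44.4% / 36.4% / 21.1%   50 / 55.6% / 62.5% / 26.3%    90 (47.4%)
Total                 110 (57.9%)                  80 (42.1%)                   190 (100.0%)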

14 Measures of Central Tendency
Measures of central tendency are numbers that describe what is average or typical in a distribution. We will focus on three measures of central tendency:
- The mode
- The median
- The mean (average)
Our choice of an appropriate measure of central tendency depends on three factors: (a) the level of measurement, (b) the shape of the distribution, and (c) the purpose of the research.

15 The Mode
The mode is the most frequent, most typical, or most common value or category in a distribution. Example: there are more Protestants in the US than people of any other religion, so Protestant is the modal category. The mode is always a category or score, not a frequency. The mode is not necessarily the category with the majority (that is, 50% or more) of cases; it is simply the category in which the largest number (or proportion) of cases falls.

16 Example of a Bimodal Frequency Distribution

17 The Median
The median is the score that divides the distribution into two equal parts, so that half of the cases are above it and half are below it. The median can be calculated for ordinal and interval levels of measurement, but not for nominal data. It must be emphasized that the median is the exact middle of a distribution. Now let's look at ways to find the median in sorted data.

18 The Mode and Median
The Median:
- Divides the distribution into two equal parts (the exact middle: 50% above and 50% below).
- Can be calculated for ordinal and interval levels of measurement, but not for nominal data.
- Requires sorted data to calculate.
The Mode:
- The most frequent or most common value or category.
- Is a category or score (not a frequency).
- Is not necessarily the majority.
- Used to describe nominal variables!

19 In some cases, we can find the median by simple inspection.
Let's look at the responses (A) to the question: "Think about the economy, how would you rate economic conditions in the country today?" First, we arrange the responses (B) in order from lowest to highest (or highest to lowest). Since we have an odd number of cases, we look for the middle case.
A (as given)            B (in order)
Jim     Poor            Jim     Poor
Sue     Good            Jorge   Only Fair
Bob     Only Fair       Bob     Only Fair
Jorge   Only Fair       Sue     Good
Karen   Excellent       Karen   Excellent
Total (N) 5             Total (N) 5

20 Calculating the median:
We can find the median through visual inspection or through calculation. To calculate it, we find the middle case by adding 1 to N and dividing by 2: (N + 1) ÷ 2. Since N is 5, (5 + 1) ÷ 2 = 3. The middle case is thus the third case (Bob), so the median response is "Only Fair."
In order: Jim (Poor), Jorge (Only Fair), Bob (Only Fair), Sue (Good), Karen (Excellent)

21 Median
N = 20: (N + 1) ÷ 2 = 21 ÷ 2 = 10.5, so the median lies halfway between the 10th and 11th cases.
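The (N + 1) ÷ 2 rule can be sketched in Python for ordinal responses like the economic-ratings example. The sketch assumes the four-point scale Poor < Only Fair < Good < Excellent, and the names `median_position` and `ordinal_median` are hypothetical. For an even N, such as 20, the position ends in .5; since averaging two ordinal categories is not meaningful, this sketch returns both middle categories when they differ.

```python
# Ordinal categories in order from lowest to highest (assumed scale).
SCALE = ["Poor", "Only Fair", "Good", "Excellent"]


def median_position(n):
    """Position of the middle case in sorted data: (N + 1) / 2."""
    return (n + 1) / 2


def ordinal_median(responses, scale):
    """Return the median category of ordinal responses.

    Sorts responses by their rank on the scale and reads off the case at
    position (N + 1) / 2. If that position ends in .5 and the two middle
    cases fall in different categories, both categories are returned as a tuple.
    """
    ordered = sorted(responses, key=scale.index)
    pos = median_position(len(ordered))
    lower = ordered[int(pos) - 1]  # convert 1-based position to 0-based index
    if pos.is_integer():
        return lower
    upper = ordered[int(pos)]
    return lower if lower == upper else (lower, upper)


ratings = ["Poor", "Good", "Only Fair", "Only Fair", "Excellent"]  # Jim, Sue, Bob, Jorge, Karen
print(median_position(len(ratings)), ordinal_median(ratings, SCALE))
```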

22 The Mean
The mean is what most people call the average. To find the mean of any distribution, simply add up all the scores and divide by the total number of scores. Here is the formula for calculating the mean:
X̄ = ΣX ÷ N
where X̄ is the mean, ΣX is the sum of the scores, and N is the total number of scores.

23 What’s the most frequent case (Mo)?
Home improvements/repairs: 45
Consolidate debts: 26
Other purposes: 9
Purchase auto: 9
Pay for education or medical: 4
Invest in other real estate: 3
Total (N = 6): 96
Mode (Mo): 9, the score shared by Other purposes and Purchase auto.
What is the middlemost score (Mdn)? 9, because (N + 1) ÷ 2 = (6 + 1) ÷ 2 = 3.5, so the median lies halfway between the 3rd and 4th scores, which are both 9.
What is the mean (X̄)? 16, because the sum of the scores is 96, and 96 ÷ 6 = 16.
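The three answers can be checked with Python's standard `statistics` module (`multimode` needs Python 3.8 or later). The "Purchase auto" score of 9 is taken from the slide's statement that it shares the modal score with "Other purposes"; the variable names are illustrative.

```python
import statistics

# One score per loan purpose, as listed on the slide.
scores = {
    "Home improvements/repairs": 45,
    "Consolidate debts": 26,
    "Other purposes": 9,
    "Purchase auto": 9,
    "Pay for education or medical": 4,
    "Invest in other real estate": 3,
}

values = list(scores.values())
modes = statistics.multimode(values)  # every most-frequent score
median = statistics.median(values)    # N = 6, so this averages the 3rd and 4th sorted scores
mean = statistics.mean(values)        # sum of scores / N

modal_purposes = [purpose for purpose, v in scores.items() if v in modes]
print(modes, modal_purposes, median, mean)
```

Here the mean (16) sits well above the median (9) because one extreme score (45) pulls it toward the tail, which is the pattern described on the next slide.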

24 So what does this tell us?
In a skewed distribution, the mode is the peak of the curve; the mean is found closest to the tail, where the relatively few extreme cases are; and the median falls between the mode and the mean. In a normal (symmetric) distribution, the three measures are aligned at the same point.

25 Measures of Variability
Just what is variability? Variability is the spread or dispersion of scores.
Measuring Variability: There are a few ways to measure variability, including:
1) The Range
2) The Mean Deviation
3) The Standard Deviation
4) The Variance

26 Variability Measures of Variability
Range: The range is a measure of the distance between the highest and lowest scores.
R = H – L
Temperature example:
Honolulu: 89° – 65° = 24°
Phoenix: 106° – 41° = 65°
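The range formula R = H – L can be expressed directly. The sketch below uses a hypothetical helper named `value_range` (chosen to avoid shadowing Python's built-in `range`), reproduces the two temperature ranges, and applies the same idea to the weeks-on-unemployment scores used on the variance slides.

```python
def value_range(high, low):
    """R = H - L: the highest score minus the lowest score."""
    return high - low


# Temperature example from the slide: (high, low) in degrees.
temps = {"Honolulu": (89, 65), "Phoenix": (106, 41)}
for city, (high, low) in temps.items():
    print(f"{city}: {value_range(high, low)} degrees")

# With raw scores, H and L are simply the largest and smallest values.
weeks = [9, 8, 6, 4, 2, 1]
print(value_range(max(weeks), min(weeks)))
```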

27 Variance and Standard Deviation
Variance: a measure of the dispersion of a sample (how closely the observations cluster around the mean, or average); also known as the mean of the squared deviations.
Standard Deviation: the square root of the variance; a measure of the variation in the observed values (how widely they cluster around the mean).

28 The Variance
The mean of the squared deviations is the same as the variance, and can be symbolized by s²:
s² = Σ(X – X̄)² ÷ N

29 Variance: Weeks on Unemployment:
Step 1: Calculate the mean.
Step 2: Calculate each deviation (raw score minus the mean).
Step 3: Square each deviation and calculate the sum of the squared deviations.
Step 4: Calculate the mean of the squared deviations.
X (weeks)   Deviation (X – X̄)   Squared deviation (X – X̄)²
9           9 – 5 = 4           4² = 16
8           8 – 5 = 3           3² = 9
6           6 – 5 = 1           1² = 1
4           4 – 5 = –1          (–1)² = 1
2           2 – 5 = –3          (–3)² = 9
1           1 – 5 = –4          (–4)² = 16
ΣX = 30                         Σ(X – X̄)² = 52 (weeks squared)
X̄ = 30 ÷ 6 = 5
Variance: s² = 52 ÷ 6 = 8.67

30 Variance: Raw Data

31 What is a standard deviation?
Standard Deviation: It is the typical (standard) difference (deviation) of an observation from the mean. Think of it as the average distance a data point is from the mean, although this is not strictly true.

32 What is a standard deviation?
Standard Deviation: The standard deviation is calculated by taking the square root of the variance.

33 Variance: Weeks on Unemployment:
Step 1: Calculate the mean.
Step 2: Calculate each deviation (raw score minus the mean).
Step 3: Square each deviation and calculate the sum of the squared deviations.
Step 4: Calculate the mean of the squared deviations.
Step 5: Calculate the square root of the variance.
X (weeks)   Deviation (X – X̄)   Squared deviation (X – X̄)²
9           9 – 5 = 4           4² = 16
8           8 – 5 = 3           3² = 9
6           6 – 5 = 1           1² = 1
4           4 – 5 = –1          (–1)² = 1
2           2 – 5 = –3          (–3)² = 9
1           1 – 5 = –4          (–4)² = 16
ΣX = 30                         Σ(X – X̄)² = 52 (weeks squared)
X̄ = 30 ÷ 6 = 5
Variance: s² = 52 ÷ 6 = 8.67
Standard deviation (square root of the variance): s = √8.67 = 2.94
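The five steps above translate directly into Python. This sketch follows the slides in dividing the sum of squared deviations by N (not N – 1); the variable names are illustrative.

```python
import math
import statistics

weeks = [9, 8, 6, 4, 2, 1]  # weeks on unemployment, from the slide

mean = sum(weeks) / len(weeks)                 # Step 1: mean
deviations = [x - mean for x in weeks]         # Step 2: deviation of each raw score
ss = sum(d ** 2 for d in deviations)           # Step 3: sum of squared deviations
variance = ss / len(weeks)                     # Step 4: mean of the squared deviations
sd = math.sqrt(variance)                       # Step 5: square root of the variance

print(mean, ss, round(variance, 2), round(sd, 2))

# Cross-check: statistics.pvariance and statistics.pstdev also divide by N.
print(statistics.pvariance(weeks), round(statistics.pstdev(weeks), 2))
```

Note that `statistics.variance` and `statistics.stdev` divide by N – 1 instead, which would give a larger variance (52 ÷ 5 = 10.4) for the same scores.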

34 Standard Error of the Difference between Means
Rarely do we know the standard deviation of the distribution of mean differences. Fortunately, it can be estimated from the two samples that we draw. This estimate is the standard error of the difference between means; its formula combines the information from the two samples.

35 Exam 4: Overview
Questions 5–8: A random sample was drawn to test differences in alcohol consumption (drinks per month) between public and private high school students. The results are as follows:
                    Private   Public
Mean                8.2       9.7
s (standard dev.)   1.3       1.8
N                   55        66

36 5. What is the standard error of the difference between means?
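The slide poses the question without showing the work. One common pooled-variance formula for the standard error of the difference, assuming each sample's s was computed by dividing by N (as on the variance slides), is sketched below; with the Exam 4 numbers it gives about .293, the value used in the t ratio that follows. The function name `se_diff_means` is hypothetical.

```python
import math


def se_diff_means(n1, s1, n2, s2):
    """Standard error of the difference between two sample means (pooled version).

    Assumes each s was computed by dividing by N:
        sqrt( (N1*s1^2 + N2*s2^2) / (N1 + N2 - 2) * (N1 + N2) / (N1 * N2) )
    """
    pooled = (n1 * s1 ** 2 + n2 * s2 ** 2) / (n1 + n2 - 2)
    return math.sqrt(pooled * (n1 + n2) / (n1 * n2))


# Private (N = 55, s = 1.3) vs. public (N = 66, s = 1.8) high school students
se = se_diff_means(n1=55, s1=1.3, n2=66, s2=1.8)
print(round(se, 3))
```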

37 We can now use our standard error result to convert the difference between sample means into a t ratio:
t = (X̄1 – X̄2) ÷ (standard error of the difference) = (8.2 – 9.7) ÷ .293 = –1.5 ÷ .293 = –5.12
REMEMBER: We use t instead of z because we do not know the true population standard deviation.

38 In Table C, use the df = 40 row, since df = 119 is not given.
We aren't finished yet! Turn to Table C.
df = N1 + N2 – 2 = 55 + 66 – 2 = 119
Because critical values shrink as df grows, reading the row for a smaller listed df (here, df = 40) is a conservative choice.
df   .20     .10     .05     .02     .01     .001
40   1.303   1.684   2.021   2.423   2.704   3.551
Our t-value of –5.12 (5.12 in absolute value) exceeds all of the standard critical points. Therefore, based on what we established BEFORE our study, we reject the null hypothesis at the .10, .05, or .01 level.
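A minimal sketch of the t ratio and the decision, taking the standard error of .293 from the slides and the df = 40 row of Table C as printed; the variable names are illustrative, and the comparison uses the absolute value of t because the listed critical points are positive.

```python
# Values from the slides
mean_private, mean_public = 8.2, 9.7
n_private, n_public = 55, 66
se_diff = 0.293

t = (mean_private - mean_public) / se_diff
df = n_private + n_public - 2

# Critical values for df = 40, as printed in the Table C row on the slide
critical_df40 = {.20: 1.303, .10: 1.684, .05: 2.021, .02: 2.423, .01: 2.704, .001: 3.551}

print(round(t, 2), df)
for alpha, crit in critical_df40.items():
    decision = "reject" if abs(t) > crit else "retain"
    print(f"alpha = {alpha}: |t| = {abs(t):.2f} vs {crit} -> {decision} H0")
```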

39 Variance: Groups: Sum of Squares
Questions 14–16: An addiction researcher is interested in relapses for those who are dependent on alcohol alone, drugs alone, or both. He selects 15 subjects, 5 representing each of these groups. The data are as follows:
                          Alcohol Alone   Drugs Alone   Drugs and Alcohol
N                         5               5             5
ΣX (sum of X)             17              9             18
Mean                      3.4             1.8           3.6
ΣX² (sum of X squared)    69              19            70

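40 Step 1: Find the mean for each sample
Already done: X̄1 = 3.4, X̄2 = 1.8, X̄3 = 3.6
Step 2: Calculate (1) the total sum of scores, (2) the total sum of squared scores, (3) the total number of subjects, and (4) the total mean:
1) Total ΣX = 17 + 9 + 18 = 44
2) Total ΣX² = 69 + 19 + 70 = 158
3) N total = 5 + 5 + 5 = 15
4) Total mean = 44 ÷ 15 = 2.93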

41 Sums of Squares
SStotal = (sum of squared scores) – N total (total mean)²
= 158 – 15(2.93)² = 158 – 15(8.58) = 158 – 128.7 = 29.3
SSwithin = (sum of squared scores) – Σ[N for each group (mean for each group)²]
= 158 – [5(3.4)² + 5(1.8)² + 5(3.6)²] = 158 – [5(11.56) + 5(3.24) + 5(12.96)] = 158 – 138.8 = 19.2
SSbetween = Σ[N for each group (mean for each group)²] – N total (total mean)²
= [5(3.4)² + 5(1.8)² + 5(3.6)²] – 15(2.93)² = 138.8 – 128.7 = 10.1
Check: SSbetween + SSwithin = 10.1 + 19.2 = 29.3 = SStotal

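42 Degrees of Freedom
df between = k – 1 = 3 – 1 = 2 (k = number of groups)
df within = N total – k = 15 – 3 = 12

43 Mean Squares
MSbetween = SSbetween ÷ df between = 10.1 ÷ 2 = 5.05
MSwithin = SSwithin ÷ df within = 19.2 ÷ 12 = 1.60

44 F Ratio
F = MSbetween ÷ MSwithin = 5.05 ÷ 1.60 = 3.16
Critical value (.05 level; df = 2 and 12) = 3.89
Since 3.16 is less than 3.89, retain the null hypothesis.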
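The sums-of-squares steps can be sketched in Python working straight from each group's N, ΣX, and ΣX². This sketch uses the unrounded total mean (44 ÷ 15 = 2.933...) instead of 2.93, so SSbetween comes out near 9.73 rather than the hand-rounded 10.1 on slide 41, and F comes out near 3.04. The conclusion is unaffected, since the .05 critical value of F for df = 2 and 12 is 3.89. The function name `one_way_anova` is hypothetical.

```python
# Per-group summary statistics from the slide: N, sum of X, sum of X squared
groups = {
    "Alcohol alone": {"n": 5, "sum_x": 17, "sum_x2": 69},
    "Drugs alone": {"n": 5, "sum_x": 9, "sum_x2": 19},
    "Drugs and alcohol": {"n": 5, "sum_x": 18, "sum_x2": 70},
}


def one_way_anova(groups):
    """One-way ANOVA from per-group N, sum of X, and sum of X squared."""
    n_total = sum(g["n"] for g in groups.values())
    sum_x2_total = sum(g["sum_x2"] for g in groups.values())
    grand_mean = sum(g["sum_x"] for g in groups.values()) / n_total

    # Sum over groups of N_group * (group mean)^2
    weighted_sq_means = sum(g["n"] * (g["sum_x"] / g["n"]) ** 2 for g in groups.values())

    ss_total = sum_x2_total - n_total * grand_mean ** 2
    ss_within = sum_x2_total - weighted_sq_means
    ss_between = weighted_sq_means - n_total * grand_mean ** 2

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return {
        "ss_total": ss_total,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df": (df_between, df_within),
        "F": f_ratio,
    }


result = one_way_anova(groups)
print({k: (round(v, 2) if isinstance(v, float) else v) for k, v in result.items()})
```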

45 Nonparametric Tests: Chi-Square
Two Nonparametric Tests: The Chi-Square Test: concerned with the distinction between expected frequencies and observed frequencies.

46 Nonparametric Tests: Chi-Square
Some things to know about chi-square:
1) It compares the distribution of one variable (DV) across the categories of another variable (IV).
2) It makes comparisons across frequencies rather than mean scores.
3) It is a comparison of what we expect to what we observe.
Null versus Research Hypotheses: The null hypothesis states that the populations do not differ with respect to the frequency of occurrence of a given characteristic, whereas the research hypothesis asserts that sample differences reflect population differences in the relative frequency of a given characteristic.

47 Nonparametric Tests: Chi-Square
Chi-Square Example: Political Orientation and Child Rearing
Null Hypothesis: The relative frequency or percentage of liberals who are permissive IS the same as the relative frequency of conservatives who are permissive.
Research Hypothesis: The relative frequency or percentage of liberals who are permissive is NOT the same as the relative frequency of conservatives who are permissive.

48 Chi Square: Example: Political Orientation and Child Rearing
Expected and Observed Frequencies: The chi-square test of significance is based on expected and observed frequencies. Expected frequencies (fe) are the frequencies we would expect to get if the null hypothesis is true, that is, if there is no difference between the populations. Observed frequencies (fo) are the results we actually obtain when conducting a study (they may or may not vary between groups). Only if the difference between expected and observed frequencies is large enough do we reject the null hypothesis and decide that a population difference does exist.

49 Political Orientation and Child-Rearing Methods: Observed Frequencies
                   Political Orientation
Child-Rearing      Liberals   Conservatives   Total
Permissive         13         7               20
Not Permissive     7          13              20
Total              20         20              N = 40
The Total column holds the row marginals; the Total row holds the column marginals.

50 Calculating Expected Frequencies
fe = (column marginal)(row marginal) ÷ N
Example: fe = (25)(20) ÷ 40 = 500 ÷ 40 = 12.5

51 fe = (column marginal)(row marginal) ÷ N
Example: fe = (25)(20) ÷ 40 = 500 ÷ 40 = 12.5
Observed frequencies, with expected frequencies in parentheses:
                   Liberals     Conservatives   Total
Permissive         15 (12.5)    10 (12.5)       25
Not Permissive      5 (7.5)     10 (7.5)        15
Total              20           20              N = 40
The answer is 12.5 (62.5% of 20, or .625 × 20, since 25 of the 40 cases are permissive). We then know that the expected frequency for not permissive is 7.5 (20 – 12.5).

52 The Chi-Square Test Formula
Once we have the observed and expected frequencies, we can use the following formula to calculate chi-square:
χ² = Σ (fo – fe)² ÷ fe
where fo = observed frequency in any cell and fe = expected frequency in any cell.

53 Nonparametric Tests: Chi-Square Tests
Columns: Observed | Expected | Subtract | Square | Divide by fe | Sum
After obtaining fo and fe, we subtract fe from fo, square the difference, divide by fe, and then add up the results across all cells.

54 Nonparametric Tests: Chi-Square Tests
Formula for Finding the Degrees of Freedom df = (r-1)(c-1) Where r = the number of rows of observed frequencies c = the number of columns of observed frequencies

55 Formula for Finding the Degrees of Freedom
Since there are two rows and two columns of observed frequencies in our 2 × 2 table:
df = (r – 1)(c – 1) = (2 – 1)(2 – 1) = (1)(1) = 1
Next step: Table E, where we will find a list of chi-square values that are significant at the .05 and .01 levels.
Table E (.05, df = 1): 3.84
Obtained χ² = 2.67
Since 2.67 is less than 3.84, retain the null hypothesis.
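The chi-square steps (expected frequencies from the marginals, then (fo – fe)² ÷ fe summed over cells, then df = (r – 1)(c – 1)) can be sketched in Python using the observed frequencies from the political orientation example on slide 51. The function names are hypothetical. For real analyses, `scipy.stats.chi2_contingency` performs the same test, but note that it applies Yates's correction to tables with df = 1 by default.

```python
# Observed frequencies: rows = (Permissive, Not permissive), columns = (Liberals, Conservatives)
observed = [
    [15, 10],
    [5, 10],
]


def expected_frequencies(obs):
    """fe = (column marginal)(row marginal) / N for every cell."""
    row_totals = [sum(row) for row in obs]
    col_totals = [sum(col) for col in zip(*obs)]
    n = sum(row_totals)
    return [[c * r / n for c in col_totals] for r in row_totals]


def chi_square(obs):
    """Sum over all cells of (fo - fe)^2 / fe."""
    exp = expected_frequencies(obs)
    return sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(obs, exp)
        for fo, fe in zip(obs_row, exp_row)
    )


df = (len(observed) - 1) * (len(observed[0]) - 1)
print(expected_frequencies(observed), round(chi_square(observed), 2), df)
```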

56 Yates's Correction
HOWEVER, when working with a 2 × 2 table in which any expected frequency is less than 10 but greater than 5, use Yates's correction, which reduces the difference between the expected and observed frequencies:
χ² = Σ (|fo – fe| – .5)² ÷ fe
The vertical bars indicate that we must reduce the absolute value (ignoring minus signs) of each fo – fe by .5.

57 Yates's Correction
Smoking Status by Nationality (observed frequencies, with expected frequencies in parentheses):
                  American      Canadian      Total
Nonsmokers        15 (11.67)     5 (8.33)     20
Smokers            6 (9.33)     10 (6.67)     16
Total             21            15            N = 36
Columns: Observed | Expected | Subtract | Subtract .5 | Square | Divide by fe | Sum
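Yates's correction changes only one step: each |fo – fe| is reduced by .5 before squaring. The sketch below applies it to the smoking-status table. It also stops the corrected difference at zero, a common safeguard when |fo – fe| is already below .5 (an assumption beyond the slide). The function name `chi_square_yates` is hypothetical; compare its result with the .05 critical value of 3.84 for df = 1.

```python
def chi_square_yates(obs):
    """Chi-square for a 2 x 2 table with Yates's correction.

    Sums (|fo - fe| - .5)^2 / fe, where fe = (column marginal)(row marginal) / N.
    """
    row_totals = [sum(row) for row in obs]
    col_totals = [sum(col) for col in zip(*obs)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(obs):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n
            corrected = max(abs(fo - fe) - 0.5, 0.0)  # shrink |fo - fe| by .5, never below 0
            total += corrected ** 2 / fe
    return total


# Rows = (Nonsmokers, Smokers), columns = (American, Canadian)
observed = [[15, 5], [6, 10]]
print(round(chi_square_yates(observed), 2))
```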

58 Regression Analysis Regression Model: Y = a + bX + e
Y = DV: sentence length.
X = IV: prior convictions.
a = Y-intercept (baseline): no priors (what Y is when X = 0).
b = slope (regression coefficient) for X: the amount Y changes for each one-unit change in X.
e = error term (what is unpredictable).

59 Regression Analysis: Y-Intercept (Baseline) and Regression Coefficient
Regression Model: How much is sentence length (DV) affected by the number of a defendant's prior convictions (IV)?
Y = a + bX + e
Y (DV, effect): sentence length
a (Y-intercept, baseline): no priors; Y when X = 0
b (slope, regression coefficient): amount Y changes for a one-unit change in X
X (IV, cause): number of priors
e (error term): what is unpredictable

60 Regression Analysis: Alternative Method
Regression Model: Y = a + bX + e
Calculating each variable:
X̄ = 4 (mean of priors)
Ȳ = 26 (mean of sentences)
Y = DV: Sentence Length. X = IV: Prior Convictions.
b [regression coefficient] = SP ÷ SSx, where SP = Σ(X – X̄)(Y – Ȳ) and SSx = Σ(X – X̄)²
a [Y-intercept] = Ȳ – bX̄

61 Regression Analysis
Calculating b [regression coefficient] for Y = a + bX + e:
b = SP ÷ SSx = 300 ÷ 100 = 3

62 Regression Model Regression Model: Y = a + bX + e
Calculating each variable:
X̄ = 4 (mean of priors)
Ȳ = 26 (mean of sentences)
Y = DV: Sentence Length. X = IV: Prior Convictions.
b [regression coefficient] = Σ(X – X̄)(Y – Ȳ) ÷ Σ(X – X̄)² = 300 ÷ 100 = 3
a [Y-intercept] = Ȳ – bX̄ = 26 – (3)(4) = 14
Y = a + bX + e, so the prediction equation is Y = 14 + 3X
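Both routes to b and a can be sketched in Python. `fit_from_summary` uses the slide's SP = 300, SSx = 100, and the two means; `fit_line` computes SP and SSx from raw paired scores. The raw data for the sentencing example are not shown, so the points used to check `fit_line` are hypothetical. Both function names are illustrative.

```python
def fit_line(x, y):
    """Least-squares line Y = a + bX from raw paired scores.

    b = SP / SSx, with SP = sum of (X - mean X)(Y - mean Y) and
    SSx = sum of (X - mean X)^2; then a = mean Y - b * mean X.
    """
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    b = sp / ss_x
    return mean_y - b * mean_x, b


def fit_from_summary(sp, ss_x, mean_x, mean_y):
    """Same intercept and slope when only SP, SSx, and the two means are known."""
    b = sp / ss_x
    return mean_y - b * mean_x, b


# Summary values from the slides: SP = 300, SSx = 100, mean priors = 4, mean sentence = 26
a, b = fit_from_summary(sp=300, ss_x=100, mean_x=4, mean_y=26)
print(f"Y = {a:g} + {b:g}X")
print("Predicted sentence length with 5 priors:", a + b * 5)
```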

63 Regression Analysis
Calculating the Regression Coefficient (Sum of Squares and Sum of Products)

