GOVT 201: Statistics for Political Science Spring 2010


Final Exam: Review

Final Exam: Review Topics
1. Nominal, Ordinal, Interval data
2. Frequency Distributions, Column and Row Percentages
3. Mode, Median, Mean
4. Deviation, Variance, Standard Deviation
5. Probability
6. Random samples
7. T Score
8. F Ratio
9. Chi-Square tests
10. Regression Analysis

The Nature of Social Science Research
Using Numbers in Political Research: Levels of Measurement
Numbers serve important functions for researchers, depending on the level of measurement employed.
Nominal: Refers to discrete or mutually exclusive categories. Individual cases can only fit into one category at a time. Used to classify, categorize or label. Example: party affiliation, voter, non-voter.
Ordinal: Involves the ranking or ordering of cases in terms of the degree to which they possess a certain characteristic. Example: social class, measurements of attitudes.
Interval-Ratio: Measurements for all cases are expressed in the same units. There are equal intervals between points on a scale and either a real or theoretical zero point. Example: income, temperature, SAT scores, weight.

The Nature of Social Science Research
Levels of Measurement: Limitations
Nominal: Cannot indicate grade, ranking, a quality scale (better or worse), higher or lower, or more or less. It is simply a label.
Ordinal: Provides a ranking, but not the magnitude of the difference between numbers or points on a scale. The intervals between points on the scale are not known:
Teeth Cleaning -------------- Filling ---- Root Canal
1 2 3 4 5
Difference?

The Nature of Social Science Research
Levels of Measurement: Strengths
Interval-Ratio: Allows you to indicate not only the order of categories but also the exact differences between them. Uses constant units of measurement with equal intervals between them.
Temperature: 80 ------------ 90 ------------ 100
Difference? (10 degrees between each pair of points.)

The Nature of Social Science Research
Levels of Measurement: Strengths
Difference between Interval and Ratio:
Interval: artificial zero point. Zero degrees is cold, but it is still a temperature.
Ratio: absolute or true zero point. Zero age is birth.
Interval-Ratio: Can Be Natural or Invented. Some variables in their natural form are interval level (weight, number of siblings you have, hours you watch TV per day). Others become interval because we scale them.

Cross-Tabulation: Columns, Rows, and Marginals
In the table of seat belt use by gender (Table 2.16), the columns are the gender categories (Male, Female), the rows are the seat belt use categories, and the marginals are the row and column totals.

Cross-Tabulation: Seat Belt Use by Gender with Total Percents (Table 2.16)
SB Use              Male           Female         Total
All the Time        144 (14.4%)    355 (35.6%)    499 (50.1%)
Most of the Time     66  (6.6%)    110 (11.0%)    176 (17.7%)
Some of the Time     58  (5.8%)     66  (6.6%)    124 (12.4%)
Seldom               39  (3.9%)     44  (4.4%)     83  (8.3%)
Never                60  (6.0%)     55  (5.5%)    115 (11.5%)
Total               367 (36.8%)    630 (63.2%)    997 (100.0%)

Cross-Tabulation: Seat Belt Use by Gender with Row Percents (Table 2.17)
SB Use              Male           Female         Total
All the Time        144 (28.9%)    355 (71.1%)    499 (100.0%)
Most of the Time     66 (37.5%)    110 (62.5%)    176 (100.0%)
Some of the Time     58 (46.8%)     66 (53.2%)    124 (100.0%)
Seldom               39 (47.0%)     44 (53.0%)     83 (100.0%)
Never                60 (52.2%)     55 (47.8%)    115 (100.0%)
Total               367 (36.8%)    630 (63.2%)    997 (100.0%)

Cross-Tabulation: Seat Belt Use by Gender with Column Percents (Table 2.18)
SB Use              Male           Female         Total
All the Time        144 (39.2%)    355 (56.3%)    499 (50.1%)
Most of the Time     66 (18.0%)    110 (17.5%)    176 (17.7%)
Some of the Time     58 (15.8%)     66 (10.5%)    124 (12.4%)
Seldom               39 (10.6%)     44  (7.0%)     83  (8.3%)
Never                60 (16.3%)     55  (8.7%)    115 (11.5%)
Total               367 (100.0%)   630 (100.0%)   997 (100.0%)

Cross-Tabulation: Choosing among Total, Row and Column Percents
When determining which percent to use, the rule of thumb is: when the IV is on the rows, use row percents; when the IV is on the columns, use column percents.
Determining the IV and DV: It is not always easy to determine which variable is the Independent Variable and which is the Dependent Variable in a cross-tab. When in doubt, use total percents.
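
The three kinds of percents differ only in the denominator: N for total percents, the row total for row percents, and the column total for column percents. The following is a minimal Python sketch of that arithmetic using the seat belt counts. The names `counts` and `percent_tables` are illustrative, and the Female count of 66 for "Some of the Time" is an assumption inferred from the Female column total of 630.

```python
# Cell counts from the seat belt use by gender table (Table 2.16).
# Each row is (Male, Female). The Female count for "Some of the Time" (66)
# is an assumption inferred from the Female column total of 630.
counts = {
    "All the Time": (144, 355),
    "Most of the Time": (66, 110),
    "Some of the Time": (58, 66),
    "Seldom": (39, 44),
    "Never": (60, 55),
}


def percent_tables(table):
    """Return (total, row, column) percents, each rounded to one decimal."""
    col_totals = [sum(col) for col in zip(*table.values())]
    n = sum(col_totals)
    total_pct, row_pct, col_pct = {}, {}, {}
    for label, row in table.items():
        row_total = sum(row)
        # Total percents: each cell divided by N.
        total_pct[label] = [round(100 * f / n, 1) for f in row]
        # Row percents: each cell divided by its row total.
        row_pct[label] = [round(100 * f / row_total, 1) for f in row]
        # Column percents: each cell divided by its column total.
        col_pct[label] = [round(100 * f / c, 1) for f, c in zip(row, col_totals)]
    return total_pct, row_pct, col_pct


total_pct, row_pct, col_pct = percent_tables(counts)
for label in counts:
    print(label, total_pct[label], row_pct[label], col_pct[label])
```

Here gender is treated as the IV and sits on the columns, so under the rule of thumb the column percents are the ones to compare.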

Cross-Tabulation: (Table 2.19)
Each cell shows f / Row % / Col % / Total %.
Husband        Wife: Dem                      Wife: Rep                      Total
Democrat       70 / 70.0% / 63.6% / 36.8%     30 / 30.0% / 37.5% / 15.8%     100 (52.6%)
Republican     40 / 44.4% / 36.4% / 21.1%     50 / 55.6% / 62.5% / 26.3%      90 (47.4%)
Total          110 (57.9%)                    80 (42.1%)                     190 (100.0%)

Measures of central tendency: Measures of central tendency are numbers that describe what is average or typical in a distribution.
We will focus on three measures of central tendency:
The Mode
The Median
The Mean (average)
Our choice of an appropriate measure of central tendency depends on three factors: (a) the level of measurement, (b) the shape of the distribution, (c) the purpose of the research.

The Mode: The mode is the most frequent, most typical or most common value or category in a distribution. Example: There are more Protestants in the US than people of any other religion. The mode is always a category or score, not a frequency. The mode is not necessarily the category with the majority (that is, 50% or more) of cases. It is simply the category in which the largest number (or proportion) of cases falls.

Example of a Bimodal Frequency Distribution

The Median The Median: The median is the score that divides the distribution into two equal parts so that half of the cases are above it and half are below it. The median can be calculated for both ordinal and interval levels of measurement, but not for nominal data. It must be emphasized that the median is the exact middle of a distribution. So, now let’s look at ways we can find the median in sorted data:

The Mode and Median
The Median: Divides the distribution into two equal parts (the exact middle, with 50% of cases above and 50% below). It can be calculated for both ordinal and interval levels of measurement, but not for nominal data. The data must be sorted before it is calculated.
The Mode: The most frequent or most common value or category. It is a category or score (not a frequency) and is not necessarily a majority. Used to describe nominal variables!

In some cases, we can find the median by simple inspection.
Let’s look at the responses (A) to the question: “Think about the economy, how would you rate economic conditions in the country today?” First, we arrange the responses (B) in order from lowest to highest (or highest to lowest). Since we have an odd number of cases, let’s find the middle case.
A (as collected)          B (sorted)
Jim     Poor              Jim     Poor
Sue     Good              Jorge   Poor
Bob     Only Fair         Bob     Only Fair
Jorge   Poor              Sue     Good
Karen   Excellent         Karen   Excellent
Total (N) = 5             Total (N) = 5

Calculating the median: We can find the median through visual inspection and through calculation. We can also find the middle case by adding 1 to N and dividing by 2: (N + 1) ÷ 2. Since N is 5, you calculate (5 + 1) ÷ 2 = 3. The middle case is thus the third case (Bob), and the median response is “Only Fair.”
1. Jim     Poor
2. Jorge   Poor
3. Bob     Only Fair  (median)
4. Sue     Good
5. Karen   Excellent

Median (even number of cases)
Sorted scores: 1 2 3 4 5 5 5 6 6 6 6 7 7 7 8 8 9 9 10 10
N = 20
(N + 1)/2 = 21/2 = 10.5
The median lies halfway between the 10th and 11th scores. Both are 6, so the median is 6.
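
The (N + 1) ÷ 2 rule can be sketched in Python as follows. This is only an illustration: the function name `median_by_position` is made up here, and the input is assumed to be sorted already. For an odd N it returns the middle case; for an even N it averages the two cases on either side of the half position, which only works for numeric scores.

```python
def median_by_position(sorted_values):
    """Return (position, median) using the (N + 1) / 2 rule.

    Assumes sorted_values is already in order. With an even N the two
    middle values are averaged, so they must be numeric in that case.
    """
    n = len(sorted_values)
    position = (n + 1) / 2
    if n % 2 == 1:
        # Odd N: the position is a whole number; convert to a 0-based index.
        return position, sorted_values[int(position) - 1]
    # Even N: the position ends in .5, so average the neighbours.
    lower = sorted_values[n // 2 - 1]
    upper = sorted_values[n // 2]
    return position, (lower + upper) / 2


# Five respondents, ordered from lowest to highest rating of the economy.
ordered_cases = ["Jim", "Jorge", "Bob", "Sue", "Karen"]
print(median_by_position(ordered_cases))

# Twenty sorted scores from the even-N example.
scores = [1, 2, 3, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8, 8, 9, 9, 10, 10]
print(median_by_position(scores))
```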

The Mean: The mean is what most people call the average. To find the mean of any distribution, simply add up all the scores and divide by the total number of scores. Here is the formula for calculating the mean: $\bar{X} = \frac{\sum X}{N}$, where $\sum X$ is the sum of the scores and $N$ is the number of scores.

Category                         Score
Home improvements/repairs        45
Consolidate debts                26
Other purposes                    9
Purchase auto                     9
Pay for education or medical      4
Invest in other real estate       3
Total (N = 6)                    96
What’s the most frequent case (Mo)? Other purposes and Purchase auto, because they both have the score of 9.
What is the middlemost score (Mdn)? 9, because (N + 1) ÷ 2 = (6 + 1) ÷ 2 = 3.5, so the median falls between the third and fourth scores, which are both 9.
What is the mean ($\bar{X}$)? 16, because the sum of the scores is 96 and we divide this by 6 to get 16.
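
As a quick check, the same three measures can be computed with Python's standard `statistics` module. This is a sketch, not part of the original slides; `statistics.multimode` requires Python 3.8 or later.

```python
import statistics

# The six scores from the table above.
scores = [45, 26, 9, 9, 4, 3]

mode = statistics.multimode(scores)   # list of the most frequent value(s)
median = statistics.median(scores)    # averages the two middle scores when N is even
mean = statistics.mean(scores)        # sum of scores divided by N

print("Mo =", mode, "Mdn =", median, "Mean =", mean)
```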

So what does this tell us? In a skewed distribution, the mode is the peak of the curve. The mean is found closest to the tail, where the relatively few extreme cases will be found. The median is found between the mode and the mean. In a normal distribution, all three are aligned.

Measures of Variability
Just what is variability? Variability is the spread or dispersion of scores.
Measuring Variability: There are a few ways to measure variability, and they include:
1) The Range
2) The Mean Deviation
3) The Standard Deviation
4) The Variance

Measures of Variability
Range: The range is a measure of the distance between the highest and lowest scores: R = H - L.
Temperature example:
Honolulu: 89° - 65° = 24°
Phoenix: 106° - 41° = 65°

Variance and Standard Deviation Variance: is a measure of the dispersion of a sample (or how closely the observations cluster around the mean [average]). Also known as the mean of the squared deviations. Standard Deviation: the square root of the variance, is the measure of variation in the observed values (or variation in the clustering around the mean).

The Variance: The mean of the squared deviations is the same as the variance, and can be symbolized by $s^2$: $s^2 = \frac{\sum (X - \bar{X})^2}{N}$

Variance: Weeks on Unemployment
Step 1: Calculate the mean.
Step 2: Calculate each deviation (raw score minus the mean).
Step 3: Square the deviations and calculate their sum.
Step 4: Calculate the mean of the squared deviations.
X (weeks)    Deviation (X - mean)    Squared deviation
9            9 - 5 = 4               4^2 = 16
8            8 - 5 = 3               3^2 = 9
6            6 - 5 = 1               1^2 = 1
4            4 - 5 = -1              (-1)^2 = 1
2            2 - 5 = -3              (-3)^2 = 9
1            1 - 5 = -4              (-4)^2 = 16
ΣX = 30                              Sum = 52
Mean: $\bar{X} = 30/6 = 5$
Variance: $s^2 = 52/6 = 8.67$ (weeks squared)

Variance: Raw Data

What is a standard deviation? Standard Deviation: It is the typical (standard) difference (deviation) of an observation from the mean. Think of it as the average distance a data point is from the mean, although this is not strictly true.

What is a standard deviation? Standard Deviation: The standard deviation is calculated by taking the square root of the variance.

Standard Deviation: Weeks on Unemployment
Step 1: Calculate the mean: $\bar{X} = 30/6 = 5$.
Step 2: Calculate each deviation (raw score minus the mean): for X = 9, 8, 6, 4, 2, 1 the deviations are 4, 3, 1, -1, -3, -4.
Step 3: Square the deviations and calculate their sum: 16 + 9 + 1 + 1 + 9 + 16 = 52.
Step 4: Calculate the mean of the squared deviations (the variance): $s^2 = 52/6 = 8.67$ (weeks squared).
Step 5: Calculate the square root of the variance (the standard deviation): $s = \sqrt{8.67} = 2.94$ weeks.
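
The five steps translate directly into a few lines of Python. This sketch divides by N, as the slides do (the mean of the squared deviations), rather than by N - 1; the variable names are illustrative.

```python
# Weeks on unemployment for six people.
weeks = [9, 8, 6, 4, 2, 1]

# Step 1: the mean.
mean = sum(weeks) / len(weeks)

# Step 2: deviation of each raw score from the mean.
deviations = [x - mean for x in weeks]

# Step 3: sum of the squared deviations.
sum_sq = sum(d ** 2 for d in deviations)

# Step 4: variance = mean of the squared deviations (divide by N).
variance = sum_sq / len(weeks)

# Step 5: standard deviation = square root of the variance.
sd = variance ** 0.5

print(mean, sum_sq, round(variance, 2), round(sd, 2))
```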

Standard Error of the Difference between Means
Rarely do we know the standard deviation of the distribution of mean differences. Fortunately, it can be estimated based on two samples that we draw from the same population. This estimate is the standard error of the difference between means. The formula for $s_{\bar{X}_1 - \bar{X}_2}$ combines the information from the two samples: $s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\left(\frac{N_1 s_1^2 + N_2 s_2^2}{N_1 + N_2 - 2}\right)\left(\frac{N_1 + N_2}{N_1 N_2}\right)}$

Exam 4: Overview
Questions 5-8: A random sample was drawn to test for differences in alcohol consumption (drinks per month) between public and private high school students. The results are as follows:
                      Private    Public
Mean                  8.2        9.7
s (standard dev.)     1.3        1.8
N                     55         66

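5. What is the standard error of the difference between means?
$s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\left(\frac{55(1.3)^2 + 66(1.8)^2}{55 + 66 - 2}\right)\left(\frac{55 + 66}{(55)(66)}\right)} = .293$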

We can now use our standard error result to convert the difference between sample means into a t ratio:
$t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}} = \frac{8.2 - 9.7}{.293} = \frac{-1.5}{.293} = -5.11$
REMEMBER: We use t instead of z because we do not know the true population standard deviation.

We aren’t finished yet! Turn to Table C.
df = N1 + N2 - 2 = 55 + 66 - 2 = 119
In Table C, use the critical values for df = 40, since 119 is not given.
df     .20      .10      .05      .02      .01      .001
40     1.303    1.684    2.021    2.423    2.704    3.551
We see that the absolute value of our t of -5.11 exceeds all the standard critical values. Therefore, based on what we established BEFORE our study, we reject the null hypothesis at the .10, .05, or .01 level.
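
The calculation for Questions 5-8 can be reproduced with a short Python sketch. The function name `se_difference` is illustrative, and the pooled formula with N·s² terms is an assumption: it is the form that reproduces the .293 reported in the slides. The critical value is taken from the df = 40 row of Table C quoted above.

```python
import math


def se_difference(n1, s1, n2, s2):
    """Standard error of the difference between two sample means.

    Assumed form: pool N * s^2 from both samples over (N1 + N2 - 2),
    then multiply by (N1 + N2) / (N1 * N2) and take the square root.
    """
    pooled = (n1 * s1 ** 2 + n2 * s2 ** 2) / (n1 + n2 - 2)
    return math.sqrt(pooled * (n1 + n2) / (n1 * n2))


# Private: mean 8.2, s 1.3, N 55. Public: mean 9.7, s 1.8, N 66.
se = se_difference(55, 1.3, 66, 1.8)
t = (8.2 - 9.7) / se
df = 55 + 66 - 2

# Two-tailed .05 critical value from the df = 40 row of Table C.
critical_05 = 2.021
reject_null = abs(t) > critical_05

print(round(se, 3), round(t, 2), df, reject_null)
```

Carrying the unrounded standard error through gives a t of about -5.12; the slides' -5.11 comes from dividing by the rounded .293 and truncating. The decision is the same either way.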

Variance: Groups: Sum of Squares
Questions 14-16: An addiction researcher is interested in relapses for those who are dependent on alcohol alone, drugs alone, or both. He selects 15 subjects, 5 representing each of these groups. The data are as follows:
                           Alcohol Alone    Drugs Alone    Drugs and Alcohol
N                          5                5              5
ΣX (sum of X)              17               9              18
Mean                       3.4              1.8            3.6
ΣX² (sum of X squared)     69               19             70

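Step 1: Find the mean for each sample (already given): $\bar{X}_1 = 3.4$, $\bar{X}_2 = 1.8$, $\bar{X}_3 = 3.6$.
Step 2: Calculate (1) the sum of scores, (2) the sum of squared scores, (3) the number of subjects, and (4) the total mean:
1) $\sum X_{total} = 17 + 9 + 18 = 44$
2) $\sum X^2_{total} = 69 + 19 + 70 = 158$
3) $N_{total} = 5 + 5 + 5 = 15$
4) $\bar{X}_{total} = 44/15 = 2.93$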

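Total sum of squares: sum of squared scores minus N total times the total mean squared.
$SS_{total} = \sum X^2_{total} - N_{total}\bar{X}^2_{total} = 158 - (15)(2.93)^2 = 158 - (15)(8.58) = 158 - 128.7 = 29.3$
Within-groups sum of squares: sum of squared scores minus the sum, over groups, of N for each group times the mean for each group squared.
$SS_{within} = \sum X^2_{total} - \sum N_{group}\bar{X}^2_{group} = 158 - [5(3.4)^2 + 5(1.8)^2 + 5(3.6)^2] = 158 - [5(11.56) + 5(3.24) + 5(12.96)] = 158 - [57.8 + 16.2 + 64.8] = 158 - 138.8 = 19.2$
Between-groups sum of squares: the sum, over groups, of N for each group times the mean for each group squared, minus N total times the total mean squared.
$SS_{between} = \sum N_{group}\bar{X}^2_{group} - N_{total}\bar{X}^2_{total} = [5(3.4)^2 + 5(1.8)^2 + 5(3.6)^2] - 15(2.93)^2 = 138.8 - 128.7 = 10.1$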

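Degrees of freedom (worked here for four groups and N total = 16):
$df_{between} = k - 1 = 4 - 1 = 3$
$df_{within} = N_{total} - k = 16 - 4 = 12$

Mean squares:
$MS_{within} = \frac{SS_{within}}{df_{within}} = \frac{47.5}{12} = 3.95$
$MS_{between} = \frac{SS_{between}}{df_{between}} = \frac{32.25}{3} = 10.75$

F ratio:
$F = \frac{MS_{between}}{MS_{within}} = \frac{10.75}{3.95} = 2.72$
With df = 3 and 12, the critical value of F is 3.49. Since 2.72 is less than 3.49, retain the null hypothesis.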
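
The same sequence of steps (sums of squares, degrees of freedom, mean squares, F ratio) can be applied to the three-group relapse data from Questions 14-16. The sketch below is an illustration, not the slides' own worked answer, and its variable names are made up. It uses N·mean² = (ΣX)²/N to avoid rounding the means, so it gives sums of squares of about 28.9 and 9.7 where the slides, rounding the total mean to 2.93, report 29.3 and 10.1.

```python
# Summary data from Questions 14-16: (sum of X, sum of X squared, N) per group.
groups = {
    "Alcohol Alone": (17, 69, 5),
    "Drugs Alone": (9, 19, 5),
    "Drugs and Alcohol": (18, 70, 5),
}

sum_x = sum(g[0] for g in groups.values())      # total sum of scores
sum_x2 = sum(g[1] for g in groups.values())     # total sum of squared scores
n_total = sum(g[2] for g in groups.values())    # total number of subjects
k = len(groups)                                 # number of groups

# N * mean^2 equals (sum of X)^2 / N, which avoids rounding the means.
n_mean_sq_total = sum_x ** 2 / n_total
n_mean_sq_groups = sum(sx ** 2 / n for sx, _, n in groups.values())

ss_total = sum_x2 - n_mean_sq_total
ss_within = sum_x2 - n_mean_sq_groups
ss_between = n_mean_sq_groups - n_mean_sq_total

df_between = k - 1
df_within = n_total - k

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_ratio = ms_between / ms_within

print(round(ss_total, 2), round(ss_within, 2), round(ss_between, 2))
print(df_between, df_within, round(f_ratio, 2))
```

The obtained F would then be compared with the tabled critical value for df = 2 and 12 at the chosen significance level.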

Nonparametric Tests: Chi-Square Two Nonparametric Tests: The Chi-Square Test: concerned with the distinction between expected frequencies and observed frequencies.

Nonparametric Tests: Chi-Square
Some things to know about chi-square:
1) It compares the distribution of one variable (DV) across the categories of another variable (IV).
2) It makes comparisons across frequencies rather than mean scores.
3) It is a comparison of what we expect to what we observe.
Null versus Research Hypotheses: The null hypothesis states that the populations do not differ with respect to the frequency of occurrence of a given characteristic, whereas the research hypothesis asserts that the sample difference reflects a population difference in the relative frequency of a given characteristic.

Nonparametric Tests: Chi-Square Chi Square: Example: Political Orientation and Child Rearing Null Hypothesis: The relative frequency or percentage of liberals who are permissive IS the same as the relative frequency of conservatives who are permissive. Research Hypothesis: The relative frequency or percentage of liberals who are permissive is NOT the same as the relative frequency of conservatives who are permissive.

Nonparametric Tests: Chi Square: Example: Political Orientation and Child Rearing
Expected and Observed Frequencies: The chi-square test of significance is defined by expected and observed frequencies.
Expected frequencies (fe) are the frequencies we would expect to get if the null hypothesis is true, that is, if there is no difference between the populations.
Observed frequencies (fo) are the results we actually obtain when conducting a study (they may or may not vary between groups).
Only if the difference between expected and observed frequencies is large enough do we reject the null hypothesis and decide that a population difference does exist.

Nonparametric Tests: Chi Square: Political Orientation and Child Rearing: Observed Frequencies
                           Political Orientation
Child-Rearing Methods      Liberals    Conservatives    Total (row marginal)
Permissive                 13          7                20
Not Permissive             7           13               20
Total (col. marginal)      20          20               N = 40

Calculating Expected Frequencies
$f_e = \frac{(\text{column marginal})(\text{row marginal})}{N}$
Example: $f_e = \frac{(25)(20)}{40} = \frac{500}{40} = 12.5$

Observed frequencies, with expected frequencies in parentheses:
                           Political Orientation
Child-Rearing Methods      Liberals      Conservatives    Total
Permissive                 15 (12.5)     10 (12.5)        25
Not Permissive             5 (7.5)       10 (7.5)         15
Total                      20            20               N = 40
The expected frequency for permissive liberals is 12.5 (62.5% of 20, or .625 x 20), because 25 of the 40 respondents (62.5%) are permissive. We then know that the expected frequency for not permissive liberals is 7.5 (20 - 12.5).

The Chi-Square Test Formula
Once we have the observed and expected frequencies, we can use the following formula to calculate chi-square:
$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$
Where:
fo = observed frequency in any cell
fe = expected frequency in any cell

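Nonparametric Tests: Chi-Square Tests
After obtaining fo and fe, we subtract fe from fo, square the difference, divide by fe, and then add up the results across all cells. For the political orientation and child-rearing table:
Observed (fo)   Expected (fe)   Subtract (fo - fe)   Square   Divide by fe
15              12.5            2.5                  6.25     .50
10              12.5            -2.5                 6.25     .50
5               7.5             -2.5                 6.25     .83
10              7.5             2.5                  6.25     .83
Sum: $\chi^2$ = 2.66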

Nonparametric Tests: Chi-Square Tests
Formula for Finding the Degrees of Freedom: df = (r - 1)(c - 1)
Where:
r = the number of rows of observed frequencies
c = the number of columns of observed frequencies

Formula for Finding the Degrees of Freedom
Since there are two rows and two columns of observed frequencies in our 2 x 2 table:
df = (r - 1)(c - 1) = (2 - 1)(2 - 1) = (1)(1) = 1
Next step: Table E, where we will find a list of chi-square values that are significant at the .05 and .01 levels.
Table E (.05, df = 1): 3.84
Obtained $\chi^2$ = 2.66
Since 2.66 is less than 3.84, retain the null hypothesis.
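
The whole procedure (expected frequencies from the marginals, the chi-square sum, degrees of freedom, and the comparison with Table E) can be sketched in Python as follows. The function names are illustrative and the critical value 3.84 is the Table E value quoted above.

```python
def expected_frequencies(observed):
    """fe = (row marginal)(column marginal) / N for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(observed):
    """Sum of (fo - fe)^2 / fe over all cells."""
    expected = expected_frequencies(observed)
    return sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(observed, expected)
        for fo, fe in zip(obs_row, exp_row)
    )


# Rows: Permissive, Not Permissive. Columns: Liberals, Conservatives.
observed = [[15, 10], [5, 10]]

expected = expected_frequencies(observed)
chi2 = chi_square(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)

critical_05 = 3.84  # Table E, .05 level, df = 1
retain_null = chi2 < critical_05

print(expected, round(chi2, 2), df, retain_null)
```

Without intermediate rounding the statistic is about 2.67; the slides' 2.66 comes from rounding each cell's contribution to two decimals before summing.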

Yates's Correction
HOWEVER, when working with a 2 x 2 table where any expected frequency is less than 10 but greater than 5, use Yates's correction, which reduces the difference between the expected and observed frequencies:
$\chi^2 = \sum \frac{(|f_o - f_e| - .5)^2}{f_e}$
The vertical bars indicate that we must reduce the absolute value (ignoring minus signs) of each fo - fe by .5.

Yates's Correction
Observed frequencies, with expected frequencies in parentheses:
                     Nationality
Smoking Status       American       Canadian       Total
Nonsmokers           15 (11.67)     6 (9.33)       21
Smokers              5 (8.33)       10 (6.67)      15
Total                20             16             N = 36
Computation columns: Observed, Expected, Subtract, Subtract .5, Square, Divide by fe, Sum.
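
A minimal Python sketch of the corrected statistic for the smoking table follows. It assumes the cell layout Nonsmokers = (15, 6) and Smokers = (5, 10), which is the arrangement consistent with the expected frequencies and marginals shown; the function name is illustrative.

```python
def chi_square_yates(observed):
    """Chi-square for a 2 x 2 table with Yates's correction.

    Each |fo - fe| is reduced by .5 before squaring.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(observed):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n
            total += (abs(fo - fe) - 0.5) ** 2 / fe
    return total


# Rows: Nonsmokers, Smokers. Columns: American, Canadian (assumed layout).
observed = [[15, 6], [5, 10]]
chi2_corrected = chi_square_yates(observed)
print(round(chi2_corrected, 2))
```

As with the uncorrected test, the result is compared with the Table E critical value for df = 1.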

Regression Analysis
Regression Model: Y = a + bX + e
Y = DV: Sentence length.
X = IV: Prior convictions.
a = Y-intercept (baseline): no priors (what Y is when X = zero).
b = Slope (regression coefficient) for X: the amount that Y changes for each one-unit change in X.
e = Error term (what is unpredictable).

Regression Analysis
Regression Model: How much is sentence length (DV) affected by the number of a defendant's prior convictions (IV: cause)?
Y = a + bX + e
Y: DV (effect). Sentence length.
a: Y-intercept (baseline). No priors (Y when X = 0).
b: Slope (regression coefficient). Amount Y changes for a one-unit change in X.
X: IV (cause). Number of priors.
e: Error term. Unpredictable.

Regression Analysis: Alternative Method
Regression Model: Y = a + bX + e
Y = DV: Sentence length. X = IV: Prior convictions.
Calculating each variable:
$\bar{X} = 4$ (mean of priors)
$\bar{Y} = 26$ (mean of sentences)
b [regression coefficient] $= \frac{SP}{SS_X}$, or: $b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$
a [y-intercept] $= \bar{Y} - b\bar{X}$

Regression Analysis
Calculating b [regression coefficient] for Y = a + bX + e:
$b = \frac{SP}{SS_X} = \frac{300}{100} = 3$

Regression Model: Y = a + bX + e
Y = DV: Sentence length. X = IV: Prior convictions.
Calculating each variable:
$\bar{X} = 4$ (mean of priors)
$\bar{Y} = 26$ (mean of sentences)
b [regression coefficient] $= \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{300}{100} = 3$
a [y-intercept] $= \bar{Y} - b\bar{X} = 26 - (3)(4) = 14$
Y = a + bX + e
Y = 14 + 3X
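
The slope and intercept calculations can be sketched in Python as follows. The first part uses only the summary values given above (SP = 300, SS_X = 100, means 4 and 26). The raw data in the second part are hypothetical, invented for illustration so that they have the same means and lie exactly on the line Y = 14 + 3X; they are not the slides' data, and `fit_line` is an illustrative name.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # SP: sum of products of the deviations of X and Y from their means.
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # SS_X: sum of squared deviations of X from its mean.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    b = sp / ss_x
    a = mean_y - b * mean_x
    return a, b


# Part 1: summary values from the slides.
sp, ss_x, mean_x, mean_y = 300, 100, 4, 26
b = sp / ss_x
a = mean_y - b * mean_x

# Predicted sentence length for a defendant with 5 prior convictions.
predicted = a + b * 5

# Part 2: hypothetical raw data (not from the slides).
priors = [0, 2, 4, 6, 8]
sentences = [14, 20, 26, 32, 38]
a_raw, b_raw = fit_line(priors, sentences)

print(a, b, predicted, a_raw, b_raw)
```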

Regression Analysis
Calculating the Regression Coefficient (Sum of Squares and Sum of Products)