Published by Belinda Richards. Modified over 6 years ago.
1
Dr. John V. Richardson, Professor UCLA GSE&IS DIS
UCLA DIS 280 Social Science Research Methodology: Bivariate Models
2
Bivariate Models
Goal and Principles
Definitions, Assumptions, & Interpretations
Statistical Tests: Chi-square; Pearsonian correlations
3
Goal: Relating and Describing
To relate a single variable (or other variables sequentially) to another variable.
"Describing a relation between two variables basically refers to the 'form' or 'shape' of the relationship as well as to the 'closeness' or 'degree' of association." SOURCE: Costner, p. 343
4
Bivariate Model Principles
A x B design. As you read the literature, look for assertions in the following form:
Y ⇐ X (i.e., Y is influenced by X)
where Y is the dependent variable and X is an independent variable. These are models of relationships between two variables.
5
Model Specification and Error
Economy
Model specification error: an overly simplistic view of reality
6
The Bivariate Model, Pros/Cons
Advantages: extremely economical; does not require extensive data collection; sophisticated computer support is unnecessary (paper and pen, or Excel, will do).
Disadvantages: considers single variables, one at a time; includes only manifest variables.
7
Chi-square Principle
"If one cannot use any other method, one can almost always partition or cross-partition subjects." SOURCE: Kerlinger, Foundations (1973), p. 438
8
Chi-Square Definitions
Contingency table: a table of cells (conditions) arranged in rows and columns.
Cross-classified data (e.g., hair color by eye color).
Observed versus expected values. Expected values could come from the literature, but most often they are calculated from the data.
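A minimal Python sketch of how expected values are usually calculated from the data: assuming the two variables are independent, each cell's expected frequency is its row total times its column total, divided by the grand total. The function name and the counts are hypothetical, for illustration only.

```python
def expected_counts(observed):
    """Expected frequency for each cell under independence:
    (row total x column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical 2 x 2 counts (rows: hair color, columns: eye color), for illustration only
observed = [[20, 10],
            [15, 25]]
expected = expected_counts(observed)
print(expected)  # [[15.0, 15.0], [20.0, 20.0]]
```

Note that the expected frequencies sum to 70, the same total as the observed frequencies.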
9
Degrees of Freedom: DF or df
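DF (or df) means the degrees of freedom, defined as the number of categories minus the number of restrictions. For a contingency table, df = (rows - 1) x (columns - 1), i.e., df = (r - 1)(c - 1). In a 2 x 2 contingency table, df = (2 - 1)(2 - 1) = 1. (For a single variable with k categories, as in a goodness-of-fit test, df = k - 1.)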
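As a sketch (plain Python, hypothetical function name and counts), the Pearson chi-square statistic sums (observed - expected)^2 / expected over every cell, and df follows the (r - 1)(c - 1) rule:

```python
def chi_square(observed):
    """Return (Pearson chi-square statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n  # expected frequency for cell (i, j)
            stat += (obs - exp) ** 2 / exp
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df


stat, df = chi_square([[20, 10], [15, 25]])  # hypothetical counts
print(round(stat, 3), df)  # 5.833 1
print((3 - 1) * (4 - 1))   # df for a 3 x 4 table: 6
```

The p-value would then be read from a chi-square table (or a statistics package) at the computed df.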
10
Chi-square Assumptions
Individual observations must be independent of each other.
Chi-square must be limited to frequency (counting) data, i.e., the nominal level.
The sum of the expected frequencies must equal the sum of the observed (actual) frequencies.
Rare categories will necessitate a large sample size.
11
More Chi-square Assumptions
The expected frequency must be ≥ 5 for each cell (i, j).
When df = 1, no expected frequency should be less than 5 unless Yates' correction is applied.
Because every cell needs an expected frequency of at least 5, a 3 x 4 contingency table (12 cells) requires at least 12 x 5 = 60 cases; statistical power then becomes an especially important issue.
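A minimal sketch of both checks, assuming the "at least 5 per cell" rule above. The first function gives only a lower bound, since it assumes cases spread evenly across cells; uneven margins push the requirement higher. Both function names are hypothetical.

```python
def minimum_cases(rows, cols, min_expected=5):
    """Smallest N that could give every cell an expected frequency of min_expected
    (assumes cases spread evenly across cells; uneven margins need more)."""
    return rows * cols * min_expected


def small_expected_cells(observed, min_expected=5):
    """Return (row, column, expected) for each cell whose expected frequency is below min_expected."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    flagged = []
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            exp = r * c / n
            if exp < min_expected:
                flagged.append((i, j, exp))
    return flagged


print(minimum_cases(3, 4))                      # 60
print(small_expected_cells([[3, 1], [2, 6]]))  # all four cells fall below 5
```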
12
Chi-square Interpretations
In 2x2 contingency tables, Goodman and Kruskal's tau-b is equal to phi squared (Costner, ASR, p. 351) and could be used as a proportional-reduction-in-error (PRE) measure.
For small samples (N < 30), use the continuity-adjusted chi-square or Fisher's exact test.
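A sketch for checking these statements numerically on a hypothetical 2 x 2 table. The tau computed here is Goodman and Kruskal's tau for predicting the column category from the row category (an assumption about which direction is meant; in a 2 x 2 table both directions reduce to phi squared). The Yates function uses the usual continuity-corrected formula; for a 2 x 2 table the uncorrected chi-square equals N times phi squared.

```python
def phi_squared(table):
    """phi^2 for a 2 x 2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    return (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def goodman_kruskal_tau(table):
    """Goodman and Kruskal's tau: proportional reduction in error when predicting
    the column category from the row category."""
    n = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    error_without_rows = 1 - sum((cj / n) ** 2 for cj in col_totals)
    error_with_rows = 1 - sum(nij ** 2 / (n * sum(row)) for row in table for nij in row)
    return (error_without_rows - error_with_rows) / error_without_rows


def yates_chi_square(table):
    """Continuity-corrected (Yates) chi-square for a 2 x 2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    adjusted = max(0.0, abs(a * d - b * c) - n / 2)
    return n * adjusted ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


table = [[10, 20], [30, 40]]  # hypothetical counts
print(phi_squared(table), goodman_kruskal_tau(table))  # equal, up to rounding
total = sum(sum(row) for row in table)
print(total * phi_squared(table), yates_chi_square(table))  # uncorrected vs. corrected
```

The continuity correction always shrinks the statistic, which makes the test more conservative in small samples.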
13
Goal of Correlations (r²)
To expose and describe the dependence of two or more ratio level variables
14
Principle
As one variable changes, so does the other. For example, as one variable increases, the other decreases, although not necessarily on a one-to-one basis; this is known as an inverse (negative) relationship.
A measure of association, not causation.
15
Relationships (Busha & Harter, 1980)
16
Correlation Definitions
11/21/2018
r_xy values range from -1 to +1, e.g., -1.0, 0.0, or +1.0.
In other words, if r = +1 or -1, the relationship is perfectly linear: direct (positive) or inverse (negative), respectively. If r = 0, there is no linear relationship.
r² is the proportion of the variance in X associated with Y (and vice versa).
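To make the definitions concrete, here is a minimal sketch of Pearson's r written out from its definition (sum of cross-products of deviations divided by the square root of the product of the sums of squares). The data values are made up for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation computed from deviations about the means."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # 1.0: perfect direct (positive) linear relationship
print(pearson_r(x, [10, 8, 6, 4, 2]))  # -1.0: perfect inverse (negative) linear relationship
r = pearson_r(x, [2, 1, 4, 3, 5])
print(r, r ** 2)  # r = 0.8, so about 64% of the variance is shared
```

Python 3.10 and later also provide `statistics.correlation`, which computes the same coefficient.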
17
Meaningful Calculations
In order for the calculated Pearson correlation value to be meaningful, the underlying data must meet certain assumptions:
Paired data
Linearity of the data
Normality, measured by skewness and kurtosis (in a normal distribution, the mean, median, and mode stack up on top of each other)
Homoscedasticity (equal standard deviations in the arrays)
18
Basic Correlational Assumptions
Paired data: data must be paired on the same individual case, e.g., the same library, the same book, or the same user.
Linearity: examine the data in an XY chart (scatterplot). If the points fall along a straight line, the relationship is called linear because it has a straight-line appearance. If a curvilinear relationship exists, then r underestimates the strength of the relationship; extreme outliers could be dropped.
19
Normal Distribution (Busha & Harter, 1980)
20
Skewness and Kurtosis
Skewness and kurtosis affect correlations.
Skewness refers to asymmetrical data: a distribution may be either positively or negatively skewed.
Kurtosis is the thickness or thinness of the tails: a leptokurtic ("peaked") distribution has heavy (long) tails, while a platykurtic ("flat") distribution has light (short) tails. A distribution with the same kurtosis as the normal curve is called mesokurtic.
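One common moment-based way to quantify these two ideas is shown below. This is an assumption about which formulas are intended: statistical packages differ, some apply small-sample corrections, and some report kurtosis without subtracting 3 (so that the normal curve scores 3 rather than 0).

```latex
m_k = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^k, \qquad
g_1 = \frac{m_3}{m_2^{3/2}}, \qquad
g_2 = \frac{m_4}{m_2^{2}} - 3
```

Here g_1 > 0 indicates positive skew (a long right tail) and g_1 < 0 negative skew; g_2 = 0 is mesokurtic, g_2 > 0 leptokurtic, and g_2 < 0 platykurtic.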
21
Positively Skewed
22
Negatively Skewed
24
Test for Non-Normality
Absolute values greater than 2 for skewness or kurtosis suggest non-normality; hence, some type of transformation may be necessary. Even if skewness and kurtosis are under 2, it is still necessary to plot the variable(s) and visually check for a normal distribution (also called the Gaussian curve or distribution, named after K. F. Gauss, mathematician and astronomer), more illustratively known as the bell-shaped curve.
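A minimal sketch of this screening rule. Assumptions: "kurtosis" is read as excess kurtosis (normal = 0), "greater than 2" is applied to absolute values, and a log transformation is used as one common remedy for positive skew (it requires all values to be positive). The function names and the data are hypothetical.

```python
import math


def central_moments(data):
    """Return the 2nd, 3rd, and 4th central moments (divided by n)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m2, m3, m4


def skewness(data):
    m2, m3, _ = central_moments(data)
    return m3 / m2 ** 1.5


def excess_kurtosis(data):
    m2, _, m4 = central_moments(data)
    return m4 / m2 ** 2 - 3


def looks_non_normal(data, cutoff=2.0):
    """Flag a variable whose |skewness| or |excess kurtosis| exceeds the cutoff."""
    return abs(skewness(data)) > cutoff or abs(excess_kurtosis(data)) > cutoff


# Hypothetical positively skewed counts (e.g., circulation figures), for illustration only
raw = [1, 1, 2, 2, 3, 3, 4, 5, 8, 13, 40, 120]
logged = [math.log(v) for v in raw]

print(round(skewness(raw), 2), round(excess_kurtosis(raw), 2))
print(round(skewness(logged), 2), round(excess_kurtosis(logged), 2))
print(looks_non_normal(raw), looks_non_normal(logged))  # True False
```

Passing this numeric check does not replace plotting the transformed variable.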
25
More Correlational Assumptions
Homoscedasticity
With lots of data (say, 500 cases), each array (the set of Y values observed at one value of X, i.e., one column of the scatterplot) looks like a frequency distribution with a normal curve. A standard deviation (s) can be computed for each array, and s should be equal across the arrays. When this holds, one can speak of homoscedasticity, meaning that the standard deviations (s) of the column arrays are equal.
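A sketch of the array-by-array check, assuming X takes only a few distinct values (with continuous X one would first group X into intervals, a step not shown here). It uses the sample standard deviation from Python's `statistics` module; the data and the function name are hypothetical.

```python
import statistics
from collections import defaultdict


def array_std_devs(x, y):
    """Group the Y values by X (one array per X value) and return each array's sample SD."""
    arrays = defaultdict(list)
    for xi, yi in zip(x, y):
        arrays[xi].append(yi)
    return {xi: statistics.stdev(ys) for xi, ys in sorted(arrays.items()) if len(ys) > 1}


# Hypothetical data: three X values with five Y observations each
x = [1] * 5 + [2] * 5 + [3] * 5
y_even = [3, 4, 5, 6, 7, 5, 6, 7, 8, 9, 7, 8, 9, 10, 11]      # same spread in every array
y_fanning = [4, 5, 5, 5, 6, 2, 4, 6, 8, 10, 0, 4, 8, 12, 16]  # spread grows with X

print(array_std_devs(x, y_even))     # equal SDs: homoscedastic
print(array_std_devs(x, y_fanning))  # SDs grow with X: heteroscedastic
```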
26
One Last Caution…
“The use of Pearson’s product moment-correlation coefficient (r) is not an appropriate method for values which are not bi-variantly normally distributed. Using logarithms for residual levels shows that residues are not normally distributed around a mean value. Non parametric statistics should be used.” [such as Spearman’s rank correlation] SOURCE: Nature 239 (13 October 1972): 410.
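Spearman's rank correlation is Pearson's r computed on the ranks of the data, with tied values given the average of the ranks they span. A minimal sketch follows (the function names and data are hypothetical); in Python 3.12 and later, `statistics.correlation(x, y, method="ranked")` offers the same calculation.

```python
import math


def ranks(values):
    """Average ranks (1 = smallest); tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # sorted positions i..j (0-based) -> ranks i+1..j+1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    return pearson_r(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5, 6]
y = [1, 2, 3, 4, 5, 1000]  # monotonic, but with one extreme value
print(pearson_r(x, y))     # pulled well below 1 by the extreme value
print(spearman_rho(x, y))  # 1.0: the ranks agree perfectly
```

Because only the order of the values matters, the rank-based coefficient does not depend on the data being normally distributed.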
27
Correlational Interpretation
In assigning qualitative terms to meaningful correlation values, common practice says:
slight: < .20
low: .20 to .40
moderate: .40 to .70
high: .70 to .90
very high: > .90
SOURCE: J. P. Guilford, Fundamental Statistics in Psychology & Education, p. 145
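The bands above can be sketched as a small lookup function. Two assumptions are not in the source: the sign of r is ignored (a correlation of -.55 gets the same label as +.55), and each band includes its lower boundary, so exactly .40 counts as "moderate". The function name is hypothetical.

```python
def guilford_label(r):
    """Qualitative label for a correlation following Guilford's bands.
    Assumptions: the sign is ignored, and each band includes its lower bound."""
    size = abs(r)
    if size < 0.20:
        return "slight"
    if size < 0.40:
        return "low"
    if size < 0.70:
        return "moderate"
    if size < 0.90:
        return "high"
    return "very high"


for r in (0.15, 0.35, -0.55, 0.80, 0.95):
    print(r, guilford_label(r))
```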
28
Correlation Interpretations
Correlations are especially sensitive to outliers. Remedies include "winsorizing the mean" (replacing the most extreme values, e.g., 5% on each end, with the nearest remaining values) or trimming (throwing out 5% on each end).
Coefficient of Alienation: k = √(1 - r²), an index indicating lack of relationship.
Index of Forecasting Efficiency: E = 100 x (1 - √(1 - r²)). For example, if r = .6, then E = 20%; if r = .9, then E ≈ 56%.
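A sketch of the arithmetic above, reproducing the 20% and 56% examples, plus a simple winsorizing helper. The helper's rule for how many values to replace, k = floor(n x proportion) at each end, is an assumption; other conventions exist. The function names are hypothetical.

```python
import math


def coefficient_of_alienation(r):
    """sqrt(1 - r^2): an index of the lack of relationship."""
    return math.sqrt(1 - r ** 2)


def forecasting_efficiency(r):
    """Index of forecasting efficiency, in percent: (1 - sqrt(1 - r^2)) * 100."""
    return (1 - coefficient_of_alienation(r)) * 100


def winsorize(values, proportion=0.05):
    """Replace the lowest and highest `proportion` of values with the nearest remaining values.
    Assumption: k = floor(n * proportion) values are replaced at each end."""
    n = len(values)
    k = int(n * proportion)
    if k == 0:
        return list(values)
    ordered = sorted(values)
    low, high = ordered[k], ordered[n - k - 1]
    return [min(max(v, low), high) for v in values]


print(round(forecasting_efficiency(0.6)), round(forecasting_efficiency(0.9)))  # 20 56
print(winsorize(list(range(1, 20)) + [1000]))  # 1 becomes 2 and 1000 becomes 19
```

Unlike trimming, winsorizing keeps the number of cases unchanged, which matters when the data must stay paired.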
29
A Correlational Study
"Readability and Readership of Journals in Library Science," Journal of Academic Librarianship 3 (March 1977):
Paired data: Readability (Flesch, ratio level); Circulation (ratio level)
30
Correlational Assumptions?
Bivariate model: Y ⇐ X (i.e., Y (circulation) is influenced by X (readability))
Normality of data: What is the skewness? What about kurtosis?
Homoscedasticity
Interpretation of the calculated value: moderate, high, or very high?
31
Alpha, Power, & Sample Size
32
Proper Role of Statistician
Review of inquiry methodology, especially instrumentation
Assist in selection of appropriate statistics
Check to make sure that all assumptions of tests are met, including data screening, normality, and any necessary transformations
Interpretation of statistical values
33
Questions
Does the methodological section indicate that data screening (e.g., missing data, outliers, and checks for normality) will be performed?
Are the statistical tests appropriate?
Have all the assumptions of these tests been examined? Will these assumptions be met?
34
Just a Gentle Reminder...
Remove the disk from the drive now!