# Quantitative Research Methods

Overview: Intro to quantitative methods
- A Number: “… the characteristic of an individual by which it is treated as a unit or of a collection by which it is treated in terms of units”
- A Variable: “A concept or characteristic that contains variation”
- Measurement: “The assignment of numbers to indicate different values of a variable”
- Techniques or Instruments

Measurement: Its purposes
- Measurement <=> Variables: … to provide information about the variables that are being studied.
- Measurement <=> Invalid Research?: … to provide the basis for the results, conclusions, and significance of the research.
- The way in which the numbers are used to describe something determines the amount of information that is communicated.
- A useful classification of this process is referred to as scales of measurement.

Measurement Scale (or Levels of Measurement): N O I R
- Nominal (N): numbers assigned to categories
- Ordinal (O): the above, plus numbers rank-ordered
- Interval (I): the above, plus equal intervals between numbers
- Ratio (R): the above, plus numbers expressed as ratios

Descriptive Statistics
- Descriptive statistics: used to help describe a group of numbers.
- What can we say about the following set of numbers? [Figure: a set of scores with tally marks; the values are not recoverable]
- Frequency distribution: indicates how often each score is obtained.

Symmetrical Distribution
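Measures of Variability
- Range: “Difference between the highest and lowest score”, e.g., (28 - 15) = 13
- Standard deviation: “Average distance of the scores from the mean”, e.g., SD = 4

$$SD = \sqrt{\frac{\sum (x - M)^2}{N}}$$

where x = individual scores, M = mean, N = number of scores in group.

Shapes of distributions (order of the measures of central tendency, left to right):
- Symmetrical distribution: mean, median, and mode coincide
- Positive skew: mode, median, mean
- Negative skew: mean, median, mode

[Figure: normal curve with the x-axis marked at -2σ, -1σ, +1σ, +2σ]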
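As a rough illustration of the range and standard deviation described above, the sketch below computes both for a made-up set of scores. The scores are hypothetical (they are not the data from the slide) and were chosen only so that the range comes out as 28 - 15 = 13; dividing by N rather than N - 1 is an assumption about the formula the slide had in mind.

```python
import math
import statistics

# Hypothetical scores, chosen only so that the range matches the slide's 28 - 15 = 13.
scores = [15, 18, 20, 21, 22, 22, 24, 25, 28]

# Range: difference between the highest and lowest score.
score_range = max(scores) - min(scores)

# Standard deviation: square root of the mean squared distance from the mean
# (dividing by N, the number of scores, which is an assumption).
m = sum(scores) / len(scores)
sd = math.sqrt(sum((x - m) ** 2 for x in scores) / len(scores))

print("range:", score_range)
print("SD:", round(sd, 2))
print("mean, median, mode:",
      round(statistics.mean(scores), 2),
      statistics.median(scores),
      statistics.mode(scores))
```

Comparing the mean, median, and mode in the last line is a quick way to check for skew: in a symmetrical distribution the three sit close together.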

Pearson Product-Moment Correlation
- Correlation: a measure of the relationship between two or more quantitative variables.
- Correlation coefficient: a number between -1 and +1 which indicates the direction and strength of the relationship.
- Example: Pearson product-moment correlation, r = .45
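To make the correlation coefficient concrete, here is a minimal sketch of the Pearson product-moment calculation. The function name `pearson_r` and the two small data sets are illustrative assumptions, not material from the presentation.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from each mean.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the sums of squared deviations.
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cross / (spread_x * spread_y)

# Hypothetical data: hours studied and quiz scores for five students.
hours = [1, 2, 3, 4, 5]
quiz = [2, 4, 5, 4, 5]

r = pearson_r(hours, quiz)
print(round(r, 2))  # sign gives the direction, size gives the strength
```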

Correlation Some examples

Correlation vs. Causation

Measurement: The basics
- Construct = a characteristic that can’t be directly measured, e.g., intelligence
- Operational definition = a breakdown of what the elements are of that construct (e.g., verbal, quantitative, and analytical ability), or what that construct “looks like” in reality
- Measure = a numerical representation of part of the construct, e.g., items on an IQ test
- Measures have to be both reliable and valid

Reliability and Validity
- Reliability = consistency of results …
  - No matter when something is measured
  - No matter how it is measured (measured well!)
  - No matter where it is measured
- Validity = accuracy …
  - Measuring the right thing, and
  - Measuring the thing right!

Reliable AND Valid

Reliable but not Valid

Not Reliable (so not valid either)

Construct Validity
Construct validity (checking we have measured the right thing and have measured it right) consists of several elements, including:
- Content validity: the measure covers everything it needs to cover (e.g., intelligence test covers verbal, quantitative, & analytic abilities)
- Convergent and discriminant validity: it correlates with other related tests (e.g., other cognitive ability tests), and not with what it shouldn’t be related to (e.g., personality)
- Criterion-related validity: it predicts what it should predict (e.g., IQ score predicts GPA)
- Face validity: it looks valid to people

Reliability
- Internal consistency: all the items (single questions) within a scale (set of items added up) are measuring the same thing (incl. split-half reliability, Cronbach’s alpha, and some others)
- Equivalence: different forms of the test generate about the same scores
- Stability, a.k.a. test-retest reliability: people score about the same no matter when they take it (assuming no change has occurred in between)
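Cronbach’s alpha, mentioned above, can be computed from the item variances and the variance of the total score. The following is a small sketch using the usual textbook formula; the function name `cronbach_alpha` and the response data are hypothetical and only serve as an illustration.

```python
from statistics import variance

def cronbach_alpha(responses):
    """responses: one list of item scores per respondent (all the same length)."""
    k = len(responses[0])                        # number of items in the scale
    items = list(zip(*responses))                # regroup the scores item by item
    sum_item_variances = sum(variance(item) for item in items)
    total_variance = variance([sum(person) for person in responses])
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)

# Hypothetical answers of five respondents to a three-item scale (ratings 1 to 5).
responses = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [3, 3, 3], [1, 2, 2]]

alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

Values closer to 1 suggest the items move together, which is what internal consistency is about.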

Norm-referenced vs. criterion-referenced tests
- Norm-referenced tests tell you where someone is relative to everyone else, e.g., IQ tests: an IQ of 115 => 84th percentile
- Criterion-referenced tests tell you whether someone has achieved a certain level of performance, e.g., written driver’s license test, and the test for this class!
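The “IQ of 115 => 84th percentile” figure can be checked with a short calculation, assuming IQ scores follow a normal distribution with mean 100 and SD 15 (the SD appears later in the presentation; normality is the assumption here).

```python
from statistics import NormalDist

# Assumed norm group: normally distributed IQ with mean 100 and SD 15.
iq = NormalDist(mu=100, sigma=15)

# Percentile rank = share of the norm group scoring at or below 115.
percentile_115 = iq.cdf(115) * 100
print(round(percentile_115))
```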

Inferential Statistics and Meta-Analysis

Overview
- What are “inferential statistics”?
  - Error, confidence intervals, & statistical power
  - Hypothesis testing
  - Some of the basics: t-tests, chi-square, & ANOVA
  - Correlation and regression analyses
- Synthesizing multiple findings
  - The literature review
  - Meta-analysis

What are “inferential statistics”?
- Descriptive statistics
  - Show us how a single variable is distributed (frequency graphs)
  - Show us a picture of the relationship between two variables (correlations)
- Inferential statistics
  - Allow us to get serious about checking hunches and hypotheses
  - Usually look at the strength of relationship between two or more variables

Errors: Getting the inference wrong
Example: Deciding to let/not let someone into graduate school on the basis of GRE scores

| | Unsuccessful in graduate school | Successful in graduate school |
|---|---|---|
| Low GRE => rejected | True negatives | False negatives (Type II error) |
| High GRE => accepted | False positives (Type I error) | True positives |
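The four outcomes of the admissions example can be tallied with a few lines of code. This is only a sketch: the applicant records are invented, and “positive” is taken to mean “accepted because of a high GRE score”, so a false positive is the Type I error and a false negative is the Type II error.

```python
from collections import Counter

def classify(high_gre, successful):
    """Label one admissions decision against what actually happened."""
    if high_gre and successful:
        return "true positive"
    if high_gre and not successful:
        return "false positive (Type I error)"
    if not high_gre and successful:
        return "false negative (Type II error)"
    return "true negative"

# Hypothetical applicants: (high GRE?, would succeed in graduate school?)
applicants = [(True, True), (True, False), (False, True),
              (False, False), (True, True)]

counts = Counter(classify(gre, success) for gre, success in applicants)
for outcome, count in counts.items():
    print(outcome, count)
```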

Confidence intervals 1: “margin of error” for an individual score
[Figure: normal curve marked from -3sd to +3sd; about 68% of scores fall within 1sd of the mean, about 95% within 2sd, and over 99% within 3sd]
If we took a random individual from this population, there is about a 95% chance that that person’s score will fall between -2sd and +2sd. For IQ, that’s a 95% chance of being between 70 and 130.
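The 95% figure is a rounded rule of thumb, which a quick check makes visible. The sketch assumes normally distributed IQ scores with mean 100 and SD 15; the variable names are illustrative.

```python
from statistics import NormalDist

# Assumed population: normally distributed IQ with mean 100 and SD 15.
iq = NormalDist(mu=100, sigma=15)

# Share of individuals within two standard deviations of the mean (70 to 130).
within_2sd = iq.cdf(130) - iq.cdf(70)

# Bounds that enclose exactly the central 95% of individuals.
low_95 = iq.inv_cdf(0.025)
high_95 = iq.inv_cdf(0.975)

print(round(within_2sd, 4))
print(round(low_95, 1), round(high_95, 1))
```

The exact central 95% corresponds to about 1.96 standard deviations, so "two standard deviations" slightly overshoots.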

Confidence intervals 2: means
- Distribution of IQ scores in the “normal population”: 95% of the individual scores lie between 70 and 130 (SD = 15). [Figure: distribution of individual scores, x-axis from 55 to 145]
- If we took 100,000 groups of nine (9) people each, the mean IQs of those groups would be distributed much more narrowly, i.e., 95% of the means would lie between 90 and 110 (SE = 15/SQRT(9) = 5). [Figure: distribution of group means, x-axis from 85 to 115]
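The standard error arithmetic above can be written as a small helper. The function name `mean_range` is made up for this sketch, and it uses the slide's rule of thumb of two standard errors on either side of the mean.

```python
from math import sqrt

def mean_range(pop_mean, pop_sd, n, width=2):
    """Approximate 95% range for sample means: mean plus or minus `width` standard errors."""
    se = pop_sd / sqrt(n)          # standard error of the mean
    return pop_mean - width * se, pop_mean + width * se

low, high = mean_range(pop_mean=100, pop_sd=15, n=9)
print(low, high)  # 90.0 110.0
```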

Example: Intelligence (IQ)
- “The normal population”: Mean = 100
- A sample of 9 Michigan residents: Mean = 108
- Question: Are Michigan folks unusually smart, or did we just accidentally end up with some particularly smart people in the sample?

Hypothesis testing
We need to test two alternative hypotheses:
- The “null hypothesis” (H0): No cause for alarm, America. Michigan folks are just like other regular folks (i.e., the mean for this sample is not that ‘off-the-wall’).
- The “alternative hypothesis” (H1): Holy guacamole, it looks like Michigan folks really are more brilliant than the rest (i.e., the mean for this sample is way out there)!!

Is our group significantly different?
- Remember: If we took 100,000 groups of nine (9) people each, 95% of the means of those groups would lie between 90 and 110 (SE = 15/SQRT(9) = 5).
- OK, so is our mean of 108 unusually high? No! Because it’s inside the 95% range (90 to 110).
- => We “fail to reject the null hypothesis.”
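The same decision can be expressed as a one-sample z-test. This is a sketch rather than the presenter's own procedure: it assumes the population SD of 15 is known and uses a two-sided test, and the function name `z_test_mean` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_test_mean(sample_mean, n, pop_mean=100.0, pop_sd=15.0):
    """Two-sided one-sample z-test, assuming the population SD is known."""
    se = pop_sd / sqrt(n)                      # standard error of the mean
    z = (sample_mean - pop_mean) / se          # distance from H0 in standard errors
    p = 2 * (1 - NormalDist().cdf(abs(z)))     # chance of a mean at least this extreme under H0
    return z, p

z, p = z_test_mean(sample_mean=108, n=9)
print(round(z, 2), round(p, 3))
print("reject H0" if p < 0.05 else "fail to reject H0")
```

With nine people the sample mean sits 1.6 standard errors above 100, which is not unusual enough to reject the null hypothesis at the 5% level.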

What if we had a bigger sample?
- If we sample groups of 9 people, 95% of the means for those groups fall between IQ 90 and 110.
- If we sample groups of 25 people, 95% of the means for those groups fall between IQ 94 and 106 (narrower).
- => If our MI sample had been of 25 people with a mean IQ of 108, we could have been 95% certain that Michigan people were smarter (we’d have had more power). But with a sample of only 9, we just couldn’t be certain enough (even though it looked likely).
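To see how sample size affects power, the sketch below estimates the chance of rejecting the null hypothesis for samples of 9 and 25. It rests on assumptions that are not in the presentation: that the true Michigan mean really is 108, that the SD is 15, and that a two-sided test at the 5% level is used. The function name `power_two_sided` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power_two_sided(true_mean, n, null_mean=100.0, sd=15.0, alpha=0.05):
    """Probability that a sample mean lands outside the central (1 - alpha) range under H0."""
    se = sd / sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    lower = null_mean - z_crit * se
    upper = null_mean + z_crit * se
    # Distribution of sample means if the true mean is `true_mean` (an assumption).
    sampling = NormalDist(mu=true_mean, sigma=se)
    return sampling.cdf(lower) + (1 - sampling.cdf(upper))

for n in (9, 25):
    print(n, round(power_two_sided(true_mean=108, n=n), 2))
```

Under these assumptions the larger sample detects the difference far more often, which is the sense in which a bigger sample gives more power.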

Why do we want to be 95% sure?
Consider the trade-off we make between errors:

| | We really are about the same | We really are smarter |
|---|---|---|
| We conclude MI folks are just like other Americans | True negative (we are the same, and we got that right) | False negative (we really are smarter, but didn’t figure it out: how smart is that??) |
| We conclude MI folks are smarter than rest of USA | False positive (we are really the same, but inferred we were smarter) | True positive (we are smarter, and we got that right!) |

Statistical vs. Practical Significance
- There’s a flip-side to the statistical power issue … with a big sample size, you can detect “statistically significant” effects that are trivial in the real world.
- Example: the Head Start program
  - Tens of thousands of children
  - A “statistically significant” effect => many researchers claimed “it worked”!
  - The size of the change was trivial
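A small numerical sketch shows how a trivial effect can still be “statistically significant” when the sample is very large. The numbers are invented for illustration (a half-point IQ difference in a sample of 40,000) and are not Head Start data; the standardized effect size used here is Cohen's d, and the function name `one_sample_z` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z(sample_mean, n, pop_mean, pop_sd):
    """Return z, the two-sided p-value, and Cohen's d for a one-sample comparison."""
    z = (sample_mean - pop_mean) / (pop_sd / sqrt(n))
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    cohens_d = (sample_mean - pop_mean) / pop_sd   # effect size in SD units
    return z, p, cohens_d

# Hypothetical: a 0.5-point difference observed in a sample of 40,000.
z, p, d = one_sample_z(sample_mean=100.5, n=40_000, pop_mean=100, pop_sd=15)
print("statistically significant:", p < 0.05)
print("effect size d:", round(d, 3))
```

The p-value answers "is there any effect at all?", while the effect size answers "is it big enough to matter?".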

Choosing the right statistical test: 1
- Figure out which is/are your independent variable(s)
  - These are the “predictors” or the things that go on the X-axis of a graph
- Figure out which is your dependent variable
  - This is usually the main thing you are interested in, the outcome, the thing that goes on the Y-axis of a graph

Example: IVs and DVs

Choosing the right statistical test: 2
The type of test we use depends on the types of variables we have:
- Categorical (multiple categories)
  - Caucasian vs. African American vs. Hispanic vs. …
  - Group 1 vs. 2 vs. 3
- Continuous (interval & ratio scales)
  - Age
  - Test scores
  - Shoe sizes
- Dichotomous (just two options)
  - Male vs. female
  - Experimental group vs. control group
  - Pretest vs. posttest time

Choosing the right test: 3
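IV type (rows) by DV type (columns):

| IV is/are | DV is dichotomous | DV is categorical | DV is continuous |
|---|---|---|---|
| Dichotomous | Chi-square | Chi-square | t-test |
| Categorical | Chi-square | Chi-square | ANOVA |
| Continuous | Discriminant function analysis | Discriminant function analysis | Correlation or regression |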
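One way to read the grid above is as a lookup from the pair (IV type, DV type) to a test. The sketch below encodes that reading; the cell assignments for chi-square and discriminant function analysis assume those entries span both the dichotomous and categorical DV columns, and the names `TEST_GRID` and `choose_test` are hypothetical.

```python
# Lookup keyed on (IV type, DV type). Treating chi-square and discriminant
# function analysis as spanning two DV columns is an assumption.
TEST_GRID = {
    ("dichotomous", "dichotomous"): "chi-square",
    ("dichotomous", "categorical"): "chi-square",
    ("dichotomous", "continuous"): "t-test",
    ("categorical", "dichotomous"): "chi-square",
    ("categorical", "categorical"): "chi-square",
    ("categorical", "continuous"): "ANOVA",
    ("continuous", "dichotomous"): "discriminant function analysis",
    ("continuous", "categorical"): "discriminant function analysis",
    ("continuous", "continuous"): "correlation or regression",
}

def choose_test(iv_type, dv_type):
    """Suggest a test for the given independent and dependent variable types."""
    key = (iv_type.lower(), dv_type.lower())
    if key not in TEST_GRID:
        raise ValueError(f"unknown variable types: {key}")
    return TEST_GRID[key]

# Example: experimental vs. control group (dichotomous IV), test scores (continuous DV).
print(choose_test("dichotomous", "continuous"))  # t-test
```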

Synthesizing Studies: Two Methods
- The literature review
  - Reviewing, summarizing, and critiquing the main studies in a particular area, and drawing a conclusion about the strength of the evidence over multiple studies
  - Use when many of the best studies in the area are qualitative, or when there are not enough quant studies for a meta-analysis
- Meta-analysis
  - A statistical technique for combining information about “effect sizes” to come to an overall conclusion about the strength of the evidence over multiple studies
  - Use when there are many quant studies out there already, some with conflicting results; use to look for meta-effects (e.g., type of sample, type of intervention, etc.)
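As a sketch of what “combining information about effect sizes” can look like, the code below pools effect sizes with inverse-variance weights (a fixed-effect model). This is one common approach and not necessarily the one the presenter had in mind; the three studies and their variances are invented, and the function name `pool_fixed_effect` is hypothetical.

```python
from math import sqrt

def pool_fixed_effect(effects, variances):
    """Inverse-variance weighted mean effect size and its standard error."""
    weights = [1 / v for v in variances]       # more precise studies count for more
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    se = sqrt(1 / sum(weights))
    return pooled, se

# Hypothetical studies: standardized effect sizes and their sampling variances.
effects = [0.30, 0.10, 0.50]
variances = [0.04, 0.01, 0.09]

pooled, se = pool_fixed_effect(effects, variances)
print(round(pooled, 3), round(se, 3))
print("approx. 95% CI:", round(pooled - 1.96 * se, 3), round(pooled + 1.96 * se, 3))
```

When studies disagree more than sampling error would explain, a random-effects model or a moderator analysis (the “meta-effects” mentioned above) is usually considered instead.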

Experimental and Mixed Methods …

Overview
- Experimental methods
  - Why use experimental methods?
  - Ruling out rival explanations
  - Some useful experimental designs
- Using multiple methods
  - Balancing weaknesses in methods
  - Uses of multiple methods

Why use experimental methods?
- Main point = to nail down causality
- Causality involves ruling out rival explanations for the effects observed, e.g., New kind of hearing aid => Higher math test scores in hearing-impaired students

What kinds of rival explanations?
What else could have accounted for the increase in math scores of students using the new hearing aid?
- Testing/practice effect: The students were better on the posttest because of the practice they got on the pretest.
- Regression to the mean: The students tested happened to score a bit low on the day of the pretest, so the “improvement” was just the posttest moving closer to the average.

Rival explanations (contd) …
- History: The approach to teaching math changed between the pretest and the posttest.
- Mortality: The lowest-performing students were absent the day of the posttest.
- Maturation: Students naturally get better at math at about that age.
- Selection: Using the new hearing aid needed parental consent, and only those parents with a strong interest in their child’s academic performance consented.

Rival explanations (contd) …
- Hawthorne effect: Students using the hearing aid felt this was special treatment, so tried harder.
- Novelty effect: The hearing aid is novel, so the students feel excited and more motivated about listening (though the novelty wears off later on, after the posttest).
- Researcher expectancy effect: The teacher expected to see better results among these students, and subconsciously tended to grade their answers more favorably.

Simple experimental designs
X = treatment, C = control, O = test/measure
- XO = posttest only (no pretest): hard to rule out any of the rival explanations
- OXO = pretest and posttest: rules out selection and mortality
- OXO/OCO = pretest and posttest on both an experimental and a nonequivalent control group: rules out several rival explanations, but weaker because the control group is not equivalent

Randomized designs
- R: OXO / OCO = randomly assigned experimental and control groups, pre & post
- R: XO / CO = posttest only on randomly assigned experimental and control groups
- R: OXO / OCO / XO / CO = Solomon four-group design: two experimental and two control groups; half pretested, all posttested
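Random assignment to the four groups of a Solomon design can be sketched as a shuffle followed by dealing participants out in turn. The function name `assign_solomon`, the group labels, and the participant IDs are illustrative assumptions; the fixed seed is there only so the example is repeatable.

```python
import random

# Group labels follow the notation above: O = test, X = treatment, C = control.
GROUPS = ["OXO", "OCO", "XO", "CO"]

def assign_solomon(participants, seed=0):
    """Randomly split participants into the four Solomon groups, as evenly as possible."""
    rng = random.Random(seed)          # seeded so the assignment can be reproduced
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {name: [] for name in GROUPS}
    for i, person in enumerate(shuffled):
        groups[GROUPS[i % len(GROUPS)]].append(person)
    return groups

# Hypothetical participant IDs.
assignment = assign_solomon(range(1, 21), seed=42)
for name, members in assignment.items():
    pretested = name.startswith("O")   # only OXO and OCO take the pretest
    print(name, "pretest" if pretested else "no pretest", sorted(members))
```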

What should a control group be?
- Think of the practical question you need to answer with your research, e.g.,
  - Is this treatment/method better than nothing?
  - Is it better than what we are using now?
- Consider the option of using a “placebo”
  - In drug studies, this is the “sugar pill”
  - In experimental designs, it is an intervention that is not expected to affect the DV, but that keeps the control group from feeling “left out” of the experimental group.

Examples without control groups
Suppose you wanted to see if phonics training in kindergarten improved students’ ability to read in first grade …
- What would a posttest-only design with no control group (XO) entail (i.e., what, whom, and when would you test)?
- What would a pretest-posttest design with no control group (OXO) entail?

Examples with control groups
Suppose you wanted to see if phonics training in kindergarten improved students’ ability to read in first grade …
- What would a pretest-posttest design with a control group (OXO/OCO) entail?
- How about a posttest-only design with a control group (XO/CO)?
- How would randomization help with each of the above? Is it practically feasible?

The Solomon four-group design
Suppose you wanted to see if phonics training in kindergarten improved students’ ability to read in first grade … How would you set up a randomized Solomon four-group design? R: OXO OCO XO CO

Using multiple methods
What are the weaknesses of qual & quant methods? Quantitative ________________ Qualitative ________________

Complementary multiplism
- All research methods have weaknesses.
- Complementary multiplism (a.k.a. critical multiplism) is the practice of deliberately choosing complementary methods with different weaknesses, so that the strengths of one make up for the weaknesses of another.

Uses of mixed methods
- To bring “dry” statistics alive
- To dig into puzzling results and try to understand them better
- To “triangulate” by getting multiple perspectives on one issue
  - If qual and quant data point in the same direction, you can be more certain that your results are robust
  - If they tell you something different, it’s time to dig again!
