Quantitative Research Methods (presentation transcript)

Overview: Intro to quantitative methods
- A number: “… the characteristic of an individual by which it is treated as a unit or of a collection by which it is treated in terms of units”
- A variable: “A concept or characteristic that contains variation”
- Measurement: “The assignment of numbers to indicate different values of a variable”
- Techniques or instruments

Measurement: Its purposes
- … to provide the basis for the results, conclusions, and significance of the research. [Diagram labels: Measurement, Invalid Research]
- … to provide information about the variables that are being studied. [Diagram labels: Measurement, Variables]
The way in which the numbers are used to describe something determines the amount of information that is communicated. A useful classification of this process is referred to as scales of measurement.

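Measurement Scales (or Levels of Measurement)
N = nominal, O = ordinal, I = interval, R = ratio
Property                           N  O  I  R
Numbers assigned to categories     +  +  +  +
Numbers rank-ordered               -  +  +  +
Equal intervals between numbers    -  -  +  +
Numbers expressed as ratios        -  -  -  +
Each level keeps the properties of the levels before it and adds one more.
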
Descriptive Statistics
Descriptive statistics: Used to help describe a group of numbers
What can we say about the following set of numbers?
15 24 28 25 18 24 27 16 20 22 23 18 22 28 19 16 22 26 15 26 24 21 19 27 16 23 26 25 18 27 17 20 19 25 23
Frequency distribution: Indicates how often each score is obtained
[Tally chart: one tally mark per occurrence of each score, from 15 to 28]

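The tally chart on this slide can be rebuilt from the scores themselves. The following is a minimal sketch, not part of the original presentation; the variable names are illustrative.

```python
from collections import Counter

# The 35 scores listed on the slide.
scores = [15, 24, 28, 25, 18, 24, 27, 16, 20, 22, 23, 18, 22, 28, 19, 16, 22, 26,
          15, 26, 24, 21, 19, 27, 16, 23, 26, 25, 18, 27, 17, 20, 19, 25, 23]

# Count how often each score occurs.
frequency = Counter(scores)

# Print one row per score, lowest to highest, with a text tally.
for score in sorted(frequency):
    count = frequency[score]
    print(f"{score}: {'I' * count} ({count})")
```

Each printed row is one line of the frequency distribution: the score, a tally mark per occurrence, and the count.
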
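Measures of Variability
- Range: “Difference between the highest and lowest score”: (28 - 15) = 13
- Standard deviation: “Average distance of the scores from the mean”: SD = 4
$SD = \sqrt{\frac{\sum (x - M)^2}{N}}$, where x = individual scores, M = mean, N = number of scores in group
[Figure: normal curve with the axis marked in standard deviations: -2σ, -1σ, 0, +1σ, +2σ]
Shape of a distribution (order of the measures of central tendency, left to right):
- Symmetrical distribution: mean, median, and mode coincide
- Positive skew: mode, median, mean
- Negative skew: mean, median, mode
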
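As a check on the slide's figures, the range and standard deviation of the 35 scores from the descriptive statistics slide can be computed directly. This is a sketch that assumes the population form of the formula (dividing by N); the slide's SD = 4 appears to be a rounded value.

```python
import math
import statistics

# The 35 scores from the descriptive statistics slide.
scores = [15, 24, 28, 25, 18, 24, 27, 16, 20, 22, 23, 18, 22, 28, 19, 16, 22, 26,
          15, 26, 24, 21, 19, 27, 16, 23, 26, 25, 18, 27, 17, 20, 19, 25, 23]

N = len(scores)              # number of scores in the group
M = statistics.mean(scores)  # mean of the scores

# Range: difference between the highest and lowest score.
score_range = max(scores) - min(scores)

# Standard deviation: square root of the mean squared distance from M.
sd = math.sqrt(sum((x - M) ** 2 for x in scores) / N)

print(f"range = {score_range}, SD = {sd:.2f}")
```

Dividing by N - 1 instead (the sample form, `statistics.stdev`) gives a slightly larger value that also rounds to 4.
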
Correlation
- Correlation: Measure of the relationship between two or more quantitative variables
- Correlation coefficient: A number between -1 and +1 which indicates the direction and strength of the relationship, e.g., r = .45
- Pearson product-moment correlation

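To make the coefficient concrete, here is a small sketch of the Pearson product-moment calculation. The function name and the data (hours studied and test marks) are hypothetical and are not taken from the presentation.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the two means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical data for illustration only.
hours_studied = [1, 2, 3, 4, 5]
test_marks = [52, 60, 57, 70, 68]

r = pearson_r(hours_studied, test_marks)
print(f"r = {r:.2f}")
```

The sign of r gives the direction of the relationship and its distance from zero gives the strength; a perfectly linear relationship yields exactly +1 or -1.
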
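Correlation vs. Causation
Example: shoe size and reading level. [Figure: shoe size plotted against reading level] Among children the two are correlated, but bigger feet do not cause better reading; both increase with age.
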
Measurement: The basics
- Construct = a characteristic that can’t be directly measured, e.g., intelligence
- Operational definition = a breakdown of what the elements of that construct are (e.g., verbal, quantitative, and analytical ability), or what that construct “looks like” in reality
- Measure = a numerical representation of part of the construct, e.g., items on an IQ test
Measures have to be both reliable and valid.

Reliability and Validity
- Reliability = consistency of results …
  No matter when something is measured
  No matter how it is measured (measured well!)
  No matter where it is measured
- Validity = accuracy …
  Measuring the right thing, and
  Measuring the thing right!

Construct Validity
Construct validity (checking we have measured the right thing and have measured it right) consists of several elements, including:
- Content validity – the measure covers everything it needs to cover (e.g., an intelligence test covers verbal, quantitative, & analytical abilities)
- Convergent and discriminant validity – it correlates with other related tests (e.g., other cognitive ability tests), and not with what it shouldn’t be related to (e.g., personality)
- Criterion-related validity – it predicts what it should predict (e.g., IQ score predicts GPA)
- Face validity – it looks valid to people

Reliability
- Internal consistency – all the items (single questions) within a scale (set of items added up) are measuring the same thing (incl. split-half reliability, Cronbach’s alpha, and some others)
- Equivalence – different forms of the test generate about the same scores
- Stability, a.k.a. test-retest reliability – people score about the same no matter when they take it (assuming no change has occurred in between)

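Cronbach’s alpha, mentioned on this slide, can be sketched in a few lines. The formula used here is the standard one, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores); the function name and the response data are hypothetical.

```python
import statistics

def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: one list of scores per item, with respondents in the same
    order in every list.
    """
    k = len(items)
    # Variance of each item across respondents.
    item_variances = [statistics.variance(scores) for scores in items]
    # Each respondent's total score on the scale.
    totals = [sum(person) for person in zip(*items)]
    total_variance = statistics.variance(totals)
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical responses: 3 items answered by 5 people on a 1-5 scale.
responses = [
    [4, 2, 5, 3, 1],
    [5, 2, 4, 3, 2],
    [4, 1, 5, 4, 1],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}")
```

Values close to 1 suggest the items move together; items that are unrelated to one another push alpha toward 0.
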
Norm-referenced vs. criterion-referenced tests
- Norm-referenced tests tell you where someone is relative to everyone else, e.g., IQ tests: an IQ of 115 => 84th percentile
- Criterion-referenced tests tell you whether someone has achieved a certain level of performance, e.g., written driver’s license test, and the test for this class!

Overview
- What are “inferential statistics”?
- Error, confidence intervals, & statistical power
- Hypothesis testing
- Some of the basics:
  t-tests, chi-square, & ANOVA
  Correlation and regression analyses
- Synthesizing multiple findings
  The literature review
  Meta-analysis

What are “inferential statistics”?
- Descriptive statistics
  Show us how a single variable is distributed (frequency graphs)
  Show us a picture of the relationship between two variables (correlations)
- Inferential statistics
  Allow us to get serious about checking hunches and hypotheses
  Usually look at the strength of relationship between two or more variables

Errors: Getting the inference wrong
Example: Deciding to let/not let someone into graduate school on the basis of GRE scores
                        Unsuccessful in graduate school    Successful in graduate school
Low GRE => rejected     True negatives                     False negatives (Type II error)
High GRE => accepted    False positives (Type I error)     True positives

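The four cells of this table can be tallied mechanically once each decision is paired with the eventual outcome. The sketch below uses made-up applicants and an assumed GRE cutoff of 310; neither comes from the presentation.

```python
from collections import Counter

# Hypothetical applicants: (GRE score, would succeed in graduate school).
applicants = [(325, True), (300, False), (315, False),
              (295, True), (330, True), (290, False)]
CUTOFF = 310  # assumed admission cutoff, for illustration only

def outcome(gre, successful, cutoff=CUTOFF):
    """Name the table cell for one applicant."""
    accepted = gre >= cutoff  # high GRE => accepted
    if accepted:
        return "true positive" if successful else "false positive"
    return "false negative" if successful else "true negative"

# Count how many applicants land in each cell.
tally = Counter(outcome(gre, ok) for gre, ok in applicants)
for cell, count in sorted(tally.items()):
    print(f"{cell}: {count}")
```

Raising the cutoff trades false positives for false negatives, and lowering it does the reverse.
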
Confidence intervals 1: “margin of error” for an individual score
[Figure: normal curve with the axis marked from -3sd to +3sd; about 68% of scores fall within ±1sd, 95% within ±2sd, and 99.7% within ±3sd]
If we took a random individual from this population, there is about a 95% chance that that person’s score will fall between -2sd and +2sd. For IQ, that’s a 95% chance of being between 70 and 130.

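The percentages on this slide can be checked against the normal distribution. This sketch assumes IQ is normally distributed with mean 100 and SD 15, as the slide's 70 to 130 range implies.

```python
from statistics import NormalDist

# Assumed IQ distribution: mean 100, SD 15.
iq = NormalDist(mu=100, sigma=15)

shares = {}
for k in (1, 2, 3):
    low, high = 100 - 15 * k, 100 + 15 * k
    # Share of the population scoring between low and high.
    shares[k] = iq.cdf(high) - iq.cdf(low)
    print(f"within {k} sd ({low} to {high}): {shares[k]:.1%}")
```

The ±2sd share comes out slightly above 95%; the exact 95% range is ±1.96sd, which the slide rounds to 2.
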
Confidence intervals 2: means
Distribution of IQ scores in the “normal population”: 95% of the individual scores lie between 70 and 130 (SD = 15).
If we took 100,000 groups of nine (9) people each, the mean IQs of those groups would be distributed more narrowly, i.e., 95% of the means would lie between 90 and 110 (SE = 15/SQRT(9) = 5).
[Figure: two normal curves centered on 100, one for individual scores (axis 55 to 145) and a narrower one for group means (axis 85 to 115)]

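The thought experiment on this slide can be run as a simulation. The sketch below draws 20,000 groups rather than 100,000 to keep the run short, fixes the random seed so the result is repeatable, and assumes normally distributed IQ scores.

```python
import math
import random
import statistics

random.seed(0)      # fixed seed so the sketch is repeatable
GROUPS = 20_000     # fewer than the slide's 100,000, to keep the run short
N = 9               # people per group

# Mean IQ of each simulated group of nine.
means = [sum(random.gauss(100, 15) for _ in range(N)) / N
         for _ in range(GROUPS)]

# Standard error predicted by the formula on the slide.
se = 15 / math.sqrt(N)

# Share of simulated group means that fall between 90 and 110.
share = sum(90 <= m <= 110 for m in means) / GROUPS

print(f"predicted SE = {se}, observed SD of means = {statistics.pstdev(means):.2f}")
print(f"share of means between 90 and 110 = {share:.1%}")
```

The spread of the simulated means should sit close to the predicted standard error of 5, and roughly 95% of the means should land between 90 and 110.
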
Example: Intelligence (IQ)
- “The normal population”: Mean = 100
- A sample of 9 Michigan residents: Mean = 108
[Figure: normal curve of IQ scores, axis 55 to 145, with the sample mean marked]
Question: Are Michigan folks unusually smart, or did we just accidentally end up with some particularly smart people in the sample?

Hypothesis testing
We need to test two alternate hypotheses:
1. No cause for alarm, America: Michigan folks are just like other regular folks (i.e., the mean for this sample is not that “off-the-wall”). This is the “null hypothesis” (H0).
2. Holy guacamole: it looks like Michigan folks really are more brilliant than the rest (i.e., the mean for this sample is way out there)!! This is the “alternative hypothesis” (H1).

Is our group significantly different?
Remember: If we took 100,000 groups of nine (9) people each, the means of those groups would be distributed around 100, i.e., 95% of the means would lie between 90 and 110 (SE = 15/SQRT(9) = 5).
[Figure: distribution of group means with the central 95% between 90 and 110]
OK, so is our mean of 108 unusually high? No! Because it’s inside the 95% range (90 to 110). => We “fail to reject the null hypothesis.”

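The same decision can be expressed as a z-test. This is a sketch under the slide's assumptions (population mean 100, SD 15, sample of 9 with mean 108); the function name is illustrative, and the two-tailed p-value is an addition that the slide does not report.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, n, pop_mean=100.0, pop_sd=15.0):
    """Return (z, two-tailed p) for a sample mean against a known population."""
    se = pop_sd / math.sqrt(n)           # standard error of the mean
    z = (sample_mean - pop_mean) / se    # distance from pop_mean in SE units
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

z, p = z_test(108, 9)
print(f"z = {z:.2f}, p = {p:.3f}")
print("reject H0" if p < 0.05 else "fail to reject H0")
```

A sample mean 1.6 standard errors above 100 is not unusual enough to reject the null hypothesis at the 5% level, which matches the slide's conclusion.
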
What if we had a bigger sample?
- If we sample groups of 9 people, 95% of the means for those groups fall between IQ 90 and 110.
- If we sample groups of 25 people, 95% of the means for those groups fall between IQ 94 and 106 (narrower).
[Figure: two distributions of group means, with the central 95% between 90 and 110 for groups of 9, and between 94 and 106 for groups of 25]
=> If our MI sample had been of 25 people with a mean IQ of 108, the mean would have fallen outside the 95% range, so we could have rejected the null hypothesis and concluded with 95% confidence that Michigan people were smarter (we’d have had more power). But with a sample of only 9, we just couldn’t be certain enough (even though it looked likely).

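The effect of sample size on the 95% range is simple arithmetic. The sketch below follows the slide's rule of thumb of ±2 standard errors around a population mean of 100 with SD 15; the function name is illustrative.

```python
import math

def range_95(n, pop_mean=100.0, pop_sd=15.0):
    """Approximate 95% range for group means, using the slide's ±2 SE rule."""
    se = pop_sd / math.sqrt(n)
    return pop_mean - 2 * se, pop_mean + 2 * se

SAMPLE_MEAN = 108  # mean IQ of the Michigan sample

for n in (9, 25):
    low, high = range_95(n)
    inside = low <= SAMPLE_MEAN <= high
    verdict = "fail to reject H0" if inside else "reject H0"
    print(f"n = {n}: 95% of means between {low} and {high} -> {verdict}")
```

The same sample mean of 108 leads to different conclusions at the two sample sizes, which is the sense in which a larger sample gives more power.
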
Why do we want to be 95% sure?
Consider the trade-off we make between errors:
- We conclude MI folks are just like other Americans
  If we really are about the same: true negative (we are the same, and we got that right)
  If we really are smarter: false negative (we really are smarter, but didn’t figure it out – how smart is that??)
- We conclude MI folks are smarter than the rest of the USA
  If we really are about the same: false positive (we are really the same, but inferred we were smarter)
  If we really are smarter: true positive (we are smarter, and we got that right!)

Statistical vs. Practical Significance
There’s a flip-side to the statistical power issue … with a big sample size, you can detect “statistically significant” effects that are trivial in the real world.
Example: the Head Start program
- Tens of thousands of children
- A “statistically significant” effect => many researchers claimed “it worked”!
- The size of the change was trivial

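A short calculation shows how a trivial effect becomes “significant” once the sample is large enough. The numbers below are hypothetical (a half-point gain on a scale with SD 15, measured on 50,000 children) and are not Head Start data; the standardized effect size d (mean difference divided by SD) is an addition for illustration.

```python
import math
from statistics import NormalDist

def significance_and_size(mean_diff, sd, n):
    """Return (z, two-tailed p, standardized effect size d)."""
    se = sd / math.sqrt(n)                   # standard error shrinks as n grows
    z = mean_diff / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    d = mean_diff / sd                       # effect size does not depend on n
    return z, p, d

# Hypothetical study: a 0.5-point gain, SD 15, 50,000 children.
z, p, d = significance_and_size(mean_diff=0.5, sd=15, n=50_000)
print(f"z = {z:.2f}, p = {p:.2g}, d = {d:.3f}")
```

The p-value is far below .05, yet the effect is about 3% of a standard deviation, so statistical significance by itself says little about practical importance.
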
Choosing the right statistical test: 1
- Figure out which is/are your independent variable(s)
  These are the “predictors” or the things that go on the X-axis of a graph
- Figure out which is your dependent variable
  This is usually the main thing you are interested in, the outcome, the thing that goes on the Y-axis of a graph

Choosing the right statistical test: 2
The type of test we use depends on the types of variables we have:
- Dichotomous (just two options)
  Male vs. female
  Experimental group vs. control group
  Pretest vs. posttest time
- Categorical (multiple categories)
  Caucasian vs. African American vs. Hispanic vs. …
  Group 1 vs. 2 vs. 3
- Continuous (interval & ratio scales)
  Age
  Test scores
  Shoe sizes

Choosing the right test: 3
IV is/are:      DV is dichotomous or categorical    DV is continuous
Dichotomous     Chi-square                          t-test
Categorical     Chi-square                          ANOVA
Continuous      Discriminant function analysis      Correlation or regression

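The table on this slide amounts to a lookup from variable types to a test. The sketch below encodes one reading of the table, in which the dichotomous and categorical DV columns share an entry; the function name and type labels are illustrative.

```python
NON_CONTINUOUS = ("dichotomous", "categorical")
VALID_TYPES = NON_CONTINUOUS + ("continuous",)

def choose_test(iv_type, dv_type):
    """Suggest a test from the types of the independent and dependent variables."""
    if iv_type not in VALID_TYPES or dv_type not in VALID_TYPES:
        raise ValueError(f"types must be one of {VALID_TYPES}")
    if dv_type == "continuous":
        # Right-hand column of the table.
        return {
            "dichotomous": "t-test",
            "categorical": "ANOVA",
            "continuous": "correlation or regression",
        }[iv_type]
    # DV is dichotomous or categorical.
    if iv_type == "continuous":
        return "discriminant function analysis"
    return "chi-square"

# Example: experimental vs. control group (dichotomous IV), test scores (continuous DV).
print(choose_test("dichotomous", "continuous"))
```

A lookup like this only narrows the choice; the assumptions of each test still need checking against the data.
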
Synthesizing Studies: Two Methods
- The literature review
  Reviewing, summarizing, and critiquing the main studies in a particular area, and drawing a conclusion about the strength of the evidence over multiple studies
  Use when many of the best studies in the area are qualitative, or when there are not enough quant studies for a meta-analysis
- Meta-analysis
  A statistical technique for combining information about “effect sizes” to come to an overall conclusion about the strength of the evidence over multiple studies
  Use when there are many quant studies out there already, some with conflicting results; use to look for meta-effects (e.g., type of sample, type of intervention, etc.)

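One common way to combine effect sizes is an inverse-variance weighted average, often called a fixed-effect model. The presentation does not say which technique it has in mind, so the sketch below is only an illustration, and the three studies are hypothetical.

```python
import math

def fixed_effect_summary(effects, variances):
    """Inverse-variance weighted mean effect size and its standard error."""
    # More precise studies (smaller variance) get larger weights.
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return pooled, se

# Hypothetical effect sizes and their variances from three studies.
effects = [0.30, 0.10, 0.25]
variances = [0.04, 0.01, 0.02]

pooled, se = fixed_effect_summary(effects, variances)
print(f"pooled effect = {pooled:.3f} (SE = {se:.3f})")
```

The pooled estimate is pulled toward the most precise study, and its standard error is smaller than that of any single study.
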
Overview
- Experimental methods
  Why use experimental methods?
  Ruling out rival explanations
  Some useful experimental designs
- Using multiple methods
  Balancing weaknesses in methods
  Uses of multiple methods

Why use experimental methods?
- Main point = to nail down causality
- Causality involves ruling out rival explanations for the effects observed, e.g., new kind of hearing aid => higher math test scores in hearing-impaired students

What kinds of rival explanations?
What else could have accounted for the increase in math scores of students using the new hearing aid?
- The students were better on the posttest because of the practice they got on the pretest – testing/practice effect
- The students tested happened to score a bit low on the day of the pretest, so the “improvement” was just the posttest moving closer to the average – regression to the mean

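Regression to the mean is easy to demonstrate with simulated scores. In the sketch below there is no treatment at all; the ability and noise parameters and the selection cutoff of 60 are assumptions chosen for illustration.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable
N = 10_000

# Each simulated student has a stable ability plus day-to-day noise on each test.
ability = [random.gauss(70, 8) for _ in range(N)]
pretest = [a + random.gauss(0, 6) for a in ability]
posttest = [a + random.gauss(0, 6) for a in ability]  # no treatment applied

# Select only the students who scored low on the pretest.
low_scorers = [i for i in range(N) if pretest[i] < 60]

pre_mean = sum(pretest[i] for i in low_scorers) / len(low_scorers)
post_mean = sum(posttest[i] for i in low_scorers) / len(low_scorers)

print(f"selected {len(low_scorers)} low scorers")
print(f"pretest mean = {pre_mean:.1f}, posttest mean = {post_mean:.1f}")
```

The selected group “improves” on the posttest even though nothing was done to it, because part of each low pretest score was bad luck that does not repeat.
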
Rival explanations (contd) …
- The approach to teaching math changed between the pretest and the posttest – history
- The lowest-performing students were absent the day of the posttest – mortality
- Students naturally get better at math at about that age – maturation
- Using the new hearing aid needed parental consent, and only those parents with a strong interest in their child’s academic performance consented – selection

Rival explanations (contd) …
- Students using the hearing aid felt this was special treatment, so tried harder – Hawthorne effect
- The hearing aid is novel, so the students feel excited and more motivated about listening (though the novelty wears off later on, after the posttest) – novelty effect
- The teacher expected to see better results among these students, and subconsciously tended to grade their answers more favorably – researcher expectancy effect

Simple experimental designs
X = treatment, C = control, O = test/measure
- XO = posttest only (no pretest): hard to rule out any of the rival explanations
- OXO = pretest and posttest: rules out selection and mortality
- OXO/OCO = pretest and posttest on both an experimental and a nonequivalent control group: rules out several rival explanations, but weaker because the control group is not equivalent

Randomized designs
- R: OXO / OCO = pre & post on randomly assigned experimental and control groups
- R: XO / CO = posttest only on randomly assigned experimental and control groups
- R: OXO / OCO / XO / CO = Solomon four-group design: two experimental and two control groups; half pretested, all posttested

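The “R” in these designs stands for random assignment, which can be sketched as a shuffle followed by dealing participants into groups. The function below does this for the Solomon four-group design; the function name, the group labels, and the participant IDs are illustrative.

```python
import random

def solomon_assign(participants, seed=None):
    """Randomly assign participants to the four Solomon groups."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)  # random order removes any selection pattern
    designs = ["OXO (pretest, treatment, posttest)",
               "OCO (pretest, control, posttest)",
               "XO (treatment, posttest)",
               "CO (control, posttest)"]
    groups = {design: [] for design in designs}
    # Deal the shuffled participants into the four groups in turn.
    for i, person in enumerate(shuffled):
        groups[designs[i % 4]].append(person)
    return groups

# Hypothetical participant IDs 1 to 20.
groups = solomon_assign(range(1, 21), seed=42)
for design, members in groups.items():
    print(f"{design}: {sorted(members)}")
```

Using only the first two groups gives the randomized pretest-posttest design, and only the last two gives the randomized posttest-only design.
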
What should a control group be?
- Think of the practical question you need to answer with your research, e.g.,
  Is this treatment/method better than nothing?
  Is it better than what we are using now?
- Consider the option of using a “placebo”
  In drug studies, this is the “sugar pill”
  In experimental designs, it is an intervention that is not expected to affect the DV, but that keeps the control group from feeling “left out” of the experimental group

Examples without control groups
Suppose you wanted to see if phonics training in kindergarten improved students’ ability to read in first grade …
- What would a posttest-only design with no control group (XO) entail (i.e., what, whom, and when would you test)?
- What would a pretest-posttest design with no control group (OXO) entail?

Examples with control groups
Suppose you wanted to see if phonics training in kindergarten improved students’ ability to read in first grade …
- What would a pretest-posttest design with a control group (OXO/OCO) entail?
- How about a posttest-only design with a control group (XO/CO)?
- How would randomization help with each of the above? Is it practically feasible?

The Solomon four-group design
Suppose you wanted to see if phonics training in kindergarten improved students’ ability to read in first grade …
How would you set up a randomized Solomon four-group design?
R: OXO / OCO / XO / CO

Using multiple methods
What are the weaknesses of qual & quant methods?
- Quantitative: ________________
- Qualitative: ________________

Complementary multiplism
- All research methods have weaknesses
- Complementary multiplism (a.k.a. critical multiplism) is the practice of deliberately choosing complementary methods with different weaknesses, so that the strengths of one make up for the weaknesses of another

Uses of mixed methods
- To bring “dry” statistics alive
- To dig into puzzling results and try to understand them better
- To “triangulate” by getting multiple perspectives on one issue
  If qual and quant data point in the same direction, you can be more certain that your results are robust
  If they tell you something different, it’s time to dig again!