2 Overview
Intro to quantitative methods
- A Number: “… the characteristic of an individual by which it is treated as a unit or of a collection by which it is treated in terms of units”
- A Variable: “A concept or characteristic that contains variation”
- Measurement: “The assignment of numbers to indicate different values of a variable”
- Techniques or Instruments
3 Measurement: Its purposes
Measurement <=> Invalid Research?
- … to provide the basis for the results, conclusions, and significance of the research.
- … to provide information about the variables that are being studied.
Measurement <=> Variables
- The way in which the numbers are used to describe something determines the amount of information that is communicated.
- A useful classification of this process is referred to as scales of measurement.
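4 Measurement Scale (or Levels of Measurement)
N O I R: Nominal, Ordinal, Interval, Ratio
Each level keeps the properties of the levels before it and adds one more:
- Nominal (N): numbers assigned to categories
- Ordinal (O): the above + numbers rank-ordered
- Interval (I): the above + equal intervals between numbers
- Ratio (R): the above + numbers expressed as ratios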
5 Descriptive Statistics
- Descriptive statistics: used to help describe a group of numbers
- What can we say about the following set of numbers?
  [Figure: a set of scores shown with tally marks; the numbers themselves are not recoverable]
- Frequency distribution: indicates how often each score is obtained
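A frequency distribution can be sketched in a few lines of Python. The scores below are made up for illustration (the slide's own numbers are not available), and `frequency_distribution` is a hypothetical helper name.

```python
from collections import Counter


def frequency_distribution(scores):
    """Return (score, count) pairs sorted by score."""
    counts = Counter(scores)
    return sorted(counts.items())


# Hypothetical scores, not the set shown on the slide.
scores = [3, 5, 4, 5, 2, 4, 5, 3, 4, 5]

for score, count in frequency_distribution(scores):
    # One tally mark per occurrence, in the spirit of the slide.
    print(f"{score}: {'I' * count}")
```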
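6 Measures of Variability
- Range: “Difference between the highest and lowest score”, e.g., (28 - 15) = 13
- Standard deviation: “Average distance of the scores from the mean”, e.g., SD = 4
  SD = sqrt( Σ(x - M)² / N ), where x = individual scores, M = mean, N = number of scores in group
- Order of the measures of central tendency, from left to right along the score axis:
  - Symmetrical distribution: (mean, median, mode) all fall at the same point
  - Positive skew: (mode, median, mean)
  - Negative skew: (mean, median, mode)
[Figure: normal curve with the horizontal axis marked in standard deviations: -2σ, -1σ, +1σ, +2σ]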
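The two measures of variability can be computed directly. This is a minimal sketch that assumes the population form of the standard deviation (dividing by N, the number of scores in the group); the score list is hypothetical and was chosen only so that the range matches the slide's 28 - 15 = 13.

```python
import math


def score_range(scores):
    """Difference between the highest and lowest score."""
    return max(scores) - min(scores)


def standard_deviation(scores):
    """Population SD: square root of the mean squared distance from the mean."""
    n = len(scores)
    mean = sum(scores) / n
    return math.sqrt(sum((x - mean) ** 2 for x in scores) / n)


# Hypothetical scores; only the minimum (15) and maximum (28) come from the slide.
scores = [15, 18, 20, 22, 22, 24, 26, 28]

print("Range:", score_range(scores))
print("SD:", round(standard_deviation(scores), 2))
```

If the scores are a sample used to estimate a wider population, dividing by N - 1 instead of N is the usual adjustment.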
7 Pearson Product-Moment Correlation
- Correlation: a measure of the relationship between two or more quantitative variables
- Correlation coefficient: a number between -1 and +1 that indicates the direction and strength of the relationship
- Example: Pearson product-moment correlation, r = .45
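For two variables, the Pearson coefficient can be computed from first principles. The sketch below uses the standard formula (sum of cross-products of deviations divided by the product of the root sums of squared deviations); the paired data are hypothetical and are not the data behind the slide's r = .45.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable has no variation")
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical paired observations (e.g., hours studied and test score).
hours = [1, 2, 3, 4, 5, 6]
score = [52, 60, 57, 68, 65, 75]

print("r =", round(pearson_r(hours, score), 2))
```

The sign of the result gives the direction of the relationship and its absolute size gives the strength.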
10 Measurement: The basics
- Construct = a characteristic that can’t be directly measured, e.g., intelligence
- Operational definition = a breakdown of the elements of that construct (e.g., verbal, quantitative, and analytical ability), or what that construct “looks like” in reality
- Measure = a numerical representation of part of the construct, e.g., items on an IQ test
- Measures have to be both reliable and valid
11 Reliability and Validity
- Reliability = consistency of results …
  - No matter when something is measured
  - No matter how it is measured (measured well!)
  - No matter where it is measured
- Validity = accuracy …
  - Measuring the right thing, and
  - Measuring the thing right!
15 Construct Validity
Construct validity (checking we have measured the right thing and have measured it right) consists of several elements, including:
- Content validity: the measure covers everything it needs to cover (e.g., intelligence test covers verbal, quantitative, & analytic abilities)
- Convergent and discriminant validity: it correlates with other related tests (e.g., other cognitive ability tests), and not with what it shouldn’t be related to (e.g., personality)
- Criterion-related validity: it predicts what it should predict (e.g., IQ score predicts GPA)
- Face validity: it looks valid to people
16 Reliability
- Internal consistency: all the items (single questions) within a scale (set of items added up) are measuring the same thing (incl. split-half reliability, Cronbach’s alpha, and some others)
- Equivalence: different forms of the test generate about the same scores
- Stability, a.k.a. test-retest reliability: people score about the same no matter when they take it (assuming no change has occurred in between)
17 Norm-referenced vs. criterion-referenced tests
- Norm-referenced tests tell you where someone is relative to everyone else, e.g., IQ tests: an IQ of 115 => 84th percentile
- Criterion-referenced tests tell you whether someone has achieved a certain level of performance, e.g., written driver’s license test, and the test for this class!
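The norm-referenced example (IQ 115 => 84th percentile) follows from the normal curve. A minimal sketch, assuming IQ scores are normally distributed with mean 100 and SD 15; `percentile_rank` is a hypothetical helper name.

```python
from statistics import NormalDist


def percentile_rank(score, mean=100.0, sd=15.0):
    """Percentage of a normal population scoring at or below `score`."""
    return NormalDist(mu=mean, sigma=sd).cdf(score) * 100


# An IQ of 115 sits one standard deviation above the mean.
print(round(percentile_rank(115)))  # 84
```

A criterion-referenced test needs no such curve: the score is simply compared with a fixed pass mark.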
19 Overview
- What are “inferential statistics”?
  - Error, confidence intervals, & statistical power
  - Hypothesis testing
- Some of the basics:
  - t-tests, chi-square, & ANOVA
  - Correlation and regression analyses
- Synthesizing multiple findings
  - The literature review
  - Meta-analysis
20 What are “inferential statistics”?
- Descriptive statistics
  - Show us how a single variable is distributed (frequency graphs)
  - Show us a picture of the relationship between two variables (correlations)
- Inferential statistics
  - Allow us to get serious about checking hunches and hypotheses
  - Usually look at the strength of relationship between two or more variables
21 Errors: Getting the inference wrong
Example: deciding to let/not let someone into graduate school on the basis of GRE scores
- Low GRE => rejected
  - Unsuccessful in graduate school: true negatives
  - Successful in graduate school: false negatives (Type II error)
- High GRE => accepted
  - Unsuccessful in graduate school: false positives (Type I error)
  - Successful in graduate school: true positives
22 Confidence intervals 1: “margin of error” for an individual score
[Figure: normal curve with the horizontal axis marked from -3sd to +3sd; about 68% of scores lie within ±1sd, about 95% within ±2sd, and over 99% within ±3sd]
If we took a random individual from this population, there is a 95% chance that that person’s score will fall between -2sd and +2sd. For IQ, that’s a 95% chance of being between 70 and 130.
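The percentage bands on the curve can be checked against the normal distribution. This sketch assumes scores are normally distributed; `coverage_within` and `score_interval` are hypothetical helper names.

```python
from statistics import NormalDist


def coverage_within(k_sd):
    """Share of a normal population lying within +/- k_sd SDs of the mean."""
    z = NormalDist()  # standard normal: mean 0, SD 1
    return z.cdf(k_sd) - z.cdf(-k_sd)


def score_interval(mean, sd, k_sd=2):
    """Score range covering mean +/- k_sd standard deviations."""
    return mean - k_sd * sd, mean + k_sd * sd


for k in (1, 2, 3):
    print(f"within +/-{k} sd: {coverage_within(k):.1%}")

print("IQ range for +/-2 sd:", score_interval(100, 15))  # (70, 130)
```

The ±2sd band is a rounded rule of thumb; the exact 95% band is closer to ±1.96sd.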
23 Confidence intervals 2: means
- Distribution of IQ scores in the “normal population”: 95% of the individual scores lie between 70 and 130 (SD = 15)
  [Figure: normal curve of individual scores; axis ticks at 55, 70, 85, 100, 115, 130, 145]
- If we took 100,000 groups of nine (9) people each, the mean IQs of those groups would be distributed like this, i.e., 95% of the means would lie between 90 and 110 (SE = 15/SQRT(9) = 5)
  [Figure: narrower normal curve of group means; axis ticks at 85, 90, 95, 100, 105, 110, 115]
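The thought experiment of drawing many groups of nine can be run as a small simulation. This is a sketch under stated assumptions: IQ is treated as normal with mean 100 and SD 15, and 10,000 groups are drawn instead of 100,000 to keep it quick. The function names and the seed are illustrative choices.

```python
import math
import random
import statistics


def standard_error(sd, n):
    """Standard deviation of the mean of n independent scores."""
    return sd / math.sqrt(n)


def simulate_group_means(n_groups, group_size, mean=100.0, sd=15.0, seed=0):
    """Draw n_groups random groups and return each group's mean score."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        sum(rng.gauss(mean, sd) for _ in range(group_size)) / group_size
        for _ in range(n_groups)
    ]


means = simulate_group_means(n_groups=10_000, group_size=9)
share_inside = sum(90 <= m <= 110 for m in means) / len(means)

print("Theoretical SE:", standard_error(15, 9))
print("SD of simulated means:", round(statistics.pstdev(means), 2))
print("Share of means between 90 and 110:", round(share_inside, 3))
```

The spread of the simulated means should land close to the theoretical standard error of 5, with roughly 95% of the means between 90 and 110.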
24 Example: Intelligence (IQ)
- “The normal population”: mean = 100
- A sample of 9 Michigan residents: mean = 108
[Figure: normal curve of IQ scores; axis ticks at 55, 70, 85, 100, 115, 130, 145]
Question: Are Michigan folks unusually smart, or did we just accidentally end up with some particularly smart people in the sample?
25 Hypothesis testing
We need to test two alternate hypotheses:
- The “null hypothesis” (H0): No cause for alarm, America. Michigan folks are just like other regular folks (i.e., the mean for this sample is not that ‘off-the-wall’)
- The “alternative hypothesis” (H1): Holy guacamole, it looks like Michigan folks really are more brilliant than the rest (i.e., the mean for this sample is wa~~y out there)!!
26 Is our group significantly different?
Remember: If we took 100,000 groups of nine (9) people each, the means of those groups would be distributed like this, i.e., 95% of the means would lie between 90 and 110 (SE = 15/SQRT(9) = 5)
[Figure: distribution of group means centered on 100, with the central 95% running from 90 to 110]
OK, so is our mean of 108 unusually high?
No! Because it’s inside the 95% range (90 to 110).
=> We “fail to reject the null hypothesis.”
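The same decision can be expressed as a one-sample z-test. This is a minimal sketch that assumes the population SD of 15 is known and that group means are normally distributed; `one_sample_z` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def one_sample_z(sample_mean, pop_mean, pop_sd, n):
    """Return (z, two-sided p-value) for a sample mean against a known population."""
    se = pop_sd / math.sqrt(n)              # standard error of the mean
    z = (sample_mean - pop_mean) / se       # distance from the mean in SE units
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # chance of a mean at least this extreme
    return z, p


z, p = one_sample_z(sample_mean=108, pop_mean=100, pop_sd=15, n=9)
print(f"z = {z:.2f}, p = {p:.3f}")

# With the conventional 5% cut-off, p above 0.05 means we fail to reject H0.
print("Reject H0?", p < 0.05)
```

Here the sample mean is 1.6 standard errors above 100, which is not far enough out to reject the null hypothesis.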
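27 What if we had a bigger sample?
- If we sample groups of 9 people, 95% of the means for those groups fall between IQ 90 and 110
  [Figure: distribution of means with the central 95% from 90 to 110, centered on 100]
- If we sample groups of 25 people, 95% of the means for those groups fall between IQ 94 and 106 (narrower, because SE = 15/SQRT(25) = 3)
  [Figure: narrower distribution of means with the central 95% from 94 to 106, centered on 100]
=> If our MI sample had been of 25 people with a mean IQ of 108, the mean would have fallen outside the 95% range, so we could have rejected the null hypothesis and concluded with 95% confidence that Michigan people were smarter (we’d have had more power). But with a sample of only 9, we just couldn’t be certain enough (even though it looked likely).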
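How the 95% range narrows with sample size, and how large a sample would be needed before a mean of 108 falls outside it, can be sketched as follows. The code follows the slides' rule of thumb of mean ± 2 standard errors (rather than the exact 1.96); the function names are hypothetical.

```python
import math


def means_interval(pop_mean, pop_sd, n, k_se=2):
    """Range expected to hold about 95% of group means for groups of size n."""
    se = pop_sd / math.sqrt(n)
    return pop_mean - k_se * se, pop_mean + k_se * se


def smallest_n_to_detect(sample_mean, pop_mean, pop_sd, k_se=2, max_n=10_000):
    """Smallest group size at which sample_mean falls outside the range."""
    for n in range(1, max_n + 1):
        low, high = means_interval(pop_mean, pop_sd, n, k_se)
        if not (low <= sample_mean <= high):
            return n
    return None  # not detectable within max_n


print("n = 9: ", means_interval(100, 15, 9))    # (90.0, 110.0)
print("n = 25:", means_interval(100, 15, 25))   # (94.0, 106.0)
print("Smallest n for a mean of 108:", smallest_n_to_detect(108, 100, 15))
```

Under these assumptions a group of 15 would already be enough for a mean of 108 to count as unusual, which is the sense in which larger samples give more power.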
28 Why do we want to be 95% sure?
Consider the trade-off we make between errors:
- We conclude MI folks are just like other Americans
  - We really are about the same: true negative (we are the same, and we got that right)
  - We really are smarter: false negative (we really are smarter, but didn’t figure it out. How smart is that??)
- We conclude MI folks are smarter than rest of USA
  - We really are about the same: false positive (we are really the same, but inferred we were smarter)
  - We really are smarter: true positive (we are smarter, and we got that right!)
29 Statistical vs. Practical Significance
There’s a flip-side to the statistical power issue … with a big sample size, you can detect “statistically significant” effects that are trivial in the real world.
Example: the Head Start program
- Tens of thousands of children
- A “statistically significant” effect => many researchers claimed “it worked”!
- The size of the change was trivial
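The gap between statistical and practical significance shows up when the same tiny effect is tested at two sample sizes. The numbers below are invented for illustration and are not Head Start results: a 0.3-point gain on a scale with SD 15, tested with a simple z-test that assumes a known SD.

```python
import math
from statistics import NormalDist


def z_and_p(mean_diff, sd, n):
    """z statistic and two-sided p-value for a mean difference with known SD."""
    se = sd / math.sqrt(n)
    z = mean_diff / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


def effect_size_d(mean_diff, sd):
    """Standardized effect size: the difference expressed in SD units."""
    return mean_diff / sd


MEAN_DIFF = 0.3  # hypothetical gain in score points
SD = 15.0        # hypothetical population SD

d = effect_size_d(MEAN_DIFF, SD)
z_small, p_small = z_and_p(MEAN_DIFF, SD, n=100)
z_big, p_big = z_and_p(MEAN_DIFF, SD, n=40_000)

print(f"effect size d = {d:.2f}")
print(f"n = 100:    z = {z_small:.2f}, p = {p_small:.3f}")
print(f"n = 40,000: z = {z_big:.2f}, p = {p_big:.5f}")
```

The effect size is identical in both runs; only the sample size changes, and with it the p-value. Reporting an effect size alongside the p-value is one way to keep the two questions apart.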
30 Choosing the right statistical test: 1
- Figure out which is/are your independent variable(s)
  - These are the “predictors” or the things that go on the X-axis of a graph
- Figure out which is your dependent variable
  - This is usually the main thing you are interested in, the outcome, the thing that goes on the Y-axis of a graph
32 Choosing the right statistical test: 2
The type of test we use depends on the types of variables we have:
- Dichotomous (just two options)
  - Male vs. female
  - Experimental group vs. control group
  - Pretest vs. posttest time
- Categorical (multiple categories)
  - Caucasian vs. African American vs. Hispanic vs. …
  - Group 1 vs. 2 vs. 3
- Continuous (interval & ratio scales)
  - Age
  - Test scores
  - Shoe sizes
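33 Choosing the right test: 3
Test to use, by type of independent variable (IV) and dependent variable (DV):
- IV dichotomous or categorical, DV dichotomous or categorical: Chi-square
- IV dichotomous, DV continuous: t-test
- IV categorical, DV continuous: ANOVA
- IV continuous, DV dichotomous or categorical: Discriminant function analysis
- IV continuous, DV continuous: Correlation or regression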
34 Synthesizing Studies: Two Methods
- The literature review
  - Reviewing, summarizing, and critiquing the main studies in a particular area, and drawing a conclusion about the strength of the evidence over multiple studies
  - Use when many of the best studies in the area are qualitative, or when there are not enough quant studies for a meta-analysis
- Meta-analysis
  - A statistical technique for combining information about “effect sizes” to come to an overall conclusion about the strength of the evidence over multiple studies
  - Use when there are many quant studies out there already, some with conflicting results; use to look for meta-effects (e.g., type of sample, type of intervention, etc.)
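One common way to combine effect sizes is a fixed-effect, inverse-variance weighted average, in which more precise studies count for more. The slides do not say which method they have in mind, so this is an assumed example; the effect sizes and variances below are invented.

```python
import math


def fixed_effect_summary(effects, variances):
    """Inverse-variance weighted mean effect size and its standard error."""
    if len(effects) != len(variances) or not effects:
        raise ValueError("need one variance per effect size")
    weights = [1.0 / v for v in variances]  # precise studies get larger weights
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical studies: standardized effect sizes and their sampling variances.
effects = [0.10, 0.45, 0.30, -0.05]
variances = [0.04, 0.02, 0.01, 0.05]

pooled, pooled_se = fixed_effect_summary(effects, variances)
low, high = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"pooled effect = {pooled:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

When studies differ in sample or intervention type, a random-effects model or a moderator analysis is usually more appropriate than this simple pooling.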
36 Overview
- Experimental methods
  - Why use experimental methods?
  - Ruling out rival explanations
  - Some useful experimental designs
- Using multiple methods
  - Balancing weaknesses in methods
  - Uses of multiple methods
37 Why use experimental methods?
- Main point = to nail down causality
- Establishing causality involves ruling out rival explanations for the effects observed, e.g., new kind of hearing aid => higher math test scores in hearing-impaired students
38 What kinds of rival explanations?
What else could have accounted for the increase in math scores of students using the new hearing aid?
- Testing/practice effect: the students were better on the posttest because of the practice they got on the pretest
- Regression to the mean: the students tested happened to score a bit low on the day of the pretest, so the “improvement” was just the posttest moving closer to the average
39 Rival explanations (contd) …
- History: the approach to teaching math changed between the pretest and the posttest
- Mortality: the lowest-performing students were absent the day of the posttest
- Maturation: students naturally get better at math at about that age
- Selection: using the new hearing aid needed parental consent, and only those parents with a strong interest in their child’s academic performance consented
40 Rival explanations (contd) …
- Hawthorne effect: students using the hearing aid felt this was special treatment, so tried harder
- Novelty effect: the hearing aid is novel, so the students feel excited and more motivated about listening (though the novelty wears off later on, after the posttest)
- Researcher expectancy effect: the teacher expected to see better results among these students, and subconsciously tended to grade their answers more favorably
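Regression to the mean, one of the rival explanations listed above, is easy to underestimate, so a small simulation may help. Everything here is an assumption made for illustration: each student has a stable ability plus independent day-to-day noise on each test, no treatment is given, and only students with a low pretest score are followed up.

```python
import random


def simulate_regression_to_mean(n_students=10_000, cutoff=40.0, seed=1):
    """Return (mean pretest, mean posttest) for students selected for low pretest scores."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    pre_selected, post_selected = [], []
    for _ in range(n_students):
        ability = rng.gauss(50, 10)            # stable underlying skill
        pretest = ability + rng.gauss(0, 5)    # skill plus that day's luck
        posttest = ability + rng.gauss(0, 5)   # same skill, fresh luck, no treatment
        if pretest < cutoff:                   # follow up only the low scorers
            pre_selected.append(pretest)
            post_selected.append(posttest)
    if not pre_selected:
        raise ValueError("no students fell below the cutoff")
    n = len(pre_selected)
    return sum(pre_selected) / n, sum(post_selected) / n


pre_mean, post_mean = simulate_regression_to_mean()
print(f"selected group: pretest mean = {pre_mean:.1f}, posttest mean = {post_mean:.1f}")
```

The selected group “improves” on the posttest even though nothing was done to them, which is why a comparison group chosen the same way is needed before crediting the treatment.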
41 Simple experimental designs
X = treatment, C = control, O = test/measure
- XO = posttest only (no pretest): hard to rule out any of the rival explanations
- OXO = pretest and posttest: rules out selection and mortality
- OXO/OCO = pretest and posttest on both an experimental and a nonequivalent control group: rules out several rival explanations, but weaker because control group not equivalent
42 Randomized designs
- R: OXO / OCO = randomly assigned experimental and control groups, pre & post
- R: XO / CO = posttest only on randomly assigned exp’tal and control grps
- R: OXO / OCO / XO / CO = Solomon four-group design: two experimental and two control groups; half pretested, all posttested
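Random assignment (the “R” in these designs) can be sketched as a shuffle followed by dealing participants out to the groups in turn. The function name, the participant IDs, and the seed are hypothetical; the design labels follow the X/C/O notation used in the slides.

```python
import random


def randomly_assign(participants, designs, seed=None):
    """Shuffle participants and deal them into one group per design label."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {design: [] for design in designs}
    for i, person in enumerate(shuffled):
        # Dealing in turn keeps group sizes within one of each other.
        groups[designs[i % len(designs)]].append(person)
    return groups


student_ids = list(range(1, 21))  # hypothetical participant IDs

# Randomized pretest-posttest design with a control group.
two_groups = randomly_assign(student_ids, ["OXO", "OCO"], seed=42)

# Solomon four-group design: half pretested, all posttested.
solomon = randomly_assign(student_ids, ["OXO", "OCO", "XO", "CO"], seed=42)

for design, members in solomon.items():
    print(design, sorted(members))
```

Because chance alone decides who lands in which group, pre-existing differences such as parental interest are expected to balance out across groups as the sample grows.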
43 What should a control group be?
- Think of the practical question you need to answer with your research, e.g.,
  - Is this treatment/method better than nothing?
  - Is it better than what we are using now?
- Consider the option of using a “placebo”
  - In drug studies, this is the “sugar pill”
  - In experimental designs, it is an intervention that is not expected to affect the DV, but that keeps the control group from feeling “left out” of the experiment
44 Examples without control groups
Suppose you wanted to see if phonics training in kindergarten improved students’ ability to read in first grade …
- What would a posttest-only design with no control group (XO) entail (i.e., what, whom, and when would you test)?
- What would a pretest-posttest design with no control group (OXO) entail?
45 Examples with control groups
Suppose you wanted to see if phonics training in kindergarten improved students’ ability to read in first grade …
- What would a pretest-posttest design with a control group (OXO/OCO) entail?
- How about a posttest-only design with a control (XO/CO)?
- How would randomization help with each of the above? Is it practically feasible?
46 The Solomon four-group design
Suppose you wanted to see if phonics training in kindergarten improved students’ ability to read in first grade …
- How would you set up a randomized Solomon four-group design?
  R: OXO / OCO / XO / CO
47 Using multiple methods
What are the weaknesses of qual & quant methods?
- Quantitative: ________________
- Qualitative: ________________
48 Complementary multiplism
- All research methods have weaknesses
- Complementary multiplism (a.k.a. critical multiplism) is the practice of deliberately choosing complementary methods with different weaknesses, so that the strengths of one make up for the weaknesses of another
49 Uses of mixed methods
- To bring “dry” statistics alive
- To dig into puzzling results and try to understand them better
- To “triangulate” by getting multiple perspectives on one issue
  - If qual and quant data point in the same direction, you can be more certain that your results are robust
  - If they tell you something different, it’s time to dig again!