1 Experimental, Quasi-experimental, and Single Subject Research
774/801, Sept 1, 2004
John Hattie & Tony Hunt
2 It is simple: There is no perfect experiment in education
There is nearly always a trade-off between the
Power to generalise (PG)
- from sample to population
- from items to the behaviour domain
- from conditions in the study to all intended conditions
and the
Power to convince (PC)
- there are many audiences
3 Power to Generalise (PG)
How confidently can we generalise from the study to all “similar” situations?
Is the design replicable/reproducible/exchangeable?
Are the evidence and conclusions unique to this study?
Have the generalisations taken into account all possible competing views, that is, plausible alternative rival explanations (PARE)?
4 Power to Convince (PC)
Who are we trying to convince?
If it is a colleague or colleagues, then more situation specificity may be convincing (kids/classrooms/schools like mine)
If it is the educational community, then the situation needs to be less critical
5 Resolution: Linking Power
Experimental design consists of a series of links:
It is as strong as the weakest link
Each link influences the next link
Desirable to have equal strength
Does each link have explanatory power?
Are conclusions credible to the intended audience?
6 History
Campbell and Stanley (1963)
Cook and Campbell (1979)
Shadish, Cook & Campbell (2002)
Evidence based
All based on designing studies that can lead to explanation and claims of causality
7 Explanation and Cause
Cause and effect must be related (e.g., self-concept & achievement)
There needs to be temporal order (cause before effect)
Need to rule out other explanations, i.e., other Plausible Alternative Rival Explanations (PARE)
8 Campbell & Stanley (1963) Pretest-Posttest Control Group Design
     Pre   Treatment   Post
R    O     X           O
R    O                 O
Randomisation: aiming for representativeness
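The R rows in the design above can be sketched in code. This is a minimal illustration, not the authors' procedure: the student IDs, the seed, and the function names `randomly_assign` and `gain_difference` are all hypothetical.

```python
import random


def randomly_assign(participants, seed=None):
    """Shuffle participants and split them into treatment and control groups (the R step)."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def mean(values):
    return sum(values) / len(values)


def gain_difference(pre_t, post_t, pre_c, post_c):
    """Mean pre-to-post gain of the treatment group minus that of the control group."""
    return (mean(post_t) - mean(pre_t)) - (mean(post_c) - mean(pre_c))


# Hypothetical participant IDs, for illustration only.
students = [f"student_{i}" for i in range(1, 21)]
treatment, control = randomly_assign(students, seed=1)
```

Comparing the two groups' gains, rather than the treatment group's gain alone, is what lets the control row absorb changes that would have happened anyway.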
9 But can we randomise?
No Child Left Behind
Tennessee Class Size Study
10 Quasi-experimentation
When you do not have so much control over allocation of treatment, conditions, sample
When you have non-equivalent groups
In quasi-experimentation, the researcher has to enumerate alternative explanations one by one, decide which are plausible, and then use logic, design, and measurement to assess whether each one is operating in a way that might explain any observed effect (Shadish, Cook & Campbell, 2002, p. 14)
Relates to the Popper notion of falsification: What evidence would you accept that you are wrong?
11 Examples of Quasi-experimental designs
a. Time Series
O1 O2 O3 O4 O5 X O6 O7 O8 O9
Divorce Laws
Ozdowski, S.A. & Hattie, J.A. (1981). The impact of divorce laws on divorce rate in Australia: A time series analysis. Australian Journal of Social Issues, 16, 3-17.
12 ABA design
A           B           A
O1 O2 O3    X4 X5 X6    O7 O8 O9
Le Fevre et al. (2002). Adequate Decoders
13 Multilevel Design: Hierarchical Linear Modelling
Students within classes within schools
E.g., Tracking/Streaming
School 1              School 2 …
Teacher 1             Teacher 2
Class 1   Class 2     Class 1   Class 2
15 Minimal requirements for Studies
Sampling
- Items to behaviour domains
- People to all possible people
- Conditions to all possible conditions
Representative sampling via
- Random sampling
- Stratified random sampling
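Stratified random sampling can be sketched as follows. This is an illustrative sketch only: the population, the strata ("primary"/"secondary"), the 10% sampling fraction, and the function name `stratified_sample` are assumptions, not taken from the slides.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, fraction, seed=None):
    """Draw the same fraction at random from every stratum (at least one unit each)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for key in sorted(strata):
        members = strata[key]
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: 60 primary and 40 secondary students.
population = [("primary", i) for i in range(60)] + [("secondary", i) for i in range(40)]
sample = stratified_sample(population, stratum_of=lambda unit: unit[0], fraction=0.1, seed=0)
```

Sampling within each stratum keeps the sample's composition in line with the population's, which simple random sampling only achieves on average.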
16 Variables
At the end of your study, can I say “Aha, so that is what you mean, now I am clear”?
Open constructs NOT Definitions
No such thing as immaculate perception
Dependent - Manipulable
Independent - Nonmanipulable
17 Dependability
How reliable/consistent/replicable are your measures/observations?
18 Validity = Interpretations
Validity: "an integrated evaluative judgement of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment".
Not validity of a test, but validity of interpretations
19 Validity of your study …
Is related to having ruled out Plausible Alternative Rival Explanations (PARE)
CONTROL CONTROL CONTROL
… some examples
20 1. PARE: Power
Is your study POWERFUL enough to detect the effect you are investigating?
21 1. PARE: Power
Is your study POWERFUL enough to detect the effect you are investigating?
Do chickens have lips?
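One way to put a number on "powerful enough" is an approximate power calculation. The sketch below assumes a two-sided comparison of two equal-sized groups and uses a normal approximation in place of the exact t-based calculation; the function name `approx_power` and the example values are illustrative, not from the slides.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison (normal approximation)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the true standardised effect is effect_size.
    noncentrality = effect_size * sqrt(n_per_group / 2)
    return z.cdf(noncentrality - z_crit) + z.cdf(-noncentrality - z_crit)


small_study = approx_power(effect_size=0.5, n_per_group=10)
larger_study = approx_power(effect_size=0.5, n_per_group=64)
```

Under these assumptions, an effect of half a standard deviation is detected about 80% of the time with 64 per group, but far less often with 10 per group.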
23 2. PARE: Chance
Did the effect/conclusion occur by chance?
E.g., That two means are the same: the hypothesis of no difference
Setting a rejection level, say α = .05
24 3. PARE: Type II errors
Type I errors: Rejecting a claim when it is true (α = .05)
Type II errors: Accepting a claim when it is false (e.g., accepting that chickens do not have lips when in fact they do)
25 4. PARE: Reliability of your measures
If the reliability is low, then the scores “wobble” and there is no guarantee you will get the same results using these instruments (tests, observations, interviews, etc.)
Was the treatment “consistent” in the various classes/implementations?
26 5. PARE: Was the treatment implemented?
Degree of implementation
The Hong Kong Practical Science Study (Cheung, Hattie, & Bucat, 1997)
27 6. PARE: Maturation
Showing change may not be enough as kids improve anyway (e.g., by maturation)
Method to measure change = Effect-sizes
Effect-size = (Post - Pre) / spread = $(X_2 - X_1) / sd_{diff}$
e.g., Before = 12, After = 15, spread = 6: (15 - 12) / 6 = .50
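The effect-size calculation on this slide is a single ratio, sketched below. The function name `effect_size` is illustrative. With the slide's example numbers (Before = 12, After = 15, spread = 6) the ratio works out to 0.5.

```python
def effect_size(pre_mean, post_mean, spread):
    """Standardised change: (post - pre) divided by the spread (a standard deviation)."""
    if spread <= 0:
        raise ValueError("spread must be positive")
    return (post_mean - pre_mean) / spread


example = effect_size(pre_mean=12, post_mean=15, spread=6)
print(example)  # 0.5
```

Dividing by the spread puts changes measured on different tests onto a common scale, which is what allows effects to be compared across studies.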
28 Distribution of effects
[Figure: distribution of effects, with the average effect and zero achievement marked]
30 The disasters …
71  programmed instruction     801     .14
72  finances                   1634
73  problem based learning     41      .12
74  diet                       255
75  gender (female-male)       9020    .09
76  inductive teaching         570     .06
77  team teaching
78  ability grouping           3355    .05
79  class size                 2559
80  open vs. traditional       3426    -.01
81  summer vacation            269     -.06
82  retention                  3626    -.17
83  transfer of school         354     -.26
84  disruptive students        1511    -.78
31 The also rans …
56  metacognitive intervention 921     .29
57  math programs              3326    .27
58  audio-visual               2699    .26
59  gifted programs            47      .25
60  coaching                   1076    .24
61  behavior objectives        157
62  calculators                238
63  mainstreaming              1641    .21
64  questioning                493     .20
65  learning hierarchies       168     .19
66  attitude to math           1122
67  desegregation              1590    .18
68  play                       129     .16
69  television                 4337    .15
32 Almost there …
42  tutoring                   136     .35
43  activity-based programs    674
44  remedial programs          1438
45  classroom climate          2726
46  social skills training     5472
47  time                       1680    .34
48  CAI                        18231   .32
49  inquiry based teaching     2740
50  preschool                  242
51  whole language             198     .31
52  within class grouping      2359
53  testing                    1463
54  problem solving            1141    .30
55  background                 692
37 7. PARE: Testing
People become test wise and/or may respond differently when under test conditions
White space and testing in asTTle
Testwiseness
38 Test of Objective Evidence
Each of the questions in the following set has a logical or "best" answer from its corresponding multiple choice answer set. Please record your eight answers.
1. The purpose of the cluss in furmpaling is to remove
A cluss-prags
B tr s
C cloughs
D pluomots
2. Trassig is true when
A clump trasses the von
B the viskal flans, if the viskal is donwil or zortil
C the belgo fruls
D dissels lisk easily
3. The sigia frequently overfesks the trelsum because
A all sigias are mellious
B all sigias are always votial
C the trelsum is usually tarious
D no trelsa are feskable
4. The fribbled breg will minter best with an
A derst
B morst
C sortar
D ignu
39 Test of Objective Evidence, Part II
5. The reasons for tristal doss are
A the sabs foped and the doths tinzed
B the dredges roted with the crets
C few rakobs were accepted in sluth
D most of the polats were thonced
6. Which of the following is/are always present when trossels are being gruven?
A rint and vost
B vost
C shum and vost
D vost and plone
7. The mintering function of the ignu is most effectively carried out in connection with
A a razma toi
B the groshing stantol
C the fribbled breg
D a frailly sush
8.
A
B
C
D
40 8. PARE: Statistical Regression
When taking extreme groups the means tend to move to the middle.
Why do the tallest fathers have shorter sons, and the shortest fathers have taller sons?
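The fathers-and-sons question can be illustrated with a small simulation. This is a sketch under stated assumptions, not real height data: each height is modelled as a population mean of 175 cm plus a shared family component plus individual noise, and all numbers and names are hypothetical.

```python
import random


def simulate_heights(n=20000, seed=0):
    """Return (father, son) height pairs that share a family component plus independent noise."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        family = rng.gauss(0, 5)  # component shared by father and son (cm)
        father = 175 + family + rng.gauss(0, 5)
        son = 175 + family + rng.gauss(0, 5)
        pairs.append((father, son))
    return pairs


def mean(values):
    return sum(values) / len(values)


pairs = sorted(simulate_heights())  # sorted by father's height
tenth = len(pairs) // 10
shortest, tallest = pairs[:tenth], pairs[-tenth:]

tallest_fathers = mean([father for father, _ in tallest])
tallest_sons = mean([son for _, son in tallest])
shortest_fathers = mean([father for father, _ in shortest])
shortest_sons = mean([son for _, son in shortest])
```

In this simulation nothing acts on the sons at all, yet the sons of the tallest fathers average closer to 175 cm than their fathers do, and likewise for the shortest. Selecting an extreme group on a noisy measure is enough to produce the shift.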
41 … Regression to the Mean
Special Education (e.g., Sesame Street)
Effective schools
Gifted education
42 9. PARE: Response rates
The returns of questionnaires/tests/interviews should be high
What is typical?
43 Meta-analyses of Response Rates
Typical return is 50%
Three major factors:
- Salience (77% vs 42%)
- Number of follow ups (halve each time)
- Lack of clutter/orderliness
Not length (ave 7 pages, 72 questions), colour,
44 10. Change scores
The difference between post and pre scores
Problems:
- Unreliable
- Are you measuring the same thing both times?
- Regression to the mean
45 11. PARE: Experimenter effects
Hawthorne effect: Because we know we are in an experiment this alters our responses
Hans the Horse
Pygmalion in the classroom
Christine Rubie’s thesis
Stanley Milgram’s experiment
46 12. PARE: Restriction of range
When you choose/focus on a narrow range of abilities (etc.) this can be misleading
Picture …
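A short simulation shows why a narrow range can mislead. The sketch below is illustrative only: the ability and achievement scores are simulated, the cut-off of 115 is arbitrary, and the names are hypothetical.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


rng = random.Random(0)
# Simulated scores: achievement depends on ability plus noise.
ability = [rng.gauss(100, 15) for _ in range(5000)]
achievement = [0.6 * a + rng.gauss(0, 12) for a in ability]

r_full = pearson_r(ability, achievement)

# Keep only the high-ability students, i.e. a restricted range.
restricted = [(a, y) for a, y in zip(ability, achievement) if a >= 115]
r_restricted = pearson_r([a for a, _ in restricted], [y for _, y in restricted])
```

The relationship between ability and achievement is identical in both groups by construction, but the correlation computed within the high-ability group alone comes out much weaker than in the full sample.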
49 13. PARE: Specification of target and accessible sample/population
Most experiments are highly local but have general aspirations
Often, there are two groups you are generalising to: e.g., all secondary students in NZ (the target population), and all secondary students you have access to, from which to sample (the accessible population)
51 14. PARE: Interactions
The model of individual differences indicates that we should modify our teaching methods to allow for individual differences in the class
52 The art of research design is to devise experiments to identify the explanation and cause of effects by
- Maximising the chance that the conclusions are defensible and
- Minimising the PAREs
Such that you have
- Power to Generalise and
- Power to Convince
53 Unobtrusive measures
Which painting do most people watch?
Friendship in cities
Racism in suburbs/cities
54 Statistical Methods to assist …
Correlation
Analysis of variance (anova)
Cross-tabulation
57 Comparing means: Magnitude and Chance
Magnitude: Effect-sizes
Chance: Analysis of Variance
58 Well-being
What are the differences in levels of WELL-BEING among males and females, and between Australia and New Zealand?
Country * GENDER Cross tabulation (Count)
               MALE    FEMALE    Total
New Zealand
Australia
Total
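A Country by GENDER table of counts like the one above can be built as sketched below. The records are hypothetical placeholders and are not the study's data (the slide's counts did not survive in this transcript); the function name `cross_tabulate` and the field names are likewise assumptions.

```python
from collections import Counter


def cross_tabulate(records, row_key, col_key):
    """Count records in each (row, column) cell and add row and column totals."""
    counts = Counter((record[row_key], record[col_key]) for record in records)
    rows = sorted({row for row, _ in counts})
    cols = sorted({col for _, col in counts})
    table = {row: {col: counts[(row, col)] for col in cols} for row in rows}
    for row in rows:
        table[row]["Total"] = sum(table[row][col] for col in cols)
    table["Total"] = {col: sum(table[row][col] for row in rows) for col in cols + ["Total"]}
    return table


# Hypothetical respondents, for illustration only.
records = [
    {"country": "New Zealand", "gender": "FEMALE"},
    {"country": "New Zealand", "gender": "MALE"},
    {"country": "New Zealand", "gender": "FEMALE"},
    {"country": "Australia", "gender": "MALE"},
    {"country": "Australia", "gender": "FEMALE"},
]
table = cross_tabulate(records, row_key="country", col_key="gender")
```

The cell counts show how the sample is distributed across the groups being compared, which is worth checking before comparing well-being means between them.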