Slide 1: Experimental, Quasi-experimental, and Single Subject Research
774/801, Sept 1, 2004
John Hattie & Tony Hunt
Slide 2: It is simple: there is no perfect experiment in education.
There is nearly always a trade-off between the
Power to generalise (PG)
- from sample to population
- from items to the behaviour domain
- from conditions in the study to all intended conditions
and the
Power to convince (PC)
- there are many audiences
Slide 3: Power to Generalise (PG)
- How confident can we be in generalising from the study to all “similar” situations?
- Is the design replicable/reproducible/exchangeable?
- Are the evidence/conclusions unique to this study?
- Have the generalisations taken into account all possible competing views, i.e., the plausible alternative rival explanations (PARE)?
Slide 4: Power to Convince (PC)
Who are we trying to convince?
- If it is a colleague (or colleagues), then more situation specificity may be convincing (kids/classrooms/schools like mine).
- If it is the educational community, then the particular situation matters less.
Slide 5: Resolution: Linking Power
Experimental design consists of a series of links:
- It is only as strong as its weakest link.
- Each link influences the next link.
- It is desirable for the links to have equal strength.
- Does each link have explanatory power?
- Are the conclusions credible to the intended audience?
Slide 6: History
- Campbell and Stanley (1963)
- Cook and Campbell (1979)
- Shadish, Cook & Campbell (2002)
Evidence-based: all are based on designing studies that can lead to explanation and claims of causality.
Slide 7: Explanation and Cause
- Cause and effect must be related (e.g., self-concept and achievement).
- There needs to be temporal order (cause before effect).
- Other explanations, the Plausible Alternative Rival Explanations (PARE), need to be ruled out.
Slide 8: Campbell & Stanley (1963): Pretest-Posttest Control Group Design

     Pre   Treatment   Post
R    O     X           O
R    O                 O

(R = random assignment, O = observation, X = treatment)
Randomisation: aiming for representativeness
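The two "R" rows of this design rely on random assignment of participants to the treatment and control groups. The following is a minimal Python sketch of that allocation step, followed by a comparison of mean pre-to-post gains. The function names, student IDs, and scores are hypothetical and only illustrate the logic of the design. This is not a procedure taken from the slides.

```python
import random
from statistics import mean


def randomly_assign(participants, seed=None):
    """Shuffle a copy of the participants and split it into two groups (the R rows)."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]  # (treatment, control)


def gain_difference(pre, post, treatment, control):
    """Mean (post - pre) gain for the treatment group minus that for the control group."""
    def mean_gain(group):
        return mean(post[p] - pre[p] for p in group)
    return mean_gain(treatment) - mean_gain(control)


# Hypothetical pretest/posttest scores keyed by (hypothetical) student ID.
students = [f"s{i}" for i in range(1, 11)]
treatment, control = randomly_assign(students, seed=1)
pre = {s: 10 for s in students}
post = {s: (14 if s in treatment else 12) for s in students}
print(gain_difference(pre, post, treatment, control))  # gain difference of 2 by construction
```

Comparing gains rather than posttests alone uses both O columns of the design. Randomisation is what licenses attributing the difference to X.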
Slide 9: But can we randomise?
- No Child Left Behind
- Tennessee Class Size Study
Slide 10: Quasi-experimentation
- When you do not have much control over the allocation of treatments, conditions, or the sample
- When you have non-equivalent groups
"In quasi-experimentation, the researcher has to enumerate alternative explanations one by one, decide which are plausible, and then use logic, design, and measurement to assess whether each one is operating in a way that might explain any observed effect" (Shadish, Cook & Campbell, 2002, p. 14).
This relates to Popper's notion of falsification: what evidence would you accept that you are wrong?
Slide 11: Examples of quasi-experimental designs
a. Time series
O1 O2 O3 O4 O5 X O6 O7 O8 O9
Example: divorce laws. Ozdowski, S.A. & Hattie, J.A. (1981). The impact of divorce laws on divorce rate in Australia: A time series analysis. Australian Journal of Social Issues, 16, 3-17.
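One simple way to read an interrupted time series is to fit the trend in the pre-intervention observations (O1 to O5), project it past X, and measure how far O6 to O9 depart from that projection. The following is a minimal Python sketch under that assumption. It is a plain trend projection with hypothetical data, not the time series analysis used by Ozdowski and Hattie (1981).

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def mean_departure_from_trend(pre, post):
    """Project the pre-intervention trend (O1..O5) forward and return the mean gap
    between the observed post-intervention points (O6..O9) and that projection."""
    pre_times = list(range(1, len(pre) + 1))
    intercept, slope = fit_line(pre_times, pre)
    post_times = range(len(pre) + 1, len(pre) + len(post) + 1)
    return mean(obs - (intercept + slope * t) for obs, t in zip(post, post_times))


# Hypothetical yearly rates: a steady upward trend, then a jump after X.
pre = [10, 11, 12, 13, 14]
post = [20, 21, 22, 23]
print(mean_departure_from_trend(pre, post))  # 5.0
```

Comparing against the projected trend rather than the raw pre-intervention mean stops an existing upward drift from being mistaken for an effect of X.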
Slide 12: ABA design
A             B             A
O1 O2 O3      X4 X5 X6      O7 O8 O9
Le Fevre et al. (2002). Adequate Decoders
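In an A-B-A design, the evidence for a treatment effect is the pattern across phases. Behaviour should change when the treatment (B) is introduced and move back towards baseline when it is withdrawn. A minimal Python sketch of summarising the three phases by their means follows. The observations are hypothetical, and real single-subject analyses usually also inspect trend and variability within each phase.

```python
from statistics import mean


def phase_means(a1, b, a2):
    """Mean of each phase in an A-B-A (baseline, treatment, withdrawal) design."""
    return {"A1": mean(a1), "B": mean(b), "A2": mean(a2)}


# Hypothetical observations: O1-O3 (baseline), X4-X6 (treatment), O7-O9 (withdrawal).
means = phase_means([3, 4, 3], [7, 8, 9], [4, 3, 4])
# Pattern consistent with a treatment effect: B rises above A1, then A2 falls back.
treatment_effect_pattern = means["B"] > means["A1"] and means["A2"] < means["B"]
print(means, treatment_effect_pattern)
```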
Slide 13: Multilevel design: Hierarchical Linear Modelling
Students within classes within schools, e.g., tracking/streaming.
School 1, School 2, …
- Teacher 1: Class 1, Class 2
- Teacher 2: Class 1, Class 2
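Students in the same class tend to resemble each other, so their scores are not independent. This is a key reason for modelling the hierarchy. One common index of that clustering is the intraclass correlation, ICC(1). The slide itself gives no formula. The following is a minimal Python sketch for a balanced layout (equal class sizes) with hypothetical scores.

```python
from statistics import mean


def icc1(groups):
    """ICC(1) for a balanced one-way layout (equal numbers of students per class):
    (MSB - MSW) / (MSB + (k - 1) * MSW), where k is the class size."""
    k = len(groups[0])
    if any(len(g) != k for g in groups):
        raise ValueError("this sketch assumes equal class sizes")
    grand = mean(x for g in groups for x in g)
    ss_between = sum(k * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (len(groups) - 1)
    ms_within = ss_within / (len(groups) * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical achievement scores for three classes of three students each.
classes = [[2, 3, 4], [5, 6, 7], [5, 6, 7]]
print(round(icc1(classes), 3))  # 0.727
```

An ICC well above zero means a large share of the variance lies between classes. In that case, treating students as independent would overstate the precision of the results.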
Slide 15: Minimal requirements for studies
Sampling:
- items to behaviour domains
- people to all possible people
- conditions to all possible conditions
Representative sampling via:
- random sampling
- stratified random sampling
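Stratified random sampling draws a random sample within each subgroup (stratum), so every stratum is represented in proportion to its size. The following is a minimal Python sketch using proportional allocation. The strata, IDs, and function name are hypothetical.

```python
import random


def stratified_sample(strata, fraction, seed=None):
    """Draw the same fraction at random from every stratum (proportional allocation)."""
    rng = random.Random(seed)
    return {
        name: rng.sample(members, round(len(members) * fraction))
        for name, members in strata.items()
    }


# Hypothetical sampling frame: students grouped by year level.
strata = {
    "Year 9": [f"Y9-{i}" for i in range(60)],
    "Year 10": [f"Y10-{i}" for i in range(40)],
}
sample = stratified_sample(strata, fraction=0.1, seed=42)
print({name: len(chosen) for name, chosen in sample.items()})  # {'Year 9': 6, 'Year 10': 4}
```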
Slide 16: Variables
At the end of your study, can I say "Aha, so that is what you mean, now I am clear"?
- Open constructs, NOT definitions
- There is no such thing as immaculate perception
- Dependent
- Independent: manipulable or nonmanipulable
Slide 17: Dependability
How reliable/consistent/replicable are your measures/observations?
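One widely used index of the internal consistency of a multi-item measure is Cronbach's alpha. The slide does not name a particular index, so this choice is an assumption. The following is a minimal Python sketch with a hypothetical respondents-by-items score matrix.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents x items score matrix:
    k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(scores[0])
    items = list(zip(*scores))  # one tuple of scores per item
    totals = [sum(row) for row in scores]
    item_var = sum(pvariance(item) for item in items)
    return k / (k - 1) * (1 - item_var / pvariance(totals))


# Hypothetical 4 respondents x 3 items (e.g., rating-scale items).
scores = [[2, 3, 3], [4, 4, 5], [1, 2, 1], [3, 3, 4]]
print(round(cronbach_alpha(scores), 2))  # 0.95
```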
Slide 18: Validity = Interpretations
Validity: "an integrated evaluative judgement of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment".
It is not the validity of a test, but the validity of interpretations.
Slide 19: Validity of your study … is related to having ruled out Plausible Alternative Rival Explanations (PARE).
CONTROL, CONTROL, CONTROL … some examples
Slide 20: 1. PARE: Power
Is your study POWERFUL enough to detect the effect you are investigating?
Slide 21: Do chickens have lips?
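Whether a study is powerful enough depends mainly on the size of the effect and the number of participants. The following is a minimal Python sketch of the usual normal-approximation formula for the sample size per group in a two-sided, two-group comparison of means. The function name and default values (α = .05, power = .80) are assumptions. Calculations based on the t distribution give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-group comparison of means,
    using the normal approximation n = 2 * (z_(1-alpha/2) + z_power)^2 / d^2."""
    z = NormalDist().inv_cdf
    return ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 / effect_size ** 2)


for d in (0.2, 0.5, 0.8):
    print(d, n_per_group(d))  # 393, 63 and 25 per group respectively
```

Small effects need very large samples. A small study may therefore "find" no effect (no lips) simply because it could not have detected one.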
Slide 23: 2. PARE: Chance
Did the effect/conclusion occur by chance?
E.g., testing that two means are the same: the hypothesis of no difference (the null hypothesis).
Set a rejection level, say α = .05.
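One direct way to ask whether a difference between two means could have arisen by chance is a permutation test. This approach is an illustrative choice, since the slide does not name a test. The test counts how often randomly re-labelling the pooled scores produces a difference at least as large as the observed one. The following is a minimal Python sketch with exact enumeration and hypothetical scores.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test for a difference in means: the proportion of
    all re-labellings of the pooled scores whose |mean difference| is at least as
    large as the observed one (how often chance alone does this well)."""
    pooled = group_a + group_b
    observed = abs(mean(group_b) - mean(group_a))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), len(group_a)):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(mean(b) - mean(a)) >= observed - 1e-12:  # tolerance for float ties
            extreme += 1
    return extreme / total


# Hypothetical scores for two small groups.
p = permutation_p_value([12, 14, 15, 13, 16], [18, 19, 17, 20, 21])
print(p, p < 0.05)  # about 0.0079 (2 of 252 re-labellings), True
```

If p falls below the chosen rejection level (α = .05 here), the hypothesis of no difference is rejected.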
Slide 24: 3. PARE: Type II errors
- Type I error: rejecting a claim (the null hypothesis) when it is true (probability α = .05).
- Type II error: accepting a claim when it is false (e.g., concluding that chickens do not have lips when in fact they do).
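The Type II error rate (β) is one minus the power of the study. The following is a minimal Python sketch, assuming a two-sided, two-group comparison and a normal approximation. It shows how often a study of a given size would miss a moderate effect (d = 0.5). The function name and example sizes are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_rate(effect_size, n_per_group, alpha=0.05):
    """Approximate Type II error rate (beta = 1 - power) for a two-sided,
    two-group comparison, using a normal approximation and ignoring the
    negligible chance of rejecting in the wrong direction."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    power = std_normal.cdf(effect_size * sqrt(n_per_group / 2) - z_crit)
    return 1 - power


for n in (10, 63):
    print(n, round(type_ii_rate(0.5, n), 2))  # about 0.8 with 10 per group, 0.2 with 63
```

With 10 per group, a real moderate effect would be missed about four times in five. A null result from such a study therefore says little.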
Slide 25: 4. PARE: Reliability of your measures
If reliability is low, the scores "wobble" and there is no guarantee you will get the same results using these instruments (tests, observations, interviews, etc.).
Was the treatment "consistent" across the various classes/implementations?
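The size of the "wobble" can be quantified by the standard error of measurement (SEM), which is the score's standard deviation times the square root of one minus the reliability. The following is a minimal Python sketch comparing two reliabilities on a hypothetical scale with sd = 10. The 95% band uses the usual ±1.96 SEM.

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """SEM = sd * sqrt(1 - reliability): the typical 'wobble' of one observed score."""
    return sd * sqrt(1 - reliability)


def score_band(observed, sd, reliability, z=1.96):
    """Approximate 95% band around an observed score."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem


for r in (0.64, 0.91):
    print(r, round(standard_error_of_measurement(10, r), 2), score_band(50, 10, r))
```

With reliability .64, the SEM is 6 points, roughly ±12 points around a score of 50. With reliability .91, it drops to 3 points.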
Slide 26: 5. PARE: Was the treatment implemented?
Degree of implementation, e.g., the Hong Kong Practical Science Study (Cheung, Hattie, & Bucat, 1997).
Slide 27: 6. PARE: Maturation
Showing change may not be enough, as kids improve anyway (e.g., through maturation).
Method to measure change: effect sizes.
Effect size = (Post − Pre) / spread:
$ES = \frac{\bar{X}_2 - \bar{X}_1}{sd}$
e.g., Before = 12, After = 15, spread = 6: ES = (15 − 12) / 6 = .50
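The following is a minimal Python sketch of the effect-size calculation, using the before/after example on this slide (12, 15, spread 6). The function name is hypothetical. Which standard deviation serves as the "spread" (e.g., pooled or pretest) is a choice the slide leaves open, so the sketch simply takes it as an input.

```python
def effect_size(before, after, spread):
    """Effect size = (after - before) / spread, with 'spread' a standard deviation."""
    return (after - before) / spread


print(effect_size(12, 15, 6))  # 0.5
```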
Slide 28: [Figure: distribution of effects, marking the average effect and zero achievement]
Slide 37: 7. PARE: Testing
People become test-wise and/or may respond differently when under test conditions.
- White space and testing in asTTle
- Testwiseness
Slide 38: Test of Objective Evidence
Each of the questions in the following set has a logical or "best" answer from its corresponding multiple-choice answer set. Please record your eight answers.
1. The purpose of the cluss in furmpaling is to remove
   A cluss-prags
   B tr s
   C cloughs
   D pluomots
2. Trassig is true when
   A clump trasses the von
   B the viskal flans, if the viskal is donwil or zortil
   C the belgo fruls
   D dissels lisk easily
3. The sigia frequently overfesks the trelsum because
   A all sigias are mellious
   B all sigias are always votial
   C the trelsum is usually tarious
   D no trelsa are feskable
4. The fribbled breg will minter best with an
   A derst
   B morst
   C sortar
   D ignu
Slide 39: Test of Objective Evidence, Part II
5. The reasons for tristal doss are
   A the sabs foped and the doths tinzed
   B the dredges roted with the crets
   C few rakobs were accepted in sluth
   D most of the polats were thonced
6. Which of the following is/are always present when trossels are being gruven?
   A rint and vost
   B vost
   C shum and vost
   D vost and plone
7. The mintering function of the ignu is most effectively carried out in connection with
   A a razma toi
   B the groshing stantol
   C the fribbled breg
   D a frailly sush
8. A  B  C  D
Slide 40: 8. PARE: Statistical regression
When extreme groups are selected, their means tend to move toward the middle on retesting.
Why do the tallest fathers have shorter sons, and the shortest fathers have taller sons?
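Regression to the mean follows from measurement error alone. People selected for extreme scores on one occasion were, on average, partly "lucky" (or "unlucky"), and that luck does not repeat. The following is a minimal Python simulation sketch under a classical true-score-plus-error model. The reliability, sample size, and selection fraction are hypothetical choices.

```python
import random
from statistics import mean


def regression_to_mean_demo(n=10_000, reliability=0.7, top_fraction=0.1, seed=0):
    """Simulate two testings of the same people (true score + independent error each
    time), select the top scorers on test 1, and compare their means on both tests."""
    rng = random.Random(seed)
    error_sd = ((1 - reliability) / reliability) ** 0.5  # true-score sd fixed at 1
    true = [rng.gauss(0, 1) for _ in range(n)]
    test1 = [t + rng.gauss(0, error_sd) for t in true]
    test2 = [t + rng.gauss(0, error_sd) for t in true]
    cutoff = sorted(test1, reverse=True)[int(n * top_fraction) - 1]
    top = [i for i in range(n) if test1[i] >= cutoff]
    return mean(test1[i] for i in top), mean(test2[i] for i in top)


first, second = regression_to_mean_demo()
print(round(first, 2), round(second, 2))  # second is closer to the overall mean of 0
```

Without any intervention, the selected group's retest mean falls back toward the overall mean, roughly in proportion to the reliability.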
Slide 41: … Regression to the mean
- Special education (e.g., Sesame Street)
- Effective schools
- Gifted education
Slide 42: 9. PARE: Response rates
The return rates of questionnaires/tests/interviews should be high. What is typical?
Slide 43: Meta-analyses of response rates
Typical return is 50%.
Three major factors:
- Salience (77% vs 42%)
- Number of follow-ups (halve each time)
- Lack of clutter/orderliness
Not length (average 7 pages, 72 questions), colour, …
Slide 44: 10. Change scores
The difference between post and pre scores.
Problems:
- Unreliable
- Are you measuring the same thing both times?
- Regression to the mean
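Why change scores are unreliable can be seen from the classical formula for the reliability of a difference score. Assuming equal pre and post variances, it is (r_pre + r_post − 2·r_pre,post) / (2 − 2·r_pre,post). The following is a minimal Python sketch with hypothetical reliabilities.

```python
def difference_score_reliability(r_pre, r_post, r_pre_post):
    """Reliability of post - pre difference scores, assuming equal pre and post
    variances: (r_pre + r_post - 2 * r_pre_post) / (2 - 2 * r_pre_post)."""
    return (r_pre + r_post - 2 * r_pre_post) / (2 - 2 * r_pre_post)


# Two tests, each with reliability .80, that correlate .70 with each other.
print(round(difference_score_reliability(0.80, 0.80, 0.70), 2))  # 0.33
```

Two respectably reliable tests can still yield a change score with reliability of only about .33. The more strongly pre and post correlate, the worse this gets.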
Slide 45: 11. PARE: Experimenter effects
- Hawthorne effect: because we know we are in an experiment, our responses change
- Clever Hans (the horse)
- Pygmalion in the classroom
- Christine Rubie's thesis
- Stanley Milgram's experiment
Slide 46: 12. PARE: Restriction of range
When you choose/focus on a narrow range of abilities (etc.), this can be misleading.
Picture …
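Restricting the range of one variable shrinks its correlation with others, even when the underlying relationship is unchanged. The following is a minimal Python simulation sketch. It uses hypothetical "ability" and "outcome" scores with a true correlation of about .6 and keeps only the top 20% on ability, as a selective programme might.

```python
import random
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(1)
# Hypothetical ability (x) and outcome (y) with a true correlation of about .6.
ability = [rng.gauss(0, 1) for _ in range(5000)]
outcome = [0.6 * a + 0.8 * rng.gauss(0, 1) for a in ability]

# Keep only the top 20% on ability.
cutoff = sorted(ability)[int(0.8 * len(ability))]
kept = [(a, o) for a, o in zip(ability, outcome) if a >= cutoff]

full_r = pearson_r(ability, outcome)
restricted_r = pearson_r([a for a, _ in kept], [o for _, o in kept])
print(round(full_r, 2), round(restricted_r, 2))  # roughly .6 versus roughly .3
```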
Slide 49: 13. PARE: Specification of target and accessible sample/population
Most experiments are highly local but have general aspirations.
Often there are two groups you are generalising to, e.g., all secondary students in NZ (the target population) and all secondary students you have access to, from which you sample (the accessible population).
Slide 51: 14. PARE: Interactions
The model of individual differences indicates that we should modify our teaching methods to allow for individual differences in the class.
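An interaction means the effect of one factor (e.g., teaching method) depends on another (e.g., student ability). In a 2 × 2 layout, this can be summarised by the interaction contrast, which is the method difference for one ability group minus the method difference for the other. The following is a minimal Python sketch with hypothetical cell means and labels.

```python
def interaction_contrast(cell_means):
    """2 x 2 interaction contrast: the method difference for high-ability students
    minus the method difference for low-ability students. Zero means no interaction."""
    high = cell_means[("method A", "high")] - cell_means[("method B", "high")]
    low = cell_means[("method A", "low")] - cell_means[("method B", "low")]
    return high - low


# Hypothetical cell means: method A suits high-ability students, method B low-ability.
cells = {
    ("method A", "high"): 70, ("method B", "high"): 60,
    ("method A", "low"): 45, ("method B", "low"): 55,
}
print(interaction_contrast(cells))  # 20: which method is better depends on ability
```

Here the two methods have the same overall average, so a study ignoring ability would wrongly conclude that method makes no difference.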
Slide 52: The art of research design is to devise experiments to identify the explanation and cause of effects by
- maximising the chance that the conclusions are defensible, and
- minimising the PAREs,
such that you have Power to Generalise and Power to Convince.
Slide 53: Unobtrusive measures
- Which painting do most people look at?
- Friendship in cities
- Racism in suburbs/cities
Slide 54: Statistical methods to assist …
- Correlation
- Analysis of variance (ANOVA)
- Cross-tabulation
Slide 57: Comparing means: magnitude and chance
- Magnitude: effect sizes
- Chance: analysis of variance
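Both questions can be computed from the same sums of squares. The F ratio addresses chance, and eta squared (the between-groups share of the total variation) is one effect-size measure of magnitude. The following is a minimal Python sketch of a one-way ANOVA with hypothetical scores for three groups. The p-value for F comes from the F distribution. `scipy.stats.f_oneway`, for example, returns both F and p, but it is not used here so that the sketch stays dependency-free.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, eta_squared) for a one-way analysis of variance."""
    scores = [x for g in groups for x in g]
    grand = mean(scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)  # chance
    eta_squared = ss_between / (ss_between + ss_within)  # magnitude
    return f, eta_squared


# Hypothetical scores for three groups.
f, eta2 = one_way_anova([[4, 5, 6], [6, 7, 8], [8, 9, 10]])
print(f, eta2)  # 12.0 0.8
```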
Slide 58: Well-being
What are the differences in levels of WELL-BEING between males and females, and between Australia and New Zealand?
Country * GENDER cross-tabulation (count)
Country          MALE   FEMALE   Total
New Zealand
Australia
Total
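The cross-tabulation on this slide counts respondents in each Country × GENDER cell, with row and column totals. The slide's actual counts are not in the transcript. The following is a minimal Python sketch of building such a table, using a handful of hypothetical records and a hypothetical `cross_tabulate` helper.

```python
from collections import Counter


def cross_tabulate(records, row_key, col_key):
    """Count records by (row, column) and add row and column totals."""
    counts = Counter((r[row_key], r[col_key]) for r in records)
    rows = sorted({r[row_key] for r in records})
    cols = sorted({r[col_key] for r in records})
    table = {row: {col: counts[(row, col)] for col in cols} for row in rows}
    for row in rows:
        table[row]["Total"] = sum(table[row][col] for col in cols)
    table["Total"] = {col: sum(table[row][col] for row in rows) for col in cols + ["Total"]}
    return table


# Hypothetical respondents (the slide's real counts are not available).
records = [
    {"Country": "New Zealand", "GENDER": "MALE"},
    {"Country": "New Zealand", "GENDER": "FEMALE"},
    {"Country": "New Zealand", "GENDER": "FEMALE"},
    {"Country": "Australia", "GENDER": "MALE"},
    {"Country": "Australia", "GENDER": "MALE"},
    {"Country": "Australia", "GENDER": "FEMALE"},
]
table = cross_tabulate(records, "Country", "GENDER")
for row, cells in table.items():
    print(row, cells)
```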