Experimental, Quasi-experimental, and Single Subject Research 774/801 Sept 1, 2004 John Hattie & Tony Hunt.

1 Experimental, Quasi-experimental, and Single Subject Research 774/801 Sept 1, 2004 John Hattie & Tony Hunt

2 It is simple: there is no perfect experiment in education. There is nearly always a trade-off between the Power to Generalise (PG) – from sample to population, from items to the behaviour domain, and from conditions in the study to all intended conditions – and the Power to Convince (PC), since there are many audiences.

3 Power to Generalise: How confidently can we generalise from the study to all “similar” situations? Is the design replicable/reproducible/exchangeable? Are the evidence and conclusions unique to this study? Have the generalisations taken into account all possible competing views, i.e., plausible alternative rival explanations (PARE)?

4 Power to Convince: Who are we trying to convince? If it is a colleague (or colleagues), then more situation specificity may be convincing (kids/classrooms/schools like mine). If it is the educational community, then the specific situation needs to be less critical.

5 Resolution: Linking Power. Experimental design consists of a series of links: – It is only as strong as its weakest link – Each link influences the next link – It is desirable for the links to be of equal strength – Does each link have explanatory power? – Are the conclusions credible to the intended audience?

6 History: Campbell and Stanley (1963); Cook and Campbell (1979); Shadish, Cook & Campbell (2002); evidence-based practice. All are based on designing studies that can lead to explanation and claims of causality.

7 Explanation and Cause: 1. Cause and effect must be related (e.g., self-concept & achievement). 2. There needs to be temporal order (cause before effect). 3. Other explanations, i.e., other Plausible Alternative Rival Explanations (PARE), need to be ruled out.

8 Campbell & Stanley (1963): Pretest-Posttest Control Group Design
      Pre   Treatment   Post
R     O     X           O
R     O                 O
Randomisation – aiming for representativeness
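As a rough illustration of the R O X O / R O O layout, the sketch below randomly assigns participants to two groups and then compares the mean gain (post minus pre) of the treatment group with that of the control group. The function names, the gain-score comparison and the example numbers are assumptions for illustration, not the analysis used in any study cited in this presentation.

```python
import random
from statistics import mean


def randomise(participants, seed=None):
    """Randomly split participants into two groups (the R rows of the design)."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def gain_difference(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Mean (post - pre) gain of the treatment group minus that of the control group."""
    treat_gain = mean(post - pre for pre, post in zip(treat_pre, treat_post))
    ctrl_gain = mean(post - pre for pre, post in zip(ctrl_pre, ctrl_post))
    return treat_gain - ctrl_gain


treatment, control = randomise(range(10), seed=42)
print(treatment, control)
# Hypothetical scores: treatment gains 5, 4, 3; control gains 1, 2, 3.
print(gain_difference([10, 12, 11], [15, 16, 14], [11, 10, 12], [12, 12, 15]))  # 2
```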

9 But can we randomise? E.g., No Child Left Behind; the Tennessee Class Size Study

10 Quasi-experimentation: when you do not have so much control over the allocation of treatment, conditions, or sample; when you have non-equivalent groups. “In quasi-experimentation, the researcher has to enumerate alternative explanations one by one, decide which are plausible, and then use logic, design, and measurement to assess whether each one is operating in a way that might explain any observed effect” (Shadish, Cook & Campbell, 2002, p. 14). This relates to Popper's notion of falsification: what evidence would you accept that you are wrong?

11 Examples of quasi-experimental designs: divorce laws. Ozdowski, S.A. & Hattie, J.A. (1981). The impact of divorce laws on divorce rate in Australia: A time series analysis. Australian Journal of Social Issues, 16. (a) Time series: O1 O2 O3 O4 O5 X O6 O7 O8 O9
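A minimal way to read an O1 … O9 series interrupted by X is to fit a straight-line trend to the pre-intervention observations, project it past X, and average how far the post-intervention observations sit from that projection. This linear-trend sketch, its function names and the example numbers are assumptions for illustration; it is not the analysis reported by Ozdowski and Hattie (1981), and a full time-series analysis would also model autocorrelation between observations.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def interruption_effect(series, n_pre):
    """Mean gap between the post-X observations and the projected pre-X trend."""
    times = list(range(1, len(series) + 1))
    intercept, slope = fit_line(times[:n_pre], series[:n_pre])
    projected = [intercept + slope * t for t in times[n_pre:]]
    return mean(obs - proj for obs, proj in zip(series[n_pre:], projected))


# Hypothetical series: five observations before X, four after.
print(interruption_effect([10, 11, 12, 13, 14, 20, 21, 22, 23], n_pre=5))  # 5.0
```

Projecting the pre-X trend matters because a series that was already rising would otherwise look like a treatment effect.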

12 ABA design: O1 O2 O3 (A) | X4 X5 X6 (B) | O7 O8 O9 (A). Le Fevre (2002): Adequate Decoders
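For a single-subject ABA layout, one simple summary is the mean of each phase: if the behaviour shifts during B and moves back when the treatment is withdrawn, that reversal strengthens the causal claim. The sketch assumes higher scores mean more of the target behaviour; the function names and observations are hypothetical.

```python
from statistics import mean


def phase_means(baseline, treatment, withdrawal):
    """Mean score in each phase of an A-B-A design."""
    return {"A1": mean(baseline), "B": mean(treatment), "A2": mean(withdrawal)}


def shows_reversal(baseline, treatment, withdrawal):
    """True if the B phase is higher than both A phases (assumes higher = more behaviour)."""
    m = phase_means(baseline, treatment, withdrawal)
    return m["B"] > m["A1"] and m["B"] > m["A2"]


# Hypothetical observations for O1-O3, X4-X6 and O7-O9.
print(phase_means([3, 4, 3], [7, 8, 9], [4, 3, 4]))
print(shows_reversal([3, 4, 3], [7, 8, 9], [4, 3, 4]))  # True
```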

13 Multilevel Design: Hierarchical Linear Modelling. Students within classes within schools, e.g., tracking/streaming. [Diagram: School 1, School 2, …; within a school, Teacher 1 and Teacher 2; within each teacher, Class 1 and Class 2]
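Before fitting a hierarchical linear model, it helps to check how much of the variance in student scores lies between classes rather than within them. The sketch below computes the one-way ANOVA intraclass correlation, ICC(1), assuming every class has the same number of students; the function name and data are illustrative, and HLM software estimates these variance components more fully (including unequal class sizes). In small samples this ANOVA estimate can even be negative.

```python
from statistics import mean


def icc1(groups):
    """ICC(1) from one-way ANOVA mean squares; groups = lists of scores, equal sizes."""
    k = len(groups[0])  # students per class
    if any(len(g) != k for g in groups):
        raise ValueError("this sketch assumes equal group sizes")
    j = len(groups)  # number of classes
    grand = mean(x for g in groups for x in g)
    ss_between = k * sum((mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (j - 1)
    ms_within = ss_within / (j * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical scores for three classes of three students each.
print(round(icc1([[12, 14, 13], [18, 17, 19], [10, 11, 12]]), 2))
```

A sizeable ICC is a signal that treating students as independent observations would overstate the precision of the results.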

14 Structural Equation Modelling

15 Minimal requirements for studies: Sampling – from items to behaviour domains – from people to all possible people – from conditions to all possible conditions. Representative sampling via – random sampling – stratified random sampling.
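Stratified random sampling can be sketched as: group the population by stratum, then draw the same fraction at random from each stratum (proportional allocation). The stratum function, the fraction and the population below are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(population, stratum_of, fraction, seed=None):
    """Draw round(fraction * stratum size) members at random from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)
    sample = []
    for members in strata.values():
        n = round(len(members) * fraction)
        sample.extend(rng.sample(members, n))  # sampling without replacement
    return sample


# Hypothetical population: 60 urban and 40 rural students.
students = [("urban", i) for i in range(60)] + [("rural", i) for i in range(40)]
chosen = stratified_sample(students, stratum_of=lambda s: s[0], fraction=0.1, seed=7)
print(len(chosen))  # 10
```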

16 Variables: At the end of your study, can I say “Aha, so that is what you mean, now I am clear”? Open constructs, NOT definitions. There is no such thing as immaculate perception. Dependent vs. independent; manipulable vs. nonmanipulable.

17 Dependability: How reliable/consistent/replicable are your measures/observations?
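One common index of how consistently a set of items measures the same thing is Cronbach's alpha: alpha = k/(k - 1) * (1 - sum of item variances / variance of total scores), for k items. The sketch below uses hypothetical item data; alpha is only one of several dependability indices (test-retest and inter-rater agreement are others).

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha.

    items: one list of scores per item, with respondents in the same order in every list.
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's total score
    item_variance = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance / variance(totals))


# Hypothetical 3-item scale answered by five respondents.
scale = [[3, 4, 5, 2, 4], [3, 5, 4, 2, 4], [2, 4, 5, 3, 5]]
print(round(cronbach_alpha(scale), 2))
```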

18 Validity = Interpretations. Validity is "an integrated evaluative judgement of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores or other modes of assessment". It is not the validity of a test, but the validity of interpretations.

19 The validity of your study … is related to having ruled out Plausible Alternative Rival Explanations (PARE). CONTROL, CONTROL, CONTROL … some examples:

20 1. PARE: Power. Is your study POWERFUL enough to detect the effect you are investigating?

21 1. PARE: Power. Is your study POWERFUL enough to detect the effect you are investigating? Do chickens have lips?
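A rough power check for comparing two group means can use the normal approximation: power ≈ Φ(d·sqrt(n/2) - z), where d is the expected effect size, n the number per group and z the two-tailed critical value (1.96 for α = .05). The sketch below, with hypothetical function names, uses Python's `statistics.NormalDist`; because it ignores the t distribution it slightly understates the sample needed (about 63 per group for d = .5, power .80, α = .05, against the usual t-test figure of 64).

```python
from math import ceil, sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, sd 1


def power_two_groups(d, n_per_group, alpha=0.05):
    """Approximate two-tailed power for a two-group comparison of means."""
    z_crit = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
    return STANDARD_NORMAL.cdf(d * sqrt(n_per_group / 2) - z_crit)


def n_per_group_for_power(d, power=0.80, alpha=0.05):
    """Approximate sample size per group needed to reach the target power."""
    z_crit = STANDARD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STANDARD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_crit + z_power) / d) ** 2)


print(n_per_group_for_power(0.5))           # 63
print(round(power_two_groups(0.5, 20), 2))  # a small study: power well below .80
```

An underpowered study that finds "no lips" has not shown that chickens lack them; it may simply have been unable to detect them.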


23 2. PARE: Chance. Did the effect/conclusion occur by chance? E.g., that two means are the same, i.e., the hypothesis of no difference. Set a rejection level, say α = .05.
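One way to ask whether a difference between two means could have arisen by chance is a permutation test: shuffle the group labels many times and count how often a difference at least as large as the observed one appears. If that proportion (the p-value) falls below the rejection level, say α = .05, chance becomes an unlikely explanation. This is a sketch with hypothetical names and data; elsewhere the presentation names analysis of variance as its method for assessing chance.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_perm=10_000, seed=None):
    """Two-sided permutation p-value for a difference between two means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign the group labels at random
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # +1 counts the observed labelling itself, so p is never exactly zero
    return (at_least_as_extreme + 1) / (n_perm + 1)


alpha = 0.05
p = permutation_p_value([12, 15, 14, 16, 13], [10, 11, 12, 9, 11], n_perm=5000, seed=3)
print(p, "reject" if p < alpha else "do not reject")
```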

24 3. PARE: Type II errors. Type I error – rejecting a claim (the null hypothesis) when it is true (α = .05). Type II error (β) – accepting a claim when it is false (e.g., concluding that chickens do not have lips when in fact they do).

25 4. PARE: Reliability of your measures. If reliability is low, then the scores “wobble” and there is no guarantee you will get the same results using these instruments (tests, observations, interviews, etc.). Was the treatment “consistent” across the various classes/implementations?
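The “wobble” can be put into numbers with the classical test theory standard error of measurement, SEM = sd * sqrt(1 - reliability), and a band of roughly ±1.96 SEM around an observed score. The function names and values below are hypothetical; the point is how quickly the band widens as reliability drops.

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """Classical test theory SEM: sd * sqrt(1 - reliability)."""
    return sd * sqrt(1 - reliability)


def score_band(observed, sd, reliability, z=1.96):
    """Approximate 95% band of scores around an observed score."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem


for r in (0.95, 0.80, 0.60):
    low, high = score_band(100, sd=15, reliability=r)
    print(f"reliability {r:.2f}: {low:.1f} to {high:.1f}")
```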

26 5. PARE: Was the treatment implemented? Degree of implementation, e.g., the Hong Kong Practical Science Study (Cheung, Hattie, & Bucat, 1997).

27 6. PARE: Maturation. Showing change may not be enough, as kids improve anyway (e.g., by maturation). The method to measure change is the effect size: effect size = (Post - Pre)/spread, i.e., $ES = (\bar{X}_2 - \bar{X}_1)/sd$. E.g., Before = 12, After = 15, spread = 6, so $ES = (15 - 12)/6 = .5$.
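The effect-size calculation on this slide can be sketched as below. The first function reproduces the slide's worked example; the second takes raw before/after scores and uses the pooled (root-mean-square) standard deviation of the two occasions as the spread. That denominator is an assumption here; other choices, such as the pre-test sd or the sd of the differences, are also used.

```python
from math import sqrt
from statistics import mean, stdev


def effect_size(before, after, spread):
    """Effect size = (after - before) / spread."""
    return (after - before) / spread


def effect_size_from_scores(pre_scores, post_scores):
    """Effect size with the root-mean-square of the two sds as the spread (an assumption)."""
    spread = sqrt((stdev(pre_scores) ** 2 + stdev(post_scores) ** 2) / 2)
    return effect_size(mean(pre_scores), mean(post_scores), spread)


print(effect_size(12, 15, 6))  # 0.5, the slide's example
print(effect_size_from_scores([10, 12, 14], [13, 15, 17]))  # 1.5
```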

28 Distribution of effects [Figure: distribution of effect sizes, marking zero achievement and the average effect]

29 Distribution of effects [Figure: distribution of effect sizes, marking maturation]

30 The disasters … (from rank 71): programmed instruction, finances, problem-based learning, diet, gender (female-male), inductive teaching, team teaching, ability grouping, class size, open vs. traditional, summer vacation, retention, transfer of school, disruptive students

31 The also-rans … (from rank 56): metacognitive intervention, math programs, audio-visual, gifted programs, coaching, behavioural objectives, calculators, mainstreaming, questioning, learning hierarchies, attitude to math, desegregation, play, television

32 Almost there … (from rank 42): tutoring, activity-based programs, remedial programs, classroom climate, social skills training, time, CAI, inquiry-based teaching, preschool, whole language, within-class grouping, testing, problem solving, background

33 In the middle … (from rank 29): parent involvement, bilingual programs, adjunct aids, concept mapping, advance organizers, hypermedia instruction, socio-economic status, perceptual-motor skills, individualised instruction, homework, competitive learning, simulations, expectations

34 Worth having … (from rank 14): self-assessment, mastery learning, creativity programs, interactive video, psycho-linguistics, goals, peer influence, early intervention, outdoor education, science, in-service education, acceleration, motivation

35 The MAJOR influences … [Table columns: Influence, # effects, Mean] From rank 1: direct instruction, reciprocal teaching, feedback, cognitive strategy training, classroom behaviour, prior achievement, phonological awareness, home encouragement, Piagetian programs, cooperative learning, reading programs, quality of teaching, study skills

36 Identifying what matters

37 7. PARE: Testing. People become test-wise and/or may respond differently under test conditions, e.g., white space and testing in asTTle; testwiseness.

38 Test of Objective Evidence
Each of the questions in the following set has a logical or "best" answer from its corresponding multiple-choice answer set. Please record your eight answers.
1. The purpose of the cluss in furmpaling is to remove: A cluss-prags; B tr s; C cloughs; D pluomots
2. Trassig is true when: A clump trasses the von; B the viskal flans, if the viskal is donwil or zortil; C the belgo fruls; D dissels lisk easily
3. The sigia frequently overfesks the trelsum because: A all sigias are mellious; B all sigias are always votial; C the trelsum is usually tarious; D no trelsa are feskable
4. The fribbled breg will minter best with an: A derst; B morst; C sortar; D ignu

39 Test of Objective Evidence, Part II
5. The reasons for tristal doss are: A the sabs foped and the doths tinzed; B the dredges roted with the crets; C few rakobs were accepted in sluth; D most of the polats were thonced
6. Which of the following is/are always present when trossels are being gruven? A rint and vost; B vost; C shum and vost; D vost and plone
7. The mintering function of the ignu is most effectively carried out in connection with: A a razma toi; B the groshing stantol; C the fribbled breg; D a frailly sush
8. A; B; C; D

40 8. PARE: Statistical regression. When taking extreme groups, their means tend to move toward the middle on retesting. Why do the tallest fathers have shorter sons, and the shortest fathers have taller sons?

41 … Regression to the mean: special education (e.g., Sesame Street); effective schools; gifted education
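A small simulation shows why extreme groups drift toward the middle: each observed score is a true score plus measurement error, so people selected for very high first scores were, on average, partly lucky, and that luck does not repeat on the second test. The model (normal true scores with independent normal errors), the parameter values and the function name are assumptions for illustration.

```python
import random
from statistics import mean


def regression_to_mean_demo(n=10_000, true_sd=10.0, error_sd=5.0, top_fraction=0.1, seed=None):
    """Mean first- and second-test scores of the top scorers on the first test."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        true_score = rng.gauss(50.0, true_sd)
        first = true_score + rng.gauss(0.0, error_sd)   # observed = true + error
        second = true_score + rng.gauss(0.0, error_sd)  # fresh, independent error
        people.append((first, second))
    people.sort(key=lambda p: p[0], reverse=True)
    top = people[: int(n * top_fraction)]
    return mean(p[0] for p in top), mean(p[1] for p in top)


first_mean, second_mean = regression_to_mean_demo(seed=0)
print(f"top group: first test {first_mean:.1f}, second test {second_mean:.1f}")
```

No treatment is applied between the two tests, yet the selected group's mean falls, which is exactly the change an untreated gifted or remedial group could wrongly be credited or blamed for.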

42 9. PARE: Response rates. The return rate of questionnaires/tests/interviews should be high. What is typical?

43 Meta-analyses of response rates: the typical return is 50%. Three major factors: 1. Salience (77% vs 42%); 2. Number of follow-ups (halve each time); 3. Lack of clutter/orderliness. Not length (average 7 pages, 72 questions), colour,

44 10. Change scores: the difference between post and pre scores. Problems: 1. They are unreliable. 2. Are you measuring the same thing both times? 3. Regression to the mean.
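Why change scores are often unreliable can be seen from the classical test theory formula for the reliability of a difference score when pre- and post-test variances are equal: r_D = (r_xx + r_yy - 2*r_xy) / (2 - 2*r_xy), where r_xx and r_yy are the reliabilities of the two tests and r_xy is the pre-post correlation. The function name and numbers below are illustrative.

```python
def difference_score_reliability(r_pre, r_post, r_pre_post):
    """Reliability of (post - pre), assuming equal pre and post variances."""
    if r_pre_post >= 1:
        raise ValueError("pre-post correlation must be below 1")
    return (r_pre + r_post - 2 * r_pre_post) / (2 - 2 * r_pre_post)


# Two tests each with reliability .80: the higher the pre-post correlation,
# the less reliable the change score.
for r_xy in (0.0, 0.4, 0.6, 0.7):
    print(r_xy, round(difference_score_reliability(0.80, 0.80, r_xy), 2))
```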

45 11. PARE: Experimenter effects. Hawthorne effect: because we know we are in an experiment, our responses are altered. Clever Hans the horse; Pygmalion in the classroom; Christine Rubie’s thesis; Stanley Milgram’s experiment.

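46 12. PARE: Restriction of range. When you choose/focus on a narrow range of abilities (etc.), this can be misleading; e.g., correlations computed within the narrow range tend to be weaker than in the full population. Picture …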
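A quick simulation illustrates the point: generate two variables correlated about .6 across the full range, then keep only the cases above a cutoff on the first variable and recompute the correlation. The data model, the cutoff and the function names are assumptions for illustration.

```python
import random
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def restriction_demo(n=5000, rho=0.6, cutoff=1.0, seed=None):
    """Correlation in the full sample vs. only cases with x above the cutoff."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        # y is built so that its population correlation with x is rho
        y = rho * x + sqrt(1 - rho ** 2) * rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(y)
    kept = [(x, y) for x, y in zip(xs, ys) if x > cutoff]
    kept_x, kept_y = zip(*kept)
    return pearson_r(xs, ys), pearson_r(kept_x, kept_y)


full, restricted = restriction_demo(seed=0)
print(f"full range r = {full:.2f}, restricted range r = {restricted:.2f}")
```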



49 13. PARE: Specification of the target and accessible sample/population. Most experiments are highly local but have general aspirations. Often there are two groups you are generalising to: the target population (e.g., all secondary students in NZ) and the accessible population (all secondary students you have access to), from which you sample.


51 14. Interactions: The model of individual differences indicates that we should modify our teaching methods to allow for individual differences in the class.

52 The art of research design is to devise experiments to identify the explanation and cause of effects, by maximising the chance that the conclusions are defensible and minimising the PAREs, such that you have Power to Generalise and Power to Convince.

53 Unobtrusive measures: e.g., which painting do most people watch? Friendship in cities; racism in suburbs/cities.

54 Statistical methods to assist …: correlation; analysis of variance (ANOVA); cross-tabulation



57 Comparing means: magnitude and chance. Magnitude: effect sizes. Chance: analysis of variance.
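The “chance” side can be sketched with the one-way analysis of variance F ratio (between-group mean square over within-group mean square). Turning F into a p-value needs the F distribution, for example `scipy.stats.f.sf(F, df_between, df_within)` if SciPy is available; this sketch stops at F and its degrees of freedom. The function name and data are hypothetical, and the presentation's own example is a two-way (country by gender) design.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    scores = [x for group in groups for x in group]
    grand_mean = mean(scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


print(one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # (27.0, 2, 6)
```

Pairing F (chance) with an effect size (magnitude) keeps a large but trivial difference in a big sample from being over-read.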

58 Well-being: What are the differences in levels of WELL-BEING between males and females, and between Australia and New Zealand? [Table: Country * GENDER cross-tabulation (counts); rows: New Zealand, Australia, Total; columns: MALE, FEMALE, Total]

59 [Table: well-being mean, sd and effect size for Australia (Male, Female, Total) and New Zealand (Male, Female)] Effect size, NZ - Australia: 0.89

60 ANOVA [Table columns: Source, df, MS, F, p; rows: Country (p < .001), Gender, Country * Gender, Error]
