Assessing Intervention Fidelity in RCTs: Concepts and Methods Panelists: David S. Cordray, PhD Chris Hulleman, PhD Joy Lesnick, PhD Vanderbilt University.

1 Assessing Intervention Fidelity in RCTs: Concepts and Methods Panelists: David S. Cordray, PhD Chris Hulleman, PhD Joy Lesnick, PhD Vanderbilt University Presentation for the IES Research Conference Washington, DC June 12, 2008

2 Overview This session is planned as an integrated set of presentations. We'll begin with: –Definitions and distinctions; –The conceptual foundation for assessing fidelity in RCTs, a special case. Two examples of assessing implementation fidelity follow: –Chris Hulleman will illustrate a fidelity assessment for an intervention with a single core component; –Joy Lesnick will illustrate additional considerations when fidelity assessment is applied to intervention models with multiple program components. We close with issues for the future, then questions and discussion.

3 Definitions and Distinctions

4 Dimensions of Intervention Fidelity There is little consensus on what is meant by the term "intervention fidelity," but Dane & Schneider (1998) identify five aspects: –Adherence/compliance – program components are delivered/used/received as prescribed; –Exposure – the amount of program content delivered to and received by participants; –Quality of delivery – a theory-based ideal in terms of processes and content; –Participant responsiveness – engagement of the participants; and –Program differentiation – unique features of the intervention are distinguishable from other programs (including the counterfactual).

5 Distinguishing Implementation Assessment from Implementation Fidelity Assessment Two models of intervention implementation, based on: –A purely descriptive model, answering the question "What transpired as the intervention was put in place (implemented)?" –An a priori intervention model, with explicit expectations about implementation of core program components. Fidelity is the extent to which the realized intervention (t_Tx) is "faithful" to the pre-stated intervention model (T_Tx): Fidelity = T_Tx – t_Tx. We emphasize this second model.

6 What to Measure? Adherence to the intervention model: –(1) Essential or core components (activities, processes); –(2) Activities, processes, and structures that are necessary but not unique to the theory/model (supporting the essential components of T); and –(3) Ordinary features of the setting (shared with the counterfactual group, C). Essential/core and necessary components are the priority parts of fidelity assessment.

7 An Example of Core Components: Bransford's HPL Model of Learning and Instruction John Bransford et al. (1999) postulate that a strong learning environment entails a combination of: –Knowledge-centered; –Learner-centered; –Assessment-centered; and –Community-centered components. Alene Harris developed an observation system (the VOS) that registered novel (the components above) and traditional pedagogy in classes. The next slide focuses on the prevalence of Bransford's recommended pedagogy.

8 Challenge-based Instruction in "Treatment" and Control Courses: The VaNTH Observation System (VOS) [Chart: percentage of course time using challenge-based instructional strategies. Adapted from Cox & Cordray, in press.]

9 Implications Fidelity can be assessed even when there is no known benchmark (e.g., the 10 Commandments). –In practice, interventions can be a mixture of components with strong, weak, or no benchmarks. Control conditions can include core intervention components due to: –Contamination; –Business as usual (BAU) containing shared components at different levels; –Similar theories or models of action. But to index "fidelity," we need to measure components within the control condition as well.

10 Linking Intervention Fidelity Assessment to Contemporary Models of Causality Rubin's Causal Model: –The true causal effect of X for unit i is (Y_i^Tx – Y_i^C); –RCT methodology is the best approximation to the true effect. Fidelity assessment within RCT-based causal analysis entails examining the difference between causal components in the intervention and counterfactual conditions. Differencing causal conditions can be characterized as the "achieved relative strength" of the contrast: –Achieved Relative Strength (ARS) = t_Tx – t_C; –ARS is a default index of fidelity.
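The expected-versus-achieved distinction can be sketched in a few lines of code. The numeric values below are hypothetical illustrations in the spirit of the next slide (only the realized values 85 and 70 appear there; the benchmark of 95 is an assumption for the sketch):

```python
def relative_strength(t_tx, t_c):
    """Achieved relative strength: difference between the realized
    treatment condition (t_tx) and the realized control condition (t_c)."""
    return t_tx - t_c

def infidelity(T, t):
    """Gap between an intended benchmark (T) and what was realized (t)."""
    return T - t

# Hypothetical benchmarks and realized strengths on a 0-100 scale
T_tx, T_c = 95, 70   # intended treatment and control strength (assumed)
t_tx, t_c = 85, 70   # what was actually achieved

print(infidelity(T_tx, t_tx))        # treatment-side infidelity: 10
print(relative_strength(t_tx, t_c))  # achieved contrast: 85 - 70 = 15
```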

11 [Chart: outcome (50–100) plotted against treatment strength (.00–.45). The intended contrast between benchmarks T_Tx and T_C implies an expected relative strength of .25; the realized contrast, (85) – (70) = 15 points, yields an achieved relative strength of .15. The gaps between each benchmark (T) and its realized counterpart (t) represent "infidelity."]

12 In Practice… Identify core components in both groups –e.g., via a model of change. Establish benchmarks for T_Tx and T_C. Measure core components to derive t_Tx and t_C –e.g., via a "logic model" based on the model of change. With multiple components and multiple methods of assessment, achieved relative strength needs to be: –Standardized, and –Combined across: multiple indicators, multiple components, and multiple levels (HLM-wise). We turn to our examples….

13 Assessing Implementation Fidelity in the Lab and in Classrooms: The Case of a Motivation Intervention Chris S. Hulleman Vanderbilt University

14 The Theory of Change [Diagram: Manipulated Relevance → Perceived Utility Value → Interest → Performance.] Adapted from: Hulleman (2008); Hulleman, Godes, Hendricks, & Harackiewicz (2008); Hulleman & Harackiewicz (2008); Hulleman, Hendricks, & Harackiewicz (2007); Eccles et al. (1983); Wigfield & Eccles (2002).

15 Methods
Sample – Laboratory: N = 107 undergraduates. Classroom: N = 182 ninth-graders in 13 classes, 8 teachers, 3 high schools.
Task – Laboratory: mental multiplication technique. Classroom: Biology, Physical Science, Physics.
Treatment manipulation – Laboratory: write about how the mental math technique is relevant to your life. Classroom: pick a topic from science class and write about how it relates to your life.
Control manipulation – Laboratory: write a description of a picture from the learning notebook. Classroom: pick a topic from science class and write a summary of what you have learned.
Number of manipulations – Laboratory: 1. Classroom: 2–8.
Length of study – Laboratory: 1 hour. Classroom: 1 semester.
Dependent variable – Perceived utility value.

16 Motivational Outcome [Chart: classroom treatment effect, g = 0.05 (p = .67).] Why so weak?

17 Fidelity Measurement and Achieved Relative Strength A simple intervention – one core component. Intervention fidelity: –Defined as "quality of participant responsiveness"; –Rated on a scale from 0 (none) to 3 (high); –2 independent raters, 88% agreement.

18 Quality of Responsiveness
Score | Lab C (N, %) | Lab Tx (N, %) | Class C (N, %) | Class Tx (N, %)
0     | 47, 100%     | 7, 11%        | 86, 96%        | 38, 41%
1     | 0, 0%        | 15, 24%       | 4, 4%          | 40, 43%
2     | 0, 0%        | 29, 46%       | 0, 0%          | 14, 15%
3     | 0, 0%        | 12, 19%       | 0, 0%          | 0, 0%
Total | 47, 100%     | 63, 100%      | 90, 100%       | 92, 100%
Mean  | 0.00         | 1.73          | 0.04           | 0.74
SD    | 0.00         | 0.90          | 0.21           | 0.71

19 Indexing Fidelity Absolute –Compare observed fidelity (t_Tx) to the absolute or maximum level of fidelity (T_Tx). Average –Mean level of observed fidelity (t_Tx). Binary –Yes/no treatment receipt based on fidelity scores; –Requires selection of a cut-off value.
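Using the quality-of-responsiveness counts from slide 18, all three indices can be computed directly. The cut-off of 2 for the binary index is an assumption, chosen because it reproduces the binary values reported on later slides (0.65 lab, 0.15 classroom):

```python
def fidelity_indices(counts, max_score=3, cutoff=2):
    """Compute average, absolute, and binary fidelity indices from a
    mapping of rating -> number of participants.

    counts: dict like {0: 7, 1: 15, 2: 29, 3: 12}
    cutoff: minimum rating counted as "received treatment" (assumed here)
    """
    n = sum(counts.values())
    mean = sum(score * k for score, k in counts.items()) / n
    return {
        "average": mean,               # mean observed fidelity (t_Tx)
        "absolute": mean / max_score,  # observed relative to the maximum
        "binary": sum(k for s, k in counts.items() if s >= cutoff) / n,
    }

# Treatment-group counts from the slide-18 table
lab_tx = fidelity_indices({0: 7, 1: 15, 2: 29, 3: 12})
class_tx = fidelity_indices({0: 38, 1: 40, 2: 14, 3: 0})
print(round(lab_tx["average"], 2))    # 1.73
print(round(lab_tx["absolute"], 2))   # 0.58
print(round(lab_tx["binary"], 2))     # 0.65
print(round(class_tx["average"], 2))  # 0.74
```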

20 Fidelity Indices
Index    | Group | Laboratory | Classroom
Absolute | Tx    | 0.58       | 0.25
Absolute | C     | 0.00       | 0.01
Average  | Tx    | 1.73       | 0.74
Average  | C     | 0.00       | 0.04
Binary   | Tx    | 0.65       | 0.15
Binary   | C     | 0.00       | 0.00

21 Indexing Fidelity as Achieved Relative Strength Intervention strength = treatment – control. Achieved Relative Strength (ARS) index: –The standardized difference in a fidelity index across Tx and C; –Based on Hedges' g (Hedges, 2007); –Corrected for clustering in the classroom (ICCs from .01 to .08).

22 Average ARS Index
ARS_avg = [(t̄_Tx – t̄_C) / S_T] × [1 – 3 / (4(N – 2) – 1)] × √[1 – 2(n – 1)ρ / (N – 2)]
(the three factors are the group difference, the sample-size adjustment, and the clustering adjustment of Hedges, 2007)
Where:
t̄_Tx = mean for group 1 (t_Tx)
t̄_C = mean for group 2 (t_C)
S_T = pooled within-groups standard deviation
n_Tx = treatment sample size
n_C = control sample size
n = average cluster size
ρ = intra-class correlation (ICC)
N = total sample size
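A sketch of this computation, assuming the standard Hedges (2007) small-sample and clustering corrections (the slide's exact algebra is not fully legible, so these correction factors are one plausible reading). Plugging in the classroom means, SDs, and sample sizes from the case, with an assumed ICC of .01 and an average cluster size of 14 (182 students / 13 classes), reproduces the reported ARS of 1.32:

```python
from math import sqrt

def ars_average(m_tx, m_c, sd_tx, sd_c, n_tx, n_c, cluster_size, icc):
    """Average ARS: standardized mean difference (Hedges' g) with
    small-sample and cluster corrections (after Hedges, 2007)."""
    N = n_tx + n_c
    # Pooled within-groups standard deviation (S_T)
    s_t = sqrt(((n_tx - 1) * sd_tx**2 + (n_c - 1) * sd_c**2) / (N - 2))
    d = (m_tx - m_c) / s_t                       # group difference
    j = 1 - 3 / (4 * (N - 2) - 1)                # sample-size adjustment
    omega = sqrt(1 - 2 * (cluster_size - 1) * icc / (N - 2))  # clustering
    return d * j * omega

# Classroom case: means 0.74 vs 0.04, SDs 0.71 and 0.21, n = 92 and 90;
# cluster_size = 14 and icc = 0.01 are assumptions, not slide values.
g = ars_average(0.74, 0.04, 0.71, 0.21, 92, 90, cluster_size=14, icc=0.01)
print(round(g, 2))  # 1.32
```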

23 Absolute and Binary ARS Indices
Same structure – a group difference in proportions, a sample-size adjustment, and a clustering adjustment:
ARS = [(p_Tx – p_C) / S_T] × [1 – 3 / (4(N – 2) – 1)] × √[1 – 2(n – 1)ρ / (N – 2)]
Where:
p_Tx = proportion for the treatment group (t_Tx)
p_C = proportion for the control group (t_C)
n_Tx = treatment sample size
n_C = control sample size
n = average cluster size
ρ = intra-class correlation (ICC)
N = total sample size
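A literal sketch of the proportion version, standardizing the Tx–C difference by the pooled within-groups SD of the 0/1 indicator. That standardizer is an assumption – the slide does not fully specify S_T for proportions – so the result will not necessarily match the ARS values reported on the next slides:

```python
from math import sqrt

def ars_proportion(p_tx, p_c, n_tx, n_c, cluster_size, icc):
    """Absolute/binary ARS: standardized difference in proportions with
    small-sample and cluster corrections. The pooled-SD standardizer
    below is an assumption; the slide leaves S_T unspecified."""
    N = n_tx + n_c
    # Pooled within-groups SD of a binary indicator: variance p(1 - p)
    s_t = sqrt(((n_tx - 1) * p_tx * (1 - p_tx)
                + (n_c - 1) * p_c * (1 - p_c)) / (N - 2))
    d = (p_tx - p_c) / s_t
    j = 1 - 3 / (4 * (N - 2) - 1)
    omega = sqrt(1 - 2 * (cluster_size - 1) * icc / (N - 2))
    return d * j * omega

# Classroom binary index: 15% compliers in Tx, 0% in C
# (cluster size 14 and ICC = 0.01 are assumptions)
g_bin = ars_proportion(0.15, 0.0, 92, 90, cluster_size=14, icc=0.01)
print(round(g_bin, 2))  # 0.59
```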

24 [Chart: on the 0–3 fidelity scale, the average index gives t_Tx = 0.74 and t_C = 0.04, a raw difference of (0.74) – (0.04) = 0.70; standardized, the achieved relative strength is 1.32. The gaps between the benchmarks (T_Tx, T_C) and the observed values represent "infidelity."]

25 Achieved Relative Strength Indices: Observed Fidelity and Lab vs. Class Contrasts
Index    | Group | Lab  | Class | Lab – Class
Absolute | Tx    | 0.58 | 0.25  |
Absolute | C     | 0.00 | 0.01  |
Absolute | g     | 1.72 | 0.80  | 0.92
Average  | Tx    | 1.73 | 0.74  |
Average  | C     | 0.00 | 0.04  |
Average  | g     | 2.52 | 1.32  | 1.20
Binary   | Tx    | 0.65 | 0.15  |
Binary   | C     | 0.00 | 0.00  |
Binary   | g     | 1.88 | 0.80  | 1.08

26 Linking Achieved Relative Strength to Outcomes

27 Sources of Infidelity in the Classroom Student behaviors were nested within teacher behaviors: –Teacher dosage; –Frequency of responsiveness. Student and teacher behaviors were used to predict treatment fidelity (i.e., quality of responsiveness).

28 Sources of Infidelity: Multi-level Analyses Part I: Baseline Analyses Identified the amount of residual variability in fidelity due to students and teachers. –Due to missing data, we estimated a 2-level model (153 students, 6 teachers).
Student: Y_ij = b_0j + b_1j(TREATMENT)_ij + r_ij
Teacher: b_0j = γ_00 + u_0j; b_1j = γ_10 + u_1j

29 Sources of Infidelity: Multi-level Analyses Part II: Explanatory Analyses Predicted residual variability in fidelity (quality of responsiveness) with frequency of responsiveness and teacher dosage.
Student: Y_ij = b_0j + b_1j(TREATMENT)_ij + b_2j(RESPONSE FREQUENCY)_ij + r_ij
Teacher: b_0j = γ_00 + u_0j; b_1j = γ_10 + γ_11(TEACHER DOSAGE)_j + u_1j; b_2j = γ_20 + γ_21(TEACHER DOSAGE)_j + u_2j

30 Sources of Infidelity: Multi-level Analyses
Variance component  | Baseline: residual variance (% of total) | Explanatory: residual variance (% reduction)
Level 1 (student)   | 0.15437* (52%)                           | 0.15346* (< 1%)
Level 2 (teacher)   | 0.13971* (48%)                           | 0.04924 (65%*)
Total               | 0.29408                                  | 0.20270
* p < .001.
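The percentages in the table follow directly from the variance components; a quick check:

```python
# Variance components from the baseline and explanatory models (slide 30)
student_base, teacher_base = 0.15437, 0.13971
student_expl, teacher_expl = 0.15346, 0.04924

total_base = student_base + teacher_base
# Teacher share of baseline residual variance
print(round(100 * teacher_base / total_base))  # 48 (% of total)
# Reduction in teacher-level variance once dosage and response
# frequency are added as predictors
reduction = (teacher_base - teacher_expl) / teacher_base
print(round(100 * reduction))                  # 65 (% reduction)
```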

31 Case Summary The motivational intervention was more effective in the lab (g = 0.45) than in the field (g = 0.05). Using 3 indices of fidelity and, in turn, achieved relative treatment strength revealed that: –Classroom fidelity < lab fidelity; –Achieved relative strength was about 1 SD less in the classroom than in the laboratory. Differences in achieved relative strength paralleled differences in the motivational outcome, especially in the lab. Sources of infidelity: teacher (not student) factors.

32 Assessing Fidelity of Interventions with Multiple Components: A Case of Assessing Preschool Interventions Joy Lesnick

33 What Do We Mean By Multiple Components in Preschool Literacy Programs? How do you define preschool instruction? –Academic content, materials, student-teacher interactions, student-student interactions, physical development, schedules & routines, assessment, family involvement, etc. How would you measure implementation? –Preschool interventions: are made up of components (e.g., sets of activities and processes) that can be thought of as constructs; these constructs vary in meaning across actors (e.g., developers, implementers, researchers); they are of varying levels of importance within the intervention; and they are made up of smaller parts that need to be assessed. –Multiple components make assessing fidelity more challenging.

34 Overview Four areas of consideration when assessing fidelity of programs with multiple components: 1. Specifying multiple components; 2. Major variations in program components; 3. The ABCs of item and scale construction; 4. Aggregating indices. One caveat: very unusual circumstances. Goal of this work: –To build on the extensive evaluation work that had already been completed, and to use the case study to provide a framework for future efforts to measure fidelity of implementation.

35 1. Specifying Multiple Components: Our Process An extensive review of program materials identified potentially hundreds of components. How many indicators do we need to assess fidelity?

36 1. Specifying Multiple Components
[Diagram: a hierarchy running from constructs to sub-constructs, facets, elements, and indicators.]
–Constructs: interactions between teacher and child; physical environment; routines and classroom management; instruction; assessment; family involvement.
–Sub-constructs (of Instruction): materials, content, processes.
–Facets (of Content): social & personal development, healthful living, scientific thinking, social studies, creative arts, physical development, technology, math, literacy. (Processes include structured lessons and structured units.)
–Elements (of Literacy): oral language; language, comprehension, and response to text; phonemic awareness; book and print awareness; letter and word recognition; writing.
–Indicators: items rated 1–4 for each element.

37 Grain Size is Important Conceptual differences between programs may occur at micro levels, while empirical differences between program implementations may emerge at more macro levels. Theoretically expected vs. empirically observed differences: –Conceptual differences between programs must be identified at the smallest grain size at the outset, although empirical differences may be detectable at higher macro levels once the programs are implemented.

38 2. Major Variations in Program Components One program often has some combination of these different types of components: –Scripted (highly structured) activities; –Unscripted (unstructured) activities. Nesting of activities: –Micro-level (discrete) activities; –Macro-level (extended) activities. What you're trying to measure will influence how to measure it – and how often it needs to be measured.

39 2. Major Variations in Program Components
Scripted (highly structured) activities – Case-study example: in the first treatment condition, four scripted literacy circles are required. Implications: there are known criteria for assessing fidelity; fidelity is the difference between the expected and observed values (T_Tx – t_Tx). Applicable indices – Abs: yes; Avg/Bin/ARS: ?
Unscripted (unstructured) activities – Case-study example: in the second treatment condition, literacy circles are required, but the specific content of those group meetings is not specified. Implications: there are no known criteria for assessing fidelity; we can only record what was done (t_Tx), or compare to control. Applicable indices – Abs: no; Avg: yes?; Bin: ?; ARS: yes.
Key to indices:
Abs – "absolute fidelity" index: what happened compared to what should have happened – the highest standard.
Avg – magnitude or exposure level; indicates what happened, but is not very meaningful on its own – how do we know if a level is good or bad?
Bin – binary complier: can we set a benchmark to determine whether or not a program component was successfully implemented? > 30%, for example? Is that realistic? Meaningful?
ARS – difference in magnitude between Tx and C – relative strength – is there enough difference to warrant a treatment effect?

40 Dots under a microscope – what is it???

41 Starry Night, Vincent Van Gogh, 1889

42 We must measure the trees… and also the forest. Micro-level (discrete) activities: –Depending on the condition, daily activities (i.e., whole-group time, small-group time, center activities) may be scripted or unscripted, and take place within the larger structure of the theme under study. Macro-level (extended) activities: –The month-long thematic unit (structured in the treatment condition, unstructured in the control) is the underlying extended structure within which scripted or unscripted micro activities take place. In multi-component programs, many activities are nested within larger activity structures. This nesting has implications for fidelity analysis – what to measure and how to measure it.

43 3. The ABCs of Item and Scale Construction Aim for one-to-one correspondence of indicators to the component of interest. Balance items across components. Coverage and quality are more important than the quantity of items.

44 3. Aim for One-to-One Correspondence Example of more than one component being assessed in one item: –[Does the teacher] Talk with children throughout the day, modeling correct grammar, teaching new vocabulary, and asking questions to encourage children to express their ideas in words? (Yes/No) Example of one component being measured in each item: –Teacher provides an environment wherein students can talk about what they are doing. –Teacher listens attentively to students' discussions and responses. –Teacher models and/or encourages students to ask questions during class discussions. Difference between T & C (Oral Language)* for the two approaches, in order: T: 1.80 (0.32), C: 1.36 (0.32), ARS ES: 1.38; and T: 3.45 (0.87), C: 2.26 (0.57), ARS ES: 1.62. *Data for the case study come from an evaluation conducted by Dale Farran, Mark Lipsey, Carol Blibrey, et al.

45 3. Balance Items Across Components How many items are needed for each scale? Oral language is over-represented; scales with α < 0.80 are not reliable.
Literacy content scale – # items, α:
Oral language – 20, 0.95
Language, comprehension, and response to text – 7, 0.70
Book and print awareness – 2, 0.80
Phonemic awareness – 3, 0.68
Letter and word recognition – 7, 0.76
Writing – 6, 0.67
Literacy processes: thematic studies – 4, 0.62
Structured literacy circles – 2, 0.62

46 3. Coverage and Quality More Important Than Quantity Two scales each have 2 items but very different levels of reliability (book and print awareness, α = 0.80; structured literacy circles, α = 0.62). How many items are needed for each scale? Oral language has 20 items (α = 0.95). Randomly selecting items and recalculating alpha: –10 items: α = 0.92; –8 items: α = 0.90; –6 items: α = 0.88; –5 items: α = 0.82; –4 items: α = 0.73.
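The pattern above tracks what the Spearman–Brown prophecy formula predicts when a scale is shortened. The sketch below is a theoretical check only – the slide's alphas come from actual random item subsets, so they will not match the predictions exactly:

```python
def alpha_for_length(alpha, k_old, k_new):
    """Predicted Cronbach's alpha after changing a scale from k_old to
    k_new items (Spearman-Brown prophecy formula)."""
    # Average inter-item correlation implied by the observed alpha
    r_bar = alpha / (k_old - (k_old - 1) * alpha)
    return k_new * r_bar / (1 + (k_new - 1) * r_bar)

# Oral language: observed alpha = 0.95 with 20 items
for k in (10, 8, 6, 5, 4):
    print(k, round(alpha_for_length(0.95, 20, k), 2))
# Predicted: 10 -> 0.90, 8 -> 0.88, 6 -> 0.85, 5 -> 0.83, 4 -> 0.79
```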

47 4. Aggregating Indices To weight or not to weight? How do we decide? Possibilities: –Theory; –Consensus; –$$ spent; –Time spent. Case-study example – 2 levels of aggregation, within and between: –Unit-weight within facet: "Instruction – Content – Literacy"; –Hypothetical weight across sub-construct: "Instruction – Content".

48 YOU ARE HERE… [Diagram: the component hierarchy from slide 36, highlighting the path Instruction → Content → Literacy; elements are unit-weighted within the Literacy facet and theory-weighted across the Content sub-construct. How to weight?]

49 4. Aggregating Indices – Unit-Weight Within Facet: Instruction – Content – Literacy
Literacy content | Avg fidelity Tx | Avg fidelity C | "Absolute" Tx | "Absolute" C | ARS (Average) | ARS (Absolute)
Oral language | 1.82 | 1.40 | 91% | 70% | 1.36 | 0.53
Language, comprehension, and response to text | 1.74 | 1.37 | 87% | 69% | 1.45 | 0.44
Book and print awareness | 1.91 | 1.39 | 96% | 70% | 1.38 | 0.73
Phonemic awareness | 1.73 | 1.48 | 87% | 74% | 0.74 | 0.32
Letter and word recognition | 1.75 | 1.36 | 88% | 68% | 1.91 | 0.50
Writing | 1.68 | 1.37 | 84% | 69% | 1.22 | 0.34
Average – unit weighting | 1.77 | 1.38 | 89% | 75% | 1.34 | 0.48
**Clustering is ignored.

50 4. Aggregating Indices – Theory-Weight Across Sub-Construct (hypothetical): Instruction – Content
Instruction – Content | Treatment | Control | Hypothetical weight
Literacy | 1.77 | 1.38 | 40%
Math | 1.51 | 1.80 | 5%
Social and personal development | 1.79 | 1.58 | 35%
Scientific thinking | 1.57 | 1.71 | 5%
Social studies | 1.84 | 1.41 | 5%
Creative arts | 1.66 | 1.32 | 5%
Physical development | 1.45 | 1.50 | 3%
Technology | 1.45 | 1.57 | 2%
Total | | | 100%
Unweighted average | 1.63 | 1.53 |
Weighted average | 1.74 | 1.49 |
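The weighted and unweighted averages in the table above can be reproduced directly from the facet means and the hypothetical weights:

```python
# (facet, treatment mean, control mean, hypothetical theory weight)
facets = [
    ("Literacy",                        1.77, 1.38, 0.40),
    ("Math",                            1.51, 1.80, 0.05),
    ("Social and personal development", 1.79, 1.58, 0.35),
    ("Scientific thinking",             1.57, 1.71, 0.05),
    ("Social studies",                  1.84, 1.41, 0.05),
    ("Creative arts",                   1.66, 1.32, 0.05),
    ("Physical development",            1.45, 1.50, 0.03),
    ("Technology",                      1.45, 1.57, 0.02),
]

# Unweighted: simple mean across facets; weighted: sum of weight * mean
unweighted_tx = sum(tx for _, tx, _, _ in facets) / len(facets)
weighted_tx = sum(w * tx for _, tx, _, w in facets)
weighted_c = sum(w * c for _, _, c, w in facets)
print(round(unweighted_tx, 2))  # 1.63
print(round(weighted_tx, 2))    # 1.74
print(round(weighted_c, 2))     # 1.49
```

Note how the theory weights shift the treatment–control contrast: the heavily weighted facets (literacy, social and personal development) favor the treatment group, so the weighted gap (1.74 vs. 1.49) is wider than the unweighted one (1.63 vs. 1.53).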

51 YOU ARE HERE… [Diagram: the component hierarchy again, now marking the weighting choices made so far – unit weight within the Literacy facet, theory weight across the Content sub-construct – and asking how to weight the remaining constructs.]

52 Key Points and Future Issues Fidelity assessment should, at a minimum, identify and measure the model-based core and necessary components. Collaboration among researchers, developers, and implementers is essential for specifying: –Intervention models; –Core and essential components; –Benchmarks for T_Tx (e.g., an educationally meaningful dose; what level of X is needed to instigate change); and –Tolerable adaptation.

53 Points and Issues Fidelity assessment serves two roles: –Indexing the average causal difference between conditions; and –Using fidelity measures to assess the effects of variation in implementation on outcomes. We should minimize "infidelity" and weak ARS via: –Pre-experimental assessment of T_Tx in the counterfactual condition… is T_Tx > T_C? –Building operational models with positive implementation drivers; –Post-experimental (re)specification of the intervention, for example: MAP_ARS = .3(planned professional development) + .6(planned use of data for differentiated instruction).

54 Points and Issues What does an ARS of 1.20 mean? We need experience and a normative framework: –Cohen defined a small effect on outcomes as 0.20, medium as 0.50, and large as 0.80; –Over time, similar conventions may emerge for ARS.

