Classifying Designs of MSP Evaluations: Lessons Learned and Recommendations. Barbara E. Lovitts, June 11, 2008.


1 Classifying Designs of MSP Evaluations: Lessons Learned and Recommendations
Barbara E. Lovitts, June 11, 2008

2 The Sample of Projects
First level of screening:
1. Final year – APR start and end dates.
2. Type of evaluation design:
– Experimental (random assignment)
– Quasi-experimental
  – Comparison group study with equating
  – Regression-discontinuity study

3 Findings – Final Year
Started with 124 projects. Ended with 88 projects (results are not final).
Projects eliminated based on:
– Evidence in the project narrative or evaluation report.
– Information provided by the Project Director.

4 Evaluation Design
Type of Design | Starting Number | Ending Number
Experimental | 3* | 0
Quasi-Experimental | 47* | 19*
*Results are not final.

5 Findings – Quasi-Experimental Designs
Many studies had a one-group pre-/post design (eliminated).
In many treatment/comparison group studies, the comparison teachers were in the same school at the same grade level as the treatment teachers (not eliminated).

6 Applying the Rubric – Challenges
– Projects used different designs to evaluate different outcomes (e.g., content knowledge, pedagogy, efficacy).
– Projects used different designs to evaluate different participant groups (e.g., teachers, students).
– Projects used different designs at different grade levels or for different instruments.

7 Applying the Rubric – Solution
Identify each measured outcome and group (e.g., 5th grade teachers – earth science content knowledge).
Apply the rubric to each outcome/group combination that was evaluated using an experimental or a quasi-experimental design.

8 Applying the Rubric – A. Baseline Equivalence of Groups (Quasi-Experimental Only)
Criterion: No significant pre-intervention differences between treatment and comparison groups on variables related to the study's key outcomes; or adequate steps were taken to address the lack of baseline equivalence in the statistical analysis.

9 Applying the Rubric – Common Issues:
– No pre-test information on outcome-related measures.
– Within-group pre-test results are given for the treatment and comparison groups, but no tests of between-group differences are reported.
– Projects match groups on the unit of assignment (e.g., schools, teachers) but do not provide data on the unit of assessment (e.g., teachers, students).

10 Applying the Rubric – Recommendation: Baseline Equivalence
Participant Group and Outcome | Treatment Pre-test (mean or percent) | Comparison Pre-test (mean or percent) | p-value
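The p-value column above comes from a between-group test of the pre-test scores. As a rough illustration only (not part of the rubric), here is a minimal sketch using hypothetical pre-test scores and an independent-samples t-test from SciPy:

```python
# A minimal sketch (hypothetical data) of the between-group pre-test
# comparison behind the p-value column: an independent-samples t-test
# of treatment vs. comparison pre-test scores.
from scipy.stats import ttest_ind

treatment_pretest = [61, 58, 72, 65, 70, 63, 59, 68]   # hypothetical scores
comparison_pretest = [60, 55, 69, 67, 66, 62, 58, 64]  # hypothetical scores

t_stat, p_value = ttest_ind(treatment_pretest, comparison_pretest)
print(f"Treatment pre-test mean:  {sum(treatment_pretest)/len(treatment_pretest):.1f}")
print(f"Comparison pre-test mean: {sum(comparison_pretest)/len(comparison_pretest):.1f}")
print(f"Between-group p-value: {p_value:.3f}")
```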

11 Applying the Rubric – B. Sample Size
Criterion: Sample size was adequate, based on a power analysis with the recommended parameters:
– significance level = 0.05
– power = 0.8
– minimum detectable effect informed by the literature or otherwise justified
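For projects that have not run a power analysis, the sketch below shows one common way to do it for a two-group comparison of means, using the recommended alpha and power. It assumes the statsmodels package, and the minimum detectable effect is a placeholder that should be justified from the literature; this is an illustration, not the rubric's required method.

```python
# A minimal power-analysis sketch for a two-group comparison of means,
# using the recommended alpha = 0.05 and power = 0.8. The minimum
# detectable effect (MDE, in standard-deviation units) is a hypothetical
# placeholder and should be justified from the literature.
from statsmodels.stats.power import TTestIndPower

alpha = 0.05   # significance level
power = 0.80   # desired statistical power
mde = 0.50     # hypothetical minimum detectable effect (Cohen's d)

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=mde, alpha=alpha,
                                   power=power, ratio=1.0,
                                   alternative='two-sided')
print(f"Required sample size per group: {n_per_group:.0f}")
```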

12 Applying the Rubric – Common Issues:
– Power analyses are rarely conducted.
– Different sample sizes are given throughout the APR and Evaluation Report.
– Sample sizes in the APR and Evaluation Report do not match.
– Projects report a sample size for teachers but not for students, or for students but not for teachers.
– Subgroup sizes:
  – are not reported
  – are reported inconsistently
  – vary by discipline, subdiscipline (e.g., earth science, physical science), and/or grade level

13 Applying the Rubric – Recommendation: Sample Size
Participant Group and Outcome | Treatment (final sample size) | Comparison (final sample size) | Power Calculation Assumptions (if available)
(group/outcome) | N | N | Alpha = , Power = , MDE =
(group/outcome) | N | N |
Recommended values: alpha = 0.05, power = 0.8, minimum detectable effect (MDE) informed by the literature.

14 C. Quality of the Data Collection Instruments
Criterion: The study used existing data collection instruments that had already been deemed valid and reliable to measure key outcomes; or the study used data collection instruments developed specifically for the study that were sufficiently pre-tested with subjects who were comparable to the study sample.
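For a locally developed instrument, one common piece of reliability evidence is an internal-consistency estimate from pilot-test data. The sketch below, with hypothetical pilot responses, computes Cronbach's alpha; it illustrates the kind of pre-testing the criterion refers to and is not a requirement of the rubric.

```python
# A minimal sketch of one common reliability check for a locally
# developed instrument: Cronbach's alpha computed from pilot-test item
# responses. The data are hypothetical; rows are pilot respondents,
# columns are items.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: 2-D array of shape (respondents, items)."""
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

pilot_responses = np.array([
    [3, 4, 3, 5],
    [2, 2, 3, 3],
    [4, 5, 4, 5],
    [3, 3, 2, 4],
    [5, 4, 5, 5],
])
print(f"Cronbach's alpha: {cronbach_alpha(pilot_responses):.2f}")
```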

15 Applying the Rubric – Common Issues:
– Locally developed instruments are not tested for validity or reliability.
– Projects identify an instrument in the APR and select "not tested for validity or reliability," but a Google search shows that the instrument has been tested for validity and reliability.
– Projects use many instruments but do not report validity or reliability for all of them.
– Projects do not provide results for all instruments.

16 Applying the Rubric – Recommendation: Data Collection Instruments
Participant Group and Outcome | Name of Instrument | Evidence for Validity and Reliability
Teacher content knowledge – math | DTAMS | {cite website or other reference where evidence can be found}
Teacher content knowledge – marine biology | Locally developed instrument | Narrative description of the evidence
Teacher content knowledge – science | Borrowed items from [instrument name(s)] | Total # of items; # of items borrowed from each instrument

17 Applying the Rubric – D. Quality of the Data Collection Methods
Criterion: The methods, procedures, and timeframes used to collect the key outcome data from treatment and comparison groups were the same.

18 Applying the Rubric – Common Issues:
– Little to no information is provided in general about data collection.
– Information is provided for the treatment group but not for the comparison group.
– Treatment teachers typically receive the pre-test before the summer institute, a post-test at the end of the summer institute, and sometimes another post-test at the end of the school year, while comparison teachers receive a pre-test at the beginning of the school year and a post-test at the end of the school year.
– Comparison teachers receive a single test at the beginning of the year.

19 Applying the Rubric – Recommendation: Quality of Data Collection Methods
1. Participant Group and Outcome ______________
A. Method/procedure for collecting data from the treatment group (describe):
B. Was the same method/procedure used to collect data from the comparison group? ___ Yes ___ No
If no, please describe how the method/procedure was different:
(continued)

20 Applying the Rubric – C. Time Frame
Participant Group and Outcome | Pre-test (month and year) | Post-test (month and year) | Repeated Post-test (month and year)
Treatment group | | |
Comparison group | | |

21 Applying the Rubric – E. Data Reduction Rates
Criterion: The study measured the key outcome variable(s) in the post-tests for at least 70% of the original study sample (treatment and comparison groups combined), or there is evidence that the high rates of data reduction were unrelated to the intervention; AND
The proportion of the original study sample that was retained in the follow-up data collection activities (e.g., post-intervention surveys) and/or for whom post-intervention data were provided (e.g., test scores) was similar for the treatment and comparison groups (i.e., less than or equal to a 15% difference), or the proportion retained was different for the treatment and comparison groups and sufficient steps were taken to address this differential attrition in the statistical analysis.
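Both parts of this criterion can be checked with simple arithmetic once sample sizes are reported consistently. A minimal sketch with hypothetical counts:

```python
# A minimal sketch of the two retention checks described above, using
# hypothetical sample sizes. "Original" is the sample at assignment;
# "post-test" is the number with key outcome data at follow-up.
treatment_original, treatment_posttest = 60, 48
comparison_original, comparison_posttest = 55, 40

# Check 1: combined retention of at least 70% of the original sample.
combined_retention = (treatment_posttest + comparison_posttest) / (
    treatment_original + comparison_original)

# Check 2: differential attrition of no more than 15 percentage points.
treatment_retention = treatment_posttest / treatment_original
comparison_retention = comparison_posttest / comparison_original
differential = abs(treatment_retention - comparison_retention)

print(f"Combined retention: {combined_retention:.0%} (criterion: >= 70%)")
print(f"Differential attrition: {differential:.0%} (criterion: <= 15%)")
```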

22 Applying the Rubric – Common Issues:
– Attrition information is typically not reported.
– Abt can sometimes calculate attrition, but it is difficult because sample and subsample sizes are not reported consistently.
– If projects provide data on attrition, or if Abt can calculate it, it is usually for the treatment group only.
– Projects rarely provide data on student attrition; some occasionally mention that there is a lot of student mobility, but it is not quantified.

23 Applying the Rubric – Recommendation: Data Reduction Rates
Participant Group and Outcome | Original Sample Size | Pre-test Sample Size | Post-test Sample Size | Post-test N / Pre-test N | Post-test N / Original N
Treatment | | | | |
Comparison | | | | |

24 Applying the Rubric – F. Relevant Data
Criterion: The final report includes treatment and comparison group post-test means and tests of significance for key outcomes; or it provides sufficient information for the calculation of statistical significance (e.g., mean, sample size, standard deviation/standard error).
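Reporting means, standard deviations, and sample sizes is sufficient because a between-group test can be reconstructed from them. A minimal sketch with hypothetical post-test summary statistics, using SciPy:

```python
# A minimal sketch showing why the summary statistics listed above are
# sufficient: a between-group t-test computed directly from reported
# means, standard deviations, and sample sizes (hypothetical values).
from scipy.stats import ttest_ind_from_stats

result = ttest_ind_from_stats(
    mean1=72.4, std1=10.1, nobs1=48,   # treatment post-test
    mean2=68.9, std2=11.3, nobs2=40,   # comparison post-test
)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.3f}")
```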

25 Applying the Rubric – Common Issues:
– Projects report that results were significant or non-significant but do not provide supporting data.
– Projects provide p-values but do not provide means or percents.
– Projects provide means/percents and p-values, but not standard deviations.
– Projects provide within-group data for the treatment and comparison groups but do not provide between-group tests of significance.
– Projects with treatment and comparison groups provide data for the treatment group only.
– Projects report significant results but do not identify the type of statistical test performed.
– Projects provide an overwhelming amount of data for a large number of subgroups (e.g., on individual test or survey items).

26 Applying the Rubric – Recommendation: Relevant Data
Participant Group and Outcome | Mean or Percent | SD or SE | t, F, or Chi-square | p-value
Treatment | | | |
Comparison | | | |


