EDUCATION RESEARCH MEETS THE GOLD STANDARD: STATISTICS, EDUCATION, AND RESEARCH METHODS AFTER NO CHILD LEFT BEHIND Mack C. Shelley, II Iowa State University.

1 EDUCATION RESEARCH MEETS THE GOLD STANDARD: STATISTICS, EDUCATION, AND RESEARCH METHODS AFTER NO CHILD LEFT BEHIND Mack C. Shelley, II Iowa State University mshelley@iastate.edu Presented at the Joint Statistical Meetings, August 7-11, 2005, Minneapolis, MN

2 Background
- This session is meant to help inform the national debate over the role of scientific standards for research in education, particularly as those research standards are influenced by statistical methods and theory.
- This session builds on a National Science Foundation award to myself and Brian Hand (University of Iowa).

3 Background
- The panel is designed to meld research interests in statistics, education, and related disciplines, and to discuss the dramatically changing context of contemporary education research.
- Why, exactly, is the context changing for statistical research in education?

4 Background
Standards for acceptable research in education are affected greatly by:
- the recent creation of the Institute of Education Sciences in the U.S. Department of Education,
- passage of the No Child Left Behind Act of 2001, and
- passage of the Education Sciences Reform Act (H.R. 3801) in 2002

5 Background
Together, these developments
- have reconstituted federal support for research and dissemination of information in education,
- are meant to foster scientifically valid research, and
- have established what is referred to as the gold standard for research in education.

6 Background
These and other developments denote that greater education research emphasis now is placed on
- quantification,
- the use of randomized trials, and
- the selection of valid control groups

7 Background
This panel is intended to be part of a sustained and expanded dialogue
- between the statistical community and those who implement the education research agenda
- through a discussion of whether and how to implement the new standards for statistical work in the field of education research

8 What Is The Gold Standard?
U.S. Department of Education, Institute of Education Sciences, National Center for Education Evaluation and Regional Assistance
Identifying and implementing educational practices supported by rigorous evidence: A user friendly guide
http://www.ed.gov/about/offices/list/ies/news.html#guide

9 What Is The Gold Standard?
This publication emphasizes:
- evidence-based interventions
- educational outcomes that have been found to be effective in randomized controlled trials
- research's gold standard for establishing what works
- following patterns of evidence use in medicine and welfare policy

10 What Is The Gold Standard?
The quality of studies needed to establish strong evidence requires
- randomized controlled trials that are well-designed and implemented
- that the quantity of evidence needed spans trials showing effectiveness in two or more typical school settings
  - including a setting similar to that of schools/classrooms

11 What Is The Gold Standard?
Possible evidence may include
- randomized controlled trials whose quality/quantity are good but fall short of strong evidence
- and/or comparison-group studies in which the intervention and comparison groups are very closely matched
  - in academic achievement, demographics, and other characteristics

12 What Is The Gold Standard?
Evaluating whether an intervention is backed by strong evidence of effectiveness hinges on
- well-designed and well-implemented randomized controlled trials
- demonstrating that there are no systematic differences between intervention and control groups before the intervention
- the use of measures and instruments of proven validity
- real-world objective measures of the outcomes the intervention is designed to affect
- attrition of no more than 25% of the original sample
- effect size combined with statistical significance
- an adequate sample size to achieve statistical significance
- controlled trials implemented in more than one site in schools that represent a cross-section of all schools
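
Three of the criteria above are directly computable: the attrition share, an effect size, and the sample size needed for a given effect. The following is a minimal Python sketch, not the Institute's own procedure. The 25% ceiling comes from the slide; the function names, the choice of Cohen's d with a pooled standard deviation, and the normal-approximation sample-size formula are assumptions made for illustration.

```python
import math
from statistics import NormalDist, mean, stdev

MAX_ATTRITION = 0.25  # ceiling quoted on the slide


def attrition_rate(n_randomized, n_analyzed):
    """Share of the originally randomized sample that was lost before analysis."""
    if n_randomized <= 0 or not 0 <= n_analyzed <= n_randomized:
        raise ValueError("need 0 <= n_analyzed <= n_randomized and n_randomized > 0")
    return (n_randomized - n_analyzed) / n_randomized


def cohens_d(treatment, control):
    """Standardized mean difference using the pooled sample standard deviation."""
    n_t, n_c = len(treatment), len(control)
    pooled_var = ((n_t - 1) * stdev(treatment) ** 2
                  + (n_c - 1) * stdev(control) ** 2) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / math.sqrt(pooled_var)


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-arm sample size for a two-sided, two-sample comparison
    (normal approximation; ignores clustering, which would raise the number)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)


if __name__ == "__main__":
    rate = attrition_rate(n_randomized=200, n_analyzed=160)
    print(f"attrition = {rate:.0%}, within ceiling: {rate <= MAX_ATTRITION}")
    print(f"d = {cohens_d([12, 14, 16], [10, 12, 14]):.2f}")
    print(f"n per group for d = 0.5: {n_per_group(0.5)}")
```

The sample-size function treats students as independent observations. A trial that randomizes classrooms or schools would need a design-effect adjustment that this sketch omits.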

13 No Child Left Behind
- Public Law 107–110 [H.R. 1]
- passed on January 8, 2002
- An Act to close the achievement gap with accountability, flexibility, and choice, so that no child is left behind
- the No Child Left Behind Act of 2001 (NCLB)
- established standards for academic assessments in mathematics, reading or language arts, and science
- multiple up-to-date measures of student academic achievement, including measures that assess higher-order thinking skills and understanding
These requirements for program assessment lead to many opportunities and circumstances for the application of statistical methods.

14 No Child Left Behind
The research program under NCLB was designed to examine the effect of the assessment and accountability systems on students, teachers, parents, families, schools, school districts, and States, including correlations between such systems and
- student academic achievement
- progress toward meeting the State-defined level of proficiency
- progress toward closing achievement gap
- changes in course offerings, teaching practices, course content, and instructional material
- teacher, principal, and pupil-services personnel turnover rates
- student dropout, grade-retention, and graduation rates
- students with disabilities
- student socioeconomic status
- level of student English proficiency
- student ethnicity and race

15 The Education Sciences Reform Act and IES
The Education Sciences Reform Act
- An Act to provide for improvement of Federal education research, statistics, evaluation, information, and dissemination, and for other purposes
- H.R. 3801, signed into law November 5, 2002
- reconstituted federal support for research and dissemination of information in education, to foster scientifically valid research
- established the Institute of Education Sciences (IES)
  - replacing the Office of Educational Research and Improvement
  - part of the Department of Education but functioning separately from it

16 The Education Sciences Reform Act and IES
IES is the research arm of the Department of Education
Mission is to expand knowledge and provide information on
- the condition of education
- practices that improve academic achievement
- the effectiveness of Federal and other education programs
Goal
- the transformation of education into an evidence-based field in which decision makers routinely seek out the best available research and data before adopting programs or practices that will affect significant numbers of students
Consists of
- Grover J. (Russ) Whitehurst, first Director, since November 2002
- Office of the Director
- National Center for Education Research
- National Center for Education Statistics
- National Center for Education Evaluation and Regional Assistance
- National Center for Special Education Research

17 The Education Sciences Reform Act and IES
HR 3801 defined Scientifically based research standards to
- apply rigorous, systematic, and objective methodology to obtain reliable and valid knowledge relevant to education activities and programs
- present findings and make claims that are appropriate to and supported by the methods that have been employed

18 The Education Sciences Reform Act and IES
Scientifically based research also includes
- employing systematic, empirical methods that draw on observation or experiment
- involving data analyses that are adequate to support the general findings
- relying on measurements or observational methods that provide reliable data
- making claims of causal relationships only in random assignment experiments or other designs (to the extent such designs substantially eliminate plausible competing explanations for the obtained results)
- ensuring that studies and methods are presented in sufficient detail and clarity to allow for replication or, at a minimum, to offer the opportunity to build systematically on the findings of the research
- obtaining acceptance by a peer-reviewed journal or approval by a panel of independent experts through a comparably rigorous, objective, and scientific review
- using research designs and methods appropriate to the research question posed

19 The Education Sciences Reform Act and IES
Scientifically valid education evaluation means an evaluation that
- adheres to the highest possible standards of quality with respect to research design and statistical analysis
- provides an adequate description of the programs evaluated and, to the extent possible, examines the relationship between program implementation and program impacts
- provides an analysis of the results achieved by the program with respect to its projected effects
- employs experimental designs using random assignment, when feasible, and other research methodologies that allow for the strongest possible causal inferences when random assignment is not feasible
- may study program implementation through a combination of scientifically valid and reliable methods

20 What Works
What Works Clearinghouse (WWC)
- established in 2002 by IES
- to provide educators, policymakers, and the public with a central and trusted source of scientific evidence of what works in education
- administered by the U.S. Department of Education, through a contract to a joint venture of the American Institutes for Research and the Campbell Collaboration
- reviews and reports on existing studies of interventions (education programs, products, practices, and policies) in selected topic areas
- applies standards that follow scientifically valid criteria for determining the effectiveness of these interventions
Technical Advisory Group (TAG)
- leading experts in research design, program evaluation, and research synthesis
- advises on the standards for evaluation research reviews
- monitors and informs the methodological aspects of WWC reviews and reports
www.whatworks.ed.gov

21 What Works - TAG
- Dr. Larry V. Hedges, Chairperson, Stella M. Rowley Professor of Education, Psychology, Public Policy Studies, and Sociology, University of Chicago, and editorial board member of the American Journal of Sociology, the Review of Educational Research, and Psychological Bulletin.
- Dr. Betsy Jane Becker, Professor of Measurement and Quantitative Methods, College of Education, Michigan State University.
- Dr. Jesse A. Berlin, Professor of Biostatistics, University of Pennsylvania School of Medicine, and Director of Biostatistics at the university's Comprehensive Cancer Center.
- Dr. Douglas Carnine, Professor of Education, University of Oregon, and Director of the National Center to Improve the Tools of Educators.
- Dr. Thomas D. Cook, Professor of Sociology, Psychology, Education and Social Policy, Northwestern University, and Faculty Fellow at the Institute for Policy Research.
- Dr. David J. Francis, Professor of Quantitative Methods, Chairman of the Department of Psychology, and Director of the Texas Institute for Measurement, Evaluation, and Statistics, University of Houston.
- Dr. Robert L. Linn, Distinguished Professor of Education, University of Colorado at Boulder, and Co-Director of the National Center for Research on Evaluation, Standards, and Student Testing.
- Dr. Mark W. Lipsey, Senior Research Associate, Vanderbilt Institute for Public Policy Studies, and Director of the Center for Evaluation Research and Methodology.
- Dr. David Myers, Senior Fellow, Mathematica Policy Research, and former Director of the U.S. Department of Education's national evaluation of Upward Bound.
- Dr. Andrew C. Porter, Patricia and Rodes Hart Professor of Educational Leadership and Policy and Director of the Learning Sciences Institute at Vanderbilt University.
- Dr. David Rindskopf, Professor of Psychology and Educational Psychology, City University of New York Graduate Center, and elected Fellow of the American Statistical Association.
- Dr. Cecilia E. Rouse, Professor of Economics and Public Affairs, and joint appointee in the Economics Department and Woodrow Wilson School, Princeton University.
- Dr. William R. Shadish, Founding Faculty and Professor of Social Sciences, Humanities, and Arts at the University of California, Merced.

22 What Works Current Topics
The What Works Clearinghouse (WWC) prioritizes topics based on the following criteria:
- potential to improve important student outcomes;
- applicability to a broad range of students or to particularly important subpopulations;
- policy relevance and perceived demand within the education community; and
- likely availability of scientific studies.
Specifically, the topics were selected from nominations received through:
- emails from the public;
- meetings and presentations sponsored by the What Works Clearinghouse;
- the What Works Network;
- suggestions presented by senior members of education associations, policymakers, and the U.S. Department of Education; and
- reviews of existing research.

23 What Works Current Topics
Topics include:
- Math: Curriculum-Based Interventions for Increasing Middle School Math
- Reading: Interventions for Beginning Reading
- Character Education: Comprehensive Schoolwide Character Education Interventions: Benefits for Character Traits, Behavioral, and Academic Outcomes
- Dropout Prevention: Interventions for Preventing High School Dropout
- English Language Learning: Interventions for Elementary School English Language Learners: Increasing English Language Acquisition and Academic Achievement
- Math: Curriculum-Based Interventions for Increasing Elementary School Math
- Early Childhood: Interventions for Improving Preschool Children's School Readiness
- Delinquent, Disorderly, and Violent Behavior: Interventions to Reduce Delinquent, Disorderly, and Violent Behavior in Middle and High Schools
- Adult Literacy: Interventions for Increasing Adult Literacy
- Peer-Assisted Learning: Peer-Assisted Learning Interventions in Elementary Schools: Reading, Mathematics, and Science Gains

24 Does Not Meet Evidence Screens
Studies may not pass WWC screening requirements for the following reasons:
- Evaluation research design. The study did not meet certain design standards. Study designs that provide the strongest evidence of effects include
  - randomized controlled trials
  - regression discontinuity designs
  - quasi-experimental designs (must use a similar comparison group and have no attrition or disruption problems)
  - single subject designs
- Topic area definition. The study did not meet the intervention definition developed by the WWC for a particular topic.
- Time period definition (generally, the last 20 years)
- Relevant outcome
  - academic outcomes, not, for example, student self-confidence
  - needs to have only one relevant outcome to pass this screen
  - test reliability or validity
  - sample or description of relevant test items if a study outcome test is not known or available
- Relevant student sample
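
The screens above amount to a checklist applied to each study. As a rough Python sketch of that idea (not the WWC's actual review protocol), the following returns the list of screens a study fails. The `Study` record, its field names, the design labels, and the `review_year` parameter are all hypothetical; only the screen names and the 20-year window come from the slide.

```python
from dataclasses import dataclass

# Design labels are hypothetical shorthand for the four designs named on the slide.
ACCEPTED_DESIGNS = {
    "rct",
    "regression_discontinuity",
    "quasi_experimental",
    "single_subject",
}


@dataclass
class Study:
    design: str
    year: int
    matches_topic_definition: bool
    relevant_outcomes: int            # count of relevant academic outcomes
    relevant_sample: bool
    similar_comparison_group: bool = True
    attrition_or_disruption: bool = False


def failed_screens(study, review_year, window_years=20):
    """Return the names of the screens the study fails (empty list = passes)."""
    reasons = []
    if study.design not in ACCEPTED_DESIGNS:
        reasons.append("evaluation research design")
    elif study.design == "quasi_experimental" and (
        not study.similar_comparison_group or study.attrition_or_disruption
    ):
        # Quasi-experiments carry extra conditions on the slide.
        reasons.append("evaluation research design")
    if not study.matches_topic_definition:
        reasons.append("topic area definition")
    if review_year - study.year > window_years:
        reasons.append("time period definition")
    if study.relevant_outcomes < 1:   # one relevant outcome is enough
        reasons.append("relevant outcome")
    if not study.relevant_sample:
        reasons.append("relevant student sample")
    return reasons


if __name__ == "__main__":
    ok = Study("rct", 2000, True, 1, True)
    old_case_study = Study("case_study", 1980, True, 0, True)
    print(failed_screens(ok, review_year=2005))
    print(failed_screens(old_case_study, review_year=2005))
```

Returning every failed screen, rather than stopping at the first, mirrors the slide's framing of several independent reasons a study may not pass.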

25 A Real Live Current Example
MATHEMATICS AND SCIENCE EDUCATION RESEARCH GRANTS PROGRAM
- CFDA (Catalog of Federal Domestic Assistance) NUMBER: 84.305
- RELEASE DATE: May 6, 2005
- REQUEST FOR APPLICATIONS NUMBER: NCER-06-02 Mathematics and Science Education Research Grants Program
- http://www.ed.gov/about/offices/list/ies/programs.html
- LETTER OF INTENT RECEIPT DATE: September 12, 2005
- APPLICATION RECEIPT DATE: November 3, 2005, 8:00 p.m. Eastern time

26 A Real Live Current Example
REVIEW CRITERIA FOR SCIENTIFIC MERIT
Significance
- Does applicant make a compelling case for the potential contribution of the project to the solution of an education problem?
- Does the applicant present a strong rationale justifying the need to evaluate the selected intervention (e.g., does prior evidence suggest that the intervention is likely to substantially improve student learning and achievement)?
Research Plan
- Does the applicant present
  (a) clear hypotheses or research questions
  (b) clear descriptions of and strong rationales for the sample, measures (including information on reliability and validity), data collection procedures, and research design
  (c) a detailed and well-justified data analysis plan?
- Does the research plan meet the requirements described in the section on the Requirements of the Proposed Research?
- Is the research plan appropriate for answering the research questions or testing the proposed hypotheses?

27 A Real Live Current Example Applications under Goal Three (Efficacy and Replication Trials): Under Goal Three, the Institute requests proposals to test the efficacy of fully developed interventions that already have evidence of potential efficacy. By efficacy, the Institute means the degree to which an intervention has a net positive impact on the outcomes of interest in relation to the program or practice to which it is being compared.

28 A Real Live Current Example Methodological requirements. (i) Sample: The applicant should define, as completely as possible, the sample to be selected and the sampling procedures to be employed for the proposed study. Additionally, the applicant should describe strategies to ensure that participants will remain in the study over the course of the evaluation.

29 A Real Live Current Example (ii) Design: Applicants should describe how potential threats to internal and external validity will be addressed. Studies using randomized assignment to treatment and comparison conditions are strongly preferred. When a randomized trial is used, the applicant should clearly state the unit of randomization (e.g., students, classroom, teacher, or school). Choice of randomizing unit or units should be grounded in a theoretical framework. Applicants should explain the procedures for assignment of groups (e.g., schools, classrooms) or participants to treatment and comparison conditions.
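
The assignment procedure described on this slide can be illustrated with a short sketch. It is not part of the RFA; it assumes the school is the unit of randomization, a simple 50/50 split, and hypothetical school identifiers. The function name `assign_clusters` and the seed value are illustrative only.

```python
import random


def assign_clusters(cluster_ids, seed=2005):
    """Randomly assign half of the clusters to treatment and the rest to comparison.

    The unit of randomization is the cluster (here, a hypothetical school),
    so every student in a school receives the same condition.
    """
    rng = random.Random(seed)      # fixed seed so the assignment can be documented and reproduced
    shuffled = list(cluster_ids)   # copy, so the caller's list is not modified
    rng.shuffle(shuffled)
    n_treat = len(shuffled) // 2   # with an odd count, the extra cluster goes to comparison
    return {
        cid: ("treatment" if i < n_treat else "comparison")
        for i, cid in enumerate(shuffled)
    }


# Hypothetical identifiers for ten schools.
schools = [f"school_{i:02d}" for i in range(1, 11)]
assignment = assign_clusters(schools)
for school in schools:
    print(school, assignment[school])
```

Recording the seed and the list of units alongside the proposal is one way to make the assignment procedure auditable; a blocked or stratified assignment would need additional logic not shown here.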

30 A Real Live Current Example (ii) Design (continued): Only in circumstances in which a randomized trial is not possible may alternatives that substantially minimize selection bias or allow it to be modeled be employed. Applicants … must make a compelling case that randomization is not possible. Acceptable alternatives include appropriately structured regression-discontinuity designs or other well-designed quasi-experimental designs that come close to true experiments in minimizing the effects of selection bias on estimates of effect size.

31 A Real Live Current Example (ii) Design (continued): A well-designed quasi-experiment reduces substantially the potential influence of selection bias on membership in the intervention or comparison group. This involves: (1) demonstrating equivalence between the intervention and comparison groups at program entry on the variables measuring program outcomes (e.g., math achievement test scores), or obtaining such equivalence through statistical procedures such as propensity score balancing or regression; (2) demonstrating equivalence or removing statistically the effects of other variables on which the groups may differ and that may affect intended outcomes of the program being evaluated (e.g., demographic variables, experience and level of training of teachers, motivation of parents or students); (3) a design for the initial selection of the intervention and comparison groups that minimizes selection bias or allows it to be modeled.
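
One common way to examine baseline equivalence on an outcome measure is the standardized mean difference between groups. The sketch below is an illustration, not a procedure prescribed by the RFA; the pretest scores are made up, and the pooled standard deviation uses the simple average of the two group variances, which is one convention among several.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(treated, comparison):
    """Difference in group means divided by a pooled standard deviation.

    Pooled SD here is sqrt((s_t^2 + s_c^2) / 2), using sample variances.
    """
    pooled_sd = sqrt((variance(treated) + variance(comparison)) / 2)
    return (mean(treated) - mean(comparison)) / pooled_sd


# Hypothetical pretest math scores at program entry.
intervention_pretest = [48, 52, 55, 61, 47, 58, 50, 53]
comparison_pretest = [46, 51, 54, 60, 49, 57, 52, 50]

smd = standardized_mean_difference(intervention_pretest, comparison_pretest)
print(f"Baseline standardized mean difference: {smd:.3f}")
```

Values near zero suggest the groups were similar on that variable at entry. What counts as close enough is a judgment the applicant would need to justify, and the same check would be repeated for each covariate of interest.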

32 A Real Live Current Example (iii) Power: Applicants should clearly address the power of the evaluation design to detect a reasonably expected and minimally important effect. For determining the sample size, applicants need to consider the number of clusters, the number of individuals within clusters, the potential adjustment from covariates, the desired effect, the intraclass correlation (i.e., the variance between clusters relative to the total variance between and within clusters), the desired power of the design, one-tailed vs. two-tailed tests, repeated observations, attrition of participants, etc. Applicants should anticipate the degree to which the magnitude of the expected effect may vary across the primary outcomes of interest.
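
To show how several of these quantities interact, the sketch below computes the design effect and an approximate minimum detectable effect size (in standard deviation units) for a two-arm cluster-randomized trial. It is a simplified illustration under stated assumptions: equal allocation of clusters to arms, equal cluster sizes, no covariate adjustment, no attrition, and a normal approximation in place of the t distribution. The function names and the example inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def design_effect(cluster_size, icc):
    """Variance inflation from clustering: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


def minimum_detectable_effect(n_clusters, cluster_size, icc,
                              alpha=0.05, power=0.80, two_tailed=True):
    """Approximate minimum detectable standardized effect.

    Assumes n_clusters total, split evenly between two arms, so the
    variance of the standardized difference in means is
    4 * (ICC + (1 - ICC) / m) / J.
    """
    z = NormalDist()
    tail = alpha / 2 if two_tailed else alpha
    multiplier = z.inv_cdf(1 - tail) + z.inv_cdf(power)
    variance = 4 * (icc + (1 - icc) / cluster_size) / n_clusters
    return multiplier * sqrt(variance)


# Hypothetical design: 40 schools, 25 students each, ICC of 0.15.
deff = design_effect(25, 0.15)
mdes = minimum_detectable_effect(40, 25, 0.15)
print(f"Design effect: {deff:.2f}")
print(f"Approximate minimum detectable effect size: {mdes:.2f}")
```

Because the intraclass correlation term does not shrink as cluster size grows, adding clusters usually improves power more than adding individuals within clusters. A full power analysis would also account for covariates, attrition, and repeated observations, which this sketch leaves out.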

33 A Real Live Current Example (iv) Measures: Investigators should include relevant standardized measures of student achievement (e.g., standardized measures of mathematics achievement); other measures of student learning and achievement (e.g., researcher-developed measures); measures of teacher practices; and information on the reliability, validity, and appropriateness of proposed measures.

34 A Real Live Current Example (v) Fidelity of implementation of the intervention: The applicant should specify how the implementation of the intervention will be documented and measured; either indicate how the intervention will be maintained consistently across multiple groups (e.g., classrooms and schools) over time or describe the parameters under which variations in the implementation may occur; and propose research designs that permit the identification and assessment of factors impacting the fidelity of implementation.

35 A Real Live Current Example (vi) Comparison group, where applicable: The applicant should describe strategies to avoid contamination between treatment and comparison groups; include procedures for describing practices in the comparison groups; and be able to compare intervention and comparison groups on the implementation of key features of the intervention. Using a business-as-usual comparison group is acceptable. Applicants should specify the treatment or treatments received in the comparison group. Applicants should account for the ways in which what happens in the comparison group is important to understanding the net impact of the experimental treatment.

36 A Real Live Current Example (vii) Mediating and moderating variables: Mediating and moderating variables that are measured in the intervention condition and that are also likely to affect outcomes in the comparison condition should be measured in the comparison condition (e.g., student time-on-task, teacher experience/time in position). The evaluation should account for sources of variation in outcomes across settings (i.e., to account for what might otherwise be part of the error variance). (viii) Data analysis: Specific statistical procedures should be described. The relation between hypotheses, measures, and independent and dependent variables should be clear. The effects of clustering must be accounted for in the analyses, even when individuals are randomly assigned to condition.

