DRK-12 Diagnostic Assessment Panel, Prof. André A. Rupp, Dec 3, 2010: Opportunities and Challenges for Developing and Evaluating Diagnostic Assessments.


1 Opportunities and Challenges for Developing and Evaluating Diagnostic Assessments in STEM Education: A Modern Psychometric Perspective. André A. Rupp, EDMS Department, University of Maryland

2 Toward a Definition of “Diagnostic Assessment Systems”

3 Proposed Panel Definition: The term “diagnostic” comes from a combination of dia, to split apart, and gnosi, to learn, or knowledge. We use “diagnostic assessment (system)” to refer to assessment processes based on an explicit cognitive model, itself supported by empirical study, of proficient reasoning in a particular domain. The cognitive model must support delineation of students’ and / or teachers’ strengths and weaknesses that can be traced as they move from less to more proficient reasoning in the domain. The principled assessment design process should specify how observed behaviors are used to make inferences about what students or teachers know as they progress. We believe that diagnostic assessment has the potential to inform and assess the outcomes of instruction.

4 Conceptualization of Problem Space (from Stevens, Beal, & Sprang, 2009)

5 Toward an Understanding of Frameworks & Models

6 The Evidence-centered Design Framework (adapted from Mislevy, Steinberg, Almond, & Lukas, 2006)

7 Frameworks vs. Models: A “principled assessment design framework” for diagnostic assessment, such as evidence-centered design, is NOT a “model”. It does NOT prescribe a particular statistical modeling approach. A “statistical / psychometric model” is a mathematical tool that plays a supporting role in generating evidence-based narratives about students’ and / or teachers’ strengths and weaknesses. Its parameters do NOT have inherent meanings. A “cognitive model” for diagnostic assessment is a theory- and data-driven description of how emergent understandings and misconceptions in a domain develop and how these can be traced back to unobservable cognitive underpinnings. It does NOT prescribe a singular assessment approach.

8 Evidence-based Reasoning for “Traditional” Assessments

9 Traditional Construct Operationalization [Figure labels: Construct; Test Score; items I1, I2, ..., Ik; Theoretical Realm; Empirical Realm]

10 Feedback Utility (Part I: Scoring Card)

11 Feedback Utility (Part II: Simple Progress Mapping) [Figure labels: Level 3, Level 4]

12 Evidence-based Reasoning for “Modern” Assessments

13 Complex Assessment Tasks for Diagnosis (Part I) (from Seeratan & Mislevy, 2008)

14 Complex Assessment Tasks for Diagnosis (Example II) (from Behrens et al., 2009)

15 Evidence Identification, Aggregation, & Synthesis (from Stevens, Beal, & Sprang, 2009)

16 Proficiency Pathways (from Stevens, Beal, & Sprang, 2009)

17 Interventional Pathways (from Stevens, Beal, & Sprang, 2009)

18 Selected Statistical Tools for Evidence-based Reasoning

19 Selected Modeling Approaches for Diagnostic Assessments

Approaches Resulting in Continuous Proficiency Scales
1. Unidimensional explanatory IRT or FA models (e.g., de Boeck & Wilson, 2004)
2. Multidimensional CTT sumscores (e.g., Henson, Templin, & Douglas, 2007)
3. Multidimensional explanatory IRT or FA models (e.g., Reckase, 2009)
4. Structural equation models (e.g., Kline, 2010)

Approaches Resulting in Classifications of Respondents Based on Discrete Scales
1. Bayesian inference networks (e.g., Almond, Williamson, Mislevy, & Yan, in press)
2. Parametric diagnostic classification models (e.g., Rupp, Templin, & Henson, 2010)
3. Non- / Semi-parametric classification approaches (e.g., Tatsuoka, 2009)
4. Adapted clustering algorithms (e.g., Nugent, Dean, & Ayers, 2010)
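To make the idea of classifying respondents on discrete scales concrete, the following is a minimal sketch of one well-known parametric diagnostic classification model, the DINA model. The slide does not name DINA or give any formulas, so the model choice, the function names, the Q-matrix, and the guessing and slip values are all illustrative assumptions. The sketch assumes known item parameters and a uniform prior over attribute profiles; in practice these would be estimated from data.

```python
from itertools import product
from math import prod

# Hypothetical example: 3 items measuring 2 attributes.
# Q_MATRIX[j][k] = 1 means item j requires attribute k.
Q_MATRIX = [[1, 0], [0, 1], [1, 1]]
GUESS = [0.2, 0.2, 0.2]  # assumed P(correct | at least one required attribute missing)
SLIP = [0.1, 0.1, 0.1]   # assumed P(incorrect | all required attributes mastered)


def dina_correct_prob(alpha, q_matrix, guess, slip):
    """Return P(correct) for each item, given a 0/1 attribute profile alpha."""
    probs = []
    for q_row, g, s in zip(q_matrix, guess, slip):
        # DINA is conjunctive: every required attribute must be mastered.
        mastered_all = all(a >= q for a, q in zip(alpha, q_row))
        probs.append(1.0 - s if mastered_all else g)
    return probs


def classify(responses, q_matrix, guess, slip):
    """Return the posterior over all attribute profiles (uniform prior assumed)."""
    n_attributes = len(q_matrix[0])
    profiles = list(product([0, 1], repeat=n_attributes))
    likelihoods = []
    for alpha in profiles:
        p = dina_correct_prob(alpha, q_matrix, guess, slip)
        likelihoods.append(
            prod(pj if x == 1 else 1.0 - pj for pj, x in zip(p, responses))
        )
    total = sum(likelihoods)
    return {alpha: lik / total for alpha, lik in zip(profiles, likelihoods)}


if __name__ == "__main__":
    posterior = classify([1, 0, 0], Q_MATRIX, GUESS, SLIP)
    for profile, prob in posterior.items():
        print(profile, round(prob, 3))
```

The output is a probability for each mastery profile rather than a single continuous score, which is the distinction the two lists draw. Consistent with the earlier point that model parameters have no inherent meaning, the profile labels are only interpretable through the cognitive model that defines the attributes.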

20 Psychometric Tools for Diagnostic Assessments

New frontiers of educational measurement
1. Educational data mining for simulation- / games-based assessment (e.g., Rupp et al., 2010; Soller & Stevens, 2007; West et al., 2009)
2. Diagnostic multiple-choice items / selected-response items (e.g., Briggs et al., 2006; de la Torre, 2009)
3. Computerized diagnostic adaptive assessment (e.g., Cheng, 2009; McGlohen & Chang, 2008)

Useful ideas from large-scale assessment
1. Modeling dependencies in nested response data (e.g., Jiao, von Davier, & Wang, 2010; Wainer, Bradlow, & Wang, 2007)
2. Item families / task variants & automatic test / form assembly (e.g., Embretson & Daniel, 2008; Geerlings, Glas, & van der Linden, in press)
3. Survey designs using multiple test forms / booklets (e.g., Frey, Hartig, & Rupp, 2009; Rutkowski, Gonzalez, Joncas, & von Davier, 2010)

21 Opportunities and Challenges for Developing and Evaluating Diagnostic Assessments in STEM Education: A Modern Psychometric Perspective. André A. Rupp, EDMS Department, University of Maryland, 1230-A Benjamin Building, College Park, MD. Phone: (301) 405 –

22 References (Part I)

Almond, R. G., Williamson, D. M., Mislevy, R. J., & Yan, D. (in press). Bayes nets in educational assessment. New York: Springer.
Beaton, A. E., & Allen, N. L. (1992). Interpreting scales through scale anchoring. Journal of Educational Statistics, 17.
Borsboom, D., & Mellenbergh, G. J. (2007). Test validity in cognitive assessment. In J. P. Leighton & M. J. Gierl (Eds.), Cognitive diagnostic assessment for education: Theory and applications (pp. 85–118). Cambridge, UK: Cambridge University Press.
Briggs, D. C., Alonzo, A. C., Schwab, C., & Wilson, M. (2006). Diagnostic assessment with ordered multiple-choice items. Educational Assessment, 11.
Cheng, Y. (2009). When cognitive diagnosis meets computerized adaptive testing: CD-CAT. Psychometrika, 74.
de Boeck, P., & Wilson, M. (2004). Explanatory item response theory models: A generalized linear and nonlinear approach. New York: Springer.
de la Torre, J. (2009). A cognitive diagnosis model for cognitively based multiple-choice options. Applied Psychological Measurement, 33.
Embretson, S. E., & Daniel, R. C. (2008). Understanding and quantifying cognitive complexity level in mathematical problem-solving items. Psychology Science Quarterly, 50.
Frey, A., Hartig, J., & Rupp, A. A. (2009). An NCME instructional module on booklet designs in large-scale assessments of student achievement. Educational Measurement: Issues and Practice, 28(3).
Geerlings, H., Glas, C. A. W., & van der Linden, W. (in press). Modeling rule-based item generation. Psychometrika.

23 References (Part II)

Gomez, P. G., Noah, A., Schedl, M., Wright, C., & Yolkut, A. (2007). Proficiency descriptors based on a scale-anchoring study of the new TOEFL iBT reading test. Language Testing, 24.
Haberman, S., & Sinharay, S. (2010). Reporting of subscores using multidimensional item response theory. Psychometrika, 75.
Haberman, S., Sinharay, S., & Puhan, G. (2009). Reporting subscores for institutions. British Journal of Mathematical and Statistical Psychology, 62.
Jiao, H., von Davier, M., & Wang, S. (2010, April). Polytomous mixture Rasch testlet model. Presented at the annual meeting of the National Council on Measurement in Education, Denver, CO.
Kane, M. T. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 17–64). Portsmouth, NH: Greenwood.
Kline, R. (2010). Principles and practice of structural equation modeling (2nd ed.). New York: Guilford Press.
Leighton, J., & Gierl, M. (2007). Cognitive diagnostic assessment for education: Theory and applications. Cambridge, UK: Cambridge University Press.
McGlohen, M., & Chang, H.-H. (2008). Combining computer adaptive testing technology with cognitively diagnostic assessment. Behavior Research Methods, 40.
Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons’ responses and performances as scientific inquiry into score meaning. American Psychologist, 50, 741–749.
Mislevy, R. J., Steinberg, L. S., Almond, R. G., & Lukas, J. F. (2006). Concepts, terminology, and basic models of evidence-centered design. In D. M. Williamson, I. I. Bejar, & R. J. Mislevy (Eds.), Automated scoring of complex tasks in computer-based testing (pp. 15–48). Mahwah, NJ: Erlbaum.

24 References (Part III)

Nugent, R., Dean, N., & Ayers, B. (2010, July). Skill set profile clustering: The empty K-means algorithm with automatic specification of starting cluster centers. Presented at the International Educational Data Mining Conference, Pittsburgh, PA.
Reckase, M. (2009). Multidimensional item response theory. New York: Springer.
Rupp, A. A., Templin, J., & Henson, R. A. (2010). Diagnostic measurement: Theory, methods, and applications. New York: Guilford Press.
Rupp, A. A., Gushta, M., Mislevy, R. J., & Shaffer, D. W. (2010). Evidence-centered design of epistemic games: Measurement principles for complex learning environments. Journal of Technology, Learning, & Assessment, 8(4). Available online at
Rutkowski, L., Gonzalez, E., Joncas, M., & von Davier, M. (2010). International large-scale assessment data: Issues in secondary analysis and reporting. Educational Researcher, 39.
Tatsuoka, K. K. (2009). Cognitive assessment: An introduction to the rule-space method. Florence, KY: Routledge.
Stevens, R., Beal, C., & Sprang, M. (2009, August). Developing versatile automated assessments of scientific problem-solving. Presented at the NSF conference on games- and simulation-based assessment, Washington, DC.
Templin, J., & Henson, R. (2009, April). Practical issues in using diagnostic estimates: Measuring the reliability and validity of diagnostic estimates. Presented at the annual meeting of the National Council on Measurement in Education, San Diego, CA.
Wainer, H., Bradlow, E. T., & Wang, X. (2007). Testlet response theory and its applications. New York: Cambridge University Press.
West, P., Rutstein, D. W., Mislevy, R. J., Liu, J., Levy, R., DiCerbo, K. E., et al. (2009, June). A Bayes net approach to modeling learning progressions and task performances. Paper presented at the Learning Progressions in Science conference, Iowa City, IA.

