
1 Do your results really say what you think they say? Issues of reliability and validity in evaluation measuring instruments
Krista S. Schumacher, PhD student & Program Evaluator, Oklahoma State University | JCCI Resource Development Services
AEA Meeting, October 17, 2013, Assessment in Higher Education TIG

2 Key Issue
"Unfortunately, many readers and researchers fail to realize that no matter how profound the theoretical formulations, how sophisticated the design, and how elegant the analytic techniques, they cannot compensate for poor measures" (Pedhazur & Pedhazur Schmelkin, 1991).

3 The Problem
A review of 52 educational evaluation studies published from 1971 to 1999 found that none adequately addressed measurement (Brandon & Singh, 2009).
Research on the practice of evaluation is lacking.
The literature on validity in evaluation studies is not the same as measurement validity (Chen, 2010; Mark, 2011).

4 The Problem (cont.)
Federal emphasis on "scientifically based research": experimental design, quasi-experimental design, regression discontinuity design, etc.
Where is measurement validity? How can programs be compared? How can we justify requests for continued funding?

5 Program Evaluation Standards: Accuracy
Standard A2, Valid Information: "Evaluation information should serve the intended purposes and support valid interpretation" (p. 171).
Standard A3, Reliable Information: "Evaluation procedures should yield sufficiently dependable and consistent information for the intended users" (p. 179).
(Yarbrough, Shulha, Hopson, & Caruthers, 2011)

6 Measurement Validity & Reliability Defined
Validity: the instrument measures the intended construct, supporting valid inferences from scores.
Reliability: the instrument measures a construct consistently (consistent scores across administrations), but perhaps not the intended construct.
Reliability ≠ Validity.
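A small simulated illustration may help here: a scale can yield highly consistent scores across administrations while barely relating to the construct it is supposed to measure. This is a hypothetical sketch with made-up variable names and random data, not an analysis from the presentation.

```python
# Hypothetical sketch: consistent scores (high reliability) that miss the intended
# construct (low validity). All data are simulated for illustration only.
import numpy as np

rng = np.random.default_rng(1)
n = 300
intended_construct = rng.normal(size=n)   # what we want to measure
unrelated_trait = rng.normal(size=n)      # what the flawed scale actually taps

# The flawed scale mostly reflects the unrelated trait, with small random error.
time1 = unrelated_trait + rng.normal(scale=0.2, size=n)
time2 = unrelated_trait + rng.normal(scale=0.2, size=n)   # second administration

reliability = np.corrcoef(time1, time2)[0, 1]              # test-retest consistency: high
validity = np.corrcoef(time1, intended_construct)[0, 1]    # relation to intended construct: near zero
print(f"test-retest r = {reliability:.2f}, validity r = {validity:.2f}")
```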

7 Validity Types (basic for evaluation)
Face validity: on its face, the instrument seems to measure the intended construct. Assessment: subject matter expert (SME) ratings.
Content validity: items are representative of the domain of interest. Assessment: SME ratings. Provides no information about the validity of inferences from scores.
Construct validity: instrument content reflects the intended construct. Assessment: exploratory factor analysis (EFA), principal components analysis (PCA).

8 Understanding Construct Validity
Pumpkin pie example: the construct is the pie, the factors are the crust and filling, and the variables (items) are the individual ingredients (Nassif & Khalil, 2006).

9 Validity Types (more advanced)
Criterion validity: establishes a relationship or discrimination. Assessment: correlation of scores with another test or with an outcome variable.
Types of criterion validity evidence:
Concurrent validity: positive correlation with scores from another instrument measuring the same construct.
Discriminant validity: negative correlation with scores from another instrument measuring the opposite construct; comparing scores from different groups.
Predictive validity: positive correlation of scores with the criterion variable the test is intended to predict (e.g., SAT scores and undergraduate GPA).
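Criterion-related evidence of these kinds is usually gathered by correlating scores. The sketch below only shows the mechanics of a concurrent and a predictive correlation; the score vectors are hypothetical placeholders, not data from any instrument discussed here, and it assumes scipy is available.

```python
# Minimal sketch of criterion-related validity evidence via correlation.
# All score vectors below are made up for illustration.
import numpy as np
from scipy.stats import pearsonr

new_instrument = np.array([12, 18, 25, 31, 22, 28, 15, 34])   # scores on the new scale
established = np.array([14, 20, 27, 33, 21, 30, 13, 36])      # established measure of the same construct
later_criterion = np.array([2.1, 2.8, 3.2, 3.6, 2.9, 3.4, 2.3, 3.8])  # later outcome (e.g., GPA)

r_concurrent, p_concurrent = pearsonr(new_instrument, established)      # concurrent evidence
r_predictive, p_predictive = pearsonr(new_instrument, later_criterion)  # predictive evidence
print(f"concurrent r = {r_concurrent:.2f} (p = {p_concurrent:.3f})")
print(f"predictive r = {r_predictive:.2f} (p = {p_predictive:.3f})")
```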

10 Reliability (basic for evaluation)
Reliability is a measure of error (or of results due to chance).
Internal consistency reliability (one type of reliability): Cronbach's coefficient alpha is the most common estimate.
Alpha behaves like a correlation coefficient: +1 = high reliability, no error; 0 = no reliability, high error. Values ≥ .70 are desired (Nunnally, 1978).
Alpha is not a measure of dimensionality: if an instrument has multiple scales (or factors), compute alpha for each scale.
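As a computational aside, coefficient alpha can be estimated directly from a respondents-by-items matrix using the standard formula. The sketch below uses made-up Likert-type data (the `responses` array is hypothetical and not tied to any instrument in this presentation).

```python
# Minimal sketch of Cronbach's coefficient alpha for a single scale.
import numpy as np

def cronbach_alpha(responses: np.ndarray) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = responses.shape[1]                          # number of items in the scale
    item_vars = responses.var(axis=0, ddof=1)       # variance of each item
    total_var = responses.sum(axis=1).var(ddof=1)   # variance of summed scale scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical Likert-type responses: 5 respondents, 4 items (one scale)
responses = np.array([
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 5, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
])
print(round(cronbach_alpha(responses), 2))   # compare against the >= .70 rule of thumb
```

For multi-scale instruments, the same function would be applied separately to each scale's columns, matching the point above that alpha is not a measure of dimensionality.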

11 Psychometrically Tested Instrument in Evaluation: Example
Middle Schoolers Out to Save the World (Tyler-Wood, Knezek, & Christensen, 2010): a $1.6 million NSF Innovative Technology Experiences for Students and Teachers (ITEST) project using STEM attitudes and career interest surveys.
Process: adapted existing psychometrically tested instruments; discussed instrument development; included validity and reliability evidence; published the instruments in the article.

12 Middle Schoolers Out to Save the World: Validity & Reliability
Content validity: subject matter experts (teachers; advisory board members).
Construct validity: principal components analysis.
Criterion-related validity: concurrent evidence from correlating scores with other instruments tested for validity and reliability; discriminant evidence from comparing scores among varying groups (e.g., 6th graders vs. ITEST PIs).

13 Middle Schoolers Out to Save the World: Construct Validity
Principal components analysis of the Career Interest Survey items yielded three components: (1) supportive environment, (2) science education interest, and (3) perceived importance of a science career.
Component loadings by item:
Item 1: .781    Item 2: .849    Item 3: .759    Item 4: .900
Item 5: .851    Item 6: .921    Item 7: .852    Item 8: .736
Item 9: .844    Item 10: .670   Item 11: .888   Item 12: .886
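For readers curious how loadings like these are produced, the sketch below runs a plain principal components analysis on a hypothetical respondents-by-items matrix and prints item-by-component loadings. The data are random placeholders, so the numbers will not resemble the published table, and published analyses typically also apply a rotation, which is omitted here.

```python
# Minimal sketch of PCA component loadings from an item correlation matrix.
# The data matrix X is simulated; a real analysis would use actual survey responses.
import numpy as np

def pca_loadings(X: np.ndarray, n_components: int) -> np.ndarray:
    """Return item-by-component loadings (eigenvectors scaled by sqrt of eigenvalues)."""
    R = np.corrcoef(X, rowvar=False)                  # item correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)              # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]  # largest components first
    return eigvecs[:, order] * np.sqrt(eigvals[order])

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))            # 200 respondents, 12 survey items (made up)
loadings = pca_loadings(X, n_components=3)
print(np.round(loadings, 3))              # rows = items, columns = components
```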

14 Middle Schoolers Out to Save the World: Reliability
Internal consistency reliabilities for Career Interest scales (Cronbach's alpha):
Perception of supportive environment for pursuing a career in science: 4 items, alpha = .86
Interest in pursuing educational opportunities that would lead to a career in science: 5 items, alpha = .94
Perceived importance of a career in science: 3 items, alpha = .78
All items: 12 items, alpha = .94

15 Evaluations Lacking Instrument Validity & Reliability
Six evaluations reviewed, representing approximately $9 million in federal funding.
NSF programs: STEM Talent Expansion Program (STEP), Innovative Technology Experiences for Students and Teachers (ITEST), Research in Disabilities Education.
All used evaluator-developed instruments.

16 Purpose of Sample Evaluation Instruments
Instruments intended to measure: attitudes toward science, technology, engineering & math (STEM); anxiety related to STEM education; interest in STEM careers; confidence regarding success in a STEM major; program satisfaction.

17 Measurement Fatal Flaws in Sample Evaluations
The evaluations failed to discuss the process of instrument development. How were items developed? Were they reviewed by anyone other than the evaluators?
They failed to report reliability or validity information; evaluations that used existing instruments did not report results of psychometric testing.
One used different instruments for pre- and post-tests. How can claims of increases or decreases be made when different items are used?

18 Reported Findings of Sample Evaluations
IEP students less likely than non-IEP peers to be interested in STEM fields (Lam et al., 2008).
Freshman seminar increased perceived readiness for the following semester (Raines, 2012).
Residential program increased STEM attitudes and career interests (Lenaburg et al., 2012).
Participants satisfied with the program (Russomanno et al., 2010).
Increased perceived self-competence regarding information technology (IT) (Hayden et al., 2011).
Improved perceptions of IT professionals among high school faculty (Forssen et al., 2011).

19 Implications for Evaluation
Funding and other program decisions: findings based on valid and reliable data provide strong justifications.
Use existing (tested) instruments when possible:
Assessment Tools in Informal Science, http://www.pearweb.org/atis/dashboard/index
Buros Center for Testing (Mental Measurements Yearbook), http://buros.org/
For newly created instruments: discuss the process of instrument creation and report evidence of validity and reliability.

20 Conclusion
No more missing pieces: measurement deserves a place of priority.
Continually ask: Are the data trustworthy? Are my conclusions justifiable? How do we know these results really say what we think they say?

21 References
Brandon, P. R., & Singh, J. M. (2009). The strength of the methodological warrants for the findings of research on program evaluation use. American Journal of Evaluation, 30(2), 123-157.
Chen, H. T. (2010). The bottom-up approach to integrative validity: A new perspective for program evaluation. Evaluation and Program Planning, 33, 205-214.
Forssen, A., Lauriski-Karriker, T., Harriger, A., & Moskal, B. (2011). Surprising possibilities imagined and realized through information technology: Encouraging high school girls' interests in information technology. Journal of STEM Education: Innovations & Research, 12(5/6), 46-57.
Hayden, K., Ouyang, Y., Scinski, L., Olszewski, B., & Bielefeldt, T. (2011). Increasing student interest and attitudes in STEM: Professional development and activities to engage and inspire learners. Contemporary Issues in Technology and Teacher Education, 11(1), 47-69.
Lam, P., Doverspike, D., Zhao, J., Zhe, J., & Menzemer, C. (2008). An evaluation of a STEM program for middle school students on learning disability related IEPs. Journal of STEM Education: Innovations & Research, 9(1/2), 21-29.
Lenaburg, L., Aguirre, O., Goodchild, F., & Kuhn, J.-U. (2012). Expanding pathways: A summer bridge program for community college STEM students. Community College Journal of Research and Practice, 36(3), 153-168.
Mark, M. M. (2011). New (and old) directions for validity concerning generalizability. New Directions for Evaluation, 2011(130), 31-42.
Nassif, N., & Khalil, Y. (2006). Making a pie as a metaphor for teaching scale validity and reliability. American Journal of Evaluation, 27(3), 393-398.
Nunnally, J. (1978). Psychometric theory. New York, NY: McGraw-Hill.
Pedhazur, E. J., & Pedhazur Schmelkin, L. (1991). Measurement, design, and analysis: An integrated approach. New York, NY: Psychology Press.
Raines, J. M. (2012). FirstSTEP: A preliminary review of the effects of a summer bridge program on pre-college STEM majors. Journal of STEM Education: Innovations and Research, 13(1).
Russomanno, D., Best, R., Ivey, S., Haddock, J. R., Franceschetti, D., & Hairston, R. J. (2010). MemphiSTEP: A STEM Talent Expansion Program at the University of Memphis. Journal of STEM Education: Innovations and Research, 11(1/2), 69-81.
Tyler-Wood, T., Knezek, G., & Christensen, R. (2010). Instruments for assessing interest in STEM content and careers. Journal of Technology and Teacher Education, 18(2), 341-363.
Yarbrough, D. B., Shulha, L. M., Hopson, R. K., & Caruthers, F. A. (Eds.). (2011). The program evaluation standards: A guide for evaluators and evaluation users (3rd ed.). Thousand Oaks, CA: Sage.

22 Contact Information
JCCI Resource Development Services, http://www.jccionline.com
BECO Building West, 5410 Edson Lane, Suite 210B, Rockville, MD 20852
Jennifer Kerns, President: 301-468-1851 | jkerns@jccionline.com
Krista S. Schumacher, Associate: 918-284-7276 | krista.schumacher@okstate.edu

