Cognitive and Academic Assessment


Dr. K. A. Korb, University of Jos

Outline
- Classroom Assessment
- Achievement Testing
- Intelligence Testing
- General Issues in Psychological Testing
  - Reliability
  - Validity

Classroom Assessment
Purpose of classroom assessment:
- Provide information to students, parents, teachers, and others
- Provide information to adapt instructional practices
- Increase learning
- Enhance motivation

Grades should measure achievement in the specific class. Grades should NOT measure:
- Effort
- Ability
- Interests
- Attitude
- Degree of corruption

Classroom Assessment
A grading system should be:
- Clear and understandable
- Designed to support learning and provide frequent feedback
- Based on hard data
- Fair to all students
- Defensible to students, parents, and administrators

Classroom Assessment
Components of a grading system:
- Formative evaluation: Evaluation before or during instruction to provide feedback to the teacher and student
- Summative evaluation: Evaluation after instruction for grading purposes

Tips for Writing Exams
- The purpose of an exam is to assess how well the student understands the course content.
- Critically read questions from the students' perspective to determine whether each question is understandable.
- Rotate exam questions between terms so students cannot get access to the questions before the exam:
  - Students have to study more of the material and think more deeply about the content
  - Cheating becomes more difficult
- Ask questions that test Bloom's higher levels of thinking: assess understanding, not memorization.

Multiple Choice Items
Strengths:
- Able to assess different levels of thinking
- Reliable scoring
- Easy to grade
Weaknesses:
- Unable to give credit for partial knowledge
- Decontextualized

Multiple Choice Items
Example: Martha talks to the person sitting next to her while the teacher is giving instruction. To make sure that Martha does not talk in class next time, the teacher makes Martha write an essay on why paying attention to the teacher is important. This is an example of:
a. Positive Reinforcement
b. Negative Reinforcement
c. Positive Punishment
d. Negative Punishment

Multiple Choice Items
Components:
- Stem: The question or problem; determines the level of knowledge assessed
- Correct answer
- Distracters: Wrong answers that contain likely misconceptions

Multiple Choice Items
Guidelines for preparing multiple choice items:
- Present one clear problem in the stem
- Make all distracters plausible
- Avoid similar wording in the stem and correct choice
- Keep the correct answer and distracters similar in length
- Avoid absolute terms (always, never) in incorrect choices
- Keep the stem and distracters grammatically consistent
- Avoid using two distracters with the same meaning
- Emphasize negative wording (e.g., NOT) when it must be used
- Use None of the above with care and avoid All of the above

Essay Items
Strengths:
- Assess creative and critical thinking
- Students are more likely to meaningfully organize information when studying
Weaknesses:
- Scoring takes time
- Scoring can be unreliable

Essay Items
Rubric: A scoring scale that describes the criteria for grading. Establish criteria for credit based on the critical elements of the essay.

Example item: According to the Theory of Planned Behavior, what are the four major factors that influence the relationship between attitudes and behavior? (2 points apiece: 1 point for the name and 1 point for the explanation)
- Behavioral intentions: Cognitive representation of the readiness to perform a behavior
- Specific attitude toward the behavior: Whether one likes or dislikes the behavior
- Subjective norms: Beliefs about how significant others view the behavior
- Perceived behavioral control: Perception of one's ability to perform the behavior

Essay Items
Scoring essay items:
- Require students to answer each item
- Prepare the rubric in advance
- Write a model answer for each item and compare a few students' responses to the model to determine whether score adjustments are needed
- Score all students' answers to one essay question before moving to the next question
- Score all responses to a single item in one sitting
- Score answers without knowing the identity of the student

Intelligence Testing
- Achievement test: An instrument created to assess developed skills or knowledge in a specific domain. The purpose of standardized achievement testing is to place students in the appropriate educational environment.
- Intelligence test: Assesses the ability to perform cognitive tasks by sampling performance on a variety of cognitive tasks and then comparing performance to others at a similar developmental level.
- Purposes of intelligence testing:
  - Diagnose students with special needs (learning disabilities; talented and gifted)
  - Place students in appropriate educational environments
  - Educational research

Intelligence Testing
Intelligence tests typically report subtest scores and a general score. Example: Stanford-Binet Intelligence Scale, Fourth Edition
- Verbal Reasoning
- Abstract/Visual Reasoning
- Quantitative Reasoning
- Short-Term Memory

Reliability: Consistency of results
[Figure: target diagrams labeled Reliable, Reliable, and Unreliable, illustrating consistency of results]

Reliability Theory
- Actual score on test = True score + Error
- True score: Hypothetical actual score on the test
- The reliability coefficient is the ratio of the true-score variance on the test to the total variance
- In other words, as the error in testing decreases, the reliability increases
- What are the sources of error?
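In standard classical test theory notation (a restatement of the slide above, not from the slides themselves), the observed score X is the sum of the true score T and error E, and the reliability coefficient is the ratio of true-score variance to total variance:

```latex
X = T + E, \qquad
r_{XX'} = \frac{\sigma_T^2}{\sigma_X^2}
        = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}
```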

Reliability: Sources of Error
- Error in test construction
  - Item sampling error: Results from items that measure more than one construct in the same test
- Error in test administration
  - Test environment: Room temperature, amount of light, noise
  - Test-taker variables: Illness, amount of sleep, test anxiety, exam malpractice
  - Examiner-related variables: Absence of the examiner, examiner's demeanor
- Error in test scoring
  - Scorer: With subjectively marked assessments, different scorers may give different scores to the same responses

Reliability: Error due to Test Construction
Split-half reliability: Determines how consistently the measure assesses the construct of interest.
- A low split-half reliability indicates poor test construction: the instrument is probably measuring more constructs than it was designed to measure.
- Calculate split-half reliability with coefficient alpha (a computational sketch follows).
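A minimal sketch of computing coefficient alpha from an item-by-person score matrix; the data and variable names are hypothetical:

```python
import numpy as np

# Rows = test takers, columns = items; hypothetical scores for illustration.
scores = np.array([
    [4, 5, 3, 4],
    [2, 3, 2, 3],
    [5, 4, 4, 5],
    [3, 3, 3, 2],
    [4, 4, 5, 4],
])

k = scores.shape[1]                           # number of items
item_variances = scores.var(axis=0, ddof=1)   # sample variance of each item
total_variance = scores.sum(axis=1).var(ddof=1)  # variance of total scores

# Cronbach's coefficient alpha: k/(k-1) * (1 - sum of item variances / total variance)
alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
print(f"Coefficient alpha: {alpha:.2f}")
```

Values near 1 suggest the items hang together as a single construct; low values suggest the test is measuring more constructs than intended.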

Reliability: Error due to Test Administration
Test-retest reliability: Determines how much error in a test score is due to problems with test administration.
- Administer the same test to the same participants on two different occasions.
- Correlate the scores from the two administrations using Pearson's product-moment correlation.
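A sketch of the test-retest computation using scipy.stats.pearsonr; the scores are hypothetical. The same correlation step applies to the parallel-forms and inter-rater procedures on the next two slides:

```python
from scipy.stats import pearsonr

# Hypothetical scores for the same ten students on two administrations of one test.
first_administration = [55, 62, 70, 48, 81, 66, 59, 73, 90, 64]
second_administration = [58, 60, 72, 50, 79, 68, 55, 75, 88, 67]

r, p_value = pearsonr(first_administration, second_administration)
print(f"Test-retest reliability: r = {r:.2f}")
```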

Reliability: Error due to Test Construction with Two Forms of the Same Measure
Parallel-forms reliability: Determines the similarity of two different versions of the same measure.
- Administer the two tests to the same participants within a short period of time.
- Correlate the scores on the two tests using Pearson's product-moment correlation.

Reliability: Error due to Test Scoring
Inter-rater reliability: Determines how closely two different raters mark the assessment.
- Give the exact same test results from one test administration to two different raters.
- Correlate the markings from the two raters using Pearson's product-moment correlation.

Validity: Measuring what is supposed to be measured
[Figure: target diagrams labeled Invalid, illustrating measures that miss the intended construct]

Validity
Three types of validity:
- Construct validity: Measures the appropriate psychological construct
- Criterion validity: Predicts appropriate outcomes (examples of criterion measures: DL test, ACT/SAT)
- Content validity: Adequately samples the content domain
Each type of validity should be established for all psychological tests.

Construct Validity
Construct validity: The appropriateness of inferences drawn from test scores regarding an individual's standing on the psychological construct of interest.
Two considerations:
- Construct underrepresentation
- Construct-irrelevant variance

Construct Validity
- Construct underrepresentation: The test does not measure all of the important aspects of the construct (this is the concern addressed by content validity).
- Construct-irrelevant variance: Test scores are affected by other, unrelated processes.

Sources of Construct Validity Evidence
- Homogeneity: The test measures a single construct. Evidence: high internal consistency.
- Convergence: The test is related to other measures of the same construct and of related constructs. Evidence: criterion validity.
- Theory: The test behaves according to theoretical propositions about the construct.
  - Evidence from changes in test scores with age: Scores on the measure change with age as predicted by theory.
  - Evidence from treatments: Scores on the measure change from pretest to posttest as predicted by theory.

Criterion Validity
Criterion validity: The correlation between the measure and a criterion.
- Criterion: Other accepted measures of the construct, or measures of other constructs similar in nature. A criterion can be any standard with which the test should be related.
- Examples: Behavior, other test scores, ratings, psychiatric diagnosis

Criterion Validity
Three types:
- Convergent validity: High correlations with measures of similar constructs taken at the same time
- Divergent validity: Low correlations with measures of different constructs taken at the same time
- Predictive validity: High correlation with a criterion measured in the future
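All three checks reduce to correlating the new measure with an appropriately chosen companion measure. A hedged sketch, with all variable names and data hypothetical:

```python
from scipy.stats import pearsonr

# Hypothetical scores for the same students on a new science reasoning test
# and on three criterion measures.
new_test = [72, 65, 88, 54, 91, 60, 77, 83]
similar_test = [70, 68, 85, 50, 94, 58, 75, 80]  # convergent criterion
writing_test = [60, 72, 55, 66, 58, 70, 62, 64]  # divergent criterion
future_marks = [68, 61, 90, 50, 88, 55, 74, 85]  # predictive criterion

for label, criterion in [("Convergent (similar construct, same time)", similar_test),
                         ("Divergent (different construct, same time)", writing_test),
                         ("Predictive (future criterion)", future_marks)]:
    r, _ = pearsonr(new_test, criterion)
    print(f"{label}: r = {r:.2f}")
```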

Criterion Validity
Example: An essay test of science reasoning was developed to admit students into the science program at a university.
- Convergent validity: High correlations with other science tests, particularly well-established science tests.
- Divergent validity: Low correlations with measures of writing ability, because the test should measure only science reasoning, not writing ability.
- Predictive validity: High correlations with future grades in science courses, because the purpose of the test is to determine who will do well in the science program.

Criterion Validity Example
Criterion validity evidence for the new science reasoning test: correlations between the new test and other measures.

  Measure                                      Correlation with new test
  WAEC Science Scores                          .83
  School Science Marks                         .75
  WAEC Writing Scores                          .34
  WAEC Reading Scores                          .24
  Future marks in university science courses   .65

- High correlations with other measures of science ability indicate good (convergent) criterion validity.
- Low correlations with measures unrelated to science ability indicate good (divergent) criterion validity.
- A high correlation with a future measure of science ability indicates good (predictive) criterion validity.

Content Validity
Content validity: The degree to which a test samples the entire domain of the construct it was designed to measure.


Content Validity
To assess content validity (a numerical sketch follows this list):
- Gather a panel of judges
- Give the judges a table of specifications of the amount of content covered in the domain
- Give the judges the measure
- The judges draw a conclusion as to whether the proportion of content covered on the test matches the proportion of content in the domain
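The judges' conclusion can be supported numerically by comparing the proportion of test items per topic against the table of specifications. A minimal sketch with hypothetical topics and item counts:

```python
# Hypothetical table of specifications: intended share of each topic in the domain.
specification = {"reliability": 0.40, "validity": 0.40, "test writing": 0.20}

# Topic tag assigned by the judges to each item on a hypothetical 10-item test.
item_topics = ["reliability"] * 5 + ["validity"] * 3 + ["test writing"] * 2

n_items = len(item_topics)
for topic, intended in specification.items():
    actual = item_topics.count(topic) / n_items
    print(f"{topic}: intended {intended:.0%}, actual {actual:.0%}")
```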

Face Validity
Face validity: Addresses whether the test appears to measure what it purports to measure.
- To assess: Ask test users and test takers to evaluate whether the test appears to measure the construct of interest.
- Face validity is rarely of interest to test developers and test users; the only instance where it matters is to instill confidence in test takers that the test is worthwhile.
- Face validity CANNOT be used to determine the actual interpretive validity of a test.

Concluding Advice
The best way to ensure that the measures you use are both reliable and valid is to use a measure that another researcher has already developed and validated. This will assist you in three ways:
- You can confidently report that you have accurately measured the variables in the study.
- By using a measure that has been used before, your study is closely tied to previous research in your field, an important consideration in determining the importance of your study.
- It saves you the time and energy of developing your own measure.

Revision
- What are the purposes of classroom assessment, achievement testing, and intelligence testing?
- Compare and contrast the strengths and weaknesses of multiple choice and essay tests.
- Describe the three sources of error that contribute to lowering the reliability of an instrument. How can the reliability coefficient be calculated for each source of error?
- What are the three types of validity evidence required for psychological measurement? How can each type be assessed?