Chapter 4 Characteristics of a Good Test

The following are the characteristics of a good test:

Validity
Reliability
Objectivity
Power of Discrimination
Administrability
Economy of Reusability
Relevance or Practicability
Interpretability

Validity is the degree to which a test measures what it intends to measure. It is often expressed numerically as a coefficient of correlation with another test of the same kind and of known validity. A good test item is valid when it does what it is expected to do.

Types of Validity Content Validity, also known as face validity or logical validity, refers to the relevance of the test items to the subject matter or situation from which they are taken, and of the individual's test responses to the behavior area under consideration.

Types of Validity Concurrent Validity refers to the degree to which the test agrees or correlates with the criterion that is set up as an acceptable measure. It also refers to the correspondence of the scores of a group in a test with the scores of the same group in a similar test of already known validity.
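
A minimal sketch of how such a validity coefficient is ordinarily obtained: it is the Pearson correlation between scores on the new test and scores on the criterion measure. The score values below are invented for illustration, and the sketch uses the correlation function from Python's standard library (available in Python 3.10 and later).

    # Hypothetical scores for eight students on the new test and on an
    # established criterion test of known validity (values are invented).
    from statistics import correlation  # Pearson's r; Python 3.10+

    new_test  = [35, 42, 28, 47, 31, 39, 25, 44]
    criterion = [38, 45, 30, 49, 29, 41, 27, 46]

    # The concurrent validity coefficient is the correlation between the
    # two sets of scores; values near +1.0 indicate strong agreement.
    validity_coefficient = correlation(new_test, criterion)
    print(f"Concurrent validity coefficient: {validity_coefficient:.2f}")

The same correlation underlies the test-retest and equivalent-forms reliability estimates discussed later in the chapter; only the pair of score lists being correlated changes.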

Types of Validity Predictive Validity refers to the degree of accuracy of a test predicting the level of performance in a certain activity.

Types of Validity Construct Validity refers to the agreement of test results with the theoretical trait or characteristic that the test aims to portray.

Factors that Influence Test Validity

1. Appropriateness of Test Items Thinking skills cannot be measured by items that measure only knowledge of facts. A test that is valid for measuring knowledge of facts is not valid for measuring skills in problem solving.

2. Directions Unclear directions tend to reduce validity. 3. Construction of Test Items When items unintentionally provide clues, the test becomes a test of detecting clues.

4. Arrangement of Items Test items should be arranged in increasing order of difficulty. 5. Difficulty of Items When the test items are too difficult or too easy, they cannot discriminate between the bright and the slow pupils, and validity is lowered. When the test items do not match the difficulty level specified in the instructional objectives, their validity is likewise reduced.

6. Reading Vocabulary and Sentence Structures When the reading vocabulary and sentence structures are very difficult, the test becomes a test of reading or intelligence rather than of what it is intended to measure. 7. Length of the Test The test should be neither too long nor too short for the examinees to answer; the number of test items should be fairly adequate.

8. Pattern of Answers In a true-false test, a student can answer items correctly even without really knowing the answers if a predictable pattern of true and false responses is established. Example: a repeating sequence such as True, False, False, False, or items 1-5 true and items 6-10 false.

Reliability is the degree of consistency between two measures of the same thing. In statistics, reliability is the consistency of a set of measurements or of a measuring instrument.

In the experimental sciences, reliability is the extent to which the measurements of a test remain consistent over repeated administrations to the same subject under identical conditions.

As a theoretical definition, reliability is the proportion of score variance that is caused by systematic variation in the population of test takers.
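
In classical test theory this definition is commonly written as the ratio of true-score (systematic) variance to observed-score variance. The notation below is standard usage rather than something taken from the slides:

    r_{xx} = \frac{\sigma_T^2}{\sigma_X^2} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}

where \sigma_T^2 is the systematic (true-score) variance, \sigma_E^2 the error variance, and \sigma_X^2 their sum, the observed-score variance.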

Procedures to Determine the Reliability of an Achievement Test

1. Administer two forms of the test. The two forms measure the same objectives with items of parallel difficulty. If the students who get high scores on Form A also get high scores on Form B, then the test is generally reliable.

2. If you do not have two forms of the test, you can repeat the administration of the same test under approximately the same conditions, but with a sufficiently long interval between testings so that the pupils are presumed to have forgotten the details.

3. You can determine the reliability coefficient of the test by one of two methods: the split-half method (odd and even scores) or the analysis-of-variance method.

Estimates of Reliability A. Estimates of Stability Often called the test-retest estimate of reliability, this is obtained by administering a test to a group of individuals, re-administering the same test to the same individuals at a later date, and correlating the two sets of scores.

Estimates of Reliability B. Measures of Equivalence Obtained by giving two forms of the test, having the same degree of difficulty and ease, to the same group of individuals on the same day and correlating the two sets of results.

Estimates of Reliability C. Split-Half Method This is determined by establishing the relationship between the scores on two equivalent halves of a test administered to a group at one time, then correlating the two sets of scores.
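
A minimal sketch of the split-half procedure, assuming an invented matrix of right/wrong (1/0) responses: score the odd-numbered and even-numbered items separately, correlate the two half-test scores, and then (a step the slide does not name) apply the Spearman-Brown formula to estimate the reliability of the full-length test.

    # Hypothetical 0/1 item responses for 6 students on a 10-item test
    # (data invented for illustration).
    from statistics import correlation  # Pearson's r; Python 3.10+

    responses = [
        [1, 1, 0, 1, 1, 0, 1, 1, 1, 0],
        [1, 0, 0, 1, 0, 0, 1, 0, 1, 0],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [0, 0, 0, 1, 0, 0, 0, 1, 0, 0],
        [1, 1, 0, 1, 1, 1, 0, 1, 1, 1],
        [0, 1, 0, 0, 1, 0, 1, 0, 0, 0],
    ]

    # Score each student on the odd-numbered items (indices 0, 2, 4, ...)
    # and on the even-numbered items (indices 1, 3, 5, ...).
    odd_scores  = [sum(row[0::2]) for row in responses]
    even_scores = [sum(row[1::2]) for row in responses]

    # Correlate the two half-test scores ...
    r_half = correlation(odd_scores, even_scores)

    # ... then step the half-test correlation up to full test length with
    # the Spearman-Brown formula.
    r_full = (2 * r_half) / (1 + r_half)
    print(f"Half-test r = {r_half:.2f}, split-half reliability = {r_full:.2f}")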

Estimates of Reliability D. Rational Equivalence Reliability This is obtained through the use of the Kuder-Richardson method. The method estimates internal consistency by determining how all items on a test relate to all other items and to the total test.
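
A corresponding sketch of the Kuder-Richardson Formula 20 (KR-20) computation on the same kind of 0/1 response matrix. The data are again invented, and the sketch uses the population variance of the total scores.

    # KR-20 = k/(k-1) * (1 - sum(p*q) / variance of total scores),
    # where p is the proportion answering an item correctly and q = 1 - p.
    from statistics import pvariance

    responses = [
        [1, 1, 0, 1, 1, 0, 1, 1, 1, 0],
        [1, 0, 0, 1, 0, 0, 1, 0, 1, 0],
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [0, 0, 0, 1, 0, 0, 0, 1, 0, 0],
        [1, 1, 0, 1, 1, 1, 0, 1, 1, 1],
        [0, 1, 0, 0, 1, 0, 1, 0, 0, 0],
    ]

    k = len(responses[0])                      # number of items
    n = len(responses)                         # number of examinees
    totals = [sum(row) for row in responses]   # each student's total score

    pq_sum = 0.0
    for item in range(k):
        p = sum(row[item] for row in responses) / n
        pq_sum += p * (1 - p)

    kr20 = (k / (k - 1)) * (1 - pq_sum / pvariance(totals))
    print(f"KR-20 reliability estimate: {kr20:.2f}")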

Factors Affecting Test Reliability 1. Adequacy This refers to the appropriate length of the test and the proper sampling of the test content. It also refers to the extent or degree to which the test samples previously determined outcomes in terms of knowledge, skills and attitudes in the learning area being measured.

Factors Affecting Test Reliability 2. Objectivity This refers to the degree to which equally competent scorers obtain the same results. A test is objective if it yields the same score no matter who checks it, or even when it is checked at different times. It must be noted that these criteria of a good test, validity, reliability, and objectivity, are closely related and interdependent.

Factors Affecting Test Reliability 3. Testing Condition This has something to do with the condition of the testing room. Students become restless if the room is too hot, sleepy if it is too cold, and lose enthusiasm if the room is not well ventilated. Testing must also be done in a familiar and clean place.

Factors Affecting Test Reliability 4. Test Administration Procedure Directions should be clearly stated, understood, and strictly followed. Before the test is administered, the teacher should see to it that students know what to do and what not to do.

Item Analysis This process gives information concerning each of the following points: the discriminating power of each item and the effectiveness of each item.

Benefits of Item Analysis It gives useful information for class discussion of the test. It gives data for helping the students improve their learning methods. It gives insights and skills which lead to the construction of better test items for future use.

Steps in Item Analysis 1. Arrange the papers from the highest score to the lowest. 2. Take the top 27% and the bottom 27% of the papers to form the upper and lower groups. 3. Count the number of students in the upper group and in the lower group who answered each item correctly. 4. Estimate the index of difficulty of each item (the percentage of students who got the item right).

Steps in Item Analysis 5. Estimate the discriminating index of each item (the difference between the number of pupils in the upper and lower groups who got the item right). 6. Evaluate the effectiveness of the distracters in each item (the attractiveness of the incorrect alternatives).

Index of Difficulty Formula

Index of Diff. = (nH + nL) / N

Where:
nH = number of students in the high group answering the item correctly
nL = number of students in the low group answering the item correctly
N = total number of students in both the high and low groups

Difficulty Indices

0.00 - 0.20   Very Difficult
0.21 - 0.40   Difficult
0.41 - 0.60   Moderately Difficult
0.61 - 0.80   Easy
0.81 - 1.00   Very Easy
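
A small sketch of the difficulty-index computation and its classification under the table above. The counts are invented for illustration: 14 of 20 upper-group students and 6 of 20 lower-group students answer the item correctly, so N = 40.

    def difficulty_index(n_high: int, n_low: int, n_total: int) -> float:
        """(nH + nL) / N: proportion of the combined groups answering correctly."""
        return (n_high + n_low) / n_total

    def difficulty_label(p: float) -> str:
        # Thresholds follow the Difficulty Indices table above.
        if p <= 0.20: return "Very Difficult"
        if p <= 0.40: return "Difficult"
        if p <= 0.60: return "Moderately Difficult"
        if p <= 0.80: return "Easy"
        return "Very Easy"

    p = difficulty_index(14, 6, 40)
    print(p, difficulty_label(p))   # 0.5 Moderately Difficult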

Index of Discrimination Formula

Index of Disc. = (nH - nL) / (N / 2)

Where:
nH = number of students in the high group answering the item correctly
nL = number of students in the low group answering the item correctly
N = total number of students in both the high and low groups (N/2 is the size of each group, assuming equal groups)

Discrimination Level

0.40 and above   Very Good
0.30 - 0.39      Reasonably good, subject to improvement
0.20 - 0.29      Needs improvement
0.19 and below   Poor / to be discarded
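
A matching sketch for the discrimination index, using the difference formula given above and the same invented counts; it assumes the upper and lower groups are equal in size.

    def discrimination_index(n_high: int, n_low: int, n_total: int) -> float:
        """(nH - nL) / (N/2), assuming equal-sized upper and lower groups."""
        return (n_high - n_low) / (n_total / 2)

    def discrimination_label(d: float) -> str:
        # Thresholds follow the Discrimination Level table above.
        if d >= 0.40: return "Very Good"
        if d >= 0.30: return "Reasonably good, subject to improvement"
        if d >= 0.20: return "Needs improvement"
        return "Poor / to be discarded"

    # Same item as in the difficulty sketch: 14 of 20 upper-group and
    # 6 of 20 lower-group students answered correctly, so N = 40.
    d = discrimination_index(14, 6, 40)
    print(d, discrimination_label(d))   # 0.4 Very Good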

Evaluation of Item Analysis

Discrimination Level | Difficulty Level | Item Category
High | Opt./Easy-Difficult | Good
Easy/Difficult | Fair
Moderate | High/Moderate | Poor/Difficult
Low | At any level | Poor

Discrimination A good test is constructed in such a way that it will detect or measure differences in achievement or attainment, picking out the good students from the poor ones.

Positive Discrimination When significantly more of the good students than the poor students get the item right. Negative Discrimination When significantly more of the poor students than the bright students get the item right.

No Discrimination When the same number of students from the bright and the low groups answer the item correctly.

Administrability The test is easy to give in the sense that no time or effort is wasted in the process, and the conditions surrounding its giving and taking at different times are kept more or less the same. The directions are clear, and the print is legible and free of typographical errors.

Scorability This is met when the test papers are accurately scored or rated in the simplest, quickest and most routine fashion possible.

Comparability The test is enhanced if its results can be compared with those of its previous administration or with norms of previous performance on the test by other groups of examinees on duplicate forms of the test.

Comparability This can be done through the use of answer sheets separate from the test questions. Utility The utilization of a test and its results should be so planned that the test serves the purpose for which it was constructed and administered.

Usability This refers to the degree to which the measuring instrument can be satisfactorily used by teachers, supervisors, and school administrators without undue expenditure in making sound educational decisions.

5 Factors that Determine Usability

1. Ease of administration
2. Ease of scoring
3. Ease of interpretation and application
4. Low cost
5. Proper mechanical make-up