
How to Make a Test & Judge its Quality

Aim of the Talk
- Acquaint teachers with the characteristics of a good, objective test
- Look at item analysis techniques that help to:
  - improve the quality of a test by identifying items that are candidates for retention, revision or removal
  - clarify which concepts the examinees have and have not mastered

Types of Tests
- Criterion-referenced tests
- Norm-referenced tests
- Standardized tests
- Ipsative tests

Focus of this Talk
- The following guidelines apply most appropriately to tests designed to identify differences in achievement levels between students (norm-referenced tests)
- Some of the criteria either do not apply, or apply in somewhat different ways, to tests designed to measure mastery of content (criterion-referenced tests)

Important Factors in Judging a Test's Quality
1. Course Objectives
2. Fairness to Students
3. Conditions of Administration
4. Measure of Achievements
5. Time Limits
6. Difficulty Index
7. Discrimination Index
8. Levels of Ability
9. Test Reliability
10. Accuracy of Scores
- Judging these factors depends on the knowledge and judgment of the teacher
- It can be aided by various statistical analysis techniques

1. Course Objectives
Does the test reflect the course objectives?
Good practices:
- Make a test plan that specifies:
  - the content to be covered
  - the relative emphasis to be given to each topic
- Teachers should exchange examinations for review and constructive criticism
- Teachers need not accept and apply every suggestion made by their colleagues, as good teachers usually have their own style and special abilities

2. Fairness to Students
- A test is fair if it emphasizes the knowledge, understanding and abilities that were emphasized in the actual teaching of the course
- A question is not "out of course" if the relevant concepts were covered in class
- Probably no test has ever been regarded as perfectly fair by everyone who took it
- Nevertheless, student feedback after the test is very important, e.g., reports of ambiguity or confusion in questions, figures or tables

3. Conditions of Test Administration
- No confusion or disturbance during the test
- Prevent cheating and the use of unfair means
- Provide satisfactory conditions of light, heat and comfort
- Again, student feedback can be helpful here

4. Measure of Achievements
- Students should be judged on their knowledge, understanding, abilities and interests rather than on what they remember or what they read in preparation for the test
- Knowledge of terms and isolated facts or trivia is a weak measure of achievement
- For example, questions such as "Explain the Ethernet frame format" or "Define and explain the Two-Army Problem" do not measure important achievements
- The majority of questions should deal with applications, understanding and generalizations of the learned concepts

5. Time Limits
- Tests should be work-limit tests rather than time-limit tests
- Students' scores should depend on how much they can do, not on how fast they can do it
- Speed may be important in repetitive, clerical-type operations, but it is not important in critical or creative thinking or decision making
- Time limits should be generous enough for at least 90% of the students to attempt and complete all questions in the test
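As a quick check of the 90% guideline, a teacher could record, for each student, whether they attempted and completed every question, then compare the completion rate with the target. This is a minimal sketch; the function names and the True/False record format are hypothetical choices for illustration.

```python
def completion_rate(completed_all: list[bool]) -> float:
    """Fraction of students who attempted and completed every question."""
    if not completed_all:
        raise ValueError("need at least one student")
    return sum(completed_all) / len(completed_all)


def time_limit_is_generous(completed_all: list[bool], target: float = 0.90) -> bool:
    """True if at least `target` of the class finished the whole test."""
    return completion_rate(completed_all) >= target


# Hypothetical class of 30: 28 finished, 2 ran out of time.
finished = [True] * 28 + [False] * 2
print(f"{completion_rate(finished):.2f}")  # 0.93
print(time_limit_is_generous(finished))    # True
```

If the check fails, the guideline suggests lengthening the time limit or shortening the test, rather than rewarding speed.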

6. Item Difficulty Index (p)
- p is the proportion of students who answered the item correctly
- If almost all students answer an item correctly (or almost all answer it incorrectly), the item is not very efficient, because it does little to separate students
- For ideal MCQs, difficulty indices are about 0.50 to 0.70
- For the test as a whole, the mean score should be about midway between the expected chance score and the maximum possible score
- The p value varies with each class group that takes the test
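The sketch below computes p for one item from 0/1 scores, and the target mean score for a whole test. It assumes one mark per item and takes the expected chance score to be the number of items divided by the number of options (blind guessing). The function names and sample data are hypothetical.

```python
def difficulty_index(item: list[int]) -> float:
    """p = proportion of students who answered the item correctly.

    item[s] is 1 if student s answered correctly, else 0.
    """
    return sum(item) / len(item)


def target_mean_score(n_items: int, n_options: int) -> float:
    """Score midway between the expected chance score and the maximum score.

    Assumes one mark per item and chance score = n_items / n_options.
    """
    chance_score = n_items / n_options
    max_score = n_items
    return (chance_score + max_score) / 2


item_answers = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1]  # hypothetical: 6 of 10 correct
print(difficulty_index(item_answers))          # 0.6
print(target_mean_score(40, 4))                # 25.0
```

For the 40-item, 4-option example, a target mean of 25 corresponds to a whole-test difficulty of 25/40 = 0.625.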

7. Item Discrimination Index (D)
- D measures an item's ability to discriminate between good and poor students
- Students in the top 27% by total test score are taken as the good students, and those in the bottom 27% as the poor students
- D is the proportion of the top group answering the item correctly minus the proportion of the bottom group answering it correctly
- The discrimination index is a basic measure of the validity of an item
- Validity: whether a student answers an item correctly should depend on their level of knowledge or ability, not on something else such as chance or test bias
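A minimal sketch of the upper/lower 27% method follows. It ranks students by total score, takes the top and bottom 27% as the good and poor groups, and returns the difference in the proportion of each group answering the item correctly. How the group size is rounded and how tied scores are split are assumptions; published procedures differ on both. Names and data are hypothetical.

```python
def discrimination_index(totals: list[float], item: list[int], fraction: float = 0.27) -> float:
    """Upper/lower-group discrimination index for one item.

    totals[s] is student s's total test score; item[s] is 1 if student s
    answered this item correctly, else 0. D = p_upper - p_lower.
    """
    n = len(totals)
    # Group size: 27% of the class, rounded, at least one student (assumption).
    k = max(1, round(n * fraction))
    # Rank from highest to lowest total; ties keep their input order (assumption).
    ranked = sorted(range(n), key=lambda s: totals[s], reverse=True)
    upper, lower = ranked[:k], ranked[-k:]
    p_upper = sum(item[s] for s in upper) / k
    p_lower = sum(item[s] for s in lower) / k
    return p_upper - p_lower


# Hypothetical class of 10 students (k = round(2.7) = 3 per group).
class_totals = [38, 35, 33, 30, 28, 25, 22, 20, 18, 15]
item_answers = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0]
print(round(discrimination_index(class_totals, item_answers), 2))  # 0.67
```

With only 10 students each group holds just three people, so this is an illustration of the arithmetic rather than a dependable estimate.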

7. Item Discrimination Index (D), continued
How to interpret D:
- D can take negative values and ranges between -1.00 and 1.00
- D = 1.00 indicates a perfect positive discriminator
- Most psychometricians regard items with D values of 0.30 and above as good discriminators worthy of retention for future exams
- The D value is specific to a group of examinees: an item with satisfactory discrimination for one group may be unsatisfactory for another
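The rules of thumb above can be turned into a rough labeling helper. The 0.30 cut-off comes from the slide; the wording of the labels, and treating a negative D as a signal to review the item, are illustrative assumptions rather than a fixed policy.

```python
def interpret_d(d: float, keep_threshold: float = 0.30) -> str:
    """Rough verbal label for an item's discrimination index D.

    The 0.30 threshold follows the rule of thumb quoted on the slide;
    the label wording is an illustrative assumption.
    """
    if not -1.0 <= d <= 1.0:
        raise ValueError("D must lie between -1.00 and 1.00")
    if d < 0:
        # Poor students answered correctly more often than good students.
        return "negative: review the item (possible key error or ambiguity)"
    if d >= keep_threshold:
        return "good discriminator: worth retaining"
    return "weak discriminator: candidate for revision"


for d in (1.00, 0.35, 0.10, -0.20):
    print(f"{d:+.2f}  {interpret_d(d)}")
```

Because D depends on the group of examinees, such labels are best re-checked each time the item is used with a new class.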

8. Levels of Ability
- To distinguish clearly between students at different levels of ability, a test must yield scores of wide variability
- Other things being equal, the larger the standard deviation (σ) of the scores, the better the test
- A σ equal to one-sixth of the range between the highest possible score and the expected chance score is generally considered an acceptable standard
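To check the one-sixth standard, compute σ for the class and compare it with (maximum possible score minus expected chance score) / 6. This sketch uses the population standard deviation (`statistics.pstdev`) of the observed scores; using the sample standard deviation instead is an equally defensible assumption. The function name and scores are hypothetical.

```python
import statistics


def sd_meets_standard(scores: list[float], max_score: float, chance_score: float) -> tuple[float, float, bool]:
    """Return (sigma, target, sigma >= target), where target is one-sixth of
    the range between the highest possible score and the expected chance score."""
    sigma = statistics.pstdev(scores)  # population SD of this class (assumption)
    target = (max_score - chance_score) / 6
    return sigma, target, sigma >= target


# Hypothetical 40-item, 4-option test: maximum 40, expected chance score 10.
sigma, target, ok = sd_meets_standard([15, 20, 25, 30, 35], 40, 10)
print(f"sigma = {sigma:.2f}, target = {target:.2f}, acceptable = {ok}")
```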

9. Test Reliability
- The reliability coefficient is the estimated correlation between scores on the test and scores on another equivalent test, composed of different items but designed to measure the same kind of achievement
- The highest possible value is 1.00
- This level is difficult to achieve consistently with homogeneous class groups and with items that have not previously been administered, analyzed and revised
- A reasonable goal for teachers is a reliability estimate of 0.80
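The slide does not say how the reliability estimate is computed. One common internal-consistency estimate for right/wrong items is Kuder-Richardson Formula 20 (KR-20), sketched below as an assumed method, not necessarily the one used by any particular examination office. The function name and response data are hypothetical.

```python
import statistics


def kr20(responses: list[list[int]]) -> float:
    """Kuder-Richardson Formula 20 for dichotomously scored items.

    responses[s][i] is 1 if student s answered item i correctly, else 0.
    KR-20 = k/(k-1) * (1 - sum(p_i * q_i) / variance of total scores),
    with q_i = 1 - p_i and population variances throughout.
    """
    k = len(responses[0])  # number of items
    if k < 2:
        raise ValueError("KR-20 needs at least two items")
    totals = [sum(row) for row in responses]
    var_total = statistics.pvariance(totals)
    if var_total == 0:
        raise ValueError("total scores have zero variance")
    pq_sum = 0.0
    for i in range(k):
        p = sum(row[i] for row in responses) / len(responses)  # item difficulty
        pq_sum += p * (1 - p)
    return (k / (k - 1)) * (1 - pq_sum / var_total)


responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(responses), 2))  # 0.8
```

The five-student matrix is deliberately idealized (each student answers all the easier items correctly), which happens to give KR-20 = 0.80; real classes need far more students and items for a stable estimate.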

10. Accuracy of Scores
- The accuracy of the scores is reflected by the standard error of measurement (SEM), computed from the standard deviation (σ) and the reliability coefficient (r) as SEM = σ√(1 - r)
- If the SEM is 2 score points, for example, about two-thirds of the reported scores are within 2 points of each student's true score; about one-sixth of the students received scores more than 2 points higher than they should have, and the remaining one-sixth received scores more than 2 points too low
- The SEM indicates how much chance error remains in the scores from even a good test
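The sketch below computes the SEM as σ√(1 - r) and the band of one SEM either side of an observed score. Reading that band as covering the true score about two-thirds of the time assumes roughly normal measurement error. The function names are hypothetical, and the figures (σ = 5, r = 0.84) are chosen only so that the SEM comes out at the 2 points used in the example above.

```python
import math


def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def score_band(observed: float, sem_value: float) -> tuple[float, float]:
    """Band of one SEM either side of an observed score; under a normal-error
    assumption, roughly two-thirds of such bands contain the true score."""
    return observed - sem_value, observed + sem_value


e = sem(5.0, 0.84)
print(round(e, 2))                  # 2.0
print(score_band(30, round(e, 2)))  # (28.0, 32.0)
```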

Conclusions
- Item analysis by itself does not improve a test; its main purpose is to serve as a guide to the teacher
- Teachers can conduct the analysis themselves, but usually the last five factors (6 to 10) are, and should be, handled by an Evaluation and Examination Department
- The analysis techniques work reliably on classes of 30 or more students


