CLEAR 2008 Annual Conference Anchorage, Alaska Fundamental Testing Assumptions Revisited: Examination Length and Number of Options Karine Georges & Kelly.

1 CLEAR 2008 Annual Conference, Anchorage, Alaska
Fundamental Testing Assumptions Revisited: Examination Length and Number of Options
Karine Georges & Kelly Piasentin, Assessment Strategies Inc.

2 Overview
Credentialing organizations seek to balance many factors, such as program validity and credibility, with more tangible aspects such as costs and ease of development. Two such aspects are investigated:
- A method to reduce the total number of test questions while retaining validity and reliability.
- The effects of reducing the typical number of options from four (4) to three (3).

3 Part I. Examination Length: A Case Study
Karine Georges, MSc.

4 Case Study: Certification Program
Tasked in 2007 to determine whether the 180-item, 4-hour examinations could be shortened in light of a potential move to computer-based testing (CBT).

5 Validity and Examination Length
- Content validity: the number of items on an examination must be sufficient to ensure adequately representative coverage.
- Face validity: if the examination is shortened, the perceptions of stakeholders need to be considered vis-a-vis comparable professions.

6 Examination Length and Reliability
What is an acceptable reliability index for credentialing?
"A reliability correlation coefficient should fall in the high .80s or above for longer examinations (e.g., 150 or more items)" (NOCA, 2004).
What is the range of reliability indices for the current 180-item certification examinations?
Average: .84; Min: .78; Max: .92

7 Examination Length and Practical Considerations
If reliability is related to examination length, why shorten the examination?
Costs and efficiency:
- Each item costs between $300 and $1,000 to develop (Vale, 2006).
- Additional items are needed for safeguard purposes and for ancillary materials such as prep guides or readiness tests.
- The client's intention to move to CBT makes shorter examinations an advantage: seat time can be reduced and more candidates accommodated within the testing period.

8 Research Approaches
Two approaches:
- Classical Test Theory (CTT) approach: examining the reliability coefficient using the Spearman-Brown formula.
- Item Response Theory (IRT) approach: examining the item information function using empirical data.

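9 CTT Results for the Two Certification Programs
Spearman-Brown formula:
$$\rho'_{xx} = \frac{N \rho_{xx}}{1 + (N - 1)\rho_{xx}}$$
where $\rho_{xx}$ is the reliability of the current examination, $N$ is the ratio of the new length to the current length, and $\rho'_{xx}$ is the projected reliability.
Number of items   100%   90%   75%   50%
Program A          .91   .90   .88   .84
Program B          .89   .88   .86   .80
Results show that the examinations can be shortened by 20-30 questions (or about 10%) with reliability remaining above .80.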
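As a rough check on the projection in this slide, the Spearman-Brown calculation can be sketched in Python. The function name is illustrative, and the inputs are the rounded full-length reliabilities from the table (.91 and .89), so this is an approximation of the authors' calculation and a projected value can differ from the table in the last digit.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Project reliability when test length is multiplied by length_factor."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Full-length reliabilities as reported on the slide (rounded values).
programs = {"A": 0.91, "B": 0.89}

for name, rho in programs.items():
    # Length factors for 100%, 90%, 75% and 50% of the original length.
    projected = [spearman_brown(rho, n) for n in (1.0, 0.90, 0.75, 0.50)]
    print(name, " ".join(f"{value:.2f}" for value in projected))
```

A length factor below 1 shortens the examination; a factor above 1 would project the gain from lengthening it.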

10 Limitations of CTT Results
General limitations of the Spearman-Brown formula:
- Assumes that the examinations are exactly parallel
- Gives only one value for the whole range of abilities
- Is largely affected by the cohort

11 IRT Approach: Item Information Curve
Research has shown that higher-stakes examinations with pass/fail decisions, such as certification examinations, can be shortened without impacting the ability to classify candidates (Schulz & Wang, 2001).
What would be the impact if the certification examinations had 10% fewer items? How about 25% or 50%?

12 IRT - Item Information Curve
IRT models specify the probability of a discrete outcome, such as a correct response to an item, in terms of person and item parameters.
Person parameter: ability of a candidate (theta)
Item parameters:
- a: discrimination (slope)
- b: difficulty (location)
- c: guessing
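One common model with exactly these three item parameters is the three-parameter logistic (3PL) model. The slide does not say which model or scaling the authors used, so the sketch below assumes the 3PL form without the 1.7 scaling constant, and the item parameter values are hypothetical.

```python
import math


def p_correct(theta: float, a: float, b: float, c: float) -> float:
    """3PL probability that a candidate with ability theta answers correctly."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical item: moderate discrimination, average difficulty,
# and a guessing parameter matching a 4-option item.
a, b, c = 1.2, 0.0, 0.25

for theta in (-2.0, 0.0, 2.0):
    print(f"theta={theta:+.1f}  P(correct)={p_correct(theta, a, b, c):.3f}")
```

In this form the probability approaches c for very low ability and 1 for very high ability, and at theta equal to b it sits halfway between c and 1.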

13 IRT - Test Information Curve
- The item information curves sum to the test information curve.
- The amount of information depends on the length of the examination and the quality of the items.
- The pass/fail decision must be made where error is minimal (ideally where the passmark is located) and where levels of ability can be clearly differentiated.

14 IRT Results for Program A

15 IRT Results for Program B

16 IRT - Results and Implications
The examinations can be reduced by at least 10% without significantly impacting the pass/fail decision.
Other factors to take into consideration:
- Number of candidates
- Robustness of the item bank

17 Other Considerations
What about face validity? How would an examination with 90 items be viewed by other professionals compared to a comparable examination of 180 items?

18 Other Certification Programs
Review of over 75 certification programs within the same profession.
Average number of items: 164, or between 150 and 175 items (including experimental items)
Minimum: 100
Maximum: 250

19 Summary
Data suggest that the number of items can be reduced by 10% with minimal impact on validity and reliability.

20 Part II. How Many Options is Optimal in Multiple Choice Testing?
Kelly Piasentin, PhD

21 Multiple Choice Testing
- Most common format used in licensure and certification examinations
- Consists of a stem (i.e., the question being asked) and a series of options to choose from (usually 4)
Example:
Stem: In which state is the 2008 CLEAR conference being held?
Options:
1. Arkansas
2. Alaska
3. Arizona
4. Alabama

22 Advantages of Multiple Choice
- Versatility
- Efficiency
- Scoring accuracy and economy
- Reliability
- Diagnosis
- Control of difficulty
- Amenable to item analysis

23 Disadvantages of Multiple Choice
- Time consuming to write
- Difficult to create effective distracters (i.e., options that are plausible, but incorrect)

24 Time Spent Writing MCQs
Sample of 75 item writers for 3 different licensing/certification examinations
Average time spent writing an MCQ: 52 minutes
Percentage of time spent writing:
Stem                    26%
Correct response        12%
1st distracter          11%
2nd distracter          13%
3rd distracter          17%
Rationales/references   21%

25 Effort Spent Writing Distracters
Of the 75 item writers:
- 25% reported that it was difficult to write the 1st distracter
- 40% reported that it was difficult to write the 2nd distracter
- 75% reported that it was difficult to write the 3rd distracter

26 How many options should an MCQ have?
- 4-option MCQs are widely used in standardized testing.
- But are 4 options ideal?
- Some item-writing guidelines say, "develop as many options as feasible" (Haladyna & Downing, 1989).
- More recently, "develop as many functional distractors as are feasible" (Haladyna, Downing, & Rodriguez, 2002).
- There is increasing emphasis on the quality of distractors as opposed to the quantity.

27 Definition of a Functional Distracter
"A functional distracter is one that has (a) a significant negative point-biserial correlation with the total test score, (b) a negative sloping item characteristic curve, and (c) a frequency of response greater than 5% for the total group." (Haladyna & Downing, 1988)
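A simplified screen based on this definition can be sketched in Python. It covers only criteria (a) and (c): it checks the sign of the point-biserial correlation but does not test its statistical significance, and it does not examine the item characteristic curve from criterion (b). The function names and the response data are hypothetical.

```python
from statistics import mean, pstdev


def point_biserial(indicator, totals):
    """Pearson correlation between a 0/1 indicator and total test scores."""
    mi, mt = mean(indicator), mean(totals)
    si, st = pstdev(indicator), pstdev(totals)
    if si == 0 or st == 0:
        return 0.0  # correlation is undefined when either variable is constant
    cov = mean([(x - mi) * (y - mt) for x, y in zip(indicator, totals)])
    return cov / (si * st)


def screen_distracter(choices, totals, option, min_freq=0.05):
    """Check response frequency and the sign of the point-biserial for one option."""
    chose = [1 if c == option else 0 for c in choices]
    freq = mean(chose)
    r = point_biserial(chose, totals)
    return {"frequency": freq, "point_biserial": r,
            "passes_screen": freq > min_freq and r < 0}


# Hypothetical data for one item answered by 20 candidates; the key is "B".
choices = ["B"] * 12 + ["A"] * 4 + ["C"] * 2 + ["D"] + ["B"]
totals = [18, 17, 17, 16, 16, 15, 15, 14, 14, 13,
          13, 12, 9, 8, 10, 7, 11, 19, 6, 12]

for option in ("A", "C", "D"):
    print(option, screen_distracter(choices, totals, option))
```

In this made-up data set, option A attracts low scorers, option C attracts one high scorer, and option D is chosen by exactly 5% of candidates, so only A passes the screen.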

28 How does the number of options impact guessing?
With 4 options, candidates have a 25% chance of getting any one question correct by simply guessing.
- The probability is reduced to 20% if there are 5 options.
- The probability is increased to 33% if there are 3 options.
BUT... if a typical examination has 25 items, each with 3 options, the chance of getting at least 70% on the examination by pure blind guessing is 1 in 25,000.
So, do you get more bang for your buck by having more options?
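The chance of passing by blind guessing follows a binomial distribution, which can be sketched in Python. The function name is illustrative, and reading "at least 70%" of 25 items as 18 or more correct answers is an assumption.

```python
from math import comb


def p_guess_at_least(n_items: int, n_options: int, min_correct: int) -> float:
    """Probability of at least min_correct right answers by pure guessing."""
    p = 1.0 / n_options
    return sum(
        comb(n_items, k) * p ** k * (1 - p) ** (n_items - k)
        for k in range(min_correct, n_items + 1)
    )


# 25 items; "at least 70%" is taken to mean 18 or more correct (assumption).
for n_options in (3, 4, 5):
    prob = p_guess_at_least(25, n_options, 18)
    print(f"{n_options} options: about 1 in {round(1 / prob):,}")
```

Under these assumptions the exact 3-option probability works out to roughly 1 in 11,000, the same order of magnitude as the figure on the slide, which may rest on a different cut score or on an approximation. Either way, the chance of passing by guessing alone is very small even with 3 options.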

29 Are 4-option MCQs optimal?
Factors to consider:
- Time and cost it takes to develop distracters
- Time it takes for candidates to complete the examination
- Psychometric properties of the examination: item difficulty, item discrimination, test reliability (coefficient alpha)
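The three psychometric properties named above can be computed from a scored (0/1) response matrix, as in the following Python sketch. The slides do not define the discrimination index, so the item-total point-biserial correlation is assumed here; the function names and the tiny data set are hypothetical.

```python
from statistics import mean, pstdev, pvariance


def item_difficulty(scores, j):
    """Proportion of candidates answering item j correctly."""
    return mean([row[j] for row in scores])


def item_discrimination(scores, j):
    """Item-total point-biserial correlation (assumed discrimination index)."""
    item = [row[j] for row in scores]
    totals = [sum(row) for row in scores]
    mi, mt = mean(item), mean(totals)
    cov = mean([(x - mi) * (y - mt) for x, y in zip(item, totals)])
    return cov / (pstdev(item) * pstdev(totals))


def coefficient_alpha(scores):
    """Coefficient alpha from item variances and total-score variance."""
    k = len(scores[0])
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical scored responses: 6 candidates (rows) by 4 items (columns).
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

for j in range(4):
    print(f"item {j + 1}: difficulty={item_difficulty(scores, j):.3f} "
          f"discrimination={item_discrimination(scores, j):.3f}")
print(f"coefficient alpha={coefficient_alpha(scores):.3f}")
```

With dichotomous items, coefficient alpha is equivalent to KR-20. Note that an item answered identically by every candidate has zero variance, and the discrimination function above would fail on it.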

30 Arguments in favour of 3 options:
- Less time is needed to develop two plausible distracters.
- More 3-option items can be administered without increasing testing time. Inclusion of additional high quality items per unit of time should improve test score reliability.
- Having fewer options decreases the likelihood of exposing additional aspects of the domain to candidates (e.g., context clues to other questions).

31 Data from a Licensing/Certification Examination
Number of MCQs: 235
Number of candidates: 5,393
Mean item difficulty: .721
Mean discrimination index: .166
Test reliability: .88
Most chosen distracter: .167
2nd most chosen distracter: .077
Least chosen distracter: .035

32 Reducing Examination Items to 3 Options
What would be the effect on item difficulty, discrimination and reliability of reducing the items on the examination to 3 options if the responses to the least chosen distracter were:
- Attributed to the correct answer?
- Attributed to the 2nd least chosen distracter?
- Randomly distributed among the other 3 choices?
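The three recoding rules can be illustrated for a single item with a short Python sketch. This is not the authors' procedure; the function name, the rule labels, and the response data are hypothetical, and ties between distracters are broken by option order.

```python
import random


def collapse_least_chosen(choices, options, key, rule, rng=None):
    """Reassign responses to the least chosen distracter according to rule."""
    distracters = [o for o in options if o != key]
    ranked = sorted(distracters, key=choices.count)  # least chosen first
    least, second_least = ranked[0], ranked[1]
    remaining = [o for o in options if o != least]  # the other 3 choices
    rng = rng or random.Random(0)
    targets = {
        "correct": lambda: key,
        "second": lambda: second_least,
        "random": lambda: rng.choice(remaining),
    }
    pick = targets[rule]
    return [pick() if c == least else c for c in choices]


def difficulty(choices, key):
    """Proportion of candidates whose response matches the key."""
    return sum(c == key for c in choices) / len(choices)


# Hypothetical responses of 20 candidates to one 4-option item; the key is "B".
options = ["A", "B", "C", "D"]
choices = ["B"] * 12 + ["A"] * 4 + ["C"] * 2 + ["D"] + ["B"]

print("original", difficulty(choices, "B"))
for rule in ("correct", "second", "random"):
    recoded = collapse_least_chosen(choices, options, "B", rule)
    print(rule, difficulty(recoded, "B"))
```

Applying the recoding to every item and then recomputing difficulty, discrimination and coefficient alpha on the recoded data would give one scenario per rule.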

33 Reducing Examination Items to 3 Options
If the least chosen distracter is attributed to the correct answer:
Item difficulty: .752
Mean discrimination index: .136
Coefficient alpha: .834

34 Reducing Examination Items to 3 Options
If the least chosen distracter is attributed to the 2nd least chosen distracter:
Item difficulty: .720
Mean discrimination index: .168
Reliability: .881

35 Reducing Examination Items to 3 Options
If the least chosen distracter is distributed randomly among the other 3 choices:
Item difficulty: .731
Mean discrimination index: .158
Reliability: .868

36 Summary
Condition         Difficulty   Discrimination   Reliability
4 options            .721          .166            .880
LCD → Correct        .752          .136            .834
LCD → 2nd LCD        .720          .168            .881
LCD → Random         .731          .158            .868
(LCD = least chosen distracter)

37 4 Options vs. 3 Options
Moving from 4 options to 3 options did not have a significant impact on average item difficulty, discrimination or test reliability.

38 Summary
Two primary benefits of using 3 options (as opposed to 4 options):
- Faster item writing
- Better testing
Also: better quality items, cost savings, shorter test time, and more questions in the same amount of time (potential for increased reliability).

39 Conclusion
These two presentations demonstrate that some efficiencies can be gained from reducing test length and the number of response options without compromising test validity. Further research is needed to confirm these findings.

40 Contact Information
Assessment Strategies
1400 Blair Place, Suite 210, Ottawa, ON K1J 9B8, Canada
Telephone: 613-237-0241
Website: www.asinc.ca
Karine Georges, MSc: kgeorges@asinc.ca
Kelly Piasentin, PhD: kpiasentin@asinc.ca

