Methods of Standard Setting, Natalia Gaponova

1 Methods of Standard Setting, Natalia Gaponova

2 Introduction
All standard setting methods involve expert judgemental decision making at some level... (Jaeger, 1979)
There is no such thing as a true standard, but there is a theoretical cut-score that would be set by a judge if he or she totally understood the process, the test, the content, and the policy and had a true score on the test in mind as the standard. The question is whether the standard setting method can recover the theoretical cut-score assuming a judge performed every task consistently and without error (Reckase, 2000).
Many different terms are used in the measurement literature to refer to performance standards: “passing scores”, “cut scores”, “cutoff scores”, “performance levels”, “achievement levels”, “mastery levels”, “proficiency levels”, “thresholds” and “standards” (Hambleton, 2001).

3 The importance of standard-setting
The cut-score is crucial for all participants in testing. It must be reasoned and fair, so it is necessary to use methods that make it possible to set it with mathematical precision.

4 Interpretation of the mass-testing results
Participants of testing need to compare themselves with other examinees in order to estimate their level of mastery of the material correctly and adequately.
Policy-makers are interested in the overall level of educational achievement, which should reflect the real situation in the schools and classes of a region.
Common solution: setting cut-scores and dividing examinees into groups according to their ability level.

5 Why is it important to establish reasonable and fair cut-scores?
1. The professional and ethical responsibility of those who conduct testing for the results they provide.
2. The interpretation of the results should be understandable to any audience and should not cause obvious disagreement with it.
3. The interpretation of the results should reflect the real situation and be informative for policy-makers.
4. The interpretation of the results should not have a dual meaning: examinees in one group should really have different levels of ability from examinees in another group.

6 Classification of Standard-Setting Methods: test-centered, examinee-centered, criterion-referenced, norm-referenced.

7 The most commonly used classification scheme today is the one suggested by Jaeger (1989), who splits standard-setting methods into two large groups:
Test-centered: Angoff, Ebel, Nedelsky, Jaeger, Objective Standard Setting, Bookmark, etc.
Examinee-centered: Contrasting Groups method, Borderline Group method, etc.

8 ANGOFF: test-centered method

9 The Angoff method is one of the most preferred, widely and frequently used methods. It exists in two forms: traditional and modified.

10 Procedure of standard setting (traditional Angoff method)
Experts rate the probability that a barely or minimally satisfactory or qualified person would answer each test item correctly.
Each judge's ratings are summed over the items, and the average of these sums across judges or raters is the cutoff score (equivalently, the item ratings averaged across judges and then summed over items).
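The arithmetic of the traditional Angoff procedure can be sketched in Python as follows. This is a minimal illustration rather than part of the original presentation: the function name `angoff_cut_score` and the ratings are hypothetical, and the sketch assumes that every judge rates every item.

```python
def angoff_cut_score(ratings):
    """Traditional Angoff cut score.

    ratings[j][i] is judge j's estimate of the probability that a minimally
    competent examinee answers item i correctly.
    """
    if not ratings:
        raise ValueError("need at least one judge")
    n_items = len(ratings[0])
    if any(len(judge) != n_items for judge in ratings):
        raise ValueError("every judge must rate every item")
    # Expected raw score of the borderline examinee, according to each judge
    judge_cuts = [sum(judge) for judge in ratings]
    # Averaging over judges gives the recommended raw cut score
    return sum(judge_cuts) / len(judge_cuts)


# Hypothetical ratings: 3 judges x 4 items
ratings = [
    [0.6, 0.8, 0.5, 0.9],
    [0.7, 0.7, 0.4, 0.8],
    [0.5, 0.9, 0.6, 0.9],
]
print(round(angoff_cut_score(ratings), 2))  # 2.77
```

Dividing the result by the number of items (here 4) expresses the same cut as a proportion correct, about 0.69.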

11 Advantages and disadvantages
+ Transparency and clarity; simplicity; flexibility.
- Questionable objectivity of decisions about the probability that a minimally competent examinee answers an item correctly; with only one round of rating, the rated probabilities can fluctuate.

12 EBEL: test-centered method

13 Procedure of standard setting (2 rounds)
Experts independently classify test items by:
I. Level of difficulty: easy, medium, hard.
II. Level of relevance: essential, important, acceptable, questionable.

14 For each judge, all items can then be classified into the 12 cells of a 3 × 4 grid defined by the three difficulty and four relevance categories, as in the example:
Columns (for each of Expert №3, Expert №4 and Expert №5): number of items in the category (A); percentage of items answered correctly (B); A × B.
Rows: Essential (easy, medium, hard); …; Questionable (easy, medium, hard); Mean; Mean for all experts: 28; Cut-score: 12 … …

15 How to calculate a cut-score
1. Judges indicate the percentage of items within each of the 12 cells that a student should answer correctly in order to be judged minimally competent.
2. Each item is assigned to one of the 12 cells based on the expert's ratings.
3. The percentage for a cell (expressed as a proportion) is multiplied by the number of items in that cell.
4. These products are summed over all 12 cells to get the overall passing score for a judge.
5. These passing scores are averaged over judges to get the composite passing score.
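The five calculation steps above can be sketched as follows. This is a hypothetical illustration, not the author's procedure in code: the function names, the lower-case category labels and the two judges' ratings are invented, and percentages are converted to proportions so that each judge's result is a raw score.

```python
from collections import Counter

# The four relevance and three difficulty categories of the Ebel grid (4 x 3 = 12 cells).
RELEVANCE = ("essential", "important", "acceptable", "questionable")
DIFFICULTY = ("easy", "medium", "hard")


def ebel_judge_score(item_cells, cell_percent):
    """Passing score for one judge.

    item_cells: one (relevance, difficulty) pair per item, as classified by this judge.
    cell_percent: (relevance, difficulty) -> percentage of items in that cell a
        minimally competent student should answer correctly.
    """
    counts = Counter(item_cells)
    for relevance, difficulty in counts:
        if relevance not in RELEVANCE or difficulty not in DIFFICULTY:
            raise ValueError(f"unknown cell: {(relevance, difficulty)}")
    total = 0.0
    for relevance in RELEVANCE:
        for difficulty in DIFFICULTY:
            n_items = counts[(relevance, difficulty)]
            if n_items:  # empty cells contribute nothing
                # proportion expected correct times the number of items in the cell
                total += n_items * cell_percent[(relevance, difficulty)] / 100.0
    return total


def ebel_cut_score(judges):
    """Composite passing score: mean of the judges' passing scores."""
    scores = [ebel_judge_score(cells, percents) for cells, percents in judges]
    return sum(scores) / len(scores)


# Hypothetical ratings for a 10-item test and two judges.
judge_a = (
    [("essential", "easy")] * 4 + [("important", "medium")] * 3
    + [("acceptable", "hard")] * 2 + [("questionable", "easy")],
    {("essential", "easy"): 90, ("important", "medium"): 70,
     ("acceptable", "hard"): 40, ("questionable", "easy"): 50},
)
judge_b = (
    [("essential", "easy")] * 5 + [("important", "medium")] * 3
    + [("acceptable", "hard")] * 2,
    {("essential", "easy"): 80, ("important", "medium"): 60,
     ("acceptable", "hard"): 50},
)

print(round(ebel_cut_score([judge_a, judge_b]), 2))  # 6.9
```

Here judge A's passing score is 7.0 and judge B's is 6.8, so the composite is their mean, 6.9 items out of 10.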

16 Advantages and disadvantages
+ Can be used with different types of items (not only multiple-choice).
- It may be challenging for standard setting participants to keep the two dimensions of difficulty and relevance distinct, because those dimensions may, in some situations, be highly correlated.
- A validity concern has to do with judgments about item relevance: the inclusion of items judged to be of questionable relevance appears, on its face, to weaken the validity evidence supporting a defensible interpretation of the total test scores.

17 NEDELSKY: test-centered method

18 General concept
Nedelsky proposed considering the characteristics and performance of a hypothetical borderline examinee that he referred to as the “F-D student”. Responses (distractors) which the lowest D-student should be able to reject as incorrect, and which therefore should be attractive to [failing students], are called F-responses… Students who possess just enough knowledge to eliminate F-responses and must choose among the remaining responses at random are called F-D students.

19 Procedure of standard setting
The experts independently determine the F-responses, i.e. the options which minimally competent examinees would be able to eliminate as incorrect.
The number of remaining options determines the probability with which the examinee will answer the question correctly: 1 remaining option = 100%, 2 = 50%, 3 = 33%, 4 = 25%, and 5 = 20% probability of a correct answer.

20 An example
Participants judged that, for a certain five-option item, borderline examinees would be expected to rule out two of the options as incorrect, leaving them to choose from the remaining three options. The Nedelsky rating for this item would be 1/3 = 0.33. Repeating the judgment process for each item would give a number of Nedelsky values equal to the number of items in the test (n). The sum of the n values can be used directly as a raw cut score. For example, a 50-item test consisting entirely of items with Nedelsky ratings of 0.33 would yield a recommended passing score of 16.5 (i.e., 50 × 0.33 = 16.5).
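The Nedelsky rule (rating = 1 / number of options left after the F-responses are eliminated) and the summation into a raw cut score can be sketched as below. The function names are hypothetical, and the sketch covers a single judge, as in the example.

```python
def nedelsky_item_rating(n_options, n_eliminated):
    """Probability of a correct answer for a borderline (F-D) examinee who
    eliminates n_eliminated wrong options and guesses at random among the rest."""
    if not 0 <= n_eliminated < n_options:
        raise ValueError("n_eliminated must be between 0 and n_options - 1")
    return 1 / (n_options - n_eliminated)


def nedelsky_cut_score(items):
    """Raw cut score for one judge: the sum of the item ratings.

    items: one (n_options, n_eliminated) pair per item.
    """
    return sum(nedelsky_item_rating(k, e) for k, e in items)


# The example above: fifty five-option items, two options ruled out on each.
print(round(nedelsky_item_rating(5, 2), 2))         # 0.33
print(round(nedelsky_cut_score([(5, 2)] * 50), 2))  # 16.67
```

Using the unrounded rating 1/3 gives 50/3 ≈ 16.67 rather than the 16.5 obtained from the rounded value 0.33.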

21 Advantages and disadvantages
+ The Nedelsky method has been used to set passing scores for many years. It has probably stayed popular because the procedure is clear to experts: they can quickly decide which responses a minimally competent examinee would be able to eliminate as incorrect. It can be used without first piloting the test.
- It can be used only with multiple-choice items. Raters tend not to assign probabilities of 1.00 (i.e., to judge that a borderline examinee could rule out all incorrect response options). This tends to create a downward bias in item ratings (e.g., a rating of .50 is assigned to an item instead of 1.00), with the overall result being a somewhat lower passing score than the participants may have intended to recommend, and somewhat lower passing scores than other methods produce.

22 BOOKMARK: test-centered method (based on item response theory)

23 Essential materials: directions to Bookmark participants, ordered item booklet, booklet guideline, student exemplar papers, scoring guide.

24 Basic steps of the procedure
Round I: Experts are informed of the number of cut-scores to be established. Experts work in small groups, and all the essential materials are introduced to them.
Round II: The cut-scores established by every expert are reviewed, and the same procedure as in the first round is repeated.
Round III: The percentage of students falling into each performance level, given each median cut-score from Round 2, is presented. After discussion, experts make individual judgments.

25 Round 1
The main goals are to familiarize panelists with the ordered item booklet, to set initial bookmarks, and then to discuss the placements. Panelists are asked to discuss and determine the content that students should master for placement into a given performance level. Their independent judgments of cut-scores are expressed simply by placing a bookmark between the items judged to represent a cut-point. One bookmark is placed for each of the required cut-points. Items preceding a participant's bookmark reflect content that all students at the given performance level are expected to know and be able to perform successfully with a probability of at least 0.67 (or 0.50, depending on the response-probability criterion adopted).
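Under item response theory, the ordered item booklet arranges items by the ability level at which each is answered correctly with the chosen response probability (0.67 or 0.50). The sketch below assumes a Rasch (one-parameter logistic) model, for which that location is b + ln(RP / (1 − RP)); it also assumes, as one common convention among several, that a bookmark's cut-score is the location of the last item before it. Item names and difficulties are hypothetical.

```python
import math


def rp_location(b, rp=0.67):
    """Ability (theta) at which a Rasch item of difficulty b is answered
    correctly with probability rp: solves 1 / (1 + exp(-(theta - b))) = rp."""
    if not 0 < rp < 1:
        raise ValueError("rp must be strictly between 0 and 1")
    return b + math.log(rp / (1 - rp))


def ordered_item_booklet(difficulties, rp=0.67):
    """(item, location) pairs sorted from easiest to hardest at the chosen RP."""
    pairs = [(item, rp_location(b, rp)) for item, b in difficulties.items()]
    return sorted(pairs, key=lambda pair: pair[1])


def bookmark_to_theta_cut(booklet, bookmark):
    """bookmark = number of items that precede the bookmark (1..len(booklet)).
    Assumed convention: the cut is the location of the last item before it."""
    if not 1 <= bookmark <= len(booklet):
        raise ValueError("bookmark must follow at least one item")
    return booklet[bookmark - 1][1]


# Hypothetical Rasch difficulties for a five-item booklet.
difficulties = {"Q1": -1.2, "Q2": 0.4, "Q3": -0.3, "Q4": 1.1, "Q5": 0.0}
booklet = ordered_item_booklet(difficulties)
print([item for item, _ in booklet])                # ['Q1', 'Q3', 'Q5', 'Q2', 'Q4']
print(round(bookmark_to_theta_cut(booklet, 3), 3))  # 0.708
```

Under the Rasch model a single RP value shifts every item by the same constant, so the booklet order matches the order of difficulties; the choice of 0.67 versus 0.50 changes the resulting cut on the ability scale, not the ordering.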

26 Round 2
The first activity in Round 2 is for each member to place bookmarks in his or her ordered item booklet at the points where each of the other panelists in the small group placed theirs. For a group of 6 people, each panelist's ordered booklet will have 6 bookmarks for each cut point. Discussion then focuses on the items between the first and last bookmarks for each performance level. When this discussion is complete, the panelists independently reset their bookmarks. The median of the Round 2 bookmarks for each cut point is taken as that group's recommendation for that cut-point.
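The Round 2 bookkeeping (the range between the first and last bookmarks that guides discussion, and the group median for each cut point) might look like the following. Panelist labels and bookmark positions are hypothetical; each bookmark is assumed to be recorded as a page number in the ordered item booklet.

```python
from statistics import median


def round2_summary(placements):
    """placements: panelist -> list of bookmark pages, one per cut point.

    Returns, for each cut point, the (first, last) bookmark range that the
    discussion focuses on and the group median used as the recommendation.
    """
    rows = list(placements.values())
    if not rows:
        raise ValueError("need at least one panelist")
    n_cuts = len(rows[0])
    if any(len(row) != n_cuts for row in rows):
        raise ValueError("every panelist needs one bookmark per cut point")
    summary = []
    for cut_pages in zip(*rows):  # all panelists' bookmarks for one cut point
        summary.append({
            "discussion_range": (min(cut_pages), max(cut_pages)),
            "median": median(cut_pages),
        })
    return summary


# Hypothetical small group of 6 panelists, 2 cut points.
placements = {
    "P1": [8, 15], "P2": [9, 17], "P3": [7, 15],
    "P4": [10, 18], "P5": [8, 16], "P6": [9, 16],
}
for cut in round2_summary(placements):
    print(cut)
# {'discussion_range': (7, 10), 'median': 8.5}
# {'discussion_range': (15, 18), 'median': 16.0}
```

With an even number of panelists the median can fall between two pages, as for the first cut point here; how such a value is mapped back onto the booklet is left open.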

27 Round 3
The percentage of students falling into each performance level is presented, given each group's median cut-score from Round 2. With this information about how students actually performed, the panelists discuss the bookmarks in the large group and then make their Round 3 independent judgments of where to place the bookmarks. The median for the large group is taken as the final cut-point for a given performance level.
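The impact data shown to panelists in Round 3 amount to counting examinees between consecutive cut-scores. In this hypothetical sketch the scores, cut-scores and level labels are invented, the cut-scores are assumed to be already expressed on the score scale, and a score equal to a cut-score is assumed to reach the higher level.

```python
from bisect import bisect_right


def impact_data(scores, cuts, labels):
    """Percentage of examinees in each performance level.

    cuts: ascending cut-scores; a score equal to a cut reaches the higher level.
    labels: one more label than there are cuts, from lowest to highest level.
    """
    if not scores:
        raise ValueError("need at least one score")
    if len(labels) != len(cuts) + 1:
        raise ValueError("need exactly one more label than cut-scores")
    if list(cuts) != sorted(cuts):
        raise ValueError("cut-scores must be in ascending order")
    counts = [0] * len(labels)
    for score in scores:
        # number of cuts at or below the score = index of the level reached
        counts[bisect_right(cuts, score)] += 1
    return {label: 100.0 * n / len(scores) for label, n in zip(labels, counts)}


# Hypothetical raw scores and two cut-scores (three performance levels).
scores = [12, 18, 25, 31, 40, 22, 15, 35, 28, 9]
print(impact_data(scores, cuts=[15, 30], labels=["Level 1", "Level 2", "Level 3"]))
# {'Level 1': 20.0, 'Level 2': 50.0, 'Level 3': 30.0}
```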

28 METHOD OF CONTRASTING GROUPS: examinee-centered

29 Method of contrasting groups
The procedure involves testing two groups of examinees: competent and non-competent.
The distributions of test scores are compared for the examinees classified into each category.
The cut-score is set at the point where the two distributions intersect.
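One way to operationalize "the point where the two distributions intersect" with real data is to choose the cut-score that minimizes the total number of misclassified examinees (non-competent examinees at or above the cut plus competent examinees below it); for two overlapping unimodal frequency distributions this falls where they cross. The sketch assumes integer scores; the function name and data are hypothetical, and ties are resolved in favour of the lowest cut.

```python
def contrasting_groups_cut(competent, non_competent):
    """Integer cut-score minimizing total misclassifications.

    A score >= cut is classified as competent. Errors are non-competent
    examinees at or above the cut plus competent examinees below it.
    """
    if not competent or not non_competent:
        raise ValueError("both groups must be non-empty")
    all_scores = list(competent) + list(non_competent)

    def misclassified(cut):
        false_pass = sum(1 for s in non_competent if s >= cut)
        false_fail = sum(1 for s in competent if s < cut)
        return false_pass + false_fail

    # Try every integer cut from the lowest score to one above the highest.
    candidates = range(min(all_scores), max(all_scores) + 2)
    return min(candidates, key=misclassified)  # min() keeps the first (lowest) tie


# Hypothetical scores for the two pre-classified groups.
competent = [14, 16, 17, 18, 18, 19, 20, 22]
non_competent = [8, 10, 11, 12, 13, 14, 15, 17]
print(contrasting_groups_cut(competent, non_competent))  # 16
```

At a cut of 16, one non-competent examinee (score 17) passes and one competent examinee (score 14) fails; every other integer cut misclassifies at least three examinees.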

31 Advantages and disadvantages
+ Can be used with any item type.
- The objectivity of classifying students as competent or non-competent is questionable.

32 THANK YOU FOR YOUR ATTENTION. Your questions?

