David DiBattista, Ph.D. Brock University Department of Psychology Creating Effective Multiple-choice Questions July, 2012 ©D. DiBattista 2012.

1 David DiBattista, Ph.D. Brock University Department of Psychology Creating Effective Multiple-choice Questions July, 2012 ©D. DiBattista 2012

2 Overview
- Some essential terminology
- The why and the how of testing
- Two challenges in MC testing
- Addressing the challenges

3-7 A well-constructed four-option multiple-choice question

You are reading an article in which the world’s major cities are ranked with respect to the quality of life for their residents. This is an example of what type of measurement scale?
A. ordinal scale
B. nominal scale
C. ratio scale
D. interval scale

- The opening text that poses the question is the STEM.
- The lettered choices are the OPTIONS.
- The one correct (or best) option is the KEYED OPTION.
- The incorrect options are called DISTRACTORS.

9-10 The Why and How of Testing

A primary goal of testing: to measure the extent to which test-takers have learned the facts, concepts, procedures, and skills that have been taught in the course.

An effective test: test-takers who have learned more will obtain higher test scores, and those who have learned less will obtain lower scores.

To be effective, a test must consist of effective items.

11-22 What is an effective test item?

For an individual MC test item to be effective, test-takers with higher test scores must be more likely to answer it correctly than those with lower scores.

Percent answering correctly (this chart is on Page 2 of the handout):

| Item | Top 25% of test-takers | Bottom 25% of test-takers | Top-Bottom |
| --- | --- | --- | --- |
| A good item | 90 | 50 | +40 |
| A poor item | 71 | 69 | +2 |
| An awful item | 5 | 25 | -20 |

The poor item is simply not pulling enough weight. Note that the good item and the poor item are equally difficult (i.e., 70% chose the keyed option in each item).

(Top-Bottom) = Discrimination Index. The Discrimination Index is often expressed as a proportion: +40 becomes +0.40. We always want the Discrimination Index to be positive, and the bigger the better.
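
As a rough illustration of the Top-Bottom calculation, the Python sketch below recomputes the Discrimination Index for the three items in the chart. The function name `discrimination_index` and the `items` dictionary are illustrative choices, not part of the presentation; the percentages are the ones shown on the slide.

```python
def discrimination_index(top_pct_correct, bottom_pct_correct):
    """Top-minus-bottom difference in percent correct, expressed as a proportion."""
    return (top_pct_correct - bottom_pct_correct) / 100


# Percent answering correctly in the top and bottom 25% of test-takers
# (values taken from the chart; the dictionary layout is an assumption).
items = {
    "good": (90, 50),
    "poor": (71, 69),
    "awful": (5, 25),
}

di = {name: discrimination_index(top, bottom) for name, (top, bottom) in items.items()}

for name, value in di.items():
    print(f"{name}: {value:+.2f}")
```

A negative value, as for the awful item, means the weaker test-takers were more likely than the stronger ones to choose the keyed option.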

23-29 Interpreting the Discrimination Index (this chart is on Page 3 of the handout)

| Value of DI | Interpretation |
| --- | --- |
| ≥ +0.50 | Outstanding |
| +0.40 to +0.49 | Very good |
| +0.30 to +0.39 | Good |
| +0.20 to +0.29 | Acceptable (but could be better!) |
| 0 to +0.19 | Unsatisfactory (despite being ≥ 0) |
| < 0 | Harmful! |
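
The interpretation chart can be sketched as a simple lookup. This is a minimal Python sketch, assuming the bands are applied with the lower bound of each band as the cut-off (so a value such as 0.495, which the two-decimal chart does not cover explicitly, is treated as "Very good"). The function name `interpret_di` is hypothetical.

```python
def interpret_di(di):
    """Map a Discrimination Index (as a proportion) to the chart's label.

    Assumption: each band starts at its lower bound, so values that fall
    between the chart's two-decimal bands go to the lower band.
    """
    if di >= 0.50:
        return "Outstanding"
    if di >= 0.40:
        return "Very good"
    if di >= 0.30:
        return "Good"
    if di >= 0.20:
        return "Acceptable (but could be better!)"
    if di >= 0:
        return "Unsatisfactory (despite being >= 0)"
    return "Harmful!"


for value in (0.40, 0.02, -0.20):
    print(f"{value:+.2f}: {interpret_di(value)}")
```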

30 A key point

The Discrimination Index tends to suffer when items are either very easy or very hard.
- Very easy: 85% or more answer correctly
- Very hard: 35% or less answer correctly
- “Just right”: 40 to 80% answer correctly

This information is on Page 3 of the handout.
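
The difficulty bands above could be checked with a small helper like the one below. It is only a sketch: the function name `difficulty_band` is made up for illustration, and because the slide leaves the ranges between 35% and 40% and between 80% and 85% unlabelled, the code reports those values as unclassified rather than guessing.

```python
def difficulty_band(pct_correct):
    """Classify an item by the percentage of test-takers answering correctly."""
    if pct_correct >= 85:
        return "very easy"
    if pct_correct <= 35:
        return "very hard"
    if 40 <= pct_correct <= 80:
        return "just right"
    # The slide does not label 35-40% or 80-85%, so neither does this sketch.
    return "not classified on the slide"


for pct in (90, 70, 30, 82):
    print(pct, difficulty_band(pct))
```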

31 [Chart: difficulty and discrimination of 100 MC items, with the difficulty axis running from harder to easier. Class mean = 62.6%. Mean Discrimination Index = +0.42. 10% of items are weak discriminators. Labelled values: Difficulty Index = 0.71, Discrimination Index = 0.30. This chart is on Page 4 of the handout.]

32 [Chart: difficulty and discrimination of 211 MC items, with the difficulty axis running from harder to easier. Class mean = 66.0%. Mean Discrimination Index = +0.23. 51% of items are poor discriminators, and 6.6% are negative.]

33 Key points

In general, the Discrimination Index of MC items will be greatest when:
- they are in the mid-range of difficulty,
- they conform to widely-accepted item-writing guidelines,
- their content is consistent with the course learning objectives, and
- the instruction provided has allowed motivated students to learn the material.

This information is on Page 3 of the handout.

35-42 “A good MCQ is difficult to write. Many will contain item writing flaws and most will do no more than test factual recall. Our study has shown that this does not necessarily have to be the case, but it cannot be assumed that (just) anyone can write a quality MCQ unaided and without peer review.” (Palmer & Devitt, 2007)

Some good news: Writing high-quality MC items is a learnable skill!

43 Two Challenges in MC Testing

Challenge #1: “Many will contain item-writing flaws”
- Flawed items discriminate less effectively among students who differ in achievement.

44 Two Challenges in MC Testing

Challenge #2: “Most will do no more than test factual recall”
- An emphasis on memory-based items over higher-level items may threaten the content validity of the test.

45 Overview
- Some essential terminology
- The why and the how of testing
- Two challenges in MC testing
- Addressing the challenges
  - Constructing high-quality items
  - Assessing higher-level thinking

46 Tips for MC Item Construction

When writing the stem, use question format rather than sentence-completion format. The complete list of tips is on Page 5 of the handout.

47-48 Here the stem is in question format.

You are reading an article in which the world’s major cities are ranked with respect to the quality of life for their residents. This is an example of what type of measurement scale?
A. ordinal scale
B. nominal scale
C. ratio scale
D. interval scale

49 Here the stem ends with an incomplete sentence. Question format works better.

You are reading an article in which the world’s major cities are ranked with respect to the quality of life for their residents. This is an example of a(n)
A. ordinal measurement scale.
B. nominal measurement scale.
C. ratio measurement scale.
D. interval measurement scale.

(Note on the slide: shuttling; ESL.)

50 Tips for MC Item Construction
- The stem should present the issue under consideration CLEARLY and contain as much information as possible.
- Do not include irrelevant information in the stem unless it plays a role in the assessment procedure.
- Avoid using long, complex sentences.

51 Poor

South America
A. imports coffee from Australia.
B. is where the Gobi Desert is located.
C. was heavily colonized by people from Spain.
D. has a larger population than the United States of America.

The stem is not informative at all, and there really is no question here. Having lengthy options increases the amount of reading that students must do.

52 Even worse

South America, which has an area of more than 17 million square kilometres,
A. imports coffee from Australia.
B. is where the Gobi Desert is located.
C. was heavily colonized by people from Spain.
D. has a larger population than the United States of America.

Avoid window dressing.

53 Poor

Which of the following statements about South America is true?
A. South America imports coffee from Australia.
B. The Gobi Desert is located in South America.
C. South America was heavily colonized by people from Spain.
D. South America has a larger population than the United States of America.

This is really a multiple true-false question. The stem contains little information and does not pose a question related to the topic.

54 Better

People from which of these countries colonized a large part of South America?
A. Spain
B. France
C. Holland
D. England

This is a clear, straightforward question, and it is focused on a single topic. Even poorly constructed items can sometimes provide the inspiration for a useful item!

55 Which classical theorist’s insights were tested by Zurcher in the real-life social laboratory provided by the Kansas tornado about change and social solidarity?
A. Martineau
B. Marx
C. Durkheim
D. Weber

56-57 Which classical theorist’s insights about change and social solidarity did Zurcher study in the context of the Kansas tornado?
A. Martineau
B. Marx
C. Durkheim
D. Weber

The importance of good writing!

58 If we suppose that John’s score on a recent Canadian History test is 80, and the distribution of test scores, which has a mean of 70 and a standard deviation of 10, contains 100 scores and is positively skewed, then what is John’s standard score?
A. -1.0
B. +1.0
C. -10.0
D. +10.0

This question has 45 words, all in one sentence!

59 Henry David Thoreau (1817-1862): “Simplify, simplify.”

60 So let’s simplify this 45-word question…

61-63 A set of 100 test scores is positively skewed, with a mean of 70 and a standard deviation of 10. John’s test score is 80. What is his standard score?
A. -1.0
B. +1.0
C. -10.0
D. +10.0

Easy reading: The stem now has 30 words in three sentences, including a straightforward question. Aim to make sentences shorter and simpler, rather than longer and more complex.

Note that some information in the stem is not needed to answer the question, but that’s okay here.

For math-based problems:
- Focus on the principles.
- Keep the numbers simple.
- Put options in reader-friendly order.
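
For reference, the calculation this item asks for can be sketched in a few lines of Python. The sketch assumes the usual definition of a standard score, (score - mean) / standard deviation; the function name `standard_score` is illustrative. The number of scores (100) and the skew of the distribution play no part in the calculation, which matches the slide's remark that some of the stem's information is not needed.

```python
def standard_score(score, mean, sd):
    """Standard (z) score: distance from the mean in standard-deviation units."""
    return (score - mean) / sd


# Values from the stem: John's score is 80, the mean is 70, the SD is 10.
z = standard_score(80, 70, 10)
print(f"{z:+.1f}")  # corresponds to option B
```

One plausible reading of the distractors, not stated on the slide, is that +10.0 is the raw difference from the mean without dividing by the standard deviation, and the negative options reverse the subtraction.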

64 Watch for ambiguity!

After a bad day at work, George comes home and yells at his young son, who starts to cry. What type of behaviour is he demonstrating?
A. projection
B. displacement
C. sublimation
D. reaction formation

Here “he” could refer to either George or his son.

65 Watch for ambiguity!

After a bad day at work, George comes home and yells at his young son, who starts to cry. What type of behaviour is George demonstrating?
A. projection
B. displacement
C. sublimation
D. reaction formation

66 Watch for idioms and uncommon words.

Mary loves being in the limelight. On which Big Five factor would you expect her to have a very high score?
A. conscientiousness
B. extroversion
C. agreeableness
D. neuroticism

67 Watch for idioms and uncommon words.

Mary enjoys talking and spending time with others, and many of her friends consider her a natural leader. On which Big Five factor would you expect Mary to have a very high score?
A. conscientiousness
B. extroversion
C. agreeableness
D. neuroticism

Of course, discipline-related technical terms are perfectly appropriate.

68 Tips for MC Item Construction

Whenever possible, avoid negative wording in the stem, and be sure to emphasize it when it does occur.

69 Poor

Which of the following terms is not usually associated with Sigmund Freud?
A. superego
B. extinction
C. repression
D. latent content

70 Better

Which of the following terms is NOT usually associated with Sigmund Freud?
A. superego
B. extinction
C. repression
D. latent content

Negation adds an extra cognitive burden, so use it only when really necessary.

71 Also better

Which of the following terms is usually associated with behaviourism?
A. synesthesia
B. extinction
C. repression
D. closure

The keyed option is still the same, but the question is now positively framed.

72 Tips for MC Item Construction

Check carefully for spelling and grammatical errors, giving special attention to distractors.

73-74 What do stamp collectors use stamp hinges for? (The errors in options B and D are deliberate.)
A. to pick up stamps
B. to fold learge stamps in half
C. to mount stamps in albums
D. to joining stamps together

Errors like these are more likely to crop up in the distractors than in the keyed option. Such errors can give clues to testwise students!

75 What do stamp collectors use stamp hinges for? (Corrected.)
A. to pick up stamps
B. to fold large stamps in half
C. to mount stamps in albums
D. to join stamps together

76 Tips for MC Item Construction
- All distractors should be plausible.
- Four options will usually be quite adequate, but the number used is best determined by the number of PLAUSIBLE distractors you can supply.

77 Which river flows through the city of Edmonton?
A. North Saskatchewan River
B. Peace River
C. Milk River
D. Athabasca River

These four rivers are all in the same “domain.”

78 Which river flows through the city of Edmonton?
A. North Saskatchewan River
B. Nile River
C. Amazon River
D. Rhine River
E. Mississippi River
F. Seine River

Distractor plausibility is a key to success!

79 Tips for MC Item Construction

To generate plausible distractors:
- Use students’ most common errors on constructed-response tests.
- Use distractors that are similar to the correct answer in content, length, and complexity.
- Use words that sound important or have associations to the stem.
- Use distractors that are true, but do not correctly answer the question.

80 Examples of incorrect student responses on constructed-response questions:
- Name the river that flows through the city of Edmonton. Student response: Athabasca River (incorrect)
- What do stamp collectors use stamp hinges for? Student response: To join stamps together (incorrect)

And listen carefully to questions students ask, and watch for their misconceptions.

82 In severe cases of obesity, there may be a substantial increase in the number of adipocytes. Which of the following terms is used to refer to this increase?
A. hyperbole
B. hyperplasia
C. hypertrophy
D. hypertonicity

Knowing that the answer is “hyper-something” is not enough to get this item correct.

83 Who developed the behavioural therapy known as systematic desensitization?
A. Barack Obama
B. Muhammad Ali
C. Martin Luther
D. Joseph Wolpe

These four people have little in common; that is, they are not in the same domain.

84 Who developed the behavioural therapy known as systematic desensitization?
A. Anna Freud
B. Jean Piaget
C. Wilhelm Wundt
D. Joseph Wolpe

More challenging: All four of these people are well known within the domain of psychology.

85 Who developed the behavioural therapy known as systematic desensitization?
A. Ivan Pavlov
B. Albert Ellis
C. B. F. Skinner
D. Joseph Wolpe

Even more challenging: All four of these people have a connection to the domain of behavioural psychology.

87 In responding to a lengthy survey, a man answers “yes” to every yes-no question asked. It is reasonable to suspect that his responses may be influenced by which of the following?
A. response acquiescence
B. opportunistic characterization
C. the partial reinforcement effect
D. the conspicuous agreement predisposition

89 Which of the following events caused the Prime Minister of Canada to proclaim the War Measures Act?
A. Quebec was invaded by Germany in 1940.
B. The October Crisis occurred in 1970.
C. The first Quebec Referendum was held in 1980.
D. The Meech Lake Accord was defeated in 1990.

Option A can be ruled out simply because it is a FALSE statement. Because Options C and D are TRUE, they must be considered as possible answers to the question posed in the stem.

90 Tips for MC Item Construction
- Avoid patterns in the length and location of correct answers that could provide clues that are unrelated to content.
- Balance the answer key so that the correct response appears in each position about the same number of times.

91 What characteristic of hallucinations would make their occurrence sufficient for a diagnosis of schizophrenia?
A. a satanic or religious theme
B. bizarre content
C. derailment and neologisms
D. voices providing a running commentary on the person’s behaviour, or two or more voices conversing with one another

The keyed response is too often the longest, and testwise students know this!

92 In a four-option multiple-choice test, about how often should the correct answer appear in each of the four locations?
A. 10% of the time
B. 25% of the time
C. 40% of the time
D. 60% of the time

Balance the answer key!

93-94 “Edge avoidance”

When the four options appear, make your best guess as quickly as you can!

Who invented the binaural recording system commonly known as “stereo”?
A. Xxxxxxxxxxxxxx
B. Xxxxxxxxxxxxxx
C. Xxxxxxxxxxxxxx
D. Xxxxxxxxxxxxxx

Who invented the binaural recording system commonly known as “stereo”?
A. Alan Dower Blumlein
B. Alan Dower Blumlein
C. Alan Dower Blumlein
D. Alan Dower Blumlein

Thanks, Alan!

Edge avoidance can be a major problem for the creators of MC tests! In one test I came across, 74% of the keyed options were either B or C. Think about those testwise students!
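
A test-writer could check an answer key for this kind of imbalance with a short script. The Python sketch below is illustrative only: the function names `key_balance` and `middle_share` and the 20-item key are invented for the example and do not come from the presentation.

```python
from collections import Counter


def key_balance(answer_key, options="ABCD"):
    """Proportion of items keyed to each option position."""
    counts = Counter(answer_key)
    return {opt: counts.get(opt, 0) / len(answer_key) for opt in options}


def middle_share(answer_key, middle="BC"):
    """Proportion of items whose keyed option is in a middle position."""
    return sum(1 for k in answer_key if k in middle) / len(answer_key)


# Hypothetical 20-item key that shows edge avoidance.
key = ["A"] * 2 + ["B"] * 8 + ["C"] * 7 + ["D"] * 3

print(key_balance(key))
print(f"Keyed to B or C: {middle_share(key):.0%}")
```

For a balanced four-option key, each position would hold about 25% of the keyed options, so B and C together would account for about 50%.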

95 Tips for MC Item Construction

For numerical options, let the correct answer appear in each of the positions about the same number of times.

96 How many chromosomes are found in an ovum of a healthy adult woman?
A. 18
B. 23
C. 37
D. 46

Options are in reader-friendly order. Item-writers tend NOT to let the key be either the smallest or largest value in the option list. Knowing this, testwise students discount the smallest and largest values.
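
The two ideas on this slide, ordering numerical options and watching where the key lands, can be sketched together. The Python below is a minimal illustration under stated assumptions: the function name `order_numeric_options` is hypothetical, ascending order is taken as the "reader-friendly" order, and the keyed value is taken to be 23 (the slide itself does not mark the key).

```python
def order_numeric_options(values, correct_value):
    """Sort numeric options ascending and report where the key ends up."""
    ordered = sorted(values)
    letters = "ABCDEFGH"[:len(ordered)]
    options = dict(zip(letters, ordered))
    key_letter = letters[ordered.index(correct_value)]
    # True when the key is the smallest or largest value in the list.
    key_at_extreme = correct_value in (ordered[0], ordered[-1])
    return options, key_letter, key_at_extreme


options, key_letter, at_extreme = order_numeric_options([46, 18, 37, 23], correct_value=23)
print(options, key_letter, at_extreme)
```

Across a whole test with four numerical options per item, a balanced key would put the correct value at the smallest or largest position in roughly half of the items.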

97 Tips for MC Item Construction

Avoid having the options include a single pair of opposites, one of which is the keyed option.

98 A psychologist administers an aptitude test to 200 people, and then one month later she has the same people take the test again. The correlation between the two sets of scores is +0.91. What should she conclude about the test?
A. 91% of the items are effective.
B. It has poor test-retest reliability.
C. It has good test-retest reliability.
D. It has poor criterion-related validity.

A problem: When the options include a single pair of opposites, one member of the pair is the keyed option 75-80% of the time.

99 A psychologist administers an aptitude test to 200 people, and then one month later she has the same people take the test again. The correlation between the two sets of scores is +0.91. What should she conclude about the test?
A. It has poor test-retest reliability.
B. It has good test-retest reliability.
C. It has poor criterion-related validity.
D. It has good criterion-related validity.

Using two pairs of opposites generally solves the problem.

100 Tips for MC Item Construction

Do not use “none of the above.”

101 “None of the above” as the key

Which of these 19th century authors wrote Middlemarch?
A. Jane Austen
B. Anne Bronte
C. Wilkie Collins
D. none of the above

102 A student’s reasoning: “I’m sure Dickens wrote Middlemarch, so I’ll go with ‘none of the above.’”

103 Dickens didn’t write Middlemarch. George Eliot wrote it! But here is the problem: When NOTA is the keyed option, misinformed students often earn full marks. NOTA is often used as “the distractor of last resort.”

104 So let’s fix this NOTA item…

Which of these 19th century authors wrote Middlemarch?
A. Jane Austen
B. Anne Bronte
C. Wilkie Collins
D. George Eliot

105 Do not use “all of the above.” ©D. DiBattista 2012 Tips for MC Item Construction

106 Which of these terms is associated with Sigmund Freud? A. superego B. repression C. latent content D. all of the above ©D. DiBattista 2012

107 I never heard of latent content, but superego and repression are both definitely Freudian terms, so it must be “all of the above.” ©D. DiBattista 2012

108 Which of these terms is associated with Sigmund Freud? A. superego B. repression C. latent content D. all of the above ©D. DiBattista 2012

109 Which of these terms is associated with Sigmund Freud? A. superego B. repression C. latent content D. all of the above ©D. DiBattista 2012

110 Which of these terms is associated with Sigmund Freud? A. superego B. repression C. latent content ??? D. all of the above ©D. DiBattista 2012

111 Which of these terms is associated with Sigmund Freud? A. superego B. repression C. latent content ??? D. all of the above When AOTA is the keyed option, students with partial knowledge can still earn full marks. Moreover, AOTA usually serves as the keyed option, and testwise students know this!

112 Better: Which of these terms is associated with Sigmund Freud? A. latent content B. fixed-interval schedule C. cognitive dissonance D. bulimia nervosa

113 Overview: Some essential terminology. The why and the how of testing. Two challenges in MC testing. Addressing the challenges: constructing high-quality items; assessing higher-level thinking.

114 Two Challenges in MC Testing. Challenge #2: “Most (MCQs) do no more than test factual recall.” An emphasis on memory-based items over higher-level items may threaten the content validity of the test.

115 The Original Bloom’s Taxonomy (as listed on the slide): Evaluation, Synthesis, Analysis, Application, Comprehension, Knowledge.

116 The Revised Bloom’s Taxonomy (Anderson and Krathwohl, 2001). Knowledge Dimension: Factual, Conceptual, Procedural, Metacognitive. Cognitive Process Dimension: Remember, Understand, Apply, Analyze, Evaluate, Create. See pages 6-7 of the handout!

117 The Revised Bloom’s Taxonomy (Anderson and Krathwohl, 2001). Knowledge Dimension: Factual, Conceptual, Procedural, Metacognitive. Cognitive Process Dimension: Remember, Understand, Apply, Analyze, Evaluate, Create. The cognitive processes are all ACTION verbs, i.e., things students can DO with their knowledge.

118 Good tests allow us to determine what our students are capable of. The Revised Bloom’s Taxonomy (Anderson and Krathwohl, 2001). Knowledge Dimension: Factual, Conceptual, Procedural, Metacognitive. Cognitive Process Dimension: Remember (Can you remember this?), Understand (Can you understand this?), Apply (Can you apply this?), Analyze (Can you analyze this?), Evaluate (Can you evaluate this?), Create (Can you create this?). These are all ACTION verbs, i.e., things students can DO with their knowledge. MC questions are not useful for assessing creativity, but they can be used effectively to assess all of the other cognitive processes. Some thoughts about REMEMBER …

119 ALL assessment tasks involve using memory to at least some degree. BUT “If assessment tasks are to tap higher-order cognitive processes, they must require that students cannot answer them correctly by relying on memory ALONE.” (Anderson and Krathwohl, 2001, page 71) A simple, unfortunate fact: Creating MC items that rely on memory alone is far easier than creating higher-level items.

120 Let’s take a closer look at how the cognitive processes in the Revised Bloom’s Taxonomy relate to multiple-choice questions.

121 COGNITIVE PROCESS DIMENSION. REMEMBER: Retrieve relevant knowledge from long-term memory. Observable behaviours: Recognize; Recall. UNDERSTAND: Determine the meaning of instructional messages, including oral, written, and graphic communications. Observable behaviours: Interpret; Exemplify; Classify; Summarize; Infer; Compare; Explain. See pages 8-9 of the handout for further details.

122 Because MC is a selected response technique, remember-level items always involve recognition rather than recall. Example: What city is the capital of the state of California? A. Sacramento B. Los Angeles C. San Francisco D. Fresno

123 What city is the capital of the state of California? If the options were not included in this item, it would involve recall rather than recognition. Remember-level items are very easy to create, which is probably why there are so many of them on classroom tests!

125 “If assessment tasks are to tap higher-order cognitive processes, they must require that students cannot answer them correctly by relying on memory ALONE.” (Anderson and Krathwohl, 2001, page 71) Interpret. In the graph shown below, which group has the most variability in its scores? [Graph not reproduced in the transcript.] A. Group 1 B. Group 2 C. Group 3 D. Group 4 Note the important role of NOVELTY. If the exact same chart is shown in the textbook, then this will actually be a remember-level item!

126 Classify. You are reading an article in which the world’s major cities are ranked with respect to the quality of life for their residents. This is an example of what type of measurement scale? Exemplify. Which of the following is an example of negative feedback? Summarize. Which of the following statements best summarizes Carol Gilligan’s response to Lawrence Kohlberg’s theory of moral development?

127 Infer. Which of the words listed below best completes the following analogy? Retina is to Cranial Nerve II as hair cells are to ______. Compare. In what way are a neuron and a battery similar to each other? Explain. Why is the z-test for independent samples so rarely used?

128 APPLY: Carry out or use a procedure in a given situation. Observable behaviours: Execute; Implement. ANALYZE: Break material into its constituent parts and detect how the parts relate to one another and to an overall structure or purpose. Observable behaviours: Differentiate; Organize; Attribute.

129 Execute. Working with an ordinal data scale, Jeff obtained the following five scores: 0, 0, 2, 5, 18. What is the value of the median for this set of scores? A. 0 B. 2 C. 3 D. 5

130 Execute. Working with an ordinal data scale, Jeff obtained the following five scores: 0, 0, 2, 5, 18. What is the value of the median for this set of scores? A. 0 B. 2 C. 3 D. 5 Execution involves being told what procedure to apply and then carrying it out.

131 Implement. Working with an ordinal data scale, Jeff obtained the following five scores: 0, 0, 2, 5, 18. What is the value of the most appropriate measure of central tendency for this set of scores? A. 0 B. 2 C. 3 D. 5

132 Implement. Working with an ordinal data scale, Jeff obtained the following five scores: 0, 0, 2, 5, 18. What is the value of the most appropriate measure of central tendency for this set of scores? A. 0 B. 2 C. 3 D. 5 Implementation involves deciding what procedure to apply and then carrying it out.
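
The difference between the Execute and Implement items can be illustrated with a short Python sketch (not part of the original slides; the variable names are illustrative). It computes the three common measures of central tendency for Jeff's five scores. The Execute item names the procedure (the median), whereas the Implement item leaves the choice to the student. The sketch assumes the usual textbook convention that the median is the appropriate measure for ordinal data.

```python
from statistics import mean, median, mode

# Jeff's five scores from the item stem.
scores = [0, 0, 2, 5, 18]

# Compute each candidate measure of central tendency.
measures = {
    "mode": mode(scores),
    "median": median(scores),
    "mean": mean(scores),
}

# Assumption (textbook convention, not stated on the slide): for ordinal
# data the median is the most appropriate measure of central tendency.
appropriate_for_ordinal = "median"

for name, value in measures.items():
    print(f"{name}: {value}")
print("chosen for ordinal data:", measures[appropriate_for_ordinal])
```

Running the sketch shows that the mode and the mean of these scores also appear among the options, so a student who carries out the wrong procedure still finds a matching answer.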

134 Differentiate. Keri’s history test grade was 70. A total of 200 students took the test, and the lowest score was 30. The class mean was 60, and the variance was 100. Which of these values must you use to obtain Keri’s standard score? A. 30, 70, 100 B. 30, 70, 200 C. 60, 70, 100 D. 60, 70, 200 Differentiation involves distinguishing the parts of a whole with respect to their relevance or importance.
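
A minimal sketch of the calculation behind the Differentiate item (not part of the original slides). It assumes the usual definition of a standard score, z = (score - mean) / standard deviation, with the standard deviation taken as the square root of the variance. The function name is illustrative. Only three of the values in the stem enter the formula; the class size and the lowest score do not.

```python
import math


def standard_score(score: float, mean: float, variance: float) -> float:
    """Return the z-score, using sqrt(variance) as the standard deviation."""
    return (score - mean) / math.sqrt(variance)


# Values from the item stem that the formula actually needs.
keri_score = 70
class_mean = 60
class_variance = 100

# Values in the stem that are irrelevant to the standard score.
class_size = 200
lowest_score = 30

z = standard_score(keri_score, class_mean, class_variance)
print(f"Keri's standard score: {z}")
```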

135 Organize. Suppose you are reviewing the research literature on a particular topic. Which of the following patterns would be most likely to describe the methodological progress of the research over time? A. case studies first, then experimental studies, then correlational studies B. case studies first, then correlational studies, then experimental studies C. experimental studies first, then case studies, then correlational studies D. experimental studies first, then correlational studies, then case studies Organization involves identifying the elements of a situation and recognizing how they fit together into a coherent structure.

136 Attribute. Which of the following would a Rogerian therapist be MOST likely to say when working with a client? A. You seem to be feeling a bit down today. B. Your dream about going to the zoo: what do you think it might signify? C. You should talk to your sister and find out if she agrees with you. D. There are some things I want you to work on before we meet again next week. Attribution involves determining the point of view, bias, values, or intent associated with a written work or an action.

137 EVALUATE: Make judgments based on criteria and standards. Observable behaviours: Check; Critique. CREATE: Put elements together to form a novel, coherent whole or make an original product. Observable behaviours: Generate; Plan; Produce.

138 Check. Alyssa has carried out a one-way ANOVA for independent groups and rejected the null hypothesis. Which of the following would indicate to you that Alyssa has made an error in her work? A. She says that df-total is 197. B. She says that F-critical is 0.47. C. She says that the F-statistic is 6.84. D. She says that eta-squared is 0.22. Checking involves looking for internal contradictions, determining whether a conclusion is appropriate, and assessing whether evidence supports or disconfirms a hypothesis.

139 Check. Which of these research findings would suggest that differences in Trait X are influenced by genetic factors? A. Sisters reared apart have more similar scores on X than sisters reared together. B. Sisters reared together have more similar scores on X than sisters reared apart. C. Identical twins reared together have more similar scores on X than fraternal twins reared together. D. Fraternal twins reared together have more similar scores on X than identical twins reared together. Checking involves looking for internal contradictions, determining whether a conclusion is appropriate, and assessing whether evidence supports or disconfirms a hypothesis.

140 Critique. Bill wants to compare the effectiveness of two training methods for teaching people to juggle. He obtains a group of non-jugglers and randomly assigns each person to one of the two training methods. He sets alpha at 0.05, two-tailed, and he determines that beta is equal to 0.60. Which of the following is a valid criticism of this research study? A. The power of the statistical test is too low. B. The probability of a Type I error is too high. C. He should use a one-tailed test. D. People should select their own training method. Critiquing involves assessing the positive and negative aspects of a product, idea or action and making a judgment based on external criteria.
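
The Critique item turns on the relation between beta and statistical power, power = 1 - beta. The following is a minimal sketch, not part of the original slides. The function name is illustrative, and the 0.80 benchmark is a commonly cited convention assumed here rather than a value given on the slide.

```python
def power_from_beta(beta: float) -> float:
    """Power is the probability of detecting a true effect: 1 - beta."""
    return 1.0 - beta


alpha = 0.05  # probability of a Type I error, as set by Bill
beta = 0.60   # probability of a Type II error, as given in the stem

power = power_from_beta(beta)

# Assumption: 0.80 is a commonly cited minimum for acceptable power.
CONVENTIONAL_MINIMUM_POWER = 0.80
power_is_low = power < CONVENTIONAL_MINIMUM_POWER

print(f"power = {power:.2f}, below conventional minimum: {power_is_low}")
```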

142 EVALUATE: Make judgments based on criteria and standards. Observable behaviours: Check; Critique. CREATE: Put elements together to form a novel, coherent whole or make an original product. Observable behaviours: Generate; Plan; Produce. Because multiple choice is a selected response technique, it is NOT useful for assessing the ability to create. Other testing techniques are needed to do this.

143 Overview: Some essential terminology. The why and the how of testing. Two challenges in MC testing. Addressing the challenges: constructing high-quality items; assessing higher-level thinking.

144 David DiBattista, Ph.D. Brock University Department of Psychology Creating Effective Multiple-choice Questions July, 2012 ©D. DiBattista 2012
