Characteristic of a Good Test


1 Characteristic of a Good Test
Ningtyas O.A, M.Pd STKIP Siliwangi Bandung

2 General Principles of Good Practice for Assessing Student Learning (Dorobat, 2007)
Assessment is not an end in itself but a vehicle for educational improvement. Assessment is most effective when it reflects an understanding of learning as multidimensional, integrated, and revealed in performance over time. Assessment should reflect that learning is a complex process, i.e. it involves knowledge, values, attitudes, and habits of mind that affect both academic success and performance in real life.

3 Brown (2004) stated that a good test should satisfy the following principles:
Validity, Reliability, Practicality, Authenticity, Backwash

4 Validity of a test Validity means the test should adequately measure what it is supposed to measure. A valid test is one in which a testee’s score gives a true reflection of his or her ability on the trait. Statistical and descriptive means have been used to check validity.

5 There are three factors that affect the validity of test results (Gronlund, 1985), namely:
Factor of the instrument: If the instrument is of poor quality, the validity of the measured learning outcomes will suffer.
Factor of how a test is administered and scored: Deviations (errors) when a test is administered and scored, such as the allocation of time, cheating, scoring errors, and students’ physical and psychological condition, will also affect the validity of test results.
Factor of students’ response: These factors include students’ tendency to answer questions hastily, answering by trial and error, and the use of a particular language style in answering essay questions.

6 In general, there are two types of validity:
Internal validity: The criteria of instrument validity are present in the instrument itself: content validity, face validity, construct validity.
External validity: The criteria of instrument validity are based on outside criteria, namely empirical facts or experience: curricular validity, criterion-related validity, concurrent validity, predictive validity.

7 Reliability Reliability refers to the consistency of scores or answers from one administration of an instrument to another, or from one set of items to another. If a teacher gave the same test to the same student or matched students on two different occasions, the test should yield similar results. Reliability is a requirement for validity. A test is not valid unless it meets the conditions of reliability.

8 There are four factors that may affect the reliability of the test (Gronlund, 1985):
1. Test length In general, the longer the test (the more items it has), the higher its reliability. The more items there are, the more material is measured and the smaller the influence of guessing on the scores.
2. Distribution of scores A wider spread of scores tends to produce higher reliability, because the reliability coefficient is larger when students remain in relatively the same position within the group from one test to the next.
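The link between test length and reliability can be illustrated with the Spearman-Brown prophecy formula, a classical result that the slides do not name. The sketch below is only an illustration under the assumption that any added items are comparable in quality to the existing ones; the function name and the sample figures are hypothetical.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predict reliability after changing test length by `length_factor`.

    Assumes the added (or removed) items behave like the existing ones.
    """
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


if __name__ == "__main__":
    # Hypothetical test with reliability 0.60, kept as is, doubled, and tripled.
    for factor in (1, 2, 3):
        print(factor, round(spearman_brown(0.60, factor), 3))
```

Doubling the hypothetical test raises the predicted reliability from 0.60 to 0.75, which matches the general trend described on this slide.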

9 3. The level of difficulty (difficulty index) Items that are too difficult or too easy will likely result in low reliability. Such items have little discriminating power. The ideal level of difficulty is obtained if the items yield a normal (bell-shaped) distribution of scores.
[Figure labels: Easy items, Moderate items, Difficult items]
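A minimal sketch of how a difficulty index and discriminating power might be computed for one item. It assumes dichotomous scoring (1 = correct, 0 = wrong) and the common convention of comparing the top and bottom 27% of test takers; neither detail comes from the slides, and all names and data are hypothetical.

```python
def difficulty_index(item_scores):
    """Proportion of test takers who answered the item correctly (0 to 1)."""
    return sum(item_scores) / len(item_scores)


def discrimination_index(item_scores, total_scores, fraction=0.27):
    """Proportion correct in the top group minus that in the bottom group."""
    n = len(total_scores)
    group_size = max(1, round(n * fraction))
    # Rank test takers by total test score, lowest first.
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower = order[:group_size]
    upper = order[-group_size:]
    p_upper = sum(item_scores[i] for i in upper) / group_size
    p_lower = sum(item_scores[i] for i in lower) / group_size
    return p_upper - p_lower


if __name__ == "__main__":
    # Hypothetical data: ten students, one item, and their total test scores.
    item = [1, 1, 0, 1, 1, 1, 0, 1, 0, 0]
    totals = [48, 45, 44, 40, 38, 35, 30, 28, 25, 20]
    print("difficulty:", difficulty_index(item))
    print("discrimination:", round(discrimination_index(item, totals), 2))
```

A difficulty index near 0 or 1 marks an item as very hard or very easy, and such items cannot separate stronger from weaker students, so their discrimination index stays close to zero.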

10 4. Objectivity Objectivity refers to the consistency of test scores across students. This means that students who have similar capabilities will obtain the same test results when they take the same test.

11 Methods to determine the reliability of the test:
Test-Retest Method: Re-administer the test to the same testees after a lapse of time (no more than two weeks).
Parallel Forms to the Same Group: Administer parallel forms of the test to the same group. The second test should be identical in its sampling, difficulty, length, and rubrics.
The Split-Half Method: Divide the test into two halves (the first and second halves) and correlate the corresponding scores obtained.
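The test-retest and split-half methods both come down to correlating two sets of scores. The sketch below assumes the Pearson correlation as the reliability coefficient and, for the split-half case, applies the usual Spearman-Brown step-up to estimate full-length reliability; these choices and all names are assumptions, not something the slides specify.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary to compute a correlation")
    return cov / sqrt(var_x * var_y)


def retest_reliability(first_scores, second_scores):
    """Test-retest: correlate scores from two administrations."""
    return pearson(first_scores, second_scores)


def split_half_reliability(item_matrix):
    """item_matrix[s][i] is 1 if student s answered item i correctly, else 0."""
    half = len(item_matrix[0]) // 2
    first = [sum(row[:half]) for row in item_matrix]
    second = [sum(row[half:]) for row in item_matrix]
    r_half = pearson(first, second)
    # Step up from half-length to full-length reliability (Spearman-Brown).
    return 2 * r_half / (1 + r_half)


if __name__ == "__main__":
    # Hypothetical responses of four students to a four-item test.
    responses = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
    print(round(split_half_reliability(responses), 2))
```

Splitting into first and second halves follows the slide; an odd-even split is a common alternative when items get harder toward the end of the test.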

12 Practicality of test Practicality refers to the degree to which the test is easy to prepare, use, interpret, and store. The practicality of a test (instrument) places more emphasis on the efficiency and effectiveness of the test in measuring student learning outcomes. The criteria of test practicality are as follows: It is not excessively expensive. It stays within appropriate time constraints. It is relatively easy to administer. It has a specific and effective scoring procedure.

13 Likely to be impractical vs. likely to be practical
Impractical: Teachers use an essay test to measure the responses of 200 students in group discussions. Practical: Teachers use an oral test to measure the results of group discussions.
Impractical: Teachers use a computer-based answer sheet (LJK), but no scanner is available to check the LJK. Practical: Teachers provide plain-paper answer sheets for the questions of daily tests.
Impractical: Teachers provide an internet-based listening test while internet access is not adequately available. Practical: Teachers use a tape recorder for the listening test.
Impractical: Teachers prepare an English test comprising 150 items to be completed by students in 3 hours. Practical: Teachers prepare an English test comprising 50 items to be completed by students in 1.5 hours.

14 Washback of the test Washback or backwash refers to the effects of language testing on teaching and learning (Alderson & Wall, 1993). A test affects participants, processes, and products in teaching and learning. Washback can be positive or negative, for both students and teachers. Washback can be observed solely at the ‘micro’ level of the individuals (mostly teachers and students).

15 Examples of negative effects:
Teaching is dominated by coaching for the testing session/examination. The test content and testing techniques differ from the objectives of the course.
Examples of positive effects: Students’ motivation to learn increases. Teachers work harder to improve the quality of teaching.

16 5 Most Commonly used Test Format
Multiple Choice, True or False, Matching Type, Fill-in the Blanks (Sentence Completion), Essay
Source: Turn-out of Test Questions in SSI ( )

17 1. Multiple Choice Test

18 What to Look for on Multiple Choice Tests
When checking the stems (questions) for correctness: Ensure that the stem asks a clear question. The reading level is appropriate to the students. The stem is grammatically correct. Negatively stated stems are discouraged.

19 Multiple Choice Questions
Use negatively stated stems sparingly and when using negatives such as NOT, underline or bold the print. Use none of the above and all of the above sparingly, and when you do use them, don't always make them the right answer. Only one option should be correct or clearly best.

20 Multiple Choice Questions:
All options should be homogeneous and nearly equal in length. The stem (question) should contain only one main idea. Keep all options either singular or plural. Have four or five responses per stem (question).

21 Multiple Choice Questions:
When using incomplete statements, place the blank space at the end of the stem rather than at the beginning. When possible, organize the responses. Reduce wordiness. When writing distractors, think of incorrect responses that students might make.

22 Example of a poor stem
Sheldon developed a highly controversial theory of personality based on body type and temperament of the individual. Which of the following is a criticism of Sheldon's work? a. He was influenced too much by Freudian psychoanalysis. b. His ratings of physique and temperament were not independent. c. He failed to use an empirical approach. d. His research sample was improperly selected.

23 Examples Better: (Eliminate excessive wording and irrelevant information) 1. Which of the following is a criticism of Sheldon's theory of personality?

24 Example of a poor stem and options
The first paragraph is … a. about the materials of weathering. b. about the process of weathering. c. about the impact of weathering. d. about the result of weathering.

25 Examples Better: (Include in the stem any word(s) that might otherwise be repeated in each option.) The first paragraph is about _______. a. the materials of weathering. b. the process of weathering. c. the impact of weathering. d. the result of weathering.

26 Example of a poor stem
Which is not a major technique for studying brain function? a. Accident and injury b. Cutting and removing c. Electrical stimulation d. Direct phrenology

27 Examples Better: (Use negatively stated stems sparingly. When used, underline and/or capitalize the negative word.) Which is NOT a major technique for studying brain function? The related examples: The following statements are TRUE based on the text … These are any activities done by the farmers, EXCEPT …

28 Example of a poor stem
4. ________________ is the least form of behavior disorder. a. Psychosis b. Panic disorder c. Neurasthenia d. Neurosis

29 Examples Better: (When using incomplete statements avoid beginning with the blank space.) The least severe form of behavior disorder is __________________.

30 Example of poor options
The number of photoreceptors in the retina of each human is about a. 115 million b. 5 million c. 65 million d. 35 million

31 Examples Better: (When possible, present alternatives in some logical order.) The number of photoreceptors in the retina of each human is about a. 5 million b. 35 million c. 65 million d. 115 million

32 Example of poor options
6. Latane and Darley's smoke-filled room experiment suggested that people are less likely to help in groups than alone, because people a. in groups talk to one another. b. who are alone are more attentive. c. in groups do not display pluralistic ignorance. d. in groups allow others to define the situation as a non-emergency

33 Examples Better: (All alternatives should be approximately equal in length.) 6. Latane and Darley's smoke-filled room experiment suggested that people are less likely to help in groups than alone, because people in groups a. talk to one another b. are less attentive than people who are alone c. do not display pluralistic ignorance d. allow others to define non-emergencies

34 Strengths of Multiple-choice Test
More representative in terms of material/content. Easy to administer and analyze. More objective in scoring.
Weaknesses of Multiple-choice Test
The technique tests only recognition knowledge. Guessing may have a considerable effect on test scores. The technique severely restricts what can be tested. Washback may be harmful. Cheating may be facilitated.
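To get a feel for how large the effect of guessing can be, the sketch below computes the score expected from blind guessing and applies the classical correction-for-guessing formula (right answers minus wrong answers divided by the number of options minus one). The formula is a standard convention that the slides do not mention, and the function names and figures are hypothetical.

```python
def expected_chance_score(n_items: int, n_options: int) -> float:
    """Average number of items answered correctly by blind guessing alone."""
    return n_items / n_options


def corrected_score(right: int, wrong: int, n_options: int) -> float:
    """Classical correction for guessing; omitted items are not penalized."""
    if n_options < 2:
        raise ValueError("an item needs at least two options")
    return right - wrong / (n_options - 1)


if __name__ == "__main__":
    # Hypothetical 40-item test with four options per item.
    print(expected_chance_score(40, 4))
    print(corrected_score(right=28, wrong=12, n_options=4))
```

With four options per item, a quarter of the items can be expected to be answered correctly by chance, which is one reason to prefer four or five options over two or three.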

35 2. True or False

36 What to Look for on True/False Tests
Each statement is clearly true or clearly false. Trivial details should not make a statement false. Statements are written concisely without more elaboration than necessary. Statements are NOT quoted exactly from text.

37 Tips in Making True/False Tests
Emphasize quantitative terms over qualitative terms. Avoid specific determiners, which usually give a clue to the answer: False = all, always, never, every, none, only; True = generally, sometimes, usually, maybe, often. Discourage the use of negative statements. Whenever a controversial statement is used, the authority should be quoted. Avoid patterns in the sequence of answers.

38 Examples: Find the errors, and/or problems with the following true-false tests. ____ 1. Repetition always strengthens the tendency for a response to occur. (Using "always" usually means the answer is false.)

39 Examples: _____ 2. The process of extinction is seldom immediate but extends over a number of trials. (Words like "seldom" usually indicate a true statement.)

40 Examples: _____ 3. The mean, median, and mode are measures of central tendency, whereas the standard deviation and range are measures of variability. (Express a single idea in each statement.) E.g. “The mean, median, and mode are measures of central tendency.”
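When reviewing a draft true/false test, the specific determiners named in the tips can be flagged automatically. The following is a rough sketch, assuming simple whole-word matching; the word lists come from the slides (with "seldom" taken from example 2), while the function name and output format are hypothetical.

```python
import re

# Cue words from the slides; "seldom" comes from example 2.
FALSE_CUES = {"all", "always", "never", "every", "none", "only"}
TRUE_CUES = {"generally", "sometimes", "usually", "maybe", "often", "seldom"}


def find_cues(statement: str) -> dict:
    """Return the specific determiners found in a true/false statement."""
    words = set(re.findall(r"[a-z]+", statement.lower()))
    return {
        "suggests_false": sorted(words & FALSE_CUES),
        "suggests_true": sorted(words & TRUE_CUES),
    }


if __name__ == "__main__":
    statements = [
        "Repetition always strengthens the tendency for a response to occur.",
        "The process of extinction is seldom immediate but extends over a number of trials.",
    ]
    for text in statements:
        print(find_cues(text), "<-", text)
```

A flagged word does not make an item wrong by itself; it only marks a statement the test writer may want to rephrase.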

41 3. Matching Type test

42 Parts of the Matching Type Test (Horizontal Type)
Column A (Premise) Column B (Response)

43 Parts of the Matching Type Test (Vertical Type)
(Response) (Premise)

44 What to Look for on Matching Type Tests
The list of responses should be relatively short. Response options should be arranged alphabetically or numerically. Directions clearly indicate the basis for matching. Can responses be used more than once? Where will you place your answer? Can students infer relationships or are they based on real world logic?

45 What to Look for on Matching Type Tests
Position of matches should be varied. Avoid using patterns. The choices of each matching set should be on one page. There are more responses than premises in a single set if responses cannot be used more than once.

46 What to Look for on Matching Type Tests
The premises are homogeneous, as are the responses, and they are grouped as one item. If responses can be used more than once, their number should be proportional to the number of premises (3:5 or 4:10).

47 Example of poor matching:
Directions: Match the following.
Food                  A. Primary reinforcer
Psychoanalysis        B. Sigmund Freud
B.F. Skinner          C. Operant conditioning
Standard deviation    D. Measure of variability
Schizophrenia         E. Hallucinations

48 Examples: Better: (Use homogeneous material in matching items, and if responses are not to be used more than once, include more responses than stimuli.)
Match the theories in Column A with their proponents in Column B. Write the letter of the correct answer.
Column A                         Column B
___ 1. Psychodynamic Theory      A. Albert Bandura
___ 2. Trait Theory              B. B.F. Skinner
___ 3. Behaviorism               C. Carl Rogers
___ 4. Humanism                  D. Gordon Allport
___ 5. Social Learning Theory    E. Karen Horney
                                 F. Raymond Cattell
                                 G. Sigmund Freud

49 Another example: In the blank space in Column A, put the letter of the item in Column B that has the same or similar meaning.
Column A                 Column B
___ 1. Board             A. Catch up with
___ 2. Call              B. Get in
___ 3. Postpone          C. Get on
___ 4. Remove            D. Keep off
___ 5. Continue          E. Keep on
___ 6. Overtake          F. Look out for
___ 7. Arrive            G. Look up to
___ 8. Be careful of     H. Point out
                         I. Put across
                         J. Put off
                         K. Run across
                         L. Take off

50 4. Sentence Completion / Fill-in the Blanks

51 Sentence Completion Tests
What to Look for on Sentence Completion Tests: Only significant words are omitted. When omitting words, enough clues are left so that the student who knows the correct answer can supply the correct response. Ensure that grammatical clues are avoided. Blanks are at the end of the statement. The length of the responses is limited to single words or short phrases. Questions are not lifted as verbatim quotes from text.

52 Examples: An animal with six legs is called _________.
The item is too indefinite. It can be completed with answers such as bee, mosquito, or any other insect. Better: 1. Animals with six legs are called ___________.

53 Examples: The __________ is the answer in _____.
Too many key words are omitted. The blanks are not of equal length. Better: 1. The product is the answer in _________.

54 Examples: 1. If a mango weighs 250 grams, 10 mangoes would weigh ______. There are two possible answers: 2,500 grams and 2.5 kilos. Better: 1. If a mango weighs 250 grams, 10 mangoes would weigh ____ grams.

55 5. Essay / Short Answer Test

56 Types of Essay Items:
Extended response type: The test may be answered by the examinee in whatever manner he or she wants. Example: Do you think teachers should be allowed to work abroad as domestic helpers? Explain your answer.
Restricted response type: The test limits the examinee’s response in terms of length, content, style, or organization. Example: Give and explain three reasons why the government should or should not allow teachers to work abroad as domestic helpers.

57 What to Look for on Essay Tests
The task is clearly defined. The students are given an idea on the scope and direction you intended for the answer to take. The question starts with a description of the required behavior to put them in the correct mind frame. E.g. “Compare” or “Analyze” The questions are written in the linguistic level appropriate to the students. Questions require a student to demonstrate command of background information, not simply repeating information.

58 What to Look for on Essay Tests
Questions regarding a student’s opinion on a certain issue should focus not on the opinion but on the way it is presented and argued. A larger number of shorter, more specific questions is better than one or two longer questions.

59 Proposed Criteria in Grading Essay Test
Ideas (20%) Weight of Evidence Presented (40%) Correct Usage (20%) Logical Conclusions drawn from the evidence (20%)
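The proposed weights can be applied as a simple weighted sum. The sketch below assumes each criterion is first rated on a 0 to 1 scale and the result is scaled to the points available for the question; the rating scale, the key names, and the sample ratings are hypothetical.

```python
# Weights taken from the proposed grading criteria on this slide.
WEIGHTS = {
    "ideas": 0.20,
    "evidence": 0.40,
    "correct_usage": 0.20,
    "logical_conclusions": 0.20,
}


def essay_score(ratings: dict, max_points: float = 10) -> float:
    """Combine per-criterion ratings (each 0 to 1) into a weighted score."""
    if set(ratings) != set(WEIGHTS):
        raise ValueError("ratings must cover exactly the four criteria")
    if any(not 0 <= r <= 1 for r in ratings.values()):
        raise ValueError("each rating must be between 0 and 1")
    return max_points * sum(WEIGHTS[c] * ratings[c] for c in WEIGHTS)


if __name__ == "__main__":
    # Hypothetical essay: strong ideas and usage, partial evidence and conclusions.
    ratings = {
        "ideas": 1.0,
        "evidence": 0.5,
        "correct_usage": 1.0,
        "logical_conclusions": 0.5,
    }
    print(round(essay_score(ratings), 2))
```

Because evidence carries twice the weight of any other criterion, a weak evidence rating lowers the total more than an equally weak rating elsewhere.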

60 Example:
What is wrong with this question? Describe asthma?
Better: (Clearly explain what is expected of the student.) Describe asthma. Include in your answer: the pathophysiologic features of asthma; the clinical manifestations associated with an asthma episode; the management of an asthma episode. (10 points)

61 Example: What is wrong with this question? Who is better, Rizal or Bonifacio? Better: (The students are given an idea of the scope and direction you intended for the answer to take.) Compare and contrast the methods used by Rizal and Bonifacio in promoting nationalism. (5 points)

62 6. Other types of Test Questions

63 Restricted Response Test (RRT)
Test takers are not given choices as possible answers. Items ask for a specific answer to each question. Examples: Who discovered the American continent? Enumerate the four elements of the state.

64 Principles in constructing RRT
Do not ask for trivial facts or details. Doing so is not only useless but also frustrates the students. Poor example: How many balls are used in a 9-ball match?
Questions should elicit facts, not opinions. Poor example: What do you think President SBY should do for the country to recover from its economic deficit?
Minimize questions that call for sheer memory work unless the answer has important analytical significance. Poor example: When will the next president be sworn into office?

65 Chronological Sequencing Test (CST)
Test takers are asked to arrange items in a systematic or logical order. Example: Arrange the presidents according to their term of office.
_____ Soeharto
_____ Megawati Soekarno Putri
_____ Ir. Soekarno
_____ B.J. Habibie
_____ Abdurrahman Wahid
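One possible way to score a sequencing item is to award a point for every item placed in its exact position. This scoring rule is an assumption for illustration, not something the slides prescribe; the answer key follows the historical order of the five presidents listed, and the function name is hypothetical.

```python
# Historical order of the presidents listed in the example item.
ANSWER_KEY = [
    "Ir. Soekarno",
    "Soeharto",
    "B.J. Habibie",
    "Abdurrahman Wahid",
    "Megawati Soekarno Putri",
]


def score_sequence(response, key=ANSWER_KEY):
    """Award one point per item placed in exactly the right position."""
    if sorted(response) != sorted(key):
        raise ValueError("response must contain each item exactly once")
    return sum(1 for given, correct in zip(response, key) if given == correct)


if __name__ == "__main__":
    # Hypothetical student answer with two neighbouring presidents swapped.
    answer = [
        "Ir. Soekarno",
        "Soeharto",
        "Abdurrahman Wahid",
        "B.J. Habibie",
        "Megawati Soekarno Putri",
    ]
    print(score_sequence(answer), "out of", len(ANSWER_KEY))
```

Exact-position scoring is strict: a single misplaced item early in the list can shift every item after it, so some teachers may prefer to give credit for correct relative order instead.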

66 Principles in constructing CST
Items should be homogeneous and related to each other. There should not be more than 5 items in each set. Do not number the items. This confuses the students. All items to be arranged should be on the same page. Directions should be clearly stated, and each set should be labeled to indicate what it is about.

67 What is wrong in this test question?
Arrange the following events in their chronological order. The first thing I saw was the Oceanorium. It’s the Australia’s largest marine park. Where you can watch all sorts of sea fish and animals underwater. Last Sunday, I visited marine park called Sea World at Surfer’s Paradise near Brisbane. After the show, I had my lunch at a shape-like- ship restaurant. The show was in a big outdoor of swimming pool Then I watched the performance of sea animals.

68 Better: War in the Pacific
Arrange the following events in chronological order. Write the numbers 1-5 on the blanks provided.
___ USAFFE forces in Bataan surrender to the Japanese.
___ Japanese forces attack the US fleet in Pearl Harbor, Hawaii.
___ Japan breaks diplomatic ties with the US.
___ The US declares war on Japan.
___ Gen. MacArthur escapes to Australia from Corregidor.

69 Proposed Arrangement of Test Items
True or False, Multiple Choice, Matching Type, Sentence Completion, Others (RRT/Analogy/CST), Essay

70 Things to Remember:
Making a good test takes time. Teachers have the obligation to provide their students with the best evaluation. Tests play an essential role in the life of the students, parents, teachers, and other educators.

71 POINTS TO PONDER…
A good lesson makes a good question. A good question makes a good content. A good content makes a good test. A good test makes a good grade. A good grade makes a good student. A good student makes a good COMMUNITY.

72 Thank you …

