
Issues of Reliability, Validity and Item Analysis in Classroom Assessment by Professor Stafford A. Griffith Jamaica Teachers Association Education Conference.


1 Issues of Reliability, Validity and Item Analysis in Classroom Assessment, by Professor Stafford A. Griffith. Jamaica Teachers Association Education Conference: Assessment in Education. Ritz Carlton Resort & Spa, Montego Bay, April 2-4, 2013

2 Concept of a Test
Some of the earliest forms of assessment or testing may be noted in biblical references. Adam and Eve, for example, were subjected to a simple test in the Garden of Eden, based on a test item presented in a negative form. Another account is taken from Judges 12: an oral examination (shibboleth) devised by the Gileadite army to identify members of the defeated Ephraimite army who were attempting to escape under cover of a false identity. Professor Stafford A. Griffith, Director of the School of Education, UWI, Mona

3 Outside of the biblical accounts, historians generally agree that the Chinese were the first to use large-scale testing. These tests were introduced as early as 2000 B.C. to measure the proficiency of candidates for public office and to reduce patronage. Today, we think of a test as an item/question, problem or task, or a mix of these, administered under prescribed conditions.

4 It is designed to elicit responses that provide information to make judgements about a candidate.
It is a systematic procedure for measuring a sample of a candidate’s behaviour that can give an accurate and truthful account of a candidate’s skills, knowledge or ability, or other characteristics, at the time the test was administered.

5 Reliability of Test Scores
Two essential requirements for a technically sound test are reliability and validity. Reliability is the extent to which test scores are consistent or dependable. Only to the extent that scores are reliable can they be useful in conveying information about a student’s performance.

6 From a more technical standpoint, reliability is the extent to which scores are free from errors of measurement. Classical Test Theory (CTT) defines reliability as a property that is based on three considerations: observed scores, true scores and measurement errors.

7 In Classical Test Theory, a person’s observed score is a function of that person’s true score plus error. This may be represented simply as $X_o = X_t + X_e$, where $X_o$ represents the observed score, $X_t$ the true score and $X_e$ the error.

8 The level of confidence we can have in test scores hinges on how much error there is in students’ observed scores. Reliability, the level of confidence we can have in test scores, is expressed as an index ranging from 0 to 1. It may, for example, be as high as .99 or as low as .10.
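The observed = true + error model can be made concrete with a small simulation. This is a sketch under stated assumptions, not the presenter's method: the normally distributed true scores and errors, and their means and standard deviations, are invented for illustration. It uses the standard CTT result that reliability is the ratio of true-score variance to observed-score variance, which is why the index falls between 0 and 1.

```python
import random
import statistics

def simulate_ctt(n_students=10_000, true_mean=50.0, true_sd=10.0, error_sd=5.0, seed=1):
    """Simulate X_o = X_t + X_e for a group of students (all parameters hypothetical)."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(true_mean, true_sd) for _ in range(n_students)]
    errors = [rng.gauss(0.0, error_sd) for _ in range(n_students)]
    observed = [t + e for t, e in zip(true_scores, errors)]
    return true_scores, errors, observed

def ctt_reliability(true_scores, observed):
    """Reliability as the share of observed-score variance that is true-score variance."""
    return statistics.pvariance(true_scores) / statistics.pvariance(observed)

true_scores, errors, observed = simulate_ctt()
# Expected to land close to 10**2 / (10**2 + 5**2) = 0.8 for these parameters.
print(f"reliability = {ctt_reliability(true_scores, observed):.2f}")
```

Setting `error_sd=0.0` makes every observed score equal its true score, and the ratio becomes exactly 1; larger error standard deviations push it towards 0.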

9 The reliability coefficients commonly used to determine and report the consistency with which a test measures are derived from various approaches: test-retest, alternate-form, internal consistency, split-half and inter-rater (a special form of reliability).

10 Validity of Test Scores
Validity is the extent to which a test does the job for which it is intended. Essentially, validity is about what inferences can be made from the scores obtained on an instrument.

11 The most widely encountered discussions refer to three lines of validity evidence:
content validity (representativeness of the domain); criterion-related validity (correlation with, or prediction of, scores on another instrument); construct validity (association with some theoretical construct).

12 Validity is the most important technical quality of a test.
An important way of assuring or assessing validity, particularly content validity, is to use a subject-matter-by-behaviour grid called a specifications table or a table of specifications. It helps to define the weighting to be given to the various subject-matter areas and behaviours (or objectives or skills), and it helps to avoid the testing of extraneous material.

13 Example of a Table of Specifications
Content areas (rows): Classification of animals; Plants of the earth; Population and evolution; Variation and selection; Origin of the solar system; Changes in land features; Total
Objectives (columns): Knowledge (Kn); Comprehension (Co); Application (Ap); Analysis (An); Total (Tot)
Grand total: 60
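The weighting in a table of specifications can be turned into item counts mechanically. The sketch below is a hypothetical illustration, not the presenter's procedure: the percentage weights are invented, only the content-area and objective names come from the example table, and the total of 60 simply matches that table's grand total. Largest-remainder rounding is an assumed choice that keeps the cell counts adding up to the planned total.

```python
from collections import Counter
from math import floor

def allocate_items(weights, total_items):
    """Convert percentage weights in a content-by-objective grid into item counts.

    weights maps (content_area, objective) -> percent; the percents must sum to 100.
    Largest-remainder rounding keeps the counts adding up to total_items exactly.
    """
    if abs(sum(weights.values()) - 100) > 1e-9:
        raise ValueError("weights must sum to 100")
    exact = {cell: total_items * w / 100 for cell, w in weights.items()}
    counts = {cell: floor(x) for cell, x in exact.items()}
    shortfall = total_items - sum(counts.values())
    # Hand any leftover items to the cells with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda c: exact[c] - counts[c], reverse=True)
    for cell in by_remainder[:shortfall]:
        counts[cell] += 1
    return counts

def totals_by_objective(counts):
    """Column totals of the grid: number of items per objective."""
    totals = Counter()
    for (_, objective), n in counts.items():
        totals[objective] += n
    return dict(totals)

# Hypothetical weights (percent of the test) for a few cells of the grid.
blueprint = {
    ("Classification of animals", "Knowledge"): 5,
    ("Classification of animals", "Application"): 12.5,
    ("Variation and selection", "Comprehension"): 20,
    ("Variation and selection", "Analysis"): 12.5,
    ("Origin of the solar system", "Knowledge"): 25,
    ("Changes in land features", "Application"): 25,
}
counts = allocate_items(blueprint, total_items=60)
for (area, objective), n in counts.items():
    print(f"{area:28} {objective:14} {n:3d}")
print(totals_by_objective(counts))
```

Printing the row and column totals this way makes it easy to check that the finished test matches the intended emphasis before any items are written.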

14 It is important to work out the types of items/questions, their psychometric characteristics, the number of items and questions and how these will be scored. The specifications for test construction should be so clear that two test constructors would produce tests that are comparable and interchangeable.

15 Item Analysis
In writing and analysing test tasks, two critical indicators of the quality of the tasks should be considered: the facility (or difficulty) and the discrimination. The facility level for a task is the proportion (often reported as a percentage) of candidates responding correctly or satisfactorily to it.

16 It is expressed as an index:
an f-value or a p-value (which is really the probability of a person in a particular group responding correctly or satisfactorily). The formula for calculating p is very simple: $p = R/T$, where $R$ is the number of students responding correctly to the item and $T$ is the number of students responding to the item. Its value ranges from 0 to 1.00.
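The facility index is a one-line calculation per item. In this sketch the coding of responses (1 = correct, 0 = incorrect, `None` = no response) and the item data are hypothetical assumptions; students who did not respond are left out of T, following the definition of T as the number of students responding to the item.

```python
def facility_index(responses):
    """p = R / T for one item.

    responses: 1 = correct/satisfactory, 0 = incorrect, None = no response.
    Non-responses are excluded from T.
    """
    attempted = [r for r in responses if r is not None]
    if not attempted:
        raise ValueError("no student responded to this item")
    return sum(attempted) / len(attempted)

# Hypothetical responses from six students to three items.
item_scores = {
    "Q1": [1, 1, 0, 1, None, 1],
    "Q2": [0, 0, 1, 0, 0, 0],
    "Q3": [1, 1, 1, 1, 1, 1],
}
for item, responses in item_scores.items():
    print(f"{item}: p = {facility_index(responses):.2f}")
```

Whether omitted responses should instead count as incorrect is a policy decision; changing the filter line is enough to switch conventions.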

17 The discrimination level for a task is the extent to which performance on the task separates the better candidates from the poorer ones. The calculation of this d-index is generally more complex than the calculation of the facility index, and it is often represented by a biserial or a point-biserial correlation index (r). It ranges from -1.00 to +1.00.

18 Simpler, yet reasonably accurate, estimates of the discrimination of a dichotomously scored task can, however, be obtained by comparing how the top-performing students perform on the task with how the bottom-performing students perform on it.

19 The discrimination index for an item is calculated by:
ranking students according to their performance on the test; separating out the top-performing and the bottom-performing students; finding the p value of the item for the top-performing students and the p value for the bottom-performing students; and subtracting the p value for the bottom-performing students from the p value for the top-performing students.
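The four steps above can be sketched as a short function. The presentation does not say how large the top and bottom groups should be; this sketch assumes the common 27 per cent convention as an adjustable default, breaks ties at the group boundary by the order students appear in the list, and uses hypothetical data.

```python
def discrimination_index(item, totals, fraction=0.27):
    """Upper-lower discrimination index D = p_upper - p_lower for one 0/1 item.

    fraction: share of students placed in each of the top and bottom groups
    (0.27 is an assumed convention, not fixed by the presentation).
    """
    n = len(totals)
    size = max(1, round(n * fraction))
    if 2 * size > n:
        raise ValueError("upper and lower groups would overlap")
    # Step 1: rank students from highest to lowest total score.
    order = sorted(range(n), key=lambda s: totals[s], reverse=True)
    # Step 2: separate the top-performing and bottom-performing groups.
    upper, lower = order[:size], order[-size:]
    # Step 3: find the item's p value in each group.
    p_upper = sum(item[s] for s in upper) / size
    p_lower = sum(item[s] for s in lower) / size
    # Step 4: subtract the lower group's p value from the upper group's.
    return p_upper - p_lower

# Hypothetical results for ten students.
totals = [20, 19, 18, 15, 14, 13, 12, 9, 8, 7]
item = [1, 1, 1, 1, 0, 1, 0, 0, 0, 1]
print(f"D = {discrimination_index(item, totals):.2f}")  # D = 1.00 - 0.33 = 0.67
```

With ten students the 27 per cent rule gives groups of three; for very small classes, splitting the class in half (`fraction=0.5`) keeps more students in each group.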

20 The table indicates how students performed on an item with four possible responses (A, B, C and D). The correct response is C.
Table columns: Response A, B, C, D; rows: Upper Group, Lower Group.
The facility index of the item is (a) 1.00 (b) .10 (c) .05 (d) .50
The discrimination index of the item is (a) 6 (b) .60 (c) .06 (d) .66

21 Summary
Based on our discussions, I trust that in developing and using tests for assessment in the classroom, you will consider the need to: provide scores that are reliable; provide scores that are valid; develop and use items/tasks that are at the right difficulty level; and develop and use items/tasks that can discriminate between those who have the desired competences and those who do not.

22 Thank you.

