Assessment: Reliability, Validity, and Absence of Bias

1 Lecture 2

2 Essential Terminology for Evaluating Assessment Instruments
Reliability: The consistency of the results (measurements) obtained from an assessment, which depends on controlling, reducing, and/or eliminating measurement error.
Validity: The accuracy and appropriateness of the interpretations and inferences (evaluations) drawn from the results of a test (measurement).
Absence of Bias: The absence of any characteristic of an assessment that might offend or unfairly penalize those being assessed and thereby distort a student's score.

3 Validity
Characteristics of Validity (Gronlund, 1998):
Validity refers to the inferences drawn, not the instrument.
Validity is specific to a particular use.
Validity is concerned with the consequences of using the assessment.

4 Validity (continued)
Validity is expressed by degree (high, moderate, low).
Validity is inferred from available evidence (not measured).
Validity depends on many different types of evidence.

5 Content Validity: The degree to which the content of an assessment's items adequately and accurately represents the content of the assessment domain. Does the assessment match the objectives in both content and cognitive processes?

6 Concurrent Validity: The extent to which a student's current performance on an assessment estimates that student's current performance on another assessment or task (the criterion measure).

7 Predictive Validity: The extent to which a student's current performance on an assessment estimates that student's later performance on another assessment or task (the criterion measure).
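Concurrent and predictive validity are commonly summarized as a validity coefficient: the correlation between scores on the assessment and scores on the criterion measure. The Python sketch below computes a Pearson correlation for that purpose. The score lists are hypothetical, and coding letter grades as points (A=4 through D=1) is an assumption made only for illustration.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between paired predictor scores x and criterion scores y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length score lists with at least 2 pairs")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined when one set of scores has no spread")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: pretest scores now, and the same students' end-of-course
# grades later, coded as points (assumption: A=4, B=3, C=2, D=1).
pretest = [5, 5, 4, 4, 3, 3, 2, 1]
final_grade_points = [4, 4, 3, 4, 3, 2, 2, 1]

r = pearson_r(pretest, final_grade_points)
print(f"validity coefficient r = {r:.2f}")
```

The same function applies to concurrent validity. Only the timing of the criterion measure changes. A coefficient near 1 indicates a strong linear relationship between the assessment and the criterion.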

8 Constructing an Expectancy Table
Pretest scores: 1 student at 1, 2 at 2, 8 at 3, 6 at 4, 5 at 5 (22 students)
End-of-course grades: 5 As, 9 Bs, 6 Cs, 2 Ds (22 students)
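An expectancy table cross-tabulates each student's pretest score against the grade that student later earned. It is usually read row by row as percentages: "of students who scored 5, what share earned an A?" The slide gives only the totals, not which student earned which grade. The Python sketch below therefore assumes one hypothetical pairing that matches those totals. The function name `expectancy_table` is illustrative.

```python
from collections import Counter

GRADES = ["A", "B", "C", "D"]


def expectancy_table(pairs):
    """Map each pretest score to the percentage of students at that score
    who earned each grade (each row sums to 100)."""
    counts = Counter(pairs)  # (pretest_score, grade) -> number of students
    table = {}
    for score in sorted({s for s, _ in pairs}, reverse=True):
        row_total = sum(counts[(score, g)] for g in GRADES)
        table[score] = {g: 100 * counts[(score, g)] / row_total for g in GRADES}
    return table


# Hypothetical pairing consistent with the slide's totals:
# pretest 1 at 1, 2 at 2, 8 at 3, 6 at 4, 5 at 5; grades 5 A, 9 B, 6 C, 2 D.
pairs = (
    [(5, "A")] * 3 + [(5, "B")] * 2
    + [(4, "A")] * 2 + [(4, "B")] * 3 + [(4, "C")] * 1
    + [(3, "B")] * 4 + [(3, "C")] * 4
    + [(2, "C")] * 1 + [(2, "D")] * 1
    + [(1, "D")] * 1
)

table = expectancy_table(pairs)
for score, row in table.items():
    cells = "  ".join(f"{g}: {row[g]:5.1f}%" for g in GRADES)
    print(f"pretest {score}: {cells}")
```

Suppose high pretest scores cluster in the A and B columns and low scores cluster in the C and D columns, as in this pairing. That pattern suggests the pretest has useful predictive validity.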

9 Expectancy Table (one possible case)
How strong is the predictive validity?
[Tally chart: pretest scores by end-of-course grades]

10 Face Validity: The degree to which performance on an assessment appears to be valid in relation to the score's use and interpretation. Face validity is really not a measure of validity, but merely the appearance of validity. Face validity can often be very misleading.

11 Construct Validity: The degree to which performance on an assessment may be explained by the presence or absence of some psychological state or trait (construct). A construct is a hypothetical psychological characteristic, presumed to exist, that explains patterns of behavior and thought.

12 Factors that Affect the Validity of Classroom Assessments (Nitko, 1996):
Content Representativeness and Relevance
Does my assessment procedure emphasize what I have taught?
Do my assessment tasks accurately represent the outcomes specified in my school's or state's curriculum framework?
Are my assessment tasks in line with current thinking about what should be taught and how it should be assessed?
Is the content in my assessment procedure important and worth learning?

13 Thinking Processes and Skills Represented
Do the tasks on my assessment instrument require students to use important thinking skills and processes?
Does my assessment instrument represent the kinds of thinking skills that my school's or state's curriculum framework identifies as important?
Do students actually use the types of thinking I expect them to use to complete the assessment?
Did I allow enough time for students to demonstrate the type of thinking I was trying to assess?

14 Consistency with Other Classroom Assessments
Is the pattern of results in the class consistent with what I expected based on my other assessments of these students?
Did I make the assessment tasks too difficult or too easy for my students?

15 Reliability and Objectivity
Do I use a systematic procedure for obtaining quality ratings or scores from students' performance on the assessment?
Does my assessment instrument contain enough tasks relative to the types of learning outcomes I am assessing?

16 Fairness to Different Types of Students
Do I word the problems or tasks on my assessment so that students with different ethnic and socioeconomic backgrounds will interpret them in appropriate ways?
Did I modify the wording or the administrative conditions of the assessment tasks to accommodate students with disabilities or special learning problems?
Do the pictures, stories, verbal statements, or other aspects of my assessment procedure perpetuate racial, ethnic, or gender stereotypes?

17 Economy, Efficiency, Practicality, Instructional Features
Is the assessment relatively easy for me to construct and not too cumbersome to use in evaluating students?
Would the time needed to use this assessment procedure be better spent teaching students instead?
Does my assessment procedure represent the best use of my time?

18 Multiple Assessment Usage
Are the assessment results used in conjunction with other assessment results?

19 Features and Procedures in Establishing Reliability and Validity (Gronlund, 1998):
1. Desired feature: Clearly specified set of learning outcomes.
   Procedure to follow: State intended learning outcomes in performance terms.
2. Desired feature: Representative sample of a clearly defined domain of learning tasks (the assessment or achievement domain).
   Procedure to follow: Prepare a description of the achievement domain to be assessed and the sample of tasks to be used.

20 Features and Procedures in Establishing Reliability and Validity
3. Desired feature: Tasks that are relevant to the learning outcomes to be measured.
   Procedure to follow: Match assessment tasks to the specified performance stated in the learning outcomes.
4. Desired feature: Tasks that are at the proper level of difficulty.
   Procedure to follow: Match assessment task difficulty to the learning task, the students' abilities, and the use to be made of the results.

21 Features and Procedures in Establishing Reliability and Validity
5. Desired feature: Tasks that function effectively in distinguishing between achievers and non-achievers.
   Procedure to follow: Follow general guidelines and specific rules for preparing assessment procedures, and be alert for factors that distort the results.
6. Desired feature: Procedures that contribute to efficient preparation and use.
   Procedure to follow: Write clear directions and arrange procedures for ease of administration, scoring or judging, and interpretation.

22 Features and Procedures in Establishing Reliability and Validity
7. Desired feature: Sufficient number of tasks to measure an adequate sample of achievement, provide dependable results, and allow for a meaningful interpretation of the results.
   Procedure to follow: Where the students' age or the available assessment time limits the number of tasks, make tentative interpretations, assess more frequently, and verify the results with other evidence.

23 Types of Bias (General) (Popham, 1999):
Offensiveness: Any component of an assessment that may cause undue resentment, pain, discomfort, or embarrassment (e.g., stereotyping, word choice).
Unfair Penalization: Any assessment practice that may disadvantage a student and distort their test score because of group membership (e.g., socioeconomic class, race, gender). Score differences that reflect genuine differences in ability are not unfair penalization.

24 Absence of Bias
Disparate Impact: An assessment that differentiates according to group membership is not necessarily biased. The question is whether that differentiation occurs because of unfair circumstances. If an assessment is not offensive and does not unfairly penalize, yet groups still differ, the likely cause is inadequate prior instructional experiences.

25 Types of Bias (Specific) (Nitko, 1996):
Assessment Bias as Mean Differences: Bias may be indicated if the mean test score of one group differs substantially from that of another group. However, if the test is free from offensiveness and unfair penalization, the difference may reflect real differences between the groups relative to the domain tested. Mean differences are generally not a good indicator of bias.

26 Assessment Bias as Differential Item Functioning:
Bias may be indicated if the mean score for a particular item differs substantially from one group to another. The key to differential item functioning is to examine persons of equal ability, from different groups, to see if there is a difference relative to the item of concern. If there is, bias may be present, although differential item functioning does not prove bias.
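Differential item functioning is usually checked statistically. One widely used procedure, not named on the slide, is the Mantel-Haenszel method. It works in three steps:

* Students are sorted into strata of equal ability. Here the assumption is equal total test score.
* Within each stratum, a 2x2 table of group (the conventional labels "reference" and "focal") by right/wrong on the item is built.
* The tables are pooled into a common odds ratio.

The Python sketch below uses hypothetical responses and illustrative names (`mantel_haenszel_odds_ratio`, `block`). A ratio near 1.0 means equally able members of the two groups have similar odds of answering the item correctly.

```python
from collections import defaultdict

# Which cell of the 2x2 table a (group, item_correct) pair falls into.
CELL = {("ref", 1): "A", ("ref", 0): "B", ("focal", 1): "C", ("focal", 0): "D"}


def mantel_haenszel_odds_ratio(records):
    """records: (group, total_score, item_correct) tuples, where group is
    "ref" or "focal", total_score is the ability-matching variable, and
    item_correct is 1 or 0. Returns the MH common odds ratio across strata."""
    strata = defaultdict(lambda: {"A": 0, "B": 0, "C": 0, "D": 0})
    for group, total, correct in records:
        strata[total][CELL[(group, correct)]] += 1
    num = den = 0.0
    for t in strata.values():
        n = t["A"] + t["B"] + t["C"] + t["D"]
        num += t["A"] * t["D"] / n  # ref correct x focal incorrect
        den += t["B"] * t["C"] / n  # ref incorrect x focal correct
    if den == 0:
        raise ValueError("odds ratio undefined for these data")
    return num / den


def block(group, total, n_right, n_wrong):
    """Hypothetical helper: n_right correct and n_wrong incorrect responses."""
    return [(group, total, 1)] * n_right + [(group, total, 0)] * n_wrong


# Hypothetical responses to one item, grouped by total test score.
records = (
    block("ref", 1, 2, 2) + block("focal", 1, 1, 3)
    + block("ref", 2, 3, 1) + block("focal", 2, 2, 2)
)
print(mantel_haenszel_odds_ratio(records))  # 3.0
```

In this made-up example, reference-group students have three times the odds of answering correctly at the same total score, so the item would be flagged for review. As the slide notes, this alone does not prove bias.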

27 Assessment Bias as Misinterpretation of Scores:
Bias may be indicated if the results of an assessment are interpreted beyond their valid usage. Scores are valid for a particular use, relative to a particular group. Inferences beyond these specifics are invalid and may be biased.

28 Assessment Bias as Sexist or Racist Content:
An assessment would be biased if it perpetuates stereotypes or portrays groups in an offensive manner.

29 Assessment Bias as Differential Validity:
Bias may be indicated if an assessment predicts performance on a second assessment or task (predictive validity) differently for different groups. This source of bias is generally not a problem in educational assessment.
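Checking for differential validity means estimating the validity coefficient separately within each group and comparing the results. The Python sketch below does this with a Pearson correlation on hypothetical scores for two unnamed groups. It is an illustration only: it does not test whether the two coefficients differ significantly.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between predictor scores x and criterion scores y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical (assessment, criterion) scores for two groups.
groups = {
    "group_1": ([1, 2, 3, 4], [1, 2, 3, 4]),
    "group_2": ([1, 2, 3, 4], [2, 1, 4, 3]),
}

# Validity coefficient computed separately within each group.
coefficients = {name: pearson_r(x, y) for name, (x, y) in groups.items()}
for name, r in coefficients.items():
    print(f"{name}: r = {r:.2f}")
```

A markedly lower coefficient in one group would mean the assessment predicts the criterion less well for that group, which is the pattern this slide describes.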

30 Assessment Bias as Content and Experience Differential: An assessment is biased if its content differs significantly from a group's life experiences and the evaluation of the results does not take this difference into account.

31 Assessment Bias in Selection Decisions: In cases where several people are vying for a few openings (e.g., jobs, programs), assessments are often used as part of the selection process. The selection process may be biased if it uses an assessment that differentially measures groups unfairly or if the relationship between the differential assessment and the attributes necessary for success is not clearly understood.

32 Assessment Bias related to Assessment Atmosphere and Conditions:
Bias may be indicated if the testing situation differentially affects different groups. Feelings of being unwelcome, anxiety, or being tested by a member of an antagonistic group may lead to this type of bias.

