
1 Classroom Assessment Validity And Bias in Assessment

2 Classroom Assessment: Validity
A characteristic of the inferences made from test scores, not a characteristic of the test itself. Validity requires a well-defined assessment domain (or construct) and, typically, a sampling strategy for selecting elements from that domain. The key notion here is representative sampling: the items must adequately represent the content domain and its learning targets.

3 Validity: A matter of degree.
Validity often depends on the situation. For example, a history test might have high validity for inferring students’ knowledge of events leading up to the American Revolution, but less validity for inferring whether students can reason and apply their knowledge to current or future events.

4 Validity is Related to the Learning Targets
A good question to ask yourself is: do scores on the assessment allow me to make inferences that are directly related to performance on the learning targets I am trying to assess? Do the learning targets address only facts and verbal knowledge, or do they address higher-order outcomes such as reasoning and application?

5 Validity Requires Evidence
Validity is concerned with the collection of evidence to support an argument that a particular score-based inference is the correct inference. While validity is best viewed as a unitary concept, there are several ways of collecting evidence to support validity arguments: content evidence, criterion evidence, and construct evidence. Some writers have argued that validity is really all about building a strong argument to support test-based inferences.

6 Content-related Evidence of Validity
Important question: are the items or tasks sampled truly representative of the assessment domain? Is there sufficient sampling of the content? Is there sufficient sampling of the type of learning desired? Content validity is often determined by having experts (who may be teachers) evaluate the linkages between items and the content domain; one simple way to summarize such expert ratings is sketched below.
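One common summary of expert judgments is a content validity index (CVI). The sketch below is illustrative only: the 4-point relevance scale and all ratings are hypothetical, not taken from the slides.

```python
# Minimal sketch of a content validity index (CVI), assuming each
# expert rates each item 1-4, where 3 or 4 means "relevant to the
# assessment domain". All data here are hypothetical.

ratings = {
    "item_1": [4, 3, 4],   # ratings from three experts
    "item_2": [2, 3, 2],
    "item_3": [4, 4, 3],
}

def item_cvi(item_ratings):
    """Proportion of experts who rated the item as relevant (>= 3)."""
    return sum(1 for r in item_ratings if r >= 3) / len(item_ratings)

for item, rs in ratings.items():
    print(f"{item}: CVI = {item_cvi(rs):.2f}")

# Scale-level CVI: the average of the item-level indices.
scale_cvi = sum(item_cvi(rs) for rs in ratings.values()) / len(ratings)
print(f"Scale CVI = {scale_cvi:.2f}")
```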

7 More on Content-related Evidence of Validity
An important source of evidence for classroom tests. Generally enhanced by the use of learning targets or assessment domain specifications. Declarative knowledge specifications. Procedural knowledge specifications. Again, representative sampling is key. An example specification is given in the next slide.

8 A Table of Specifications Can Help Establish Content Validity

                          Type of Learning Target
Content area     Knowledge      Reasoning      Application
Content 1        No. items/%
Content 2
Content 3
Content 4
Content 5

(Each cell records the number of items, or percentage of the test, devoted to that content area at that level of learning target.)
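To make the blueprint operational, one can tag each item with its content area and learning-target type and count items per cell. This is a minimal sketch with hypothetical tags; empty cells flag under-sampled parts of the domain.

```python
# Minimal sketch: filling a table of specifications from item tags.
# The (content area, target type) tags below are hypothetical.

from collections import Counter

items = [
    ("Content 1", "Knowledge"), ("Content 1", "Reasoning"),
    ("Content 2", "Knowledge"), ("Content 2", "Application"),
    ("Content 3", "Reasoning"), ("Content 3", "Reasoning"),
]

cells = Counter(items)
total = len(items)
for (content, target), n in sorted(cells.items()):
    print(f"{content} / {target}: {n} items ({100 * n / total:.0f}%)")

# Cells with no items (e.g., Content 1 / Application here) signal
# under-representation of that part of the domain.
```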

9 Level of Learning Target
For high content validity, the items need to be representative of the content domain.
[Figure: a box with elements of the curriculum (1 … N) on the vertical axis and level of learning target (low to high) on the horizontal axis; dots representing items fill the whole box.]
The dots inside the box represent items. The items are distributed not only over the curriculum content, but also over the levels of cognitive outcomes desired.

10 Here is an example of an underrepresented set of items
[Figure: the same axes as the previous slide, but only a few dots appear, clustered at the low end of the level-of-learning-target axis.]
A test that might be represented like this is one that assesses only low-level cognitive outcomes.

11 Level of Learning Target
Here is a representation of a test that not only underrepresents the content, but assesses irrelevant content.
[Figure: the same axes; a sparse set of dots covers only part of the box, and two dots fall outside the box, representing irrelevant content.]

12 Criterion-related Evidence of Validity: Two types
Concurrent validity: correlate scores on the new test with scores on an established test, or correlate scores on a classroom test with what the teacher already knows about the students.
Predictive validity: correlate current test scores with scores collected at a later date (example: the SAT).
Concurrent validity is important when we want to substitute one test (perhaps a shorter one) for another (perhaps a longer one). For a classroom test, we might show that scores on the test correlate highly with other assessments (some of them informal) we have collected, or that our test scores (or the grades we assign) correlate highly with scores on tests at the next level of education. A minimal correlation sketch follows.
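As a concrete (hypothetical) illustration, criterion-related evidence usually comes down to a Pearson correlation between the test and a criterion measure; the scores below are made up.

```python
# Minimal sketch of criterion-related evidence: correlate scores on a
# new classroom test with scores on an established test (concurrent
# validity). For predictive validity, the criterion would instead be
# scores collected at a later date. All scores are hypothetical.

import numpy as np

new_test         = np.array([78, 85, 62, 90, 71, 88, 66, 80])
established_test = np.array([74, 88, 60, 93, 70, 85, 63, 78])

r = np.corrcoef(new_test, established_test)[0, 1]
print(f"Validity coefficient (Pearson r) = {r:.2f}")
```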

13 Construct-related Evidence of Validity
The gathering of evidence to support an assertion that inferences based on a test are valid. This entails testing hypotheses that relate the construct being assessed (e.g., learning, anxiety, attitude) to performance on the assessment. In construct validity, the inferences are with respect to some hypothetical construct. Constructs are unobserved; they are invented to explain behavior. Examples: motivation, learning, anger, achievement, attitude, intelligence.

14 More Construct-related Evidence of Validity
Three approaches:
Intervention studies. Hypothesis: students will perform better on the test following an instructional intervention. In constructing an instrument to measure test anxiety, for instance, test the hypothesis that individuals who take a test under high anxiety-inducing conditions perform more poorly than individuals taking it under low anxiety-inducing conditions. In the classroom, test (informally) the hypothesis that students perform better on an assessment after instruction than before. As another example, test the hypothesis that students who complete their homework regularly perform better on the assessment than students who complete it infrequently; this requires that the homework be relevant to the content included in the assessment. (A sketch of this idea appears after this list.)
Differential-population studies. Hypothesis: different populations will perform differently on the assessment. For instance, on a test designed to assess competence in world literature, test the hypothesis that students who have completed a unit or course on world literature perform better than students who have not.
Related-measures studies. Hypothesis: students will perform similarly on similar tasks; that is, scores should correlate with other measures of the same construct.
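The intervention-study hypothesis can be checked with a paired comparison of pre- and post-instruction scores. This is a hedged sketch with invented data; scipy's paired t-test is one reasonable choice, not a method prescribed by the slides.

```python
# Minimal sketch of an intervention study: if the test measures what
# was taught, the same students should score higher after instruction.
# All scores are hypothetical.

from scipy import stats

pre  = [55, 60, 48, 70, 62, 58, 66, 51]   # before instruction
post = [68, 71, 60, 82, 70, 64, 79, 59]   # same students, after

t, p = stats.ttest_rel(post, pre)
print(f"paired t = {t:.2f}, p = {p:.4f}")

# For a differential-population study, compare two independent groups
# with stats.ttest_ind instead.
```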

15 When Collecting Evidence of Validity, Note the Following
Structured (formal) assessments are likely to yield more valid inferences than unstructured (informal) assessments. The major focus is the accuracy of assessment-based inferences; the major sources of invalidity are construct underrepresentation and construct irrelevance.
Informal observations tend to be less valid than formal assessments: we can only measure what we see, and what we see is usually just the tip of the iceberg. For example, we call on a student who raises his hand (and probably knows the answer) and infer that the student is mastering the material.
Construct underrepresentation: the construct involves higher-level learning targets (reasoning, application, evaluation, etc.), but the test assesses only verbal information.
Construct irrelevance: a math problem-solving exam with too high a readability level. Most statewide, high-stakes tests are written at a lower readability level (e.g., 8th grade for high school tests).
Poor test performance can result from at least three sources: a poor test (lacking reliability and/or validity), poor teaching, or poor achievement.

16 Two Important Relationships Between Reliability and Validity
Reliability is a necessary but not sufficient condition for validity. Validity necessarily implies reliability, Pam Moss notwithstanding (her 1994 article asks, “Can there be validity without reliability?”).
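One standard way to make “necessary but not sufficient” precise is the attenuation bound from classical test theory: the validity coefficient for test X against criterion Y cannot exceed the square root of the product of the two measures' reliabilities. (This formula is a standard result, not stated on the slides.)

```latex
% Classical test theory attenuation bound:
r_{XY} \le \sqrt{r_{XX'} \, r_{YY'}}
% Example: with test reliability r_{XX'} = 0.64 and a perfectly
% reliable criterion, r_{XY} <= sqrt(0.64) = 0.80.
```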

17 A final word
Validity, or our notions of validity, apply to all manner of assessments. This includes scores (or grades) placed on report cards. The question to ask is, “Are the inferences to be drawn from a report card grade valid?”

18 End

