Validity in Action: State Assessment Validity Evidence for Compliance with NCLB William D. Schafer, Joyce Wang, and Vivian Wang University of Maryland.


1 Validity in Action: State Assessment Validity Evidence for Compliance with NCLB William D. Schafer, Joyce Wang, and Vivian Wang University of Maryland

2 Objectives review the evidence that state testing programs provide to the United States Department of Education on the validity of their assessments; examine in detail the validity evidence that certain selected states provided for their peer reviews; make recommendations for improving the evidence submissions supporting validity for state assessments

3 Data Sources official decision letters from USED on each state's final assessment system under NCLB (publicly available at the USED web site); peer review reports for five selected states; technical reports for available states that have received full approval from USED (downloaded from the web sites of each state)

4 Types of Validity Evidence the AERA/APA/NCME Standards lists five types of validity evidence –content-based evidence –response-process-based evidence –evidence based on internal structure –evidence based on relationships with other variables –evidence based on consequences we will look at the judgments that each type should support in the context of statewide assessments of educational achievement

5 Content-Based Evidence judgments that need to be supported: the domain is described in the academic content standards at the grade level the test items sample that content domain appropriately achievement level descriptions refer back to the content domain of the test

6 Response-Process-Based Evidence judgment that needs to be supported: the activities the test demands of students are consistent with the cognitive processes the test is supposed to represent (as implied by the content standards)

7 Evidence Based on Internal Structure judgment that needs to be supported: test score relationships are consistent with the strand structures of the academic content standards

8 Evidence Based on Relationships with Other Variables judgments that need to be supported: higher correlations occur when traits are more similar; low correlations (perhaps partial correlations controlling for ability) exist with specific traits (e.g., gender, race-ethnicity, disability)
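The partial correlation mentioned in slide 8 can be sketched as follows. This is a minimal illustration using the standard first-order partial correlation formula; the function name and the example correlations are hypothetical and do not come from the presentation.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z.

    x: test score, y: a student characteristic (e.g., a group indicator),
    z: an ability measure. All three inputs are ordinary Pearson correlations.
    """
    denom = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denom == 0:
        raise ValueError("z is perfectly correlated with x or y")
    return (r_xy - r_xz * r_yz) / denom

# Hypothetical values: the raw score-by-group correlation is 0.25, but both
# variables correlate 0.5 with ability, so nothing remains once ability is
# partialed out.
remaining = partial_corr(r_xy=0.25, r_xz=0.5, r_yz=0.5)
```

A value near zero after partialing is the pattern the slide asks for; a value that stays clearly nonzero would call for follow-up (for example, item-level bias studies).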

9 Evidence Based on Consequences judgments that need to be supported: test use maximizes positive outcomes test use minimizes negative outcomes

10 Decision Letters decision letters were viewed at the USED web site – they are public documents; 19 of the states were required to provide additional validity evidence; the evidence was not classified by USED, but we classified it into the five types to help make the project manageable; decision-letter evidence is required by USED – it is mandatory – so these elements may be thought of as necessary for states to submit

11 Content-Based Evidence evidence to show that assessments measure the academic content standards and not characteristics not specified in the academic content standards or grade level expectations blueprints, item specifications, and test development procedures evidence of alignment with content standards – this is an emphasis in peer review explanations of design and scoring standard setting process, results, and impact

12 Response-Process-Based Evidence evidence to show that items are tapping the intended cognitive processes – this sort of evidence is commonly a part of alignment studies

13 Evidence Based on Internal Structure item interrelationships; subscale score correlations showing they are consistent with the structures inherent to the academic content standards; scoring and reporting are consistent with the subdomain structure of the content standards; justification of score use given the (observed) threat that the subdomain correlations are higher between content areas than within content areas

14 Evidence Based on Relationships with Other Variables criterion validity relationships between test scores and external variables

15 Evidence Based on Consequences studies of intended and unintended consequences

16 Evidence from State Submissions each state submitted voluminous evidence to USED the Peer Review Reports included descriptions of the evidence submitted we had sets of Reports for five states this evidence may be over and above what is actually required

17 Evidence of Purposes each state was asked to provide evidence about the purposes of their assessments; each state did that; this is an important part of Kane’s (2006) concept of a validity argument; because it does not fall into the categories of validity evidence in the USED Peer Review Guidance, we did not include it in our review

18 Content-Based Evidence test blueprints & construction process alignment reports –categorical concurrence (each content strand has enough items for a subscore report) –range of knowledge (the number of content elements in each strand that have items associated with them) –balance of representation (the distribution of items across the content elements within each strand) achievement level descriptions (ALDs) compared with the strand structure
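The three alignment criteria named in slide 18 can be sketched as simple counts over an item-to-standard mapping. This is an illustration only: the slide gives no formulas or cut-offs, so the balance index below is the one commonly attributed to Webb's alignment method, the `min_items=6` threshold is an assumed convention, each item is assumed to map to exactly one content element, and all names and data are hypothetical.

```python
from collections import Counter

def alignment_summary(item_map, strand_elements, min_items=6):
    """Summarize alignment per strand.

    item_map: dict item_id -> (strand, element) the item is coded to.
    strand_elements: dict strand -> list of content elements in that strand.
    min_items: assumed minimum item count for reporting a subscore.
    """
    summary = {}
    for strand, elements in strand_elements.items():
        # Number of items coded to each element of this strand.
        hits = Counter(el for (s, el) in item_map.values() if s == strand)
        n_items = sum(hits.values())
        n_hit = len(hits)
        if n_items:
            # Assumed index: 1 - sum(|1/O - I_k/H|) / 2, where O is the number
            # of elements hit, I_k the items on element k, H the strand total.
            balance = 1 - sum(abs(1 / n_hit - c / n_items) for c in hits.values()) / 2
        else:
            balance = None
        summary[strand] = {
            "items": n_items,
            "categorical_concurrence": n_items >= min_items,
            "range_of_knowledge": n_hit / len(elements) if elements else None,
            "balance_of_representation": balance,
        }
    return summary

# Hypothetical coding of eight items to two strands.
item_map = {
    1: ("algebra", "A1"), 2: ("algebra", "A1"), 3: ("algebra", "A1"),
    4: ("algebra", "A2"), 5: ("algebra", "A2"), 6: ("algebra", "A3"),
    7: ("geometry", "G1"), 8: ("geometry", "G2"),
}
strand_elements = {"algebra": ["A1", "A2", "A3", "A4"], "geometry": ["G1", "G2"]}
summary = alignment_summary(item_map, strand_elements)
```

In this made-up example the geometry strand covers all of its elements evenly but has too few items to support a subscore, which is the kind of finding an alignment report would flag.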

19 Response-Process-Based Evidence alignment reports –depth of knowledge (relates the cognition tapped by each item to that implied in the statement of the element in the content standards the item is associated with) think-aloud studies (proposed)

20 Evidence Based on Internal Structure dimensional analysis at the item level –principal components analysis –dimensionality hypothesis testing intercorrelations among the subtest scores

21 Evidence Based on Relationships with Other Variables correlations with external tests of similar constructs (and dissimilar constructs) correlations with student demographics and course-taking patterns choosing and implementing accommodations for disabilities and limited English proficiency bias studies (e.g., DIF) and passage reviews universal design principles monitoring of test administration procedures

22 Evidence Based on Consequences longitudinal change in dropout and graduation rates and NAEP results use of results to evaluate schools and districts use of test data to improve curriculum & instruction use of adequate yearly progress reports use of tests to make promotion & graduation decisions

23 Synthesis of Evidentiary Needs it would be useful to have a minimum list for state regulatory submissions; can we use these studies to generate a list? a list generated from our evidence is most likely over-inclusive; as soon as we generate one, it will surely be challenged; it seems reasonable to submit the following –for each test series (e.g., regular, alternate) –for each tested content and grade combination

24 Content Evidence content standards test blueprint item (and passage) development process item categorization rules and process forms development process (e.g., item sampling; item location; section timing) results of alignment studies

25 Process Evidence test blueprint (if it has a process dimension); item categorization rules and method (if items are categorized by process); results of alignment studies; results of other studies, such as think-alouds

26 Internal Structure Evidence subscore correlations; item-subscore correlations; dimensionality analyses
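The item-subscore correlations listed in slide 26 can be sketched as below. This is one possible way to compute them, assuming scored item responses and a known strand for each item; the item itself is left out of its own subscore so the correlation is not inflated. The function names and the small data set are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns NaN when either variable has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def item_subscore_correlations(responses, item_strand):
    """responses: one row of item scores per student.
    item_strand: strand label for each item, in column order.
    Returns, per item, its correlation with its own strand subscore
    (item excluded) and with every other strand subscore."""
    strands = sorted(set(item_strand))
    n_items = len(item_strand)
    result = {}
    for j in range(n_items):
        item = [row[j] for row in responses]
        entry = {"own": None, "others": {}}
        for s in strands:
            members = [k for k in range(n_items) if item_strand[k] == s and k != j]
            subscore = [sum(row[k] for k in members) for row in responses]
            r = pearson(item, subscore)
            if s == item_strand[j]:
                entry["own"] = r
            else:
                entry["others"][s] = r
        result[j] = entry
    return result

# Hypothetical scored responses: six students, two items per strand.
responses = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
]
correlations = item_subscore_correlations(responses, ["A", "A", "B", "B"])
```

Internal-structure evidence is supported when an item's "own" correlation is higher than its correlations with the other strands' subscores.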

27 Relations with Other Variables convergent evidence –correlations with independent, standardized measures –correlations with within-class variables, such as grades; discriminant evidence –correlations with standardized tests of other traits (e.g., math with reading) –correlations with within-class variables, such as grades in other contents –correlations with irrelevant student characteristics (e.g., gender) –item-level (e.g., DIF) studies
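The item-level DIF studies named in slide 27 are often run with the Mantel-Haenszel procedure. The presentation does not say which method the states used, so the sketch below is an assumption: it computes the Mantel-Haenszel common odds ratio across ability strata and the delta-scale transformation (-2.35 times its natural log). The function name and the counts are hypothetical.

```python
import math

def mantel_haenszel(strata):
    """strata: iterable of (A, B, C, D) counts, one tuple per ability stratum.

    A: reference group correct,  B: reference group incorrect,
    C: focal group correct,      D: focal group incorrect.
    Returns (common odds ratio, delta-scale DIF statistic).
    """
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum carries no information
        num += a * d / n
        den += b * c / n
    if num == 0 or den == 0:
        raise ValueError("odds ratio is undefined for these counts")
    alpha = num / den
    return alpha, -2.35 * math.log(alpha)

# Hypothetical counts for one item in two total-score strata.
alpha, delta = mantel_haenszel([(30, 10, 20, 20), (40, 5, 30, 10)])
```

An odds ratio near 1 (delta near 0) indicates that, at matched ability, the two groups perform similarly on the item; a ratio above 1 gives a negative delta and indicates the item favors the reference group.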

28 Consequential Evidence purposes of the test – as they describe intended consequences uses of results by educators trends over time studies that generate and evaluate positive and negative aspects from user input

29 Validity in the Accountability Context – Role of Processes the majority of the evidence submitted capitalizes on well-known methods for studying the validity of a particular test form – a product; but the object of study in accountability is actually a process by which tests are developed & used –a test form is important only as a representative of a process of test development –programs are expected to engage in a continual process of self-evaluation and improvement

30 Process Evidence assume it is useful to distinguish between product evidence and process evidence –product evidence focuses on a particular test –process evidence focuses on a testing program; we will review and extend some suggestions for process evidence that were originally proposed in the context of state assessment and accountability peer reviews

31 What is a Process? a recurring activity that takes material, operates on it, and produces a product concept is borrowed from project management could be as large as the entire assessment and accountability program could be as small as, say, the production of a test item one challenge is to organize the activities of a program into useful processes

32 Is Validity a Process Concept? i.e., is there a sense in which we can use the concept of the validity of a process? validity is justification for an interpretation of a score –a test form is a static element that can contribute support for an interpretation –a process is a dynamic element that can contribute support for future interpretations so we give this one a tentative “yes”

33 Elements of Process Evidence process –the process is described –the inputs and operating rules are laid out product –the results of the process are presented or described evaluation (how these questions are considered) –is the process adequate? –can (or how can) it be improved? –should it be improved (e.g., do the benefits justify the costs)? improvement (how the consideration is done) –the recommendations from the evaluation are considered for implementation in order to improve the process

34 Examples of Process Evidence three examples of these four elements of process evidence follow they vary markedly in scope –small to large –illustrate the nature of process evidence for different contexts within an assessment and accountability program

35 Bias and Sensitivity Committee Selection process. desired composition, generation of committee members, contacting potential members, proposed meeting schedule, etc. product. committee composition, especially the constituencies represented. evaluation. comparison of actual with desired composition, follow up with persons who declined, suggestions for improvement. improvement. who has responsibility to consider the recommendations generated by the evaluation, how they go about their analysis, how change is implemented in the system, examples of changes that were made in the past to document responsiveness

36 Alignment process. test blueprint, items, item categorizations, sampling processes product. a test form evaluation. alignment study improvement. review of study recommendations, plan for future

37 Psychometric Adequacy of a Test Form process. the analyses that are performed. product. technical manual evaluation. review by a group such as a TAC, recommendations for the manual as well as the testing program improvement. consideration of recommendations, plan for future
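The four elements used in the three examples above (process, product, evaluation, improvement) can be kept as a small record so that a submission can be checked for gaps. This is only a sketch of one way to organize the material; the class and method names are hypothetical and are not part of the presentation.

```python
from dataclasses import dataclass, fields

@dataclass
class ProcessEvidence:
    """One unit of process evidence, with the four elements from slide 33."""
    name: str
    process: str = ""      # how the process works: inputs and operating rules
    product: str = ""      # what the process produced
    evaluation: str = ""   # how adequacy and possible improvements are judged
    improvement: str = ""  # how recommendations are considered and acted on

    def missing_elements(self):
        """Return the names of the elements that have not been documented."""
        return [
            f.name
            for f in fields(self)
            if f.name != "name" and not getattr(self, f.name).strip()
        ]

# The alignment example from slide 36, with the improvement element left
# blank on purpose to show the check.
alignment = ProcessEvidence(
    name="Alignment",
    process="test blueprint, items, item categorizations, sampling processes",
    product="a test form",
    evaluation="alignment study",
)
gaps = alignment.missing_elements()
```

Here the check reports that the improvement element is still undocumented.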

38 Making Judgments About Processes two typically independent layers of judgment; the first layer is an evaluation that makes recommendations about improvement; the second layer considers them; in many cases, the second layer would be an excellent way for a state to use its TAC

39 Judging Process Evidence process evidence by definition describes processes it should be judged by how well it describes processes that support interpretations based on future assessments it should also be judged on how well it describes processes that lead to improvements in the program

40 Possible Criteria for Process Evidence data are collected from all relevant sources; data are reported completely and efficiently; reviewed by persons with appropriate expertise; review is conducted fairly; review results are reported completely and efficiently; recommendations are suggested in the reports; consideration is given to the recommendations; past actions based on the recommendations are presented as evidence that the process results in improvement

