Presentation on theme: "Review of AERA/APA/NCME Test Standards Revision"— Presentation transcript:
1 Review of AERA/APA/NCME Test Standards Revision
Barbara S. Plake
University of Nebraska-Lincoln
Co-Chair, Committee for Revision of Test Standards
2 Joint Committee Members
Lauress Wise, Co-Chair
Barbara Plake, Co-Chair
Linda Cook, ETS
Fritz Drasgow, University of Illinois
Brian Gong, NCIEA
Laura Hamilton, Rand Corporation
Jo-Ida Hansen, University of MN
Joan Herman, UCLA
Michael Kane, Bar Examiners
3 Joint Committee Members
Michael Kolen, University of Iowa
Antonio Puente, UNC-Wilmington
Paul Sackett, University of MN
Nancy Tippins, Valtera Corporation
Walter (Denny) Way, Pearson
Frank Worrell, Univ of CA-Berkeley
4 Scope of Revision
Based on comments each organization received from invitation to comment
Summarized by the Management Committee in consultation with the Co-Chairs
  Wayne Camara, Chair, APA
  Suzanne Lane, AERA
  David Frisbie, NCME
5 Four Substantive Areas for Revisions
Technology
Accountability
Workplace
Access
Plus attention to format issues
6 Theme Teams
Working teams
Cross-team collaborations
Chapter Leaders
Focusing on bringing into chapters content related to themes in coherent and meaningful ways
7 Presentation: Four Substantive Areas
Access – Linda Cook
Accountability – Brian Gong
Technology – Denny Way
Workplace – Laurie Wise
8 Format Issues
Organization of Chapters
Consideration of ways to identify “Priority Standards”
More parallelism between chapters
Tone
Complexity
Technical language
9 Timeline
First meeting January 2009
Three-year process for completing text of revision
Open comment/Organization reviews
Projected publication Summer 2012
10 Revising our Test Standards: Access for All Examinee Populations
Presentation to the 2009 Annual Meeting of the American Educational Research Association
San Diego, CA
Linda Cook, ETS
11 Overview
Standards related to Access appear throughout many of the chapters but are concentrated in
  Chapter 9: Testing Individuals of Diverse Linguistic Backgrounds
  Chapter 10: Testing Individuals with Disabilities
Comments on Access were received by the Management Committee and summarized for the committee charge
12 Elements of the Charge
Five of the elements of the charge focused on accommodations/modifications
  Impact/differentiation of accommodation and modification
  Appropriateness for ELL and EWD
  Appropriateness for variety of groups, e.g., pre-K, older populations
  Flagging
  Comparability/validity
One element focused on adequacy and comparability of translations
One element focused on Universal Design
13 Key Access Issues Included in our Charge - 1
Impact/differentiation of accommodations/modifications
Appropriate ways to determine or establish the impact of accommodations/modifications on inferences, interpretations, uses of scores
How do you differentiate clearly between what is an accommodation and what is a modification?
14 Key Access Issues Included in our Charge - 2
Appropriateness of accommodations for English-language learners and examinees with disabilities
Selecting the appropriate accommodation for the individual
  Who should select the accommodation?
  What evidence should the selection be based on?
Administering the appropriate accommodation
  What evidence is available to determine impact on test scores, given purpose of the test?
  How effective is the accommodation?
Alternative assessments/modified achievement standards
15 Key Access Issues Included in our Charge - 3
Appropriateness of accommodations for a wider variety of groups
Pre-K
Older populations
  Number of older adults with cognitive impairments is rising
  Tested to determine mental status changes
  There are many complexities associated with testing this population
  Combined effects of medical problems, medication side effects, multiple sensory deficits, testing environment
16 Key Access Issues Included in our Charge - 4
Flagging
Current treatment needs to be updated to reflect changes in practice since the 1999 Standards
Most testing organizations no longer flag
Decisions about flagging should be based on empirical evidence
17 Key Access Issues Included in our Charge - 5
Comparability and validity of inferences made based on scores from accommodated or modified tests
Foundational issues such as comparability and validity need to be addressed in foundational chapters
If sample sizes do not support analyses such as DIF, other evidence of validity should be pursued
18 Key Access Issues Included in our Charge - 6
Adequacy and comparability of translations (language to language and language to symbol, e.g., Braille)
Evidence needed to demonstrate adequacy of translation and comparability of scores from translated tests
Fluency, rather than primary language, should be used to describe target population for a test
Quality of translation/adaptation needs to be emphasized
Interaction of language proficiency and construct needs to be considered
19 Key Access Issues Included in our Charge - 7
Universal Design
1999 Standards focus too much on accommodations and modifications and not enough on building accessibility features into design and development process
20 Revising our Test Standards: Issues for Accountability
Presentation to the 2009 Annual Meeting of the American Educational Research Association
San Diego, CA
Brian Gong, Center for Assessment
21 Overview
There has been a dramatic expansion of the use of tests for various forms of accountability and other uses related to educational policy-setting.
The Joint Committee has been charged with considering how these uses in accountability should impact revisions to the Standards.
As with the other themes, comments on the Standards that related to accountability were compiled by the Management Committee and summarized in their charge to the Joint Committee.
22 Overview
Standards related to accountability currently appear throughout; accountability also is especially relevant to Chapter 13 (Educational Testing and Assessment) and Chapter 15 (Testing in Program Evaluation and Public Policy).
Under No Child Left Behind, there has been a dramatic increase in the use of tests for accountability. In such cases, test results have important consequences for third parties such as school administrators and teachers, although not always for the examinees themselves.
Federal peer review procedures have required assurances of reliability and validity that often go beyond requirements of the current Test Standards. Attention to the overall technical quality of tests and score interpretation is required.
High school tests are used as a graduation requirement and there have been questions about how the current Standards should be interpreted in these cases.
In general, the validity and reliability of individual and aggregated scores used for accountability purposes need to be addressed.
23 Key Accountability Topics Included in our Charge
Validity and reliability requirements
Issues with scores, scaling, and equating
Policy and practice
Formative and interim assessments
24 1. Validity and Reliability Requirements
Use of a single test as the sole source of high-stakes decisions (e.g., graduation, promotion), including whether scores from retesting or repeat testing are sufficient to count as using more than one score for such decisions.
How test alignment studies should be documented and used to demonstrate the validity of score interpretations regarding mastery of required content standards.
25 1. Validity, Reliability, and Reporting Requirements - continued
Provide additional guidance on score accuracy, especially when used to classify individuals or groups into performance regions or other bands on a score scale.
Validity and reliability requirements for reporting individual or aggregate performance on subscales (skills or diagnostics) and for instructing users in appropriate interpretations of such scores or data (e.g., as they impact between or within student and school comparisons, validity considerations in subscore interpretation).
Incorporating error estimates and interpretive guidance in score reports, including subscores and diagnostic reporting for individuals and groups.
26 2. Issues with Scores, Scaling, and Equating
Growth modeling, gain scores, and other methods of estimating aggregated performance or growth based on individual or school/district performance and characteristics.
Issues or requirements when linking assessments (e.g., concordances, linkages, and equating).
27 3. Policy and Practice
How to balance privacy concerns for individual examinees, teachers, and administrators while meeting information needs for policy-makers.
Issues related to the appropriate role of practice and test preparation, especially in contrast to admissions testing or credentialing.
28 4. Addressing Formative and Interim Assessments
Distinguishing among commercial formative and benchmark assessments (as well as item banks), their appropriate uses, and validation evidence required in interpreting scores from them.
29 Revising our Test Standards: Technological Advances
Presentation to the 2009 Annual Meeting of the American Educational Research Association
San Diego, CA
Denny Way, Pearson
30 Overview
Technological advances are changing the way tests are delivered, scored, and interpreted and, in some cases, the nature of the tests themselves.
The Joint Committee has been charged with considering how technological advances should impact revisions to the Standards.
As with the other themes, comments on the Standards that related to technology were compiled by the Management Committee and summarized in their charge to the Joint Committee.
31 Key Technology Issues Included in our Charge
Reliability & validity of innovative item formats
Validity issues associated with the use of:
  Automated scoring algorithms
  Automated score reports and interpretations
Security issues for tests delivered over the internet
Issues with web-accessible data, including data warehousing
32 Resources for Consideration
Guidelines for Computer-Based Testing, Copyright 2002 Association of Test Publishers (ATP)
International Guidelines on Computer-Based and Internet Delivered Testing, Copyright 2005 International Test Commission (ITC)
33 Reliability & Validity of Innovative Item Formats
What special issues exist for innovative items with respect to access and elimination of bias against particular groups? How might the standards reflect these issues?
What steps should the standards suggest with regard to “usability” of innovative items?
What issues will emerge over the next five years related to innovative items/test formats that need to be addressed by the standards?
34 Automated Scoring Algorithms
What level of documentation/disclosure is appropriate and tolerable for automated scoring developers/vendors?
What sorts of evidence seem most important for demonstrating the validity and “reliability” of automated scoring systems?
What issues will emerge over the next five years related to automated scoring systems that need to be addressed by the standards?
35 Automated Score Reports and Interpretation
Use of computers for score interpretation
“Actionable” reports (e.g., routing students and teachers to instructional materials and lesson plans based on test results)
36 Security Issues for Tests Delivered over the Internet
Two aspects of this topic are of concern: protecting privacy, and threats to validity due to breach of security.
Protecting examinee privacy
Considerations likely to affect standards related to test administration and responsibilities of test users
37 Web-Accessible Data, including Data Warehousing
Applicability of general technology standards?
Security
Interoperability
Revision to commentary vs. drafting additional standards
38 Revising our Test Standards: Issues for Work-Place Testing
Presentation to the 2009 Annual Meeting of the American Educational Research Association
San Diego, CA
Laurie Wise, HumRRO
39 Overview
Standards for testing in the work place are currently covered in Chapter 14 (one of the testing application chapters).
Work-place testing includes employment testing as well as licensure, certification, and promotion testing.
Comments on standards related to work-place testing were received by the Management Committee and summarized in their charge to the Joint Committee.
40 Key Work-Place Testing Issues Included in our Charge
Validity and reliability requirements for certification, licensure, and promotion tests.
Issues when tests are administered only to small populations of job incumbents.
Requirements for tests for new, innovative job positions that do not have incumbents or job history to provide validity evidence.
Assuring access to licensure, certification, and promotion tests for examinees with disabilities that may limit participation in regular testing sessions.
Differential requirements for certification and licensure and employment tests.
41 1. Validity and Reliability Requirements
Some specific issues:
Documenting and communicating the validity and reliability of pass-fail decisions in addition to the underlying scores
How cut-offs are determined
How validity and reliability information is communicated to relevant stakeholders
42 2. Issues with Small Examinee Populations
Including:
Alternatives to statistical tools for item screening
Assuring fairness
Assuring technical accuracy
Alternatives to empirical validity evidence
Maintaining comparability of scores from different test forms
43 3. Requirements for New Jobs
Issues include:
Identifying test content
Establishing passing scores
Assessing reliability
Demonstrating validity
44 4. Assuring Access to Employment Testing
See also separate presentation on fairness
Issues include:
Determining appropriate versus inappropriate accommodations
Relating testing accommodations to accommodations available in the work place
45 5. Certification and Licensure versus Employment Testing
Currently, two sections in the same chapter
Examples of relevant issues:
  Differences in how test content is identified and validated
  Differences in test score use
  Who oversees testing: private company versus professional board/organization