CASAS Technical Manual 3rd Edition Presentation at 2004 CASAS Summer Institute Drs. John Martois and Richard Stiles.


CASAS Technical Manual 3rd Edition Presentation at 2004 CASAS Summer Institute Drs. John Martois and Richard Stiles

Organization of Manual
Introduction
Chapter 1. Background Information About CASAS
Chapter 2. Development of CASAS Item Banks
Chapter 3. Validity & Psychometric Properties of the CASAS Item Banks
Chapter 4. Life Skills Series
Chapter 5. Employability Competency System Series
Chapter 6. (future test series and specific tests)

Introduction The CASAS Technical Manual provides descriptive background and psychometric information about the item banks and selected tests and test series developed from the banks.

Introduction (cont.)
The manual begins with an overview of the CASAS system.
Succeeding chapters describe the development of the item banks and the associated competency statements, the psychometric model underlying CASAS — Item Response Theory — and the test development process, including test reliability and validity.

The current (1999) version of the AERA, APA, and NCME Standards for Educational and Psychological Testing was used as a guide in developing the CASAS Technical Manual, 3rd Edition.

Intent of Test Standards
“…promote the sound and ethical use of tests and to provide a basis for evaluating the quality of testing practices.”
(Standards for Educational and Psychological Testing, AERA, APA, and NCME, 1999)

Users of the APA, NCME, AERA Test Standards
Test developers
Test publishers
Test administrators
Test results users and decision-makers
Test interpreters for clients
Test takers
Test sponsors or contractors for testing
Test buyers, selectors, and reviewers

Chapter 1. Background Information About CASAS
Overview of CASAS: the Organization and the System
Formation of CASAS, the Organization and the System
Theoretical Framework
Competency and Results-Oriented Education
CASAS National Consortium
Transportability and Adaptability of CASAS
National Validation of CASAS Assessment System

Overview of CASAS: the Organization and the System
CASAS provides learner-centered curriculum management, assessment, evaluation, data management, and reporting systems to education and training programs in the public and private sectors.
The system is designed to serve primarily adult learners functioning below the high school graduation level.
The assessment system is used to place learners into program levels, monitor their progress in the program, and certify competency attainment.

Formation of CASAS, the Organization and the System
Established by the California Department of Education in 1980 as a consortium of local adult education and literacy providers, including community-based organizations, public libraries, community colleges, correctional institutions, and adult schools.
In 1984, CASAS was nationally validated by the U.S. Department of Education, Joint Dissemination Review Panel.
Currently it is a division of a nonprofit public-benefit organization, the Foundation for Educational Achievement, housed in San Diego, California.
CASAS has both a California and a National Consortium that provide guidance for its research and development.

Theoretical Content Framework: Competency and Results-Oriented Education
Initially used the frame of the Adult Performance Level (APL) Study conducted in the mid-1970s.
Incorporated item types from the California High School Proficiency Examination.
Adopted and helped form many of the tenets of the competency-based adult education movement, which led to results-oriented education, where educators are accountable for learners attaining exit outcomes.
CASAS assesses basic skills in a functional context.

CASAS National Consortium
Meeting twice annually, the consortium identifies priorities for development, participates in field-testing and evaluating new assessment and curriculum management products and processes, develops training needed for implementation, and shares successful strategies and outcomes in their respective states or territories.
The consortium's involvement represents a wide variety of state-administered adult education programs, ensuring relevant assessment and curriculum management validated across diverse populations and programs.

Transportability and Adaptability of CASAS
Implemented by education programs in all 50 states.
Designed to assess progress toward individual learner goals as well as monitor learning progress at the class, site, agency, and state levels.
Provides articulation among program levels and a uniform method for reporting learning progress.
Designed to accommodate new populations being served by adult and alternative education programs and has responded to the assessment needs of state and national initiatives.
Implementation has resulted in better program accountability at local, state, and national levels.

National Validation of CASAS
Reviewed and approved by the Federal Joint Dissemination and Review Panel (USDOE) in 1984.
The three claims of CASAS implementation were upheld by the Program Effectiveness Panel (PEP), USDOE, in 1993.
Three claims supported with the adoption of CASAS:
Learning Gains
Student Persistence
Goal Attainment

General Information About CASAS Item Banks and Tests
The CASAS multiple-choice item banks, covering the domains of reading, mathematics, and listening in functional contexts for adults and youth, provide test items that have undergone rigorous psychometric evaluation using both classical and modern test theory.
The psychometric methodology used to establish each item's difficulty level comes from the single-parameter, or Rasch, model of Item Response Theory (IRT), in which each test item is assigned a scaled difficulty level on a common scale.

General Information (cont.) Information on Specific Tests and Test Series
Information specific to individual CASAS assessments is provided in the Test Administration Manuals that accompany each test and test series.
These test administration manuals contain detailed information regarding test administration, testing accommodations, scoring, and interpretation of results.
Information is also available about each test that describes its targeted population and the programs that use the particular assessment.

Chapter 2. Development of the CASAS Item Banks
Item Taxonomy
Item Development
Psychometric Theory
Item Bank Development

Item Taxonomy
Each item is identified by:
Item Description
Item Code
Content Area
Competency Area
Competency Statement
Task Type
Difficulty Level

Test Item Development
Selection and Training of Initial Item Writers
Item Pilot Testing and Clinical Try-outs
Item Field-Testing

Psychometric Theory
The psychometric methodology used to establish this difficulty level comes from the single-parameter, or Rasch, model of Item Response Theory (IRT), in which each test item is assigned a scaled difficulty level on a common scale.
The scaled difficulty of an item (e.g., 200) is such that a person having the same ability level as that item will have a 50% chance of responding correctly to that item.
To have a greater probability of passing an item of that difficulty (e.g., 80+%), the person would have to have an ability level a standard deviation above that score.
CASAS chose to integrate the use of both methodologies: IRT and classical item analysis.

Psychometric Theory (cont.)
Item Response Theory offers many advantages over traditional testing methodology. There are three primary advantages:
(1) given a large pool of items all measuring the same trait, an estimate of the examinee's ability is independent of the sample of items used to derive that estimate;
(2) given a large population of examinees, an item's difficulty estimate is independent of the sample of examinees used to derive that estimate; and
(3) a statistic is provided for each examinee indicating the precision of the ability estimate (Hambleton and Swaminathan, 1985).

Psychometric Theory (cont.)
IRT defines the relationship between an underlying trait or ability being measured and observable test performance in terms of a mathematical model: a logistic ogive curve. In the most general case, an examinee's probability of answering an item correctly is a function of:
the item's difficulty,
the item's discrimination, and
the probability of correctly answering the item by chance.
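
A minimal sketch of the logistic model just described, in the three-parameter form; setting discrimination a = 1 and guessing c = 0 recovers the Rasch (one-parameter) model used by CASAS, where an examinee whose ability equals the item's difficulty has a 50% chance of a correct response. The scale below is in logits; the conversion to CASAS scale-score units is not specified here.

```python
import math

def p_correct(theta, b, a=1.0, c=0.0):
    """Probability of a correct response under the 3PL model.

    theta: examinee ability, b: item difficulty,
    a: item discrimination, c: lower asymptote (guessing).
    With a=1 and c=0 this reduces to the Rasch model.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Rasch special case: ability equal to item difficulty -> 0.5
print(p_correct(theta=0.0, b=0.0))   # 0.5
# Ability above the item's difficulty -> higher probability
print(p_correct(theta=1.5, b=0.0))   # ~0.82
```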

Figure 2.7 Item Response Curve from Best Test Design, p. 14 (Used with permission from author)

Psychometric Theory (cont.) Score Comparability
The characteristic of the Rasch and other IRT models that makes them appropriate for item banking is that they separate the difficulty calibration of an item from the ability of the group taking the item. This makes it possible to do vertical equating of items.
This also makes it possible to break items free from the test in which they appear and relate them to a more general curriculum-based scale,
thus allowing for the measurement of growth between the administration of two different sets of items to the same examinee over a specified time period.

Item Bank Development ITEM BANK CALIBRATION
One major task in building and maintaining an item bank is to place all the items in a given learning domain onto a common scale, which involves calibrating the level of difficulty of each item.
An item bank can be developed by computing the item difficulty estimates from all of the examinees' responses to all items.
However, establishing an item bank typically requires many more items than can be given in one test, far more than a single examinee can realistically be expected to answer.
CASAS chose to develop calibration forms having similar content and a range of difficulty judged by instructors expert in teaching that domain to be appropriate to the students participating in the calibration study.
On all initial forms, more than 95 percent of test examinees responded to all items.

Initial Calibration of Forms CASAS conducted the initial calibration of items in the fall of 1980 based on ten test forms. All forms contained basic life skills items measured in a functional life skills context. Since math in a functional context requires the ability to read, these items were included on the reading scale. A total of 4,115 students enrolled in adult basic education programs, including high school completion, participated in this initial item calibration of 422 items.

Item Linking Procedures
In order to place all items on a single scale:
A set of common items, or linking items, was embedded among forms.
One calibration form was chosen as the “anchor” test, to which all other tests were directly linked to establish the common scale.
The choice of an anchor form was made following an earlier decision to focus on the development and selection of life skills competencies appropriate to a mid-range achievement level, intermediate ABE and ESL. This population was chosen because it had more experience in the classroom and with taking tests and was judged to be broadly representative of adult learners in general.
The anchor form was also designed so that these learners would successfully respond to more than 50 percent of the items.
The linking items on the anchor form were used to adjust the difficulties of non-linking items on each of the other tests in the anchor series (see the sketch below).
The anchor series of forms also included beginning and advanced levels of ABE and ESL.
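
A sketch of the kind of common-item adjustment the slide describes, assuming Rasch difficulties and a simple mean-shift link; the actual CASAS linking procedure may differ in detail, and the item IDs below are hypothetical.

```python
def link_to_anchor(anchor_diffs, new_diffs):
    """Shift a new form's difficulty estimates onto the anchor scale
    using items common to both calibrations (mean-shift linking).

    anchor_diffs, new_diffs: dicts mapping item id -> Rasch difficulty.
    Returns all new-form difficulties expressed on the anchor scale.
    """
    common = set(anchor_diffs) & set(new_diffs)
    if not common:
        raise ValueError("no linking items in common")
    shift = sum(anchor_diffs[i] - new_diffs[i] for i in common) / len(common)
    return {i: d + shift for i, d in new_diffs.items()}

# Hypothetical example: items "r01" and "r02" are the linking items.
anchor = {"r01": -0.50, "r02": 0.30}
new_form = {"r01": -0.80, "r02": 0.00, "r17": 1.10}
print(link_to_anchor(anchor, new_form))  # r17 lands at ~1.40 on the anchor scale
```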

Calibration of Forms
The actual calibration of items included only those item response sets for students who had responded correctly to more than 20 percent and fewer than 90 percent of the items on the field test form.
The exclusion of student responses in the lower success range minimized the influence of results for those who may have been guessing.
One additional restriction eliminated results for students who did not have at least one correct answer on the last half of the test (see the sketch below).
863 student item response sets were then included for the anchor form.
The remaining nine forms all met the minimum requirement of having at least 300 examinees respond to each item.
In addition to individual item responses on these item calibration forms, demographic and program descriptor information (including age, sex, ethnicity, primary language, number of years of school completed, and program level enrollment) was collected for all students in the initial item calibrations.
In the spring of 1981, 16 additional item calibration forms were administered to 4,606 students enrolled in Adult Basic Education, English as a Second Language, and high school completion programs. Items from the fall 1980 item calibrations were included in these forms to serve as linking items for the item calibration process. Items from these two administrations were extensively analyzed, and those that met the assumptions of the Rasch model were then included in the initial CASAS item bank (see Chapter 3 for detailed psychometric data on the initial calibration forms).
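
A sketch of the inclusion rules just listed, assuming response sets are simple 0/1 vectors in item-presentation order: keep a response set only if the proportion correct is strictly between 20 and 90 percent and at least one item in the last half of the form was answered correctly.

```python
def include_response_set(responses):
    """responses: list of 0/1 scores in item-presentation order.
    Returns True if the response set qualifies for calibration."""
    n = len(responses)
    prop_correct = sum(responses) / n
    if not (0.20 < prop_correct < 0.90):
        return False          # likely guessing, or too little information
    last_half = responses[n // 2:]
    return any(last_half)     # at least one correct answer in the last half

print(include_response_set([1, 0, 1, 1, 0, 1, 0, 0]))  # True  (50% correct)
print(include_response_set([1, 1, 0, 0, 0, 0, 0, 0]))  # False (no correct in last half)
```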

Calibration of Forms (cont.)
All initial field tests were carefully reviewed for model fit, ability fit, and item bias.
Both Item Characteristic Curves (ICCs) and Test Characteristic Curves (TCCs) were generated for a visual inspection of model and ability fit and item bias, including:
Gender bias
Ethnic bias
Language bias

Chapter 3. Validity & Psychometric Properties of the CASAS Item Banks
This chapter contains evidence relating to the validity and psychometric properties of the overall CASAS Item Banks.
Later chapters and periodic supplements will present evidence relating to the validity of specific instruments constructed from the Item Banks.

Introduction to Validity
The Standards for Educational and Psychological Testing (1999) state that validity refers to the appropriateness, meaningfulness, and usefulness of the specific inferences made from test scores.
There are various evidences of validity, with construct validity encompassing the overriding issue of proper utilization and construction of test items, and content-related and criterion-related validity as sub-components.
An ideal validation includes several types of evidence.

Psychometric Properties
Content Validity
Competency Validation Studies
Unidimensionality of Item Banks
Parameter Invariance
Differential Item Functioning
Criterion-Related Validity

Content Validity--Evidence for:
Consensus Definitions and Alignments Across States of Trait Being Measured Over Time
Use of Psychometrically Sound Item Development Procedures
Unidimensionality of the Item Banks

Competency Validation Studies
A number of recent studies have been conducted throughout the United States to reaffirm that the content and competencies addressed in the initial development of the item bank are still valid and relevant to the current needs of learners. States have undertaken the task of identifying critical skill needs as defined by stakeholder groups.
1. Iowa--The Iowa Adult Basic Skills Survey: Final Report (IABSS)
2. Indiana--Validation of Foundation Skills
3. Connecticut--Targeting Education: The Connecticut Adult Basic Skills Survey
4. California--CABSS Report: California Adult Basic Skills Survey
5. SCANS and CASAS Competencies Relationship Study

Content Validity: 1st Step
The first step in analyzing the content validity of an item is ensuring that the item measures the skill or competency it is charged with measuring. All CASAS test items have gone through a rigorous process to ensure that the competency or skill being assessed by a particular item does in fact measure what it was intended to measure.

Content Validity: 2nd Step
The second step in content validity analysis is ensuring that the skills and competencies addressed through an item are in fact aligned with the target goals of the instrument.
Tests developed from the CASAS item banks are based on specified life skill competency statements and assess a learner's proficiency in performing specified tasks involving solving life skill problems or applying basic reading and math skills.
All CASAS competencies have been identified as priority competencies by field practitioners based on learner and program goals.

Content Validity—Item Development
Facets of item development include:
content and contextual relevance,
item appropriateness,
wording,
effective distractor usage, and
proper content weighting.
It is important to ensure that the CASAS item banks have no misalignment between measurement content and instruction and that the items and test instruments constructed from the banks accurately measure all skills necessary to certify a certain range or level of skill.

Content Validity—Item Writing
Trained item writers develop items from item specifications.
All items received at least two initial reviews and were evaluated again after revisions and refinements were completed for field-testing.
Expert reviewers verified that each item matched its competency and specific objective and checked for non-triviality, factual accuracy, and possible ethnic or gender bias.

Content Validity—Clinical Try-Out
Before actual field-testing, items were given a clinical try-out in a limited number of classroom situations appropriate to the item content and perceived item difficulty level. This process identified any obvious weaknesses or ambiguities in the items.
Students in the clinical try-out took the test and then were questioned regarding how and why they made their responses to the individual items.
Teachers were also asked to identify any items that deviated from their curriculum.
Items were then revised based on the clinical try-out and readied for inclusion on item field test forms.

Content Validity--Item Calibration
Following the field test of the items:
classical item statistics were reviewed, including each item's p-value, point-biserial coefficient, and discrimination index (a sketch of the first two follows), and
items were calibrated using the one-parameter model.
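
A minimal sketch of the first two classical statistics named above, computed from a 0/1 score matrix. The point-biserial here is the Pearson correlation between the item score and the total score on the remaining items, a common corrected form; the exact computation CASAS used is not specified in the manual, and the data set is hypothetical.

```python
import numpy as np

def classical_item_stats(scores, item):
    """scores: 2-D array of 0/1 responses (examinees x items).
    Returns (p-value, corrected point-biserial) for the given item column."""
    item_scores = scores[:, item]
    p_value = item_scores.mean()                   # proportion correct
    rest_total = scores.sum(axis=1) - item_scores  # total excluding the item
    point_biserial = np.corrcoef(item_scores, rest_total)[0, 1]
    return p_value, point_biserial

# Hypothetical 5-examinee, 4-item data set
scores = np.array([[1, 1, 1, 0],
                   [1, 0, 1, 0],
                   [0, 1, 0, 0],
                   [1, 1, 1, 1],
                   [0, 0, 0, 0]])
print(classical_item_stats(scores, item=0))
```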

Item Calibration
During the calibration process, all items were examined with respect to two mean-square residual summary statistics, infit and outfit.
Although no hard-and-fast rules were used to identify misfitting items, those items with either infit or outfit values less than .7 or greater than 1.3 were reviewed and eliminated if not essential to the measurement of the competency statement.
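
A sketch of the review rule just described, with hypothetical fit values; items flagged this way would be reviewed and dropped unless essential to measuring their competency.

```python
def flag_misfits(fit_stats, lo=0.7, hi=1.3):
    """fit_stats: dict of item id -> (infit, outfit) mean-square values.
    Returns ids whose infit or outfit falls outside [lo, hi]."""
    return [item for item, (infit, outfit) in fit_stats.items()
            if not (lo <= infit <= hi and lo <= outfit <= hi)]

# Hypothetical mean-square residuals
fit_stats = {"m03": (0.95, 1.02), "m11": (1.42, 1.10), "m27": (0.88, 0.64)}
print(flag_misfits(fit_stats))  # ['m11', 'm27']
```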

Unidimensionality of the Item Banks Fundamental to all IRT models is the notion of unidimensionality — that is, test performance can be defined in terms of a single latent trait. The assumption is that the items in a test are homogeneous and are measuring a single trait.

Unidimensionality of the Item Banks
Research is based on several of the procedures used by Sireci, Rogers, Swaminathan, Meara, and Robin (2000):
Determine the degree to which the CASAS item bank, in the areas of reading and math, may be considered unidimensional, with a single underlying latent variable (functional adult life skills), or should instead be considered as having two underlying latent variables, reading and math.
Examine the dimensionality of the listening tests and their underlying structure.

Unidimensionality of the Item Banks
Raw Score Correlational Analysis
Principal Components with Tetrachoric Coefficients
Confirmatory Factor Analyses
Goodness of Fit Index (GFI)
Adjusted Goodness of Fit Index (AGFI)
Root Mean Square Residual (RMR)
Root Mean Square Error of Approximation (RMSEA)

Raw Score Correlational Analysis
Of the eleven correlations between the math and reading raw scores, only one was below .50 and four were above .60. The correlations were not disattenuated due to the magnitude of both the math and reading and total combined alphas. The median correlation was .59, which would indicate that the math and reading adult life skill items were not measuring the same construct.
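
For reference, the standard correction for attenuation that the slide notes was not applied divides the observed correlation by the square root of the product of the two scales' reliabilities (here, the coefficient alphas):

$$ r_{\text{disattenuated}} = \frac{r_{xy}}{\sqrt{\alpha_x \, \alpha_y}} $$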

Principal Components Analysis
Data from the 11 combined math and reading booklets were analyzed using principal components analysis. Each set of items was composed of both math and reading items in an adult life skills context. The math and reading items were also independently analyzed. The sample sizes for each booklet ranged from a low of 261 to a high of 11,138. Data from the five listening booklets were also subjected to a principal components analysis.
When analyzed separately, the first eigenvalues for reading and math accounted for more of the total variance than when math and reading were combined.
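
A minimal sketch of the eigenvalue computation underlying these analyses, using a plain Pearson correlation matrix as a stand-in; the study itself used tetrachoric coefficients, which require a specialized routine not shown here.

```python
import numpy as np

def first_component_share(scores):
    """scores: 2-D array of 0/1 responses (examinees x items).
    Returns the eigenvalues of the inter-item correlation matrix and
    the proportion of total variance carried by the first component."""
    corr = np.corrcoef(scores, rowvar=False)            # items x items
    eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]
    return eigenvalues, eigenvalues[0] / eigenvalues.sum()

# A dominant first eigenvalue relative to the rest is taken as
# evidence that one latent dimension drives the responses.
```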

Factor Analysis
In the case of the one-factor model, the single factor was “adult life skills problem solving.”
In the two-factor model, the factors were reading and math in an “adult life skills problem solving” context.
All model-data fit results are based on the product-moment correlation.

Principal Components Analysis (Form 30—Beginning Level Test)
Form 30 – Reading and Math Combined. The first principal component (eigenvalue = 18.02) accounted for 45 percent of the total variance, with the second (eigenvalue = 1.76) accounting for 4 percent. All items had loadings greater than .40 on the first component.
Form 30 – Reading. The first principal component (eigenvalue = 10.24) accounted for 51 percent of the total variance, with the second (eigenvalue = .94) accounting for 5 percent.
Form 30 – Math. The first principal component (eigenvalue = 9.33) accounted for 47 percent of the total variance, with the second (eigenvalue = 1.19) accounting for 6 percent.
The three scree plots for Form 30 are presented in Figure 3.1.

Scree Plots for a Beginning Level Test—Combined and Separate

Principal Components Analysis (Form 35—Advanced Level Test)
Form 35 – Reading and Math Combined. The first principal component (eigenvalue = 12.35) accounted for 17 percent of the total variance, with the second (eigenvalue = 3.62) accounting for five percent of the variance. Twenty-seven of the 38 reading items and 13 of the 35 math items had loadings greater than .40 on the first component. Of the remaining 11 reading items, six had loadings above .30 on the first component. Of the 22 remaining math items, 13 had loadings above .30 on the first component.
Form 35 – Reading. The first principal component (eigenvalue = 8.69) accounted for 23 percent of the total variance, with the second (eigenvalue = 1.62) accounting for 4 percent.
Form 35 – Math. The first principal component (eigenvalue = 8.88) accounted for 20 percent of the total variance, with the second (eigenvalue = 1.78) accounting for 5 percent.
The three scree plots for Form 35 are presented in Figure 3.4.

Scree Plots for an Advanced Level Test—Combined and Separate

Confirmatory Factor Analyses--Description of Four Fit Statistics
The first three fit statistics are overall model fit indices, and the fourth is a comparative fit measure that compares the proposed model to a null model.
The Goodness of Fit Index (GFI) is a measure of the proportion of variance and covariance that the hypothesized model is able to explain;
the Adjusted Goodness of Fit Index (AGFI) considers the degrees of freedom in computing the measure.
The range of the GFI and AGFI is from 0 (poor fit) to 1 (perfect fit). A recommended minimum value for GFI is 0.90 and 0.80 for AGFI (Segars and Grover, 1993).
The Root Mean Square Residual (RMR) is an average of the residuals between observed and estimated input matrices. The smaller the value of RMR, the better the fit. The maximum recommended value for RMR is 0.10 (Chau, 1997).
The Root Mean Square Error of Approximation (RMSEA) is a comparative fit measure that reflects the extent to which the proposed model does not fit the data. A value less than .05 suggests that the model is a reasonable approximation to the data (Browne and Cudeck, 1993).
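
A sketch applying the cited cutoffs to a model's fit statistics; the values in the example are hypothetical.

```python
def acceptable_fit(gfi, agfi, rmr, rmsea):
    """Apply the recommended cutoffs from Segars and Grover (1993),
    Chau (1997), and Browne and Cudeck (1993) cited above."""
    return {
        "GFI":   gfi   >= 0.90,
        "AGFI":  agfi  >= 0.80,
        "RMR":   rmr   <= 0.10,
        "RMSEA": rmsea <  0.05,
    }

print(acceptable_fit(gfi=0.93, agfi=0.86, rmr=0.07, rmsea=0.041))
# {'GFI': True, 'AGFI': True, 'RMR': True, 'RMSEA': True}
```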

Summary of Confirmatory Factor Analyses Results: Life Skills Series (table of GFI, AGFI, RMR, and RMSEA values for the one-factor and two-factor models, by reading-and-math form and listening form)

Summary of Confirmatory Factor Analyses Results: Employment Competency Series (table of GFI, AGFI, RMR, and RMSEA values for the one-factor and two-factor models, by reading-and-math form and listening form)

Confirmatory Factor Analyses--Results The two-factor model (reading and math) for both the Life Skills and ECS Series provided a better fit than the one-factor model, regardless of which of the four fit statistics was examined.

Parameter Invariance
The correlations between the CASAS bank item difficulties for Form 15 math were .79 for the CCC group and .81 for other youth. Similarly, for Form 15 reading, the correlations were .76 and .81. The correlations between the CASAS bank item difficulties for Form 35 math were .84 for the CYA group and .87 for other youth. Similarly, for Form 35 reading, the correlations were .85 and .89, respectively. All inter-correlations are found in Tables 3.9 and 3.10.
These results would indicate a fair degree of temporal stability with respect to the item difficulties during the past 20 years.

Table 3.9 Inter-Correlations Between Bank Difficulties and Difficulties Generated from CYA on Form 35 Math and Form 35 Reading
Table 3.10 Inter-Correlations Between Bank Difficulties and Difficulties Generated from CCC on Form 15 Math and Form 15 Reading

Bank Difficulties and Difficulties Generated by Examinees for Both the Life Skills Series and ECS
Life Skills Series (table: for each of four forms, with N = 669; 1,360; 7,062; and 2,059, the number of reading and math items and the correlation between bank difficulties and examinee-generated difficulties)
Employability Competency System (table: the corresponding form Ns, item counts, and reading and math correlations)

Differential Item Functioning--Description
The Delta value indicates the average amount by which examinees in a focal group found an item more difficult than did a reference group. Positive values on this scale indicate that the item favors the focal group; that is, an item with a positive value is differentially easier for the focal group. Similarly, an item with a negative Delta differentially favors the reference group.
Items having a Delta less than an absolute value of 1.0 are used as needed in order to meet the content requirements of the test. Items having a Delta value between 1.0 and 1.5 are subjected to review by content specialists. Items having a Delta greater than 1.5 are only used in a test if no other item from the required domain has a lower value and the item content is deemed critical to the assessment.
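
A sketch of the three-band usage rule just described, assuming the Mantel-Haenszel Delta difference is supplied per item; the values below are hypothetical.

```python
def dif_usage_category(delta):
    """Classify an item by the absolute value of its Mantel-Haenszel
    Delta difference, per the usage rules above."""
    d = abs(delta)
    if d < 1.0:
        return "use as needed"
    if d <= 1.5:
        return "review by content specialists"
    return "use only if no better item exists and content is critical"

for delta in (0.4, -1.2, 1.8):
    print(delta, "->", dif_usage_category(delta))
```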

Differential Item Functioning The results of the analyses show a general lack of ethnic or gender bias in the vast majority of the items in the tests examined. However, some items showed bias and would require further examination and review.

Summary of Mantel-Haenszel Analysis for Gender Indicating the Number of Items by Form within Delta Difference Ranges for the Life Skills Series (table: for each reading, math, and listening form, the total number of items and the counts of items with absolute Delta values less than 1.0, between 1.0 and 1.5, and greater than 1.5 and less than 2.0)

Mantel-Haenszel Analysis for Ethnicity (Anglo - Hispanic) Indicating the Number of Items by Form within Delta Difference Ranges, Life Skills Series (table: for each reading, math, and listening form, the total number of items and the counts of items within each absolute Delta range)
*insufficient number for analysis

Criterion-Related Validity
CASAS Skill Level Descriptors and Historical Background
CASAS and Other National Reference Scales
Performance on CASAS Related to GED
CASAS Related to Work Keys
CASAS Related to Years of Schooling
CASAS Related to Educational Attainment

Criterion-Related Validity
Criterion-related, or predictive, validity assesses the ability or effectiveness of an instrument in predicting something it should theoretically be able to predict.
The criterion for CASAS tests developed in 1980 was directed at determining a learner's appropriate level of placement into ABE, ESL, and the preparatory high school curriculum.

CASAS Skill Level Descriptors
The CASAS Skill Level Descriptors show a continuum of skills from beginning through advanced adult secondary. They provide descriptions of adults' general job-related ability in reading, mathematics, oral communication, and writing. The Skill Level Descriptors explain in general terms what most learners can accomplish at a given CASAS scale score level in a specific skill area.
This scale has been verified and validated on more than three million adult and youth learners. The CASAS scale is divided into five levels, A (Beginning Literacy) to E (Advanced Secondary), each encompassing a range of scores. Each level is defined by a CASAS scale score range with corresponding competency descriptors of performance in employment and adult life skills contexts.

Relationship Between CASAS and Mainstream English Language Training (MELT)
CASAS Scores | MELT Level | Possible Program Placement | MELT Description
 | I | ESL Pre-Literate Orientation | Functions minimally, if at all, in English
 | II | ESL Beginning (Level 1) | Functions in a very limited way in situations related to immediate need
 | III | ESL Beginning (Level 2) | Functions with some difficulty in situations related to immediate needs
 | IV | ESL Intermediate (Level 1) | Can satisfy basic survival needs and a few very routine social demands
 | V | ESL Intermediate (Level 2) | Can satisfy basic survival needs and some limited social demands
 | VI | ESL Advanced (Level 1) | Can satisfy most survival needs and limited social demands
225+ | VII | ESL Advanced (Level 2) | Can satisfy survival needs and routine work and social demands

Student Performance Level (SPL) SPL descriptions, along with the CASAS achievement scale, provide a sound basis for articulating instructional program levels. The relationship among the SPLs, the literacy sections of the BEST, and the CASAS reading tests is shown in the following table.

Relationship Among SPL Levels, BEST Scores, and CASAS Scores (table of SPL levels with corresponding BEST score ranges and CASAS score ranges)

Relationship Among CASAS, NRS*, NALS**, Work Keys, SPL***, and Years of School Completed
CASAS Levels | CASAS Score Ranges | NRS Levels and Names for ABE | NRS Levels and Names for ESL | NALS Levels | SPL Levels | Work Keys Levels | Years of School Completed
A | 180 and below | | 1 Beginning ESL Literacy | 1 | 1 | Below 3 | 1 to 2
A | 181 – 200 | 1 Beginning ABE Literacy | 2 Beginning ESL | 1 | 2 and 3 | Below 3 | 1 to 2
B | 201 – 210 | 2 Beginning Basic Education | 3 Low Intermediate ESL | 1 | 4 | Below 3 | 3 to 5
B | 211 – 220 | 3 Low Intermediate Basic Education | 4 High Intermediate ESL | 1 | 5 | Below 3 | 6 to 7
C | 221 – 235 | 4 High Intermediate Basic Education | 5 Low Advanced ESL | | | | 8 to 10
D | 236 – 245 | 5 Low Adult Secondary Education | 6 High Advanced ESL | 2/ | | | 11 to 12
E | 246 and above | 6 High Adult Secondary Education | | | | |
*National Reporting System (WIA Title II)
**National Adult Literacy Survey
***Student Performance Levels
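
A sketch of the scale-score-to-level lookup implied by the score ranges in the table above; boundary handling at the exact cut scores follows the ranges as shown.

```python
def casas_level(scale_score):
    """Map a CASAS scale score to its level (A-E), using the score
    ranges from the alignment table above."""
    if scale_score <= 200:
        return "A"   # Beginning Literacy (180 and below, and 181-200)
    if scale_score <= 220:
        return "B"
    if scale_score <= 235:
        return "C"
    if scale_score <= 245:
        return "D"
    return "E"       # Advanced Secondary (246 and above)

print(casas_level(212))  # B
```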

Performance on CASAS Related to GED A clear, monotonically increasing relationship was found between CASAS reading scores and GED reading scores, and between CASAS math scores and GED math scores. A similar relationship was found between CASAS reading scores and overall GED results averaged across the five GED test content areas.

CASAS Reading Mean Test Scores Associated with GED Reading Score Ranges. Table columns: GED Score Range, CASAS Reading Test Mean, N, CASAS S.D.

CASAS Math Mean Test Scores Associated with GED Math Score Ranges. Table columns: GED Score Range, CASAS Math Test Mean, N, CASAS S.D.

CASAS Reading Mean Test Scores Associated with GED Total Score Ranges. Table columns: GED Score Range, CASAS Reading Test Mean, N, S.D.

CASAS to Work Keys The data show that as a learner's scores on the CASAS reading and mathematics scales increased, the ACT Work Keys level tended to increase as well. The Pearson correlation coefficient between the two measures was .71 for reading and .70 for mathematics.
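For reference, the Pearson correlation reported here is the standard product-moment coefficient

$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} $$

where $x_i$ is a learner's CASAS scale score and $y_i$ the paired Work Keys level, so values of .71 and .70 indicate a strong positive linear relationship between the two measures.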

CASAS to Years of Schooling and Degree Two small-scale studies were conducted in Iowa: one compared CASAS reading scores with highest grade completed, and the other compared both CASAS reading and math scores with highest degree completed. Comparable data have been observed over the years when comparing CASAS reading and/or math scores with the number of years of prior schooling.

Iowa Population Mean Scale Scores by Highest Grade Completed. Table columns: Highest Grade Completed (8 or less, …), Number, %, Reading mean scale score.
* Statistically significantly different from the subsequent level at the .05 level.
± Statistically significantly different from the second subsequent level at the .05 level.

Iowa Population Mean Scale Scores by Highest Degree Earned. Table columns: Highest Degree Completed (None, High School, GED, Vocational/Technical, AA/AS), Number, % of Sample, CASAS Reading, CASAS Math.

CASAS to Years of Schooling and Degree The results demonstrate that while CASAS scale scores are not precise equivalents for grade levels completed, there is a clear correlation between the two: higher CASAS reading scale scores on the ECS Series correspond to higher grade levels completed.

Chapters 4 & 5 Life Skills and Employability Test Series CASAS tests assess learner attainment of a range of specific competencies presented in functional contexts. Tests can be used both to check proficiency in skill areas and to measure learning progress. There are two main series of pretests and post-tests designed to monitor learning progress: the Life Skills Series and the Employability Series, which differ largely in content focus. The Life Skills Series covers a wide range of content areas, including employment. The Employability Series contains primarily employment-related content. Both series include reading and math tests for both native English speakers and ESL learners.

Appraisal Tests Appraisals are used as an initial assessment to get a general idea of a learner's reading, math, or listening comprehension skills. These test results guide placement into the appropriate instructional level and identify the appropriate progress test level. Test items span a wider range of difficulty than the pre- and post-tests, from the lower end of the scale to the 240s at the upper end. The appraisals, either the Life Skills Appraisal or the ECS Appraisal, assess reading comprehension and math. Listening comprehension of ESL learners may be tested with the ESL Appraisal.

Use of Series Either test series can be used in a pre- and post-test design to provide standardized information about learning gains. The progress-testing model follows these steps in order (a sketch of the resulting gain computation follows the list):
(1) Place {Appraisal},
(2) Pretest {establish baseline skill levels to begin instruction},
(3) Instruct {ongoing informal assessment and instruction},
(4) Monitor {post-test},
(5) Certify {certification test or level completion test to confirm the learner's skill level in a promotion or exit paradigm}.
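As an illustration of steps (2), (4), and (5), the pre/post design yields a scale-score gain per learner. The function name and the cut score below are hypothetical assumptions for the sketch, not CASAS policy; a real program would take its level-exit score from the published score ranges for the test form.

```python
def learning_gain(pretest_scale: int, posttest_scale: int,
                  level_exit_score: int) -> dict:
    """Summarize one pre/post progress-testing cycle.

    `level_exit_score` is a hypothetical cut score for level
    completion; real values come from the form's published tables.
    """
    gain = posttest_scale - pretest_scale
    return {
        "gain": gain,
        "ready_to_certify": posttest_scale >= level_exit_score,
    }

# Example: pretest 211, post-test 224, hypothetical exit score 221.
print(learning_gain(211, 224, 221))  # {'gain': 13, 'ready_to_certify': True}
```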

Reading Tests Reading comprehension tests assess reading skills in a functional life skills context using documents, signs, charts, forms, procedures, reading passages, or other realistic presentations. Depending on the level of difficulty of the item, examinees are to scan, locate detail, interpret, analyze, or evaluate the selection to answer questions. There is no time limit for the tests, but most students finish within one hour.

Math Tests Items on the math tests require the practical application of math skills. Typical items involve locating information in a chart, table, or graph to perform a calculation. Tests also include word problems and other situational applications, as well as some computation items. Items range in difficulty from locating numerical information to application of formulas and basic algebra. There is no time limit for the tests, but most students finish within one hour.

Listening Tests Listening comprehension tests incorporate a variety of item types. Simpler items use pictures as part of the cue or as answer choices. Other item types include responding to a question or statement, identifying an equivalent statement, completing a dialogue, and interpreting information from a dialogue or statement. Tests are administered via audiotape.

Organization of Charts and Tables Describing Both Test Series For each test series, a Test Forms table identifies the test form numbers, test level, number of test items on each test, and test use; subsequent tables identify the specific competencies measured on each test.

Descriptive statistics for each series are presented-- Reading Life Skills. Descriptive Statistics for the Life Skills Reading Series. Table columns: Form #, N, # of Items, Mean Raw Score, Standard Deviation, Mean P-Value, Mean Point Biserial, KR-20.
Note: These numbers were run based on data from 1996 through …. The initial item analyses were run for all items with sample sizes larger than 300.
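The columns in these tables are standard classical item statistics. As a sketch of how they can be computed from a scored response matrix (a minimal illustration, not CASAS's actual analysis code; CASAS may, for instance, use corrected item-total correlations):

```python
import numpy as np

def item_statistics(responses: np.ndarray):
    """Classical item analysis for a 0/1-scored response matrix
    (rows = examinees, columns = items). Returns the mean p-value,
    mean point-biserial, and KR-20, the statistics reported above."""
    n_items = responses.shape[1]
    totals = responses.sum(axis=1)      # raw score per examinee
    p = responses.mean(axis=0)          # item p-values (proportion correct)
    q = 1.0 - p

    # Point-biserial: Pearson correlation of each item with the total
    # raw score (uncorrected item-total correlation).
    pbis = np.array([np.corrcoef(responses[:, j], totals)[0, 1]
                     for j in range(n_items)])

    # KR-20 internal-consistency reliability; totals.var() here is the
    # population variance (ddof=0) of total scores.
    kr20 = (n_items / (n_items - 1)) * (1.0 - (p * q).sum() / totals.var())

    return p.mean(), np.nanmean(pbis), kr20
```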

Descriptive statistics for each series are presented-- Math Life Skills. Table columns: Form # (31M and the other M-series forms), N, # of Items, Raw Score Mean, Standard Deviation, Mean P-Value, Mean Point Biserial, KR-20.
Note: These numbers were run based on data from 1996 through …. The initial item analyses were run for all items with sample sizes larger than 300.

Descriptive statistics for each series are presented-- Listening Life Skills. Table columns: Form #, N, # of Items, Raw Score Mean, Standard Deviation, Mean P-Value, Mean Point Biserial, KR-20.
Note: The 51L and 52L tests appear in both the ECS and Life Skills series.
Note: These numbers were run based on data from 1996 through …. Initial item analyses were run for all items with sample sizes larger than 300.

Descriptive statistics for each series are presented-- ECS Reading. Table columns: Form #, N, No. of Items, Mean Raw Score, Standard Deviation, Mean P-Value, Mean Point Biserial, KR-20.
Note: These numbers were run based on data from 1996 through …. The initial item analyses were run for all items with sample sizes larger than 300.

Descriptive statistics for each series are presented-- ECS Math. Table columns: Form #, N, No. of Items, Raw Score Mean, Standard Deviation, Mean P-Value, Mean Point Biserial, KR-20.
Note: These numbers were run based on data from 1996 through …. Initial item analyses were run for all items with sample sizes larger than 300.

Descriptive statistics for each series are presented-- ECS Listening. Table columns: Form #, N, # of Items, Raw Score Mean, Standard Deviation, Mean P-Value, Mean Point Biserial, KR-20.
Note: The 51L and 52L tests appear in both the Life Skills and ECS series.
Note: These numbers were run based on data from 1996 through …. Initial item analyses were run for all items with sample sizes larger than 300.

Raw Score Conversion Tables Tables show the raw score, scale score, and standard error for each test form. CASAS considers the scores between the dotted lines the most accurate. The scores at the end of each test marked with a black diamond (♦) are scale estimates above the accurate range. Test users are strongly encouraged to use scores that fall within the accurate ranges; a small lookup sketch follows.
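In software, a form's conversion table is a straight lookup keyed by raw score, with the standard-error column determining whether a score falls in the accurate range. The raw/scale/S.E. values below are made up for illustration; real entries come from each form's published table, and 5.5 is the standard-error ceiling CASAS uses for the accurate range (see the next slide).

```python
# Hypothetical conversion entries for one test form:
# raw score -> (scale score, standard error). Real values come from
# the form's published raw score conversion table.
CONVERSION = {
    20: (214, 4.8),
    21: (216, 5.1),
    22: (219, 5.6),   # S.E. above 5.5: outside the accurate range
}

ACCURATE_SE = 5.5  # CASAS's standard-error ceiling for the accurate range

def to_scale_score(raw: int) -> tuple[int, float, bool]:
    """Return (scale score, S.E., within accurate range) for a raw score."""
    scale, se = CONVERSION[raw]
    return scale, se, se <= ACCURATE_SE

print(to_scale_score(21))  # (216, 5.1, True)
```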

Test Information Function The shape of the Test Information Function depends on the purpose of the test. A Test Information Function that is fairly flat across an ability range measures ability with roughly equal precision across that range; such a function is desirable for a general pre-/post-type assessment over a wide range of ability. In contrast, a peaked Test Information Function provides the most information for examinees whose abilities fall near the peak of the function; such a function is desirable for a certification test. In general, CASAS tests provide quite acceptable precision across the ability range each test was designed to measure. This, coupled with CASAS's definition of the accurate range (a standard error of 5.5 or less), helps to ensure ability measures that are good approximations of the examinee's true score. The relationship between information and standard error is formalized below.
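CASAS scale scores are derived with item response theory. As an illustration under a one-parameter (Rasch-type) model, with $\theta$ the examinee's ability and $b_i$ the difficulty of item $i$ (the notation here is illustrative, not taken from the manual):

$$ P_i(\theta) = \frac{e^{\theta - b_i}}{1 + e^{\theta - b_i}}, \qquad I_i(\theta) = P_i(\theta)\,\bigl(1 - P_i(\theta)\bigr) $$

The Test Information Function sums the item information, and the standard error of measurement at ability $\theta$ is its inverse square root:

$$ I(\theta) = \sum_{i=1}^{k} I_i(\theta), \qquad SE(\theta) = \frac{1}{\sqrt{I(\theta)}} $$

On this reading, the accurate range (S.E. of 5.5 or less) is simply the region of the scale where information is high enough, i.e., where $I(\theta) \ge 1/5.5^2 \approx 0.033$ in scale-score units.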

Chapter 6 and Beyond Later chapters and periodic supplements will present evidence relating to the validity of new test series and specific instruments constructed from the item banks.