
1 CASAS Technical Manual, 3rd Edition. Presentation at the 2004 CASAS Summer Institute by Drs. John Martois and Richard Stiles

2 Organization of Manual
Introduction
Chapter 1. Background Information About CASAS
Chapter 2. Development of CASAS Item Banks
Chapter 3. Validity & Psychometric Properties of the CASAS Item Banks
Chapter 4. Life Skills Series
Chapter 5. Employability Competency System Series
Chapter 6. (future test series and specific tests)

3 Introduction The CASAS Technical Manual provides descriptive background and psychometric information about the item banks and selected tests and test series developed from the banks.

4 Introduction (cont.) The manual begins with an overview of the CASAS system. Succeeding chapters describe the development of the item banks and the associated competency statements, the psychometric model underlying CASAS — Item Response Theory — and the test development process, including test reliability and validity.

5 The current (1999) version of the AERA, APA, and NCME Standards for Educational and Psychological Testing was used as a guide in developing the CASAS Technical Manual, 3rd Edition.

6 Intent of Test Standards "…promote the sound and ethical use of tests and to provide a basis for evaluating the quality of testing practices." (Standards for Educational and Psychological Testing, AERA, APA, and NCME, 1999)

7 Users of the AERA, APA, NCME Test Standards
Test developers
Test publishers
Test administrators
Test results users and decision-makers
Test interpreters for clients
Test takers
Test sponsors or contractors for testing
Test buyers, selectors, and reviewers

8 Chapter 1. Background Information About CASAS
Overview of CASAS: the Organization and the System
Formation of CASAS, the Organization and the System
Theoretical Framework
Competency and Results-Oriented Education
CASAS National Consortium
Transportability and Adaptability of CASAS
National Validation of CASAS Assessment System

9 Overview of CASAS: the Organization and the System
CASAS provides learner-centered curriculum management, assessment, evaluation, data management, and reporting systems to education and training programs in the public and private sectors.
The system is designed to serve primarily adult learners functioning below the high school graduation level.
The assessment system is used to place learners into program levels, monitor their progress in the program, and certify competency attainment.

10 Formation of CASAS, the Organization and the System
Established by the California Department of Education in 1980 as a consortium of local adult education and literacy providers, including community-based organizations, public libraries, community colleges, correctional institutions, and adult schools.
In 1984, CASAS was nationally validated by the U.S. Department of Education's Joint Dissemination Review Panel.
CASAS is currently a division of a nonprofit public-benefit organization, the Foundation for Educational Achievement, housed in San Diego, California.
CASAS has both a California and a National Consortium that provide guidance for its research and development.

11 Theoretical Content Framework: Competency and Results-Oriented Education
Initially used the framework of the Adult Performance Level (APL) Study conducted in the mid-1970s.
Incorporated item types from the California High School Proficiency Examination.
Adopted and helped form many of the tenets of the competency-based adult education movement, which led to results-oriented education, in which educators are accountable for learners attaining exit outcomes.
CASAS assesses basic skills in a functional context.

12 CASAS National Consortium
Meeting twice annually, the consortium identifies priorities for development, participates in field-testing and evaluating new assessment and curriculum management products and processes, develops training needed for implementation, and shares successful strategies and outcomes in their respective states or territories.
The consortium represents a wide variety of state-administered adult education programs, ensuring relevant assessment and curriculum management validated across diverse populations and programs.

13 Transportability and Adaptability of CASAS
Implemented by education programs in all 50 states.
Designed to assess progress toward individual learner goals as well as monitor learning progress at the class, site, agency, and state levels.
Provides articulation among program levels and a uniform method for reporting learning progress.
Designed to accommodate new populations served by adult and alternative education programs, and has responded to the assessment needs of state and national initiatives.
Implementation has resulted in better program accountability at local, state, and national levels.

14 National Validation of CASAS
Reviewed and approved by the federal Joint Dissemination Review Panel (USDOE) in 1984.
The three claims of CASAS implementation were upheld by the Program Effectiveness Panel (PEP), USDOE, in 1993.
Three claims supported with the adoption of CASAS:
Learning Gains
Student Persistence
Goal Attainment

15 General Information About CASAS Item Banks and Tests
The CASAS multiple-choice item banks, covering the domains of reading, mathematics, and listening in functional contexts for adults and youth, provide test items that have undergone rigorous psychometric evaluation using both classical and modern test theory.
The psychometric methodology used to establish item difficulty comes from the single-parameter, or Rasch, model of Item Response Theory (IRT), in which each test item is assigned a scaled difficulty level on a common scale.

16 General Information (cont.): Information on Specific Tests and Test Series
Information specific to individual CASAS assessments is provided in the Test Administration Manuals that accompany each test and test series.
These manuals contain detailed information regarding test administration, testing accommodations, scoring, and interpretation of results.
Information is also available about each test that describes its targeted population and the programs that use the particular assessment.

17 Chapter 2. Development of the CASAS Item Banks
Item Taxonomy
Item Development
Psychometric Theory
Item Bank Development

18 Item Taxonomy
Each item is identified by:
Item Description
Item Code
Content Area
Competency Area
Competency Statement
Task Type
Difficulty Level

19 Test Item Development
Selection and Training of Initial Item Writers
Item Pilot Testing and Clinical Try-outs
Item Field-Testing

20 Psychometric Theory
The psychometric methodology used to establish item difficulty comes from the single-parameter, or Rasch, model of Item Response Theory (IRT), in which each test item is assigned a scaled difficulty level on a common scale.
The scaled difficulty of an item (e.g., 200) is such that a person having the same ability level as that item will have a 50% chance of responding correctly to that item.
To have a greater probability of passing an item of that difficulty (e.g., 80+%), the person would have to have an ability level a standard deviation above that score.
CASAS chose to integrate the use of both methodologies: IRT and classical item analysis.
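To make the 50% point concrete, here is a minimal sketch of the Rasch response probability in Python. It is illustrative only: it works in logits rather than the transformed CASAS scale (the transformation constants are not given in this presentation), and it is not CASAS's calibration code.

```python
import math

def rasch_p_correct(ability: float, difficulty: float) -> float:
    """Rasch (one-parameter) model: probability of a correct response.

    Ability and difficulty sit on the same logit scale, which is what
    lets persons and items share a common measurement scale.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# A person whose ability equals the item's difficulty answers
# correctly half the time:
print(rasch_p_correct(0.0, 0.0))   # 0.5
# A person well above the item's difficulty does much better:
print(rasch_p_correct(1.4, 0.0))   # ~0.80
```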

21 Psychometric Theory (cont.)
Item Response Theory offers many advantages over traditional testing methodology. There are three primary advantages:
(1) given a large pool of items all measuring the same trait, an estimate of the examinee's ability is independent of the sample of items used to derive that estimate;
(2) given a large population of examinees, an item's difficulty estimate is independent of the sample of examinees used to derive that estimate; and
(3) a statistic is provided for each examinee indicating the precision of the ability estimate (Hambleton and Swaminathan, 1985).

22 Psychometric Theory (cont.)
IRT defines the relationship between an underlying trait or ability being measured and observable test performance in terms of a mathematical model: a logistic ogive curve. In the most general case, an examinee's probability of answering an item correctly is a function of the examinee's ability together with:
the item's difficulty,
the item's discrimination, and
the probability of correctly answering the item by chance.
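For illustration, the general (three-parameter) case just described can be written as follows. The parameter names are conventional IRT notation rather than taken from the manual; setting a = 1 and c = 0 recovers the Rasch model that CASAS uses.

```python
import math

def irt_3pl_p_correct(theta: float, b: float, a: float = 1.0,
                      c: float = 0.0) -> float:
    """Three-parameter logistic (3PL) item response function.

    theta -- examinee ability
    b     -- item difficulty
    a     -- item discrimination (slope of the ogive)
    c     -- pseudo-guessing parameter (lower asymptote)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# On a four-option multiple-choice item, chance success is about .25,
# so even a very low-ability examinee stays near that floor:
print(irt_3pl_p_correct(theta=-3.0, b=0.0, a=1.2, c=0.25))  # ~0.27
```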

23 Figure 2.7 Item Response Curve from Best Test Design, p. 14 (Used with permission from author)

24 Psychometric Theory (cont.): Score Comparability
The characteristic of the Rasch and other IRT models that makes them appropriate for item banking is that they separate the difficulty calibration of an item from the ability of the group taking the item. This makes it possible to do vertical equating of items.
This also makes it possible to break the items free from the test in which they appear and relate them to a more general curriculum-based scale,
thus allowing the measurement of growth between the administration of two different sets of items to the same examinee over a specified time period.

25 Item Bank Development: Item Bank Calibration
One major task in building and maintaining an item bank is to place all the items in a given learning domain onto a common scale, which involves calibrating the level of difficulty of each item.
An item bank can be developed by computing the item difficulty estimates from all of the examinees' responses to all items.
However, establishing an item bank typically requires many more items than can be given in one test, and far more than a single examinee can realistically be expected to answer.
CASAS chose to develop calibration forms having similar content and a range of difficulty judged by instructors expert in teaching that domain to be appropriate to the students participating in the calibration study.
On all initial forms, more than 95 percent of test examinees responded to all items.

26 Initial Calibration of Forms CASAS conducted the initial calibration of items in the fall of 1980 based on ten test forms. All forms contained basic life skills items measured in a functional life skills context. Since math in a functional context requires the ability to read, these items were included on the reading scale. A total of 4,115 students enrolled in adult basic education programs, including high school completion, participated in this initial item calibration of 422 items.

27 Item Linking Procedures
In order to place all items on a single scale, a set of common items, or linking items, was embedded among forms.
One calibration form was chosen as the "anchor" test, to which all other tests were directly linked to establish the common scale.
The choice of an anchor form followed an earlier decision to focus on the development and selection of life skills competencies appropriate to a mid-range achievement level: intermediate ABE and ESL. This population was chosen because it had more experience in the classroom and with taking tests, and was judged to be broadly representative of adult learners in general.
The anchor form was also designed so that these learners would successfully respond to more than 50 percent of the items.
The linking items on the anchor form were used to adjust the difficulties of non-linking items on each of the other tests in the anchor series (see the sketch below).
The anchor series of forms also included beginning and advanced levels of ABE and ESL.
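A small sketch may help illustrate the linking step. Common-item mean-shift equating, shown below, is one standard way to place a new form onto an anchor scale; it is offered as an assumption about the general technique, not as CASAS's exact procedure.

```python
def link_to_anchor_scale(new_form_difficulties: dict[str, float],
                         anchor_calibrations: dict[str, float]) -> dict[str, float]:
    """Shift a new form's item difficulties onto the anchor form's scale.

    Linking items appear in both calibrations; the average difference in
    their difficulty estimates gives the constant needed to align scales.
    """
    linking = set(new_form_difficulties) & set(anchor_calibrations)
    if not linking:
        raise ValueError("no common (linking) items between forms")
    shift = sum(anchor_calibrations[i] - new_form_difficulties[i]
                for i in linking) / len(linking)
    return {item: d + shift for item, d in new_form_difficulties.items()}

# Hypothetical example: items r1 and r2 are the linking items.
anchor = {"r1": 0.40, "r2": -0.10}
new_form = {"r1": 0.10, "r2": -0.40, "r3": 1.20}
print(link_to_anchor_scale(new_form, anchor))  # every item shifted by +0.30
```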

28 Calibration of Forms
The actual calibration of items included only those item response sets for students who had responded correctly to more than 20 percent and fewer than 90 percent of the items on the field test form.
Excluding student responses below this success range minimized the influence of results from those who may have been guessing.
One additional restriction eliminated results for students who did not have at least one correct answer on the last half of the test.
863 student item response sets were then included for the anchor form.
The remaining nine forms all met the minimum requirement of having at least 300 examinees respond to each item.
In addition to individual item responses on these item calibration forms, demographic and program descriptor information (including age, sex, ethnicity, primary language, number of years of school completed, and program level enrollment) was collected for all students in the initial item calibrations.
In the spring of 1981, 16 additional item calibration forms were administered to 4,606 students enrolled in Adult Basic Education, English as a Second Language, and high school completion programs. Items from the fall 1980 item calibrations were included in these forms to serve as linking items for the item calibration process. Items from these two administrations were extensively analyzed, and those that met the assumptions of the Rasch model were then included in the initial CASAS item bank (see Chapter 3 for detailed psychometric data on the initial calibration forms).
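The inclusion rules just described (more than 20 percent but fewer than 90 percent correct, and at least one correct answer on the last half of the test) translate directly into a simple filter. A sketch, assuming responses are scored 0/1 in item order:

```python
def include_response_set(responses: list[int]) -> bool:
    """Apply the calibration inclusion rules to one examinee's 0/1 responses.

    Keep the response set only if the examinee answered more than 20% and
    fewer than 90% of items correctly, and answered at least one item
    correctly in the last half of the test (a guard against early
    quitters and random guessers).
    """
    n = len(responses)
    pct_correct = sum(responses) / n
    second_half_correct = sum(responses[n // 2:]) > 0
    return 0.20 < pct_correct < 0.90 and second_half_correct
```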

29 Calibration of Forms (cont.)
All initial field tests were carefully reviewed for model fit, ability fit, and item bias.
Both Item Characteristic Curves (ICCs) and Test Characteristic Curves (TCCs) were generated for a visual inspection of model and ability fit and item bias, including:
Gender bias
Ethnic bias
Language bias

30 Chapter 3. Validity & Psychometric Properties of the CASAS Item Banks
This chapter contains evidence relating to the validity and psychometric properties of the overall CASAS Item Banks.
Later chapters and periodic supplements will present evidence relating to the validity of specific instruments constructed from the Item Banks.

31 Introduction to Validity
The Standards for Educational and Psychological Testing (1999) state that validity refers to the appropriateness, meaningfulness, and usefulness of the specific inferences made from test scores.
There are various types of validity evidence, with construct validity encompassing the overriding issue of proper utilization and construction of test items, and content-related and criterion-related validity as sub-components.
An ideal validation includes several types of evidence.

32 Psychometric Properties
Content Validity
Competency Validation Studies
Unidimensionality of Item Banks
Parameter Invariance
Differential Item Functioning
Criterion-Related Validity

33 Content Validity: Evidence
Consensus Definitions and Alignments Across States of the Trait Being Measured Over Time
Use of Psychometrically Sound Item Development Procedures
Unidimensionality of the Item Banks

34 Competency Validation Studies
A number of recent studies have been conducted throughout the United States to reaffirm that the content and competencies addressed in the initial development of the item bank are still valid and relevant to the current needs of learners. States have undertaken the task of identifying critical skill needs as defined by stakeholder groups.
1. Iowa: The Iowa Adult Basic Skills Survey: Final Report (IABSS)
2. Indiana: Validation of Foundation Skills
3. Connecticut: Targeting Education: The Connecticut Adult Basic Skills Survey
4. California: CABSS Report: California Adult Basic Skills Survey
5. SCANS and CASAS Competencies Relationship Study

35 Content Validity: 1st Step
The first step in analyzing the content validity of an item is ensuring that the item measures the skill or competency it is charged with measuring. All CASAS test items have gone through a rigorous process to ensure that each item does in fact measure the competency or skill it was intended to measure.

36 Content Validity: 2nd Step
The second step in content validity analysis is ensuring that the skills and competencies addressed through an item are in fact aligned with the target goals of the instrument.
Tests developed from the CASAS item banks are based on specified life skill competency statements and assess a learner's proficiency in performing specified tasks that involve solving life skill problems or applying basic reading and math skills.
All CASAS competencies have been identified as priority competencies by field practitioners based on learner and program goals.

37 Content Validity: Item Development
Facets of item development include:
content and contextual relevance,
item appropriateness,
wording,
effective distractor usage, and
proper content weighting.
It is important to ensure that the CASAS item banks have no misalignment between measurement content and instruction, and that the items and test instruments constructed from the banks accurately measure all skills necessary to certify a certain range or level of skill.

38 Content Validity: Item Writing
Trained item writers develop items from item specifications.
All items received at least two initial reviews and were evaluated again after revisions and refinements were completed for field-testing.
Expert reviewers verified that each item matched its competency and specific objective, and checked for:
non-triviality,
factual accuracy, and
possible ethnic or gender bias.

39 Content Validity: Clinical Try-Out
Before actual field-testing, items were given a clinical try-out in a limited number of classroom situations appropriate to the item content and perceived item difficulty level. This process identified any obvious weaknesses or ambiguities in the items.
Students in the clinical try-out took the test and then were questioned about how and why they responded to the individual items as they did.
Teachers were also asked to identify any items that deviated from their curriculum.
Items were then revised based on the clinical try-out and readied for inclusion on item field test forms.

40 Content Validity: Item Calibration
Following the field test of the items:
classical item statistics were reviewed, including each item's
p-value,
point-biserial coefficient, and
discrimination index;
items were calibrated using the one-parameter model.
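The classical statistics named above are straightforward to compute. The sketch below shows the p-value and point-biserial for one item, assuming a 0/1-scored response vector; the discrimination index (typically an upper-minus-lower group comparison) is omitted for brevity.

```python
import statistics

def item_p_value(item_scores: list[int]) -> float:
    """Proportion of examinees answering the item correctly."""
    return sum(item_scores) / len(item_scores)

def point_biserial(item_scores: list[int], total_scores: list[float]) -> float:
    """Correlation between the 0/1 item score and the total test score."""
    mean_i = statistics.mean(item_scores)
    mean_t = statistics.mean(total_scores)
    sd_i = statistics.pstdev(item_scores)
    sd_t = statistics.pstdev(total_scores)
    cov = statistics.mean(
        [(i - mean_i) * (t - mean_t) for i, t in zip(item_scores, total_scores)]
    )
    return cov / (sd_i * sd_t)
```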

41 Item Calibration
During the calibration process, all items were examined with respect to two mean-square residual summary statistics: infit and outfit.
Although no hard-and-fast rules were used to identify misfitting items, those items with either infit or outfit values less than 0.7 or greater than 1.3 were reviewed, and eliminated if not essential to the measurement of the competency statement.
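The review rule above can be restated as a small helper. This sketch only applies the 0.7 to 1.3 screening band and assumes the infit and outfit values have already been produced by Rasch calibration software.

```python
def flag_misfitting_item(infit: float, outfit: float,
                         low: float = 0.7, high: float = 1.3) -> bool:
    """Flag an item for review when either mean-square statistic
    falls outside the acceptable [low, high] band."""
    return not (low <= infit <= high and low <= outfit <= high)

print(flag_misfitting_item(infit=1.05, outfit=0.98))  # False: keep as is
print(flag_misfitting_item(infit=1.42, outfit=1.10))  # True: review item
```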

42 Unidimensionality of the Item Banks
Fundamental to all IRT models is the notion of unidimensionality — that is, test performance can be defined in terms of a single latent trait. The assumption is that the items in a test are homogeneous and are measuring a single trait.

43 Unidimensionality of the Item Banks
Research conducted in 2002, based on several of the procedures used by Sireci, Rogers, Swaminathan, Meara, and Robin (2000), had two aims:
Determine the degree to which the CASAS item bank, in the areas of reading and math, may be considered unidimensional, with a single underlying latent variable (functional adult life skills), or should instead be considered as having two underlying latent variables, reading and math.
Examine the dimensionality of the listening tests and their underlying structure.

44 Unidimensionality of the Item Banks
Raw Score Correlational Analysis
Principal Components with Tetrachoric Coefficients
Confirmatory Factor Analyses
Goodness of Fit Index (GFI)
Adjusted Goodness of Fit Index (AGFI)
Root Mean Square Residual (RMR)
Root Mean Square Error of Approximation (RMSEA)

45 Raw Score Correlational Analysis
Of the eleven correlations between the math and reading raw scores, only one was below .50 and four were above .60. The correlations were not disattenuated, given the magnitude of the separate math and reading alphas and the total combined alphas. The median correlation was .59, which would indicate that the math and reading adult life skill items were not measuring the same construct.
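For readers unfamiliar with disattenuation (mentioned above as unnecessary given the high alphas), the standard correction divides the observed correlation by the square root of the product of the two reliabilities. A small illustration with hypothetical alpha values:

```python
import math

def disattenuated_r(observed_r: float, alpha_x: float, alpha_y: float) -> float:
    """Correct an observed correlation for measurement unreliability."""
    return observed_r / math.sqrt(alpha_x * alpha_y)

# With high reliabilities the correction barely moves the estimate,
# which is why it was judged unnecessary in the analysis above.
print(disattenuated_r(0.59, 0.92, 0.90))  # ~0.65
```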

46 Principal Components Analysis
Data from the 11 combined math and reading booklets were analyzed using principal components analysis. Each set of items was composed of both math and reading items in an adult life skills context. The math and reading items were also independently analyzed. The sample sizes for each booklet ranged from a low of 261 to a high of 11,138. Data from the five listening booklets were also subjected to a principal components analysis.
When analyzed separately, the first eigenvalues for reading and math accounted for more of the total variance than when math and reading were combined.
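The first-eigenvalue comparison used here can be sketched as follows. Note that this illustration uses ordinary Pearson correlations via NumPy, whereas the study itself used tetrachoric coefficients, which are better suited to dichotomous item scores.

```python
import numpy as np

def first_component_variance_share(responses: np.ndarray) -> float:
    """Proportion of total variance captured by the first principal
    component of the inter-item correlation matrix.

    responses -- examinees-by-items matrix of 0/1 scores.
    """
    corr = np.corrcoef(responses, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # sort descending
    return float(eigenvalues[0] / eigenvalues.sum())

# A first component that dwarfs the rest (as on Form 30, where it
# captured 45 percent of the variance) is read as evidence of a
# single dominant latent trait.
```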

47 Factor Analysis
In the case of the one-factor model, the single factor was "adult life skills problem solving."
In the two-factor model, the factors were reading and math in an "adult life skills problem solving" context.
All model-data fit results are based on the product-moment correlation.

48 Principal Components Analysis (Form 30: Beginning Level Test)
Form 30, Reading and Math Combined: The first principal component (eigenvalue = 18.02) accounted for 45 percent of the total variance, with the second (eigenvalue = 1.76) accounting for 4 percent. All items had loadings greater than .40 on the first component.
Form 30, Reading: The first principal component (eigenvalue = 10.24) accounted for 51 percent of the total variance, with the second (eigenvalue = .94) accounting for 5 percent.
Form 30, Math: The first principal component (eigenvalue = 9.33) accounted for 47 percent of the total variance, with the second (eigenvalue = 1.19) accounting for 6 percent.
The three scree plots for Form 30 are presented in Figure 3.1.

49 Scree Plots for a Beginning Level Test—Combined and Separate

50 Principal Components Analysis (Form 35: Advanced Level Test)
Form 35, Reading and Math Combined: The first principal component (eigenvalue = 12.35) accounted for 17 percent of the total variance, with the second (eigenvalue = 3.62) accounting for five percent. Twenty-seven of the 38 reading items and 13 of the 35 math items had loadings greater than .40 on the first component. Of the remaining 11 reading items, six had loadings above .30 on the first component. Of the 22 remaining math items, 13 had loadings above .30 on the first component.
Form 35, Reading: The first principal component (eigenvalue = 8.69) accounted for 23 percent of the total variance, with the second (eigenvalue = 1.62) accounting for 4 percent.
Form 35, Math: The first principal component (eigenvalue = 8.88) accounted for 20 percent of the total variance, with the second (eigenvalue = 1.78) accounting for 5 percent.
The three scree plots for Form 35 are presented in Figure 3.4.

51 Scree Plots for an Advanced Level Test—Combined and Separate

52 Confirmatory Factor Analyses: Description of Four Fit Statistics
The first three fit statistics are overall model fit indices; the fourth is a comparative fit measure that compares the proposed model to a null model.
The Goodness of Fit Index (GFI) is a measure of the proportion of variance and covariance that the hypothesized model is able to explain.
The Adjusted Goodness of Fit Index (AGFI) also considers the degrees of freedom in computing the measure.
The range of the GFI and AGFI is from 0 (poor fit) to 1 (perfect fit). A recommended minimum value is 0.90 for GFI and 0.80 for AGFI (Segars and Grover, 1993).
The Root Mean Square Residual (RMR) is an average of the residuals between the observed and estimated input matrices. The smaller the value of RMR, the better the fit. The maximum recommended value for RMR is 0.10 (Chau, 1997).
The Root Mean Square Error of Approximation (RMSEA) is a comparative fit measure that reflects the extent to which the proposed model does not fit the data. A value less than .05 suggests that the model is a reasonable approximation to the data (Browne and Cudeck, 1993).
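Restating the cited cutoffs as a single check may be useful. The thresholds below come from the slide above; the function itself is only an illustrative sketch, not part of the analysis.

```python
def acceptable_model_fit(gfi: float, agfi: float,
                         rmr: float, rmsea: float) -> bool:
    """Apply the recommended cutoffs cited above:
    GFI >= 0.90, AGFI >= 0.80, RMR <= 0.10, RMSEA < 0.05."""
    return gfi >= 0.90 and agfi >= 0.80 and rmr <= 0.10 and rmsea < 0.05

# Form 31's two-factor solution (see the summary table below) meets
# all four cutoffs:
print(acceptable_model_fit(gfi=0.93, agfi=0.92, rmr=0.07, rmsea=0.04))  # True
```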

53 Summary of Confirmatory Factor Analyses Results: Life Skills Series

Reading and Math    One-Factor Model            Two-Factor Model
Form                GFI   AGFI   RMR   RMSEA    GFI   AGFI   RMR   RMSEA
30                  .96   .95    .05   .03      .97   --     .04   .02
31                  .79   .77    .11   .07      .93   .92    .07   .04
33                  .54   .50    .09   .12      .73   .71    .06   .08
35                  .58   .56    .07   .10      .78   .77    .05   .06
37                  .62   .60    .07   .09      .77   .76    .05   .06
Listening (one-factor model only)
51                  .77   .73    .05   .09
53                  .88   .87    .04   .07
55                  .93   .92    .04   .06
Note: -- indicates a value not present in the source.

54 Summary of Confirmatory Factor Analyses Results: Employment Competency Series

Reading and Math    One-Factor Model            Two-Factor Model
Form                GFI   AGFI   RMR   RMSEA    GFI   AGFI   RMR   RMSEA
120                 .88   .86    .10   .06      .91   .90    .08   .05
130                 .57   .53    .08   .13      .77   .75    .05   .08
11                  .76   .74    .12   .08      .86   .89    .08   .09
13                  .45   .41    .11   .14      .68   .66    .07   .09
15                  .58   .55    .07   .10      .81   .80    .05   .06
17                  .86   .85    .09   .05      .90   --     .08   .04
Listening (one-factor model only)
51                  .77   .73    .05   .09
63                  .78   .75    .08   .10
65                  .79   .75    .07   .11
Note: -- indicates a value not present in the source.

55 Confirmatory Factor Analyses: Results
The two-factor model (reading and math) for both the Life Skills and ECS Series provided a better fit than the one-factor model on all four fit statistics.

56 Parameter Invariance
The correlations between the CASAS bank item difficulties for Form 15 math were .79 for the CCC group and .81 for other youth. Similarly, for Form 15 reading, the correlations were .76 and .81. The correlations between the CASAS bank item difficulties for Form 35 math were .84 for the CYA group and .87 for other youth. Similarly, for Form 35 reading, the correlations were .85 and .89, respectively. All inter-correlations are found in Tables 3.9 and 3.10.
These results would indicate a fair degree of temporal stability in the item difficulties over the past 20 years.

57 Table 3.9. Inter-Correlations Between Bank Difficulties and Difficulties Generated from CYA on Form 35 Math and Form 35 Reading

Math           Bank   CYA    Other Youth
Bank           1.00   .84    .87
CYA                   1.00   .98
Other Youth                  1.00

Reading        Bank   CYA    Other Youth
Bank           1.00   .85    .89
CYA                   1.00   .97
Other Youth                  1.00

Table 3.10. Inter-Correlations Between Bank Difficulties and Difficulties Generated from CCC on Form 15 Math and Form 15 Reading

Math           Bank   CCC    Other Youth
Bank           1.00   .79    .81
CCC                   1.00   .98
Other Youth                  1.00

Reading        Bank   CCC    Other Youth
Bank           1.00   .76    .81
CCC                   1.00   .98
Other Youth                  1.00

58 Bank Difficulties and Difficulties Generated by Examinees for Both the Life Skills Series and ECS

Life Skills Series
Form                   31      33      35      37
N                      669     1,360   7,062   2,059
Reading: # of Items    24      32      38      40
Reading: Correlation   .88     .87     .87     .51
Math: # of Items       24      30      35      36
Math: Correlation      .84     .84     .88     .85

Employability Competency System Series
Form                   11      13      15      17
N                      261     919     1,969   613
Reading: # of Items    25      34      38      30
Reading: Correlation   .93     .66     .79     .74
Math: # of Items       24      31      31      32
Math: Correlation      .74     .66     .78     .86

59 Differential Item Functioning: Description
The Delta value indicates the average amount by which examinees in a focal group found an item more difficult than did a reference group. Positive values on this scale indicate that the item favors the focal group; that is, an item with a positive value is differentially easier for the focal group. Similarly, an item with a negative Delta differentially favors the reference group.
Items having a Delta with an absolute value less than 1.0 are used as needed in order to meet the content requirements of the test. Items having a Delta value between 1.0 and 1.5 are subjected to review by content specialists. Items having a Delta greater than 1.5 are used in a test only if no other item from the required domain has a lower value and the item content is deemed critical to the assessment.
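The three-tier usage rule above maps naturally onto a small classifier. A sketch, with the tier labels chosen here purely for illustration:

```python
def classify_dif_delta(delta: float) -> str:
    """Classify an item by the absolute value of its Mantel-Haenszel
    Delta, following the usage rules described above."""
    magnitude = abs(delta)
    if magnitude < 1.0:
        return "use as needed"             # negligible DIF
    if magnitude <= 1.5:
        return "content-specialist review"
    return "use only if no alternative"    # large DIF; content must be critical

print(classify_dif_delta(-0.4))   # use as needed
print(classify_dif_delta(1.2))    # content-specialist review
```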

60 Differential Item Functioning
The results of the analyses show a general lack of ethnic or gender bias in the vast majority of the items used in the tests examined. However, some items showed bias and would require further examination and review.

61 Summary of Mantel-Haenszel Analysis for Gender, Indicating the Number of Items by Form within Delta Difference Ranges, for the Life Skills Series

                         Items by Absolute Delta Difference
Form      Total Items    < 1.0    1.0-1.5    1.5-2.0
Reading
28        30             30       0          0
31        24             20       4          0
33        32             31       1          0
35        38             37       0          1
37        40             40       0          0
Math
31        24             17       4          3
33        30             20       8          0
35        35             26       6          3
37        36             29       2          5
Listening
51        34             34       0          0
53        30             29       1          0
55        25             24       1          0

62 Mantel-Haenszel Analysis for Ethnicity (Anglo-Hispanic), Indicating the Number of Items by Form within Delta Difference Ranges, Life Skills Series

                         Items by Absolute Delta Difference
Form      Total Items    < 1.0    1.0-1.5    1.5-2.0
Reading
31        24             21       1          2
33        32             29       2          1
35        38             35       3          0
37        40             33       5          2
Math
31        24             14       0          2
33        30             23       5          1
35        35             24       6          2
37        36             35       1          0
Listening
51        34             *        *          *
53        30             22       5          1
55        25             15       7          1
* Insufficient number for analysis.

63 Criterion-Related Validity
CASAS Skill Level Descriptors and Historical Background
CASAS and Other National Reference Scales
Performance on CASAS Related to GED
CASAS Related to Work Keys
CASAS Related to Years of Schooling
CASAS Related to Educational Attainment

64 Criterion-Related Validity
Criterion-related, or predictive, validity assesses the ability or effectiveness of an instrument in predicting something it should theoretically be able to predict.
The criterion for CASAS tests developed in 1980 was directed at determining a learner's appropriate level of placement into ABE, ESL, and the preparatory high school curriculum.

65 CASAS Skill Level Descriptors
The CASAS Skill Level Descriptors show a continuum of skills from beginning literacy through advanced adult secondary. They provide descriptions of adults' general job-related ability in reading, mathematics, oral communication, and writing. The Skill Level Descriptors explain in general terms what most learners can accomplish at a given CASAS scale score level in a specific skill area.
This scale has been verified and validated on more than three million adult and youth learners. The CASAS scale is divided into five levels, A (Beginning Literacy) to E (Advanced Secondary), each encompassing a range of scores. Each level is defined by a CASAS scale score range with corresponding competency descriptors of performance in employment and adult life skills contexts.

66 Relationship Between CASAS and Mainstream English Language Training (MELT)

CASAS Scores   MELT Level   Possible Program Placement     MELT Description
165-180        I            ESL Pre-Literate Orientation   Functions minimally, if at all, in English.
181-190        II           ESL Beginning (Level 1)        Functions in a very limited way in situations related to immediate need.
191-200        III          ESL Beginning (Level 2)        Functions with some difficulty in situations related to immediate needs.
201-208        IV           ESL Intermediate (Level 1)     Can satisfy basic survival needs and a few very routine social demands.
209-215        V            ESL Intermediate (Level 2)     Can satisfy basic survival needs and some limited social demands.
216-224        VI           ESL Advanced (Level 1)         Can satisfy most survival needs and limited social demands.
225+           VII          ESL Advanced (Level 2)         Can satisfy survival needs and routine work and social demands.

67 Student Performance Level (SPL)
SPL descriptions, along with the CASAS achievement scale, provide a sound basis for articulating instructional program levels. The relationship among the SPLs, the literacy sections of the BEST, and the CASAS reading tests is shown in the following table.

68 Relationship Among SPL Levels, BEST Scores, and CASAS Scores

SPL Level   BEST Scores   CASAS Scores
0           0-2           < 165
1           3-7           165-185
2           8-21          186-190
3           22-35         191-200
4           36-46         201-208
5           47-53         209-216
6           54-65         217-223
7           > 66          224-231

69 Relationship Among CASAS, NRS*, NALS**, Work Keys, SPL***, and Years of School Completed

CASAS   Score Range     NRS Level and Name (ABE)              NRS Level and Name (ESL)    NALS   SPL       Work Keys   Years of School
A       180 and below   --                                    1 Beginning ESL Literacy    1      1         Below 3     1 to 2
A       181-200         1 Beginning ABE Literacy              2 Beginning ESL             1      2 and 3   Below 3     1 to 2
B       201-210         2 Beginning Basic Education           3 Low Intermediate ESL      1      4         Below 3     3 to 5
B       211-220         3 Low Intermediate Basic Education    4 High Intermediate ESL     1      5         Below 3     6 to 7
C       221-235         4 High Intermediate Basic Education   5 Low Advanced ESL          2      6         3           8 to 10
D       236-245         5 Low Adult Secondary Education       6 High Advanced ESL         2/3    7         4           11 to 12
E       246 and above   6 High Adult Secondary Education      --                          3      8         4           13+

*National Reporting System (WIA Title II)
**National Adult Literacy Survey
***Student Performance Levels

70 Performance on CASAS Related to GED
A clear, monotonically increasing relationship was found between CASAS reading scores and GED reading scores, and between CASAS math scores and GED math scores. A similar relationship was also found between CASAS reading scores and overall GED results averaged across the five test content areas.

71 CASAS Reading Mean Test Scores Associated with GED Reading Score Ranges

GED Score Range   CASAS Reading Test Mean   N     CASAS S.D.
Less than 400     234                       530   9.90
401-425           237                       282   8.93
426-449           238                       380   10.76
450-476           239                       671   9.54
477-494           240                       523   10.64
495-510           242                       403   10.14
511-524           243                       247   11.09
525-540           244                       247   9.69
541-556           244                       211   9.33
557-576           244                       101   8.39
577-600           244                       308   9.82
601-638           246                       263   9.77
639+              247                       635   9.79

72 CASAS Math Mean Test Scores Associated with GED Math Score Ranges

GED Score Range   CASAS Math Test Mean   N     CASAS S.D.
Less than 400     226                    168   8.43
401-425           228                    109   8.11
426-449           230                    223   7.98
450-476           232                    400   9.67
477-494           233                    282   8.12
495-510           235                    317   8.51
511-524           236                    185   7.13
525-540           237                    279   7.34
541-556           239                    122   7.59
557-576           239                    72    9.08
577-600           240                    117   7.68
601-638           240                    94    8.39
639+              243                    175   7.93

73 CASAS Reading Mean Test Scores Associated with GED Total Score Ranges

GED Score Range   CASAS Reading Test Mean   N     S.D.
Less than 425     231                       205   9.40
426-449           235                       264   8.83
450-476           237                       449   9.35
477-494           239                       403   9.01
495-510           242                       322   9.23
511-524           243                       286   10.36
525-540           244                       304   10.01
541-556           245                       276   10.03
557-576           246                       277   9.91
577-600           247                       233   10.51
601-638           248                       247   9.51
639+              250                       197   10.93

74 CASAS to Work Keys
The data show that as a learner's scores on the CASAS reading and mathematics scales increased, the ACT Work Keys level tended to increase as well. The Pearson correlation coefficient between the two measures was .71 for reading and .70 for mathematics.

75 CASAS to Years of Schooling and Degree
Two small-scale studies were conducted in Iowa:
One compared CASAS reading scores with the highest grade completed, and
the other compared both CASAS reading and math scores with the highest degree completed.
Comparable data have been observed over the years when comparing CASAS reading and/or math scores with the number of years of prior schooling.

76 Iowa Population Mean Scale Scores by Highest Grade Completed

Highest Grade Completed   Number   %    Reading
8 or less                 97       12   229 *
9                         107      13   233 ±
10                        114      14   235 ±
11                        118      15   237 *
12                        288      35   241 *
13+                       86       11   245

* Statistically significant difference from the subsequent level at the .05 level.
± Statistically significant difference from the second subsequent level at the .05 level.

77 Iowa Population Mean Scale Scores by Highest Degree Earned

Highest Degree Completed   Number   % of Sample   Reading   Math
None                       380      48            232       219
High School                239      30            240       226
GED                        121      15            243       228
Vocational/Technical       21       3             246       233
AA/AS                      13       1             248       234

78 CASAS to Years of Schooling and Degree
The results demonstrate that while CASAS scale scores are not precise equivalents for grade levels completed, there is a clear correlation between the two: higher CASAS reading scale scores on the ECS Series correspond to higher grade levels completed.

79 Chapters 4 & 5: Life Skills and Employability Test Series
CASAS tests assess learner attainment of a range of specific competencies presented in functional contexts.
Tests can be used both to check proficiency in skill areas and to measure learning progress.
There are two main series of pretests and post-tests designed to monitor learning progress: the Life Skills Series and the Employability Series, which differ largely in content focus.
The Life Skills Series covers a wide range of content areas, including employment.
The Employability Series contains primarily employment-related content.
Both series include reading and math tests for both native English speakers and ESL learners.

80 Appraisal Tests
Appraisals are used as an initial assessment to get a general idea of a learner's reading, math, or listening comprehension skills.
These test results guide placement into the appropriate instructional level and identify the appropriate progress test level.
Appraisal items span a wider range of difficulty than do the pre- and post-tests, ranging from 180-190 at the lower end of the scale to the 240s at the upper end.
The appraisals, either the Life Skills Appraisal or the ECS Appraisal, assess reading comprehension and math.
Listening comprehension of ESL learners may be tested with the ESL Appraisal.

81 Use of Series Either test series can be used in a pre- and post-test design to provide standardized information about learning gains. The progress-testing model proceeds through these temporal steps: (1) Place {Appraisal}, (2) Pretest {establish baseline skill levels to begin instruction}, (3) Instruct {ongoing informal assessment and instruction}, (4) Monitor {post-test}, (5) Certify {certification test or a level-completion test to confirm the learner's skill level in a promotion or exit paradigm}.

82 Reading Tests Reading comprehension tests assess reading skills in a functional life-skills context using documents, signs, charts, forms, procedures, reading passages, or other realistic presentations. Depending on the difficulty of the item, examinees scan, locate detail, interpret, analyze, or evaluate the selection to answer questions. There is no time limit for the tests, but most students finish within one hour.

83 Math Tests Items on the math tests require the practical application of math skills. Typical items involve locating information in a chart, table, or graph to perform a calculation. Tests also include word problems and other situational applications, as well as some computation items. Items range in difficulty from locating numerical information to applying formulas and basic algebra. There is no time limit for the tests, but most students finish within one hour.

84 Listening Tests Listening comprehension tests incorporate a variety of item types. Simpler items use pictures as part of the cue or as answer choices. Other item types include responding to a question or statement, identifying an equivalent statement, completing a dialogue, and interpreting information from a dialogue or statement. Tests are administered via audiotape.

85 Organization of Charts and Tables Describing Both Test Series For each test series, a Test Forms table identifies the test form numbers, test level, number of test items on each test, and test use; subsequent tables identify the specific competencies measured on each test.

86 Descriptive statistics for each series are presented -- Reading Life Skills

Descriptive Statistics for the Life Skills Reading Series

Form     N        # Items   Mean Raw Score   Std Dev   Mean P-Value   Mean Pt-Biserial   KR-20
27       18,369   30        23.38            6.93      .779           .850               .931
28       12,264   30        24.51            6.28      .817           .875               .925
31       28,996   24        16.96            6.20      .706           .598               .919
32       17,259   24        15.23            6.49      .635           .594               .918
32X       2,447   27        15.84            6.86      .587           .534               .902
33       29,500   32        18.68            8.04      .584           .535               .918
34       27,673   32        19.78            7.70      .618           .519               .911
34X      27,721   35        22.22            7.05      .635           .614               .885
35       27,605   38        20.93            8.11      .551           .458               .897
36       21,101   38        21.05            7.74      .554           .441               .886
37        7,363   40        17.22            9.06      .431           .466               .908
38        4,876   40        18.52            9.19      .463           .474               .912

Note: These statistics are based on data from 1996 through 2000. The initial item analyses were run for all items with sample sizes larger than 300.
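For readers who want to reproduce statistics of this kind, the sketch below computes the quantities reported in these tables (mean raw score, standard deviation, mean p-value, mean point biserial, and KR-20) from a generic 0/1 response matrix. It is a textbook illustration, not CASAS's analysis code, and conventions differ slightly; for example, some analyses exclude the item from the total score when computing the point biserial.

```python
import numpy as np

def item_stats(responses: np.ndarray) -> dict:
    """Classical item analysis for a 0/1 response matrix
    (rows = examinees, columns = items)."""
    k = responses.shape[1]
    total = responses.sum(axis=1)              # raw score per examinee
    p = responses.mean(axis=0)                 # item p-values (difficulty)
    # Point biserial: correlation of each item with the total raw score.
    pbis = np.array([np.corrcoef(responses[:, i], total)[0, 1]
                     for i in range(k)])
    # KR-20 internal-consistency reliability for dichotomous items.
    kr20 = (k / (k - 1)) * (1 - (p * (1 - p)).sum() / total.var())
    return {"mean_raw": total.mean(), "sd": total.std(ddof=1),
            "mean_p": p.mean(), "mean_pbis": pbis.mean(), "kr20": kr20}

# Fabricated responses, purely to show the call; real statistics would come
# from actual test administrations.
rng = np.random.default_rng(0)
demo = (rng.random((500, 30)) < 0.6).astype(int)
print(item_stats(demo))
```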

87 Descriptive statistics for each series are presented -- Math Life Skills

Form     N     # Items   Mean Raw Score   Std Dev   Mean P-Value   Mean Pt-Biserial   KR-20
31M      177   24        15.51            5.94      .646           .540               .891
32M      120   24        15.33            6.05      .639           .555               .901
33M      567   30        17.41            6.74      .580           .483               .885
34M      269   30        18.31            6.62      .610           .486               .886
35M      676   35        16.64            7.58      .475           .455               .889
36M      372   35        15.76            8.29      .450           .487               .907
37M      442   36        12.38            6.38      .344           .383               .838
38M      382   36        14.70            7.46      .408           .429               .877

Note: These statistics are based on data from 1996 through 2000. The initial item analyses were run for all items with sample sizes larger than 300.

88 Descriptive statistics for each series are presented -- Listening Life Skills

Form     N       # Items   Mean Raw Score   Std Dev   Mean P-Value   Mean Pt-Biserial   KR-20
51       8,915   34        19.99            8.80      .588           .544               .927
52       5,716   34        20.78            8.16      .611           .519               .916
53       5,655   30        16.69            7.26      .556           .501               .896
54       4,324   30        16.44            6.66      .548           .458               .870
55       3,273   25        13.18            5.76      .527           .474               .855
56       2,616   25        13.68            5.52      .547           .453               .838

Note: The 51L and 52L tests appear in both the ECS and Life Skills series.
Note: These statistics are based on data from 1996 through 2000. Initial item analyses were run for all items with sample sizes larger than 300.

89 Descriptive statistics for each series are presented -- ECS Reading

Form     N       # Items   Mean Raw Score   Std Dev   Mean P-Value   Mean Pt-Biserial   KR-20
11       1,059   25        16.31            6.95      .653           .618               .931
12       1,293   25        16.01            7.64      .641           .659               .945
13       1,872   34        20.87            8.96      .614           .569               .936
14       1,372   34        17.94            9.57      .528           .589               .943
15       3,430   38        23.80            7.58      .626           .444               .886
16       1,942   38        23.06            8.93      .607           .501               .918
17         168   30        19.37            6.50      .646           .472               .878
18         619   30        16.37            7.41      .546           .510               .902

Note: These statistics are based on data from 1996 through 2000. The initial item analyses were run for all items with sample sizes larger than 300.

90 Descriptive statistics for each series are presented -- ECS Math

Form     N       # Items   Mean Raw Score   Std Dev   Mean P-Value   Mean Pt-Biserial   KR-20
11         343   24        18.12            5.56      .755           .553               .899
12         210   24        16.24            7.02      .677           .651               .939
13         839   31        17.09            8.11      .551           .554               .925
14         421   31        13.92            9.82      .449           .650               .955
15       1,176   31        21.07            6.41      .680           .479               .884
16         347   31        18.42            7.91      .594           .551               .921
17         613   32        14.21            6.30      .444           .559               .856
18         529   32        15.96            6.86      .499           .591               .877

Note: These statistics are based on data from 1996 through 2000. Initial item analyses were run for all items with sample sizes larger than 300.

91 Descriptive statistics for each series are presented -- ECS Listening

Form     N       # Items   Mean Raw Score   Std Dev   Mean P-Value   Mean Pt-Biserial   KR-20
51       8,915   34        19.99            8.80      .588           .544               .927
52       5,716   34        20.78            8.16      .611           .519               .916
63       1,210   30        16.59            5.60      .553           .395               .812
64       1,159   30        17.98            5.76      .599           .410               .826
65         926   26        15.27            4.79      .587           .388               .773
66         739   26        15.83            4.37      .609           .360               .729

Note: The 51L and 52L tests appear in both the Life Skills and ECS series.
Note: These statistics are based on data from 1996 through 2000. Initial item analyses were run for all items with sample sizes larger than 300.

92 Raw Score Conversion Tables Tables show the raw score, scale score, and standard error for each test form. CASAS considers the scores between the dotted lines to be the most accurate. Scores at the end of each test marked with a black diamond (♦) are scale estimates above the accurate range. Test users are strongly encouraged to use scores that fall within the accurate ranges.
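As an illustration of how such a conversion table might be applied in scoring code, the sketch below looks up a raw score and flags whether the result falls within the accurate range. The scale scores and standard errors are invented placeholders, not values from any CASAS table; the SE ≤ 5.5 criterion is taken from the next slide.

```python
# Raw score -> (scale score, standard error). All values here are invented
# placeholders, not from a CASAS conversion table.
CONVERSION = {
    10: (214, 6.1),
    11: (217, 5.4),
    12: (219, 5.1),
    13: (221, 4.9),
}

ACCURATE_SE = 5.5  # CASAS's accurate-range criterion (see next slide)

def score(raw: int) -> tuple[int, float, bool]:
    """Return scale score, SE, and whether the score is in the accurate range."""
    scale, se = CONVERSION[raw]
    return scale, se, se <= ACCURATE_SE

print(score(11))  # -> (217, 5.4, True)
```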

93 Test Information Function The shape of the Test Information Function depends on the purpose of the test. A Test Information Function that is fairly flat across an ability range measures ability with roughly equal precision across that range; a function of this shape is desirable for a general pre-/post-test assessment covering a wide range of ability. In contrast, a peaked Test Information Function provides the most information for examinees whose abilities fall near the peak of the function; a function of this shape is desirable for a certification test. In general, CASAS tests provide acceptable precision across the ability range each test was designed to measure. This, coupled with CASAS's definition of the accurate range (a standard error of 5.5 or less), helps to ensure ability measures that are good approximations of the examinee's true score.
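The following sketch shows how a Test Information Function and the associated standard error can be computed under a one-parameter (Rasch) IRT model, used here purely for illustration. The item difficulties are hypothetical, and the standard error is in logits; converting it to CASAS scale-score units (for comparison with the 5.5 criterion) would require the scale transformation, which this presentation does not give.

```python
import numpy as np

def test_information(theta: float, difficulties: np.ndarray) -> float:
    """Rasch test information at ability theta: the sum over items
    of p_i * (1 - p_i)."""
    p = 1.0 / (1.0 + np.exp(-(theta - difficulties)))
    return float((p * (1.0 - p)).sum())

difficulties = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])  # hypothetical items
for theta in (-2.0, 0.0, 2.0):
    info = test_information(theta, difficulties)
    se = 1.0 / np.sqrt(info)        # standard error of theta, in logits
    print(f"theta={theta:+.1f}  info={info:.2f}  SE={se:.2f}")
```

Spreading item difficulties widely flattens the function, as in the pre-/post-test case above; clustering them near a cut score produces the peaked function suited to a certification test.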


97 Chapter 6 and Beyond Later chapters and periodic supplements will present evidence relating to the validity of new test series and specific instruments constructed from the item banks.

