Standardization: The Properties of Objective Tests

Properties of Objective Tests
There are three standards by which you can judge an objective test:
– Standardization
– Reliability
– Validity

Properties of Objective Tests
– Standardization: scoring and use of scores do not vary across situations
– Reliability: scores are consistent and remain stable over time
– Validity: the test measures what it intends to measure

Standardization Principles
Objective scoring:
– Directions
– Consistency
– Accuracy and timeliness

Standardization Principles
Administration:
– Appropriate conditions specified
– Materials
– Probing / coaching

Standardization Principles
Guidelines for interpretation and use:
– With whom?
– For what purpose?
– What do high and low scores mean?

Standardization Principles
Norm tables:
– Based on large, representative samples
– From a defined population

Standardization Principles
Specialized norm tables:
– Subgroup differences
– For example: age, gender, race, primary language, etc.

Standardization Principles
Raw scores and standard scores provided where appropriate:
– Standard scores
– Percentile ranks
– Age-standardized scores

Standardization Principles
Technical manual:
– Test development process
– Guidelines for administration, scoring, and interpretation
– Norm tables
– Meets the standards for educational and psychological tests

Norm Tables
Meaningful for interpretation when:
– A norm-referenced interpretation meets the goal of the test
– The test is not a criterion-referenced test

Norm Tables
Meaningful for interpretation when:
– Relative position in a group has interpretive meaning
– The examinee is a member of the population

Norm Tables
Meaningful for interpretation when:
– The norm sample is large and representative of the population
– The right norm table is used

Norm Tables
For admissions or personnel selection, all those taking the test at a given administration may serve as the norm sample.

Norm Tables
However, the correct reference group varies by purpose:
– Career counseling
– Placement in the appropriate courses
– Selection for a remedial program

Interpreting Standard Scores
The raw score is transformed into a standard score:
– z = (score - mean) / SD
– z score = number of SD units away from the mean
– Includes a measure of middle (mean) and spread (SD)

Interpreting Standard Scores
– z = 0: average score
– z <= -1: low score
– z >= 1: high score
z is converted to some other scaling:
  Mean    50    100    500
  SD      10     15    100
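As a rough illustration of the conversion on this slide, the Python sketch below computes a z score and re-expresses it on the three scalings listed (mean 50 / SD 10, mean 100 / SD 15, mean 500 / SD 100). The raw score, test mean, and test SD are made-up values, and the function names are illustrative only.

```python
def z_score(raw, mean, sd):
    """Number of SD units a raw score lies from the mean."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (raw - mean) / sd


def rescale(z, new_mean, new_sd):
    """Express a z score on a scale with the given mean and SD."""
    return new_mean + new_sd * z


# Hypothetical test: mean 28, SD 6; a raw score of 34 is one SD above the mean.
z = z_score(34, 28, 6)

# The three scalings listed on the slide, as (mean, SD) pairs.
scalings = {
    "mean 50, SD 10": (50, 10),
    "mean 100, SD 15": (100, 15),
    "mean 500, SD 100": (500, 100),
}
converted = {name: rescale(z, m, s) for name, (m, s) in scalings.items()}

print(z)          # 1.0
print(converted)  # 60.0, 115.0 and 600.0 on the three scales
```

Whatever the scaling, the score keeps the same position relative to the mean: one SD above it in this example.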

Interpreting Standard Scores
– pp. 42, 43, and 48 in the book give guidelines
– Easiest to use when converted to percentiles
– Percentile: % of the population that scores at or below a given score
– Can be thought of as a rank out of 100 members of the population

Interpreting Standard Scores
Common interpretation strategies:
– Normal range is the middle 68% of the population (T = 40-60, z = -1 to 1, etc.)
– Low and high scores fall outside this range (lower and upper 16%)

Interpreting Standard Scores
Common interpretation strategies:
– Normal range is the middle 50% of the population (Quartiles 2 and 3)
– Low and high scores fall outside this range (Quartiles 1 and 4)

Interpreting Standard Scores
– It is safer to make broad classifications such as “Low”, “Within the normal, or expected, range”, or “High” than to make fine distinctions.
– All scores have some measurement error in them.
– Look for patterns across the battery and across multiple sources.
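The two interpretation strategies above can be sketched in Python. This is only an illustration under the assumption that scores are normally distributed; published norm tables are built from the actual norm sample and may give somewhat different percentiles. The function names and the `cutoff` parameter are illustrative, not part of any test manual.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, SD 1


def percentile_from_z(z):
    """Percent of a normal population scoring at or below z."""
    return 100 * STANDARD_NORMAL.cdf(z)


def classify(z, cutoff=1.0):
    """Broad classification; the default cutoff of 1 SD gives the middle 68%."""
    if z <= -cutoff:
        return "Low"
    if z >= cutoff:
        return "High"
    return "Within the normal range"


# Cutoff for the "middle 50%" strategy: the z value at the 75th percentile.
QUARTILE_CUTOFF = STANDARD_NORMAL.inv_cdf(0.75)  # about 0.674

print(round(percentile_from_z(1.0), 1))  # 84.1, so roughly 16% score higher
print(classify(0.7))                     # Within the normal range (68% rule)
print(classify(0.7, QUARTILE_CUTOFF))    # High (50% rule)
```

The same z score can be "normal" under one strategy and "high" under the other, which is one reason the slide recommends broad classifications over fine distinctions.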

An Example from WCCS
Christina, a 1st-grade student at our school, took the Stanford Achievement Test last year. Here are her Word Study Skills subtest scores.

Percent Correct
The number of correct responses, or the raw score, is divided by the total number of questions, then multiplied by 100 and expressed as a percentage.

Percent Correct
Christina gave the correct answer to 83.33% of the questions on the Word Study Skills section of the test.
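The percent-correct calculation is simple arithmetic, sketched below in Python. The slides do not give Christina's raw score or the number of items, so the 25-out-of-30 figures are hypothetical values chosen only because they reproduce 83.33%.

```python
def percent_correct(raw_score, total_items):
    """Raw score divided by the number of items, times 100."""
    if total_items <= 0:
        raise ValueError("total_items must be positive")
    if not 0 <= raw_score <= total_items:
        raise ValueError("raw_score must be between 0 and total_items")
    return 100 * raw_score / total_items


# Hypothetical counts: 25 correct out of 30 items.
example = round(percent_correct(25, 30), 2)
print(example)  # 83.33
```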

Scaled Score
The raw score is standardized and normalized, then rescaled to the desired scaling.
– z = (Raw Score - Mean) / SD
– Scaled Score ≈ 500 + (100 × z)

Scaled Score
– Scaled scores have many convenient properties from a statistical standpoint.
– However, for most people, percentile ranks are easier to understand.

Scaled Score
Christina scored more than one standard deviation above average. Her scores are in the above-average range.

Percentile Rank
– A percentile rank is a statement of the percentage of persons in a given group who fall at or below a given score.
– It is the most common way of reporting test scores and the easiest to use.
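A minimal Python sketch of this definition follows: it counts the members of a reference group who score at or below a given score. Both groups and the score are invented for illustration; they are not data from any real norm table.

```python
def percentile_rank(score, norm_sample):
    """Percent of the norm sample scoring at or below the given score."""
    if not norm_sample:
        raise ValueError("norm_sample must not be empty")
    at_or_below = sum(1 for s in norm_sample if s <= score)
    return 100 * at_or_below / len(norm_sample)


# Two hypothetical reference groups of ten raw scores each.
broad_group = [10, 12, 14, 15, 17, 18, 20, 22, 24, 26]
selective_group = [15, 17, 19, 20, 21, 22, 24, 25, 27, 28]

score = 22
print(percentile_rank(score, broad_group))      # 80.0
print(percentile_rank(score, selective_group))  # 60.0
```

The same raw score earns a different percentile rank in each group, so a percentile rank is only meaningful alongside the name of the group it refers to.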

Percentile Rank
Christina scored as well as or better than 81% of all students in the nation who took this section of the test.

Percentile Rank
Christina scored as well as or better than 57% of all students in ACSI schools who took this section of the test.

Percentile Rank
This pattern is typical for our students on average.
– ≈ 80th percentile nationally
– ≈ 60th percentile for ACSI students
– What does this mean?

Stanine
– Standard score of nine units
– Developed by the military to contain test score information in one column on an IBM punch card
– Nine groups (1-9), each ½ SD wide and each covering a range of percentile ranks (PRs)

Stanine
– Christina’s scores fall in the 7th stanine, or above average compared to all students nationally.
– Christina’s scores fall in the 5th stanine, or average for ACSI students.
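The link between ½ SD bands and percentile-rank ranges can be sketched in Python. The sketch assumes a normal distribution, places the band boundaries at z = -1.75, -1.25, ..., +1.75 (so stanines 1 and 9 are open-ended), and assigns a percentile rank that falls exactly on a cut point to the higher stanine; that tie rule is a convention chosen here, and a test's own norm tables take precedence.

```python
from bisect import bisect_right
from statistics import NormalDist

# Eight boundaries, 1/2 SD apart, separate the nine stanines.
Z_CUTS = [-1.75 + 0.5 * i for i in range(8)]

# Matching percentile-rank cut points under a normal curve:
# 4, 11, 23, 40, 60, 77, 89, 96
PR_CUTS = [round(100 * NormalDist().cdf(z)) for z in Z_CUTS]


def stanine_from_percentile(pr):
    """Stanine (1-9) for a percentile rank between 0 and 100."""
    if not 0 <= pr <= 100:
        raise ValueError("percentile rank must be between 0 and 100")
    return bisect_right(PR_CUTS, pr) + 1


# Christina's percentile ranks from the slides: 81 nationally, 57 for ACSI.
print(stanine_from_percentile(81))  # 7
print(stanine_from_percentile(57))  # 5
```

Both results agree with the stanines reported for Christina on this slide.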

Grade Equivalent Scores
– Attempt to translate test scores into the grade (grade and month) at which the score is typical.
– Have an intrinsic appeal.
– Are problematic statistically.
– Are based on extrapolations.

Grade Equivalent Scores
In the area of Word Study Skills, Christina, a 1st-grade student at our school, is performing at the level of a typical 3rd-grade student in the seventh month of the school year (on the 1st-grade test).

An SAT Example
Mark, a 12th-grade student at our school, took the SAT last year. Here are his scores.

An SAT Example
– Section mean ≈ 500, SD ≈ 100
– Section range = 200-800 (z = -3 to +3)
– Total mean ≈ 1000, SD ≈ 200
– Total range = 400-1600

An SAT Example
Mark scored a 620 on the verbal section of the test. His score was more than one standard deviation above the mean and is considered above average.

An SAT Example
Mark’s score on the verbal section of the test was as good as or better than 83% of the students who took the test.

An SAT Example
Mark scored a 570 on the quantitative section of the test. His score was within the normal range and is considered average.

An SAT Example
Mark’s score on the quantitative section of the test was as good as or better than 66% of the students who took the test.

An SAT Example
Mark’s total score was 1190. His score was within the normal range and is considered average.

An SAT Example
Mark’s total score was as good as or better than 61% of the students who took the test.
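The classifications in this example can be checked with a short Python sketch that applies the approximate means and SDs from the slides and the "within one SD is average" rule. The function name and labels are illustrative; the percentiles quoted on the slides come from the published score report and are not recomputed here.

```python
# Approximate scale parameters taken from the slides.
SECTION_MEAN, SECTION_SD = 500, 100
TOTAL_MEAN, TOTAL_SD = 1000, 200


def describe(score, mean, sd):
    """Return the z score and a broad label using a one-SD rule."""
    z = (score - mean) / sd
    if z >= 1:
        label = "above average"
    elif z <= -1:
        label = "below average"
    else:
        label = "average"
    return z, label


marks_scores = {
    "verbal": (620, SECTION_MEAN, SECTION_SD),
    "quantitative": (570, SECTION_MEAN, SECTION_SD),
    "total": (1190, TOTAL_MEAN, TOTAL_SD),
}
results = {name: describe(*args) for name, args in marks_scores.items()}

for name, (z, label) in results.items():
    print(f"{name}: z = {z:.2f}, {label}")
# verbal: z = 1.20, above average
# quantitative: z = 0.70, average
# total: z = 0.95, average
```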

General Principles
Tests do not measure innate ability.
Test scores result from a combination of:
– Innate ability
– Environmental influences
– Test taker motivation
– Properties of the test itself

Cautions about Interpretation
– A low score in one norm group may be high in another, and vice versa.
– A low score on one test will not necessarily lead to a low score on another test.

Cautions about Interpretation
– Interpretation is partly an art, drawing on clinical intuition and experience.
– Become familiar with the case studies in test manuals.
