Test Design & Construction

RSCH 6109: Assessment & Evaluation Methods

Test Design & Construction
- Purpose & Framework
- Test Specifications or Blueprint
- Item Construction
- Field Testing
- Evaluation & Revision
- Classification of Items
- Response Formats
- Scoring Procedures
- Select Item Writing Guidelines: MCQs, Likert Rating Scales

AERA, APA, & NCME (1999). Standards for Educational and Psychological Testing. Washington, DC: American Educational Research Association.
Drummond, R.J. (2004). Appraisal Procedures for Counselors and Helping Professionals, 5th Ed. New Jersey: Pearson Publishing.

Step 1: Delineate purpose
    Phase 1: Establish the need
    Phase 2: Define the objectives & test parameters
Step 2: Develop test specs or blueprint
    Phase 3: Seek advisory committee input
Step 3: Develop items or tasks & field test
    Phase 4: Write questions
    Phase 5: Field test
    Phase 6: Review items
Step 4: Assemble & evaluate test
    Phase 7: Assemble final version
    Phase 8: Secure technical data

Step 1: Delineate the Purpose & Framework
The purpose and framework delineate what the test is intended to measure.

Step 2: Prepare the Table of Specifications
The table of specifications typically describes the specific format of the items, the response format, and the type of scoring procedures.

Step 3: Develop Test Items or Tasks

Defining the Purpose
- Like a mission statement for the test
- Define the construct to be measured
- Define the population with whom the test is to be used
- Determine the target audience for the information the test provides (the test users)
- Define the nature of the decisions to be made based on the information the test provides

What is a Construct?
- A construct is an unobservable quality, ability, or attribute
- We believe from theory that each person possesses some “amount” of the construct
- We can’t directly observe or measure the “amount” or level
- We rely on outward behaviors as indicators of the latent, or underlying, construct
- Contrast blood pressure and depression: blood pressure can be measured directly, whereas depression must be inferred from indicators

Defining the Content Domain
- Theory
- Literature
- Expert opinion
- Qualitative research
The goal is to include all aspects of the construct you intend to measure.

Step 1: Delineate the Purpose & Framework

Example (Optimal): The purpose of the Counselor Achievement Test (CAT) is to assess counseling students’ knowledge, skills, and abilities for effective counseling services. The framework of the CAT is modeled after the National Counselor Examination (NCE) and includes eight content areas. The CAT will consist of selected- and constructed-response items, as well as performance tasks. The CAT will be a criterion-referenced measure.

Example (Typical): The purpose of the Study Habits Scale (SHS) is to assess college students’ habits of study. The SHS includes between 18 and 30 items. The framework of the SHS is based on the work of Blai (1993). The SHS is a self-report measure designed to identify students’ study attitudes and behaviors.

Step 2: Develop the test specifications or blueprint
The table of specifications or test blueprint typically describes the number of items, the specific classification of the items and response format, and the type of scoring procedures.

Sample Table of Specifications for CAT

TTL#   Content Area                          Item Classification*     Format (# of Items)
                                             (K, C, AP, AN, S, E)
3      Human growth and development          1                        MCQ (2), Constructed Response (1)
       Social and cultural foundations
       Helping relationships
       Group Work
       Career and lifestyle development
       Appraisal
       Research and program evaluation
       Professional orientation & ethics
24 6 2

*Refers to Bloom’s Taxonomy of Educational Objectives (1956). K = knowledge, C = comprehension, AP = application, AN = analysis, S = synthesis, and E = evaluation.

Developing Items
- Determine the target length in time to administer and number of items
- Consider intended use and practical constraints: cost, complexity of scoring, etc.
- Consider the purpose and the stakes involved in decision making
- Initially write at least twice as many items as needed
- Contrast a screening test with a diagnostic test

Screening Tests
- Short
- Easy to administer
- Inexpensive
- Easy to score
- Maximizes sensitivity: makes the correct decision when the condition of interest is present (minimizes false negatives)

Diagnostic Tests
- Longer
- More complex to administer
- More expensive
- Harder to score
- Maximizes specificity: makes the correct decision when the condition of interest is not present (minimizes false positives)
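The contrast between sensitivity and specificity can be made concrete with the standard definitions. The sketch below is illustrative only: the counts are hypothetical and the function names are not from the course materials.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of people WITH the condition that the test flags correctly."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Proportion of people WITHOUT the condition that the test clears correctly."""
    return true_neg / (true_neg + false_pos)


# Hypothetical results for a screening test given to 150 people:
# 50 have the condition (45 flagged, 5 missed),
# 100 do not (70 cleared, 30 flagged in error).
screen_sensitivity = sensitivity(true_pos=45, false_neg=5)
screen_specificity = specificity(true_neg=70, false_pos=30)

print(f"sensitivity = {screen_sensitivity:.2f}")  # few false negatives
print(f"specificity = {screen_specificity:.2f}")  # more false positives
```

In this made-up example the test misses few true cases, which fits a screening role; a follow-up diagnostic test would aim to reduce the false positives.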

Step 2: Develop the test specifications or blueprint
The table of specifications typically describes the specific classification of the items, the response format, and the type of scoring procedures.

Item Classifications: Bloom and Krathwohl (1956)
Knowledge -> Comprehension -> Application -> Analysis -> Synthesis -> Evaluation

Knowledge: Define, Identify, List, Name
Comprehension: Convert, Explain, Summarize
Application: Compute, Determine, Solve
Analysis: Analyze, Differentiate, Relate
Synthesis: Design, Devise, Formulate, Plan
Evaluation: Compare, Critique, Evaluate, Judge

Bloom et al.’s Taxonomy of Educational Objectives (Cognitive Domain)

Knowledge
Remembering previously learned material. Requires recall of facts, procedures, rules, or events.
Sample verbs: Define, Recall, Identify, List, Name

Comprehension
Grasping the meaning of material. Requires reformulation, restatement, translation, or interpretation of content, or identification of relationships.
Sample verbs: Convert, Explain, Summarize

Application
Using information in concrete situations. Requires use of information in a setting or context other than where it was learned.
Sample verbs: Compute, Demonstrate, Solve

Analysis
Breaking down material into parts. Requires recognition of logical errors, comparison of components, or differentiation between components.
Sample verbs: Analyze, Infer, Differentiate, Relate

Synthesis
Putting parts together into a whole. Requires production of something original, solution to an unfamiliar problem, or combination of parts in an unusual way.
Sample verbs: Design, Construct, Combine, Formulate

Evaluation
Judging the value of a thing for a given purpose using definitive criteria. Requires formation of judgements about the worth or value of ideas, products, or procedures that have a specific purpose.
Sample verbs: Discriminate, Critique, Evaluate, Judge

Several taxonomies exist to assist instructors in developing learning and assessment objectives. In this context, a taxonomy is a system for classifying educational objectives according to shared characteristics and natural relationships. The best-known taxonomy was developed by Benjamin Bloom and colleagues in the mid-1950s. Three different taxonomies were developed, for the cognitive, affective, and psychomotor domains. The most widely used is the cognitive domain, which includes six levels of classification. Review each level.

Response Formats:

Selected-Response
Response sets are provided and the user is forced to select among the choices. Examples include: MCQ, T/F, Yes/No, Matching, and Likert Ratings.

Constructed-Response
No response sets are provided and the user is forced to provide a unique response. Examples include: Short Answer & Extended Answer.

Performance Tasks
No response sets are provided and the user is required to develop a product or perform some task or set of tasks. Examples include: Restricted and Extended Performance Tasks.

Selected-Response Formats

1. Multiple Choice Questions (MCQ)
Multiple choice items include a question or STEM followed by a number of possible responses or OPTIONS. These options make up the RESPONSE SET of the item.

2. True-False Questions
True-false items include a stem and two discrete options. These options can be “True-False”, “Yes-No”, “Always-Never”, etc.

3. Matching Items
Matching exercises consist of two columns of information. The student is required to select the item in the second column which best reflects the item in the first column.

4. Likert Rating Scale Items
Likert ratings include a scale ranging from one extreme to another. The anchors of the scale vary depending on the nature of the statement.

Constructed-Response Formats (Optimal Performance)

1. Short Answer Questions
Completion or short answer formats consist of questions that can be answered with a word or short phrase, or a statement having one or more omitted words.

2. Limited Essay Questions
Limited essay questions consist of tasks or items requiring students to give brief, concise responses.

3. Extended Essay Questions
Extended essay questions consist of tasks or items that allow students freedom to choose the form and scope of their responses.

Format: Advantages and Disadvantages

MCQ
Advantages: Assesses a broad range of skills in a limited amount of time. Scoring can be done quickly and objectively.
Disadvantages: Difficult and time consuming to write higher order cognitive items. Most items assess knowledge through comprehension. Guessing reduces validity of scores.

True-False
Advantages: Numerous items can be administered in a brief amount of time. Easy to write and objective to score.
Disadvantages: Limited in complexity. Guessing reduces validity of scores. Not appropriate for optimal performance measures.

Matching
Advantages: Assesses a broad range of skills in a limited time. Scoring can be done quickly and objectively.
Disadvantages: Higher order cognitive skills are difficult to assess. Guessing reduces validity of scores.

Short Answer
Advantages: Numerous items can be administered in a short time. Moderately easy to write and score items. Guessing is difficult.
Disadvantages: Limited to items that require very few words. Spelling errors can make scoring difficult.

Essay
Advantages: Assesses a broad range of skills, particularly higher order cognitive skills. Guessing is difficult.
Disadvantages: Time consuming to administer and score. Limited content can be sampled during a test period. Scoring can be subjective.

Step 2: Develop the test specifications or blueprint

Scoring Procedures: Selected-Response
Typically, selected-response items have one correct answer and are scored right or wrong (a.k.a. dichotomous scoring). However, some tests may weight responses differently.
Rating scale items are typically added together for a total score. For example, ten 5-point Likert rating scale items would yield a score range from 10 to 50. Typically, a higher score denotes stronger agreement, satisfaction, etc. with the overall construct.
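The two selected-response scoring rules above can be sketched in a few lines. This is a minimal illustration, assuming each Likert item is scored from 1 up to the number of scale points; the function names and the sample responses are hypothetical.

```python
def dichotomous_score(responses, answer_key):
    """Score 1 for each response matching the key, 0 otherwise, and sum."""
    return sum(1 for given, keyed in zip(responses, answer_key) if given == keyed)


def likert_total(ratings):
    """Total score for a rating scale: the item ratings added together."""
    return sum(ratings)


def likert_range(n_items, n_points=5):
    """Lowest and highest possible totals, assuming items are scored 1..n_points."""
    return n_items * 1, n_items * n_points


# Ten 5-point items give the 10-to-50 range described in the text.
print(likert_range(10, 5))

# Hypothetical respondent on a ten-item scale.
print(likert_total([4, 5, 3, 4, 4, 2, 5, 4, 3, 4]))

# Hypothetical four-item MCQ quiz scored dichotomously.
print(dichotomous_score(["B", "A", "D", "C"], ["B", "C", "D", "C"]))
```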

Step 2: Develop the test specifications or blueprint

Scoring Procedures (continued): Constructed-Response
These formats are relatively more subjective, time consuming, and expensive to score.
Short-answer items require a list of acceptable answers.
Extended response items typically require a scoring rubric. A scoring rubric is a table describing the criteria for scoring, including detailed descriptions for varying degrees of performance. The scoring rubric may yield a holistic or analytic score. Holistic scores refer to the overall impression of the response (or behavior) and analytic scores refer to the discrete dimensions of the response (or behavior). Holistic scores yield one overall score and analytic scores typically yield sub-scores as well as an overall score.
Performance tasks vary depending on the nature and complexity of the tasks. Scoring procedures may require a checklist, Likert rating scale, or rubric.
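The difference between holistic and analytic rubric scores can be illustrated with a small sketch. The dimension names and the 1-4 rating scale are hypothetical, and taking the overall analytic score as the simple sum of sub-scores is an assumption; a real rubric might weight dimensions differently.

```python
def holistic_score(overall_rating):
    """A holistic rubric yields a single overall score for the response."""
    return {"overall": overall_rating}


def analytic_score(dimension_ratings):
    """An analytic rubric yields one sub-score per dimension plus an overall score.

    Assumption: the overall score is the unweighted sum of the sub-scores.
    """
    return {
        "sub_scores": dict(dimension_ratings),
        "overall": sum(dimension_ratings.values()),
    }


# Hypothetical essay rated on three dimensions, each on a 1-4 scale.
essay = {"content": 3, "organization": 4, "clarity": 2}
print(analytic_score(essay))

# The same essay rated holistically by overall impression.
print(holistic_score(3))
```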

Supplemental Information: MCQ Alternatives

1. Confidence Weighting
The student is asked to indicate which answer they believe is correct and how certain they are that it is correct. Answers given with high confidence are weighted more heavily than answers given with low confidence.

2. Answer Until Correct (AUC)
The student chooses alternatives until the correct response is selected. Once it is selected, the student moves on to the next item.

Supplemental Information: MCQ Alternatives (continued)

3. Elimination & Inclusion Scoring
The student is asked to either cross out all the alternatives that are incorrect (elimination) or circle the alternatives that are most likely correct (inclusion).

4. Multiple-Answer Format
The student is told that any number of the options might be correct. Each item is scored by subtracting the number of incorrect answers from the number of correct answers.
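The multiple-answer rule (correct minus incorrect) can be sketched as follows. This assumes that "correct answers" means keyed options the student selected and "incorrect answers" means unkeyed options the student selected; the option letters and key are hypothetical.

```python
def multiple_answer_score(selected, keyed):
    """Number of correct selections minus number of incorrect selections.

    Assumption: options the student leaves unselected neither add nor subtract.
    """
    selected, keyed = set(selected), set(keyed)
    n_correct = len(selected & keyed)    # selected and keyed as correct
    n_incorrect = len(selected - keyed)  # selected but not keyed
    return n_correct - n_incorrect


# Hypothetical item where options A and B are keyed as correct.
print(multiple_answer_score(selected={"A", "B", "D"}, keyed={"A", "B"}))
print(multiple_answer_score(selected={"C", "D"}, keyed={"A", "B"}))
```

Under this reading, indiscriminately selecting every option is penalized, which is the point of subtracting incorrect selections.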

Sample Item: Confidence Weighting

Please respond to the following items by circling the letter that corresponds to the correct response. In addition, please rate your level of confidence with your response to each item by circling the corresponding confidence level.

What is the main advantage of using a table of specifications when preparing an achievement test?
A. It reduces the amount of time required. (+0)
B. It improves the sampling of content. (+1)
C. It makes the construction of test items easier. (+0)
D. It increases the objectivity of the test. (+0)

Please circle the number that corresponds to the best descriptor for your level of confidence with the answer chosen:
Extremely Confident   Fairly Confident   Neutral   Fairly Unconfident   Extremely Unconfident

Scoring Guide: Multiply the points for the answer chosen by the level of confidence. For this example, the student would receive 4 out of a possible 5 points.
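The scoring guide for this sample item can be sketched as below. The numeric values attached to the confidence labels (5 down to 1) are an assumption, chosen because they are consistent with the stated result of 4 out of a possible 5 points; the function name is hypothetical.

```python
# Assumed numeric values for the confidence labels (not given in the source).
CONFIDENCE_LEVELS = {
    "extremely confident": 5,
    "fairly confident": 4,
    "neutral": 3,
    "fairly unconfident": 2,
    "extremely unconfident": 1,
}

# Points per option for the sample item, as shown in parentheses.
OPTION_POINTS = {"A": 0, "B": 1, "C": 0, "D": 0}


def confidence_weighted_score(chosen_option, confidence_label):
    """Multiply the points for the chosen option by the confidence level."""
    return OPTION_POINTS[chosen_option] * CONFIDENCE_LEVELS[confidence_label]


# A student who picks B and is "fairly confident" earns 1 x 4 points.
print(confidence_weighted_score("B", "fairly confident"))

# A student who picks A earns nothing, however confident.
print(confidence_weighted_score("A", "extremely confident"))
```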

Sample Item: AUC

Please answer the following items by removing the overlay that corresponds to your response. If the answer chosen reveals an “INCORRECT” response, continue selecting until you reveal the “CORRECT” response. Once you have identified the “CORRECT” response, you have completed the item and should move on to the next question.

What is the main advantage of using a table of specifications when preparing an achievement test?
A. It reduces the amount of time required.
B. It improves the sampling of content.
C. It makes the construction of test items easier.
D. It increases the objectivity of the test.

Scoring Guide:
1st Attempt = 100%
2nd Attempt = 66%
3rd Attempt = 33%
4th Attempt = 0%
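The AUC scoring guide drops by an equal step with each extra attempt. One way to express that, offered here as an assumption rather than the author's stated formula, is a linear rule that reproduces the guide for a four-option item when percentages are truncated to whole numbers.

```python
def auc_score(attempts, n_options=4):
    """Fraction of credit earned when the correct option is found on the given attempt.

    Assumed linear rule: full credit on the first attempt, zero on the last.
    """
    if not 1 <= attempts <= n_options:
        raise ValueError("attempts must be between 1 and n_options")
    return (n_options - attempts) / (n_options - 1)


# Credit as whole percentages for a four-option item, truncated as in the guide.
for attempt in range(1, 5):
    print(attempt, f"{int(auc_score(attempt) * 100)}%")
```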

Sample Item: Elimination Scoring

Please respond to the following items by circling the letter that corresponds to the correct response. In addition, please draw a line through those items that you confidently believe are incorrect.

What is the main advantage of using a table of specifications when preparing an achievement test?

Scoring Guide:
A. It reduces the amount of time required. (+05)
B. It improves the sampling of content. (+85%)
C. It makes the construction of test items easier.
D. It increases the objectivity of the test.
