What is a Good Test 1. Validity: Does test measure what it is supposed to measure? 2. Reliability: Are the results consistent? 3. Objectivity: Can two or more people administer the test to the same group and get similar results? 4. Administrative Feasibility: cost, time, ease of administration
Criteria for a Good Test Validity Objectivity Administrative Feasibility Reliability
Purpose of Assignment 1. Start thinking about how these criteria affect your data collection and, ultimately, the inferences you make regarding your population 2. Recognize “other” issues that must be addressed during your data collection 3. Know when to “punt”
Scenario You are a fitness instructor at a local health club. Part of your job is to assess the fitness level of every new member. You are charged with assessing: 1. Cardiovascular fitness 2. Muscular strength 3. Body composition 4. Current level of physical activity
Conditions 1. Members consist of all ages, fitness levels, and cultural backgrounds 2. You have limited equipment; choose your measurement tool 3. The fitness analysis is mandatory for every member
Validity *Most important criterion to consider when evaluating a test. *Traditionally, validity refers to the degree to which a test actually measures what it claims to measure. *More precisely, refers to the agreement between what the test measures and the performance, skill, or behavior the test is designed to measure. *Means validity is specific to a particular use and group.
Validity *Evidence to support validity is reported as a validity coefficient. *Coefficient is determined through a correlation technique. *The closer the coefficient is to +1.00, the more valid the test. *A test used as a substitute for another validated test should have a validity coefficient of 0.80 or higher. *Predictive tests with validity coefficients of 0.60 have been accepted.
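A minimal sketch of how a validity coefficient might be computed, assuming the correlation technique is the Pearson product-moment correlation (the slides do not name a specific one). The function name `pearson_r` and the two score lists are hypothetical illustrations, not data from the presentation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dev_x = [a - mean_x for a in x]
    dev_y = [b - mean_y for b in y]
    sum_xy = sum(a * b for a, b in zip(dev_x, dev_y))
    sum_xx = sum(a * a for a in dev_x)
    sum_yy = sum(b * b for b in dev_y)
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("scores must vary for the correlation to be defined")
    return sum_xy / sqrt(sum_xx * sum_yy)


# Hypothetical scores: a new field test and an already validated criterion test,
# both taken by the same six people.
new_test = [12, 15, 9, 20, 17, 11]
criterion = [30, 36, 25, 45, 41, 29]

validity_coefficient = pearson_r(new_test, criterion)
# Rule of thumb from the slide: 0.80 or higher for a substitute test.
print(round(validity_coefficient, 2), validity_coefficient >= 0.80)
```

The same function could be reused for any of the correlation-based estimates in this presentation, since each one correlates two sets of scores from the same group.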
Validity of Norm-Referenced Tests *Three types of validity evidence reported for norm-referenced tests. *Validity of test is better accepted if more than one type of strong validity evidence is reported.
Validity of Norm-Referenced Tests Content Validity *Related to how well a test measures all skills and subject matter that have been presented to individuals. *To have content validity, test must be related to objectives of class, presentation, etc. (that for which the group is responsible). *The longer the test, the easier it is to have content validity. *Realistic test (sample test) must represent the total content of a longer test.
Validity of Norm-Referenced Tests Content Validity *Skills test must have content validity also. *Ask self - Does test measure what group has been taught? *May be provided through the use of experts on the area that you are testing.
Validity of Norm-Referenced Tests Content validity evidence sometimes called logical or face validity. When possible, it is best to use content validity evidence with other validity evidence.
Validity of Norm-Referenced Tests Criterion Validity Evidence Indicated by how well test scores correlate with a specific criterion (successful performance). Specific criterion may be future performance or current performance.
Validity of Norm-Referenced Tests Predictive Validity Evidence *Used to estimate future performance. *Generally, a predictor test is given and correlated with a criterion measure (variable that has been defined as indicating successful performance of a trait). *Criterion measure is obtained at a later date; may be after an instructional unit or after period of development.
Validity of Norm-Referenced Tests Predictive Validity Evidence *Definition of successful performance is sometimes difficult to estimate. *One method of determining successful performance is through a panel of experts; correlate experts’ ratings with performance on test. *SAT and ACT; predictor of success in college; criterion measure is success in college.
Validity of Norm-Referenced Tests Concurrent Validity *Immediate predictive validity; indicates how well individual currently performs a skill. *Test results correlated with a current criterion measurement. *Test and criterion measurement administered at approximately same time; procedure often used to estimate validity.
Validity of Norm-Referenced Tests Choice of criterion measure is an important consideration in estimating criterion validity evidence. Three criterion measures used most often: 1. Expert ratings 2. Tournament play 3. Previously validated test
Validity of Norm-Referenced Tests Construct Validity Evidence *Refers to the degree that the individual possesses a trait (construct) presumed to be reflected in the test performance. *Anxiety, intelligence, and motivation are constructs. *Examples - cardiovascular fitness and tennis skills *Construct validity can be demonstrated by comparing higher-skilled individuals with lesser-skilled individuals.
Validity of Criterion-Referenced Tests *Directly related to predetermined behavioral objectives. *Objectives must be stated in a clear, exact manner and be limited to small segments of instruction. *Test items should be constructed to parallel behavioral objectives. *Several test items for each objective; validity estimated by how well they measure the behavioral objective.
Validity of Criterion-Referenced Tests *May also determine C-R validity by testing prior to and after instruction; validity accepted if there is significant improvement after instruction or if the behavioral objectives are mastered by an acceptable number of individuals. *Success of C-R testing depends on the predetermined standard of success; must be realistic, but high enough to require individuals to develop skill.
Validity of Criterion-Referenced Tests Domain-referenced validity evidence - technique used to validate C-R tests *The word domain used to represent the criterion behavior. *If test items represent the criterion behavior, test has logical validity (referred to as domain-referenced validity). Example: 1. Topspin tennis serve technique is analyzed. 2. Most important components of the serve form are included in the criterion behavior. 3. Successful performance is defined - form, number of successful serves out of attempted serves, and placement and speed of serves.
Validity of Criterion-Referenced Tests Decision validity *Used to validate C-R tests when a test’s purpose is to classify individuals as proficient or nonproficient. *Cutoff score is identified, and individuals scoring above the cutoff score are classified as proficient.
Factors Affecting Validity 1. The characteristics of the individuals being tested - Test is valid only for individuals of gender, age, and experience similar to those on whom the test was validated. 2. The criterion measure (variable that has been defined as indicating successful performance of a trait) selected - Different measures correlated with the same set of scores will produce different correlation coefficients (expert ratings, tournament play, previously validated tests).
Factors Affecting Validity 3. Reliability - Test must be reliable to be valid. 4. Administrative procedures - Validity will be affected if unclear directions are given, or if all individuals do not perform the test the same way.
Reliability *Refers to consistency of a test. *Reliable test should obtain approximately the same results each time it is administered. *Individuals may not obtain the same score on the second administration of a test (fatigue, motivation, environmental conditions, and measurement error may affect scores), but the order of the scores will be approximately the same if test has reliability.
Reliability *To have a high degree of validity, a test must have a high degree of reliability. *Objective measures have higher reliability than subjective measures.
Methods of Estimating Reliability of Norm-Referenced Tests Test-Retest Method *Requires two administrations of the same test to the same group of individuals. *Calculate correlation coefficient between the two sets of scores (intraclass correlation coefficient best). *Greatest source of error in this method is caused by changes in individuals being tested. *Appropriate time interval between administrations of the test is sometimes difficult to determine.
Methods of Estimating Reliability of Norm-Referenced Tests Parallel Forms Method *Requires the administration of parallel or equivalent forms of a test to the same group and calculation of the correlation coefficient. *Both forms of the test are administered during the same test period or in two sessions separated by a short time period.
Methods of Estimating Reliability of Norm-Referenced Tests Parallel Forms Method *Primary problem with this method - difficult to construct two tests that are parallel in content and item characteristics. *If both tests administered within short time of each other, learning, motivation, and testing conditions do not influence correlation coefficient. *Reliability of most standardized tests estimated through this method.
Methods of Estimating Reliability of Norm-Referenced Tests Split-Half Method *Test is split into halves; scores of the halves are correlated. *Requires only one administration of test. *Common practice is to correlate odd-numbered items with even-numbered items. *Reliability coefficient is for a test of only half the length of original test. *Reliability usually increases as length of test increases. *Spearman-Brown formula often used to estimate reliability of full test.
Methods of Estimating Reliability of Norm-Referenced Tests Spearman-Brown Formula
$$r_{\text{full}} = \frac{2 \times r_{\text{half}}}{1 + r_{\text{half}}}$$
Example: Reliability of two halves of a test = .70
$$r_{\text{full}} = \frac{2 \times .70}{1 + .70} = \frac{1.4}{1.7} = .82$$
*Split-half method may produce an inflated correlation coefficient, but it is frequently used to estimate reliability coefficients of knowledge tests.
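The Spearman-Brown step-up can be sketched as a one-line function. The name `spearman_brown` is a hypothetical choice for illustration; the input is the correlation between the two half-test scores.

```python
def spearman_brown(r_half):
    """Estimate full-test reliability from the half-test correlation."""
    if r_half <= -1:
        raise ValueError("half-test correlation must be greater than -1")
    return 2 * r_half / (1 + r_half)


# Slide example: the two halves correlate at .70.
print(round(spearman_brown(0.70), 2))  # 0.82
```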
Methods of Estimating Reliability of Norm-Referenced Tests Kuder-Richardson Formula 21 *Many ways to split a test to compute “half-test” scores for correlation purposes; for each split different correlation coefficient probably would be obtained. *K-R 21 estimates the average correlation that might be obtained if all possible split-half combinations of a group of items were correlated. *Basic assumptions: 1. Test items can be scored 1 for correct and 0 for wrong. 2. The total score is the sum of the item scores.
Methods of Estimating Reliability of Norm-Referenced Tests Kuder-Richardson Formula 21
$$r_{kr} = \frac{n}{n - 1}\left(1 - \frac{\bar{X}(n - \bar{X})}{n\,s^{2}}\right)$$
n = number of items; $\bar{X}$ = test mean (average number of items answered correctly); $s^{2}$ = test variance
Example: n = 50, $\bar{X}$ = 40, $s^{2}$ = 25
$$r_{kr} = \frac{50}{49}\left(1 - \frac{40(50 - 40)}{50(25)}\right) = 1.02\left(1 - \frac{400}{1250}\right) = 1.02(1 - .32) = 1.02(.68) = .69$$
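A short sketch of K-R 21 as a function, assuming the test meets the stated assumptions (items scored 1 or 0, total score equal to the sum of item scores). The function and parameter names are hypothetical. With n = 50, a mean of 40, and a variance of 25, the inner ratio is 40(10) / (50 x 25) = 400 / 1250 = 0.32.

```python
def kr21(n_items, mean, variance):
    """Kuder-Richardson Formula 21 reliability estimate.

    n_items  -- number of items on the test
    mean     -- mean number of items answered correctly
    variance -- variance of the total test scores
    """
    if n_items < 2:
        raise ValueError("need at least two items")
    if variance <= 0:
        raise ValueError("variance must be positive")
    return (n_items / (n_items - 1)) * (
        1 - (mean * (n_items - mean)) / (n_items * variance)
    )


# Values from the slide example.
print(round(kr21(50, 40, 25), 2))  # 0.69
```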
Reliability of Criterion-Referenced Tests *Defined as consistency of classification (how consistently the test classifies individuals as masters or nonmasters). *Determined in much the same way as reliability of norm-referenced tests (test-retest, parallel forms, split-half, or K-R formulas). *C-R reliability applies to a single cluster of items (each cluster is intended to measure the attainment of a different objective). *Reliability coefficient estimated for each cluster.
Reliability of Criterion-Referenced Tests May also be estimated through the proportion of agreement coefficient. - Test administered to a group; based on the results of test scores, each individual is classified as a master or nonmaster. On another day, the group is administered the same test again, and again each person is classified as master or nonmaster. - Proportion of agreement determined by how many group members receive the same classification (master or nonmaster) on both test days.
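The proportion of agreement can be sketched as the share of people who receive the same classification on both days. The function name and the classifications below are hypothetical; `True` stands for master and `False` for nonmaster.

```python
def proportion_of_agreement(day1, day2):
    """Fraction of individuals classified the same way on both test days."""
    if len(day1) != len(day2) or not day1:
        raise ValueError("need two non-empty, equal-length classification lists")
    same = sum(1 for first, second in zip(day1, day2) if first == second)
    return same / len(day1)


# Hypothetical classifications for eight people (True = master).
day1 = [True, True, False, True, False, False, True, True]
day2 = [True, False, False, True, False, True, True, True]

# Six of the eight people are classified the same way on both days.
print(proportion_of_agreement(day1, day2))  # 0.75
```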
Factors Affecting Reliability 1.Method of scoring - The more objective the test, the higher the reliability. 2. The heterogeneity of the group - Reliability coefficients based on test scores from a group ranging in abilities will be overestimated. 3. The length of the test - The longer the test, the greater the reliability. 4. Administrative procedures - The directions must be clear, all individuals should be ready, motivated to do well, and perform the test in the same way; testing environment should be favorable to good performance.
Objectivity *Test has high objectivity when two or more persons can administer the same test to the same group and obtain approximately the same results. *Specific form of reliability. *Determined by test-retest (different individuals administer the test) correlational procedure. *Certain forms of measurement are more objective than others.
Objectivity *More likely to take place with: 1. Complete and clear instructions for administration and scoring. 2. Administration of test by trained administrators. 3. Use of simple measurement procedures. 4. Use of appropriate mechanical tools of measurement. 5. Numerical scores; phrases or terms less likely to reflect objectivity.
Administrative Feasibility Administrative considerations may determine which test you use. 1. Cost 2. Time 3. Ease of administration 4. Scoring 5. Norms Good sports skills test will be similar to game performance.