1 What is a Good Test
* Validity: Does the test measure what it is supposed to measure?
* Reliability: Are the results consistent?
* Objectivity: Can two or more people administer the test to the same group and get similar results?
* Administrative Feasibility: cost, time, ease of administration
2 Criteria for a Good Test
* Validity
* Objectivity
* Administrative Feasibility
* Reliability
5 Purpose of Assignment
* Start thinking about how these criteria affect your data collection and, ultimately, the inferences you make regarding your population
* Recognize “other” issues that must be addressed during your data collection
* Know when to “punt”
6 Scenario
You are a fitness instructor at a local health club. Part of your job is to assess the fitness level of every new member. You are charged with assessing:
* Cardiovascular fitness
* Muscular strength
* Body composition
* Current level of physical activity
7 Conditions
1. Members consist of all ages, fitness levels, and cultural backgrounds
2. You have limited equipment; choose your measurement tool
3. The fitness analysis is mandatory for every member
8 Validity
* Most important criterion to consider when evaluating a test.
* Traditionally, validity refers to the degree to which a test actually measures what it claims to measure.
* Refers more to the agreement between what the test measures and the performance, skill, or behavior the test is designed to measure.
* Means validity is specific to a particular use and group.
9 Validity
* Evidence to support validity is reported as a validity coefficient.
* The coefficient is determined through a correlation technique.
* The closer the coefficient is to +1.00, the more valid the test.
* A test used as a substitute for another validated test should have a validity coefficient of 0.80 or higher.
* Predictive tests with validity coefficients of 0.60 have been accepted.
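As a rough illustration of how a validity coefficient might be computed, the sketch below correlates scores on a field test with scores on a criterion measure using the Pearson product-moment correlation. The slide says only "correlation technique", so the choice of Pearson's r is an assumption, and the score lists and the name `pearson_r` are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dev_x = [a - mean_x for a in x]
    dev_y = [b - mean_y for b in y]
    sum_xy = sum(a * b for a, b in zip(dev_x, dev_y))
    sum_xx = sum(a * a for a in dev_x)
    sum_yy = sum(b * b for b in dev_y)
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return sum_xy / sqrt(sum_xx * sum_yy)


# Hypothetical data: six members measured on a field test and on a criterion measure.
field_test = [31, 35, 38, 42, 45, 50]
criterion = [30, 36, 37, 44, 43, 52]

r = pearson_r(field_test, criterion)
print(f"validity coefficient r = {r:.2f}")
# 0.80 is the guideline the slide gives for a test substituting for a validated test.
print("meets the 0.80 substitute-test guideline:", r >= 0.80)
```

With a real group, the coefficient would then be compared against the 0.80 (substitute test) or 0.60 (predictive test) guideline, whichever matches the test's intended use.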
10 Validity of Norm-Referenced Tests
* Three types of validity evidence are reported for norm-referenced tests.
* The validity of a test is better accepted if more than one type of strong validity evidence is reported.
11 Validity of Norm-Referenced Tests
Content Validity
* Related to how well a test measures all skills and subject matter that have been presented to individuals.
* To have content validity, a test must be related to the objectives of the class, presentation, etc. (that for which the group is responsible).
* The longer the test, the easier it is to have content validity.
* A realistic test (sample test) must represent the total content of a longer test.
12 Validity of Norm-Referenced Tests
Content Validity
* A skills test must have content validity also.
* Ask yourself - Does the test measure what the group has been taught?
* May be provided through the use of experts in the area that you are testing.
13 Validity of Norm-Referenced Tests
* Content validity evidence is sometimes called logical or face validity.
* When possible, it is best to use content validity evidence with other validity evidence.
14 Validity of Norm-Referenced Tests
Criterion Validity Evidence
* Indicated by how well test scores correlate with a specific criterion (successful performance).
* The specific criterion may be future performance or current performance.
15 Validity of Norm-Referenced Tests
Predictive Validity Evidence
* Used to estimate future performance.
* Generally, a predictor test is given and correlated with a criterion measure (a variable that has been defined as indicating successful performance of a trait).
* The criterion measure is obtained at a later date; this may be after an instructional unit or after a period of development.
16 Validity of Norm-Referenced Tests
Predictive Validity Evidence
* The definition of successful performance is sometimes difficult to estimate.
* One method of determining successful performance is through a panel of experts; correlate the experts’ ratings with performance on the test.
* Example: the SAT and ACT are predictors of success in college; the criterion measure is success in college.
17 Validity of Norm-Referenced Tests
Concurrent Validity
* Immediate predictive validity; indicates how well an individual currently performs a skill.
* Test results are correlated with a current criterion measurement.
* The test and the criterion measurement are administered at approximately the same time; this procedure is often used to estimate validity.
18 Validity of Norm-Referenced Tests
* The choice of criterion measure is an important consideration in the estimation of criterion validity evidence.
* Three criterion measures are used most often:
1. Expert ratings
2. Tournament play
3. Previously validated test
19 Validity of Norm-Referenced Tests
Construct Validity Evidence
* Refers to the degree that the individual possesses a trait (construct) presumed to be reflected in the test performance.
* Anxiety, intelligence, and motivation are constructs.
* Examples - cardiovascular fitness and tennis skills
* Construct validity can be demonstrated by comparing higher-skilled individuals with lesser-skilled individuals.
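One way to sketch the higher-skilled versus lesser-skilled comparison is to check whether the two groups' mean scores differ clearly on the test. The slide does not name a statistic; Welch's t statistic is used here as an assumption, and the two score lists and the name `welch_t` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for the difference between two group means."""
    var_a = variance(group_a)  # sample variance (n - 1 in the denominator)
    var_b = variance(group_b)
    std_error = sqrt(var_a / len(group_a) + var_b / len(group_b))
    if std_error == 0:
        raise ValueError("both groups have zero variance")
    return (mean(group_a) - mean(group_b)) / std_error


# Hypothetical tennis skills test scores for two groups of known ability.
higher_skilled = [82, 88, 91, 79, 85, 90]
lesser_skilled = [55, 62, 48, 66, 59, 51]

t = welch_t(higher_skilled, lesser_skilled)
print(f"mean difference = {mean(higher_skilled) - mean(lesser_skilled):.1f}")
print(f"Welch t = {t:.2f}")
```

A large positive t suggests the test separates the groups in the expected direction; judging statistical significance would still require the degrees of freedom and a t table or a statistics package.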
20 Validity of Criterion-Referenced Tests
* Directly related to predetermined behavioral objectives.
* Objectives must be stated in a clear, exact manner and be limited to small segments of instruction.
* Test items should be constructed to parallel the behavioral objectives.
* Several test items for each objective; validity is estimated by how well they measure the behavioral objective.
21 Validity of Criterion-Referenced Tests
* C-R validity may also be determined by testing prior to and after instruction; validity is accepted if there is significant improvement after instruction or if the behavioral objectives are mastered by an acceptable number of individuals.
* Success of C-R testing depends on the predetermined standard of success; it must be realistic, but high enough to require individuals to develop skill.
22 Validity of Criterion-Referenced Tests
Domain-referenced validity evidence - technique used to validate C-R tests
* The word domain is used to represent the criterion behavior.
* If test items represent the criterion behavior, the test has logical validity (referred to as domain-referenced validity).
Example:
1. Topspin tennis serve technique is analyzed.
2. The most important components of the serve form are included in the criterion behavior.
3. Successful performance is defined - form, number of successful serves out of attempted serves, and placement and speed of serves.
23 Validity of Criterion-Referenced Tests
Decision validity
* Used to validate C-R tests when a test’s purpose is to classify individuals as proficient or nonproficient.
* A cutoff score is identified, and individuals scoring above the cutoff score are classified as proficient.
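The cutoff-score classification can be sketched as follows. The slide stops at the classification step; comparing the result with an independent judgment (here, hypothetical expert ratings) and reporting the proportion of matching decisions is an added assumption about how the classification might be checked. All data and function names are illustrative.

```python
def classify(scores, cutoff):
    """Return True (proficient) for each score above the cutoff, else False."""
    return [score > cutoff for score in scores]


def proportion_correct(test_labels, criterion_labels):
    """Share of individuals whose test classification matches the criterion."""
    if len(test_labels) != len(criterion_labels) or not test_labels:
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(t == c for t, c in zip(test_labels, criterion_labels))
    return matches / len(test_labels)


# Hypothetical data: successful serves out of 10 attempts, cutoff of 6.
serve_scores = [4, 9, 7, 3, 8, 6, 10, 5]
cutoff = 6
# Hypothetical expert judgment of proficiency for the same eight players.
expert_says_proficient = [False, True, True, False, True, True, True, False]

test_says_proficient = classify(serve_scores, cutoff)
agreement = proportion_correct(test_says_proficient, expert_says_proficient)
print(test_says_proficient)
print(f"proportion of matching decisions = {agreement}")  # 0.875
```

Note that a score equal to the cutoff is treated as nonproficient here, following the slide's wording "scoring above the cutoff score".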
24 Factors Affecting Validity
1. The characteristics of the individuals being tested - A test is valid only for individuals of gender, age, and experience similar to those on whom the test was validated.
2. The criterion measure (variable that has been defined as indicating successful performance of a trait) selected - Different measures correlated with the same set of scores will produce different correlation coefficients (expert ratings, tournament play, previously validated tests).
25 Factors Affecting Validity
3. Reliability - A test must be reliable to be valid.
4. Administrative procedures - Validity will be affected if unclear directions are given, or if all individuals do not perform the test the same way.
26 Reliability
* Refers to the consistency of a test.
* A reliable test should obtain approximately the same results each time it is administered.
* Individuals may not obtain the same score on the second administration of a test (fatigue, motivation, environmental conditions, and measurement error may affect scores), but the order of the scores will be approximately the same if the test has reliability.
27 Reliability
* To have a high degree of validity, a test must have a high degree of reliability.
* Objective measures have higher reliability than subjective measures.
28 Methods of Estimating Reliability of Norm-Referenced Tests
Test-Retest Method
* Requires two administrations of the same test to the same group of individuals.
* Calculate the correlation coefficient between the two sets of scores (the intraclass correlation coefficient is best).
* The greatest source of error in this method is caused by changes in the individuals being tested.
* The appropriate time interval between administrations of the test is sometimes difficult to determine.
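A minimal sketch of a test-retest estimate using an intraclass correlation coefficient follows. Several ICC forms exist and the slide does not say which one is meant; this example assumes the one-way, single-score form computed from between-person and within-person mean squares. The sit-up counts and the name `icc_oneway` are hypothetical.

```python
def icc_oneway(trials):
    """One-way, single-score intraclass correlation.

    trials: one row per person, each row holding that person's scores
    on every administration, e.g. [[test, retest], ...].
    """
    n = len(trials)
    if n < 2:
        raise ValueError("need at least two people")
    k = len(trials[0])
    if k < 2 or any(len(row) != k for row in trials):
        raise ValueError("each person needs the same number (>= 2) of scores")

    grand_mean = sum(sum(row) for row in trials) / (n * k)
    person_means = [sum(row) / k for row in trials]

    # Mean square between people and mean square within people.
    ms_between = k * sum((m - grand_mean) ** 2 for m in person_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(trials, person_means) for x in row
    ) / (n * (k - 1))

    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("all scores are identical; ICC is undefined")
    return (ms_between - ms_within) / denominator


# Hypothetical sit-up counts: [first administration, second administration].
situps = [[30, 32], [45, 44], [38, 40], [50, 49], [27, 29]]
print(f"test-retest ICC = {icc_oneway(situps):.2f}")
```

Because people here differ far more from one another than from themselves across the two days, the coefficient comes out close to 1.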
29 Methods of Estimating Reliability of Norm-Referenced Tests
Parallel Forms Method
* Requires the administration of parallel or equivalent forms of a test to the same group and calculation of the correlation coefficient.
* Both forms of the test are administered during the same test period or in two sessions separated by a short time period.
30 Methods of Estimating Reliability of Norm-Referenced Tests
Parallel Forms Method
* Primary problem with this method - it is difficult to construct two tests that are parallel in content and item characteristics.
* If both tests are administered within a short time of each other, learning, motivation, and testing conditions do not influence the correlation coefficient.
* Reliability of most standardized tests is estimated through this method.
31 Methods of Estimating Reliability of Norm-Referenced Tests
Split-Half Method
* The test is split into halves; scores of the halves are correlated.
* Requires only one administration of the test.
* Common practice is to correlate odd-numbered items with even-numbered items.
* The reliability coefficient is for a test of only half the length of the original test.
* Reliability usually increases as the length of the test increases. The Spearman-Brown formula is often used to estimate the reliability of the full test.
32 Methods of Estimating Reliability of Norm-Referenced Tests
Spearman-Brown Formula
$$r_{\text{full}} = \frac{2 \times r_{\text{half}}}{1 + r_{\text{half}}}$$
where $r_{\text{full}}$ is the reliability of the full test and $r_{\text{half}}$ is the reliability of the half test.
Example: reliability of the two halves of a test = .70
$$r_{\text{full}} = \frac{2 \times .70}{1 + .70} = \frac{1.40}{1.70} = .82$$
* The split-half method may produce an inflated correlation coefficient, but it is frequently used to estimate reliability coefficients of knowledge tests.
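The split-half procedure and the Spearman-Brown step-up can be sketched together. The odd-even split follows the common practice named in the slides; the item-response matrix is hypothetical, and Pearson's r is assumed as the correlation between the two halves.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / sqrt(sum_xx * sum_yy)


def spearman_brown(r_half):
    """Estimate full-test reliability from the half-test reliability."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(item_scores):
    """item_scores: one row per person, one entry (1 = correct, 0 = wrong) per item."""
    odd_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_half, even_half)
    return r_half, spearman_brown(r_half)


# The slide's example: half-test reliability of .70.
print(round(spearman_brown(0.70), 2))  # 0.82

# Hypothetical responses of five people to a six-item knowledge test.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
]
r_half, r_full = split_half_reliability(responses)
print(f"half-test r = {r_half:.2f}, full-test estimate = {r_full:.2f}")
```

For any positive half-test correlation below 1, the stepped-up estimate is larger than the half-test value, which reflects the point that reliability usually increases with test length.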
33 Methods of Estimating Reliability of Norm-Referenced Tests
Kuder-Richardson Formula 21
* There are many ways to split a test to compute “half-test” scores for correlation purposes; for each split, a different correlation coefficient probably would be obtained.
* K-R 21 estimates the average correlation that might be obtained if all possible split-half combinations of a group of items were correlated.
* Basic assumptions:
1. Test items can be scored 1 for correct and 0 for wrong.
2. The total score is the sum of the item scores.
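34 Methods of Estimating Reliability of Norm-Referenced Tests
Kuder-Richardson Formula 21
$$r_{kr} = \frac{n}{n-1}\left(1 - \frac{\bar{X}(n - \bar{X})}{n\,s^2}\right)$$
where $n$ = number of items, $\bar{X}$ = test mean (average number of items answered correctly), and $s^2$ = test variance.
Example: $n = 50$, $\bar{X} = 40$, $s^2 = 25$
$$r_{kr} = \frac{50}{49}\left(1 - \frac{40(50 - 40)}{50(25)}\right) = 1.02\left(1 - \frac{40(10)}{1250}\right) = 1.02(.68) = .69$$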
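As a quick check on the arithmetic, K-R 21 can be written as a small function. This sketch uses the standard form of the formula, n/(n - 1) multiplied by (1 - X(n - X)/(n s2)); the function and parameter names are illustrative. With the slide's values (n = 50, mean = 40, variance = 25), the bracketed term is 1 - 400/1250 = .68, so the coefficient is about .69.

```python
def kr21(n_items, mean_score, variance):
    """Kuder-Richardson Formula 21 reliability estimate.

    n_items: number of items on the test (each scored 1 or 0)
    mean_score: mean total score (average number of items answered correctly)
    variance: variance of the total scores
    """
    if n_items < 2:
        raise ValueError("need at least two items")
    if variance <= 0:
        raise ValueError("variance must be positive")
    correction = n_items / (n_items - 1)
    return correction * (1 - mean_score * (n_items - mean_score) / (n_items * variance))


# Values from the slide's example.
print(round(kr21(50, 40, 25), 2))  # 0.69
```

Only the number of items, the mean, and the variance are needed, which is why K-R 21 can be computed from summary statistics without item-level data.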
35 Reliability of Criterion-Referenced Tests
* Defined as consistency of classification (how consistently the test classifies individuals as masters or nonmasters).
* Determined in much the same way as reliability of norm-referenced tests (test-retest, parallel forms, split-half, or K-R formulas).
* C-R reliability applies to a single cluster of items (each cluster is intended to measure the attainment of a different objective).
* A reliability coefficient is estimated for each cluster.
36 Reliability of Criterion-Referenced Tests
May also be estimated through the proportion of agreement coefficient.
- The test is administered to a group; based on the test scores, each individual is classified as a master or nonmaster. On another day, the group is administered the same test again, and again each person is classified as a master or nonmaster.
- The proportion of agreement is determined by how many group members receive the same classification (master or nonmaster) on both test days.
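The proportion of agreement can be sketched as the number of people classified the same way on both days divided by the group size. The classifications below are hypothetical, and the function name is illustrative.

```python
def proportion_of_agreement(day1, day2):
    """day1, day2: per-person classifications, True = master, False = nonmaster."""
    if len(day1) != len(day2) or not day1:
        raise ValueError("need two non-empty classification lists of equal length")
    masters_both_days = sum(a and b for a, b in zip(day1, day2))
    nonmasters_both_days = sum((not a) and (not b) for a, b in zip(day1, day2))
    return (masters_both_days + nonmasters_both_days) / len(day1)


# Hypothetical classifications of the same ten people on two test days.
day1 = [True, True, True, False, False, True, False, True, True, False]
day2 = [True, True, False, False, False, True, False, True, True, True]

print(proportion_of_agreement(day1, day2))  # 0.8
```

Here five people are masters on both days and three are nonmasters on both days, so 8 of 10 classifications agree.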
37 Factors Affecting Reliability
1. Method of scoring - The more objective the test, the higher the reliability.
2. The heterogeneity of the group - Reliability coefficients based on test scores from a group ranging in abilities will be overestimated.
3. The length of the test - The longer the test, the greater the reliability.
4. Administrative procedures - The directions must be clear; all individuals should be ready, motivated to do well, and perform the test in the same way; the testing environment should be favorable to good performance.
38 Objectivity
* A test has high objectivity when two or more persons can administer the same test to the same group and obtain approximately the same results.
* Specific form of reliability.
* Determined by a test-retest (different individuals administer the test) correlational procedure.
* Certain forms of measurement are more objective than others.
39 Objectivity
* More likely to take place with:
1. Complete and clear instructions for administration and scoring.
2. Administration of the test by trained administrators.
3. Use of simple measurement procedures.
4. Use of appropriate mechanical tools of measurement.
5. Numerical scores; phrases or terms are less likely to reflect objectivity.
40 Administrative Feasibility
Administrative considerations may determine which test you use.
1. Cost
2. Time
3. Ease of administration
4. Scoring
5. Norms
A good sports skills test will be similar to game performance.