Reliability - The extent to which a test or instrument gives consistent measurement - The strength of the relation between observed scores and true scores.


Reliability
- The extent to which a test or instrument gives consistent measurement
- The strength of the relation between observed scores and true scores

Test-retest reliability (coefficient of stability) - Correlate two administrations of the same test.
Parallel form reliability (coefficient of equivalence) - Correlate two forms of the same test.
Split half reliability (Spearman-Brown prophecy formula) - Correlate two halves of the test.
Internal consistency reliability (Cronbach α) - Correlate every item with every other item.
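As a minimal sketch of the last two coefficients, the Python below computes a split-half estimate stepped up with the Spearman-Brown formula and Cronbach's α in its usual variance form. The function names, the odd/even item split, and the small score matrix are illustrative assumptions, not part of the presentation.

```python
from statistics import mean, pvariance

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman_brown(r_half):
    """Step a half-test correlation up to full-test length."""
    return 2 * r_half / (1 + r_half)

def split_half_reliability(scores):
    """Correlate odd- and even-numbered item halves, then apply Spearman-Brown."""
    first_half = [sum(row[0::2]) for row in scores]
    second_half = [sum(row[1::2]) for row in scores]
    return spearman_brown(pearson(first_half, second_half))

def cronbach_alpha(scores):
    """Cronbach's alpha from item variances and total-score variance."""
    k = len(scores[0])
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical data: 5 examinees (rows) by 4 items (columns).
scores = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]
split_half = split_half_reliability(scores)
alpha = cronbach_alpha(scores)
print(round(split_half, 3), round(alpha, 3))
```

Test-retest and parallel form reliability would use the same `pearson` helper, applied to total scores from two administrations or two forms.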

Reliability is the extent to which your observed score represents your true score. The error of measurement is E = X - T. The test yielding the score X1 is more reliable than the one giving X2.
[Figure: a true score T and two observed scores, X1 and X2.]

Reliability is the extent to which individual differences, or the rank ordering of individuals, based on observed scores represent those based on true scores. One operationalization of this definition is the correlation between observed scores and true scores, ρXT, which is called the reliability index. Another is the squared correlation between observed scores and true scores, (ρXT)², which is the proportion of observed score variance that is true score variance, or the proportion of consistent rank ordering.
[Figure: overlap between T and X.]
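A small simulation can illustrate the two operationalizations. This is only a sketch under assumed values: normally distributed true scores with variance 100, independent errors with variance 25, and a fixed random seed. None of these numbers come from the presentation.

```python
import random
from statistics import mean, pvariance

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20000

# Assumed population: true-score SD 10 (variance 100), error SD 5 (variance 25).
true_scores = [rng.gauss(50, 10) for _ in range(n)]
errors = [rng.gauss(0, 5) for _ in range(n)]
observed = [t + e for t, e in zip(true_scores, errors)]  # X = T + E

reliability_index = pearson(observed, true_scores)              # rho_XT
reliability = reliability_index ** 2                            # rho_XT squared
variance_ratio = pvariance(true_scores) / pvariance(observed)   # var(T) / var(X)

print(round(reliability_index, 3), round(reliability, 3), round(variance_ratio, 3))
```

Under these assumptions the squared correlation and the variance ratio should both land near 100 / 125 = 0.8, up to sampling noise.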

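In practice, because true scores cannot be observed, reliability is the extent to which two tests yield similar results or a similar rank ordering of the individuals, ρXX'. It is estimated as test-retest, parallel form, split half, or internal consistency reliability.
[Figure: overlap between X and X'.]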
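The link between the correlation of two tests and the squared correlation with true scores can be sketched with a short derivation. It assumes the classical model for two parallel tests, X = T + E and X' = T + E', with errors uncorrelated with T and with each other and with equal error variances; these assumptions are standard but are not stated on the slide.

```latex
\begin{align*}
\operatorname{Cov}(X, X') &= \operatorname{Cov}(T + E,\; T + E') = \sigma_T^2 \\
\rho_{XX'} &= \frac{\operatorname{Cov}(X, X')}{\sigma_X \, \sigma_{X'}}
            = \frac{\sigma_T^2}{\sigma_X^2} \\
\rho_{XT} &= \frac{\operatorname{Cov}(T + E,\; T)}{\sigma_X \, \sigma_T}
           = \frac{\sigma_T}{\sigma_X}
  \quad\Longrightarrow\quad \rho_{XT}^2 = \rho_{XX'}
\end{align*}
```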

When ρXX' = 1,
1. the measurement has been made without error (E = 0 for all examinees).
2. X = T for all examinees.
3. all observed score variance reflects true-score variance.
4. all differences between observed scores are true score differences.
5. the correlation between observed scores and true scores is 1.
6. the correlation between observed scores and errors is zero.

When ρXX' = 0,
1. only random error is included in the measurement.
2. X = E for all examinees.
3. all observed score variance reflects error variance.
4. all differences between observed scores are errors of measurement.
5. the correlation between observed scores and true scores is 0.
6. the correlation between observed scores and errors is 1.

When ρXX' is between zero and 1,
1. the measurement includes some error and some truth.
2. X = T + E.
3. observed score variance includes true-score and error variance.
4. differences between scores reflect true-score differences and error.
5. the correlation between observed scores and true scores is the square root of the reliability (the reliability index).
6. the correlation between observed scores and errors is the square root of 1 - reliability.
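The three cases can be worked through numerically. The helper below is a sketch that assumes the classical decomposition X = T + E with uncorrelated T and E, so that observed variance is the sum of true-score and error variance. The function name and the example variances are hypothetical.

```python
from math import sqrt, isclose

def classical_test_summary(true_var, error_var):
    """Reliability and related correlations from variance components.

    Assumes X = T + E with T and E uncorrelated, so var(X) = var(T) + var(E).
    """
    observed_var = true_var + error_var
    reliability = true_var / observed_var
    return {
        "reliability": reliability,                       # rho_XX'
        "corr_observed_true": sqrt(reliability),          # rho_XT
        "corr_observed_error": sqrt(1 - reliability),     # rho_XE
    }

perfect = classical_test_summary(true_var=100, error_var=0)     # reliability 1
pure_noise = classical_test_summary(true_var=0, error_var=25)   # reliability 0
in_between = classical_test_summary(true_var=75, error_var=25)  # reliability 0.75

for label, result in [("perfect", perfect), ("noise", pure_noise), ("mixed", in_between)]:
    print(label, {key: round(value, 3) for key, value in result.items()})
```

In the mixed case the squared correlations with true scores and with errors add up to 1, which mirrors the split of observed score variance into its two parts.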

Validity
- The extent to which a test or instrument truly measures what it is expected to measure.
- The use of a bathroom scale to measure weight is valid, whereas the use of a bathroom scale to measure height is invalid.

Content validity refers to the extent to which the items on a test are representative of a specified content domain. Achievement and aptitude (but not personality and attitude) tests are concerned with content validity.
Construct validity refers to the extent to which the items on a test are representative of the underlying construct, e.g., personality or attribute. Personality and attitude tests are concerned with construct validity. The process of establishing construct validity is referred to as construct validation.
Criterion related validity refers to the extent to which a test correlates with a criterion that it is intended to predict or estimate. It includes predictive validity, where the criterion is a future behavior, and concurrent validity, where the criterion is measured at about the same time as the test.

Construct Validity: Internal Structure
[Figure: path diagram with numeric estimates for two factors, Authoritative Parenting and Authoritarian Parenting. Indicator labels: Warmth, Inductive Reasoning, Democratic Participation, Easygoing Responsiveness, Physical Punishment, Non Reasoning, Authoritarian Directiveness, Verbal Hostility.]

Construct Validity: Network Relations
[Figure: path diagram with numeric estimates. Labels: Communication Avoidance, Social Withdrawal, Assertive Leadership, Behavioral Aggression, Verbal Aggression, Perceived Social Competence Time 1, Perceived Social Competence Time 2, Peer Acceptance Time 1, Peer Acceptance Time 2, Single Indicator.]

Criterion-Related Validity
[Figure: Concurrent and Predictive validity, illustrated with SAT, A-Level, and University GPA.]

Restriction of Range Effect
[Figure: plot of Test Scores against Criterion, with a Qualifying score separating Rejected from Selected examinees. It contrasts the distribution of criterion scores for the selected group with the distribution of scores on the criterion if no examinees were excluded.]
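The restriction of range effect can be illustrated with a simulation. This is a sketch under assumed values: standardized test scores, a population test-criterion correlation of 0.6, and a hypothetical qualifying score of 0.5. None of these numbers come from the presentation.

```python
import random
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

rng = random.Random(1)  # fixed seed so the run is repeatable
n = 20000

# Assumed population: criterion = 0.6 * test + noise, giving a correlation near 0.6.
test_scores = [rng.gauss(0, 1) for _ in range(n)]
criterion = [0.6 * x + 0.8 * rng.gauss(0, 1) for x in test_scores]

full_r = pearson(test_scores, criterion)  # validity if no examinees were excluded

qualifying_score = 0.5  # hypothetical cutoff: only examinees at or above it are selected
selected = [(x, y) for x, y in zip(test_scores, criterion) if x >= qualifying_score]
restricted_r = pearson([x for x, _ in selected], [y for _, y in selected])

print(round(full_r, 3), round(restricted_r, 3), len(selected))
```

Because criterion scores are only available for the selected group, the correlation computed there comes out noticeably lower than the correlation in the full group of examinees.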
