
1 Reliability and Validity

2 Criteria of Measurement Quality
- How do we judge the relative success (or failure) in measuring various concepts?
- Reliability: consistency of measurement
- Validity: confidence in measures and design

3 Reliability and Validity
- Reliability focuses on measurement
- Validity also extends to:
  - Precision in the design of the study: the ability to isolate causal agents while controlling other factors (internal validity)
  - The ability to generalize from unique and idiosyncratic settings, procedures, and participants to other populations and conditions (external validity)

4 Reliability
- Consistency of measurement:
  - Reproducibility over time
  - Consistency between different coders/observers
  - Consistency among multiple indicators
- Estimates of reliability: statistical coefficients that tell us how consistently we measured something

5 Measurement Validity
- Are we really measuring the concept we defined?
- Is it a valid way to measure the concept?
- Many different approaches to validation
- Judgmental as well as empirical aspects

6 Key to Reliability and Validity
- Concept explication: thorough meaning analysis
- Conceptual definition: defining what a concept means
- Operational definition: spelling out how we are going to measure the concept

7 Four Aspects of Reliability
1. Stability
2. Reproducibility
3. Homogeneity
4. Accuracy

8 1. Stability
- Consistency across time
- Repeating a measure at a later time to examine its consistency
- Compare time 1 and time 2
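The time 1 / time 2 comparison is typically summarized with a correlation coefficient. Here is a minimal sketch in Python; the function and the five respondents' scores are invented for illustration, not taken from the slides:

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    """Pearson correlation between two paired lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

# Hypothetical 1-5 scale scores for five respondents, measured twice.
time1 = [3, 5, 2, 4, 5]
time2 = [3, 4, 2, 5, 5]

r = pearson_r(time1, time2)  # high r suggests stability over time
```

A test-retest correlation near 1.0 indicates a stable measure; a low correlation suggests either an unstable measure or genuine change in the respondents between the two waves.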

9 2. Reproducibility
- Consistency between observers
- Equivalent application of the measuring device
- Do observers reach the same conclusion?
- If we don't get the same results, what are we measuring?
- Lack of reliability can compromise validity
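Agreement between observers is often summarized with a chance-corrected statistic such as Cohen's kappa, which discounts the agreement two coders would reach by guessing alone. A small sketch, with hypothetical category codes assigned by two coders:

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: probability both coders pick a category at random.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical topic codes assigned to ten news stories by two coders.
a = ["pol", "pol", "sport", "pol", "sport", "pol", "sport", "sport", "pol", "pol"]
b = ["pol", "pol", "sport", "sport", "sport", "pol", "sport", "sport", "pol", "pol"]

kappa = cohens_kappa(a, b)
```

Here the coders agree on 9 of 10 stories (raw agreement 0.9), but kappa is lower because some of that agreement would occur by chance.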

10 3. Homogeneity
- Consistency between different measures of the same concept
- Different items used to tap a given concept show similar results (e.g., open-ended and closed-ended questions)

11 4. Accuracy
- Lack of mistakes in measurement
- Increased by clear, well-defined procedures
- Reduce complications that lead to errors
- Observers must have sufficient training, motivation, and concentration

12 Increasing Reliability
- General:
  - Training coders/interviewers/lab personnel
  - More careful concept explication (definitions)
  - Specification of procedures/rules
  - Reduce subjectivity (room for interpretation)
- Survey measurement:
  - Increase the number of items in the scale
  - Weed out bad items from the "item pool"
- Content analysis coding:
  - Improve the definition of content categories
  - Eliminate bad coders

13 Indicators of Reliability
- Test-retest: make measurements more than once and see if they yield the same result
- Split-half: if you have multiple measures of a concept, split the items into two scales, which should then be correlated
- Cronbach's alpha or mean item-total correlation
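Cronbach's alpha can be computed directly from item-level scores: it compares the sum of the individual item variances to the variance of the total scale score. A minimal sketch, using an invented three-item scale answered by five respondents:

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: one list of scores per item, respondents in the same order.
    """
    k = len(items)
    # Total scale score for each respondent (sum across items).
    totals = [sum(scores) for scores in zip(*items)]
    item_var_sum = sum(pvariance(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / pvariance(totals))

# Hypothetical 1-5 responses to a three-item scale from five respondents.
items = [
    [4, 3, 5, 2, 4],
    [5, 3, 4, 2, 5],
    [4, 2, 5, 3, 4],
]

alpha = cronbach_alpha(items)
```

When the items move together (are homogeneous), the variance of the totals is large relative to the summed item variances and alpha approaches 1; a common rule of thumb treats alpha above roughly 0.7 as acceptable.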

14 Reliability and Validity
- Reliability is a necessary condition for validity: if a measure is not reliable, it cannot be valid
- Reliability is NOT a sufficient condition for validity: if a measure is reliable, it may not necessarily be valid
- Example: a bathroom scale with old springs

15 Not Reliable or Valid

16 Reliable but not Valid

17 Reliable and Valid

18 Types of Validity
1. Face validity
2. Content validity
3. Pragmatic (criterion) validity
   A. Concurrent validity
   B. Predictive validity
4. Construct validity
   A. Testing of hypotheses
   B. Convergent validity
   C. Discriminant validity

19 Face Validity
- Subjective judgment of experts about "what's there"
- Do the measures make sense?
- Compare each item to the conceptual definition: does it represent the concept in question? If not, it should be dropped
- Is the measure valid "on its face"?

20 Content Validity
- Subjective judgment of experts about "what is not there"
- Start with the conceptual definition of each dimension:
  - Is it represented by indicators at the operational level?
  - Are some dimensions over- or underrepresented?
- If current indicators are insufficient, develop and add more indicators
- Example: civic participation questions
  - Did you vote in the last election?
  - Do you belong to any civic groups?
  - Have you ever attended a city council meeting?
  - What about "protest participation" or "online organizing"?

21 Pragmatic Validity
- Empirical evidence used to test validity: compare the measure to other indicators
1. Concurrent validity
   - Does the measure predict a simultaneous criterion?
   - Validate a new measure by comparing it to an existing measure
   - E.g., does a new intelligence test correlate with an established test?
2. Predictive validity
   - Does the measure predict a future criterion?
   - E.g., do SAT scores predict college GPA?

22 Construct Validity
- Encompasses other elements of validity
- Do the measurements:
  A. Represent all dimensions of the concept?
  B. Distinguish the concept from other similar concepts?
- Tied to the meaning analysis of the concept, which specifies the dimensions and indicators to be tested
- Assessing construct validity:
  A. Testing hypotheses
  B. Convergent validity
  C. Discriminant validity

23 A. Testing Hypotheses
- When measurements are put into practice, are theoretically derived hypotheses supported by observations?
- If not, there is a problem with:
  A. The theory
  B. The research design (internal validity)
  C. The measurement (construct validity)
- In seeking to examine construct validity:
  - Examine theoretical linkages of the concept to others
  - Identify antecedents and consequences: what leads to the concept, and what are its effects?

24 B. Convergent Validity
- Measuring a concept with different methods
- If the different methods yield the same results, then convergent validity is supported
- E.g., survey items measuring participation:
  - Voting
  - Donating money to candidates
  - Signing petitions
  - Writing letters to the editor
  - Civic group memberships
  - Volunteer activities

25 C. Discriminant (Divergent) Validity
- Measuring a concept so as to discriminate it from other closely related concepts
- E.g., measuring maternalism and paternalism as distinct concepts
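Convergent and discriminant validity are often checked together by comparing correlations: two indicators of the same concept should correlate more strongly with each other than either does with an indicator of a related but distinct concept. A sketch with entirely hypothetical data (the variable names and scores are invented for illustration):

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    """Pearson correlation between two paired lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

# Two hypothetical indicators of the same concept (participation) and
# one indicator of a related but distinct concept (news interest),
# for eight respondents.
voting        = [1, 0, 1, 1, 0, 1, 0, 1]   # did/did not vote
petitions     = [1, 0, 1, 1, 0, 1, 1, 1]   # signed/did not sign
news_interest = [5, 4, 3, 5, 4, 2, 5, 3]   # 1-5 interest scale

convergent = pearson_r(voting, petitions)        # same concept: should be high
discriminant = pearson_r(voting, news_interest)  # distinct concept: should be lower
```

The validity evidence lies in the pattern: a large within-concept correlation alongside a smaller cross-concept correlation supports both convergent and discriminant validity.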

26 Dimensions of Validity for Research Design
- Internal:
  - Validity of the research design
  - Validity of sampling, measurement, and procedures
- External:
  - Given the research design, how valid are the inferences made from the conclusions?
  - Implications for the real world

27 Internal and External Validity in Experimental Design
- Internal validity:
  - Did the experimental treatment make a difference?
  - Or is there an internal design flaw that invalidates the results?
- External validity:
  - Are the results generalizable? To what populations? To what situations?
- Without internal validity, there is no external validity

