Measurement in Psychology I: RELIABILITY Lawrence R. Gordon
Do you support the civil union legislation?
- What are some of the ways in which you can ask this question?
- How do you measure the response (operational definitions)?
Levels of Measurement
- Nominal scales
  - giving names to data, putting responses into categories
  - Examples: sex and race labels; baseball uniform numbers
- Ordinal scales
  - numbers give order but not distance
  - Examples: mailbox numbers; class rankings

Levels of Measurement (cont.)
- Interval scales
  - numbers indicate order and distance (they are separated by equal distances, or intervals)
  - Example: Fahrenheit temperature
- Ratio scales
  - numbers indicate order, distance, AND have a true zero point (zero = there isn't any)
  - Examples: height; weight; miles per hour; time
Levels of Measurement Example
An auto race which started at 2 pm
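The race makes a convenient worked example for all four levels at once. A minimal sketch (the car numbers, places, and times below are hypothetical illustrations, not data from the slides):

```python
# Hypothetical finishers of an auto race that started at 2 pm,
# illustrating the four levels of measurement.
finishers = [
    # car number (nominal), finish place (ordinal),
    # clock time of finish (interval), elapsed hours (ratio)
    {"car": 8, "place": 1, "clock": "3:00 pm", "elapsed_h": 1.0},
    {"car": 3, "place": 2, "clock": "4:00 pm", "elapsed_h": 2.0},
]

# Nominal:  car numbers are labels only; car 8 is not "more" than car 3.
# Ordinal:  place 1 beat place 2, but we don't know by how much.
# Interval: clock times are equally spaced, but "4:00 pm" is not "twice"
#           "3:00 pm", because the zero point is arbitrary.
# Ratio:    elapsed time has a true zero, so ratios are meaningful:
ratio = finishers[1]["elapsed_h"] / finishers[0]["elapsed_h"]
print(ratio)  # 2.0: the second car genuinely took twice as long
```

Only at the ratio level does a statement like "twice as long" mean anything, which is why the same race yields four different kinds of numbers.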
Closed vs. Open Responses
- Closed responses (a.k.a. forced choice)
  - Example: rate civil union support on a scale of 1 to 9
  - Advantages
    - you know what the responses will be (or what they should be!) because of the restrictions on choice
    - relatively easy to evaluate empirically
    - yields data that give a straightforward answer to the question as you asked it
    - coding is usually not necessary
Closed vs. Open Responses
- Closed responses (a.k.a. forced choice)
  - Disadvantages
    - may not be sensitive enough to capture some interesting information
    - will not give as clear an indication of what participants think/feel/report
- "Do you agree that same-sex couples should have the right to marry/civil union?"

    1    2    3    4    5    6    7    8    9
  Disagree                        Agree Completely
Closed vs. Open Responses
- Open responses (a.k.a. free response)
  - Example: "Do you support the civil union legislation? Why?"
  - Example from the survey used the first day: "Please describe yourself in 12 words or less" (more on this in a bit...)
  - Advantages
    - participants can give any answer they want
    - not restricted by the choices offered

Closed vs. Open Responses
- Open responses (cont.)
  - Disadvantages
    - responses must be coded before they can be evaluated empirically (time intensive; you need to find people who will do it)
    - reliability issues!
Reliability
- Consistency (stays the same)
- Repeatability (you get the same results again and again)
- Measures need to be reliable to be good measures
- Now, some nitty-gritty...
Reliability (cont.)
- Measuring closed responses
  - no need to put responses into categories
  - reliable over time (do you get the same answers again and again?)
  - if the answers vary greatly from one measurement occasion to the next, the measure is not reliable
Reliability (cont.)
- Measuring closed responses (cont.)
  - scales (sets of questions designed to measure something) must be given multiple times, or in multiple forms, and the answers must remain similar for the scale to be reliable
  - Example: a personality scale
- Types of reliability
  - Stability ("test-retest reliability")
  - Equivalence ("parallel-forms reliability")
  - Consistency ("split-half reliability")
  - Homogeneity ("internal-consistency reliability")
Reliability Quick Example
Any test, scale, or inventory with items; e.g., a 50-item test, scored 0-50:

  Examinee    Form A (9/4)   Form A (9/25)   Form B (9/4)   Odd   Even
  1 George         27             35              33         15     12
  2 Alice          49             46              40         30     19
  3 Mary           30             35              27         13     17
  4 Larry          16             10              19          7      9
  5 Linda          27             24              20         10     17
  6 Doug           40             42              48         22     18
  7 Chuck          21             18              35         10     11
  8 Judy           42             39              35         19     23

  (Odd/Even = scores on the odd- and even-numbered items of Form A, 9/4)

  Test-retest:    Form A, 9/4 vs 9/25          (r = .92)   Stability
  Parallel forms: Form A vs Form B, 9/4        (r = .69)   Equivalence
  Cross form:     Form A 9/25 vs Form B 3/19   (r = .72)   Stability & Equivalence
  Split-half:     Odd vs Even, Form A 9/4      (r = .79)   Consistency
  Alpha reliability: no example (needs data from all 50 items)   Internal consistency
Reliability (cont.)
- Measuring open responses
  - will often code responses into categories (examples)
  - how do you assess reliability?
Reliability (cont.)
- Measuring open responses (cont.)
  - Does everyone put the response into the same category? If yes, you have good inter-coder reliability.
  - more specific operational definitions will increase this reliability
- Coding personality responses into categories
  - using positive, negative, and neutral descriptors
Reliability (cont.)
- Measuring behavioral responses through observation
  - a special case of open response: you can't really control what participants do
  - coding and/or rating what you observe
  - reliability of ratings (interrater reliability: do all raters agree on the rating?)
  - need to be very clear on operational definitions
- Baggage claim study (Scherer & Ceschi, 2000)
Assessing Reliability
- Steps
  - decide on operational definitions of your variables and scale(s) of measurement
  - train your coders/raters, answer questions, and alleviate confusion
  - do the coding and rating
  - compare responses
  - were the measurements reliable?
Reliability Exercise
- Measuring your personality
- Looking for "big" traits
  - defining big traits and training coders
  - The Big Five personality factors:
    1. Open to Experience (O) vs. Closed to Experience (NO)
    2. Conscientious (C) vs. Nonconscientious (NC)
    3. Extraverted (E) vs. Introverted (NE)
    4. Agreeable (A) vs. Disagreeable (NA)
    5. Neurotic (N) vs. Nonneurotic (NN)
- Which one best fits the description?
- Do the coding!
Reliability Exercise
- Measuring your personality
- Looking for "big" traits
- Compare responses to those of other coders
  - intercoder reliability
  - list the numbers on which you agreed
  - list the numbers on which you disagreed
  - calculate the percentages
- Were the measurements reliable?
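The agree/disagree tally above amounts to a percent-agreement index. A minimal sketch with hypothetical codes (the two coders' category assignments below are invented for illustration, not results from the class exercise):

```python
# Percent agreement between two coders who each assigned ten
# self-descriptions to one of the Big Five categories (O, C, E, A, N).
coder_1 = ["O", "C", "E", "A", "N", "E", "C", "A", "O", "N"]
coder_2 = ["O", "C", "E", "A", "E", "E", "C", "N", "O", "N"]

pairs = list(zip(coder_1, coder_2))
agreed    = [i + 1 for i, (a, b) in enumerate(pairs) if a == b]  # item numbers
disagreed = [i + 1 for i, (a, b) in enumerate(pairs) if a != b]
percent_agreement = 100 * len(agreed) / len(pairs)

print(agreed)             # [1, 2, 3, 4, 6, 7, 9, 10]
print(disagreed)          # [5, 8]
print(percent_agreement)  # 80.0
```

Percent agreement is the simplest intercoder index; it does not correct for chance agreement, which is one reason more specific operational definitions matter so much.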
And for next time... is reliability enough?
- If your measurement is reliable, does that mean it is good?
- Does being reliable make your measurement valid?