Checking on reliability of the data we collect  Compare over time (test-retest)  Item analysis  Internal consistency  Inter-rater agreement

Compare over time Test-Retest reliability  One sample at two (or more) times  Very convincing in theory  Often hard to do in practice  Time interval? Memory effects? Special sample?  Correlation of time-1 answers with time-2 answers  Other approaches are often approximations of this idea
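A minimal sketch of the "correlation of time-1 answers with time-2 answers" step, assuming each person has one total score at each time point. The scores below are made-up illustration data, and `pearson_r` is a helper written for this sketch, not part of any particular package.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length (at least 2 scores)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for the same six people at two time points
time1 = [12, 15, 9, 20, 17, 11]
time2 = [13, 14, 10, 19, 18, 10]

test_retest_r = pearson_r(time1, time2)
print(f"Test-retest correlation: {test_retest_r:.2f}")
```

A value near 1 suggests that people keep roughly the same rank order across the two occasions. The number alone cannot separate real stability from memory effects, so the time interval still has to be argued for.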

Split-half reliability  Easier than test-retest checks  Requires only one time point  Works when there is a scale or set of questions on a single topic  Divide the items into two sets (two halves)  Correlate the scores on the two halves  Often adjusted by the Spearman-Brown correction  Gives us an estimate of the test-retest reliability
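The split-half steps can be sketched as follows, assuming an odd/even split of the items (one of several possible ways to form the halves) and made-up responses. The Spearman-Brown correction used here is the standard two-half form, r_full = 2r / (1 + r), which steps the half-length correlation up to the full length of the scale.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step a half-test correlation up to the full-length scale."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(responses):
    """responses: one row per person, one column per item."""
    # Odd/even split: items 1, 3, ... versus items 2, 4, ...
    half_a = [sum(row[0::2]) for row in responses]
    half_b = [sum(row[1::2]) for row in responses]
    r_half = pearson_r(half_a, half_b)
    return r_half, spearman_brown(r_half)


# Hypothetical answers: six people, four items on a 1-5 scale
responses = [
    [4, 5, 4, 5],
    [2, 1, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 1],
    [3, 4, 3, 3],
]

r_half, r_full = split_half_reliability(responses)
print(f"Half-test r: {r_half:.2f}, Spearman-Brown corrected: {r_full:.2f}")
```

Different ways of splitting the items give somewhat different values, which is one reason coefficient alpha is often reported instead.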

Internal consistency reliability When there is a "scale" or set of questions on a single topic  Cronbach's coefficient alpha  a measure of "internal consistency"  Look at all of the items  Check the "average correlation"  Then adjust for the number of items  Find items that do not correlate with others  Check the item-total correlations  If low, delete these or move them elsewhere  Assess the overall internal consistency
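A rough sketch of these checks, using made-up responses in which the fourth item is deliberately out of step with the others. It assumes the usual variance-based formula for coefficient alpha, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), and the "corrected" item-total correlation, where each item is correlated with the sum of the other items. The function names are chosen for this sketch only.

```python
from math import sqrt
from statistics import variance


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cronbach_alpha(responses):
    """responses: one row per person, one column per item."""
    k = len(responses[0])
    item_columns = list(zip(*responses))
    sum_item_vars = sum(variance(col) for col in item_columns)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum_item_vars / total_var)


def item_rest_correlations(responses):
    """Correlate each item with the total of the remaining items."""
    k = len(responses[0])
    result = []
    for j in range(k):
        item = [row[j] for row in responses]
        rest = [sum(row) - row[j] for row in responses]
        result.append(pearson_r(item, rest))
    return result


# Hypothetical answers: six people, four items; item 4 behaves differently
responses = [
    [5, 4, 5, 2],
    [4, 4, 4, 4],
    [2, 1, 2, 3],
    [3, 3, 2, 5],
    [1, 2, 1, 1],
    [4, 5, 5, 2],
]

alpha_all = cronbach_alpha(responses)
item_rest = item_rest_correlations(responses)
weakest = item_rest.index(min(item_rest))

# Re-check alpha after dropping the weakest item
reduced = [[v for j, v in enumerate(row) if j != weakest] for row in responses]
alpha_reduced = cronbach_alpha(reduced)

print(f"Alpha, all items: {alpha_all:.2f}")
print("Item-rest correlations:", [round(r, 2) for r in item_rest])
print(f"Alpha without item {weakest + 1}: {alpha_reduced:.2f}")
```

In this toy data set the item with the lowest item-rest correlation is also the one whose removal raises alpha, which is the pattern the "delete or move elsewhere" step looks for.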

Internal consistency reliability Comparing answers from different sources  Compare similar questions that appear in different parts of the questionnaire  Compare answers from different places during an interview  Compare interview responses with questionnaire responses  Compare questionnaires with actual observations

Inter-rater agreement  Useful in checking on coding open-ended answers, observations, etc.  Try this on a sample or pilot study  Check the overall percent agreement  Sometimes we adjust for "chance agreement" -- Cohen's Kappa  A very important step in lots of studies  If agreement is high, then okay to rely on one primary coder or rater  If not high, then perhaps we need more than one rater  Or perhaps we need to revise or clarify the coding rules  Then check on things again  There are often several iterations here  Keep going until the agreement is acceptable
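A small sketch of the two agreement checks, assuming two raters who each assign one category to the same set of answers. The codes below are made-up. Cohen's kappa is computed as (observed agreement - chance agreement) / (1 - chance agreement), where chance agreement comes from each rater's own category proportions.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of cases where both raters chose the same category."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of cases")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, adjusted for chance agreement."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    # Chance agreement: product of the two raters' marginal proportions
    p_chance = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_chance == 1:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical codes for ten open-ended answers
coder_1 = ["pos", "pos", "neg", "neg", "pos", "neu", "neg", "pos", "neu", "neg"]
coder_2 = ["pos", "neg", "neg", "neg", "pos", "neu", "neg", "pos", "pos", "neg"]

agreement = percent_agreement(coder_1, coder_2)
kappa = cohens_kappa(coder_1, coder_2)
print(f"Percent agreement: {agreement:.0%}, Cohen's kappa: {kappa:.2f}")
```

Kappa comes out lower than the raw percent agreement here because some matches would be expected by chance alone. Running the same check after each revision of the coding rules shows whether agreement is improving.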

Check out some examples  Bayley Scales of Infant Development  Inter-rater agreement example  Internal consistency example  Then try some clicker questions!

Observing students and teachers in classrooms. What type of reliability check is most important?  1. Inter-observer agreement (have more than one observer)  2. Time 1 - Time 2 (observe at two or more times)  3. Consistency within the classroom sessions  4. Other

Coding transcripts from individual interviews. What type of reliability check is most helpful?  1. Have multiple transcribers  2. Inter-rater agreement  3. Internal consistency checks  4. Other

Using answers from questionnaires. What type of reliability check is most important?  1. Inter-rater agreement  2. Internal consistency checks  3. Item-analysis checks  4. Other

Using a mix of open-ended and closed-ended questions on a questionnaire. Why is this a good idea?  1. Internal consistency checks  2. Makes replying less boring  3. Terry has said this about 50 times, so it must be a good idea  4. Other  5. All of the above

