Measures of Reliability in Sports Medicine and Science Will G. Hopkins Sports Medicine 30(4): 1-25, 2000.

Measurement Error & Reliability
Measurement error makes an observed value differ from the true value. Reliability refers to the reproducibility of values in repeated trials on the same subjects. A reliability study quantifies random error, or ‘noise’: the smaller the random error, the better the measure.

Measures of Reliability
Within-subject variation
– Affects the precision of estimates of change in the variable of an experiment.
– The smaller the within-subject variation, the easier it is to measure change in performance.
Change in the mean, which has two components:
– Random change due to sampling error
– Systematic change (learning or training effect)
Retest correlation
– Represents how closely the values in one trial match those in another trial.

Within-Subject Variation
Within-subject variation is the random variation in an individual's values over repeated trials. Example: subject 1 scores 71, 76, 74, 79, 79 and 76 in six trials. The SD representing this within-subject variation is called the standard error of measurement (SEM); the SEM represents the ‘typical error’.
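As a check on the example, the within-subject SD of subject 1's six trials can be computed directly. This is a minimal standard-library sketch; it assumes the usual sample SD (n - 1 denominator), which the slide does not state explicitly.

```python
import statistics

# Six trials for subject 1, as listed on the slide.
trials = [71, 76, 74, 79, 79, 76]

# Sample SD (n - 1 denominator) of one subject's repeated values:
# this subject's within-subject variation.
within_sd = statistics.stdev(trials)
print(f"mean = {statistics.mean(trials):.1f}, within-subject SD = {within_sd:.2f}")
# prints: mean = 75.8, within-subject SD = 3.06
```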

‘Typical Error’
To estimate the ‘typical error’, use many subjects and a few trials.

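Computing Typical Error
– Compute difference scores (trial 2 - trial 1) for each subject.
– Compute the SD of the difference scores.
– Divide the SD of the difference scores by $\sqrt{2}$.
SD of difference scores = 4.1, so typical error = $4.1/\sqrt{2} \approx 2.9$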
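The three steps above translate directly into a short function. This is a sketch, not the sportsci.org spreadsheet's code; the five-subject data set is hypothetical and only there to show the call.

```python
import math
import statistics

def typical_error_two_trials(trial1, trial2):
    """Typical error from two trials: SD of the difference scores divided by sqrt(2)."""
    if len(trial1) != len(trial2) or len(trial1) < 2:
        raise ValueError("need paired values for at least two subjects")
    diffs = [b - a for a, b in zip(trial1, trial2)]  # step 1: difference scores
    sd_diff = statistics.stdev(diffs)                # step 2: SD of the differences
    return sd_diff / math.sqrt(2)                    # step 3: divide by sqrt(2)

# Worked value from the slide: SD of difference scores = 4.1
print(round(4.1 / math.sqrt(2), 1))  # prints 2.9

# Hypothetical data for five subjects (not from the slides)
t1 = [62.0, 70.5, 58.2, 66.1, 73.4]
t2 = [63.1, 69.8, 59.5, 67.0, 72.9]
print(round(typical_error_two_trials(t1, t2), 2))
```

Dividing by $\sqrt{2}$ is needed because each difference score contains the random error of two trials, so its SD is $\sqrt{2}$ times the error of a single trial.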

Typical Error as a Percentage
For many measures, the typical error gets bigger as the value gets bigger.
– Athlete 1 has a mean ± typical error of … ± 4.4.
– Athlete 2 has a mean ± typical error of … ± 6.1.
When each typical error is expressed as a percentage of the athlete's own mean, the values are similar: 1.2% and 1.3%. This form of typical error is a coefficient of variation (CV). Because it is dimensionless, it allows direct comparison of reliability between measures.
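A sketch of two ways to get a CV. `cv_percent` is the plain ratio described on the slide. `cv_percent_from_logs` is an alternative, assumed here rather than taken from the slide: it computes the typical error of natural-log-transformed values and back-transforms it, which suits measures whose error grows in proportion to the value. The data are hypothetical.

```python
import math
import statistics

def cv_percent(typical_error, mean):
    """Typical error expressed as a percentage of the mean (a CV)."""
    return 100.0 * typical_error / mean

def cv_percent_from_logs(trial1, trial2):
    """CV (%) from two trials via log-transformed values.

    Assumption: error is roughly proportional to the value, so it is
    homogeneous on the log scale. The typical error of the logs is
    back-transformed with 100 * (exp(s) - 1).
    """
    diffs = [math.log(b) - math.log(a) for a, b in zip(trial1, trial2)]
    s_log = statistics.stdev(diffs) / math.sqrt(2)
    return 100.0 * (math.exp(s_log) - 1.0)

# Hypothetical two-trial data (not from the slides)
t1 = [352.0, 410.0, 298.0, 471.0, 388.0]
t2 = [357.0, 403.0, 302.0, 465.0, 392.0]
print(f"CV from log-transformed data: {cv_percent_from_logs(t1, t2):.2f}%")
```

Because the log-based CV depends only on ratios, rescaling all values (for example, changing units) leaves it unchanged, which is what makes it comparable across measures.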

Change in the Mean
The change in the mean, as a measure of reliability, has two components:
– Random change due to sampling error.
– Systematic change due to learning effects, fatigue, lack of motivation or training effects.
Give subjects sufficient practice to become accustomed to the test before the experiment begins, to avoid learning effects.

Retest Correlation
The retest correlation (r) is not as good a measure of reliability as the ‘typical error’. The retest r is sensitive to the heterogeneity (spread) of values between participants, whereas the ‘typical error’ can be estimated even from a sample that is not particularly representative. You cannot compare the reliability of two measures on the basis of their retest r alone: the retest r changes with a different sample if the heterogeneity is different.
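The point about heterogeneity can be illustrated with a small constructed example. The sketch below is hypothetical: it adds the same trial-to-trial errors to a homogeneous and a heterogeneous group of "true" values. The typical error is identical in both groups, but the retest r is much higher in the more spread-out group.

```python
import math
import statistics

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def typical_error(x, y):
    """SD of difference scores divided by sqrt(2)."""
    diffs = [b - a for a, b in zip(x, y)]
    return statistics.stdev(diffs) / math.sqrt(2)

# Hypothetical errors, identical for both groups
err1 = [1, -1, 0, 1, -1]
err2 = [-1, 0, 1, 0, 1]
groups = [("homogeneous", [50, 51, 52, 53, 54]),
          ("heterogeneous", [40, 45, 50, 55, 60])]
for label, true in groups:
    t1 = [v + e for v, e in zip(true, err1)]
    t2 = [v + e for v, e in zip(true, err2)]
    print(label, round(pearson_r(t1, t2), 2), round(typical_error(t1, t2), 2))
```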

Threshold for a ‘Real Change’
An observed change of 1.5 to 2.0 times the ‘typical error’ represents a real change. Example: if the ‘typical error’ for the sum of 7 skinfolds is 1.6 mm, an observed change of at least 2 to 3 mm (1.5 × 1.6 = 2.4 mm to 2.0 × 1.6 = 3.2 mm) would indicate a real change. The value of the ‘typical error’ must come from trials separated by a short period (1-2 days for skinfolds), over which the subjects do not change.
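The threshold rule is simple arithmetic. In this sketch, `is_real_change` is a hypothetical helper; using 1.5 typical errors as its default cut-off is an assumption taken from the lower end of the slide's range.

```python
def real_change_range(typical_error, low=1.5, high=2.0):
    """Range of change (same units as the typical error) treated as 'real' on the slide."""
    return low * typical_error, high * typical_error

def is_real_change(observed_change, typical_error, factor=1.5):
    """Hypothetical helper: flag a change of at least `factor` typical errors."""
    return abs(observed_change) >= factor * typical_error

lo, hi = real_change_range(1.6)  # sum of 7 skinfolds, typical error 1.6 mm
print(f"{lo:.1f} to {hi:.1f} mm")  # prints 2.4 to 3.2 mm
print(is_real_change(3.0, 1.6), is_real_change(2.0, 1.6))  # prints True False
```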

Estimation of Sample Size
To use the ‘typical error’, the time between trials in the reliability study must be the same as in the intended study. The ‘typical error’ of the dependent variable represents the noise that obscures the change in the mean from pre to post. Sample sizes estimated with the ‘typical error’ can be unrealistically large. Sample size should be chosen to give adequate precision for an outcome. Precision is defined by 95% confidence intervals:
– the range within which the true value is 95% likely to lie.

Estimation of Sample Size
In a pre-post design, statistical theory gives confidence limits of
$\pm t_{0.975,\,n-1} \cdot s\sqrt{2}/\sqrt{n}$
– n is the sample size
– s is the ‘typical error’
– t is the t statistic
Equating this to the confidence limits representing adequate precision ($\pm d$):
$n = 2(t s/d)^2 \approx 8s^2/d^2$ (taking $t \approx 2$)
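Because t itself depends on n, $n = 2(t_{0.975,\,n-1}\, s/d)^2$ can be solved by iteration, starting from the $8s^2/d^2$ approximation. The sketch below is an assumption-laden illustration, not the paper's procedure. To stay within the standard library, it approximates the t quantile with a four-term asymptotic expansion around the normal quantile. The expansion is accurate to about 0.001 for 9 degrees of freedom but rough for very small samples.

```python
from statistics import NormalDist

def t_quantile_975(df):
    """Approximate 0.975 quantile of Student's t (asymptotic expansion in 1/df)."""
    z = NormalDist().inv_cdf(0.975)
    g1 = (z**3 + z) / 4
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96
    g3 = (3 * z**7 + 19 * z**5 + 17 * z**3 - 15 * z) / 384
    g4 = (79 * z**9 + 776 * z**7 + 1482 * z**5 - 1920 * z**3 - 945 * z) / 92160
    return z + g1 / df + g2 / df**2 + g3 / df**3 + g4 / df**4

def sample_size(s, d, iterations=100):
    """Solve n = 2 * (t_{0.975, n-1} * s / d)**2 by fixed-point iteration.

    s: typical error; d: half-width of the confidence interval judged adequate.
    """
    n = 8 * (s / d) ** 2  # the t ~= 2 approximation as a starting value
    for _ in range(iterations):
        df = max(n - 1, 2.0)  # keep df >= 2; the approximation is poor below that
        n = 2 * (t_quantile_975(df) * s / d) ** 2
    return n

print(round(sample_size(1.0, 1.0), 1))  # s = d
print(round(sample_size(2.0, 1.0), 1))  # twice the typical error
```

With s = d the iteration settles at roughly 10 subjects, compared with 8 from the $8s^2/d^2$ shortcut. With twice the typical error it gives about 33, slightly under the factor of 4 implied by $8s^2/d^2$, because t shrinks as n grows.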

Sample Size and Reliability
Sample size is proportional to the ‘typical error’ squared, so reducing the ‘typical error’ means fewer subjects are needed. When the ‘typical error’ equals the smallest worthwhile effect (s = d), only about 10 subjects are needed (the approximation $8s^2/d^2$ gives 8; using the exact t value for so few subjects raises this to about 10). A test with twice the typical error would require 4 times as many subjects.

Estimation of Individual Differences
Individual differences occur when the response to a treatment differs between subjects. The SD representing individual differences is
$S_{\text{diff}} = \sqrt{2s_{\text{expt}}^2 - 2s^2}$
where $s_{\text{expt}}$ is the inflated typical error of the experimental group and $s$ is the typical error in the control group (or from a reliability study).
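The formula above maps onto a few lines of code. This is a sketch; raising an error when $s_{\text{expt}} < s$ is an assumption about handling the case where the expression under the root is negative, which the slide does not address.

```python
import math

def individual_differences_sd(s_expt, s):
    """SD of individual responses: sqrt(2*s_expt**2 - 2*s**2).

    s_expt: typical error in the experimental group (inflated by individual responses)
    s:      typical error in the control group or from a reliability study
    """
    radicand = 2 * s_expt**2 - 2 * s**2
    if radicand < 0:
        # Assumption: treat this as undefined rather than returning a value.
        raise ValueError("s_expt is smaller than s, so the variance estimate is negative")
    return math.sqrt(radicand)

print(round(individual_differences_sd(2.0, 1.0), 3))  # prints 2.449
```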

Acceptable Likely Range for Typical Error
15 subjects, 4 trials, typical error 1%:
True typical error = 1% ÷ 1.24 to 1% × 1.24 ≈ 0.81% to 1.24%
… subjects, 3 trials reduces the factors to …
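The slide does not show how its factor was derived. One standard approach, assumed here, treats the typical error as an SD with (subjects - 1)(trials - 1) degrees of freedom and uses chi-square confidence limits. To stay within the standard library, the sketch approximates chi-square quantiles with the Wilson-Hilferty formula.

```python
import math
from statistics import NormalDist

def chi2_quantile(p, df):
    """Wilson-Hilferty approximation to the chi-square quantile."""
    z = NormalDist().inv_cdf(p)
    h = 2 / (9 * df)
    return df * (1 - h + z * math.sqrt(h)) ** 3

def typical_error_limits(te, subjects, trials, conf=0.95):
    """Confidence limits for a typical error, assuming (subjects-1)(trials-1) df."""
    df = (subjects - 1) * (trials - 1)
    alpha = 1 - conf
    lower = te * math.sqrt(df / chi2_quantile(1 - alpha / 2, df))
    upper = te * math.sqrt(df / chi2_quantile(alpha / 2, df))
    return lower, upper

lo, hi = typical_error_limits(1.0, subjects=15, trials=4)
print(f"{lo:.2f}% to {hi:.2f}%")
print(f"geometric-mean factor: {math.sqrt(hi / lo):.2f}")
```

For 15 subjects and 4 trials, this gives slightly asymmetric limits of about 0.82% and 1.27%. The geometric mean of the two factors is about 1.24, close to the single ×/÷ factor on the slide.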

Analysis of Simple Studies
Analysis of reliability with 2 trials is straightforward: compute the typical error from the difference scores; the change in the mean is simply the mean difference. For 3 or more trials, check for learning effects by comparing consecutive pairs of trials (trials 1 & 2, trials 2 & 3, …). Download the spreadsheet from SportSci.Org.
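A sketch of the consecutive-pairs check. It is not a reproduction of the spreadsheet. Each adjacent pair of trials gets its own mean change and typical error, so a learning effect shows up as a mean change (and often a larger typical error) that shrinks in later pairs. The data are hypothetical.

```python
import math
import statistics

def consecutive_pair_stats(data):
    """For each consecutive pair of trials return (label, mean change, typical error).

    data: one list per subject, one value per trial (same number of trials each).
    """
    n_trials = len(data[0])
    results = []
    for j in range(n_trials - 1):
        diffs = [row[j + 1] - row[j] for row in data]
        mean_change = statistics.mean(diffs)
        te = statistics.stdev(diffs) / math.sqrt(2)
        results.append((f"trials {j + 1}&{j + 2}", mean_change, te))
    return results

# Hypothetical data: 4 subjects x 3 trials
data = [[10, 12, 12], [14, 13, 14], [18, 19, 19], [20, 22, 21]]
for label, change, te in consecutive_pair_stats(data):
    print(f"{label}: mean change {change:.2f}, typical error {te:.2f}")
```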

Excel Reliability (sportsci.org)
SD of difference scores ≈ 1.2; typical error = SD/$\sqrt{2}$ = 0.83

Intraclass Correlation ICC(3,1)
For a retest-correlation measure of reliability, the ICC(3,1) [Shrout & Fleiss] is unbiased for any sample size. Use of the ICC is appropriate with more than 2 trials. To calculate the ‘typical error’ from the ICC:
$s = S\sqrt{1 - r}$
where s is the typical error, S is the average SD of the subjects' values in each trial, and r is the ICC.
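A sketch of ICC(3,1) from a two-way (subjects × trials) ANOVA, with the typical error taken as the square root of the error mean square. One assumption: "average SD" S is taken as the root-mean-square of the per-trial SDs. With that choice, $s = S\sqrt{1-r}$ holds exactly, and the printed values confirm it for the hypothetical data.

```python
import math
import statistics

def two_way_mean_squares(data):
    """Mean squares (subjects, trials, error) from a subjects x trials table."""
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    subj_means = [sum(row) / k for row in data]
    trial_means = [sum(row[j] for row in data) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_trial = n * sum((m - grand) ** 2 for m in trial_means)
    ss_err = ss_total - ss_subj - ss_trial
    return ss_subj / (n - 1), ss_trial / (k - 1), ss_err / ((n - 1) * (k - 1))

def icc_3_1(data):
    """Shrout & Fleiss ICC(3,1): subjects random, trials fixed, single measurement."""
    ms_subj, _, ms_err = two_way_mean_squares(data)
    k = len(data[0])
    return (ms_subj - ms_err) / (ms_subj + (k - 1) * ms_err)

# Hypothetical data: 4 subjects x 2 trials
data = [[10, 12], [14, 13], [18, 19], [20, 22]]
r = icc_3_1(data)
typical_error = math.sqrt(two_way_mean_squares(data)[2])
# S: root-mean-square of the between-subject SDs in each trial (assumed choice)
k = len(data[0])
S = math.sqrt(statistics.mean(statistics.variance([row[j] for row in data]) for j in range(k)))
print(round(r, 4), round(typical_error, 4), round(S * math.sqrt(1 - r), 4))
# prints 0.9531 1.0 1.0
```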

Reliability Between Different Equipment, Methods, Installations
Use the ICC(2,1) when retesting subjects on different equipment, methods or installations. The ICC(2,1) is derived from the fully random model, in which subjects and trials are both treated as random effects. Researchers have often misapplied the ICC(2,1) to data from a single item of equipment.
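The difference between the two ICCs can be sketched using the Shrout & Fleiss formulas. ICC(2,1) includes the trials mean square in its denominator, so a systematic offset between trials (for example, one installation reading higher) lowers ICC(2,1) but leaves ICC(3,1) unchanged. The data and the +5 offset are hypothetical.

```python
def mean_squares(data):
    """Mean squares (subjects, trials, error) from a subjects x trials table."""
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_subj = k * sum((sum(row) / k - grand) ** 2 for row in data)
    ss_trial = n * sum((sum(row[j] for row in data) / n - grand) ** 2 for j in range(k))
    ss_err = ss_total - ss_subj - ss_trial
    return ss_subj / (n - 1), ss_trial / (k - 1), ss_err / ((n - 1) * (k - 1))

def icc_3_1(data):
    """Two-way mixed model (trials fixed), single measurement."""
    ms_s, _, ms_e = mean_squares(data)
    k = len(data[0])
    return (ms_s - ms_e) / (ms_s + (k - 1) * ms_e)

def icc_2_1(data):
    """Two-way random model (subjects and trials random), single measurement."""
    n, k = len(data), len(data[0])
    ms_s, ms_t, ms_e = mean_squares(data)
    return (ms_s - ms_e) / (ms_s + (k - 1) * ms_e + k * (ms_t - ms_e) / n)

data = [[10, 12], [14, 13], [18, 19], [20, 22]]          # hypothetical
shifted = [[a, b + 5] for a, b in data]                   # trial 2 reads 5 units higher
print(round(icc_3_1(data), 3), round(icc_2_1(data), 3))
print(round(icc_3_1(shifted), 3), round(icc_2_1(shifted), 3))
```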