The Psychometrics Behind Neurocognitive Evaluation for Concussion Philip Schatz, PhD Department of Psychology Saint Joseph’s University

The Psychometrics Behind Neurocognitive Evaluation for Concussion Philip Schatz, PhD Department of Psychology Saint Joseph’s University schatzSJU@gmail.com

Disclosures Consulting/support: International Brain Research Foundation Department of Defense Sports Concussion Center of New Jersey ImPACT Applications, Inc. Disclaimer: No role in the conceptualization, design, collection or analysis of data, manuscript preparation or decision to submit for publication.

Concussion Publications – to date

Concussion Publications-projected

Concussion Publications-psychometrics

Overview
1. Basics of correlation and variance
2. Psychometric properties of concussion tests, in context of:
   – common psychological tests
   – other tests
3. Psychometric properties of a two-factor theory of concussion

Reliability vs. Validity
  Neither Reliable nor Valid
  Highly Reliable but Not Valid
  Highly Reliable and Valid

Variance: What We Have Learned

Variance: What We Expect

Variance: What We Often See

Psychometric Issues
Reliability in a nutshell:
  Test-retest reliability assumes:
    Fluctuations/changes are due to deficiencies in the measure
    Human behavior does not deviate from Time 1 -> Time 2
    We are measuring traits and not states

Psychometric Issues
Test-retest reliability assumes:
  Fluctuations/changes are due to deficiencies in the measure
  Human behavior does not deviate from Time 1 -> Time 2
  We MAY BE measuring states and not traits
Broglio, et al., 2007:
  118 student “volunteers” completed: ImPACT, HeadMinders, CogSport, MACT
  One test session
  40 subjects (34%) had invalid baselines
Cameron, Schatz (unpublished thesis):
  90 student “volunteers” completed ImPACT back-to-back
  One test session
  18 subjects (20%) had invalid baselines
  An additional 15 subjects (21%) had “red flag” scores (<1.5 SD)

Psychometric Issues
Reliability in a nutshell:
  Random error: situational fluctuations or changes in mood or environment
    sleep, fatigue, diet, metabolism
    distractions, noise, equipment

Psychometric Issues
Random error: situational fluctuations or changes in mood or environment
  sleep, fatigue, diet, metabolism
    Athletes sleeping <7 hrs performed worse on 3/4 ImPACT composite scores and endorsed more symptoms (McClure, et al., in review, AJSM)
  distractions, noise, equipment
    Athletes tested in a group setting scored significantly worse than athletes tested individually (Moser et al., 2001, AJSM):
      Verbal: 83.4 vs 86.5 (p=.003)
      Visual: 71.6 vs 76.7 (p=.0001)
      Motor: 35.6 vs 38.4 (p=.0001)
      RT: 0.61 vs 0.57 (p=.001)

Psychometric Issues
Reliability in a nutshell:
  Systematic error: factors that consistently affect measurement across the sample
    practice effects
    increased exposure, familiarity with measure, device

Psychometric Issues
Evidence of systematic error on ImPACT? (e.g., practice effects)
  No significant differences on any composite score:
    Back-to-back administrations (Cameron, Schatz: MS thesis)
    Pre-season -> mid-season -> post-season (Miller et al., 2007)
  Significant improvement on:
    Processing Speed at 30 days, 1 year (Schatz, Ferris, 2013; Elbin et al., 2011)
    Visual Memory, RT at 1 year (Elbin et al., 2011)

Psychometric Issues
Reliability in a nutshell:
  Can we measure or distinguish between specific types of “error”?
  Can we measure or distinguish between specific “error” at X1 versus X2?

Psychometric Issues
Reliability in a nutshell:
  Can we measure or distinguish between specific types of “error”?
  Can we measure or distinguish between specific “error” at X1 versus X2?
Cameron (unpublished MS thesis):
  90 student “volunteers” completed ImPACT back-to-back
  Using Iverson’s RCI cut-offs:
    8% showed significant decreases at T2 on Verbal Memory
    8% showed significant decreases at T2 on Visual Memory
    7% showed significant decreases at T2 on Motor Speed
    7% showed significant increases at T2 on Reaction Time
    26% showed significantly worse performance at T2 on 1 composite score
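The RCI classification on this slide can be sketched roughly as follows. This is a minimal illustration that assumes the common standard-error-of-the-difference form of the Reliable Change Index. The function name, the standard deviations, the reliability coefficient, and the 1.28 critical value (an 80% confidence band) are hypothetical placeholders, not Iverson's published cut-offs and not data from the thesis.

```python
import math

def reliable_change(score_t1, score_t2, sd_t1, sd_t2, r_xx, z_crit=1.28):
    """Return (rci, is_reliable) for one athlete's change from T1 to T2.

    sd_t1, sd_t2: group standard deviations at each administration.
    r_xx: test-retest reliability coefficient of the composite score.
    z_crit: critical z value; 1.28 is an assumed, illustrative default.
    """
    # Standard error of measurement at each time point.
    sem_t1 = sd_t1 * math.sqrt(1.0 - r_xx)
    sem_t2 = sd_t2 * math.sqrt(1.0 - r_xx)
    # Standard error of the difference between the two scores.
    s_diff = math.sqrt(sem_t1 ** 2 + sem_t2 ** 2)
    rci = (score_t2 - score_t1) / s_diff
    return rci, abs(rci) > z_crit

# Hypothetical athlete: a composite score drops from 94 to 86.
rci, is_reliable = reliable_change(94.0, 86.0, sd_t1=5.0, sd_t2=5.0, r_xx=0.70)
print(f"RCI = {rci:.2f}, reliable change: {is_reliable}")
```

A change counts as "significant" in this sketch only when it exceeds what measurement error alone would plausibly produce, which is why the result depends so heavily on the reliability coefficient that is plugged in.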

Psychometric Issues: Reliability
How do we measure reliability?
  Pearson’s r?
  Intra-class correlations?
“There is literally no such thing as the reliability of a test, unqualified; the coefficient has meaning only when applied to specific populations” (Streiner and Norman, 1995)

Psychometric Issues: Reliability
How do we measure reliability?
Pearson’s r:
  general measure of strength of linear relationship
  considered a weak measure of reliability when group means are similar but there is variation in individual scores
  does not allow for correlation of multiple trials
  “inter-class” correlation, does not account for variation within trials
  cannot detect “systematic error” (e.g., practice effects; Weir, 2005)

Psychometric Issues: Reliability
How do we measure reliability?
Pearson’s r: Example
  considered a weak measure of reliability when group means are similar but there is variation in individual scores
  Back-to-back administrations of ImPACT:
    Similar group means: 94.5 to 92.7
    Similar standard deviations: 4.8 to 5.6
    t(48)=1.22, p=.23
    r=.01
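The point that Pearson's r cannot detect systematic error can be illustrated with a short sketch. The scores below are made-up values, not study data: every hypothetical subject improves by exactly 5 points at Time 2 (a pure practice effect), yet the correlation stays perfect because r only tracks rank-order and linear association, not agreement in level.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation for two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical Time 1 composite scores (illustrative values only).
time1 = [80.0, 85.0, 90.0, 95.0, 100.0]
# Time 2: every subject gains exactly 5 points (systematic practice effect).
time2 = [score + 5.0 for score in time1]

r = pearson_r(time1, time2)
mean_shift = mean(time2) - mean(time1)
print(f"r = {r:.2f}, mean shift = {mean_shift:.1f}")
```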

Psychometric Issues: Reliability
How do we measure reliability?
Intra-Class Correlation Coefficient (ICC):
  originally developed for analysis of “inter-judge” (inter-rater) effects
  large differences between “judges” will result in low coefficients
  indicates proportion of variability in the measure (e.g., mean) that is due to variation between individuals
  as applied to test-retest reliability, ICC is used to analyze “trial-to-trial” consistency
  Thus, reflective of the reliability of the measure
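As a rough sketch of how an ICC is computed from a subjects-by-trials table, the code below derives two single-measure forms from two-way ANOVA mean squares: an absolute-agreement form (often labeled ICC(2,1)) and a consistency form (often labeled ICC(3,1)). Which form the cited ImPACT studies used is not stated on these slides, so both are shown as assumptions, and the scores are made-up values.

```python
def icc_two_way(scores):
    """Return (agreement ICC, consistency ICC) for single measurements.

    scores: list of rows, one row per subject, one column per trial.
    """
    n = len(scores)       # number of subjects
    k = len(scores[0])    # number of trials
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares: subjects, trials, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_err = sum(
        (scores[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    # Absolute agreement: trial-to-trial mean shifts lower the coefficient.
    agreement = (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
    # Consistency: trial-to-trial mean shifts are ignored.
    consistency = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
    return agreement, consistency

# Hypothetical data: each subject scores exactly 5 points higher at Time 2.
table = [[80.0, 85.0], [85.0, 90.0], [90.0, 95.0], [95.0, 100.0], [100.0, 105.0]]
icc_agreement, icc_consistency = icc_two_way(table)
print(f"agreement ICC = {icc_agreement:.3f}, consistency ICC = {icc_consistency:.3f}")
```

With this uniform 5-point shift the consistency form stays at 1 while the agreement form drops below 1, which is the sense in which an ICC can register a systematic difference between trials.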

Psychometric Issues: Reliability
Five published articles on reliability of ImPACT, listed chronologically:
  Iverson, G., Lovell, M. R., & Collins, M. W. (2003). Interpreting change on ImPACT following sport concussion. Clin Neuropsychol.
  Broglio, S. P., Ferrara, M. S., Macciocchi, S. N., Baumgartner, T. A., & Elliott, R. (2007). Test-retest reliability of computerized concussion assessment programs. J Athl Train.
  Schatz, P. (2009). Long-term test-retest reliability of baseline cognitive assessments using ImPACT. Am J Sports Med.
  Elbin, R. J., Schatz, P., & Covassin, T. (2011). One-year test-retest reliability of the online version of ImPACT in high school athletes. Am J Sports Med.
  Schatz, P., & Ferris, C. (2013). One-month test-retest reliability of the ImPACT test battery. Arch Clin Neuropsych.

Psychometric Issues: Reliability
Update to Broglio’s 2007 study:
  Nakayama (MSU Dissertation) replicated Broglio’s 2007 study using only ImPACT.
  Nakayama used ACSM standard for “athletically active”:
    75mod-150vig min/wk cardio, 2-3 days/wk resistance training
  <3% of subjects had invalid results (vs. 34% for Broglio)
  Higher ICCs across all composite scores

ImPACT: Reliability Data

Working Memory: Reliability Data
Test-retest reliability of other Working Memory measures:
  Measure         Interval    Stat  Value  Source
  ImPACT (VrM)    1 month     ICC   .79    Schatz, Ferris, 2013
  ImPACT (VrM)    45 days     ICC   .76    Nakayama, 2013
  ImPACT (VrM)    1 year      ICC   .62    Elbin et al., 2011
  CogSport (WM)   1 year      ICC   .51    Collie et al., 2001
  ImPACT (VrM)    2 years     ICC   .46    Schatz, 2010
  ANAM (CPT)      1 week      ICC   .32    Segalowitz et al., 2007
  CogSport (WM)   1 hour      ICC   .24    Collie et al., 2001
  Digit Span      60 days     r     .70    Barr et al., 2003
  WMS (LM)        11 months   r     .70    Tulsky et al., 2003
  WMS (VR)        11 months   r     .62    Tulsky et al., 2003
  WMS (PA)        11 months   r     .57    Tulsky et al., 2003
  RAVLT           1 year      r     .55    Snow et al., 1988
  RVDLT-R         1 month     r     .45    Benedict, 1997

Reaction Time: Reliability Data

PsychoMotor Speed: Reliability Data
Test-retest reliability of other Processing Speed/Coding measures:
  Measure         Interval    Stat  Value  Source
  ImPACT (PS)     1 month     ICC   .88    Schatz, Ferris, 2013
  ImPACT (PS)     45 days     ICC   .86    Nakayama, 2013
  ImPACT (PS)     1 year      ICC   .82    Elbin et al., 2011
  CogSport (CM)   1 week      ICC   .76    Collie et al., 2003
  ImPACT (PS)     2 years     ICC   .74    Schatz, 2010
  ANAM (CDS)      1 week      ICC   .54    Segalowitz et al., 2007
  SDMT            10 days     r     .74    Hinton-Bayre et al., 1997
  Digit Symbol    60 days     r     .73    Barr et al., 2003
  Tapping         6 months    r     .71    Ruff, Parker, 1993
  Trails B        60 days     r     .65    Valovich 2006, Barr 2003
  BVMT-R          55 days     r     .60    Benedict, 1997

Other Tests: Reliability Data
Test-retest reliability of other/common tests:
  Measure         Interval    Stat  Value
  Systolic BP     3 months    r     .50
  Diastolic BP    3 months    r     .53
  Heart Rate      4 visits    ICC   .56
  Heart Rate      1 week      ICC   .74
  Gluc. Metab     immediate   r     .77
  BESS            60 days     r     .70
Field Sobriety/Blood ETOH:
  Measure         Interval    Stat  Value
  Actual BAC      immediate   r     .97
  Saliva ETOH     10 mins     r     .90
  1-leg Stand     immediate   r     .61
  Arrest Decis.   immediate   r     .54
  Est. BAC        immediate   r     .68

“State-Trait” Issues

Two-Factor Theory Rationale
  Verbal Memory:
    Information presented visually
    Can be encoded verbally
  Visual Memory:
    Information presented visually
    Cannot be easily encoded verbally
  Reaction Time:
    Speed of responses: Simple Choice -> Complex Choice
  Visual Motor Speed:
    Speed of information processing
  Confusion in interpretation
  Simplified by using “Memory”, “Speed”?

Two-Factor Theory
Factor analysis:
  Reduce a larger number of variables to a smaller number of factors
  Analogy: see bumps under covers on bed, hear laughing
    one “cluster” of bumps moves in one direction
    the other “cluster” moves in another direction
    identify them as “Child 1” and “Child 2”
    each “Child” is a unique “Factor”
  Can also be used to select a subset of variables from a larger set, based on which variables have the highest correlations with the principal components (or factors)

Two-Factor Theory
Factor analysis results: Baseline Group (N=22k), Concussion Group (N=560)

Two-Factor Theory
Factor analysis results (data from Schatz & Sandel, 2012): Baseline Group, Concussion Group

Two-Factor Theory Factor analysis results (data from Schatz & Sandel, 2012)

Two-Factor Theory
Calculated Z-scores, using normative data (Mean, SD) for both baseline and post-concussion scores:
  Baseline: Z = (Athlete’s Score - Baseline Mean) / Baseline SD
  Post-concussion: Z = (Athlete’s Post-concussion Score - Baseline Mean) / Baseline SD
Averaged Verbal/Visual, Visual Motor/Reaction Time
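The Z-score and averaging steps on this slide might look like the sketch below. The normative means and SDs, the dictionary keys, and the athlete's scores are all hypothetical. Flipping the sign of the Reaction Time z-score (so that higher always means better before averaging) is an assumption of this sketch; the slide does not say how Reaction Time was handled.

```python
from statistics import mean

# Hypothetical normative (baseline) values as (mean, SD); not published norms.
NORMS = {
    "verbal_memory": (85.0, 10.0),
    "visual_memory": (75.0, 10.0),
    "visual_motor_speed": (38.0, 5.0),
    "reaction_time": (0.60, 0.10),
}
# Assumption: lower reaction time is better, so its z-score is sign-flipped.
LOWER_IS_BETTER = {"reaction_time"}

def z_score(name, score):
    """Z = (athlete's score - baseline mean) / baseline SD."""
    norm_mean, norm_sd = NORMS[name]
    z = (score - norm_mean) / norm_sd
    return -z if name in LOWER_IS_BETTER else z

def two_factor_scores(scores):
    """Average Verbal/Visual into Memory and Visual Motor/RT into Speed."""
    z = {name: z_score(name, value) for name, value in scores.items()}
    memory = mean([z["verbal_memory"], z["visual_memory"]])
    speed = mean([z["visual_motor_speed"], z["reaction_time"]])
    return memory, speed

# Hypothetical post-concussion composite scores for one athlete.
post_concussion = {
    "verbal_memory": 75.0,
    "visual_memory": 60.0,
    "visual_motor_speed": 33.0,
    "reaction_time": 0.70,
}
memory_z, speed_z = two_factor_scores(post_concussion)
print(f"Memory z = {memory_z:.2f}, Speed z = {speed_z:.2f}")
```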

Two-Factor Theory Calculated Z-scores, using normative data (Mean, SD) for both baseline and post-concussion scores:

Highly Reliable and Valid
Validity
  The extent to which a test measures what it is intended to measure.
  Traditionally achieved using a criterion group (e.g., clinical, diagnosed) and a control group (e.g., absence of diagnosis)
  Expressed in terms of “sensitivity” and “specificity”

Validity
Calculating sensitivity:
  Correct “positive” hits = 81.9% (i.e., the probability that a test result will be positive when a concussion is present)

Validity
Calculating specificity:
  Correct “negative” hits = 89.4% (i.e., the probability that a test result will be negative when a concussion is not present)
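Both definitions reduce to simple ratios over a 2x2 classification table, sketched below. The counts are hypothetical and were chosen only to make the arithmetic easy to follow; they are not the counts behind the 81.9% and 89.4% figures on these slides.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of concussed athletes the test correctly flags as positive."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Proportion of non-concussed athletes the test correctly calls negative."""
    return true_neg / (true_neg + false_pos)

# Hypothetical classification counts (not study data).
concussed_flagged, concussed_missed = 45, 5    # criterion (concussed) group
controls_cleared, controls_flagged = 40, 10    # control group

sens = sensitivity(concussed_flagged, concussed_missed)
spec = specificity(controls_cleared, controls_flagged)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```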

Validity Data
Sensitivity of “concussion” measures:
  Measure                   Sensitivity  Source
  ImPACT (online-72h)       91%          Schatz, Sandel, 2013
  ImPACT (desktop-72h)      82%          Schatz et al., 2005
  PnP, Posture, Sym         96%          Broglio et al., 2007
  ImPACT, Posture, Sym      92%          Broglio et al., 2007
  ImPACT (desktop-24h)      79%          Broglio et al., 2007
  HeadMinder CRI            79%          Broglio et al., 2007
  Symptoms                  68%          Broglio et al., 2007
  Posture                   62%          Broglio et al., 2007
  Pencil/Paper (battery)    44%          Broglio et al., 2007
  BESS, SAC, PnP            56%          McCrea et al., 2005
  PnP battery (Day 2)       23%          McCrea et al., 2005

Validity Data
Sensitivity/Specificity measures:
  Measure                 Sens.  Spec.  Source
  ImPACT (ONL-72hr)       91%    69%    Schatz, Sandel, 2013
  ImPACT (DT-72hr)        82%    89%    Schatz et al., 2005
  SAC (immediate)         94%    76%    McCrea et al., 2001
  RapScrCon, Tr B (24h)   70%    74%    DeMonte et al., 2010
  Full Battery (Day 2)    56%    79%    McCrea et al., 2005
  ANAM/SOT*               50%    96%    Register-Mihalik et al., 2012
  Symptoms (Day 2)        27%    100%   McCrea et al., 2005
  Symptom Clusters (D2)   47%    77%    Lau et al., 2011
  BESS (Day 2)            24%    91%    McCrea et al., 2005
  PnP Battery (Day 2)     23%    93%    McCrea et al., 2005
  SAC (Day 2)             22%    89%    McCrea et al., 2005
*Sensory Organization Test

Validity Data
Sensitivity/Specificity of tests for common medical conditions:
  Measure                 Sens.  Spec.  Source
  ImPACT (online)         91%    69%    Schatz, Sandel, 2013
  ImPACT (desktop)        82%    89%    Schatz et al., 2005
  Oxidative Stress (Alz)  88%    70%    Lopez et al., 2013
  HBP (Hypertension)      84%    82%    Nascimento et al., 2011
  Mammogram (1yr)         82%    91%    Hofvind et al., 2012
  Echocardiogram          77%    61%    Tanaka et al., 2010
  Stress Echo             76%    87%    Sicari et al., 2007
  Prostate Exam           75%    44%    Ojewola et al., 2013
  PSA Test (>4)           72%    46%    Rashid et al., 2012
  Cholesterol ‘At-Risk’   71%    76%    Gelsky et al., 1994
  Rapid Strep Test        65%    97%    Gurol et al., 2010

Two-Factor Theory
Applied to Validation Data (Schatz & Sandel, 2012): Two-Factor versus composite score and sub-scale score sensitivity and specificity.

“Psychometric” Issues?
  Is concussion testing falling under a unique level of scrutiny?
  Is there an ulterior motive for the criticism of the psychometric properties of computer-based concussion tests and not other tests?
  Would a more reliable instrument be valid (e.g., crystallized intelligence)?
  Is it necessary to focus solely on one measure (e.g., ImPACT), as part of a more comprehensive assessment, when:
    other measures have equal or worse psychometrics
    lone measures are not recommended for concussion diagnosis/management

Collaborators:
  Tracey Covassin, Ph.D.
  Mickey Collins, Ph.D.
  RJ Elbin, Ph.D.
  Robin Karpf, M.D.
  Anthony Kontos, Ph.D.
  Mark Lovell, Ph.D.
  Rosemarie Moser, Ph.D.
  Summer Ott, Psy.D.
  Gary Solomon, Ph.D.
Student Collaborators:
  Nicole Cameron
  Charles Ferris
  Timothy Kelley
  Stacey Robertshaw
  Natalie Sandel
