What is applied psychometrics?


What is applied psychometrics? Tim Croudace tjc39@cam.ac.uk Department of Psychiatry John Rust jnr24@cam.ac.uk The Psychometrics Centre University of Cambridge

What is applied psychometrics? Professor John Rust, http://www.ppsis.psychometrics.cam.ac.uk

Overview
- About the Centre
- What is psychometrics?
- Psychometrics today
- What we are doing now
- What we are going to do

The Psychometrics Centre
- Educational and diagnostic, e.g. Wechsler
- Organisational, e.g. Watson-Glaser, Orpheus
- Statistical, IRT and AI techniques
- Computer languages, e.g. Mplus, Stata, R
- Web-based assessment
- BPS Level A and B courses
- Seminars, workshops and summer schools
- PhDs in psychometrics or related areas
- Tutorial materials on website www.psychometrics.ppsis.cam.ac.uk

Current activities
- Who we are (people)
- Announcement about summer schools
- Announcement about forthcoming workshops

What is psychometrics? “The science of psychological assessment”
Much assessment is “high stakes”:
- Questionnaires and social surveys
- Recruitment and staff development
- Licensing and chartering (e.g. accountants, surgeons)
- School and university examinations
- Psychiatric and ‘special needs’ diagnosis
- Credit ratings
- Career guidance
- Social awareness

Types of assessment
- First impressions
- Application forms and references
- Objective tests (on or off line)
- Projective tests
- Interviews
- Essays and examinations
- Research questionnaires and semi-structured interviews

The Psychometric Principles: maximizing the quality of assessment
- Reliability (freedom from error)
- Validity (‘... what it says on the tin’)
- Standardisation (compared with what?)
- Equivalence (is it biased?)
Rust, J. & Golombok, S. (2009) Modern Psychometrics (3rd Edition). Taylor and Francis: London.

Can everything be measured?
“If anything exists it must exist in some quantity and can therefore be measured.” (Lord Kelvin, 1824–1907)
In 1900, Lord Kelvin claimed: "There is nothing new to be discovered in physics now. All that remains is more and more precise measurement."

The theory of true scores
"Whatever precautions have been taken to secure unity of standard, there will occur a certain divergence between the verdicts of competent examiners. If we tabulate the marks given by the different examiners they will tend to be disposed after the fashion of a gendarme’s hat. I think it is intelligible to speak of the mean judgment of competent critics as the true judgment; and deviations from that mean as errors. This central figure which is, or may be supposed to be, assigned by the greatest number of equally competent judges, is to be regarded as the true value ..., just as the true weight of a body is determined by taking the mean of several discrepant measurements."
Edgeworth, F.Y. (1888). The statistics of examinations. Journal of the Royal Statistical Society, LI, 599-635.
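Edgeworth's idea can be sketched in a few lines. The marks below are hypothetical (they are not from the slides); the sketch simply takes the mean of several examiners' marks as the "true" value and the deviations from it as errors.

```python
from statistics import mean

# Hypothetical marks given to one script by five equally competent examiners.
marks = [62, 58, 65, 60, 55]

# Edgeworth's proposal: the mean judgment is taken as the true value ...
true_value = mean(marks)

# ... and each examiner's deviation from that mean is treated as error.
errors = [m - true_value for m in marks]

print("true value:", true_value)
print("errors:", errors)
```

By construction the errors sum to zero, which is what makes the mean a natural choice for the "central figure".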

The evolution of the Latent Trait
- Edgeworth, F.Y. (1888). The statistics of examinations. Journal of the Royal Statistical Society, LI, 599-635. With two measures of the same characteristic we can estimate true values.
- Frederic Lord and Melvin Novick (1968), “Statistical Theories of Mental Test Scores”, use Classical Test Theory to derive Latent Trait Theory.
- Allan Birnbaum, in his supplement, established Item Response Theory, of which Rasch Scaling is a special case.
- Today Latent Variable Analysis (LVA) is an integral part of statistical modelling in Psychometrics, Econometrics and Statistics.

What is applied psychometrics? Tim Croudace tjc39@cam.ac.uk Department of Psychiatry University of Cambridge

psychometry / psycho·met·rics (sī′kō me′triks)
Etymologically (from the Greek), psychometry means measuring the mind.
P. Kline (1979) “The meaning of psychometrics”, p. 1

Definitions
Collins English Dictionary, psychometrics (n):
- the branch of psychology concerned with the design and use of psychological tests
- application of statistical & mathematical techniques to psychological testing
dictionary.reverso.net/english-definition/psychometrics

What is psychometrics? The Science of Psychological Assessment
“the branch of psychology dealing with measurable factors”
Modern Psychometrics, by J. Rust & S. Golombok. Routledge. p. 4

Even Wikipedia has something to say … it doesn’t begin too promisingly!
[From Wikipedia, the free encyclopedia] Psychometrics: Not to be confused with psychrometrics, the measurement of the heat and water vapor properties of air. For other uses of this term and similar terms, see (disambiguation).
Psychometry [Redirected from Psychometry (disambiguation)] may refer to:
- Psychometry (paranormal), a form of extrasensory perception
- Psychometrics, a discipline of psychology and education (getting warmer!)
And finally it begins to make sense …
Psychometrics is the field of study concerned with the theory and technique of educational and psychological measurement, which includes the measurement of knowledge, abilities, attitudes, and personality traits. The field is primarily concerned with the construction and validation of measurement instruments, such as questionnaires, tests, and personality assessments.

What is [Psychometric] Test Theory?
Psychometric Test Theory “…is essentially a collection of mathematical concepts that formalize and clarify certain questions about constructing and using tests [and scales] and then provide methods for answering them”
R.P. McDonald (1999) Test Theory: a unified treatment. LEA. p. 9

What is psychometrics? Item Response Theory (IRT) / Item Response Modelling (IRM)
IRT refers to a set of mathematical models that describe, in probabilistic terms, the relationship between a person’s response to a survey question/test item and his or her level of the ‘latent variable’ being measured by the scale.
Fayers and Hays, p. 55, Assessing Quality of Life in Clinical Trials. Oxford Univ Press: chapter on applying IRT for evaluating questionnaire item and scale properties.

Psychometric (Measurement) Theory: 2 main schools, old & new
Classical Test Theory
- Associated with use of traditional (old) psychometric methods
- Linear factor analysis
- Cronbach’s alpha (internal consistency), summing items and simple sum scores
Item response theory (modern test theory)
- A set or family of mathematical / probability models that describe the relationship between a person’s [response / answer] to a [questionnaire survey / test item] and his or her level of the latent variable being measured

Classical Test Theory: reliability estimation

Reliability coefficient | Major error source | Data-gathering procedure | Statistical data analysis
1. Stability coefficient | Changes over time | Test-retest | Product-moment correlation
2. Equivalence coefficient | Item sampling: from test form to test form | Given form j, form k |
3. Internal consistency coefficient | Item sampling: test heterogeneity | A single administration | Split-half correlation / Spearman-Brown correction; coefficient alpha; factor loadings; other

Table 4.1, p. 26, Dato N. M. de Gruijter and Leo J. Th. van der Kamp (2008)
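As an illustration of the internal consistency row, here is a minimal sketch of a split-half correlation with the Spearman-Brown correction. The response matrix is hypothetical, the odd/even split is just one possible way of halving a test, and the function names are made up for this example.

```python
from math import sqrt

def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman_brown(r_half, factor=2):
    """Step a half-test correlation up to a test `factor` times as long."""
    return factor * r_half / (1 + (factor - 1) * r_half)

def split_half_reliability(responses):
    """Correlate odd-item and even-item half scores, then correct."""
    odd = [sum(row[0::2]) for row in responses]   # items 1, 3, ...
    even = [sum(row[1::2]) for row in responses]  # items 2, 4, ...
    return spearman_brown(pearson(odd, even))

# Hypothetical data: 5 respondents (rows) by 4 dichotomous items (columns).
responses = [
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
print(round(split_half_reliability(responses), 3))
```

The same `pearson` function, applied to scores from two occasions or two test forms, would give the stability and equivalence coefficients from rows 1 and 2.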

Reliability coefficients: STATA alpha and cialpha commands
Continuous outcomes: Guttman-Cronbach alpha

Test scale = mean(unstandardized items)
Average interitem covariance:     .0921364
Number of items in the scale:            8
Scale reliability coefficient:      0.7942

Cronbach's alpha one-sided confidence interval
---------------------------------------------------------------------
   Items |      alpha    [95% Conf.Interval]
---------+-----------------------------------------------------------
    Test |  .79423639    >= .7348227
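For readers who want to see what a coefficient like this is computed from, the following is a minimal sketch of the usual textbook formula for Cronbach's alpha, alpha = k/(k-1) * (1 - sum of item variances / variance of the total score). The data are hypothetical, and this is not a reproduction of Stata's internal computation.

```python
def sample_variance(xs):
    """Unbiased sample variance (divides by n - 1)."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)

def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents-by-items list of lists."""
    k = len(responses[0])
    item_vars = [sample_variance([row[j] for row in responses])
                 for j in range(k)]
    total_var = sample_variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical data: three items that rank four respondents identically.
consistent = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
# Hypothetical data: two items that are unrelated to each other.
unrelated = [[1, 0], [0, 1], [1, 1], [0, 0]]

print(cronbach_alpha(consistent))
print(cronbach_alpha(unrelated))
```

The two toy data sets mark the ends of the usual range: perfectly consistent items give an alpha of 1, and uncorrelated items give an alpha of 0.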

Exploratory Factor Analysis (ML): STATA factor command

. factor v1-v8, factors(2) ml

Factor analysis/correlation            Number of obs    =      87
    Method: maximum likelihood         Retained factors =       2
    Rotation: (unrotated)              Number of params =      15
                                       Schwarz's BIC    = 95.9898
    Log likelihood = -14.5006          (Akaike's) AIC   = 59.0012

--------------------------------------------------------------------------
      Factor |  Eigenvalue   Difference   Proportion   Cumulative
-------------+------------------------------------------------------------
     Factor1 |     2.84462      1.43839       0.6692       0.6692
     Factor2 |     1.40624            .       0.3308       1.0000
--------------------------------------------------------------------------
LR test: independent vs. saturated: chi2(28) = 261.31  Prob>chi2 = 0.0000
LR test: 2 factors vs. saturated:   chi2(13) =  27.39  Prob>chi2 = 0.0110

Factor loadings (pattern matrix) and unique variances
-------------------------------------------------
    Variable |  Factor1   Factor2 |   Uniqueness
-------------+--------------------+--------------
          v1 |   0.6652   -0.2760 |      0.4814
          v2 |   0.8126   -0.2484 |      0.2780
          v3 |   0.7071   -0.3337 |      0.3886
          v4 |   0.7123   -0.0119 |      0.4925
          v5 |   0.4729    0.4383 |      0.5842
          v6 |   0.3554    0.6141 |      0.4966
          v7 |   0.3969    0.5332 |      0.5581
          v8 |   0.4764    0.5507 |      0.4698
-------------------------------------------------

. rotate, bentler bl(.35)

Rotation: orthogonal bentler (Kaiser off)      Number of params = 15

      Factor |    Variance   Difference   Proportion   Cumulative
     Factor1 |     2.35464      0.45841       0.5539       0.5539
     Factor2 |     1.89622            .       0.4461       1.0000

Rotated factor loadings (pattern matrix) and unique variances
-------------------------------------------------
    Variable |  Factor1   Factor2 |   Uniqueness
-------------+--------------------+--------------
          v1 |   0.7188           |      0.4814
          v2 |   0.8392           |      0.2780
          v3 |   0.7819           |      0.3886
          v4 |   0.6452           |      0.4925
          v5 |             0.6015 |      0.5842
          v6 |             0.7078 |      0.4966
          v7 |             0.6533 |      0.5581
          v8 |             0.7039 |      0.4698
-------------------------------------------------
(blanks represent abs(loading)<.35)

Factor rotation matrix
--------------------------------
             |  Factor1   Factor2
-------------+------------------
     Factor1 |   0.8985    0.4390
     Factor2 |  -0.4390    0.8985
--------------------------------
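The link between the unrotated and rotated solutions can be checked by hand. The sketch below multiplies the unrotated loadings by the factor rotation matrix and computes each uniqueness as one minus the sum of squared loadings. All numbers are copied from the output above; the function names are made up for this example.

```python
# Unrotated loadings (Factor1, Factor2) from the factor output.
unrotated = {
    "v1": (0.6652, -0.2760), "v2": (0.8126, -0.2484),
    "v3": (0.7071, -0.3337), "v4": (0.7123, -0.0119),
    "v5": (0.4729, 0.4383), "v6": (0.3554, 0.6141),
    "v7": (0.3969, 0.5332), "v8": (0.4764, 0.5507),
}

# Factor rotation matrix reported by rotate (rows = unrotated factors).
rotation = [[0.8985, 0.4390],
            [-0.4390, 0.8985]]

def rotate(loading, t):
    """Post-multiply one row of loadings by the rotation matrix."""
    f1, f2 = loading
    return (f1 * t[0][0] + f2 * t[1][0],
            f1 * t[0][1] + f2 * t[1][1])

def uniqueness(loading):
    """Unique variance = 1 - communality (sum of squared loadings)."""
    return 1 - sum(l * l for l in loading)

for name, loading in unrotated.items():
    rotated = rotate(loading, rotation)
    # Mimic the bl(.35) option: blank out loadings below .35 in size.
    shown = [f"{r:7.4f}" if abs(r) >= 0.35 else " " * 7 for r in rotated]
    print(name, shown[0], shown[1], f"{uniqueness(loading):.4f}")
```

Up to rounding in the fourth decimal place, this reproduces the rotated table: v1 to v4 load on the first rotated factor and v5 to v8 on the second. The uniquenesses are unchanged, because an orthogonal rotation leaves the communalities untouched.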


Confirmatory Factor Analysis (ML): STATA cfa1 command

Log likelihood = -457.31642                    Number of obs = 87

             |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
Lambda       |
          v1 |         1          .       .       .           .          .
          v2 |  1.146607   .1706831    6.72   0.000    .8120748    1.48114
          v3 |  1.077999   .1776428    6.07   0.000     .729825   1.426172
          v4 |  1.128529   .1988093    5.68   0.000    .7388694   1.518188
          v5 |  .6362603   .2008189    3.17   0.002    .2426624   1.029858
          v6 |  .4119255   .2019811    2.04   0.041    .0160498   .8078011
          v7 |  .5417541   .2211306    2.45   0.014    .1083461    .975162
          v8 |  .6653727   .2206966    3.01   0.003    .2328152    1.09793
Var[error]   |
          v1 |  .1172731   .0215309    5.45   0.000    .0750732    .159473
          v2 |  .0669433   .0176594    3.79   0.000    .0323315   .1015551
          v3 |  .1085488   .0212332    5.11   0.000    .0669325   .1501651
          v4 |  .1349088   .0264226    5.11   0.000    .0831214   .1866963
          v5 |   .240713    .038299    6.29   0.000    .1656483   .3157778
          v6 |  .2753728   .0426118    6.46   0.000    .1918553   .3588903
          v7 |  .3244316   .0504165    6.44   0.000     .225617   .4232461
          v8 |  .2991244   .0473675    6.31   0.000    .2062859    .391963
Var[latent]  |
        phi1 |  .1107746   .0320436    3.46   0.001    .0479702    .173579

Goodness of fit test: LR = 109.116 ; Prob[chi2(20) > LR] = 0.0000
Test vs independence: LR = 163.149 ; Prob[chi2( 8) > LR] = 0.0000
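To show how these estimates fit together, here is a minimal sketch of the covariances implied by a one-factor model: the covariance of items i and j is lambda_i * lambda_j * phi, with the error variance added on the diagonal. The numbers are copied from the cfa1 output; the function names and the standardized-loading helper are illustrative additions, not part of cfa1.

```python
from math import sqrt

# Estimates copied from the cfa1 output (v1 loading fixed to 1).
loadings = {"v1": 1.0, "v2": 1.146607, "v3": 1.077999, "v4": 1.128529,
            "v5": 0.6362603, "v6": 0.4119255, "v7": 0.5417541,
            "v8": 0.6653727}
error_var = {"v1": 0.1172731, "v2": 0.0669433, "v3": 0.1085488,
             "v4": 0.1349088, "v5": 0.240713, "v6": 0.2753728,
             "v7": 0.3244316, "v8": 0.2991244}
phi = 0.1107746  # Var[latent]

def implied_cov(i, j):
    """Model-implied covariance of items i and j under one factor."""
    common = loadings[i] * loadings[j] * phi
    return common + (error_var[i] if i == j else 0.0)

def standardized_loading(i):
    """Loading rescaled so that item and factor both have variance 1."""
    return loadings[i] * sqrt(phi) / sqrt(implied_cov(i, i))

print(round(implied_cov("v1", "v1"), 4))  # implied variance of v1
print(round(implied_cov("v1", "v2"), 4))  # implied covariance of v1 and v2
print({v: round(standardized_loading(v), 2) for v in loadings})
```

The standardized loadings are easier to compare with the exploratory factor loadings than the raw coefficients, which depend on fixing the v1 loading to 1.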

Single factor model (ML): STATA confa command

. confa (f: v1-v8), from(2SLS)

log likelihood = -457.31642                    Number of obs = 87

             |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
Loadings     |
f            |
          v1 |         1          .       .       .           .          .
          v2 |  1.146608   .1706831    6.72   0.000    .8120749    1.48114
          v3 |  1.077998   .1776429    6.07   0.000    .7298248   1.426172
          v4 |  1.128529   .1988093    5.68   0.000    .7388694   1.518188
          v5 |  .6362603   .2008189    3.17   0.002    .2426625   1.029858
          v6 |  .4119255   .2019811    2.04   0.041    .0160499   .8078012
          v7 |  .5417541   .2211306    2.45   0.014    .1083461   .9751621
          v8 |  .6653728   .2206967    3.01   0.003    .2328153    1.09793
Var[error]   |
          v1 |  .1172731   .0215309    5.45   0.000    .0750732   .1594729
          v2 |  .0669433   .0176594    3.79   0.000    .0323315   .1015551
          v3 |  .1085489   .0212332    5.11   0.000    .0669326   .1501652
          v4 |  .1349088   .0264226    5.11   0.000    .0831214   .1866962
          v5 |  .2407129    .038299    6.29   0.000    .1656482   .3157776
          v6 |  .2753727   .0426117    6.46   0.000    .1918553   .3588902
          v7 |  .3244316   .0504165    6.44   0.000    .2256171   .4232462
          v8 |  .2991244   .0473675    6.31   0.000    .2062858   .3919629

Goodness of fit test: LR = 109.116 ; Prob[chi2(20) > LR] = 0.0000
Test vs independence: LR = 163.149 ; Prob[chi2( 8) > LR] = 0.0000

Confirmatory Factor Analysis (ML): STATA estat fitindices command

Fit indices
RMSEA = 0.2276, 90% CI= (0.1868, 0.2703)
RMSR  = 0.0724
TLI   = 0.7702
CFI   = 0.2967
AIC   = 946.633
BIC   = 986.087

Multidimensional factor model (ML): STATA confa command (2 factors)

. confa (f1: v1-v4) (f2: v5-v8), from(2SLS)

log likelihood = -422.79486                    Number of obs = 87

             |     Coef.   Std. Err.      z    P>|z|   [95% Conf. Interval]
Means        |
          v1 |  1.592161    .051198   31.10   0.000    1.491814   1.692507
          v2 |   1.48841   .0494312   30.11   0.000    1.391526   1.585293
          v3 |  1.568607   .0522239   30.04   0.000     1.46625   1.670964
          v4 |  1.509285    .056323   26.80   0.000    1.398894   1.619677
          v5 |  1.582903   .0572911   27.63   0.000    1.470614   1.695191
          v6 |  1.511862   .0581486   26.00   0.000    1.397893   1.625831
          v7 |  1.500861   .0640531   23.43   0.000     1.37532   1.626403
          v8 |  1.456359   .0632607   23.02   0.000    1.332371   1.580348
Loadings     |
          v1 |         1          .       .       .           .          .
          v2 |  1.129181   .1617634    6.98   0.000     .812131   1.446232
          v3 |  1.085591   .1685842    6.44   0.000    .7551719    1.41601
          v4 |  1.037635   .1794024    5.78   0.000    .6860131   1.389258
          v5 |         1          .       .       .           .          .
          v6 |  1.132231   .2299847    4.92   0.000    .6814688   1.582992
          v7 |  1.194321   .2745619    4.35   0.000    .6561897   1.732453
          v8 |   1.26779   .2739953    4.63   0.000    .7307694   1.804811
Factor cov.  |
       f1-f1 |  .1190851   .0326402    3.65   0.000    .0551115   .1830586
       f2-f2 |  .1128016   .0399112    2.83   0.005    .0345771    .191026
       f1-f2 |   .040931    .017838    2.29   0.022    .0059692   .0758928

Goodness of fit test: LR = 40.073 ; Prob[chi2(19) > LR] = 0.0032
Test vs independence: LR = 232.192 ; Prob[chi2( 9) > LR] = 0.0000
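Two quantities that are easy to derive from this output are the correlation between the two factors and the change in the LR statistic relative to the single factor model. The sketch below does both, using only numbers reported in the confa outputs. Treating the one-factor model as nested in the two-factor model (factor correlation fixed at 1) is the usual reading, but it is an assumption here.

```python
from math import sqrt

# Factor variances and covariance from the two-factor confa output.
var_f1, var_f2, cov_f1_f2 = 0.1190851, 0.1128016, 0.040931

# Correlation = covariance scaled by the two standard deviations.
factor_corr = cov_f1_f2 / sqrt(var_f1 * var_f2)

# Goodness-of-fit LR statistics and degrees of freedom for both models.
lr_one_factor, df_one_factor = 109.116, 20
lr_two_factor, df_two_factor = 40.073, 19

lr_diff = lr_one_factor - lr_two_factor
df_diff = df_one_factor - df_two_factor

print("factor correlation:", round(factor_corr, 3))
print("LR difference:", round(lr_diff, 3), "on", df_diff, "df")
```

A factor correlation of roughly 0.35 and a drop of about 69 in the LR statistic for one extra parameter both point the same way as the exploratory analysis: v1 to v4 and v5 to v8 behave as two related but distinct dimensions.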

Multidimensional factor model (ML): STATA estat fitindices command (2 factors)

. estat fitindices

Fit indices
RMSEA = 0.1136, 90% CI= (0.0637, 0.1627)
RMSR  = 0.0299
TLI   = 0.9553
CFI   = 0.8205
AIC   = 879.590
BIC   = 921.510
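The RMSEA values reported for the two models can be recovered from the goodness-of-fit LR statistics with the common formula RMSEA = sqrt(max(chi2 - df, 0) / (df * (N - 1))). That confa uses exactly this variant is an assumption, but it matches both reported values to four decimals.

```python
from math import sqrt

def rmsea(chi2, df, n_obs):
    """Root mean square error of approximation (N - 1 variant)."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n_obs - 1)))

# LR statistics, degrees of freedom and sample size from the confa outputs.
print("one factor: ", round(rmsea(109.116, 20, 87), 4))
print("two factors:", round(rmsea(40.073, 19, 87), 4))
```

Both values are well above the conventional 0.05 to 0.08 guideline, so even the better-fitting two-factor model leaves room for improvement in this small sample (N = 87).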

Reliability coefficients: STATA kr20 command (Kuder-Richardson KR20)

Kuder-Richardson coefficient of reliability (KR-20)

Number of items in the scale = 12
Number of complete observations = 6299

                      Item         Item     Item-rest
    Item |   Obs   difficulty   variance   correlation
---------+------------------------------------------
    GHQ1 |  6299     0.1846      0.1505      0.4834
    GHQ2 |  6299     0.1640      0.1371      0.3865
    GHQ3 |  6299     0.1872      0.1521      0.1954
    GHQ4 |  6299     0.1029      0.0923      0.4652
    GHQ5 |  6299     0.1691      0.1405      0.4432
    GHQ6 |  6299     0.0489      0.0465      0.3846
    GHQ7 |  6299     0.1208      0.1062      0.5549
    GHQ8 |  6299     0.1103      0.0982      0.5289
    GHQ9 |  6299     0.0749      0.0693      0.3143
   GHQ10 |  6299     0.0608      0.0571      0.3838
   GHQ11 |  6299     0.1218      0.1069      0.4053
   GHQ12 |  6299     0.1580      0.1330      0.5043
---------+------------------------------------------
    Test |           0.1253                  0.4208

KR20 = 0.7760

Reliability coefficients: STATA kr20 command
Computes the reliability coefficient of a set of dichotomous items [Cronbach's alpha is used for multipoint scales].
In addition, kr20 computes:
- the item difficulty (proportion of 'right' answers),
- the average value of item difficulty,
- the item variance,
- the corrected item-test point-biserial correlation coefficients,
- the average value of corrected item-test correlation coefficients.
The items must be coded as:
- '0' for a wrong answer (unexpected answer),
- '1' for a right answer (expected answer).
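A minimal sketch of the KR-20 computation for 0/1 items follows. The item variance is p(1 - p), which matches the "Item variance" column of the kr20 output (for GHQ1, 0.1846 * 0.8154 is about 0.1505). The response matrix is hypothetical, and dividing the total-score variance by n rather than n - 1 is an assumption about the convention, not something the slides state.

```python
def item_variance(p):
    """Variance of a dichotomous item with proportion p of 1s."""
    return p * (1 - p)

def kr20(responses):
    """Kuder-Richardson 20 for a respondents-by-items 0/1 matrix."""
    n = len(responses)
    k = len(responses[0])
    difficulty = [sum(row[j] for row in responses) / n for j in range(k)]
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    total_var = sum((t - mean_total) ** 2 for t in totals) / n
    sum_item_var = sum(item_variance(p) for p in difficulty)
    return k / (k - 1) * (1 - sum_item_var / total_var)

# Check against the first row of the kr20 output (GHQ1).
print(round(item_variance(0.1846), 4))

# Hypothetical data: 5 respondents by 4 items coded 0/1.
responses = [
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(responses), 4))
```

KR-20 is the special case of coefficient alpha for dichotomous items, which is why the command description points to Cronbach's alpha for multipoint scales.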


Message TRI IRT

Latent Trait Modelling
Note: IRT = IRM = LTM = CDFA*
Latent trait modelling = factor analysis of categorical (binary/ordinal/nominal) data
Unidimensional LTM is widely used to measure variables/constructs such as:
- Personality dimensions and intelligence
- Ability: mathematical / verbal / spatial
- Social and political attitudes
- Consumer preferences
- Health, quality of life, severity of disorder or symptoms, e.g. in depression, back pain, fatigue etc.
Multidimensional IRT is statistically developed but is less widely used presently.

Here the criteria 1–4 are binary, but the latent variable (x-axis) is continuous (Gaussian). From Muthen, B.O (1991). Latent variable epidemiology. Alcohol Research World. 42 139-167.

8 IRT models you might see …

Rasch model (logistic mixed model)
- 1 random effect (individual differences: the x-axis)
- 12 fixed effects: item thresholds (location of the S-shaped curves along x)
- [Stata raschtest; mixed effects logistic regression, including gllamm]

Item Discriminations
GHQ1       1.095   0.021
GHQ4       1.095   0.021
GHQ5       1.095   0.021
GHQ6       1.095   0.021
GHQ9       1.095   0.021
GHQ10      1.095   0.021
GHQ11      1.095   0.021
GHQ12      1.095   0.021
GHQ20      1.095   0.021
GHQ26      1.095   0.021

Item Difficulties
GHQ1$1     1.226   0.028
GHQ5$1     1.306   0.029
GHQ12$1    1.364   0.030
GHQ11$1    1.598   0.033
GHQ26$1    1.601   0.033
GHQ4$1
GHQ20$1    1.855   0.039
GHQ9$1     1.986   0.039
GHQ10$1    2.146   0.045
GHQ6$1     2.283   0.048
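The S-shaped curves described on this slide can be sketched directly. The function below is the usual logistic item characteristic curve with a common discrimination, evaluated with the discrimination and two of the difficulties listed above. It assumes a plain logistic metric with no scaling constant (some programs multiply by D = 1.7), so the exact probabilities are illustrative only.

```python
from math import exp

def icc(theta, difficulty, discrimination=1.0):
    """Probability of endorsing an item at latent trait value theta."""
    return 1.0 / (1.0 + exp(-discrimination * (theta - difficulty)))

a = 1.095                        # common discrimination from the slide
b_ghq1, b_ghq6 = 1.226, 2.283    # lowest and highest listed difficulties

for theta in (-1.0, 0.0, 1.0, 2.0, 3.0):
    print(theta,
          round(icc(theta, b_ghq1, a), 3),
          round(icc(theta, b_ghq6, a), 3))
```

Because every item shares the same discrimination, the curves never cross: they are the same S-shape shifted along the x-axis by the item difficulty, and each passes through 0.5 where theta equals that difficulty.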

IRT in the Stata Journal
- SJ-7-3 st0129. Estimating dichotomous & ordinal item response models with gllamm. By X. Zheng and S. Rabe-Hesketh. Q3/07, SJ 7(3):313-333. Describes the one- and two-parameter logit models for dichotomous items, the partial-credit and rating scale models for ordinal items, and an extension of these models where the latent variable is regressed on explanatory variables.
- SJ-7-1 st0119. Rasch analysis: Estimation and tests with raschtest. By J. Hardouin. Q1/07, SJ 7(1):22-44. Command for estimating the Rasch model, the best known item response theory model for binary responses.

Running Commercial IRT software from Stata: runparscale
runparscale brings the IRT analysis framework of PARSCALE into the Stata environment. While runparscale does little more than reformat data and create ASCII files, it removes a lot of the hassle of estimating IRT models.
Authors: runparscale was written by Laura Gibbons, PhD and Richard Jones, ScD, under the direction of Paul Crane, MD MPH. We appreciate the assistance of Tom Koepsell, MD MPH. Please see runparscale.ado for UW License information.
Laura Gibbons, PhD gibbonsl@u.washington.edu
Richard N Jones, ScD jones@mail.hrca.harvard.edu


Running Commercial IRT software from Stata: runparscale

PARSCALE ITEM PARAMETERS
      item     slope   (se)     location   (se)
--------------------------------------------------
  1   GHQ1     1.001  (0.091)    -0.252   (0.063)
  2   GHQ2     0.433  (0.060)     0.170   (0.124)
  3   GHQ3     0.260  (0.056)     1.027   (0.287)
  4   GHQ4     0.988  (0.091)     0.323   (0.064)
  5   GHQ5     0.934  (0.087)     0.005   (0.065)
  6   GHQ6     1.004  (0.100)     0.909   (0.081)
  7   GHQ7     1.599  (0.139)    -0.055   (0.044)
  8   GHQ8     1.403  (0.122)     0.035   (0.048)
  9   GHQ9     0.598  (0.075)     1.286   (0.156)
 10   GHQ10    1.035  (0.101)     0.842   (0.077)
 11   GHQ11    0.935  (0.088)     0.393   (0.068)
 12   GHQ12    1.436  (0.124)    -0.152   (0.048)

PARSCALE ITEM FIT STATISTICS [not to be trusted for short tests, illustrative only]

| BLOCK | ITEM | CHI-SQUARE | D.F. | PROB. |
-----------------------------------------------
| GHQ1  | 0001 |   19.56213 |   7. | 0.007 |
| GHQ2  | 0002 |   13.82273 |   9. | 0.128 |
| GHQ3  | 0003 |    5.89128 |  10. | 0.825 |
| GHQ4  | 0004 |    8.73722 |   8. | 0.365 |
| GHQ5  | 0005 |   13.46327 |   8. | 0.096 |
| GHQ6  | 0006 |   12.87186 |   9. | 0.168 |
| GHQ7  | 0007 |   14.25497 |   7. | 0.047 |
| GHQ8  | 0008 |    9.20264 |   7. | 0.238 |
| GHQ9  | 0009 |   27.44038 |  10. | 0.002 |
| GHQ10 | 0010 |   21.55337 |   9. | 0.011 |
| GHQ11 | 0011 |   10.44335 |   8. | 0.235 |
| GHQ12 | 0012 |   20.04176 |   7. | 0.006 |
| TOTAL |      |  177.28497 |  99. | 0.000 |

X-axis: latent trait value (IRT thresholds zero-centred).
Y-axis: conditional standard error of measurement (the s.e.m. varies with score value under Item Response Theory).
Lower s.e.m. = greater precision of measurement.
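The shape of such a curve can be sketched from item parameters. Under a two-parameter logistic model, each item contributes information a^2 * P * (1 - P) at a given trait value, and the conditional s.e.m. is 1 / sqrt(total information). The slopes and locations below are the PARSCALE estimates listed earlier. Treating them as plain logistic parameters (with no 1.7 scaling constant) is an assumption, so the absolute values are illustrative only.

```python
from math import exp, sqrt

# (slope, location) pairs copied from the PARSCALE table for GHQ1-GHQ12.
items = [
    (1.001, -0.252), (0.433, 0.170), (0.260, 1.027), (0.988, 0.323),
    (0.934, 0.005), (1.004, 0.909), (1.599, -0.055), (1.403, 0.035),
    (0.598, 1.286), (1.035, 0.842), (0.935, 0.393), (1.436, -0.152),
]

def probability(theta, slope, location):
    """Two-parameter logistic item response function."""
    return 1.0 / (1.0 + exp(-slope * (theta - location)))

def information(theta):
    """Test information: sum of a^2 * P * (1 - P) over items."""
    total = 0.0
    for slope, location in items:
        p = probability(theta, slope, location)
        total += slope ** 2 * p * (1 - p)
    return total

def sem(theta):
    """Conditional standard error of measurement at theta."""
    return 1.0 / sqrt(information(theta))

for theta in (-3, -2, -1, 0, 1, 2, 3):
    print(theta, round(sem(theta), 3))
```

The s.e.m. is smallest where the item locations cluster and grows towards both extremes, which is the U-shaped pattern the axes description refers to.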

Non-parametric IRT: Mokken Analysis, STATA loevH command

. loevH GHQ1-GHQ12

                 Easyness   Observed   Expected                         H0: Hj<=0   Number
                            Guttman    Guttman    Loevinger                         of NS
Item      Obs    P(Xj=1)    errors     errors     H coeff     z-stat.   p-value     Hjk
---------------------------------------------------------------------------------------------------
GHQ1      548    0.5712      628       1057.50    0.40615     23.2388   0.00000     0
GHQ2      548    0.4708      902       1183.11    0.23760     15.0931   0.00000     0
GHQ3      548    0.3923      954       1140.05    0.16320     10.1904   0.00000     1
GHQ4      548    0.4088      741       1155.62    0.35879     22.5701   0.00000     0
GHQ5      548    0.4982      775       1176.57    0.34131     21.5282   0.00000     0
GHQ6      548    0.2573      538        868.24    0.38036     20.0185   0.00000     1
GHQ7      548    0.5201      675       1151.94    0.41403     25.5869   0.00000     0
GHQ8      548    0.4891      730       1181.99    0.38240     24.2362   0.00000     0
GHQ9      548    0.2500      598        846.50    0.29356     15.1966   0.00000     0
GHQ10     548    0.2701      529        899.44    0.41185     22.1342   0.00000     0
GHQ11     548    0.3923      741       1140.05    0.35003     21.8568   0.00000     0
GHQ12     548    0.5511      629       1100.94    0.42867     25.4203   0.00000     0
---------------------------------------------------------------------------------------------------
Scale     548               4220       6450.98    0.34584     50.5208   0.00000

loevH, by jean-benoit.hardouin@univ-nantes.fr [Websites AnaQol and FreeIRT], allows verifying the fit of data to the Monotonely Homogeneous Mokken Model or to the Doubly Monotone Mokken Model. It computes the Loevinger H scalability coefficients and several indexes in the field of nonparametric Item Response Theory.
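The Loevinger H column can be recovered from the two Guttman error columns as H = 1 - observed errors / expected errors. The sketch below checks this for a few rows of the loevH output. The 0.4 cut-off used at the end mirrors the c(.4) option of the msp command and is shown only for illustration.

```python
def loevinger_h(observed_errors, expected_errors):
    """Scalability coefficient: 1 - observed / expected Guttman errors."""
    return 1 - observed_errors / expected_errors

# (observed, expected) Guttman errors copied from the loevH output.
rows = {
    "GHQ1": (628, 1057.50),
    "GHQ3": (954, 1140.05),
    "GHQ12": (629, 1100.94),
    "Scale": (4220, 6450.98),
}

for name, (observed, expected) in rows.items():
    h = loevinger_h(observed, expected)
    flag = "meets .4" if h >= 0.4 else "below .4"
    print(f"{name:6s} H = {h:.5f}  ({flag})")
```

Fewer observed than expected Guttman errors gives a positive H. An item such as GHQ3, with almost as many errors as expected under independence, scales poorly with the rest.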

Non-parametric IRT: Mokken Analysis, STATA msp command

. msp GHQ1-GHQ12, c(.4)

Scale: 1
----------
The two first items selected in the scale 1 are GHQ7 and GHQ8 (Hjk=0.7357)
The item GHQ6 is selected in the scale 1    Hj=0.5777   H=0.6534
The following items are excluded at this step: GHQ3
The item GHQ12 is selected in the scale 1   Hj=0.5025   H=0.5723
The item GHQ10 is selected in the scale 1   Hj=0.4431   H=0.5267
The item GHQ11 is selected in the scale 1   Hj=0.4538   H=0.5011
The item GHQ1 is selected in the scale 1    Hj=0.4338   H=0.4811
The item GHQ4 is selected in the scale 1    Hj=0.4083   H=0.4616
The item GHQ5 is selected in the scale 1    Hj=0.4095   H=0.4489
None new item can be selected in the scale 1 because all the Hj are lesser than .4 or none new item has all the related Hjk coefficients significantly greater than 0

                 Easyness   Observed   Expected                         H0: Hj<=0   Number
                            Guttman    Guttman    Loevinger                         of NS
Item      Obs    P(Xj=1)    errors     errors     H coeff     z-stat.   p-value     Hjk
---------------------------------------------------------------------------------------------------
GHQ5      548    0.4982      514        870.46    0.40951     22.3093   0.00000     0
GHQ4      548    0.4088      478        828.96    0.42338     22.2905   0.00000     0
GHQ1      548    0.5712      457        795.91    0.42582     21.4001   0.00000     0
GHQ11     548    0.3923      470        812.38    0.42145     21.8744   0.00000     0
GHQ10     548    0.2701      340        631.18    0.46133     20.2369   0.00000     0
GHQ12     548    0.5511      409        827.11    0.50550     26.2866   0.00000     0
GHQ6      548    0.2573      312        606.20    0.48532     20.7341   0.00000     0
GHQ7      548    0.5201      448        859.18    0.47857     25.7520   0.00000     0
GHQ8      548    0.4891      486        870.31    0.44158     24.0575   0.00000     0
---------------------------------------------------------------------------------------------------
Scale     548               1957       3550.85    0.44886     48.3819   0.00000

Scale: 2
----------
Significance level: 0.016667
The two first items selected in the scale 2 are GHQ2 and GHQ3 (Hjk=0.4111)
Significance level: 0.012500
None new item can be selected in the scale 2 because all the Hj are lesser than .4 or none new item has all the related Hjk coefficients significantly greater than 0.

                 Easyness   Observed   Expected                         H0: Hj<=0   Number
                            Guttman    Guttman    Loevinger                         of NS
Item      Obs    P(Xj=1)    errors     errors     H coeff     z-stat.   p-value     Hjk
---------------------------------------------------------------------------------------------------
GHQ2      548    0.4708       67        113.78    0.41113      8.1914   0.00000     0
GHQ3      548    0.3923       67        113.78    0.41113      8.1914   0.00000     0
---------------------------------------------------------------------------------------------------
Scale     548                 67        113.78    0.41113      8.1914   0.00000

There is only one item remaining (GHQ9).


(1) Rasch model in STATA: raschtest

Estimation method: Conditional maximum likelihood (CML)
Number of items: 9
Number of groups: 10 (8 of them are used to compute the statistics of test)
Number of individuals: 548
Number of individuals with missing values: 0 (removed)
Number of individuals with nul or perfect score: 111
Conditional log-likelihood: -1467.1127
Log-likelihood: -2025.3536

           Difficulty                                      Standardized
Items      parameters   std Err.      R1c   df   p-value   Outfit    Infit        U
-----------------------------------------------------------------------------
GHQ1         -0.13173    0.15481   11.449    7    0.1202    2.338    1.713    1.799
GHQ4          0.90796    0.15455   11.601    7    0.1145    0.654    0.785    0.863
GHQ5          0.34003    0.15343    4.847    7    0.6787    1.192    1.098    1.658
GHQ6          1.94575    0.16456    8.730    7    0.2727    0.291    0.072    0.368
GHQ7          0.20031    0.15362   10.339    7    0.1702   -1.424   -2.433   -2.124
GHQ8          0.39799    0.15341   13.443    7    0.0620   -0.871   -0.545   -1.673
GHQ10         1.85021    0.16316   11.134    7    0.1329    0.416    0.267    1.077
GHQ11         1.01368    0.15510   13.131    7    0.0690    0.578    0.844    1.462
GHQ12*        0.00000          .    5.045    7    0.6545   -2.916   -2.624   -2.884
-----------------------------------------------------------------------------
R1c test          R1c= 95.782   56    0.0007
Andersen LR test    Z= 99.418   56    0.0003
*: The difficulty parameter of this item had been fixed to 0

(2) Rasch model in STATA: raschtest

                  Ability                         Expected
Group   Score   parameters   std Err.   Freq.     Score          ll
--------------------------------------------------------------
    0       0       -2.449      1.561      82      0.44
    1       1       -1.202      0.963      61      1.32   -117.4189
    2       2       -0.524      0.801      55      2.22   -186.8236
    3       3        0.002      0.734      48      3.12   -189.8916
    4       4        0.473      0.708      70      4.03   -281.8395
    5       5        0.933      0.712      54      4.95   -233.6392
    6       6        1.418      0.744      48      5.87   -171.5103
    7       7        1.971      0.817      53      6.79   -151.2446
    8       8        2.685      0.983      48      7.69    -85.0359
    9       9        3.974      1.591      29      8.57
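The "Expected Score" column can be reproduced from the two raschtest tables. Under the Rasch model, the expected sum score at ability theta is the sum over items of 1 / (1 + exp(-(theta - b_i))). The sketch below uses the nine item difficulties and two of the ability estimates reported above; the function names are made up for this example.

```python
from math import exp

# Item difficulties from the raschtest output (GHQ12 fixed to 0).
difficulties = {
    "GHQ1": -0.13173, "GHQ4": 0.90796, "GHQ5": 0.34003,
    "GHQ6": 1.94575, "GHQ7": 0.20031, "GHQ8": 0.39799,
    "GHQ10": 1.85021, "GHQ11": 1.01368, "GHQ12": 0.00000,
}

def rasch_probability(theta, difficulty):
    """Probability of a positive response under the Rasch model."""
    return 1.0 / (1.0 + exp(-(theta - difficulty)))

def expected_score(theta):
    """Expected sum score: the item probabilities added up."""
    return sum(rasch_probability(theta, b) for b in difficulties.values())

# Ability estimates reported for raw scores 1 and 3.
for score, theta in [(1, -1.202), (3, 0.002)]:
    print(score, theta, round(expected_score(theta), 2))
```

The results should land close to the tabulated 1.32 and 3.12. The expected score at each ability estimate is slightly above the raw score for low scores and below it for high scores, as the table shows.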

Running Mplus (www.statmodel.com) from Stata: runmplus
runmplus [Author: Richard N Jones, ScD jones@mail.hrca.harvard.edu] builds an Mplus data file and command file, executes the command file, and displays the Mplus log file (output) in the Stata results window.
Factor analysis syntax examples:
- Exploratory factor analysis with continuous indicators: runmplus y1-y12, type(efa 1 4)
- Exploratory factor analysis with categorical indicators: runmplus y1-y12, type(efa 1 4) categorical(all)
- Exploratory factor analysis with a mixture of categorical and continuous indicators: runmplus y1-y12, type(efa 1 4) categorical(y1 y3 y5 y7 y9 y11)
- Confirmatory factor analysis with continuous indicators: runmplus y1-y6, model(f1 by y1-y3; f2 by y4-y6;)

And finally … think useR

IR : irtoys package example plots (from manual) Author: Ivailo Partchev <Ivailo.Partchev@uni-jena.de>

Extract from //cran.r-project.org/web/views/Psychometrics.html Classical Test Theory (CTT) The CTT package can be used to perform a variety of tasks and analyses associated with classical test theory: score multiple-choice responses, perform reliability analyses, conduct item analyses, and transform scores onto different scales. The CMC package calculates and plots the step-by-step Cronbach-Mesbach curve, that is a method, based on the Cronbach alpha coefficient of reliability, for checking the unidimensionality of a measurement scale. The package psychometric contains functions useful for correlation theory, meta-analysis (validity-generalization), reliability, item analysis, inter-rater reliability, and classical utility. Cronbach alpha, kappa coefficients, and intra-class correlation coefficients (ICC) can be found in the psy package. A number of routines for scale construction and reliability analysis useful for personality and experimental psychology are contained in the packages psych and MiscPsycho. Additional measures for reliability and concordance can be computed with the concord package.
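As a concrete illustration of the reliability coefficient mentioned above, here is a minimal sketch of Cronbach's alpha using the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of the total score). It is not the implementation used by any of the R packages listed, and the response data are made up for the example.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    if any(len(r) != k for r in rows):
        raise ValueError("every respondent must answer the same k items")
    # Sample variance of each item across respondents.
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    # Sample variance of the total (sum) score.
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)

# Made-up responses: 6 respondents, 4 items scored 0-3.
responses = [
    [0, 1, 0, 1],
    [1, 1, 2, 1],
    [2, 2, 2, 3],
    [3, 2, 3, 3],
    [1, 0, 1, 1],
    [2, 3, 2, 2],
]
print(f"alpha = {cronbach_alpha(responses):.3f}")
```

Items that move together inflate the variance of the total score relative to the sum of item variances, which is what pushes alpha towards 1.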

(2) Extract from //cran.r-project.org/web/views/Psychometrics.html Item Response Theory (IRT): The eRm package fits extended Rasch models, i.e. the ordinary Rasch model for dichotomous data (RM), the linear logistic test model (LLTM), the rating scale model (RSM) and its linear extension (LRSM), the partial credit model (PCM) and its linear extension (LPCM) using conditional ML estimation. Missing values are allowed. The package ltm also fits the simple RM. Additionally, functions for estimating Birnbaum's 2- and 3-parameter models based on a marginal ML approach are implemented as well as the graded response model for polytomous data, and the linear multidimensional logistic model. Item and ability parameters can be calibrated using the package plink. It provides unidimensional and multidimensional methods such as Mean/Mean, Mean/Sigma, Haebara, and Stocking-Lord methods for dichotomous (1PL, 2PL and 3PL) and/or polytomous (graded response, partial credit/generalized partial credit, nominal, and multiple-choice model) items. The multidimensional methods include the Reckase-Martineau method and extensions of the Haebara and Stocking-Lord method. The difR package contains several traditional methods to detect DIF in dichotomously scored items. Both uniform and non-uniform DIF effects can be detected, with methods relying upon item response models or not. Some methods deal with more than one focal group. The package lordif provides a logistic regression framework for detecting various types of differential item functioning (DIF). The package plRasch computes maximum likelihood estimates and pseudo-likelihood estimates of parameters of Rasch models for polytomous (or dichotomous) items and multiple (or single) latent traits. Robust standard errors for the pseudo-likelihood estimates are also computed. A multilevel Rasch model can be estimated using the package lme4 with functions for mixed-effects models with crossed or partially crossed random effects. 
Other packages of interest are: mokken to compute non-parametric item analysis, the RaschSampler allowing for the construction of exact Rasch model tests by generating random zero-one matrices with given marginals, mprobit fitting the multivariate binary probit model, and irtoys providing a simple interface to the estimation and plotting of IRT models. Simple Rasch computations such as simulating data and joint maximum likelihood are included in the MiscPsycho package. The irtProb package is designed to estimate multidimensional subject parameters (MLE and MAP) such as personal pseudo-guessing, personal fluctuation, and personal inattention. These supplemental parameters can be used to assess person fit, to identify misfit type, to generate misfitting response patterns, or to make corrections while estimating the proficiency level, considering potential misfit at the same time. Gaussian ordination, related to logistic IRT and also approximated as maximum likelihood estimation through canonical correspondence analysis, is implemented in various forms in the package VGAM. Two additional IRT packages (for Microsoft Windows only) are available and documented on the JSS site. The package mlirt computes multilevel IRT models, and cirt uses a joint hierarchically built up likelihood for estimating a two-parameter normal ogive model for responses and a log-normal model for response times. Bayesian approaches for estimating item and person parameters by means of Gibbs-Sampling are included in MCMCpack. In addition, the pscl package allows for Bayesian IRT and roll call analysis. The latdiag package produces commands to drive the dot program from graphviz to produce a graph useful in deciding whether a set of binary items might have a latent scale with non-crossing ICCs.
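The 1-, 2- and 3-parameter models named in this extract differ only in which item parameters are free. A minimal sketch of the item characteristic curve, assuming the common logistic parameterisation without the 1.7 scaling constant (individual packages may parameterise differently), with illustrative parameter values:

```python
import math

def icc_3pl(theta, a=1.0, b=0.0, c=0.0):
    """Probability of a correct/positive response under the 3PL model.

    a: discrimination, b: difficulty, c: lower asymptote (pseudo-guessing).
    With c = 0 this is the 2PL model; with c = 0 and a = 1 it is the Rasch/1PL form.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Illustrative item: moderately discriminating, fairly hard, some guessing.
for theta in (-2.0, 0.0, 1.0, 2.0):
    p1 = icc_3pl(theta, b=1.0)
    p2 = icc_3pl(theta, a=1.8, b=1.0)
    p3 = icc_3pl(theta, a=1.8, b=1.0, c=0.2)
    print(f"theta={theta:5.1f}  1PL={p1:.3f}  2PL={p2:.3f}  3PL={p3:.3f}")
```

At theta = b the 1PL and 2PL curves both pass through 0.5, while the 3PL curve passes through c + (1 - c)/2.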

(3) Extract from //cran.r-project.org/web/views/Psychometrics.html Structural Equation Models, Factor Analysis, PCA: Ordinary factor analysis (FA) and principal component analysis (PCA) are in the package stats as functions factanal() and princomp(). Additional rotation methods for FA based on gradient projection algorithms can be found in the package GPArotation. The package nFactors produces a non-graphical solution to the Cattell scree test. Some graphical PCA representations can be found in the psy package. The sem package fits general (i.e., latent-variable) SEMs by FIML, and structural equations in observed-variable models by 2SLS. Categorical variables in SEMs can be accommodated via the polycor package. The systemfit package implements a wider variety of estimators for observed-variables models, including nonlinear simultaneous-equations models. See also the pls package, for partial least-squares estimation, the gR task view for graphical models and the SocialSciences task view for other related packages. The package lavaan can be used to estimate a large variety of multivariate statistical models, including path analysis, confirmatory factor analysis, structural equation modeling and growth curve models. It includes the lavaan model syntax, which allows users to express their models in a compact way, and allows for ML, GLS, WLS, robust ML using Satorra-Bentler corrections, and FIML for data with missing values. It fully supports mean structures and multiple groups and reports standardized solutions, fit measures, modification indices and more as output. SEMModComp conducts tests of difference in fit for mean and covariance structure models as in structural equation modeling (SEM). The package FAiR performs factor analysis based on a genetic algorithm for optimization. 
This makes it possible to impose a wide range of restrictions on the factor analysis model, whether using exploratory factor analysis, confirmatory factor analysis, or a new estimator called semi-exploratory factor analysis (SEFA). FA and PCA with supplementary individuals and supplementary quantitative/qualitative variables can be performed using the FactoMineR package whereas MCMCpack has some options for sampling from the posterior for ordinal and mixed factor models. The homals package provides nonlinear PCA and, by defining sets, nonlinear canonical correlation analysis (models of the Gifi-family). Independent component analysis (ICA) can be computed using fastICA. Independent factor analysis (IFA) with independent non-Gaussian factors can be performed with the ifa package. A desired number of robust principal components can be computed with the pcaPP package. The package psych includes functions such as fa.parallel() and VSS() for estimating the appropriate number of factors/components as well as ICLUST() for item clustering.

Psychometrics in R Special volume of the Journal of Statistical Software www.jstatsoft.org Volume 20 Multilevel Rasch Correspondence Analysis Rasch Multilevel IRT Multidimensional Rasch Extended Rasch Marginal Maximum Likelihood IRT Mokken scale analysis …

Free R software

The program LTM is available for R from http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm. It is available as an R version and an S-Plus version. ltm fits the logit-probit (normal latent trait; logistic link function) models with one- [and two] factors. In a very recent (but complex) development it also allows for inclusion of nonlinear terms (e.g., interaction and quadratic terms).

Extra features:
    computation of factor scores using Multiple Imputation
    Rasch model for which Goodness of Fit is assessed using a parametric Bootstrap version of the Pearson chi-squared

Free software Factor/M-IRT MIRT Factor NOHARM Urbano Lorenzo-Seva & Pere J. Ferrando http://psico.fcep.urv.es/utilitats/factor/ MIRT NOHARM

FACTOR //psico.fcep.urv.es/utilitats/factor/

Factor is a program developed to fit the Exploratory Factor Analysis model. Below we describe the methods used.

Univariate and multivariate descriptives of variables:
    Univariate mean, variance, skewness, and kurtosis
    Multivariate skewness and kurtosis (Mardia, 1970)
    Bar charts for ordinal variables

Dispersion matrices:
    User-defined type of matrix
    Covariance matrix
    Pearson correlation matrix
    Polychoric correlation matrix with optional Ridge estimates

Procedures for determining the number of factors/components to be retained:
    MAP: Minimum Average Partial Test (Velicer, 1976)
    PA: Parallel Analysis (Horn, 1965)
    PA - MBS: an extension of Parallel Analysis that generates random correlation matrices using marginally bootstrapped samples (Lattin, Carroll, & Green, 2003)

Factor and component analysis:
    PCA: Principal Component Analysis
    ULS: Unweighted Least Squares factor analysis (also MINRES and PAF)
    EML: Exploratory Maximum Likelihood factor analysis
    MRFA: Minimum Rank Factor Analysis (ten Berge & Kiers, 1991)
    Schmid-Leiman second-order solution (1957)
    Factor scores (ten Berge, Krijnen, Wansbeek, & Shapiro, 1999)

In ULS factor analysis, the Heywood case correction described in Mulaik (1972, page 153) is included: when an update has sum of squares larger than the observed variance of the variable, that row is updated by constrained regression using the procedure proposed by ten Berge and Nevels (1977).

Some of the rotation methods to obtain simplicity are:
    Quartimax (Neuhaus & Wrigley, 1954)
    Varimax (Kaiser, 1958)
    Weighted Varimax (Cureton & Mulaik, 1975)
    Orthomin (Bentler, 1977)
    Direct Oblimin (Clarkson & Jennrich, 1988)
    Weighted Oblimin (Lorenzo-Seva, 2000)
    Promax (Hendrickson & White, 1964)
    Promaj (Trendafilov, 1994)
    Promin (Lorenzo-Seva, 1999)
    Simplimax (Kiers, 1994)

Some of the indices used in the analysis are:
    Test on the dispersion matrix: Determinant, Bartlett's test and Kaiser-Meyer-Olkin (KMO)
    Goodness of fit statistics: Chi-Square; Non-Normed Fit Index (NNFI; Tucker & Lewis); Comparative Fit Index (CFI); Goodness of Fit Index (GFI); Adjusted Goodness of Fit Index (AGFI); Root Mean Square Error of Approximation (RMSEA); and Estimated Non-Centrality Parameter (NCP)
    Reliabilities of rotated components (ten Berge & Hofstee, 1999)
    Simplicity indices: Bentler's Simplicity index (1977) and Loading Simplicity index (Lorenzo-Seva, 2003)
    Mean, variance and histogram of fitted and standardized residuals. Automatic detection of large standardized residuals.
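Several of the goodness-of-fit statistics listed above are simple functions of the model chi-square, the chi-square of a baseline (independence) model, their degrees of freedom, and the sample size. The sketch below uses the common textbook definitions of RMSEA, CFI and NNFI; FACTOR's own computations may differ in detail (for example, some programs divide by N rather than N - 1 in RMSEA), and all input values are hypothetical.

```python
import math

def rmsea(chi2, df, n):
    """Root Mean Square Error of Approximation (N - 1 in the denominator)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2_m, df_m, chi2_b, df_b):
    """Comparative Fit Index from model (m) and baseline (b) chi-squares."""
    d_m = max(chi2_m - df_m, 0.0)      # model non-centrality estimate
    d_b = max(chi2_b - df_b, d_m)      # baseline non-centrality, at least d_m
    return 1.0 if d_b == 0.0 else 1.0 - d_m / d_b

def nnfi(chi2_m, df_m, chi2_b, df_b):
    """Non-Normed Fit Index (Tucker-Lewis); can fall outside [0, 1]."""
    ratio_m = chi2_m / df_m
    ratio_b = chi2_b / df_b
    return (ratio_b - ratio_m) / (ratio_b - 1.0)

# Hypothetical results: model chi2 = 85 on 40 df, baseline chi2 = 940 on 55 df, N = 301.
print(f"RMSEA = {rmsea(85.0, 40, 301):.3f}")
print(f"CFI   = {cfi(85.0, 40, 940.0, 55):.3f}")
print(f"NNFI  = {nnfi(85.0, 40, 940.0, 55):.3f}")
```

All three indices reward a model chi-square that is small relative to its degrees of freedom, which is why they tend to agree on clearly good or clearly bad models and diverge in between.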

Interesting Journals …

    Psychological Assessment
    Psychological Methods
    Multivariate Behavioral Research
    Applied Psychological Measurement
    Journal of Educational and Behavioral Statistics
    Structural Equation Modeling
    Psychometrika
    Educational and Psychological Measurement

Excellent book chapter (non-technical)

Application oriented book: Assessing Quality of Life in Clinical Trials; Methods and Practice
Edition: 2nd
Author(s): Peter Fayers; Ron Hays
ISBN: 0198527691

See the chapter by Reeve and Fayers, "Applying item response theory modelling for evaluating questionnaire item and scale properties"; download for free from www.oup.co.uk/pdf/0-19-852769-1.pdf

££££££££££££££££££££££ And out there in commerce, money talks…

As Test-Taking Grows, Test-Makers Grow Rarer, May 5, 2006, NY Times. Psychometrics, one of the most obscure, esoteric and cerebral professions in America …. is now also one of the hottest