Generalizability Theory Nothing more practical than a good theory!


1 Generalizability Theory Nothing more practical than a good theory!

2 This presentation is made by Prof. Zhao

3 Overview of Presentation: Classes of reliability theories; Generalizability Theory; G-study; D-study; Illustrations

4 Three Reliability Theories: Classical Test Theory; Generalizability Theory; Item Response Theory

5 Classical Test Theory and G-theory. Common characteristics: both build on the concept of parallel measures; both take the total score as a starting point; both are weak theories, in which population and item characteristics are confounded. Distinctive characteristic: G-theory allows multiple sources of error to be included in one reliability estimate.

6 Item Response Theory: the item score is the starting point; item scores are modeled mathematically using theoretical notions; it is a strong theory, in which item parameters are estimated independently of population parameters.

7 Item Response Theory. Illustration of an "item theory": [Figure: likelihood of a correct answer plotted against ability]
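The slide only names the two axes of the figure. As a rough sketch of what such a curve can look like, the snippet below assumes a two-parameter logistic model; the model choice, the function name `prob_correct` and the parameter values are illustrative assumptions, not taken from the presentation.

```python
import math


def prob_correct(ability, difficulty=0.0, discrimination=1.0):
    """Probability of a correct answer under an assumed 2PL logistic model."""
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))


# Hypothetical ability values, only to show the S-shaped curve.
for ability in (-2, -1, 0, 1, 2):
    print(ability, round(prob_correct(ability), 2))
```

In this sketch an examinee whose ability equals the item difficulty has a 50% chance of answering correctly.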

8 Overview of Presentation: Classes of reliability theories; Generalizability Theory; G-study; D-study; Illustrations

9 Generalizability Theory. The concept of parallel measures is fundamental (as in classical test theory), but the theory allows a multitude of error sources. Generalizability concept: reliability depends on the inferences (generalizations) that the investigator wishes to make from the measurement data.

10 Illustration. Essay test: 7 vignette-based essay questions; 2 markers independently marking all questions for all examinees. Reliability in a classical framework: Cronbach's alpha: 0.66; inter-rater reliability (i.e. kappa): 0.71.

11 Fundamental Equation: $X = T + E$, where $X$ = observed score, $T$ = true score, $E$ = error score.
$$\text{Reliability} = \frac{\operatorname{Var}(T)}{\operatorname{Var}(X)}$$
The larger the variance of $T$ in relation to $X$, the higher the reliability.

12 Fundamental Equation: $X = T + E$, where $X$ = observed score, $T$ = true score, $E$ = error score.
$$\text{Reliability} = \frac{\operatorname{Var}(T)}{\operatorname{Var}(X)}$$

13 Fundamental Equation: $X = T + E$, where $X$ = observed score, $T$ = true score, $E$ = error score.
$$\text{Reliability} = \frac{\operatorname{Var}(T)}{\operatorname{Var}(X)} = \frac{\operatorname{Var}(T)}{\operatorname{Var}(T) + \operatorname{Var}(E)}$$
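The ratio can be checked numerically. The sketch below assumes true scores and errors are independent and normally distributed; the means, standard deviations and sample size are made-up values chosen only for illustration.

```python
import random
import statistics


def reliability(var_true, var_error):
    """Reliability as Var(T) / (Var(T) + Var(E))."""
    return var_true / (var_true + var_error)


# Hypothetical population: true scores with SD 10, errors with SD 5.
random.seed(1)
true_scores = [random.gauss(60, 10) for _ in range(20000)]
observed = [t + random.gauss(0, 5) for t in true_scores]

# Empirical ratio Var(T) / Var(X) from the simulated scores.
empirical = statistics.variance(true_scores) / statistics.variance(observed)

print(reliability(100, 25))   # theoretical value for these variances
print(round(empirical, 2))    # simulated value, close to the theoretical one
```

Because the errors are independent of the true scores, the variance of the observed scores is approximately the sum of the two variances, which is why the two forms of the equation agree.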

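14 Multiple sources of error variance:
$$\text{Reliability} = \frac{\operatorname{Var}(T)}{\operatorname{Var}(T) + \operatorname{Var}(E)}$$
where $\operatorname{Var}(E)$ is split into variance due to markers, variance due to essays, and unexplained variance.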

15 Two steps in G analysis: 1) G(eneralizability)-study: estimation of the sources of variance that influence the measurement (e.g., variance between examinees, essays and markers). 2) D(ecision)-study: estimation of reliability indices as a function of concrete sample size(s) (e.g., number of essays, number of markers).

16 G-study steps: determine facets (factors of variance); determine design (random vs fixed, crossed vs nested)

17 Crossed vs nested designs [Figure: diagrams of a crossed design and a nested design, with elements labelled A, B, 1 to 6 and A to L]

18 G-study: determine facets (factors of variance); determine design (random vs fixed, crossed vs nested); collect data; Analysis of Variance (ANOVA); estimation of variance components

19 Illustration 1. Essay test: 7 vignette-based open-ended questions; 100 students; one marker marked all essays for all students. G-study questions: Number of factors/facets? One-facet design. Random or fixed facets? Random. Nested or crossed? Crossed.

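20 Sources of Variance, Person x Items [Figure: diagram of the variance sources p, i and pi,e]

21 Sources of Variance, Person x Items [Figure: diagram of the variance sources i, p and pi,e]

22 Sources of Variance, Person x Items [Figure: diagram of the variance sources p, i and pi,e]

23 Sources of Variance, Person x Items [Figure: diagram of the variance sources p and pi,e]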

24 Variance component estimation (one facet design). An observed score for a person on an item ($X_{pi}$):
$$X_{pi} = \mu \;[\text{overall mean}] + (\mu_p - \mu) \;[\text{person effect}] + (\mu_i - \mu) \;[\text{item effect}] + (X_{pi} - \mu_p - \mu_i + \mu) \;[\text{residual}]$$
Each of these effects has an average (always 0) and a variance ($\sigma^2$). The variances are the variance components. The variance of all observed scores $X_{pi}$ across all persons and items:
$$\hat\sigma^2(X_{pi}) = \hat\sigma^2_p + \hat\sigma^2_i + \hat\sigma^2_{pi,e}$$

25 Variance components, P x I design

| Source | Estimated Variance Component | Standard Error | Percentage of Total Variance |
|---|---|---|---|
| p | 97.57 | 19.02 | 13.35 |
| i | 261.24 | 112.98 | 35.75 |
| pi,e | 371.97 | 17.60 | 50.90 |
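How such components can be estimated is sketched below for a fully crossed persons x items design with one score per cell, using the usual random-effects ANOVA mean squares. The function name `variance_components_pxi` and the tiny score matrix are hypothetical; the presentation's own data are not available, so the table above is not reproduced here.

```python
def variance_components_pxi(scores):
    """Estimate p, i and pi,e components from a persons x items score matrix."""
    n_p, n_i = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_p * n_i)
    person_means = [sum(row) / n_i for row in scores]
    item_means = [sum(row[i] for row in scores) / n_p for i in range(n_i)]

    # Sums of squares for persons, items and the residual (pi,e).
    ss_p = n_i * sum((m - grand) ** 2 for m in person_means)
    ss_i = n_p * sum((m - grand) ** 2 for m in item_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_total - ss_p - ss_i

    # Mean squares.
    ms_p = ss_p / (n_p - 1)
    ms_i = ss_i / (n_i - 1)
    ms_res = ss_res / ((n_p - 1) * (n_i - 1))

    # Solve the expected-mean-square equations for the components.
    return {
        "p": (ms_p - ms_res) / n_i,
        "i": (ms_i - ms_res) / n_p,
        "pi,e": ms_res,
    }


# Hypothetical, purely additive scores: person effects 1, 2, 3 and item effects 0, 10.
example = [[1, 11], [2, 12], [3, 13]]
print(variance_components_pxi(example))
```

With purely additive data the residual component is zero, and the person and item components equal the sample variances of the person and item effects. With real data, estimates can come out slightly negative and are then commonly set to zero.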

26 Crossed vs nested designs [Figure: diagrams of a crossed design and a nested design, with elements labelled A, B, 1 to 6 and A to L]

27 Sources of Variance, Items : Persons [Figure: diagram of the variance sources p and i,pi,e]

28 Variance components, I : P design. In the nested design the item component (261.24; 35.75%) and the residual pi,e (371.97; 50.90%) cannot be separated and combine into a single component i,pi,e.

| Source | Estimated Variance Component | Percentage of Total Variance |
|---|---|---|
| p | 97.57 | 13.35 |
| i,pi,e | 633.21 | 86.65 |

29 Variance components, I : P design

| Source | Estimated Variance Component | Percentage of Total Variance |
|---|---|---|
| p | 97.57 | 13.35 |
| i,pi,e | 633.21 | 86.65 |

30 Sources of Variance, Person x Items x Judges [Figure: diagram of the variance sources p, i, j, pi, pj, ij and pij,e]

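31 Variance component estimation (two facet design). An observed score for a person on an item from a judge ($X_{pij}$):
$$\begin{aligned}
X_{pij} ={}& \mu && [\text{overall mean}] \\
&+ (\mu_p - \mu) && [\text{person effect}] \\
&+ (\mu_j - \mu) && [\text{judge effect}] \\
&+ (\mu_i - \mu) && [\text{item effect}] \\
&+ (\mu_{pj} - \mu_p - \mu_j + \mu) && [\text{person by judge effect}] \\
&+ (\mu_{pi} - \mu_p - \mu_i + \mu) && [\text{person by item effect}] \\
&+ (\mu_{ij} - \mu_j - \mu_i + \mu) && [\text{judge by item effect}] \\
&+ (X_{pij} - \mu_{pj} - \mu_{pi} - \mu_{ij} + \mu_p + \mu_j + \mu_i - \mu) && [\text{residual}]
\end{aligned}$$
The variance of observed scores $X_{pij}$ across all persons, items and judges:
$$\hat\sigma^2(X_{pij}) = \hat\sigma^2_p + \hat\sigma^2_j + \hat\sigma^2_i + \hat\sigma^2_{pj} + \hat\sigma^2_{pi} + \hat\sigma^2_{ij} + \hat\sigma^2_{pij,e}$$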

32 Variance components, P x I x J design

| Source | Estimated Variance Component | Percentage of Total Variance |
|---|---|---|
| p | 48.71 | 10.57 |
| i | 25.12 | 5.45 |
| j | 15.00 | 3.26 |
| pi | 185.87 | 40.33 |
| pj | 33.18 | 7.20 |
| ij | 80.00 | 17.36 |
| pij,e | 72.94 | 15.83 |
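The percentage column follows directly from the estimated components: each component is divided by their sum. A minimal sketch using the values from this slide (the variable names are illustrative):

```python
# Estimated variance components for the P x I x J design, from the slide.
components = {
    "p": 48.71, "i": 25.12, "j": 15.00,
    "pi": 185.87, "pj": 33.18, "ij": 80.00, "pij,e": 72.94,
}

total = sum(components.values())
percentages = {source: round(100 * value / total, 2)
               for source, value in components.items()}

for source, pct in percentages.items():
    print(f"{source:6s} {pct:6.2f}")
```

The person-by-item interaction is by far the largest share, which is why adding essays pays off more than adding markers in the D-studies for this design.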

33 Sources of Variance, (Judges : Items) x Persons [Figure: diagram of the variance sources p, i, j,ij, pi and pj,pij,e]

34 Variance components, (Judges : Items) x Persons. With judges nested within items, j and ij (15.00 + 80.00; 3.26% + 17.36%) combine into j,ij, and pj and pij,e (33.18 + 72.94; 7.20% + 15.83%) combine into pj,pij,e.

| Source | Estimated Var Component | Perc of Total Variance |
|---|---|---|
| p | 48.71 | 10.57 |
| i | 25.12 | 5.45 |
| j,ij | 95.00 | 20.62 |
| pi | 185.87 | 40.33 |
| pj,pij,e | 106.12 | 23.03 |

35 Variance components, (Judges : Items) x Persons

| Source | Estimated Var Component | Perc of Total Variance |
|---|---|---|
| p | 48.71 | 10.57 |
| i | 25.12 | 5.45 |
| j,ij | 95.00 | 20.62 |
| pi | 185.87 | 40.33 |
| pj,pij,e | 106.12 | 23.03 |
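When judges are nested within items, some effects can no longer be estimated separately and their components add up. The sketch below starts from the crossed P x I x J components of slide 32 and merges the confounded sources; the helper name `collapse` is hypothetical.

```python
def collapse(components, merged):
    """Return a new dict in which each group of confounded sources is summed."""
    absorbed = {source for group in merged.values() for source in group}
    result = {s: v for s, v in components.items() if s not in absorbed}
    for label, group in merged.items():
        result[label] = sum(components[s] for s in group)
    return result


# Crossed P x I x J components from slide 32.
crossed = {
    "p": 48.71, "i": 25.12, "j": 15.00,
    "pi": 185.87, "pj": 33.18, "ij": 80.00, "pij,e": 72.94,
}

# Judges nested in items: j merges with ij, and pj merges with pij,e.
nested = collapse(crossed, {"j,ij": ["j", "ij"], "pj,pij,e": ["pj", "pij,e"]})

for source, value in nested.items():
    print(f"{source:9s} {value:7.2f}")
```

The total variance is unchanged by the merge; only the way it is partitioned differs.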

36 Overview of Presentation: Classes of reliability theories; Generalizability Theory; G-study; D-study; Illustrations

37 Two steps in G analysis: 1) G(eneralizability)-study: estimation of the sources of variance that influence the measurement (e.g., variance between examinees, essays and markers). 2) D(ecision)-study: estimation of reliability indices as a function of concrete sample size(s) (e.g., number of essays, number of markers).

38 Interpretation of scores. Norm-oriented perspective: scores have relative meaning, that is, meaning in relation to each other. Domain-oriented perspective: scores have absolute meaning in relation to the domain of measurement. Mastery-oriented perspective: scores have meaning in relation to a cut-off score (reliability of decisions, not of scores).

39 Fundamental Equation: $X = T + E$, where $X$ = observed score, $T$ = true score, $E$ = error score.
$$\text{Reliability} = \frac{\operatorname{Var}(T)}{\operatorname{Var}(X)} = \frac{\operatorname{Var}(T)}{\operatorname{Var}(T) + \operatorname{Var}(E)}$$

40 Illustration 1. Essay test: 7 vignette-based essay questions; 1 marker marked all questions for all examinees. Norm-referenced perspective. Calculate the generalizability coefficient!

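41 D-study ($n_i = 7$; norm-referenced)

| Source | Estimated Variance Component | Standard Error | Percentage of Total Variance |
|---|---|---|---|
| p | 97.57 | 19.02 | 13.35 |
| i | 261.24 | 112.98 | 35.75 |
| pi,e | 371.97 | 17.60 | 50.90 |

$$G = \frac{\operatorname{Var}(T)}{\operatorname{Var}(T) + \operatorname{Var}(E)} = \frac{97.57}{97.57 + 371.97/7} = 0.65$$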
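The calculation on this slide can be written as a small function. In the norm-referenced case only the pi,e component counts as error, divided by the number of essays. The function name `g_coefficient` is an illustrative choice.

```python
def g_coefficient(var_p, var_pi_e, n_items):
    """Generalizability coefficient for a one-facet p x i design."""
    relative_error = var_pi_e / n_items
    return var_p / (var_p + relative_error)


print(round(g_coefficient(97.57, 371.97, 7), 2))  # 0.65
```

The item component (261.24) does not appear: a harder or easier set of essays shifts every examinee equally and so leaves their rank order untouched.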

42 Illustration 2. Essay test: 7 vignette-based essay questions; 1 marker marked all questions for all examinees. Domain-referenced perspective. Calculate the dependability coefficient!

43 D-study ($n_i = 7$; domain-referenced)

| Source | Estimated Variance Component | Standard Error | Percentage of Total Variance |
|---|---|---|---|
| p | 97.57 | 19.02 | 13.35 |
| i | 261.24 | 112.98 | 35.75 |
| pi,e | 371.97 | 17.60 | 50.90 |

$$D = \frac{97.57}{97.57 + 261.24/7 + 371.97/7} = 0.52$$

44 Illustration 3. Essay test: 7 vignette-based essay questions; 1 marker marked all questions for all examinees. Domain-referenced perspective. Calculate the dependability coefficient for a sample of 10 essays!

45 D-study ($n_i = 10$; domain-referenced)

| Source | Estimated Variance Component | Standard Error | Percentage of Total Variance |
|---|---|---|---|
| p | 97.57 | 19.02 | 13.35 |
| i | 261.24 | 112.98 | 35.75 |
| pi,e | 371.97 | 17.60 | 50.90 |

$$D = \frac{97.57}{97.57 + 261.24/10 + 371.97/10} = 0.61$$

46 D-studies for several item samples

| N Essays | Generalizability Coefficient (G) | Dependability Coefficient (D) |
|---|---|---|
| 1 | 0.21 | 0.13 |
| 5 | 0.57 | 0.44 |
| 7 | 0.65 | 0.52 |
| 10 | 0.72 | 0.61 |
| 15 | 0.80 | 0.70 |
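The table can be regenerated from the three one-facet variance components by varying the number of essays. A minimal sketch, with illustrative function names:

```python
# One-facet G-study components from the P x I design.
VAR_P, VAR_I, VAR_PI_E = 97.57, 261.24, 371.97


def g_coefficient(n_items):
    """Norm-referenced: only pi,e contributes to error."""
    return VAR_P / (VAR_P + VAR_PI_E / n_items)


def d_coefficient(n_items):
    """Domain-referenced: the item component contributes to error as well."""
    return VAR_P / (VAR_P + VAR_I / n_items + VAR_PI_E / n_items)


essay_counts = [1, 5, 7, 10, 15]
g_values = [round(g_coefficient(n), 2) for n in essay_counts]
d_values = [round(d_coefficient(n), 2) for n in essay_counts]

for n, g, d in zip(essay_counts, g_values, d_values):
    print(f"{n:3d}  G = {g:.2f}  D = {d:.2f}")
```

D is always lower than G for the same number of essays because its error term contains everything in the relative error plus the item component.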

47 Illustration 4. Essay test: 7 vignette-based essay questions; 2 markers independently marked all questions for all examinees. Norm-referenced perspective. Calculate the generalizability coefficient!

48 D-study ($n_i = 7$; $n_j = 2$; norm-referenced)

| Source | Variance Component | % of Total Variance |
|---|---|---|
| p | 48.71 | 10.57 |
| i | 25.12 | 5.45 |
| j | 15.00 | 3.26 |
| pi | 185.87 | 40.33 |
| pj | 33.18 | 7.20 |
| ij | 80.00 | 17.36 |
| pij,e | 72.94 | 15.83 |

$$G = \frac{48.71}{48.71 + 185.87/7 + 33.18/2 + 72.94/(2 \times 7)} = 0.50$$

49 Illustration 5. Essay test: 7 vignette-based essay questions; 2 markers independently marked all questions for all examinees. Domain-referenced perspective. Calculate the dependability coefficient!

50 D-study ($n_i = 7$; $n_j = 2$; domain-referenced)

| Source | Variance Component | % of Total Variance |
|---|---|---|
| p | 48.71 | 10.57 |
| i | 25.12 | 5.45 |
| j | 15.00 | 3.26 |
| pi | 185.87 | 40.33 |
| pj | 33.18 | 7.20 |
| ij | 80.00 | 17.36 |
| pij,e | 72.94 | 15.83 |

$$D = \frac{48.71}{48.71 + 25.12/7 + 15.00/2 + 185.87/7 + 33.18/2 + 80.00/14 + 72.94/14} = 0.43$$
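For the fully crossed two-facet design, both coefficients can be computed term by term. In this sketch each component is divided by the number of conditions it is averaged over: item-related terms by the 7 essays, marker-related terms by the 2 markers, and terms involving both by 2 x 7 = 14. The person-by-item component is therefore divided by 7, as in the norm-referenced calculation. Variable names are illustrative.

```python
# Crossed P x I x J variance components.
v = {
    "p": 48.71, "i": 25.12, "j": 15.00,
    "pi": 185.87, "pj": 33.18, "ij": 80.00, "pij,e": 72.94,
}
n_i, n_j = 7, 2  # essays and markers in the D-study

# Relative error: interactions with persons only.
relative_error = v["pi"] / n_i + v["pj"] / n_j + v["pij,e"] / (n_i * n_j)

# Absolute error: relative error plus the main effects and the ij interaction.
absolute_error = (relative_error
                  + v["i"] / n_i + v["j"] / n_j + v["ij"] / (n_i * n_j))

g = v["p"] / (v["p"] + relative_error)
d = v["p"] / (v["p"] + absolute_error)

print(round(g, 2), round(d, 2))  # 0.5 0.43
```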

51 Illustration 6. Essay test: 7 vignette-based essay questions; 2 different markers independently marked each question for all examinees. Norm-referenced perspective. Calculate the generalizability coefficient!

52 D-study ($n_i = 7$; $n_j = 2$; norm-referenced), (Judges : Items) x Persons

| Source | Estimated Var Component | Perc of Total Variance |
|---|---|---|
| p | 48.71 | 10.57 |
| i | 25.12 | 5.45 |
| j,ij | 95.00 | 20.62 |
| pi | 185.87 | 40.33 |
| pj,pij,e | 106.12 | 23.03 |

$$G = \frac{48.71}{48.71 + 185.87/7 + 106.12/(2 \times 7)} = 0.59$$

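53 D-study summary table: norm-referenced score interpretation

| Number of Essays | Same marker for all essays: One Marker | Same marker for all essays: Two Markers | Different marker for each essay: One Marker | Different marker for each essay: Two Markers |
|---|---|---|---|---|
| 5 | 0.36 | 0.44 | 0.39 | 0.46 |
| 7 | 0.41 | 0.50 | 0.47 | 0.54 |
| 10 | 0.45 | 0.56 | 0.56 | 0.63 |
| 15 | 0.49 | 0.61 | 0.65 | 0.72 |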

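54 D-study summary table: domain-referenced score interpretation

| Number of Essays | Same marker for all essays: One Marker | Same marker for all essays: Two Markers | Different marker for each essay: One Marker | Different marker for each essay: Two Markers |
|---|---|---|---|---|
| 5 | 0.29 | 0.37 | 0.37 | 0.44 |
| 7 | 0.33 | 0.43 | 0.45 | 0.52 |
| 10 | 0.37 | 0.48 | 0.54 | 0.61 |
| 15 | 0.40 | 0.53 | 0.64 | 0.70 |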
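The "same marker for all essays" columns of the two summary tables correspond to the fully crossed P x I x J design and can be regenerated from its variance components. The sketch below covers only those columns; the function name `d_study` is illustrative.

```python
# Crossed P x I x J variance components from the G-study.
V = {
    "p": 48.71, "i": 25.12, "j": 15.00,
    "pi": 185.87, "pj": 33.18, "ij": 80.00, "pij,e": 72.94,
}


def d_study(n_i, n_j, v=V):
    """Return (G, D) for n_i essays and n_j markers in the crossed design."""
    relative = v["pi"] / n_i + v["pj"] / n_j + v["pij,e"] / (n_i * n_j)
    absolute = relative + v["i"] / n_i + v["j"] / n_j + v["ij"] / (n_i * n_j)
    return v["p"] / (v["p"] + relative), v["p"] / (v["p"] + absolute)


essay_counts = [5, 7, 10, 15]
table = {
    n_j: [tuple(round(x, 2) for x in d_study(n_i, n_j)) for n_i in essay_counts]
    for n_j in (1, 2)
}

for n_j, rows in table.items():
    for n_i, (g, d) in zip(essay_counts, rows):
        print(f"markers={n_j} essays={n_i:2d}  G={g:.2f}  D={d:.2f}")
```

Going from 5 to 15 essays raises G more than adding a second marker does, in line with the large person-by-item component.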

55 Another reliability index. Reliability coefficient (G and D coefficients): scale independent (0-1), but non-intuitive interpretation. Standard Error of Measurement (SEM): intuitive interpretation, but scale dependent.

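56 Standard Error of Measurement: $X = T + E$, where $X$ = observed score, $T$ = true score, $E$ = error score.
$$\text{Reliability index} = \frac{\operatorname{Var}(T)}{\operatorname{Var}(T) + \operatorname{Var}(E)}$$
$$\text{Standard Error of Measurement (SEM)} = \sqrt{\operatorname{Var}(E)} = \sigma_E$$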

57 Interpretation of SEM. Suppose an examinee has a score of 60% and the SEM is 5. [Figure: three score axes running from 45 to 75 and centred on 60, each marking a confidence interval] 65% CI; 95% CI: 1.96 x 5 ≈ 10; 95% CI: 2.14 x 5 ≈ 11
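The interval around an observed score is the score plus or minus a multiplier times the SEM. A minimal sketch for the slide's example (score 60, SEM 5), assuming normally distributed errors and the multiplier 1.96; the function name is illustrative.

```python
def confidence_interval(score, sem, z=1.96):
    """Return (lower, upper) bounds of score +/- z * SEM."""
    half_width = z * sem
    return score - half_width, score + half_width


low, high = confidence_interval(60, 5)
print(round(low, 1), round(high, 1))  # 50.2 69.8
```

Rounded to whole points this is the interval from about 50 to 70.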

58 D-study ($n_i = 7$; norm-referenced)

| Source | Estimated Variance Component | Standard Error | Percentage of Total Variance |
|---|---|---|---|
| p | 97.57 | 19.02 | 13.35 |
| i | 261.24 | 112.98 | 35.75 |
| pi,e | 371.97 | 17.60 | 50.90 |

$$G = \frac{97.57}{97.57 + 371.97/7} = 0.65$$
$$\text{SEM} = \sqrt{371.97/7} = 7.29$$
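The SEM on this slide is the square root of the same error variance that enters the G coefficient. A short sketch with an illustrative function name:

```python
import math


def sem_relative(var_pi_e, n_items):
    """Norm-referenced SEM for the one-facet p x i design."""
    return math.sqrt(var_pi_e / n_items)


print(round(sem_relative(371.97, 7), 2))  # 7.29
```

Unlike the coefficient, the result is expressed in score points, so it can be read directly against the test's scale.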

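59 D-study ($n_i = 7$; $n_j = 2$; domain-referenced)

| Source | Variance Component | % of Total Variance |
|---|---|---|
| p | 48.71 | 10.57 |
| i | 25.12 | 5.45 |
| j | 15.00 | 3.26 |
| pi | 185.87 | 40.33 |
| pj | 33.18 | 7.20 |
| ij | 80.00 | 17.36 |
| pij,e | 72.94 | 15.83 |

$$D = \frac{48.71}{48.71 + 25.12/7 + 15.00/2 + 185.87/7 + 33.18/2 + 80.00/14 + 72.94/14} = 0.43$$
$$\text{SEM} = \sqrt{25.12/7 + 15.00/2 + 185.87/7 + 33.18/2 + 80.00/14 + 72.94/14} = 8.07$$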

60 Overview of Presentation: Classes of reliability theories; Generalizability Theory; G-study; D-study; Illustrations

61 Scenario CEX. A clinical mini exercise (CEX) was developed in which examinees are periodically observed and rated on a rating form. An investigator analyzed a data set from 88 residents who were each observed on 4 occasions, each time by a single different examiner (cf. Norcini JJ, Blank LL, Arnold GK, Kimball HR. The mini-CEX (Clinical Evaluation Exercise): A preliminary investigation. Annals of Internal Medicine 1995;123:795-799). Variance components: p; o,op,e (that is, o:p).
$$G = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_{o:p}/4} = D$$

62 Scenario OSCE I. An OSCE consisting of 15 stations was administered to 100 final-year students. Each station was scored by two independent examiners on a case-specific checklist. Different examiners were used in each station. Variance components: p, s, j:s, ps, pj:s.
$$G = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_{ps}/15 + \sigma^2_{pj:s}/(2 \times 15)}$$

63 Scenario OSCE II. An experimental OSCE was administered to 20 residents. Each resident was tested on a different day. For each resident, 3 stations were organized, consisting of real patients who were available that day. Two examiners observed all residents in all stations and completed a generic rating scale. Variance components: p, s:p, j, ps:s, pj.
$$D = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_{s:p}/3 + \sigma^2_j/2 + \sigma^2_{ps:s}/3 + \sigma^2_{pj}/6}$$

64 Scenario Clerkship Evaluation. An investigator wishes to evaluate the teaching quality of 10 clinical clerkships. She developed a questionnaire with 30 items on various quality aspects. The questionnaire was administered to 50 students in each clerkship. Variance components: c, i, s:c, ci, cs:i.
$$G = \frac{\sigma^2_c}{\sigma^2_c + \sigma^2_{s:c}/50 + \sigma^2_{ci}/30 + \sigma^2_{cs:i}/(50 \times 30)}$$
PS: It is doubtful that i is a random facet; i could be treated as fixed or ignored.

65 Further reading & software
Literature
Cronbach LJ, Gleser GC, Nanda H, Rajaratnam N. The dependability of behavioral measurements: Theory of generalizability for scores and profiles. New York: Wiley, 1972. Original monograph on generalizability theory. Complete, but hardly accessible for any reader.
Brennan RL. Elements of Generalizability Theory. Iowa: ACT Publications, 1983. This is the resource book for most specialists. Not easy for non-statistically trained readers.
Shavelson RJ, Webb NM. Generalizability theory: A primer. Newbury Park, CA: Sage Publications, 1991. Good and accessible introduction to generalizability theory for any reader.
Software
GENOVA: Conducts G and D studies and provides ample statistical information. Operates on any PC. The program is relatively old and not user friendly. Available from Dr. J. Crick, National Board of Medical Examiners, 3750 Market Street, Philadelphia, PA 19104-3190, USA.
SPSS: SPSS General Linear Models, Subprogram Variance Components, estimates variance components (also for unbalanced designs). D-studies need to be done manually.

