Issues in the Implementation of Longitudinal Growth Models for Student Achievement
Joseph Stevens, University of Oregon; Keith Zvoch, University of Nevada-Las Vegas
Contact Information: 170 College of Education, 5267 University of Oregon, Eugene, OR 97403; (541) 346-2445; email@example.com. Presentation available at: http://www.uoregon.edu/~stevensj/issues.ppt
No Child Left Behind
NCLB requires the use of a cross-sectional, case study design for the study of achievement and school effectiveness (Stevens, 2005). How best can we model student achievement, and regularities in achievement, by teacher or school? Interest in longitudinal methods and modeling, and their development, has been increasing over several decades.
Purposes
Draw attention to certain research design issues in the study of school effectiveness; review what we know about the analysis of change; describe issues in the design and use of longitudinal models, along with some empirical results; and discuss issues in the potential use of longitudinal modeling for accountability purposes.
Design Issues
Status versus growth (Stevens, 2000): state-mandated test data on the TerraNova Survey Plus for all middle schools in six New Mexico school districts, 1999-2001; 36 middle schools; 5,544 students tested in grades 6, 7, and 8.
Analysis of Change
“Investigators who ask questions regarding gain scores would ordinarily be better advised to frame their questions in other ways.” (Cronbach & Furby, 1970)
“Problems in measuring change abound and the virtues in doing so are hard to find.” (Linn & Slinde, 1977)
Longitudinal research has been recognized as the “sine qua non of evaluation in nonexperimental settings” (Marco, 1974).
The Analysis of Change
Height example: cross-sectional comparisons do not measure change effectively or accurately. Cross-sectional comparisons of cohorts produce different results than an analysis of change. It is common to confuse longitudinal research hypotheses and language with cross-sectional designs and results.
Design Issues
Number of measurement occasions: pre-post, two-wave studies are most common, but more is better. “Two waves of data are better than one, but maybe not much better” (Rogosa, 1995). The number of occasions bears on the shape of the growth function that can be modeled and on the reliability of estimation.
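The point about measurement occasions can be made concrete with a standard least-squares result. Take an individual straight-line growth model with equally spaced occasions and independent errors of constant variance σ². Under those conditions, the sampling variance of the estimated growth rate is σ²/Σ(t − t̄)². The sketch below assumes exactly those conditions, with annual spacing by default. It is an illustration, not the authors' analysis, and the function name is hypothetical.

```python
def slope_sampling_variance(n_waves: int, error_variance: float = 1.0,
                            spacing: float = 1.0) -> float:
    """Sampling variance of one student's OLS growth rate.

    Assumes a straight-line growth model, equally spaced occasions, and
    independent errors with constant variance `error_variance`.
    """
    if n_waves < 2:
        raise ValueError("a growth rate needs at least two waves")
    times = [i * spacing for i in range(n_waves)]
    mean_t = sum(times) / n_waves
    # Sum of squared deviations of the time codes around their mean
    sst = sum((t - mean_t) ** 2 for t in times)
    return error_variance / sst


for waves in range(2, 6):
    print(waves, slope_sampling_variance(waves))
```

Under these assumptions, moving from two waves to three cuts the variance of the growth estimate by a factor of four. With only two waves the fitted line passes through both points, so the error variance itself cannot be estimated.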
Design Issues
Lag between treatment and outcome: treatment effects take time to realize, and the more distal the outcome measurement is from the treatment impact, the smaller the effect.
Design Issues
Size or length of measurement intervals: use the shortest interval possible until the temporal effect is established. Intervals that are too short bring cost, reactivity to measurement, and too little treatment effect. Intervals that are too long bring attrition, distal treatment effects, and an inability to model the growth function accurately.
Design Issues
Attrition and incomplete data: how does attrition bias results? Mobility: at the student level, accountability for some and opportunity to learn; at the school level, accountability for all and opportunity to teach. Cohort stability.
Effects of Attrition
Zvoch & Stevens (2005): an investigation of sample exclusion and student attrition effects on the longitudinal study of middle school mathematics performance. Outcome: mathematics achievement on the state-mandated TerraNova in 24 middle schools. Analytic method: two- and three-level HLM growth models.
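The three-level unconditional linear growth model can be written in the usual Raudenbush and Bryk notation, with occasions t nested in students i nested in schools j. This is a sketch that assumes a single time variable a_tij, for example years since the first testing occasion. The two-level version simply drops the school level. The symbols follow those in the unconditional-model table for this study.

```latex
\begin{aligned}
\text{Level 1 (occasion } t\text{):}\quad & Y_{tij} = \pi_{0ij} + \pi_{1ij}\,a_{tij} + e_{tij} \\
\text{Level 2 (student } i\text{):}\quad & \pi_{0ij} = \beta_{00j} + r_{0ij}, \qquad \pi_{1ij} = \beta_{10j} + r_{1ij} \\
\text{Level 3 (school } j\text{):}\quad & \beta_{00j} = \gamma_{000} + u_{00j}, \qquad \beta_{10j} = \gamma_{100} + u_{10j}
\end{aligned}
```

In this model, π0ij is a student's achievement status where a_tij = 0, and π1ij is that student's growth rate. γ000 and γ100 are the overall school mean achievement and mean growth.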
Effects of Attrition
One longitudinal cohort, two samples:
Complete cohort sample: all students who participated in accountability testing in 1998-99 (N = 6,098).
District accountability sample: a subset of the complete cohort sample, restricted to students who attended the same school for three years, had complete test data, and had no modified test administrations (N = 3,334).
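The three restrictions that define the district accountability sample can be expressed as a filter over student records. The record layout (`StudentRecord` and its fields), the function name, and the four example students are all hypothetical. The slides do not describe the district's data files.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class StudentRecord:
    # Hypothetical layout: the slides do not describe the district's files.
    student_id: str
    schools: Sequence[Optional[str]]   # school attended in each study year
    scores: Sequence[Optional[float]]  # mathematics scale score per year; None = not tested
    modified_admin: bool               # True if any test was given under modified conditions


def in_accountability_sample(rec: StudentRecord, n_years: int = 3) -> bool:
    """Same school in every year, complete test data, and no modified administrations."""
    same_school = (
        len(rec.schools) == n_years
        and rec.schools[0] is not None
        and all(s == rec.schools[0] for s in rec.schools)
    )
    complete = len(rec.scores) == n_years and all(s is not None for s in rec.scores)
    return same_school and complete and not rec.modified_admin


cohort = [
    StudentRecord("a", ["S1", "S1", "S1"], [640.0, 661.0, 679.0], False),
    StudentRecord("b", ["S1", "S2", "S2"], [630.0, 645.0, 670.0], False),  # changed schools
    StudentRecord("c", ["S3", "S3", "S3"], [655.0, None, 690.0], False),   # missing a year
    StudentRecord("d", ["S3", "S3", "S3"], [600.0, 615.0, 622.0], True),   # modified test
]
accountability = [r for r in cohort if in_accountability_sample(r)]
excluded = [r for r in cohort if not in_accountability_sample(r)]
print(len(accountability), len(excluded))
```

Each restriction is a separate condition, so a mobile student, a student with a missed test, and a student with a modified administration are all excluded for different reasons.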
Student Demographic Characteristics by Analytic Sample
_________________________________________________________________
                          Accountability Sample   Complete Cohort Sample
                               (N = 3,334)              (N = 6,098)
Student Characteristic     Frequency   Percent      Frequency   Percent
_________________________________________________________________
Female                         1,710      51.3          3,016      49.5
Non-Anglo                      1,797      53.9          3,536      58.0
English Language Learner         397      11.9          1,121      18.4
Free Lunch Recipient           1,170      35.1          2,628      43.1
Special Education                101       3.0          1,092      17.9
_________________________________________________________________
Note. Chi-square tests comparing the accountability sample (N = 3,334) with the group of students lost from the complete cohort sample (N = 2,764) on student background characteristics revealed that males and students in special populations were statistically over-represented in the group of excluded students (p < .01 in each comparison).
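The Note's comparison of retained and excluded students can be reproduced in outline from the tabled frequencies alone. The excluded group is the complete cohort minus the accountability sample; for example, 1,092 − 101 = 991 of the 2,764 excluded students received special education. The sketch below assumes a Pearson chi-square test with 1 df and no continuity correction on each 2 × 2 table. The slides do not say which variant was used, and the function name is hypothetical.

```python
import math


def excluded_vs_retained_chi2(retained_k, retained_n, total_k, total_n):
    """Pearson chi-square (df = 1, no continuity correction) comparing the
    retained sample with the students excluded from the complete cohort.

    *_k are counts with the characteristic, *_n are group sizes.
    """
    excluded_k = total_k - retained_k
    excluded_n = total_n - retained_n
    table = [[retained_k, retained_n - retained_k],
             [excluded_k, excluded_n - excluded_k]]
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper-tail probability of a chi-square with 1 df: P(|Z| > sqrt(chi2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return excluded_k, excluded_n, chi2, p


comparisons = {
    "Special Education": (101, 3334, 1092, 6098),
    "Male": (3334 - 1710, 3334, 6098 - 3016, 6098),
}
for label, args in comparisons.items():
    excluded_k, excluded_n, chi2, p = excluded_vs_retained_chi2(*args)
    print(f"{label}: excluded {excluded_k}/{excluded_n}, chi2 = {chi2:.1f}, p = {p:.2g}")
```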
Three-Level Unconditional Model for Mathematics Achievement by Sample
______________________________________________________________________________
                                   Accountability Sample          Complete Cohort Sample
                                       (N = 3,334)                     (N = 5,168)
Fixed Effect                     Estimate    SE        t          Estimate    SE        t
______________________________________________________________________________
School Mean Achievement, γ000     659.19    2.97   222.29***      648.96    3.09   209.84***
School Mean Growth, γ100           18.44    0.92    20.12***       17.64    0.87    20.16***
______________________________________________________________________________
                                 Variance                         Variance
Random Effect                    Component   df       χ²          Component   df       χ²
______________________________________________________________________________
Individual Achievement, r0ij      766.01    3310   8978.37***     1132.02    4548  22708.86***
Individual Growth, r1ij            27.98    3310   3858.62***       44.04    4548   5736.01***
Level-1 Error, etij               313.12                           361.59
School Mean Achievement, u00j     202.73      23    704.61***      222.42      23    855.44***
School Mean Growth, u10j           18.68      23    354.14***       17.01      23    347.30***
Percentage of Variation Between Schools
Achievement Status, π0ij            20.7                             16.4
Achievement Growth, π1ij            40.0                             27.9
______________________________________________________________________________
Note. In this analysis, students who transferred schools within the district were dropped from the complete cohort sample as these students could not be uniquely assigned to one school location (N = 930). *** p < .001
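The "Percentage of Variation Between Schools" rows can be read as the school share of the combined student-level and school-level variance in each growth parameter, u / (u + r). The function below assumes that definition; the slides do not state it. Under it, the computed shares match the tabled growth percentages (40.0 and 27.9) and the complete-cohort status percentage (16.4). The accountability-sample status share comes out near 20.9 rather than the tabled 20.7, so the original computation may have differed slightly.

```python
def between_school_share(school_var: float, student_var: float) -> float:
    """Percent of student + school variance in a growth parameter lying between schools."""
    return 100.0 * school_var / (school_var + student_var)


# (school-level u, student-level r) variance components from the table
components = {
    ("Accountability", "status"): (202.73, 766.01),
    ("Accountability", "growth"): (18.68, 27.98),
    ("Complete cohort", "status"): (222.42, 1132.02),
    ("Complete cohort", "growth"): (17.01, 44.04),
}
for (sample, outcome), (u, r) in components.items():
    print(f"{sample:<16} {outcome:<7} {between_school_share(u, r):5.1f}")
```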
Effects of Attrition
Were cross-sample changes in school performance associated with the percentage of students from special populations excluded from the district accountability sample?
Figure 1. Cross-Sample School Achievement Mean Change in Mathematics as a Function of the Proportion of Students from Special Populations Excluded from the Accountability Sample
Figure 2. Cross-Sample School Growth Rate Change in Mathematics as a Function of the Proportion of Students from Special Populations Excluded from the Accountability Sample
Conclusions
Mathematics performance estimates differed across the two sample conditions. District and school achievement were higher, and student performance was more similar, in the restricted sample. Cross-sample school changes in student achievement were closely related to the proportion of students from special populations who were excluded.
Cohort Stability
Zvoch & Stevens (in press): an investigation of the stability of cohorts from one year to the next. The study examines the mean achievement status and growth of students across cohorts, changes in achievement status and growth between student cohorts, and predictors of school achievement outcomes.
Cohort Data Structure
Grade / Year   99-00   00-01   01-02   02-03   03-04
3              C1      C2      C3
4                      C1      C2      C3
5                              C1      C2      C3
Note. Cohort 1 (N = 3,325), Cohort 2 (N = 3,347), Cohort 3 (N = 3,322); School N = 79.
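In this successive-cohort design, each cohort enters grade 3 one year after the previous one and is followed through grade 5. The cohort occupying any grade-by-year cell therefore follows from simple arithmetic. The helper below, whose name is hypothetical, rebuilds that layout.

```python
def cohort_grid(first_grade=3, last_grade=5, n_years=5, n_cohorts=3):
    """Return one row per grade of cohort labels (None for empty cells).

    Cohort k (1-based) is in grade `first_grade` during year index k - 1,
    then advances one grade per year.
    """
    grid = []
    for grade in range(first_grade, last_grade + 1):
        row = []
        for year in range(n_years):
            cohort = year - (grade - first_grade) + 1
            row.append(f"C{cohort}" if 1 <= cohort <= n_cohorts else None)
        grid.append(row)
    return grid


years = ["99-00", "00-01", "01-02", "02-03", "03-04"]
print("Grade", *years, sep="\t")
for grade, row in zip(range(3, 6), cohort_grid()):
    print(grade, *(cell or "" for cell in row), sep="\t")
```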
School Performance Indices
                        Outcome Focus
Accountability Model    Across or Within Cohorts        Between Cohorts
                        (Current Performance)           (Improvement Over Time)
Status                  School Mean Achievement,        School Mean Achievement/
                        Percent Proficient              Proficiency Change
Growth                  School Mean Growth              School Mean Growth Change
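The mean-based indices in this table can be computed from one school's cohort-level results. The sketch makes several assumptions. The input is one mean scale score and one mean growth rate per cohort. Current performance is the average across cohorts. Improvement is the average cohort-to-cohort difference. The function names and input values are invented for illustration. Percent proficient is omitted because it also requires a proficiency cut score.

```python
from statistics import mean


def average_change(values):
    """Average cohort-to-cohort difference, equal to (last - first) / (k - 1)."""
    return mean([b - a for a, b in zip(values, values[1:])])


def school_indices(cohort_means, cohort_growth):
    """Mean-based status and growth indices for one school.

    cohort_means: mean scale score for each cohort, in cohort order
    cohort_growth: mean yearly growth rate for each cohort, same order
    """
    return {
        "status_current": mean(cohort_means),
        "status_change": average_change(cohort_means),
        "growth_current": mean(cohort_growth),
        "growth_change": average_change(cohort_growth),
    }


# Invented values for three successive cohorts in one school
indices = school_indices([640.0, 646.0, 649.0], [18.0, 17.0, 19.5])
for name, value in indices.items():
    print(f"{name}: {value:.2f}")
```

A school can look stable on one index and not the other. Here status rises steadily across cohorts while growth drops and then recovers.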
Figure 1. Cross-cohort relationship between school mean achievement and school mean growth in mathematics (quadrants: High Mean/Low Growth, High Mean/High Growth, Low Mean/Low Growth, Low Mean/High Growth)
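The four quadrants in Figure 1 amount to a two-way classification of schools. The sketch below assumes the cut points are the cross-school averages of mean achievement and mean growth; the slide does not say where the dividing lines fall. The function name and school values are hypothetical.

```python
from statistics import mean


def quadrant_labels(achievement, growth):
    """Classify schools by mean achievement and mean growth.

    Assumes the cut points are the averages across schools.
    """
    a_cut = mean(list(achievement.values()))
    g_cut = mean(list(growth.values()))
    labels = {}
    for school, a in achievement.items():
        a_label = "High Mean" if a >= a_cut else "Low Mean"
        g_label = "High Growth" if growth[school] >= g_cut else "Low Growth"
        labels[school] = f"{a_label} {g_label}"
    return labels


# Invented school means (scale-score points) and growth rates (points per year)
achievement = {"A": 660.0, "B": 640.0, "C": 655.0, "D": 630.0}
growth = {"A": 15.0, "B": 22.0, "C": 21.0, "D": 14.0}
for school, label in quadrant_labels(achievement, growth).items():
    print(school, label)
```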
Cohort Stability
Do the cross-cohort estimates of school performance vary with schools’ social context?
Figure 2. Cross-cohort school mean growth in mathematics as a function of the percentage of free lunch recipients
Figure 3. Cross-cohort school mean achievement in mathematics as a function of the percentage of free lunch recipients
Cohort Stability
To what degree do estimates of the mean achievement status and achievement growth of schools change with each successive student cohort?
Figure 4. School mean achievement in mathematics as a function of student cohort
Figure 6. School mean growth in mathematics as a function of student cohort
Cohort Stability
Do the between-cohort estimates of school improvement covary with cohort enrollment size?
Figure 7. Relationship between cohort-to-cohort changes in school mean growth and cohort enrollment size by school
Conclusions
Cross-cohort performance differed by outcome (mean achievement status or growth). Cohort-to-cohort changes in student performance also varied by outcome (change in school mean achievement or change in school mean growth). Across cohorts, schools’ social context was associated only with student achievement levels, not with achievement growth. Changes in school performance were closely related to cohort enrollment size.
Alternative Statistical Models for Longitudinal Analysis
Difference or gain scores; residuals; growth curve models; latent growth curves; mixture models; autoregressive models.
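The first two approaches in this list can rank students differently using the same data. A gain score is simply posttest minus pretest. A residual (residualized change) score is the posttest minus the posttest predicted from the pretest by least-squares regression. The four-student data set and function names below are invented for illustration.

```python
from statistics import mean


def gain_scores(pre, post):
    """Simple difference (gain) scores: post - pre."""
    return [y - x for x, y in zip(pre, post)]


def residual_scores(pre, post):
    """Residualized change: post minus the OLS prediction of post from pre."""
    mx, my = mean(pre), mean(post)
    sxx = sum((x - mx) ** 2 for x in pre)
    sxy = sum((x - mx) * (y - my) for x, y in zip(pre, post))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(pre, post)]


pre = [600, 620, 640, 660]
post = [630, 640, 660, 670]
print("gains:    ", gain_scores(pre, post))
print("residuals:", [round(r, 2) for r in residual_scores(pre, post)])
```

Students 2 and 3 have identical gains of 20 points, yet their residuals are −3 and +3. The regression expects higher-scoring students to score higher at posttest. Student 1 has the largest gain but sits close to expectation. This is one concrete sense in which the models answer different questions.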
Alternative Statistical Models
Each model has strengths and weaknesses. Results differ depending on the model used. Models answer different questions. Model complexity must be weighed against transparency.
Measurement Issues
Carryover effects (learning, sensitization, etc.); need for parallel test forms.
Measurement Issues: Scaling and Equating
Do scales change over time, in structure/dimensionality or in units? Standardization or scaling may mask change. Equating: vertical equating is usually done at one point in time across grades/cohorts, and there is a need for longitudinal equating. What spans can reasonably be equated? What content can reasonably be equated?
Measurement Issues: Reliability
Regression to the mean in two-wave studies; reliability of difference scores; measured variables versus latent variables; reliability generalization over time; interactions of time and other characteristics.
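Two of these reliability issues have classical closed forms. Assume classical test theory with uncorrelated errors. Then the reliability of a difference score D = Y − X is (σx²ρxx + σy²ρyy − 2σxσyρxy) / (σx² + σy² − 2σxσyρxy). With bivariate normal standardized scores, the expected posttest z given a pretest z is ρxy·z. The sketch applies both formulas with illustrative values; the function names are hypothetical.

```python
def difference_score_reliability(rel_x, rel_y, r_xy, sd_x=1.0, sd_y=1.0):
    """Classical-test-theory reliability of D = Y - X (errors assumed uncorrelated)."""
    cov_xy = r_xy * sd_x * sd_y
    true_var = rel_x * sd_x ** 2 + rel_y * sd_y ** 2 - 2 * cov_xy
    observed_var = sd_x ** 2 + sd_y ** 2 - 2 * cov_xy
    return true_var / observed_var


def expected_post_z(pre_z, r_xy):
    """Expected standardized posttest score given a pretest z (bivariate normal)."""
    return r_xy * pre_z


for r in (0.5, 0.7, 0.8):
    rel_d = difference_score_reliability(0.9, 0.9, r)
    print(f"r_xy = {r}: reliability of difference = {rel_d:.2f}")
print(expected_post_z(2.0, 0.7))
```

Two tests that are each reliable at .90 yield a difference score reliable at only about .67 when they correlate .70, and .50 when they correlate .80. Under the same assumptions, a student two standard deviations above the mean at pretest is expected to be only 1.4 above it at posttest.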
Measurement Issues: Validity
Construct equivalence over time; temporal invariance, including measurement invariance and structural invariance.
Measurement Issues
Measurement instruments that are designed to measure cognitive growth (Collins, 1991); true developmental scales.
Measurement Issues: Validity
Messick (1989): validity and consequences of alternative approaches. Pattern matching (Shadish, Cook, & Campbell, 2002). Reichardt (2000): study of plausible threats to the validity of treatment effects.
Measurement Issues: Validity
Stevens (2005): three-level HLM curvilinear growth models applied to state data. In 1999-00, 23,469 sixth-grade children took the state-mandated TerraNova. The study includes the 23,296 sixth graders (99.3%) who took the mathematics subtest. These students were matched longitudinally to 7th-, 8th-, and 9th-grade records for the years 2001, 2002, and 2003.
The sample included only those students who were in their middle school for 2 or 3 years (17,596; 75.5% of students). Schools with fewer than 5 students were also excluded (13 schools with a total of 24 students), resulting in an analytic sample of 242 schools (94% of schools) with 17,572 students. This sample differs from the population in having about 1% more White and Hispanic students, 1% fewer Native American, LEP, and Special Education students, and 2% fewer bilingual students.
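A three-level curvilinear growth model of the kind described for Stevens (2005) adds a squared time term at level 1 and lets student characteristics X_qij predict each growth parameter. In the indexing sketched below, γ200 is the average curvilinear term and γ2q0 is the effect of student characteristic q on it. This indexing is an assumption chosen to be consistent with the coefficient labels γ200 through γ270 in the accompanying table. The exact random-effects specification is not given on the slides.

```latex
\begin{aligned}
\text{Level 1:}\quad & Y_{tij} = \pi_{0ij} + \pi_{1ij}\,a_{tij} + \pi_{2ij}\,a_{tij}^{2} + e_{tij} \\
\text{Level 2:}\quad & \pi_{pij} = \beta_{p0j} + \sum_{q=1}^{Q} \beta_{pqj}\,X_{qij} + r_{pij}, \qquad p = 0, 1, 2 \\
\text{Level 3:}\quad & \beta_{p0j} = \gamma_{p00} + u_{p0j}, \qquad \beta_{pqj} = \gamma_{pq0} \quad (q \ge 1)
\end{aligned}
```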
Mathematics Achievement Predicted by Individual Characteristics (continued)
______________________________________________________________
Fixed Effect                        Coefficient    SE       t     df      p
______________________________________________________________
School Curvilinear Growth, γ200        -2.09     0.21   -9.78   241   <.001
White Student, γ210                     0.48     0.20    2.35   241    .019
LEP, γ220                              -0.10     0.36   -0.27   241    .790
Title 1 Student, γ230                   0.61     0.28    2.17   241    .030
Special Education, γ240                 0.61     0.50    1.22   241    .224
Modified Test, γ250                    -0.10     0.75   -0.14   241    .890
Free Lunch Student, γ260                0.26     0.33    0.79   241    .427
Gender, γ270                            1.05     0.19    5.64   241   <.001
______________________________________________________________
                             Level-1 Variance   Level-2 Variance   Variance
School Level                 Component          Component          Explained
______________________________________________________________
Mean Achievement, u00            242.78             184.89           23.8%
Linear Growth, u10                41.46              30.68           26.0%
Curvilinear Growth, u20            2.94               2.60           11.6%
______________________________________________________________
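The "Variance Explained" column is consistent with the usual proportional-reduction-in-variance statistic, (baseline − conditional) / baseline, applied to each school-level variance component. The sketch assumes that definition and recovers the tabled percentages from the two variance columns. The function name is hypothetical.

```python
def proportion_reduction(baseline_var: float, conditional_var: float) -> float:
    """Percent reduction in a variance component after adding predictors."""
    return 100.0 * (baseline_var - conditional_var) / baseline_var


# (Level-1 column, Level-2 column) school-level variance components from the table
school_components = {
    "Mean Achievement": (242.78, 184.89),
    "Linear Growth": (41.46, 30.68),
    "Curvilinear Growth": (2.94, 2.60),
}
for name, (before, after) in school_components.items():
    print(f"{name:<20} {proportion_reduction(before, after):5.1f}%")
```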
Pattern matching: relationships between alternative measures of school effectiveness and confounding variables
NCLB proficiency (percent proficient or above, using the state-determined cut point); state rating of schools (weighted combination of proficiency score, attendance, and dropout rates); HLM empirical Bayes (EB) intercept estimates; HLM EB slope estimates.
Measurement Issues: Validity
Pattern matching and the relation to schooling effects: if schooling policies and practices affect student learning, they should emerge as correlates of growth. Growth measures are more sensitive to the effects of schooling than status measures (Bryk & Raudenbush, 1989).
Zvoch & Stevens (2005)
Study purpose: to examine correlates of status and growth in mathematics achievement over a three-year period. The study used individual mathematics achievement scores on the TerraNova from a longitudinal sample of middle school students. These students were in the sixth grade in 1998-99, the seventh grade in 1999-00, and the eighth grade in 2000-01.
The study was conducted in one urban school district in New Mexico: 24 middle schools and over 20,000 students. Students were 51% female and 49% male; 47% Hispanic, 44% Anglo, 3% Native American, 3% African American, 2% Asian, and 1% Other. 17% of students were classified as ELL, 17% as special education, and 40% received a free or reduced-price lunch.
School-level variables: percent free lunch (M = .49, SD = .28); mean educational level of mathematics staff (M = 17.61, SD = .58); mathematics curricula (0 = traditional program, 1 = NSF reform curricula; 9 of the 24 middle schools, or 38%, used NSF reform curricula). The pattern of results differed depending on whether status scores or growth scores were examined.
Zvoch, K., & Stevens, J. J. (in press). Longitudinal effects of school context and practice on mathematics achievement. Journal of Educational Research.
Using Longitudinal Models for Accountability Systems
Design: number of measurement occasions (single-occasion case studies); annual measurement interval and the size of the interval; initial starting point and prior achievement. Some states cannot track students over time.
Using Longitudinal Models for Accountability Systems
Measurement: there is a need for new assessments designed to measure cognitive growth. Assessments are sometimes vertically equated but seldom longitudinally equated. True developmental scales, measurement invariance over time, and construct equivalence over time are also needed.
Using Longitudinal Models for Accountability Systems
Attrition, mobility, and cohort effects need further study. How are school estimates biased? Can statistical adjustments be used? How do school size and disaggregated group sizes relate to stability or bias?
Summary and Conclusions
Different designs, measures, and methods of analysis are likely to provide different evaluations of student growth and school effectiveness. However, despite the difficulties of longitudinal modeling, cross-sectional designs cannot address fundamental issues of growth, change, and learning. Policy and practice variables need to be included, and assessment instruments and accountability systems need to be empirically validated.
Bibliography
Bryk, A. S., & Raudenbush, S. W. (1987). Application of hierarchical linear models to assessing change. Psychological Bulletin, 101, 147-158.
Cattell, R. B. (1966). Patterns of change: Measurement in relation to state dimension, trait change, lability and process concepts. In R. B. Cattell (Ed.), Handbook of multivariate experimental psychology. Chicago: Rand McNally.
Collins, L. M. (1991). Measurement in longitudinal research. In L. M. Collins & J. L. Horn (Eds.), Best methods for the analysis of change: Recent advances, unanswered questions, future directions. Washington, DC: American Psychological Association.
Collins, L. M., & Horn, J. L. (1991). Best methods for the analysis of change: Recent advances, unanswered questions, future directions. Washington, DC: American Psychological Association.
Collins, L. M., & Sayer, A. (2001). New methods for the analysis of change. Washington, DC: American Psychological Association.
Cronbach, L. J., & Furby, L. (1970). How we should measure “change”: Or should we? Psychological Bulletin, 74, 68-80.
CTB/McGraw-Hill (1997). TerraNova Technical Bulletin 1. Monterey, CA: Author.
Duncan, S. C., & Duncan, T. E. (1995). Modeling the processes of development via latent variable growth curve methodology. Structural Equation Modeling: A Multidisciplinary Journal, 2(3), 187-213.
Ferrer, E., Hamagami, F., & McArdle, J. J. (2004). Teacher’s corner: Modeling latent growth curves with incomplete data using different types of structural equation modeling and multilevel software. Structural Equation Modeling: A Multidisciplinary Journal, 11(3), 452-483.
Linn, R. L., & Haug, C. (2002). Stability of school-building accountability scores and gains. Educational Evaluation and Policy Analysis, 24(1), 29-36.
Linn, R. L., & Slinde, J. A. (1977). The determination of the significance of change between pre- and posttesting periods. Review of Educational Research, 47(1), 121-150.
Marco, G. L. (1974). A comparison of selected school effectiveness measures based on longitudinal data. Journal of Educational Measurement, 11, 225-234.
Meredith, W., & Horn, J. (2001). The role of factorial invariance in modeling growth and change. In L. M. Collins & A. Sayer (Eds.), New methods for the analysis of change. Washington, DC: American Psychological Association.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational Measurement (3rd ed., pp. 13-103). New York: Macmillan.
Messick, S. (1994). The interplay of evidence and consequences in the validation of performance assessments. Educational Researcher, 23, 13-23.
Nesselroade, J. R. (1991). Interindividual differences in intraindividual change. In L. M. Collins & J. L. Horn (Eds.), Best methods for the analysis of change: Recent advances, unanswered questions, future directions. Washington, DC: American Psychological Association.
Raudenbush, S. W. (2001). Comparing personal trajectories and drawing causal inferences from longitudinal data. Annual Review of Psychology, 52, 501-525.
Raudenbush, S. W., Bryk, A. S., Cheong, Y. F., & Congdon, R. T. (2001). HLM 5: Hierarchical linear and nonlinear modeling. Chicago: Scientific Software International.
Raudenbush, S. W., & Willms, J. D. (1995). The estimation of school effects. Journal of Educational and Behavioral Statistics, 20(4), 307-335.
Reichardt, C. S. (2000). A typology of strategies for ruling out threats to validity. In L. Bickman (Ed.), Research design: Donald Campbell’s legacy (Vol. 2, pp. 89-115). Thousand Oaks, CA: Sage.
Rogosa, D. (1995). Myths about longitudinal research. In J. M. Gottman (Ed.), The analysis of change. Mahwah, NJ: Erlbaum.
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton Mifflin Company.
Stevens, J. J. (2000). Educational Accountability Systems: Issues and Recommendations for New Mexico. Research report prepared for the New Mexico State Department of Education.
Stevens, J. J. (2005). The study of school effectiveness as a problem in research design. In R. Lissitz (Ed.), Value-added models in education: Theory and applications. Maple Grove, MN: JAM Press.
Stevens, J. J., Estrada, S., & Parkes, J. (2000). Measurement issues in the design of state accountability systems. Paper presented at the annual meeting of the AERA, New Orleans, LA.
Stone, C. A., & Lane, S. (2003). Consequences of a state accountability program: Examining relationships between school performance gains and teacher, student, and school variables. Applied Measurement in Education, 16, 1-26.
Teddlie, C., Reynolds, D., & Sammons, P. (2000). The methodology and scientific properties of school effectiveness research. In C. Teddlie & D. Reynolds (Eds.), The International handbook of school effectiveness research. New York: Falmer Press.
Willett, J. B., & Sayer, A. G. (1994). Using covariance structure analysis to detect correlates and predictors of individual change over time. Psychological Bulletin, 116(2), 363-381.
Willett, J. B., Singer, J. D., & Martin, N. C. (1998). The design and analysis of longitudinal studies of development and psychopathology in context: Statistical models and methodological recommendations. Development and Psychopathology, 10, 395-426.
Zvoch, K., & Stevens, J. J. (in press). Successive student cohorts and longitudinal growth models: An investigation of elementary school mathematics performance. Education Policy Analysis Archives.
Zvoch, K., & Stevens, J. J. (in press). Longitudinal effects of school context and practice on mathematics achievement. Journal of Educational Research.
Zvoch, K., & Stevens, J. J. (2003). A multilevel, longitudinal analysis of middle school math and language achievement. Education Policy Analysis Archives, 11(20). Available at: http://epaa.asu.edu/epaa/v11n20/
Zvoch, K., & Stevens, J. J. (2005). Sample exclusion and student attrition effects in the longitudinal study of middle school mathematics performance. Educational Assessment, 10(2), 105-123.