
1 Defining and Evaluating ‘Study Quality’ Luke Plonsky Current Developments in Quantitative Research Methods LOT Winter School, 2014

2 Study Quality Matters? YES!
- Building theory (or a house): studies = 2x4s, bricks, etc.
- Self-evident? Rarely discussed in linguistics research
- But lack of attention to quality ≠ low quality
- Implication: study quality needs to be examined, not assumed

3 Defining ‘Study Quality’
- How was SQ defined in Plonsky & Gass (2011) and Plonsky (2013)?
- How was SQ operationalized?
- Do you agree with this definition & operationalization?
- Now consider your (sub-)domain of interest:
  - How would you operationalize SQ?
  - How would you weight or prioritize different features?

4 Missing data

5
Data type      Primary research                         Secondary / meta-analytic research
SDs            Sample variability                       Calculate ESs (d) → exclusion?
Reliability    Small effects due to treatment or        Adjust ESs for attenuation
               dependent measure? Inform
               instrument design
Effect sizes   Interpret magnitude of effects;          Compare/combine results;
               future power analyses                    power for moderator analysis

→ LIMITED INFLUENCE ON L2 THEORY, PRACTICE, AND FUTURE RESEARCH → INEFFICIENT
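To make the first two rows of the table concrete, here is a minimal sketch (hypothetical numbers, not from the presentation) of the two computations that missing data block: Cohen's d cannot be calculated without a primary study's SDs, and an observed d cannot be corrected for attenuation without a reported reliability coefficient.

```python
# Minimal sketch with hypothetical values; not from the presentation.
import math

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference (Cohen's d) using the pooled SD.
    Without the primary study's SDs, this cannot be computed at all."""
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2)
                          / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

def disattenuate(d, reliability):
    """Correct an observed d for measurement error in the dependent
    variable; requires a reported reliability coefficient (e.g., alpha)."""
    return d / math.sqrt(reliability)

d = cohens_d(m1=82, sd1=10, n1=30, m2=75, sd2=12, n2=30)
print(f"observed d  = {d:.2f}")                       # ~0.63
print(f"corrected d = {disattenuate(d, 0.80):.2f}")   # ~0.71 with rxx = .80
```

When a primary study reports neither SDs nor reliability, a meta-analyst can do neither computation, which is exactly the exclusion/attenuation problem the table points to.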

6 Sources/Considerations for an Instrument to Measure Study Quality
1. (Over 400) existing measures of study quality from the meta-analysis literature, usually for weighting ESs (e.g., Valentine & Cooper, 2008, Table 2)
2. Societal guidelines (e.g., APA, APS; JARS Working Group, 2008, Table 1; AERA 2006 reporting standards; LSA?? AAAL/AILA??)
3. Journal guidelines (e.g., Chapelle & Duff, 2003)
4. Methodological syntheses from other social sciences (e.g., Skidmore & Thompson, 2010)
5. Previous reviews / meta-analyses (e.g., Chaudron, 2001; Norris & Ortega, 2000; Plonsky, 2011)
6. Methods/stats textbooks (e.g., Larson-Hall, 2010; Porte, 2010)
7. Others?

7 Only two studies in this area have addressed study quality empirically: Plonsky & Gass (2011); Plonsky (2013, in press)
Rationale & motivations:
- Study quality needs to be measured, not assumed
- Concerns expressed about research and reporting practices
- "Respect for the field of SLA can come only through sound scientific progress" (Gass, Fleck, Leder, & Svetics, 1998)
- No previous reviews of this nature

8 Plonsky & Gass (2011) & Plonsky (2013)
Two common goals:
1. Describe and evaluate quantitative research practices
2. Inform future research practices

9 Methods (very meta-analytic, but focused on methods rather than substance/effects/outcomes)

Plonsky & Gass (2011)
- Domain: interactionist L2 research; quantitative only
- Across 16 journals & 2 books (all published, 1980-2009)
- K = 174
- Coded for: designs, analyses, reporting practices
- Analyses: frequencies/%s

Plonsky (2013)
- Domain: all areas of L2 research; quantitative only
- Two journals: LL & SSLA (all published, 1990-2010)
- K = 606
- Coded for: designs, analyses, reporting practices (sample scheme)
- Analyses: frequencies/%s

Discussion: How would you define your domain? Where would you search for primary studies?

10 RESULTS

11 Results: Designs
Major Designs across Research Settings

                 Plonsky (2013)        P&G (2011)
Design           Class      Lab        (all)
Observational    20%        80%        65%
Experimental     45%        55%        35%

12 Results: Designs
Samples

Study            Average n    Total N    Groups
P&G (2011)       22           7,951      365
Plonsky (2013)   19           181,255    1,732

13 Results: Designs

                     Plonsky (2013)        P&G (2011)
Feature              Class      Lab        All
Random assignment    23%        48%        32%
Ctrl/Comp group      90%        84%        55%
Pretest              78%        59%        39%
Delayed posttest     50%        29%        79%

14 Results: Analyses

Analysis          P&G (2011) %    P (2013) %
ANOVA             54              56
t test            69              43
Correlation       18              31
Chi-square        50              19
Regression        8               15
MANOVA            7               7
ANCOVA            7               5
Factor analysis   2               5
SEM               -               2
Other             -               7
Nonparametrics    -               5

15 Results: Analyses

Number of Unique Statistical Analyses Used in L2 Research
            P&G (2011) %    P (2013) %
Zero        6               12
One         32              28
Multiple    62              60

Tests of Statistical Significance in L2 Research (Plonsky, 2013)
M = 35, SD = 64, 95% CI [30, 40], Median = 18

16 Results: Descriptive Statistics

Item                  P&G (2011) %    P (2013) %
Percentage            62              68
Frequency             71              48
Correlation           -               30
Mean                  64              77
Standard deviation    52              60
Mean without SD       -               31
Effect size           18              26
Confidence interval   3               5

17 Results: Inferential Statistics

Item                            P&G (2011) %    P (2013) %
F                               26              61
t                               32              36
χ²                              -               17
p = (exact value)               44              49
p < (threshold only)            61              80
p either = or <                 -               44
p both = and <                  -               42
ANOVA / t test without M        -               20
ANOVA / t test without SD       -               35
ANOVA / t test without F or t   -               24

18 Results: Other Reporting Practices

Item                       P&G (2011) %    P (2013) %
RQs or hypotheses          -               80
Visual displays of data    -               53
Reliability                64              45
Pre-set alpha              25              22
Assumptions checked        3               17
Power analysis             2               1

19 [Figure] Studies excluded due to missing data (usually SDs), as % of each meta-analyzed sample; median K = 16 (Plonsky & Oswald, under review)

20 Data missing in meta-analyzed studies (as % of total sample)

21 Reporting of reliability coefficients (as % of meta-analyzed sample)

22 Reporting of effect sizes & CIs (as % of meta-analyzed sample)

23 Other data associated with quality/transparency and recommended or required by APA (as % of meta-analyzed sample)

24 Elsewhere in the social sciences…

25 Results: Changes over time Meara (1995): “[When I was in graduate school], anyone who could explain the difference between a one-tailed and two-tailed test of significance was regarded as a dangerous intellectual; admitting to a knowledge of one-way analyses of variance was practically the same as admitting to witchcraft in 18th century Massachusetts” (p. 341).

26 Changes Over Time: Designs Plonsky & Gass (2011) Plonsky (in press)

27 Changes Over Time: Designs Plonsky (in press)

28 Changes Over Time: Analyses Plonsky & Gass (2011)

29 Changes Over Time: Analyses Plonsky (in press)

30 Changes Over Time: Reporting Practices Plonsky & Gass (2011)

31 Changes Over Time: Reporting Practices Plonsky (in press)

32 Relationship between quality and outcomes?
- Plonsky (2011)
- Plonsky & Gass (2011): larger effects for studies that include delayed posttests

33 Discussion (Or: So what?)
General:
- Few strengths and numerous methodological weaknesses are present, common even, in quantitative L2 research
- Quality (and certainly methodological features) varies across subdomains AND over time
- Possible relationship between methodological practices and the outcomes they produce
Three common themes:
1. Means-based analyses
2. Missing data, NHST, and the 'Power Problem'
3. Design preferences

34 Discussion: Means-based analyses
ANOVAs and t tests dominate, increasingly so. Not problematic as long as:
- Assumptions are checked (17% in Plonsky, 2013)
- Data are reported thoroughly
- The tests are the most appropriate for the RQs (i.e., not the default)
Benefits of greater use of regression analyses (see Cohen, 1968):
- Less categorization of continuous variables (e.g., proficiency, working memory) to fit an ANOVA design → loss of variance! (see the sketch below)
- More precise results (R²s are more informative than an overall p or η²)
- Fewer tests → preservation of experiment-wise power
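As a hedged illustration of the categorization point (simulated data, not from the slides): median-splitting a continuous predictor such as working memory and running a two-group comparison typically yields a weaker test than regressing on the raw scores, because the split throws away within-group variance.

```python
# Simulation sketch with hypothetical data: categorizing a continuous
# predictor (median split) vs. keeping it continuous (regression).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 60
wm = rng.normal(0, 1, n)                 # continuous predictor (e.g., WM)
score = 0.4 * wm + rng.normal(0, 1, n)   # outcome with a modest true effect

# Regression on the continuous predictor keeps all the information.
slope, intercept, r, p_reg, se = stats.linregress(wm, score)
print(f"regression:   R2 = {r**2:.3f}, p = {p_reg:.4f}")

# Median split, then a t test (equivalent to a two-group one-way ANOVA).
high = score[wm > np.median(wm)]
low = score[wm <= np.median(wm)]
t, p_split = stats.ttest_ind(high, low)
print(f"median split: p = {p_split:.4f}")  # typically larger (less power)
```

Across repeated runs, the regression test tends to reach significance more often than the median-split comparison for the same underlying effect, which is the variance loss the bullet above warns about.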

35 Discussion: Missing data, NHST, & Power
In general: lots of missing and inconsistently reported data! BUT we're getting better!
The "Power Problem":
- Small samples
- Heavy reliance on NHST
- Effects not generally very large
- Omission of nonsignificant results → inflated summary results
- Rarely check assumptions
- Rarely use multivariate statistics
- Rarely analyze power

36 Discussion: Design Preferences
Signs of domain maturity?
- More classroom-based studies
- More experimental studies
- More delayed posttests

37 Discussion: Summary
Causes/explanations:
- Inconsistencies among reviewers
- Lack of standards
- Lack of familiarity with design and appropriate data analysis and reporting
- Inadequate training (Lazaraton et al., 1987)
- Non-synthetic-mindedness
- Publication bias
Effects:
- Limited interpretability
- Limited meta-analyzability
- Overestimation of effects
- Overreliance on p values
→ Slower progress

38 Study Quality in Secondary/Meta-analytic Research?

39 Intro
- M-As = high visibility and impact on theory and practice → quality is critical
- Several instruments proposed for assessing M-A quality:
  - Stroup et al. (2000)
  - Shea et al. (2007)
  - JARS/MARS (APA, 2008)
  - Plonsky (2012)

40 Plonsky’s (2012) Instrument for Assessing M-A Quality
Goal 1: Assess transparency and thoroughness as a means to:
- Clearly delineate the domain under investigation
- Enable replication
- Evaluate the appropriateness of the methods in addressing/answering the study's RQs
Goal 2: Set a tentative, field-specific standard:
- Inform meta-analysts and reviewers/editors of M-As
Organization: lit review/intro; methods; discussion
Discussion: What items would you include?

41 Plonsky’s (2012) Instrument for Assessing M-A Quality—Section I Combine?

42 Plonsky’s (2012) Instrument for Assessing M-A Quality—Section II

43 Plonsky’s (2012) Instrument for Assessing M-A Quality—Section III

44 Looking Forward
Recommendations for:
- Individual researchers
- Journal editors
- Meta-researchers
- Researcher trainers
- Learned societies

45 Dear individual researchers,
- Consider power before AND after a study (but especially before); see the sketch after this letter
- p is overrated (meaningless?), especially when working with (a) small samples, (b) large samples, (c) small effects, or (d) large effects
- Report and interpret data thoroughly (EFFECT SIZES!)
- Consider regression and multivariate analyses
- Calculate and report instrument reliability
- Team up with an experimental (or observational) researcher
- Develop expertise in one or more novel (to you) methods/analyses
Love, Luke
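On the first point of this letter, an a-priori power analysis is a one-liner in most statistical packages. A sketch using statsmodels, with Cohen's conventional "medium" d = 0.5 assumed purely for illustration (choose a value grounded in your own domain):

```python
# A-priori power sketch; the assumed d = 0.5 is Cohen's conventional
# "medium" benchmark, used here only for illustration.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(
    effect_size=0.5,   # expected standardized mean difference
    alpha=0.05,        # two-tailed significance level
    power=0.80,        # desired probability of detecting the effect
)
print(f"n per group: {n_per_group:.1f}")  # ~63.8 -> recruit 64 per group
```

Set against the average group ns of roughly 19-22 reported in the results above, the required ~64 participants per group puts a single number on the "Power Problem."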

46 Dear journal editors,
- Use your influence to improve rigor, transparency, and consistency
- It's not enough to require reporting (of ESs, SDs, reliability, etc.): require interpretation too!
- Develop field-wide and field-specific standards
- Include special methodological reviews (see Magnan, 1994)
- Devote (precious) journal space to methodological discussions and reports
Love, Luke

47 Dear meta-researchers,
- Use your voice!
  - Guide interpretations of effect sizes in your domains
  - Evaluate and make known methodological strengths, weaknesses, and gaps; encourage effective practices and expose weak ones
- Don't just summarize:
  - Explain variability in effects, not just means (e.g., variability due to small samples, or heterogeneous samples or treatments)
  - Examine substantive and methodological changes over time and as they relate to outcomes
- Cast the net wide in searching for primary studies
Love, Luke

48 Dear researcher trainers,
- Place lots of emphasis on the basics: descriptive statistics; the interplay of sample size, power, effect size, and p; a synthetic approach; ANOVA
- Encourage more specialized courses, in other departments if necessary
Love, Luke

49 Dear learned societies (AILA/AAAL, LSA, etc.),
Designate a task force or committee to establish field-specific standards for research and reporting practices, comprising:
(a) at least one member of the executive committee,
(b) members from the editorial boards of relevant journals,
(c) a few quantitatively- and qualitatively-minded researchers, and
(d) one or more methodologists in other disciplines
Love, Luke

50 Closure
- Content objectives: conceptual and practical (but mostly conceptual)
- Inform participants' current and future research efforts
- Motivate future inquiry with a methodological focus
- Happy to consult or collaborate on projects related to these discussions

51 THANK YOU!

