Purpose Not how to use statistics in a study, but rather… To help everyone better understand and interpret common statistical methods encountered in SLA studies To cover common errors and issues related to each procedure
Overview Introduction Descriptive Statistics – Harumi T-tests – Philip One-way ANOVA – Peter Factor Analysis – Matthew Q&A
Outline Each presenter will introduce: – The function of the procedure – Important underlying concepts – Its use in SLA research – An example of the procedure in action – Errors and issues to look out for
Descriptive Statistics Harumi Kimura Nanzan University
Q1: Unreasonable fear? Why do so many language teachers draw back in terror when confronted with large doses of numbers, tables, and statistics? It is irresponsible to ignore such research just because you do not have the relatively simple tools for understanding it. J.D. Brown (1988)
Q2: Values of statistical studies? Individual behavior & Group phenomena Quantifiable data Structured with definite procedures Follow logical steps Replicable Reductive PATTERNS
Q3: What do descriptive statistics provide? Snapshot description of the situation observed Numerical representations of how each group performed on the measures Readers can draw a mental picture
Q4: How do we manage the data? Organize and present the data for further analysis We describe them in/as Graphs Figures
Two aspects of group behavior Mean Central tendency Standard Deviation Variability from mean
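A minimal sketch of these two measures, using Python's standard library. The scores are hypothetical and only illustrate how a mean and a standard deviation summarize one group.

```python
import statistics

# Hypothetical vocabulary test scores for one class (illustrative data only).
scores = [62, 70, 75, 75, 80, 88]

mean = statistics.mean(scores)   # central tendency
sd = statistics.stdev(scores)    # sample standard deviation (n - 1 denominator)

print(f"M = {mean:.2f}, SD = {sd:.2f}")
```

A small SD means the scores cluster tightly around the mean; a large SD means they are widely spread.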
Normal Distribution a normal curve Just as in the natural world … Position of an individual Within a group or Comparison of a group with other groups
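One way to express the position of an individual within a group is a z-score, which can be converted to a percentile if the scores are assumed to be normally distributed. The group mean, SD, and score below are hypothetical.

```python
from statistics import NormalDist

# Hypothetical group values and one learner's score (illustrative only).
group_mean, group_sd = 75.0, 10.0
score = 85.0

# z-score: how many standard deviations the score lies from the group mean.
z = (score - group_mean) / group_sd

# Percentile under a normal curve: share of the group expected to score lower.
percentile = NormalDist().cdf(z) * 100

print(f"z = {z:.2f}, percentile = {percentile:.1f}")
```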
To conclude Mean and Standard Deviation Normal Distribution These concepts “are central to all statistical research and sometimes forgotten by researchers.” Brown, 1988
t-tests Philip McNally Osaka International University
Function: Comparing two means A t-test will… …tell you whether there is a statistically significant difference in the mean scores (Pallant, 2006, p.206). a.) for two different groups, or b.) for one group at two different times.
Types of t-test One group (Within-subject or repeated measures design) Paired samples t-test Matched pairs t-test Dependent means t-test Two groups (Between-group or Between-subjects design) Independent samples t-test Independent measures t-test Independent means t-test
Uses of t-tests (T)he simplest form of experiment that can be done: only one independent variable is manipulated in only two ways and only one dependent variable is measured (Field, 2003, p.207).
Example: Paired samples t-test One group Time 1 : no extensive reading (IV); vocab test (DV). Time 2 : after extensive reading (IV); vocab test (DV).
Example: Independent samples t-test Two groups Group A - implicit grammar (IV); test (DV). Group B - explicit grammar (IV); test (DV).
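The two designs above can be sketched with the textbook formulas for t. This is an illustration under stated assumptions, not the procedure of any particular study: the scores are hypothetical, the function names are invented for this sketch, and the independent version assumes equal variances (pooled variance).

```python
from math import sqrt
from statistics import mean, stdev, variance

def paired_t(time1, time2):
    """t and df for one group measured twice (paired samples)."""
    diffs = [after - before for before, after in zip(time1, time2)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

def independent_t(group_a, group_b):
    """t and df for two separate groups, using the pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled = ((n_a - 1) * variance(group_a)
              + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    t = (mean(group_b) - mean(group_a)) / sqrt(pooled * (1 / n_a + 1 / n_b))
    return t, n_a + n_b - 2

# Hypothetical vocabulary scores (illustrative only).
before = [10, 12, 9, 14, 11]
after = [13, 14, 10, 17, 13]

# Same numbers, treated first as one group at two times, then as two groups.
t_paired, df_paired = paired_t(before, after)
t_indep, df_indep = independent_t(before, after)

print(f"paired:      t = {t_paired:.2f}, df = {df_paired}")
print(f"independent: t = {t_indep:.2f}, df = {df_indep}")
```

The same scores give a much larger t when treated as paired, because each learner serves as their own control. Note also that df for two independent groups is n1 + n2 - 2.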
Example: Macaro & Erler (2007) A longitudinal study of 11-12-year-old British learners of French. The effect of reading strategy instruction. Treatment group: N = 62 Control group: N = 54 Measures taken of reading comprehension, reading strategy use, and attitudes to French before and after the intervention.
Interpreting the data: Macaro & Erler (2007) Results of attitudes to French
Area               t      df    p
Reading            4.91   114   .001*
Speaking           2.28   114   .024
Writing            2.30   114   .023
Listening          4.12   114   .001*
Spelling           3.74   114   .001*
General learning   3.61   114   .001*
Homework           2.92   114   .004*
Textbook           3.01   114   .005*
*p < .006
Types of error Type I error You think you’ve got significance, but you haven’t. You should have adjusted your alpha value if you made multiple comparisons. Type II error You think the difference between the means was by chance. It wasn’t, but because you adjusted for multiple comparisons the data failed to reach significance.
Controlling for Type I error Alpha of .05 = a 5% chance of a false positive on each comparison when there is no real difference 20 comparisons = about 1 significant result expected by chance 100 comparisons = about 5 expected by chance
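The arithmetic behind these figures can be sketched as follows. The function names are invented for this illustration, and the familywise rate assumes the comparisons are independent and that no real differences exist.

```python
def expected_false_positives(alpha, comparisons):
    """Average number of chance 'significant' results across all comparisons."""
    return alpha * comparisons

def familywise_error(alpha, comparisons):
    """Probability of at least one false positive (independent comparisons)."""
    return 1 - (1 - alpha) ** comparisons

for k in (1, 5, 20, 100):
    print(k,
          round(expected_false_positives(0.05, k), 2),
          round(familywise_error(0.05, k), 3))
```

The chance of at least one false positive climbs quickly as comparisons are added, which is why the alpha level needs adjusting.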
Controlling for Type I error So, we have to make a Bonferroni adjustment if we make multiple comparisons… Alpha level / No. of comparisons: 0.05 / 5 = 0.01 …and use this new figure as the alpha level.
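A short sketch of the adjustment, applied to the p-values reported on the Macaro & Erler (2007) slide. It assumes the eight p-values are listed in the same order as the eight attitude areas; the function name is invented for this illustration.

```python
def bonferroni_alpha(alpha, comparisons):
    """Adjusted alpha level: the original alpha divided by the number of tests."""
    return alpha / comparisons

# The slide's own example: 0.05 across 5 comparisons.
print(bonferroni_alpha(0.05, 5))

# p-values from the attitudes results (order of areas assumed, see above).
p_values = {
    "Reading": 0.001, "Speaking": 0.024, "Writing": 0.023, "Listening": 0.001,
    "Spelling": 0.001, "General learning": 0.001, "Homework": 0.004,
    "Textbook": 0.005,
}

adjusted = bonferroni_alpha(0.05, len(p_values))   # 0.05 / 8 = 0.00625
significant = [area for area, p in p_values.items() if p < adjusted]

print(f"adjusted alpha = {adjusted}")
print("significant:", significant)
```

With eight comparisons the adjusted alpha is about .006, so Speaking and Writing, although below .05, no longer count as significant.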
Issues does the data meet normality assumptions? is the sample size large enough? is the data continuous? is Type I error controlled for?
What it is Function – ANalysis Of VAriance - a search for mean differences between data sets – One-way ANOVA - looking for significant differences in the mean scores of 2 or more groups – Why “one-way?” - looking at the effect that changing one variable has on the study’s participants
ANOVA Example 1 [Figure: group means M1, M2, and M3 plotted for the Control, Treatment 1, and Treatment 2 groups] Similar means (M) = non-significant (p > .05)
ANOVA Example 2 [Figure: group means M1, M2, and M3 plotted for the Control, Treatment 1, and Treatment 2 groups] M1 significantly different from M2 & M3 (p < .05), but… M2 & M3 not significantly different from each other (p > .05)
ANOVAs and T-tests Both procedures look for significant mean differences between groups. However, t-tests work best when limited to 2 groups. ANOVAs can work with 3 or more groups without the inflated Type I error that running multiple t-tests would introduce.
ANOVAs in Language Research Often used to compare: – Assessment scores – Survey responses A typical situation may be to try different treatments/methods with 3 different groups and then testing them to see if the results show any significant differences
Vocabulary Testing Example 3 learning groups with equivalent starting vocabulary range – Group I learns with word cards – Group II learns with word lists – Group III learns with PC software After several weeks of study, a vocab test is given Results from the test are ANOVA analyzed to see if any groups scored significantly higher/lower
Peer Review Survey Example 3 learning groups in EFL writing courses Students peer review each other’s written work in one of 3 ways – Group I – Written peer review – Group II – Oral peer review – Group III – PC-based peer review After the review sessions, peer review satisfaction surveys are given using Likert (1~5) scales Results are ANOVA analyzed for significant differences in satisfaction level among the review types
Reporting One-way ANOVA Results Three basic components: – 1) Table of descriptive statistics (mean, standard deviation, etc) – 2) An ANOVA table (degrees of freedom, sum of squares, F-statistic) – 3) A report of the post-hoc results with effect size
The F-statistic The higher the better (for the model) A significant F-statistic (p < .05) is what researchers look for
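For readers curious where the F-statistic comes from, here is a minimal sketch of a one-way ANOVA computed by hand: F is the variance between group means divided by the variance within groups. The three groups and their scores are hypothetical, and the function name is invented for this illustration.

```python
from statistics import mean

def one_way_anova(groups):
    """Return F, df_between, df_within for a list of groups of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical vocabulary test scores for three learning groups.
word_cards = [14, 15, 16]
word_lists = [10, 11, 12]
pc_software = [15, 16, 17]

f, df_b, df_w = one_way_anova([word_cards, word_lists, pc_software])
print(f"F({df_b}, {df_w}) = {f:.2f}")
```

A large F means the group means differ by more than the spread inside each group would lead us to expect. Whether it is significant still depends on the degrees of freedom.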
ANOVA in the Literature Descriptive statistics ANOVA statistics F-statistic is significant …i.e. our model seems to work
F-statistic cont. Reaching significance indicates there are statistically important differences between some of the group means But…the F-statistic doesn’t tell us where the differences are For that we turn to…
Post-hoc Results and Effect Size Post-hoc results These are done if the F-statistic is significant Paired comparisons of the group means Tell us where the significant differences lie Often reported in the text (though sometimes in table form) Effect size Often referred to as ‘eta-squared’ or ‘strength of association’ Indicates the magnitude of the difference between means Reflects the proportion of total variance accounted for by the treatments – .01 – small effect – .06 – medium effect – .14 – large effect *According to Cohen (1988)
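Eta-squared can be sketched as the between-groups sum of squares divided by the total sum of squares. The scores below are hypothetical, the function names are invented for this illustration, and the labels simply apply the Cohen (1988) benchmarks listed on the slide.

```python
from statistics import mean

def eta_squared(groups):
    """Proportion of total variance accounted for by group membership."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    return ss_between / ss_total

def cohen_label(eta_sq):
    """Rough label using the .01 / .06 / .14 benchmarks from the slide."""
    if eta_sq >= 0.14:
        return "large"
    if eta_sq >= 0.06:
        return "medium"
    if eta_sq >= 0.01:
        return "small"
    return "negligible"

# Hypothetical scores for three groups (illustrative only).
groups = [[14, 15, 16], [10, 11, 12], [15, 16, 17]]
effect = eta_squared(groups)
print(f"eta-squared = {effect:.3f} ({cohen_label(effect)})")
```

Tiny made-up groups like these produce an unrealistically large value. Published studies can reach statistical significance with far smaller effects.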
Reporting Post-hoc Results and Effect Size “Post-hoc comparisons using the indicated that the mean score for Group 1 (M=21.36, SD=4.55) was significantly different from Group 3 (M=22.96, SD=4.49). Group 2 (M=22.10, SD=4.15) did not differ significantly from either Group 1 or 3.” “Despite reaching statistical significance, the actual difference in group means was quite small. The effect size, calculated using eta-squared, was .02.”
Common ANOVA Problems and Issues Starting out with non-equivalent groups Not reporting the type of ANOVA performed Not reporting specific post-hoc results Not reporting effect size
[Figure: ANOVA table; post-hoc results] “[This table] shows the result from running through an ANOVA by using SPSS. It can be seen that the difference among treatments is significant (p < 0.05). The scores for the Vocabulary condition were much higher than the other conditions. The Main Character condition was slightly higher than the Combined condition.” Problems: No mention of the type of ANOVA No mention of post-hoc results. – Which groups were significantly different from each other? No mention of effect size. – What was the magnitude of the treatment effect?
Conclusion One-way ANOVAs are useful for looking at the effect of changing one variable on 3 or more equivalent groups Often used for testing treatment effects or comparing survey results Involves a two-step process of analyzing the model (through the F-statistic) and performing post-hoc procedures Effect size (eta-squared) is an important component indicating the magnitude of the treatment effect
Factor Analysis Matthew Apple Doshisha University
FA: What it is Measures only one group or sample population A “family” of FA – PCA, FA, EFA, CFA…
FA: What it does Tests the existence of underlying (latent) constructs within a sample population – Identifies patterns within large numbers of participants – “Reduces” several items into a few measurable factors
Uses of FA within SLA Typically used with psychological variables and Likert-scale questionnaires Often a preliminary step before more complicated statistical analyses – Correlational Analysis – Multiple Regression – Structural Equation Modeling
Example questionnaire factors
1. 英語で外国人と話しがしたい。 I would like to communicate with foreigners in English.
2. 英語習得は自分の教養を高めるのに必要だ。 English is essential for personal development.
3. 日本語でも自分がうまく表現できない。 I am not good at expressing myself even in Japanese.
4. 外国の音楽と文化に興味がある。 I am interested in foreign music and culture.
5. 英語は社会で活躍するのに必要だ。 English is essential to be active in society.
6. 難しいトピックに関しても、自分の意見が言える。 I can express my own opinions even about difficult topics.
Factors: Integrative, Instrumental, Self-competence
Terminology Factor - the latent construct Variance - different answers to each item (variable)
More Terminology! Factor loading – Correlation between an item and the factor; the squared loading is the amount of variance the item shares with the factor – Factor loadings above .4 are desirable, above .7 are excellent Cronbach’s Alpha – Measurement of item-scale reliability – Based on inter-item correlation and the number of items (the more items, the greater the alpha) – Does not “prove” cause-effect or validity of items themselves
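A minimal sketch of how Cronbach's alpha is calculated, using the standard formula: alpha = k / (k - 1) × (1 - sum of item variances / variance of total scores), where k is the number of items. The Likert responses are hypothetical and the function name is invented for this illustration.

```python
from statistics import variance

def cronbach_alpha(responses):
    """responses: one row per participant, one column per item."""
    k = len(responses[0])                        # number of items
    items = list(zip(*responses))                # columns = items
    item_variances = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# Hypothetical answers from 5 participants to 3 Likert items (1-5 scale).
responses = [
    [4, 5, 4],
    [3, 3, 4],
    [2, 2, 3],
    [5, 4, 5],
    [1, 2, 2],
]

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.2f}")
```

Alpha is high here because participants who score high on one item tend to score high on the others. That shows consistency, not that the items measure what they claim to measure.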
Determining factors, then items Researchers should determine the factors before adding items to the questionnaire – Previous research results – Carefully constructed model Items should be designed to relate to a particular concept (factor) – “Borrow” items or develop them in a pilot – 6-8 items for a robust factor
FA in the literature Item 43 (“The more I study English, the more enjoyable I find it”) F1 (“Beliefs about a contemporary (communicative) orientation to learning English”) .630 loading .63 × .63 = 40% of shared variance with the factor (Above .4 is acceptable according to Stevens, 1992)
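The shared-variance arithmetic above is simply the loading multiplied by itself. A quick sketch, with a function name invented for this illustration:

```python
def shared_variance(loading):
    """Proportion of variance an item shares with a factor (squared loading)."""
    return loading ** 2

for loading in (0.63, 0.40):
    print(f"loading {loading:.2f} -> {shared_variance(loading) * 100:.0f}% shared variance")
```

Even a loading of .4, the usual minimum, means the item shares only about 16% of its variance with the factor.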
Assumptions of FA Normal distribution Items are correlated above .3 Large N-size – “Over 300” (Tabachnick & Fidell, 2007) – 5-10 participants for each item (Field, 2005) Ex: 30-item questionnaire 3-4 factors 150-300 participants
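The participants-per-item rule of thumb can be sketched as a quick calculation. The function name and defaults are assumptions for this illustration, based on the 5-10 range quoted above.

```python
def sample_size_range(n_items, per_item_min=5, per_item_max=10):
    """Rough N-size range from the participants-per-item rule of thumb."""
    return n_items * per_item_min, n_items * per_item_max

low, high = sample_size_range(30)
print(f"30 items -> {low}-{high} participants")
```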
Problems and issues with FA “Fishing” for data (i.e., not reading the literature, then simply allowing SPSS to tell you what it finds) Not understanding the nature of factors (i.e., using 2 or 3 items as a “factor” or keeping too many factors) Using an arbitrary cut-off point for factor loadings (typically .3, .32, .35) N-size far too small
Typical factor loading issues Item 38 (“I am very aware that teachers/lecturers know a lot more than I do and so I agree with what they say is important rather than rely on my own judgment”) F3, .33 loading; F1, .29 loading .33 × .33 = 11% of shared variance with the factor
Conclusions regarding FA Typically used with questionnaires to reduce individual items to factors for purposes of correlation or prediction Helps researchers draw conclusions from a large number of items through data reduction Requires a large N-size and several reference books Often written up with no regard to APA guidelines or previous research results Horribly, horribly complicated
To sum up… Descriptive statistics – Mean and SD T-tests – Dependent, independent, paired One-way ANOVA – F, effect size, post-hoc Factor Analysis – Factor, variance, factor loading
Thank you for attending! CUE SIG Forum 2007 JALT 2007 International Conference Yoyogi Olympic Memorial Youth Center Tokyo, Japan, November 25, 2007