Vocabulary use during conversation: a cross-sectional study of development amongst learners of Spanish and French WORK IN PROGRESS – PLEASE DO NOT CITE WITHOUT PERMISSION FROM THE AUTHORS Emma Marsden, University of York, Annabelle David, University of Newcastle
Aims Document / describe progression – useful for teaching practice and assessment Also, indirectly & in long term: –Begin analysis of use of formulaic language –Does it interact with learning a generative grammar? –Begin to explore relationship between learner’s vocabulary and their morphosyntactic development –L1 and early bilingual literature suggests causal link
Outline The task, data and participants Results 1. General diversity of types & counts of tokens 2.Use of different word classes Nouns versus verbs 3.Diversity of inflections 4.Formulaic language Conclusions
Can we measure lexical knowledge from oral corpus data? Size of vocabulary - no –need to give them tests Richness, sophistication, rarity – yes, but not yet –needs a comparison with word lists. –From relevant corpus i.e. oral, and L2 classroom learners – not available! Diversity, or lexical variation - YES! –when they produce language, how often do they have to repeat the same words? –what is the balance of nouns, verbs, adjectives?
The task Photos task: semi-guided interview / conversation –Descriptions of photos –Questions about photos –Discussion around photos, relating to past, current and future activities.
The Participants English speaking learners of French and Spanish From years 9 and 13 (approximately 230 and 600 hours classroom instruction respectively) twenty learners in each group in each language. –twenty final year undergraduates in Spanish –native controls 15, age-matched Spanish natives, and five adult French natives. –approximately 120 participants in total
The Data: Which bits of speech are ‘words’? Data excluded from the analysis –Filled pauses (er…) –Repeated language (with or without corrections) –Imitations of researcher –Words in another language (e.g. French & English) Data included –Including made-up words or incorrect words e.g. mi hermano nadar (for nada) –Final repair –Some lemma (stem) counts: va, vamos = 1 –Some whole word counts: va, vamos = 2 –Some counts of just the inflections
Types & tokens Ojos…ojos –1 lexical type –2 lexical tokens Mira….miran…miras… –1 lexical type –3 lexical tokens
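The type/token distinction, and the difference between lemma counts and whole-word counts, can be sketched in a few lines of Python. This is only an illustration: the `LEMMAS` lookup and the function name are hypothetical, and a real analysis would rely on the corpus's own morphological coding rather than a hand-written dictionary.

```python
from collections import Counter

# Hypothetical lemma lookup, for illustration only.
LEMMAS = {
    "ojos": "ojo",
    "mira": "mirar", "miran": "mirar", "miras": "mirar",
    "va": "ir", "vamos": "ir",
}

def count_types_tokens(words, lemmatise=True):
    """Return (number of types, number of tokens) for a list of words."""
    # With lemmatise=True every inflected form is mapped to its lemma first.
    forms = [LEMMAS.get(w, w) if lemmatise else w for w in words]
    counts = Counter(forms)
    return len(counts), sum(counts.values())

print(count_types_tokens(["ojos", "ojos"]))                    # (1, 2)
print(count_types_tokens(["mira", "miran", "miras"]))          # (1, 3)
print(count_types_tokens(["va", "vamos"]))                     # (1, 2) lemma count
print(count_types_tokens(["va", "vamos"], lemmatise=False))    # (2, 2) whole-word count
```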
Types and tokens based on LEMMAS
Group (n)      Tokens (st. dev)         Different types (st. dev)   TTR (st. dev)
               Sp          Fr           Sp          Fr              Sp             Fr
year 9 (20)    194 (117)   230 (115)    64 (28)     65 (24)         .387* (.122)   .311 (.092)
year 13 (20)   523 (134)   529 (191)    155 (32)    142 (38)        .300* (.028)   .279 (.038)
The more diverse the speech, the higher the TTR.
According to TTR, year 9 have a more diverse vocabulary than year 13 in Spanish, with no difference in French!
TTR is problematic: it is sensitive to text length, so it is not a valid measure here (but it is standard output in CLAN FREQ commands)
Compensating for influence of text length
Guiraud index: types / √tokens.
D: uses random sampling of tokens to plot a curve of TTR against increasing token size. Calculated by vocd in CLAN software
–Usually correlates well with Guiraud
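A rough sketch of the three measures, in Python, may make the contrast concrete. The TTR and Guiraud functions follow the definitions on the slides. The `estimate_d` function is an assumption-laden approximation of the vocd procedure as described in the published literature (mean TTR of random subsamples of 35 to 50 tokens, fitted to the curve TTR = (D/N)(√(1 + 2N/D) − 1)); it is not CLAN's code, it uses a coarse grid search rather than vocd's own fitting routine, and it omits vocd's repetition and averaging of the whole procedure.

```python
import math
import random

def ttr(tokens):
    """Type-token ratio: distinct forms divided by total forms."""
    return len(set(tokens)) / len(tokens)

def guiraud(tokens):
    """Guiraud index: types divided by the square root of tokens."""
    return len(set(tokens)) / math.sqrt(len(tokens))

def model_ttr(d, n):
    """Expected TTR for a sample of n tokens under diversity parameter d."""
    return (d / n) * (math.sqrt(1 + 2 * n / d) - 1)

def estimate_d(tokens, rng, sizes=range(35, 51), trials=100):
    """Approximate D by fitting model_ttr to mean TTRs of random subsamples."""
    if len(tokens) < max(sizes):
        raise ValueError("need at least as many tokens as the largest sample size")
    # Empirical curve: mean TTR over random subsamples (without replacement).
    observed = {
        n: sum(ttr(rng.sample(tokens, n)) for _ in range(trials)) / trials
        for n in sizes
    }
    # Least-squares fit by grid search over D = 0.1 ... 200.0 (illustrative
    # shortcut; estimates are capped at 200).
    def squared_error(d):
        return sum((observed[n] - model_ttr(d, n)) ** 2 for n in sizes)
    return min((x / 10 for x in range(1, 2001)), key=squared_error)

text = ["el", "chico", "come", "pan"]
print(ttr(text), guiraud(text))          # 1.0 2.0
print(ttr(text * 4), guiraud(text * 4))  # 0.25 1.0
```

Calling `estimate_d(tokens, random.Random(0))` on a list of at least 50 tokens returns a single D estimate; because the subsamples are random, the value varies slightly with the seed.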
D
Group (n)             D based on words (st. dev)    D based on lemmas (st. dev)
                      Sp           Fr               Sp          Fr
year 9 (17)           (8.55)       (11.41)          (5.81)      (6.82)
year 13 (20)          (12.99)      (13.77)          (7.92)      (57.34)
undergraduates (20)
Natives
Results 2: Use of word class types 1.Basic descriptions of use: nouns, verbs, adjectives, interrogative pronouns, adverbs 2.How ‘nouny’ are their productions? What proportion of word types belong to a certain class? What is the density of different word classes in total productions? 3.Is the diversity of nouns different to the diversity of verbs? 4.Do these give any indication of progression?
Basic description: Adjectives and adverbs
*lemmas, not colours **lemmas, not y/n
Columns (each Sp, Fr): Types of adjectives*; Tokens of adjectives*; Types of adverbs**; Tokens of adverbs**
Year 9 (20): 1.5 (1.4)  1.6 (1.7)  2.3 (2.6)  2.0 (2.3)  4.1 (2.4)  2.6 (2.8)
Year 13 (20): (4.4)  8.4 (4.8)  (6.6)  11.7 (6.7)  12.1 (2.7)  23.6 (9.7)
Basic description: Creo que, el hombre que…, and interrogative pronouns
Columns (each Sp, Fr): Tokens of que as conjunction + relative; Types of interrogative pronouns (lemmas); Tokens of interrogative pronouns (lemmas)
Year 9 (20): 0.7 (1.3)  1.2 (0.9)  2.6 (2.5)
Year 13 (20): 6.8 (6.6)  1.8 (0.8)  5.9 (3.2)
A nouny style
Year 9 (230 hours instruction)
*P02:two chi eh dos chicos un camisa Southampton.
*MJA:now I would like you to ask me questions about the pictures so...
*P02:hermanos ?
*MJA:eh estos son hermanos sí mmm.
How much speech is nouns & verbs?
*all are based on lemmas
Group (n)      Noun types / total types*   Noun tokens / total tokens   Verb types / total types   Verb tokens / total tokens
               Sp         Fr               Sp         Fr                Sp        Fr               Sp        Fr
Year 9 (20)    34% (11)   28% (5)          28% (11)   17% (4)           12% (4)   12% (4)          15% (5)   16% (5)
Year 13 (20)   28% (4)    25% (4)          18% (2)    12% (2)           15% (2)   15% (2)          18% (3)   19% (3)
Proportion of types out of all types (see e.g. Kauschke and Hofmeister, 2002)
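The two kinds of proportion reported for each word class (types out of all types, tokens out of all tokens) could be computed along the following lines. This is a minimal sketch: the `(lemma, word_class)` input format, the tag labels and the function name are all assumptions made for illustration, not the coding scheme used in the corpus.

```python
def class_proportions(tagged, word_class):
    """Return (type share, token share) of one word class.

    tagged is a list of (lemma, word_class) pairs, one per token.
    """
    class_tokens = [pair for pair in tagged if pair[1] == word_class]
    type_share = len(set(class_tokens)) / len(set(tagged))
    token_share = len(class_tokens) / len(tagged)
    return type_share, token_share

# Hypothetical tagged sample, not taken from the corpus.
sample = [
    ("chico", "n"), ("camisa", "n"), ("chico", "n"),
    ("llevar", "v"), ("un", "det"),
]
print(class_proportions(sample, "n"))  # (0.5, 0.6)
print(class_proportions(sample, "v"))  # (0.25, 0.2)
```

A high noun share alongside a low verb share is what the slides call a "nouny" style.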
How much speech is adjectives?
Group (n)            Adj tokens out of all tokens   Adj types out of all types
                     Sp       Fr                    Sp       Fr
year 9 (20)          1.0%     0.7%                  1.9%     2.0%
year 13 (20)         2.7%     2.1%                  6.7%     5.6%
undergraduate (20)
Natives
Comparing diversity of noun types to diversity of verb types Malvern et al (2004) propose the ‘Limiting Relative Diversity’ calculation to compare the diversity of different word classes when token samples are different Implemented by CLAN vocd software –Square root of division of diversity of one word class by the diversity of the other –Needs at least 50 tokens of noun, 50 of verb
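Read literally, the calculation described here is the square root of one class's diversity divided by the other's, applied only when both classes have at least 50 tokens. A minimal Python sketch follows, assuming the D values for each word class have already been obtained (for example from vocd); the function name, argument names and example values are hypothetical.

```python
import math

MIN_TOKENS = 50  # minimum tokens per word class, as stated on the slide

def limiting_relative_diversity(d_verbs, d_nouns, verb_tokens, noun_tokens):
    """Return sqrt(D_verbs / D_nouns), or None if either sample is too small."""
    if min(verb_tokens, noun_tokens) < MIN_TOKENS:
        return None
    return math.sqrt(d_verbs / d_nouns)

# Hypothetical values, for illustration only.
print(limiting_relative_diversity(9.0, 36.0, verb_tokens=60, noun_tokens=120))  # 0.5
print(limiting_relative_diversity(9.0, 36.0, verb_tokens=30, noun_tokens=120))  # None
```

The 50-token threshold explains why so few year 9 learners contribute to any LRD comparison: learners who produce fewer than 50 verb tokens are simply excluded.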
Limiting relative diversity
Group (n)                          LRD (verbs / nouns)
                                   Sp             Fr
year 9 (n Sp = 5, n Fr = 4)        .366 (.061)    .353 (.095)
year 13 (n Sp = 18, n Fr = 13)     .425 (.089)    .341 (.073)
No statistically significant differences in LRD (diversity of verbs relative to diversity of nouns) between year 9 and year 13.
Is the measure unreliable here (small sample sizes), or are new nouns and verbs learnt at the same rate?
BUT LRD correlates well with: –proportion of verb types / total types (r=.786**) –verb tokens / total tokens (r=.862**) –verb noun ratio (r=.862**) –And these all DO increase between yr 9 & 13 Need UG & native data to validate LRD
Results 3: Inflectional diversity
Inflectional diversity = total number of words - total stem forms = number of inflectional variations on stem forms, i.e. how well are the learners manipulating stem forms?
See Malvern et al. (2004).
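The subtraction can be illustrated with raw type counts: the number of distinct word forms minus the number of distinct stems. This is a simplified sketch with a hypothetical lemma lookup; the results table for this measure reports the difference between two D values (D words minus D lemmas) rather than raw counts.

```python
# Hypothetical lemma lookup, for illustration only.
LEMMA_OF = {
    "va": "ir", "vamos": "ir", "voy": "ir",
    "tiene": "tener", "tengo": "tener",
}

def inflectional_variation(words):
    """Distinct word forms minus distinct lemmas in a list of words."""
    word_types = set(words)
    lemma_types = {LEMMA_OF.get(w, w) for w in word_types}
    return len(word_types) - len(lemma_types)

# Five word forms (va, vamos, voy, tiene, tengo) over two lemmas (ir, tener).
print(inflectional_variation(["va", "vamos", "voy", "va", "tiene", "tengo"]))  # 3
```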
Inflectional diversity
Group (n)                        Inflectional diversity (D words - D lemmas) (st. dev)
                                 Sp          Fr
year 9 (Sp n = 17, Fr n = 20)    (4.38)      8.4 (5.3)
year 13 (20)                     (5.59)      (7.49)
Does verb use correlate with inflectional diversity? Broeder, Extra, van Hout (1993) found verb use indicates progression –See also NSF data, and argument by Myles (2004) Correlating lexical and inflectional diversity with verb/noun proportions…
Indicators of development? As learners use more verb types, they use more inflections (strong positive correlations) Inflectional diversity does not seem to correlate with use of other word classes Nouns (tokens & types) decrease, verbs increase (strong negative correlations)
Results 4: Formulaic language – lexical items? Criteria for a ‘chunk’ (Myles et al, 1998) –Greater length and complexity of sequence compared with other learner output; usually well-formed –Often used inappropriately (syntactically, semantically, pragmatically), e.g. overextensions
Formulaic language (chunks)
asking about people in photos:
*P02:eh dónde vives ?
*MJA:mmm ellos ? …ellos viven en Southampton.
*P02:mmm cuántos años tienes ?
*MJA:eh ellos.
*P02:tú ?
*MJA:tienen doce y trece.
Chunks even when some verbs appear to be manipulated
*P03:come... lleva... están.... hacen....están jugando...son...jugan...jugo...tengo...voy...voy a ir
BUT THEN ...eh cuánto años tienes ? (for "how old is he?")
BUT later: mi hermano tiene once años y mi hermana que se llama Ellie y tiene ocho años
*MJA:qué haces un sábado normal en tu en tu vida ?
*P02:jugar al fútbol en mañana y salgo con mis amigos en tarde
CONTEXT-DEPENDENT ACCURACY: CHUNKS, or item by item learning?
Conclusions The tasks in SPLLOC and FLLOC seemed to elicit broadly similar language Greater verb density seems to indicate progression 450 more hours instruction does make significant difference –both for vocabulary diversity and inflectional diversity –previous comparisons between smaller gaps suggest no gains Formulaic language –Evidence for item by item learning (constructionist)?
Limitations Only one measure of lexical knowledge – productive, oral This quantitative approach doesn’t tell us about accuracy of lexical or inflectional use (e.g. gastar (spend) time) We can report a positive correlation between inflectional and lexical diversity, but this product data does not tell us whether an increase in vocabulary enables processing of morphosyntax
Future directions Comparisons with undergrads and native controls A richness measure –will be based on rarity WITHIN our own corpus Analysis of closed class items, using CLAN’s list Further exploration of relationship of increased lexical knowledge, increased verb types and emerging morphosyntax