Presentation on theme: "Morphology and word processing"— Presentation transcript:
1 Morphology and word processing
’Mieli ja aivot, kieli ja kognitio’ (’Mind and brain, language and cognition’): Morphology
Ti ,
Raymond Bertram, Department of Psychology, Turku University
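2 Finnish morphology
- Omnipresent: most words in a newspaper are morphologically complex, i.e., they contain more than one morpheme:
  HS, : Rakennusalan työturvallisuus Suomessa Pohjoismaiden heikoin: Suomessa rakennustyömaiden työturvallisuus on heikompi kuin muissa Pohjoismaissa. Suurten rakennuskonsernien NCC:n ja Skanskan sisäisissä vertailuissa suomalaisten tytäryhtiöiden työmaat ovat muita pohjoismaisia työmaita turvattomampia.
  (’Occupational safety in the construction industry in Finland is the weakest in the Nordic countries: In Finland, occupational safety on construction sites is weaker than in the other Nordic countries. In internal comparisons by the large construction groups NCC and Skanska, the sites of the Finnish subsidiaries are less safe than the other Nordic sites.’)
- Rich: many possibilities for combining morphemes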
3 Word forms in the Finnish language: some facts and figures
Affixes that can be added to a Finnish noun:
- number (singular, plural)
- case (over 10)
- possessive suffix (2 x 3)
- enclitic particles (-kin, -pa, -han, -ko, etc.)
about forms for each noun, about forms for each adjective, forms for each verb
Derivation and compounding add further to these forms
EPÄONNISTUMATTOMUUDESSAMMEKOHAN
6 What is morphology?
- Study of the internal structure of words: morph-ology, word-s, jump-ing
- Knowledge of the words of a language cannot be summarized in a finite list: we need to know the principles of word formation
7 What is morphology?
Morphology is the study of the way words are built up from smaller meaning-bearing units, morphemes, e.g. talo + ssa + ni + kin (’in my house too’); house + s
Morphemes…
1. cannot be subdivided into smaller meaningful units (not ta + lo or ki + n)
2. contribute to the meaning of a word: talo ’house’, -ssa ’in’ or -kin ’also’
3. can appear in many different words: taloon or talonmies or metsäkin
4. do not necessarily coincide with syllable structure: ta.los.sa
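The segmentation talo + ssa + ni + kin can be illustrated with a toy sketch. The stem and suffix inventories below are tiny hypothetical lists made up for this example, and greedy longest-match suffix stripping is only one simple way to split a word; it is not a claim about how readers or any real morphological analyser proceed.

```python
# Toy morpheme segmenter: stem lookup followed by greedy suffix matching.
# STEMS and SUFFIXES are hypothetical mini-inventories for illustration only.
STEMS = ["talo", "metsä"]
SUFFIXES = ["ssa", "ssä", "ni", "kin"]


def segment(word):
    """Return a list of morphemes, or None if the word cannot be parsed."""
    for stem in STEMS:
        if not word.startswith(stem):
            continue
        morphemes = [stem]
        rest = word[len(stem):]
        while rest:
            # Try longer suffixes first.
            match = next(
                (s for s in sorted(SUFFIXES, key=len, reverse=True)
                 if rest.startswith(s)),
                None,
            )
            if match is None:
                break
            morphemes.append(match)
            rest = rest[len(match):]
        if not rest:
            return morphemes
    return None


print(segment("talossanikin"))  # ['talo', 'ssa', 'ni', 'kin']
print(segment("metsäkin"))      # ['metsä', 'kin']
print(segment("talox"))         # None
```

Note that the split follows morpheme boundaries (talo+ssa), not syllable boundaries (ta.los.sa), which is point 4 above.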
8 What is morphology? Some basic concepts
- Free vs. bound morphemes
- Stems vs. affixes
- Affixation types
- Allomorphs
- Types of word formation: inflection, derivation, compounding
9 Morphemes: free vs. bound
Free morphemes: morphemes that can stand on their own, i.e., can form a word by themselves
- lexical morphemes: talo ’house’, car, dog, cat etc. (an open-ended class: new morphemes can be added)
- grammatical morphemes: e.g., determiners the, a, every, most etc. (a closed class: all members can be listed)
Bound morphemes: must be attached to, or appear with, one or more other morphemes to form a word
- hatuton ’hatless’ -> *hatu, *ton
10 Stems vs. affixes
Many morphologically complex words are composed of stems and affixes
Stems:
- principal components of words
- supply the main (lexical) meaning
- talo + ssa, house + s
Affixes:
- supply additional meaning
- -ssa ”in”, -s ”plurality/’many’”
11 Affixation types
Affixes are always bound morphemes. Concatenative morphology uses the following types of affixes:
- prefixes, e.g. epä- in epä+olennainen (’inessential’), un+real
- suffixes, e.g. -ssa in talo+ssa
- circumfixes, e.g. German ge- -t in ge+sag+t ([have] said)
12 In non-concatenative morphology the stem morpheme is split up
The following types of affixes are used:
- infixes: bili: ‘buy’ => bumili: ‘bought’ (Tagalog); fan-****-tastic; abso-blooming-lutely
- transfixes, e.g. Hebrew l+a+m+a+d (he studied), l+i+m+e+d (he taught), l+u+m+a+d (he was taught)
13 Allomorphs
Allomorphs are variants of a morpheme: one function/meaning, many forms.
- stem allomorphy: hattuun vs. hatussa; käsi, käde+n, kät+tä, käte+en
- suffix allomorphy: metsässä vs. talossa; talo+on, metsä+än, talo+i+hin, huonee+seen, huone+i+siin, maahan
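The metsässä vs. talossa contrast is a case of suffix allomorphy governed by vowel harmony: a stem containing a back vowel (a, o, u) takes -ssa, otherwise -ssä is used. A minimal sketch of that choice follows; the function name `inessive` is made up for this example, and the code deliberately ignores stem allomorphy (the caller has to supply the right stem variant, e.g. hatu- rather than hattu-) as well as loanword exceptions.

```python
# Choosing between the inessive allomorphs -ssa and -ssä by vowel harmony.
# Simplified sketch: stem allomorphy and exceptional words are not handled.
BACK_VOWELS = set("aou")


def inessive(stem):
    """Attach the inessive ('in') suffix in the variant that harmonizes with the stem."""
    has_back_vowel = any(ch in BACK_VOWELS for ch in stem.lower())
    return stem + ("ssa" if has_back_vowel else "ssä")


for stem in ["talo", "metsä", "pöly", "hatu"]:
    print(stem, "->", inessive(stem))
# talo -> talossa
# metsä -> metsässä
# pöly -> pölyssä
# hatu -> hatussa
```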
14 Inflection, Derivation, & Compounding
There are three broad ways to form words from morphemes: inflection, derivation and compounding.
15 Inflection
Inflection: minimal change due to affixation => new word forms
- plural: house -> houses
- past tense: work -> worked
- Does not alter the category of the stem: worked is a verb
- Compositional (the meaning can easily be predicted from the morphemes)
- Very productive
16 Derivation
Derivation: creation of new words by affixation
- agentive: to teach -> teacher ’someone who teaches’
- collective: kirja ’book’ -> kirja+sto ’library’
- location: kahvi ’coffee’ -> kahvila ’cafe, coffee house’
- May change the syntactic category: [[teach]V er]N, [[kahvi]N la]N
- Meaning is less predictable from the parts: semantically less transparent
- Productivity is restricted -> derivational affixes are sometimes called semi-productive
17 Compounding
Compounding: creating a word by combining two (or more) real words
- Compounding is very productive (in both Finnish and English).
- The simplest case consists of two words concatenated together: auto+talli (’garage’), joukkue+henki (’team spirit’)
- In Finnish, the right-hand part is the so-called head, which determines the syntactic category (word class) and the core meaning: an autotalli is more a talli than an auto
- The left-hand member is the modifier: a morphology+article is an article about morphology
- From transparent to opaque: in joukkue+henki the meaning is the ”summed” meaning of the parts, whereas in helppo+heikki or red+neck the meaning cannot be directly computed from the constituent morphemes
- However, even in transparent cases the meaning relationship between the constituents is pluriform: pine tree (be); music box (make); night flight (in); abortion problem (about)
18 Does morphology play a role in daily language processing?
Intuitive grounds: we are able to form new words on a daily basis according to morphological principles. We are also able to understand new morphologically complex words that we have never encountered before: loopvliegen ’walkfly’; neusdoek ’nenäliina’ (’handkerchief’); computer hand; Finishness
Observations:
- Speech errors: speakers make morphologically structured slips: it’s not only we who have screw looses (for ”screws loose”); easy enoughly (for ”easily enough”).
- Child language: children overgeneralize during the course of acquisition: *goed (for went)
- Aphasics can produce morphologically structured neologisms: he *jumberfoked and off he went
19 Does morphology play a role in daily language processing?
Experimental evidence (Taft & Forster 1975):
- Non-word interference: real (bound) stems are rejected as non-words in lexical decision more slowly than pseudo-stems: -vive (revive) vs. -lish (relish)
- Likewise, non-words composed of an illegal combination of an existing prefix and stem (dejoice) take longer to reject than non-words containing a pseudo-stem (dejouse)
-> bound stems have an access representation (morphological structure): prefixed words are decomposed in lexical access
20 Does morphology play a role in daily language processing?
Experimental evidence: morphological priming
- Morphological: cars primes car; solid effects, long-lasting, facilitation
- Orthographic: card - car; variable effects, short-lived, inhibition
- Semantic effects: priming for morphologically related but semantically unrelated words: create-creation vs. create-creature (Feldman & Stotko, 1992); Emmorey (1989): priming between permit and submit.
=> Morphological effects cannot be reduced to either form-based or semantic effects
21 Does morphology play a role in daily language processing?
Experimental evidence: base frequency effects
- ryhmässä (’in the group’) vs. kerhossa (’in the club’)
- surface frequency: ryhmässä = kerhossa (approx. 35)
- base frequency: ryhmä much higher than kerho (9531 vs. 648)
- If the frequency of the stem/base has an effect, morphological structure has been used in the course of processing
(Taft, 1979; Bradley, 1980; Burani & Caramazza, 1987; Bertram et al., 2000)
22 Does morphology play a role in daily language processing?
In summary: yes, morphology does play a role. The question is how and when.
23 Human Morphological Processing
How are multimorphemic words represented in the minds of human speakers?
[Figure: processing levels. Syntactic/semantic level (composition: talo+ssa); mental lexicon: words & morphemes (talossa; talo, ssa); segmentation (talo+ssa); letter level (t a l o s s a) and phoneme level; visual feature level and auditory feature level]
24 [Figure: the letter level (t a l o s s a) is connected to the mental lexicon (words & morphemes: talossa; talo, ssa) by two routes, route 1 and route 2, with a segmentation stage (talo+ssa)]
25 Human Morphological Processing
How are multimorphemic words represented in the minds of human speakers?
1. Full Listing (letter => word)
vs.
2. Morphological Decomposition (letter => segmentation => morpheme => composition => word)
3. Dual Storage (1 or 2; 1 and 2)
26 The morphological processing question: What’s in the mental lexicon and what not?
Example words: koulu/maailma, talo/ssa, kirja/sto, puhu/minen
Mental lexicon:
1. all full forms: talossa, koulumaailma, kirjasto, puhuminen
2. all morphemes: -ssa, talo, kirja, -minen, koulu, -sto, puhu, maailma
3a. some words fully, some not: talo, koulumaailma, kirjasto, puhu, -ssa, -minen
3b. all words fully, and in morphemes
27 The dictionary question: What’s in the dictionary and what not?
Example words: koulu/maailma, talo/ssa, kirja/sto, puhu/minen
Nykysuomen Sanakirja (’Dictionary of Modern Finnish’): talo, koulumaailma, kirjasto, puhua
3a.: talo, koulumaailma, kirjasto, puhu, -ssa, -minen
28 How are multimorphemic words represented in the minds of human speakers? My own position
3a. Dual Storage, 1 or 2: some words have full storage, some do not
  + novel words cannot have obtained full-form storage: computer hand
  + all non-novel words have obtained full-form storage: talossa
  - for all non-novel words, storage is morpheme-based as well: talo+ssa
3b. Dual Storage, 1 and 2: all words have full and morphemic storage
  - novel words cannot have obtained full-form storage: computer hand
  + for all non-novel words, storage is morpheme-based as well: talo+ssa
3c. Dual Storage: some words have morpheme-based storage, some words have full and morphemic storage
29 How are multimorphemic words represented in the minds of human speakers?
- Question: how are storage and processing related? If there is a double representation, do we use both (talossa + talo+ssa)?
- Question: why do we often observe that morphologically complex words are processed holistically? (That is, why do we often not find morphological effects for existing complex words?)
- Question: why do we often observe that morphologically complex words are processed via morphemes? (That is, why do we often only find morphological effects for existing complex words?)
My answer: if you don’t see something, it doesn’t mean that it isn’t there
30 Dual Route Models
MRM (Morphological Race Model), e.g., Schreuder & Baayen (1995); Bertram, Schreuder & Baayen (2000)
- Representations for both full forms and morphemes, e.g., walked activates /walk/, /ed/ and /walked/
- Recognition is attempted via whole-word and morpheme-based representations simultaneously -> Parallel Dual Route Model
- Whether the full-form or the decompositional route wins depends on various properties of the whole word and its constituent morphemes
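The race idea can be made concrete with a deliberately crude sketch. Everything numeric here is an assumption made for illustration: the log-frequency speed-up, the constants `baseline`, `gain` and `parsing_cost`, and the function names are invented, and the published model is an activation-based framework rather than this two-line formula. The input frequencies are the ryhmässä/kerhossa values quoted on the base-frequency slide; the outcomes are properties of the toy parameters, not empirical findings.

```python
import math


def route_times(surface_freq, base_freq,
                baseline=1000.0, gain=100.0, parsing_cost=150.0):
    """Hypothetical finishing times (ms) of the two routes.

    Assumption: each route gets faster with the log frequency of the
    representation it relies on, and decomposition pays a fixed parsing cost.
    """
    full_form = baseline - gain * math.log10(surface_freq + 1)
    decomposition = baseline - gain * math.log10(base_freq + 1) + parsing_cost
    return full_form, decomposition


def winner(surface_freq, base_freq):
    """Both routes run in parallel; the faster one determines recognition."""
    full_form, decomposition = route_times(surface_freq, base_freq)
    return "full-form" if full_form <= decomposition else "decomposition"


# Same surface frequency, very different base frequencies.
print(winner(35, 9531))  # ryhmässä: decomposition (with these toy parameters)
print(winner(35, 648))   # kerhossa: full-form (with these toy parameters)
```

Changing `parsing_cost` shifts the balance between the routes, which is the intuition behind factors such as ease of segmentation.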
31 Factors that determine who is winning the race
1. ratio of whole-word frequency to morpheme frequency
   * Colé, Segui, Taft, 1997: silmät ’eyes’ (70% plural) vs. nenät ’noses’ (10% plural)
2. whole-word frequency (aamulla ’in the morning’; aphasic patient H.H., Laine)
3. properties of affixes
4. word length X morphology
5. ease of segmentation
32 3. The balance of Storage (full-form processing) and Computation (morpheme-based processing) in Finnish and Dutch: properties of affixes
Bertram, Schreuder & Baayen, 2000: Specific Properties
- Affixal Homonymy => Storage? (‘warmer’-‘rower’; tutkija-lukkoja)
- Productivity => Computation? (warmth-emptiness; kahvila-rahaton)
- Word Formation Type: Storage? <=> Computation? Derivation vs. inflection (SAID): emptiness-laughed; kirjasto-talossa
- Language: Storage? <=> Computation? Dutch vs. Finnish
33 Experimental method
Visual lexical decision: testing word length
Item        Response   RT
SEMINAARI   NO / YES   600 ms
MINASEERI   NO / YES   700 ms
TALO        NO / YES   500 ms
Dependent variables: reaction time and error rate
E.g., after 20 items per condition, a 100 ms or 5% error difference in favour of short words => word length has an effect!
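The two dependent variables can be computed per condition as sketched below. The trial data are invented for illustration, and computing mean reaction time over correct responses only is a common convention assumed here, not something the slide states.

```python
from statistics import mean

# Hypothetical trials: (condition, reaction time in ms, response correct?)
trials = [
    ("short", 500, True), ("short", 520, True),
    ("short", 480, True), ("short", 510, False),
    ("long", 600, True), ("long", 620, True),
    ("long", 640, False), ("long", 580, False),
]


def summarize(trials):
    """Mean RT of correct responses and error rate for each condition."""
    summary = {}
    for condition in sorted({c for c, _, _ in trials}):
        rows = [t for t in trials if t[0] == condition]
        correct_rts = [rt for _, rt, ok in rows if ok]
        summary[condition] = {
            "mean_rt": mean(correct_rts),
            "error_rate": sum(not ok for _, _, ok in rows) / len(rows),
        }
    return summary


result = summarize(trials)
print(result["short"])  # mean_rt 500, error_rate 0.25
print(result["long"])   # mean_rt 610, error_rate 0.5
```

In a real experiment the difference between conditions would be evaluated with a statistical test over participants and items rather than read off the raw means.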
34 VISUAL LEXICAL DECISION: Exp. Methods
5 Dutch suffixes and 1 Finnish suffix, the Taft way (1979)
Experiment 1: Surface Freq. Exp. (N=20)
  dieper ‘deeper’ <=> rauwer ‘rawer’
  Fbase = 148; Fsurface >
Experiment 2: Base Freq. Exp. (N=20)
  blonder ‘blonder’ <=> nobeler ‘nobler’
  Fbase > 21; Fsurface = 1.3
Matching on relevant factors such as length and bigram frequency
If the high-frequency condition elicits faster RTs than the low-frequency condition
- in Exp. 1 (surface frequency) => storage
- in Exp. 2 (base frequency) => computation
35 VISUAL LEXICAL DECISION: Exp. Methods
The Finnish way (Experiment 1): monomorphemic words as baseline
  mm. (monomorphemic): hevonen ‘horse’ vs. inf. (inflected): talossa ‘house + in’
  Matching on lemma frequency, surface frequency, bigram frequency, Vf, word length
1. rt mono = rt complex => storage
2. rt mono < rt complex => computation
3. rt mono > rt complex => storage & computation
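The three-way inference above can be written out as a small decision function. The `tolerance` parameter is a hypothetical stand-in for "no reliable difference"; in practice that judgement would come from a significance test, not from a fixed threshold in milliseconds.

```python
def interpret(rt_mono, rt_complex, tolerance=10.0):
    """Map the RT comparison (monomorphemic vs. complex) onto the three outcomes.

    tolerance (ms) is an illustrative assumption standing in for a
    statistical test of whether the two means differ.
    """
    diff = rt_complex - rt_mono
    if abs(diff) <= tolerance:
        return "storage"                # 1. rt mono = rt complex
    if diff > 0:
        return "computation"            # 2. rt mono < rt complex
    return "storage & computation"      # 3. rt mono > rt complex


print(interpret(600, 605))  # storage
print(interpret(600, 650))  # computation
print(interpret(650, 600))  # storage & computation
```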
36 Language X Factor Interaction?
      SUFFIX   WFT   PRO.   HOM.?   E.G.
1a.   V-te     INF   YES    NO      LACHTE
1b.   N-ssA    INF   YES    NO      TALOSSA
(WFT = word formation type, PRO. = productive, HOM. = homonymous)
[Figure: RT plots. 1a. Dutch -te: BASE and SURFACE frequency, HF vs. LF; 1b. Finnish -ssA: MM vs. INF]
Computation for Finnish and Dutch inflectional suffixes
37 Hypo 1: Morphology plays a bigger role in Finnish than in Dutch
* AS YET it seems that the processing of complex words in typologically quite different languages such as Finnish and Dutch is co-determined by the same factors.
* However, we have investigated only 3 factors in only two languages, and the words investigated were bimorphemic.
38 Hypo 2: Specific properties of complex words determine whether lexical access is more or less holistic
* Indeed, for both languages, the balance of Storage and Computation is co-determined by the INTERPLAY of factors such as Affixal Homonymy, Word Formation Type and Affix Productivity
SEE FIGURE
40 CONCLUSIONS
1. Suffixes should be checked for their properties and studied one by one
2. Neither Full Parsing nor Full Listing models, nor models in which coarse distinctions are made between certain subsets (prefix-suffix; inflection-derivation), can account for the variable data pattern found here
3. The data pattern is compatible with race models in which the winner is determined by factors such as those investigated here
42 Word length X Morphology: Visual acuity and word processing
1. Is the use of morphemic constituents in processing long compounds due to visual acuity limitations?
[Figure: fovea vs. parafovea over the compound oppilas/määrä (’number of pupils’)]
43 Experimental methods in our lab
- Eye movements tracked with Eyelink
- Eye movements can give good insight into language processing!
- Fixation duration and saccade length (among others) tell you something about processing behaviour!
44 Visual acuity and word processing
- Indeed, we know that the processing of words like vaniljakastike proceeds via the constituents vanilja and kastike.
- On top of that, early processing of the compound is limited to the first constituent. (HP1998; PHB2000)
[Figure: fovea vs. parafovea over vanilja/kastike]
45 Is this purely due to visual acuity limitations?
A. vanilja/kastike ’vanilla sauce’
B. sivu/ovi ’side door’
If morphological structure has a more independent role, the processing of shorter compounds like SIVU/OVI should proceed via the constituents as well!
46 Morphology X Word Length, XP1,2 BH2003
XP1: For long and short compounds, similar manipulation of first-constituent frequency.
  Long: HF: tutkimus/retki vs. LF: akilles/jänne
  Short: HF: työ/puku vs. LF: hätä/tila
  (respectively: research trip, Achilles tendon, working outfit, emergency situation)
XP2: For long and short compounds, similar manipulation of whole-word frequency.
  Long: HF: keskus/sairaala vs. LF: kirja/kahvila
  Short: HF: kesä/loma vs. LF: veri/koe
  (respectively: central hospital, book café, summer holiday, blood test)
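47 Eye movements in our laboratory
* Words of special interest are embedded in single sentences:
  HF: Saarinen kysyi: “Minkälainen KESKUSSAIRAALA on ... (’Saarinen asked: “What kind of CENTRAL HOSPITAL is ...’)
  LF: Saarinen kysyi: “Minkälainen KIRJAKAHVILA on ... (’Saarinen asked: “What kind of BOOK CAFÉ is ...’)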
51 The 1st constituent is also involved later in processing for long compounds, but not for short compounds. Short compounds are, overall, processed in a holistic manner.
52 Conclusion
[Figure: vanilja/kastike vs. sivu/ovi]
- Support for the visual acuity hypothesis: for long compounds, initial access to the 1st constituent is due to the visual acuity benefit of the 1st constituent over the latter part of the word.
- The use of the 1st constituent in early and also late processing of long compounds is driven by visual rather than structural principles
54 5. Ease of segmentation
Are there certain cues we use in parsing out constituents?
55 Our intuition was that readers must use some cues that enhance parsing (this has never been established in reading, though).
One cue we considered: ’VOWEL QUALITY’
56 Vowel Harmony
Category 1: A O U, back vowels
Category 2: Ä Ö Y, front vowels
Category 3: E I, neutral vowels
Basic phonological rules:
- 1 + 3 or 2 + 3 do co-occur: Seppo, enää
- 1 + 2 do not co-occur in a word: talo+ssa, pöly+ssä
57 Vowel Harmony (aou: back vowels; äöy: front vowels)
- 1 + 2 do not co-occur in a word (talo+ssa, pöly+ssä), except in compounds: e.g. SELKÄ/ONGELMA ’back problem’ has a different vowel quality on either side of the constituent boundary (cb)
- though not necessarily: e.g., RYÖSTÖYRITYS ’robbery attempt’
Question: Will a compound word be easier to process (all else being equal) if one constituent has front vowels and the other has back vowels?
58 Vowel Quality (aou: back vowels; äöy: front vowels)
Question: Is a word like
a. ryöstöyritys, with the same vowel quality at the cb,
more difficult to process than a word like
b. selkäongelma, with a different vowel quality at the cb?
59 Examples (aou: back vowels; äöy: front vowels)
different vowel type condition (dvt):
  Ystäväni kertoi, että selkä/ongelma oli nyt taaksejäänyttä elämää.
  ’My friend told me that the back problem was now behind him.’
same vowel type condition (svt):
  Ystäväni kertoi, että ryöstö/yritys oli jättänyt hänelle pysyviä traumoja.
  ’My friend told me that the robbery attempt had left him with permanent traumas.’
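Assigning a compound to the dvt or svt condition can be sketched as follows. This is an illustrative simplification: the function names are made up, each constituent is classified by all of its vowels rather than only those adjacent to the constituent boundary, and the "/" marking the boundary is assumed to be given.

```python
BACK = set("aou")
FRONT = set("äöy")


def vowel_quality(constituent):
    """Classify a constituent as 'back', 'front', 'neutral' (only e/i) or 'mixed'."""
    letters = set(constituent.lower())
    has_back = bool(letters & BACK)
    has_front = bool(letters & FRONT)
    if has_back and has_front:
        return "mixed"
    if has_back:
        return "back"
    if has_front:
        return "front"
    return "neutral"


def condition(compound):
    """Return 'dvt', 'svt' or 'other' for a two-part compound written as first/second."""
    first, second = compound.split("/")
    q1, q2 = vowel_quality(first), vowel_quality(second)
    if {q1, q2} == {"back", "front"}:
        return "dvt"    # different vowel type across the boundary
    if q1 == q2 and q1 in ("back", "front"):
        return "svt"    # same vowel type on both sides
    return "other"      # neutral-only or mixed constituents


print(condition("selkä/ongelma"))  # dvt
print(condition("ryöstö/yritys"))  # svt
```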
60 EXPERIMENT 1, RESULTS (aou: back vowels; äöy: front vowels)
Difference between the different vowel quality condition (selkäongelma) and the same vowel quality condition (ryöstöyritys) in the following measures: gaze duration (Gaze), 1st fixation duration (FFD), 2nd fixation location (SFL), 3rd fixation location (TFL)
[Figure: results for same (ryöstöyritys) vs. different (selkäongelma)]
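Two of the measures named above can be derived from a fixation sequence as sketched here, assuming the usual definitions: first fixation duration is the duration of the first fixation on the target word, and gaze duration is the sum of all fixations on the target before the eyes first leave it. The data format and function names are hypothetical, and the sketch does not check whether the target was skipped on first pass.

```python
# Hypothetical fixation sequence: (index of fixated word, duration in ms)
fixations = [(0, 210), (1, 230), (1, 180), (2, 200), (1, 250)]


def first_pass(fixations, target):
    """Durations of the first uninterrupted run of fixations on the target word."""
    run = []
    for word, duration in fixations:
        if word == target:
            run.append(duration)
        elif run:
            break  # the eyes have left the target: first pass is over
    return run


def first_fixation_duration(fixations, target):
    run = first_pass(fixations, target)
    return run[0] if run else None


def gaze_duration(fixations, target):
    run = first_pass(fixations, target)
    return sum(run) if run else None


print(first_fixation_duration(fixations, 1))  # 230
print(gaze_duration(fixations, 1))            # 410 (the later 250 ms refixation is excluded)
```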
61 Vowel Quality (aou: back vowels; äöy: front vowels)
- In sum, the parsing of relatively low-frequency compounds seems to be complicated by vowels of the same quality in the two constituents
- A vowel quality difference across constituents serves as an efficient cue for parsing
63 Conclusions
- There are all kinds of factors that determine whether the direct/full-form route or the indirect/parsing/decomposition route dominates the processing of morphologically complex words
- Evidence for dual route model 3c (some words morpheme-based storage, some words full & morphemic storage)?
  - Massive priming between morphological relatives
  - The factors we considered logically complicate or facilitate one or the other route
  - Morphemic and whole-word effects can be observed at the same time
- At any rate, we may say that the role of morphology is not an all-or-nothing affair.
64 VERY GENERAL CONCLUSION
MORFOLOGIAN ROOLIA EI PITÄISI ALIARVIOIDA ARVOITUKSELLISIMMISSAKAAN KIELISSÄ, EIKÄ EDES KIELISSÄ KUTEN HOLLANTI!
(the role of morphology should not be underestimated, neither in the most mysterious languages, nor even in languages like Dutch)
66 Human Morphological Processing Experiments
Evidence for morpheme-based processing in Finnish: Patient H.H.
- Aphasia following left-hemisphere damage, producing (seemingly) morphological errors in various single-word processing tasks
- Examples of HH’s morphological errors: junasta (’from the train’) ”junalle” (’to the train’); eläimessä (’in the animal’) ”eläin” (’animal’); puhuja (’speaker’) ”puhetta” (’speech’, partitive)
67 Human Morphological Processing Experiments
Evidence for morpheme-based processing in Finnish: Patient H.H. (Laine et al., 1995)
(1) HH’s morphological difficulties were multimodal (reading, repetition, word production)
Conclusion? The deficit is of central origin, most probably at the semantic-syntactic level
68 Human Morphological Processing Experiments
Evidence for morpheme-based processing in Finnish: Patient H.H.
(2) HH found reading inflected words extremely difficult, whereas derived words were read as well (or, rather, as badly) as monomorphemic words. These difficulties were accompanied by increased eye fixations.
Conclusion? Inflected but not derived Finnish nouns are morphologically decomposed
69 Human Morphological Processing Experiments
Evidence for morpheme-based processing in Finnish: Patient H.H.
(3) As an exception to point (2), HH read very high-frequency inflected word forms (e.g., aamu+lla) as well as comparable monomorphemic words
Conclusion? Even inflected forms (those of very high frequency) can develop full-form representations in the mental lexicon
70 Human Morphological Processing Experiments
Evidence for morpheme-based processing in Finnish: Patient H.H.
(4) Different stem allomorphs (e.g., seiväs, seipää-) were processed equally well.
Conclusion? Stem allomorphs have their own representations in the mental lexicon
71 Discussion topic: which factors might affect the way multimorphemic words are stored in the mental lexicon? How would each factor affect the storage? (See, e.g., HH’s results: very high-frequency inflected forms -> full-form storage; inflected forms in other frequency ranges -> morpheme-based storage.)
red+neck vs. air+plane; walk+ed vs. went; re+search vs. search+er; in+sufficient vs. in+clude; head+ache vs. head+quarters
72 Morphological processing: theories
Full listing approaches:
Butterworth (1983):
- All words are accessed and represented as full forms, i.e., no morphological structure even at the central level
- Productivity constraint: how are novel complex words recognized? -> assumes a rule-based back-up procedure that ”jumps in” when new words come along
- Rich morphology: how do we do it in Finnish or Turkish, where the potential number of inflected forms alone can be thousands (or tens of thousands) per lexeme?
Satellite model (e.g., Lukatela et al., 1978):
- Separate entry for each inflectional variant
- Forms are organized as satellites around the base form (nom. singular), which functions as a nucleus via which all processing goes
73 Morphological processing: theories
Full listing approaches:
Network models (e.g., Bybee, 1995)
- Only whole word forms are represented
- However, there is no list metaphor -> representations are part of an associative network with interconnections between the phonological and semantic levels only (only form and meaning)
- No morphology as such: morphological generalizations are seen as emerging from the form-meaning connections (an epiphenomenon, with no separate level)
- Generalizations are gradient and schematic -> no discrete symbolic representations
- Connectionist network models are built upon similar assumptions
74 Morphological processing: theories
Obligatory decomposition: Taft & Forster (1975)
- Prefix stripping: in lexical access the prefix is stripped off and ”sent” directly to the central level; access takes place via the stem
- At the central level the legality of the combination is checked -> word is recognized
- Morphologically structured, stem-based shared representation for all morphologically complex words with the particular stem
[Figure: example with the prefix in-]
75 Dual Route Models
AAM (Augmented Addressed Morphology), Caramazza et al. (1988)
- Representations for both full forms and morphemes, e.g., walked activates /walk/, /ed/ and /walked/
- All familiar words are accessed via the whole-word route -> the decomposition route is operational only for novel (unfamiliar) words -> the whole-word route always wins
- cf. full listing with decompositional back-up (Butterworth)
MRM (Morphological Race Model), e.g., Schreuder & Baayen (1995)
- Recognition is attempted via whole-word and morpheme-based representations simultaneously -> Parallel Dual Route Model
- Whether the full-form or the decompositional route wins depends on various properties of stems and affixes, e.g., frequency, semantic transparency, phonological (formal) transparency, affixal homonymy, the ratio with which the form serves as a real and a pseudo-affix -> complex interplay
76 MRM
- Three-level interactive activation framework (whole-word access) with a mechanism for carrying out symbolic computations on representations that become available at different stages (decompositional access)
- Activation feedback mechanism
- Roughly: any factor that complicates morpheme-based access ends up increasing the activation level of the full forms and enhances whole-word access
78 SAID: evidence
Inflected words are decomposed in lexical access, derived words are not
- Inflected nouns take longer to recognize in lexical decision than (matched) monomorphemic nouns -> cost associated with morphological decomposition
- An aphasic patient, H.H., had significantly more difficulty in reading inflected nouns than either derived or monomorphemic nouns (no difference between the latter two) -> inflected words are decomposed, derived words are not
- However, H.H. read high-frequency inflected forms (töissä ’at work’) as well as monomorphemic words -> high-frequency inflected nouns have full-form representations
- Phonological transparency (hattu+ja ’hats’ vs. hatu+ssa ’in a hat’) did not affect either H.H.’s performance or lexical decision times -> stem allomorphs have separate representations in the mental lexicon
Thus: inflected nouns are decomposed whereas derived nouns are processed as wholes
79 SAID: further evidence
Allomorphy and decomposition
- free-standing allomorphs prime the respective nominative singular in both unmasked and masked priming (käde primes käsi)
Derivation
- Only full-form effects, even for productive derived words in -ton (hatuton ’hatless’), in a frequency manipulation experiment
Problems accumulated over the last years:
- The whole-word frequency (rather than the stem frequency) of an inflected Finnish word may correlate with reaction time in lexical decision: affixal homonymy between the partitive ending -jA (auto-ja) and the agentive suffix -jA (opettaja)
- In certain sentence contexts, the processing cost associated with inflected Finnish nouns seems to disappear
- At least productive derivational endings attached to nonword “stems” (e.g., *rono+nta) can slow down rejection latencies
- Suffix allomorphy modulates the processing of derived words (Järvikivi, Bertram & Niemi, in press)