Corpus Linguistics meets Lexical Semantic Theory James Pustejovsky Brandeis University University of Pavia December 15, 2004.


2 Background Joint work with Patrick Hanks and Anna Rumshisky Research funded by NSF References: –Pustejovsky, J., P. Hanks, and A. Rumshisky (2004) “Automated Induction of Sense in Context”, Proceedings of COLING, Geneva. –Pustejovsky, J. and P. Hanks (2001) “Very Large Lexical Databases”, Tutorial Notes from ACL, Toulouse.

3 Outline Corpus Linguistics needs Theory –Linguistic Theory needs corpus data Assumptions about Lexicons –Lexicons are for something Remarks on Lexical Architectures -Possible versus Probable Meaning Encoding Context for a predicate –Capturing word senses through context Semantic Induction from Corpora –Theory guides clustering

4 Building Lexicons with Corpus Analysis Regarding Lexicons: –Lexicons are for some purpose or task. –There is no one lexicon but multiple lexicons. Regarding Senses (GL, 1995): –Words have senses, but there is no finite number of senses independent of the contextualized use of words in composition. Words have meaning potentials: –Words are active objects with functional behavior.

5 What is the relationship between corpus and lexicon? Corpus: –an accumulation of tokens Lexicon: –an ordered collection of word-types (lemmas), with data attached.

6 As corpora grow: There is a continuing flow of new nouns. There are very few new verbs and adjectives, –but an increasing number of contexts for them. There are no new function words.

7 Content of Lexicons for real Applications Proper Names: –humans, locations, institutions, brands, products Open class items: –nouns, verbs, modifiers Multiword Expressions –compounds, idioms, collocations, constructions

8 Things to hang on Words: Inflectional forms of the lemma Phonetic form Syntactic categorization Subcategorization Semantic Type Typical Contexts (phraseology) Co-specifications Implicatures (contextually determined) Translations Examples of usage Probabilities

11 Lexicon Design should: Enable the Possible: Be tempered with the Probable: Be embedded within a specific application: –instances of actual.

12 Selectional Features from Corpora Selection doesn’t specify exactly how a word is going to behave on all occasions. Rather: Selection specifies how words typically behave: The typical is the foundation for forecasting the probable: The probable comes from corpus and the cospecifications associated with words.

13 Lexical Acquisition Goals: -Acquisition of subcategorization using corpus analytics; -Learning selectional associations; -Clustering of complementation patterns All are necessary techniques, but: -There must be an initial lexical architecture; -Efficacy of the results depends on application model and corpus available.

Encoding Context What is the Context of a Linguistic Utterance? Local context characterized as Strong Selection Broad context captured in part by Weak Selection Words encode context as types; Compositional rules refer to these types: Types can be selected; Types can be coerced. Types can be exploited. Composition can license new interpretations.

Basic Generative Lexicon Two classes of sortal constraints on a concept: – Argument structure – Event structure These bind into the Qualia Structure Compositional Rules invoke Type Selection Type Coercion: Inviolable Selection Type Exploitation: Subselection of type features

Qualia Structure –Formal: the basic category which distinguishes it within a larger domain; –Constitutive: the relation between an object and its constituent parts; –Telic: its purpose and function; –Agentive: factors involved in its origin or “bringing it about”.

17 Types and Words Select Different Things Types: -Operation: Selection Restrictions (semantic typing) -Result: Possible combinations Tokens: -Operation: Corpus Selection (cospecification) -Result: Probable combinations

18 Sense of a word depends on its context Peter treated Mary badly. Peter treated Mary with antibiotics. Peter treated Mary with respect. Peter treated Mary for her asthma. Peter treated Mary to a fancy dinner. Peter treated Mary to his views on George W. Bush. Peter treated the woodwork with creosote. Consider the word treat: Dictionaries do not provide the contexts that distinguish one sense of a word from another.

19 Problem: what context is relevant? The more senses a word has, the greater its lexical entropy. How to decide what context features determine the sense of a word? We want a data-driven sense definition. –Sort contexts of use for a given word into “buckets” to reduce lexical entropy –Analyze features typical for each “bucket”
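The slide does not define lexical entropy formally. As a minimal sketch, assume it is the Shannon entropy of a word's sense distribution over its concordance lines. The sense labels and bucket names below are hypothetical, chosen only to show how sorting contexts into buckets drives the entropy down.

```python
import math
from collections import Counter

def lexical_entropy(sense_labels):
    """Shannon entropy (in bits) of the sense distribution of one word.

    Assumption: 'lexical entropy' is read here as plain Shannon entropy.
    """
    counts = Counter(sense_labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical sense labels for ten concordance lines of "treat".
LINES = ["medical"] * 4 + ["behave_toward"] * 4 + ["pay_for"] * 2
UNSORTED_ENTROPY = lexical_entropy(LINES)

# After sorting the lines into context buckets (hypothetical names),
# each bucket here contains a single sense.
BUCKETS = {
    "with_medicine": ["medical"] * 4,
    "adverb_of_manner": ["behave_toward"] * 4,
    "to_event": ["pay_for"] * 2,
}

# Weighted average entropy that remains inside the buckets.
BUCKET_ENTROPY = sum(
    len(lines) / len(LINES) * lexical_entropy(lines) for lines in BUCKETS.values()
)
```

With three senses spread 4/4/2 the unsorted entropy is about 1.52 bits, while perfectly discriminating buckets leave none.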

20 Corpus Pattern Analysis (CPA) Corpus Pattern Analysis (CPA) is a corpus analytic and automated induction technique that: 1.Identifies the typical syntagmatic patterns for each word and determines discriminant context features. 2.Catalogs semantic types of arguments that are relevant for distinguishing between different senses. 3.Creates an inventory of syntactic and lexical realizations for relevant semantic types.

21 CPA (II) Word senses are linked to syntagmatic patterns. Selection contexts of a word are the typical syntagmatic patterns of its use. Selection contexts can be indexed on clauses and phrases, as well as single words. Selection contexts are captured in CPA patterns. Current work focuses on CPA patterns for verbs.

22 Research Areas Impacted by CPA Selectional preference acquisition –Resnik (1996), Briscoe & Carroll (1997), Abney & Light (1999), Korhonen (2002) Word sense disambiguation –SENSEVAL efforts, Stevenson & Wilks (2001), Agirre et al. (2002) Ontology construction –EuroWordNet, SIMPLE

23 CPA Components Lexical discovery –Manual discovery of selection context patterns for specific verbs through corpus analysis Automatic recognition of pattern use –Sorting unseen instances of verb use according to nearest match to identified patterns –Similar to conventional WSD Automatic pattern acquisition –Acquisition of patterns for unanalyzed cases Discriminant feature selection Predicate-based argument clustering

24 CPA Pattern Elements Syntactic Parsing –Phrase-level parsing (clause roles) Shallow Semantic Typing –Generic semantic features –Brandeis Shallow Ontology Minor Category Parsing –Adverbial Phrases, Locatives, Purpose Clauses, Rationale Clauses, Temporal Adjuncts, etc. Subphrasal Syntactic Cue Recognition –Genitives, partitives, bare plural/determiner distinction, infinitivals, negatives, past participles, etc.

26 Corpus-Driven Type System Shallow Typing –applying a shallow-type ontology to a parsed corpus Type Promotion –promoting to type position lexical units breaking a particular statistical threshold Lexical Sets –predicate-based groupings of similarly typed lexical elements from corpus E.g. absorb : heat, light, energy, power, shock, wave, sound, impact, movement –populated through type-filtered cluster analysis, in each argument position
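A rough sketch of how a lexical set might be populated for one argument position. This is a simplification: a frequency threshold stands in for the type-filtered cluster analysis named on the slide, and the triples, the type lookup, and the function name `lexical_sets` are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical (verb, relation, noun) triples, as a parser might emit them.
TRIPLES = [
    ("absorb", "dobj", "heat"), ("absorb", "dobj", "heat"),
    ("absorb", "dobj", "light"), ("absorb", "dobj", "light"),
    ("absorb", "dobj", "shock"), ("absorb", "dobj", "shock"),
    ("absorb", "dobj", "water"), ("absorb", "dobj", "water"),
    ("absorb", "dobj", "impact"),
]

# Hypothetical shallow-type lookup; a real system would consult the ontology.
SHALLOW_TYPE = {
    "heat": "Energy", "light": "Energy", "shock": "Energy",
    "impact": "Energy", "water": "Substance",
}

def lexical_sets(triples, shallow_type, min_count=2):
    """Group nouns by (verb, relation, shallow type), keeping frequent ones only."""
    counts = Counter(triples)
    sets = defaultdict(set)
    for (verb, relation, noun), n in counts.items():
        # Keep a noun only if it is typed and seen often enough.
        if n >= min_count and noun in shallow_type:
            sets[(verb, relation, shallow_type[noun])].add(noun)
    return dict(sets)

SETS = lexical_sets(TRIPLES, SHALLOW_TYPE)
```

Here `impact` is dropped because it occurs only once in the toy data; with a larger corpus it would be expected to pass the threshold.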

27 Fine-tuning the Features Extending classification of minor categories e.g. adverbials of manner/effect –Peter treated Mary rudely. –Peter treated Mary effectively. Semantic features defining lexical sets e.g. Energy (argtype for absorb) –heat, light, energy, power, shock, wave, sound, impact, movement

28 Implementation Details CPA patterns for an initial sampling of verbs are derived manually. A corpus is parsed (British National Corpus). A shallow type system is applied to the parsed corpus (Brandeis Shallow Ontology). A training sample is created. Machine learning techniques are applied to disambiguate the unseen instances using pattern features.

29 Brandeis Shallow Ontology BSO Noun Coverage –3400 type nodes total –20,000 noun entries –10,000 nominal collocation entries 65 Shallow Types –‘Abstract’, ‘Asset’, ‘Animate’, ‘Artifact’, ‘Document’, ‘Human Group’, ‘Information’, ‘Institution’, ‘Location’, ‘Person’, ‘PhysObj’, ‘Process’, ‘Substance’, ‘Surface’, ‘Time Period’, etc. Subset of 24 shallow types was used in the experiments.

30 RASP Statistical Parsing System (Briscoe & Carroll, 2002) –Input is tokenized, POS-tagged, and lemmatized. –Generates a forest of full parse trees for each sentence. –A set of grammatical relations is associated with each parse analysis: named relation, head, dependent. Subjects: ncsubj, clausal (csubj, xsubj). Objects: dobj, iobj, clausal complement. Modifiers: adverbs, modifiers of event nominals. –The top-ranked tree is picked for the sentences where the full parse was a success.

31 Selected context features from RASP/BSO implementation obj_institution: object belongs to the BSO type ‘Institution’ subj_human_group: subject belongs to the BSO type ‘HumanGroup’ mod_adv_ly : target verb has an adverbial modifier, with a -ly adverb clausal_like: target verb has a clausal argument introduced by ‘like’ iobj_with: target verb has an indirect object introduced by ‘with’ obj_PRP: direct object is a personal pronoun stem_VVG: the target verb stem is an -ing form
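The feature names listed above can be derived mechanically from a parsed verb instance plus a shallow-type lookup. The following is a minimal sketch under stated assumptions: the instance dictionary layout, its keys (`subj`, `obj`, `obj_pos`, `adverbs`, `iobj_prep`, `verb_tag`), and the tiny type table are hypothetical and do not reflect the actual RASP output format.

```python
# Hypothetical shallow-type lookup for a few nouns.
SHALLOW_TYPE = {"committee": "Human Group", "university": "Institution"}

def context_features(instance, shallow_type):
    """Turn one parsed verb instance into a set of context feature names."""
    features = set()
    # Typed subject and object, e.g. subj_human_group, obj_institution.
    for role in ("subj", "obj"):
        lemma = instance.get(role)
        if lemma in shallow_type:
            type_name = shallow_type[lemma].lower().replace(" ", "_")
            features.add(f"{role}_{type_name}")
    # Direct object realized as a personal pronoun.
    if instance.get("obj_pos") == "PRP":
        features.add("obj_PRP")
    # Adverbial modifier ending in -ly.
    if any(adv.endswith("ly") for adv in instance.get("adverbs", [])):
        features.add("mod_adv_ly")
    # Indirect object introduced by a preposition, e.g. iobj_with.
    if instance.get("iobj_prep"):
        features.add("iobj_" + instance["iobj_prep"])
    # Target verb is an -ing form.
    if instance.get("verb_tag") == "VVG":
        features.add("stem_VVG")
    return features

# "The committee badly treated the university with ..." (toy instance).
FEATURES = context_features(
    {"subj": "committee", "obj": "university", "adverbs": ["badly"],
     "iobj_prep": "with", "verb_tag": "VVD"},
    SHALLOW_TYPE,
)
```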

32 Disambiguation accuracy for sample predicates
verb    patterns  training set  decision tree  kNN
edit    2         100           87%            86%
treat   4         200           45%            52%
submit  4         100           59%            64%
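To make the kNN column concrete, here is a small nearest-neighbour sketch over sets of context features. It is not the authors' implementation: the Jaccard similarity, the value of `k`, the pattern labels, and the training examples are all assumptions chosen for illustration.

```python
from collections import Counter

def jaccard(a, b):
    """Similarity of two feature sets: shared features over all features."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def knn_pattern(features, training, k=3):
    """Label an unseen instance by majority vote among its k nearest examples."""
    ranked = sorted(training, key=lambda ex: jaccard(features, ex[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training sample for "treat": (feature set, pattern label).
TRAINING = [
    ({"obj_person", "iobj_with", "subj_person"}, "medical"),
    ({"obj_person", "iobj_for"}, "medical"),
    ({"obj_person", "mod_adv_ly"}, "behave_toward"),
    ({"obj_person", "mod_adv_ly", "subj_human_group"}, "behave_toward"),
    ({"obj_person", "iobj_to"}, "pay_for"),
]

# An unseen instance such as "Peter treated Mary badly".
PREDICTED = knn_pattern({"obj_person", "mod_adv_ly", "subj_person"}, TRAINING)
```

The three nearest examples are two `behave_toward` lines and one `medical` line, so the vote goes to `behave_toward`.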

33 Experimental Results CPA appears to be as accurate as, or better than, other techniques for WSD. Different types of ambiguities are resolved with different degrees of effectiveness. It will be tested on the latest SENSEVAL data.

34 Goals of CPA To create an inventory of semantically motivated syntagmatic patterns, so as to reduce the ‘lexical entropy’ of each word. To develop procedures for populating lexical sets by computational cluster analysis of text corpora. To collect evidence for the principles that govern the exploitations of norms.

35 Lexical Discovery (creating patterns) Create a sample concordance for each word –300-500 examples –from a ‘balanced’ corpus (i.e. general language) [We use the British National Corpus, 100M words, and the Associated Press Newswire for 1991-3, 150M words] Identify statistically significant collocates Classify every line in the sample, on the basis of its context. Take further samples if necessary to establish that a particular phraseology is conventional Check results against corpus-based dictionaries. Use introspection to interpret data, but not to create data.
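The slide does not say which association measure is used to identify significant collocates. As one possible sketch, the code below ranks collocates of a node word by pointwise mutual information (PMI) over sentence-level co-occurrence, with a minimum joint count. The measure, the function name `pmi_collocates`, and the toy sentences are assumptions.

```python
import math
from collections import Counter

def pmi_collocates(sentences, node, min_count=2):
    """Rank words co-occurring with `node` by PMI, highest first."""
    word_freq = Counter()   # number of sentences containing each word
    joint_freq = Counter()  # number of sentences containing word and node
    for tokens in sentences:
        unique = set(tokens)
        word_freq.update(unique)
        if node in unique:
            joint_freq.update(unique - {node})
    n = len(sentences)
    scores = {}
    for word, joint in joint_freq.items():
        if joint >= min_count:  # ignore rare, unreliable pairs
            scores[word] = math.log2((joint * n) / (word_freq[node] * word_freq[word]))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy lemmatized sentences (hypothetical data).
SENTENCES = [
    ["the", "storm", "abate"],
    ["the", "storm", "abate"],
    ["the", "storm", "pass"],
    ["the", "rain", "pass"],
    ["the", "wind", "drop"],
    ["the", "rain", "fall"],
]

COLLOCATES = pmi_collocates(SENTENCES, "storm")
```

In the toy data `abate` scores above zero while `the`, which appears everywhere, scores exactly zero; `pass` is filtered out by the minimum count.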

36 Every line in the sample must be classified The classes are: Norms (normal uses in normal contexts) Exploitations (e.g. coercions and ad-hoc metaphors) Alternations –e.g. [[Doctor]] treat [[Patient]] <> [[Medicine]] treat [[Ailment]] Names (Sea Biscuit: name of a horse, not a cracker) Mentions (to mention a word or phrase is not to use it) Errors Unassignables

37 Lexical sets are contrastive sets Different lexical sets generate different meanings. The lexical sets associated with each sense of each verb are different. –It remains to be discovered whether they are ‘transferable’. In principle, lexical sets are open-ended. In practice, a lexical set may have only 1 or 2 members, e.g. take a {look | glance}. No certainties in word meaning; only probabilities. … but probabilities can be measured.

38 A Simple CPA Entry toast, verb 1.[[Person]] toast [[Food = bread, nuts, cheese]] Implicature: cook or brown [[Food]] by exposure to radiant heat. 2.[[Person 1]] toast {[[Person 2]] | success | memory} Implicature: honour [[Person 2]] by raising a glass of wine, then drinking some.
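One way to hold such an entry in code is a list of pattern records, each pairing argument constraints with an implicature. The sketch below encodes the two toast patterns from the slide; the `Pattern` class, its field names, the `match` function, and the small type lookup are hypothetical and not part of any published CPA format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pattern:
    number: int
    subject_type: str
    object_types: frozenset        # semantic types accepted as object
    object_lexical_set: frozenset  # literal lemmas accepted as object
    implicature: str

TOAST = [
    Pattern(1, "Person", frozenset({"Food"}), frozenset(),
            "cook or brown [[Food]] by exposure to radiant heat"),
    Pattern(2, "Person", frozenset({"Person"}), frozenset({"success", "memory"}),
            "honour [[Person 2]] by raising a glass of wine, then drinking some"),
]

# Hypothetical shallow-type lookup.
SHALLOW_TYPE = {
    "chef": "Person", "guest": "Person", "bride": "Person",
    "bread": "Food", "nuts": "Food", "cheese": "Food",
}

def match(patterns, subject, obj, shallow_type):
    """Return the first pattern whose subject and object constraints are met."""
    for pattern in patterns:
        if shallow_type.get(subject) != pattern.subject_type:
            continue
        if obj in pattern.object_lexical_set or shallow_type.get(obj) in pattern.object_types:
            return pattern
    return None

SENSE = match(TOAST, "guest", "bride", SHALLOW_TYPE)
```

Under these assumptions "the guest toasted the bride" resolves to pattern 2, while "the chef toasted the bread" resolves to pattern 1.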

39 A more complicated verb: ‘take’ 61 phrasal verb patterns, e.g. [[Person]] take [[Garment]] off [[Plane]] take off [[Human Group]] take [[Business]] over 105 light verb uses (with specific objects), e.g. [[Event]] take place [[Person]] take {photograph | photo | snaps | picture} [[Person]] take {the plunge} 18 ‘heavy verb’ uses, e.g. [[Person]] take [[PhysObj]] [Adv[Direction]] 13 adverbial patterns, e.g. [[Person]] take [[TopType]] seriously [[Human Group]] take [[Child]] {into care} TOTAL: 197, and growing (but slowly)

40 Noun norms Norms for nouns are different in kind from norms for verbs. Adjectives and prepositions are more like verbs than nouns. A different analytical apparatus is required for nouns. Prototype statements for each true noun can be derived from a corpus.

41 What are the components of a normal context? – (2) Nouns The apparatus for CPA (corpus pattern analysis) of nouns: Collocations.

42 Arranging collocates: storm (1) WHAT DO STORMS DO? Storms blow. Storms rage. Storms lash coastlines. Storms batter ships and places. Storms hit ships and places. Storms ravage coastlines and other places.

43 Arranging collocates: storm (2) BEGINNING OF A STORM: Before it begins, a storm is brewing, gathering, or impending. There is often a calm or a lull before a storm. Storms last for a certain period of time. Storms break. END OF A STORM: Storms abate. Storms subside. Storms pass.

44 Arranging collocates: storm (3) WHAT HAPPENS TO PEOPLE IN A STORM? People can weather, survive, or ride (out) a storm. Ships and people may get caught in a storm.

45 Arranging collocates: storm (4) WHAT KINDS OF STORMS ARE THERE? There are thunder storms, electrical storms, rain storms, hail storms, snow storms, winter storms, dust storms, sand storms, tropical storms… Storms are violent, severe, raging, howling, terrible, disastrous, fearful, ferocious…

46 Arranging collocates: storm (5) TYPICAL QUALITIES OF STORMS: Storms, especially snow storms, may be heavy. An unexpected storm is a freak storm. The centre of a storm is called the eye of the storm. A major storm is remembered as the great storm (of [[Year]]). ____ STORMS ARE ALSO ASSOCIATED WITH rain, wind, hurricanes, gales, and floods.
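The arrangement of collocates shown for storm can be approximated by grouping each observed collocate under the relation it bears to the noun. This is only a sketch: the relation labels (`subject_of`, `object_of`, `modifier`) and the observation list are hypothetical, and the prose headings on the slides would still be written by hand.

```python
from collections import defaultdict

def noun_profile(observations):
    """Group collocates by relation, keeping first-seen order without repeats."""
    profile = defaultdict(list)
    for relation, collocate in observations:
        if collocate not in profile[relation]:
            profile[relation].append(collocate)
    return dict(profile)

# Hypothetical (relation, collocate) observations for the noun "storm".
OBSERVATIONS = [
    ("subject_of", "rage"),
    ("subject_of", "abate"),
    ("modifier", "tropical"),
    ("object_of", "weather"),
    ("subject_of", "rage"),
    ("modifier", "freak"),
]

PROFILE = noun_profile(OBSERVATIONS)
```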

47 Why norms are important These statements about abate and storm represent typical usage as well as typical meaning. They are empirically well founded (corpus-derived). This is where syntax meets semantics.

48 How is CPA different from FrameNet? CPA: investigates syntagmatic criteria for distinguishing different meanings of polysemous words, in a “semantically shallow” way. FrameNet: expresses the deep semantics of situations (frames); proceeds frame by frame, not word by word; analyses situations in terms of frame elements; studies meaning differences and similarities between different words in a frame; does not explicitly study meaning differences of polysemous words; does not analyze corpus data systematically, but goes fishing in corpora for examples in support of hypotheses; has problems grouping words into frames, and misses some; has no established inventory of frames; has no criteria for completeness of a lexical entry.

49 Challenges Extending the empirical discovery of lexical sets and other pattern features Learning to recognize all the features required by CPA patterns

50 Conclusions Creation of a selection context dictionary Development of a corpus-driven type system Identification of meaning by a richer set of criteria Basis for investigating the mechanisms of coercion and exploitation

51 Thank You
