
1 Corpus Pattern Analysis (CPA)
Patrick Hanks
Research Institute of Information and Language Processing, University of Wolverhampton
patrick.w.hanks@gmail.com

2 Patterns in Corpora
When you first open a concordance, very often some patterns of use leap out at you.
–Collocations make patterns: one word goes with another
–Each pattern is associated with a meaning
–To see how words make meanings, we need to analyse collocations
The more you look, the more patterns you see.
BUT
When you try to formalize the patterns, you start to see more and more exceptions. The boundaries are fuzzy and there are many outlying cases.

3 Analysis of Meaning in Language
Analysis based on predicate logic is doomed to failure:
–Words are NOT building blocks in a ‘Lego set’
–A word does NOT denote ‘all and only’ members of a set
–Word meaning is NOT determined by necessary and sufficient conditions for set membership
Instead, a prototype-based approach to the lexicon is necessary:
–mapping prototypical interpretations onto prototypical phraseology
–classifying unusual uses (unusual syntax, unusual collocations) for what they are: exploitations of normal patterns of word use.

4 The linguistic ‘double-helix’ hypothesis
A language is a system of rule-governed behaviour.
BUT: Not one, but TWO (interlinked) sets of rules:
1. Rules governing the normal uses of words to make meanings
2. Rules governing the exploitation of norms

5 Exploitations
People exploit the rules of normal usage for various purposes:
For economy and speed:
–Conversation is quick
–Listeners (and readers) get bored easily
–Words that are ‘obvious’ are often omitted
So ellipsis is also a form of exploitation
To say new things (reporting discoveries)
To say old things in new ways
For rhetoric, humour, poetry, politics …
–To grab the listeners’ (or readers’) attention

6 Lexicon and prototypes
Each word in a language (more precisely: each content word) is typically used in one or more patterns of usage (valency + collocations)
–Function words and inflections are the ‘glue’ that holds the content words together.
Each pattern is associated with a meaning:
–a meaning is a set of prototypical beliefs
–In CPA, meanings are expressed as ‘anchored implicatures’.
–few patterns are associated with more than one meaning.
Corpus data enables us to discover the patterns that are associated with each word.

7 What is a pattern? (1)
The verb is the pivot of the clause.
–A verb pattern is a statement of the clause structure (valency) associated with a meaning of a verb
–Clause structure: SPOCA
–Subject, Predicator, Object, Complement (co-referential with S or O), and/or Adverbial [a.k.a. Adjunct, a.k.a. Prepositional Object]
–together with the typical (prototypical, stereotypical) semantic values of each argument.
Different semantic values of arguments (subject, object, prepositional object) activate different meanings of the verb.
To get the meaning of a clause, it is necessary to correlate the arguments, then map them onto patterns.

8 What is a pattern? (2)
Some patterns for the verb fire:
–[[Human]] fire [[Firearm]]
–[[Human]] fire [[Projectile]]
–[[Firearm]] fire [[Projectile]]
–[[Human 1]] fire [[Human 2]]
–[[Anything]] fire [[Human]] {with {enthusiasm}}
–[[Human]] fire [NO OBJ]
Etc. (PDEV has 14 patterns for the verb fire)
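One possible way to hold such patterns in code is sketched below. This is only an illustration: the class name `VerbPattern`, its fields, and the helpers `render` and `patterns_for` are hypothetical and are not part of PDEV. Each pattern pairs the verb with a semantic type per argument slot, and an object of `None` stands for [NO OBJ].

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class VerbPattern:
    """One pattern: a verb plus the semantic type of each argument slot (hypothetical layout)."""
    verb: str
    subject: str                      # semantic type of the subject
    obj: Optional[str]                # semantic type of the object; None means [NO OBJ]
    adverbial: Optional[str] = None   # literal adverbial material, if any


# The six patterns for "fire" listed on the slide.
FIRE_PATTERNS = [
    VerbPattern("fire", "Human", "Firearm"),
    VerbPattern("fire", "Human", "Projectile"),
    VerbPattern("fire", "Firearm", "Projectile"),
    VerbPattern("fire", "Human 1", "Human 2"),
    VerbPattern("fire", "Anything", "Human", "{with {enthusiasm}}"),
    VerbPattern("fire", "Human", None),
]


def render(pattern: VerbPattern) -> str:
    """Write a pattern back out in the double-bracket notation used on the slide."""
    obj = f"[[{pattern.obj}]]" if pattern.obj else "[NO OBJ]"
    adv = f" {pattern.adverbial}" if pattern.adverbial else ""
    return f"[[{pattern.subject}]] {pattern.verb} {obj}{adv}"


def patterns_for(subject: str, obj: Optional[str]) -> List[VerbPattern]:
    """Return the patterns whose subject and object types both match exactly."""
    return [p for p in FIRE_PATTERNS if p.subject == subject and p.obj == obj]
```

Calling `patterns_for("Firearm", "Projectile")` picks out the third pattern; exact string matching on type names is a simplification, since real lexical sets have fuzzy edges.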

9 Semantic Types and the CPA Shallow Ontology
Items in double square brackets are semantic types.
Semantic types are arranged hierarchically in a shallow ontology.
Each type in the ontology is populated with a set of lexical items on the basis of what’s found in the corpus under each relevant pattern.
The ontology is corpus-driven, not speculative.
–(This is work in progress in the PDEV project)

10 Shimmering lexical sets
Lexical sets are not stable – not "all and only".
Example:
–[[Human]] attend [[Event]]
–[[Event]] = meeting, wedding, funeral, etc.
–But not thunderstorm, suicide.
–ALSO, people attend a school, a clinic, etc.
School and clinic are [[Location]]s not [[Event]]s, but:
You attend a school or a clinic because of the [[Event]]s that take place there.
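The attend example can be sketched as a pattern-specific lexical set, that is, a set tied to one slot of one verb and not to a semantic type in general. The names `ATTEND_OBJECTS` and `attend_object_type` are hypothetical, and the sets contain only the nouns named on the slide.

```python
from typing import Optional

# Nouns observed as the object of "attend", grouped by semantic type.
# Only the slide's own examples are listed; a real set would be
# populated from corpus evidence.
ATTEND_OBJECTS = {
    "Event": {"meeting", "wedding", "funeral"},
    "Location": {"school", "clinic"},
}


def attend_object_type(noun: str) -> Optional[str]:
    """Return the semantic type under which `noun` occurs as object of 'attend', or None."""
    for semantic_type, nouns in ATTEND_OBJECTS.items():
        if noun in nouns:
            return semantic_type
    return None
```

A thunderstorm is an event in a general ontology, yet `attend_object_type("thunderstorm")` gives `None`, because membership here records what is found with this verb and not what the type could contain in principle.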

11 Meanings and boundaries
Boundaries of linguistic and lexical categories are fuzzy.
–There are many borderline cases.
Instead of fussing about boundaries, we need to focus on identifying prototypes.
Then we can decide what goes with what
–Many decisions will be obvious.
–Some decisions – especially about boundary cases – will be arbitrary.

12 The Idiom Principle (Sinclair)
According to John Sinclair, in word use there is tension between the “terminological tendency” and the “phraseological tendency”:
–The terminological tendency: the tendency for words to have meaning in isolation
–The phraseological tendency: the tendency for the meaning of a word to be activated by the context in which it is used.

13 Verbs vs. nouns
“Many, if not most, meanings depend on the presence of more than one word for their realization.” – John Sinclair
Semi-prefabricated chunks (Alison Wray: formulaic language)
–The meaning of a verb is largely determined by the semantic values of its arguments.
–Predicative adjectives (glad, afraid) and event nouns (distribution, blow) operate like verbs
–The meanings of noun-y nouns and attributive adjectives are determined very differently. A plug is not a socket.

14 A crucial difference
Scientific concepts and stipulative terminology:
–Neat, tidy, orderly, lifeless.
–If word meanings were governed by necessary conditions, you couldn’t use existing words to say new things.
Word meanings:
–Messy, chaotic, dynamic.
–It’s the ‘looseness of fit’ that enables us to use existing words to say new things.

15 What are the components of a normal context? – Verbs
Apparatus for corpus pattern analysis of verbs:
Valencies (NOT “NP VP” BUT “SPOCA”).
Semantic types for the lexical sets in each valency slot: [[Event]], [[Phys Obj]], [[Human]], [[Location]], etc.
–Lexical sets are populated by nouns – through cluster analysis of large corpus samples.
Subvalency items (quantifiers, determiners, etc.) may be part of the pattern – determining the meaning of the clause:
–‘Something took place’ [= an event] vs.
–‘Something took its place’ [= a physical or abstract object]

16 SPOCA
For CPA of verbs and predicative adjectives, we need a grammar of clause roles (also known as “lexical functions”). This is SPOCA:
Subject (noun): 1
Predicator (verb): the pivot of the clause.
Object (noun): 0, 1, or [with verbs of giving] 2
Complement: noun or adj. [co-ref. with Subj. or Obj.]
–EG She is happy; she is president; they elected her president.
Adverbial [also known as Adjunct]: 0, 1, or many
–Some Adverbials are meaning-determining [EG They treated her badly / with respect]
–Others are optional extras [EG They treated her in hospital / with penicillin]
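The slot counts above can be written down as a small record type. This is a sketch under assumptions: the class `Clause`, its field names, and the `roles` method are hypothetical, and the clause is taken to be already parsed into its roles by hand.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Clause:
    """A clause analysed into SPOCA roles (hypothetical representation)."""
    subject: str                                          # exactly one
    predicator: str                                       # the verb, pivot of the clause
    objects: List[str] = field(default_factory=list)      # 0, 1, or 2
    complement: Optional[str] = None                      # co-referential with S or O
    adverbials: List[str] = field(default_factory=list)   # 0, 1, or many

    def __post_init__(self) -> None:
        # Enforce the slide's upper bound on the number of objects.
        if len(self.objects) > 2:
            raise ValueError("a clause has at most two objects")

    def roles(self) -> str:
        """Spell out which SPOCA slots are filled, e.g. 'SPOC'."""
        out = "SP" + "O" * len(self.objects)
        if self.complement is not None:
            out += "C"
        return out + "A" * len(self.adverbials)


she_is_happy = Clause("She", "is", complement="happy")
elected = Clause("They", "elected", objects=["her"], complement="president")
treated = Clause("They", "treated", objects=["her"], adverbials=["with respect"])
```

Here `elected.roles()` yields `"SPOC"` and `treated.roles()` yields `"SPOA"`. The record does not say whether an adverbial is meaning-determining or an optional extra; that distinction would need a further field.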

17 Do words have meaning?
What’s the meaning of blow?
What’s the meaning of file?
What’s the meaning of abate?
What’s the meaning of treat?

18 Implicatures: taking prototypes seriously
When a pilot files a flight plan, he or she informs [they inform?] ground control of the intended route and obtain[s] permission to begin flying.
…If someone files a lawsuit, they activate a procedure asking a court for justice to make a decision about some action.
When a group of people file into a room or other place, they walk in one behind the other.
(PDEV identifies 14 prototypical patterns for file, verb, but the distinctions are arbitrary. It would be equally plausible to argue in favour of twice as many patterns for file.)

19 Implicatures vary according to context
Peter treated Mary. [He’s a doctor (or a generous chap)]
Peter treated Mary with antibiotics. [Definitely a doctor]
Peter treated Mary badly. [May or may not be a doctor]
Peter treated Mary with respect. [Probably not a doctor]
Peter treated Mary to a fancy dinner. [Generous chap]
Peter treated Mary to his views on Jeremy Corbyn. [Ironic implication of generosity]
Peter treated the woodwork with creosote. [None of the above]

20 Cognitive salience and social salience
What is the primary implicature of Peter treated Mary?
Cognitively salient interpretation: he bought her lunch.
Socially salient interpretation: he was a health professional, attending to her injuries or illness.

21 Sample from a concordance (unsorted)
incessant noise and bustle had abated. It seemed everyone was up
after dawn the storm suddenly abated. Ruth was there waiting when
Thankfully, the storm had abated, at least for the moment, and
storm outside was beginning to abate, but the sky was still ominous
Fortunately, much of the fuss has abated, but not before hundreds of
, after the shock had begun to abate, the vision of Benedict's
been arrested and street violence abated, the ruling party stopped
he declared the recession to be abating, only hours before the
‘soft landing’ in which inflation abates but growth continues moderate
the threshold. The fearful noise abated in its intensity, trailed
ability. However, when the threat abated in 1989 with a ceasefire in
bag to the ocean. The storm was abating rapidly, the evening sky
ferocity of sectarian politics abated somewhat between 1931 and
storm. By dawn the weather had abated though the sea was still angry
the dispute showed no sign of abating yesterday. Crews in

22 Sorted (1): [[Event = Storm]] abate [NO OBJ]
DOMAIN: Weather
dry kit and go again.The storm abates a bit, and there is no problem in
ling.Thankfully, the storm had abated, at least for the moment, and the
sting his time until the storm abated but also endangering his life, Ge
storm outside was beginning to abate, but the sky was still ominously o
bag to the ocean.The storm was abating rapidly, the evening sky clearin
after dawn the storm suddenly abated.Ruth was there waiting when the h
t he wait until the rain storm abated.She had her way and Corbett went
storm.By dawn the weather had abated though the sea was still angry, i
lcolm White, and the gales had abated: Yachting World had performed the
he rain, which gave no sign of abating, knowing her options were limite
n became a downpour that never abated all day.My only protection was
ned away, the roar of the wind abating as he drew the hatch closed behi

23 Sorted (2): [[Event = Problem]] abate [NO OBJ]
DOMAIN: Social Interaction
‘soft landing’ in which inflation abates but growth continues modera
Fortunately, much of the fuss has abated, but not before hundreds of
the threshold. The fearful noise abated in its intensity, trailed
incessant noise and bustle had abated. It seemed everyone was up
ability. However, when the threat abated in 1989 with a ceasefire in
the Intifada shows little sign of abating. It is a cliche to say that
h he declared the recession to be abating, only hours before the pub
he ferocity of sectarian politics abated somewhat between 1931 and 1
been arrested and street violence abated, the ruling party stopped b
the dispute showed no sign of abating yesterday. Crews in

24 Sorted (3): [[Emotion = Negative]] abate [NO OBJ]
DOMAIN: Human Emotion
ript on the table and his anxiety abated a little.This talented, if
that her initial awkwardness had abated # for she had never seen a
es if some inner pressure doesn't abate.He wanted to play at the fun
Baker in the foyer and my anxiety abated.He seemed disappointed and
hained at the time.When the agony abated he was prepared to laugh wi
self; the pain gradually began to abate spontaneously, a great relie
ght, after the shock had begun to abate, the vision of Benedict's sn
y calm, control it!) The fear was abating, the trembling beginning t
his dark eyes. That fear did not abate when, briefly, he halted. For
AN EXPLOITATION OF THIS NORM:
isapproval, his kindlier feelings abated, to be replaced by a resurg
(“kindlier feelings” are normally positive, not negative.)
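The sorting shown on the three preceding slides is done by a human analyst. As a rough illustration only, the sketch below groups lines by looking for the nearest known noun to the left of the verb. Everything in it is an assumption: the name `classify`, the nearest-noun heuristic, and the small lexical sets, which contain only nouns taken from the concordance lines above. It is not the CPA procedure itself.

```python
import re

# Small lexical sets hand-copied from the subject nouns in the sorted slides.
LEXICAL_SETS = {
    "[[Event = Storm]]": {"storm", "rain", "gales", "wind", "weather"},
    "[[Event = Problem]]": {"inflation", "fuss", "noise", "threat",
                            "recession", "violence", "dispute"},
    "[[Emotion = Negative]]": {"anxiety", "agony", "pain", "shock", "fear"},
}

# Any inflected form of the verb "abate".
ABATE = re.compile(r"\babat(?:e|es|ed|ing)\b")


def classify(line: str) -> str:
    """Label a concordance line by the closest known noun left of 'abate'."""
    match = ABATE.search(line)
    if match is None:
        return "Unassignable"
    left_context = re.findall(r"[a-z]+", line[:match.start()].lower())
    for word in reversed(left_context):      # nearest word first
        for label, nouns in LEXICAL_SETS.items():
            if word in nouns:
                return label
    return "Unassignable"
```

For "storm. By dawn the weather had abated" the nearest listed noun is "weather", so the line lands under [[Event = Storm]]. Lines with a direct object, as in the legal pattern for abating a nuisance, fall through to "Unassignable" here and would need a separate rule that looks to the right of the verb.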

25 A domain-specific norm: [[Person | Action]] abate [[Nuisance]]
DOMAIN: Law, REGISTER: Jargon
o undertake further measures to abate the odour, and in Attorney Ge
us methods were contemplated to abate the odour from a maggot farm
s specified are insufficient to abate the odour then in any further
as the inspector is striving to abate the odour, no action will be
t practicable means be taken to abate any existing odour nuisance,
ll equipment to prevent, and or abate odour pollution would probabl
rmation alleging the failure to abate a statutory nuisance without
t I would urge you at least to abate the nuisance of bugles forthw
way that the nuisance could be abated, but the decision is the dec
otherwise the nuisance is to be abated.They have full jurisdiction
ion, or the local authority may abate the nuisance and do whatever

26 Part of the lexical set [[Event = Problem]] as subject of ‘abate’
From BNC: {fuss, problem, tensions, fighting, price war, hysterical media clap-trap, disruption, slump, inflation, recession, the Mozart frenzy, working-class militancy, hostility, intimidation, ferocity of sectarian politics, diplomatic isolation, dispute, …}
From AP: {threat, crisis, fighting, hijackings, protests, tensions, anti-Japan fervor, violence, bloodshed, problem, crime, guerrilla attacks, turmoil, shelling, shooting, artillery duels, fire-code violations, unrest, inflationary pressures, layoffs, bloodletting, revolution, murder of foreigners, public furor, eruptions, bad publicity, outbreak, jeering, criticism, infighting, risk, crisis, …}
(All these are kinds of problem.)

27 The CPA method
Create a sample concordance for each word
–250-500 examples
–from a ‘balanced’ corpus (i.e. general language) [We use the British National Corpus, 100 million words]
–Classify every line in the sample, on the basis of its context.
Take further samples if necessary to establish that a particular phraseology is conventional
Check results against corpus-based dictionaries.
Use introspection to interpret data, but not to create data.
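The sampling step might look like the sketch below. It assumes the concordance lines for a word have already been extracted into a Python list; the function name `sample_concordance` and its parameters are hypothetical, and the default of 250 is simply the lower end of the range given above.

```python
import random
from typing import List


def sample_concordance(lines: List[str], size: int = 250, seed: int = 0) -> List[str]:
    """Draw up to `size` concordance lines at random, without replacement.

    A fixed seed makes the sample reproducible, so the same lines can be
    re-examined later; a different seed gives a further sample.
    """
    rng = random.Random(seed)
    k = min(size, len(lines))   # rare words may have fewer hits than `size`
    return rng.sample(lines, k)


# Stand-in data: 1000 numbered placeholder lines instead of real corpus hits.
all_hits = [f"concordance line {i}" for i in range(1000)]
first_sample = sample_concordance(all_hits, size=250, seed=0)
further_sample = sample_concordance(all_hits, size=250, seed=1)
```

Two independently seeded samples can overlap; if a strictly fresh second sample is wanted, the lines already drawn would have to be removed from the pool first.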

28 In CPA, classification of every line in the sample must be attempted
The classes are:
Norms (normal uses in normal contexts)
Exploitations (e.g. ad-hoc metaphors)
Alternations
–e.g. [[Doctor]] treat [[Patient]] <> [[Medicine]] treat [[Patient]]
Not classified:
–Names (Midnight Storm: name of a horse, not a storm)
–Mentions (to mention a word or phrase is not to use it)
–Errors
–Unassignables
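Once every line in a sample carries one of these labels, a tally shows how much of the sample was classified at all. The sketch below assumes the labels were assigned by hand; the enum `LineClass` and the function `summarise` are hypothetical names, and the example counts are invented purely to show the call.

```python
from collections import Counter
from enum import Enum
from typing import Iterable, Tuple


class LineClass(Enum):
    NORM = "norm"
    EXPLOITATION = "exploitation"
    ALTERNATION = "alternation"
    NAME = "name"
    MENTION = "mention"
    ERROR = "error"
    UNASSIGNABLE = "unassignable"


# The four labels that the slide groups under "Not classified".
NOT_CLASSIFIED = {LineClass.NAME, LineClass.MENTION,
                  LineClass.ERROR, LineClass.UNASSIGNABLE}


def summarise(labels: Iterable[LineClass]) -> Tuple[Counter, int]:
    """Count the lines per label and the number of lines that were classified."""
    counts = Counter(labels)
    classified = sum(n for label, n in counts.items()
                     if label not in NOT_CLASSIFIED)
    return counts, classified


# Invented labels for a seven-line sample, for illustration only.
example = [LineClass.NORM] * 5 + [LineClass.EXPLOITATION, LineClass.NAME]
example_counts, example_classified = summarise(example)
```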

29 Corpus analysis of ‘shower’, verb
Go to corpus and select ‘shower’ v.
Does Sketch Engine help?
Look at PDEV, ‘shower’ v.
Compare the entries in existing dictionaries: OED, (N)ODE, COED, OALDCE.
Are they all mutually compatible?
–Do they have to be?

30 The Pattern Dictionary of English Verbs
http://www.pdev.org.uk/
–freely available – no login, no subscription.
–There are approximately 5600 verbs (“base verbs”) in normal use in English.
–Phrasal verbs and idioms are analysed simply as patterns of the base verb.
–At the time of writing we have completed pattern analysis of 1200 English verbs.

