Concordances, collocations and connotation Barnbrook G (1996) Language and Computers. Edinburgh: EUP. Chapters 3,4,5 Partington A (1998) Patterns and Meanings.


1 Concordances, collocations and connotation
Barnbrook G (1996) Language and Computers. Edinburgh: EUP. Chapters 3,4,5
Partington A (1998) Patterns and Meanings. Amsterdam: John Benjamins. Chapters 1,2,4

2 Lexical information in corpora
Start looking at the kind of information (about individual words) that can be got from corpora
–Simple frequency information
–Distribution information
–Collocation (co-occurrence information)
–Connotation (semantic prosody)
Introduce basic ideas
Future topics
–Statistics
–Case studies

3 Frequency information
Most banal information: counting how many times a word (“type”) appears in a text
Most frequent words will be function words, so frequency (f) counts often exclude words listed in a “stop list”
Should you count words or lemmas?
Should you distinguish alternate meanings of ambiguous word forms (if you can)?
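A minimal sketch of the counting described on this slide, assuming a naive regular-expression tokenizer and a tiny hypothetical stop list (`STOP_LIST`, `word_frequencies` and the sample sentence are invented for illustration). It counts word forms, not lemmas, and makes no attempt to separate senses.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop list; real stop lists are much longer.
STOP_LIST = {"the", "a", "of", "and", "to", "in", "on"}

def word_frequencies(text, stop_list=STOP_LIST):
    """Count word forms (not lemmas), ignoring case and stop-listed words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in stop_list)

# Invented sample text.
sample = "The cat sat on the mat and the dog sat on the cat"
freqs = word_frequencies(sample)
for word, f in freqs.most_common():
    print(word, f)
```

Counting lemmas instead would need a lemmatizer in front of the counter, which is left out here.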

4 Frequency information
Frequency information on its own is not particularly interesting
Quite useful to compare f of related words
–eg alternative readings of a given word form (already seen in probability calculations in tagging)
–or comparing near synonyms, especially if we can take context into account (see later)
f of a given word in a given context can be indicative, eg pronouns more frequent as subject or 1st word of sentence

5 Types and tokens
Remember distinction between “tokens” (words) and “types” (different words)
Type count gives a measure of how many DIFFERENT words are used
Type-token ratio (TTR) gives a measure of “vocabulary richness”
–If vocabulary is very varied, TTR will be higher
TTR is very sensitive to overall text length, so it is not meaningful to compare TTRs for texts of different lengths
Standardized TTR is the average of the TTR for each sequence of n words (typical default n=1000) in a text or corpus
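The two measures can be sketched as follows. The function names are hypothetical, and one detail is an assumption not stated on the slide: a trailing chunk shorter than n is discarded when averaging.

```python
def type_token_ratio(tokens):
    """Number of distinct word forms divided by number of running words."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def standardized_ttr(tokens, n=1000):
    """Mean TTR over consecutive, non-overlapping chunks of n tokens.

    Assumption: a trailing chunk shorter than n is discarded.
    """
    chunks = [tokens[i:i + n] for i in range(0, len(tokens) - n + 1, n)]
    if not chunks:
        # Text shorter than n: fall back to the plain TTR.
        return type_token_ratio(tokens)
    return sum(type_token_ratio(c) for c in chunks) / len(chunks)

# Invented sample text; n=4 only so that the toy example has several chunks.
tokens = "the cat sat on the mat the dog sat on the log".split()
print(type_token_ratio(tokens))
print(standardized_ttr(tokens, n=4))
```

The plain TTR of the whole sample is lower than the chunk average, a small-scale view of the length sensitivity noted above.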

6 Vocabulary growth curve
Plotting types against tokens for a given text shows us how the vocabulary (number of types) grows as the text gets longer
Typically, the curve starts steeply and then flattens sooner or later, reflecting homogeneity (or otherwise) of the text
VGC for Macbeth in Basic English
source: http://web.missouri.edu/~youmansc/vmp/help/Youmans-TypeToken.pdf
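The data points behind such a curve can be computed in one pass over the text. This is a sketch with a hypothetical function name and an invented sample; plotting is left out.

```python
def vocabulary_growth(tokens):
    """Return (tokens_seen, types_seen) pairs, one per running word."""
    seen = set()
    curve = []
    for i, tok in enumerate(tokens, start=1):
        seen.add(tok)          # a repeated word does not enlarge the set
        curve.append((i, len(seen)))
    return curve

# Invented sample text.
tokens = "the cat sat on the mat the dog sat on the log".split()
curve = vocabulary_growth(tokens)
for n_tokens, n_types in curve:
    print(n_tokens, n_types)
```

Each repeated word leaves the type count unchanged, which is what flattens the curve.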

7 Vocabulary growth curve
Comparative VGC for four texts
Simple measure used in some literary studies
(a) Longfellow (b) Hemingway (c) Basic English (Macbeth) (d) Bible (Genesis 2)

8 Vocabulary in context
“Concordance”, also known as KWIC list (key word in context)
Allows us to see the (immediate) environment in which a word appears
Listings can be customised to show what you want more clearly, eg
–sorted according to next or previous word
–showing more or less context
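A KWIC listing with the two customisations mentioned (amount of context, sorting on the previous or next word) might look like this. The function `kwic`, its parameters and the sample text are invented for illustration.

```python
def kwic(tokens, keyword, width=3, sort_by=None):
    """Return (left, node, right) token lists for each hit of keyword.

    sort_by: None (text order), "left" (previous word) or "right" (next word).
    """
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            hits.append((left, tok, right))
    if sort_by == "left":
        hits.sort(key=lambda h: h[0][-1].lower() if h[0] else "")
    elif sort_by == "right":
        hits.sort(key=lambda h: h[2][0].lower() if h[2] else "")
    return hits

# Invented sample text.
text = "by sheer luck she won and by sheer force of will he lost through sheer folly"
hits = kwic(text.split(), "sheer", width=3, sort_by="right")
for left, node, right in hits:
    # Right-align the left context so the key word lines up in a column.
    print(f"{' '.join(left):>20}  {node}  {' '.join(right)}")
```

Sorting on the next word groups identical right-hand neighbours together, which makes recurring patterns easier to spot by eye.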

9 source: Partington A (1998) Patterns and Meanings. Amsterdam: John Benjamins

10 CIWK search
Inverted KWIC: specify the context and look to see what words occur in it
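One way to read this is as a search for a fixed frame with an open slot. The sketch below assumes the simplest case, a one-word slot between a given left and right neighbour. That restriction, the function name `ciwk` and the sample text are assumptions, not taken from the slide.

```python
from collections import Counter

def ciwk(tokens, left, right):
    """Count the words that fill the one-word slot in 'left ___ right'."""
    fillers = Counter()
    for i in range(1, len(tokens) - 1):
        if tokens[i - 1] == left and tokens[i + 1] == right:
            fillers[tokens[i]] += 1
    return fillers

# Invented sample text.
tokens = "a waste of time and a lot of time but a waste of money".split()
fillers = ciwk(tokens, "a", "of")
print(fillers)
```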

11 Collocation
Term coined by J R Firth (1957) to characterise (part of) his theory of meaning
“You shall know a word by the company it keeps”
“The occurrence of two or more words within a short space of each other in a text” (Sinclair 1991)
“The relationship a lexical item has with items that appear with greater than random probability in its (textual) context” (Hoey 1991; emphasis added)

12 Collocation, text type and style
Distinguish between general and more usual collocations vs technical and more personal ones
eg in a general corpus time collocates with save, spend, waste, fritter away, …
but in a corpus of sports reports time collocates with half, full, extra, injury, first, second, third, …

13 Collocation and idiom
Listing collocations will often reveal idioms and cliches
Important to think of collocation as extending beyond neighbouring words (which can be captured by simple concordances)

14 Collecting collocations
If we are to look beyond neighbouring words, what constraints might we impose?
Collocation means co-occurrence within some defined context
–possibly a “window” of n words to left and/or right
–if corpus is tagged/parsed, we can look at collocations within structures
–or we can define the window in terms of constituents rather than words
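The first option, a window of n words to the left and/or right, can be sketched as below. The function name, parameter names and sample text are invented. The structure-based and constituent-based options would need a tagged or parsed corpus and are not attempted.

```python
from collections import Counter

def window_collocates(tokens, node, left=2, right=2):
    """Count words within `left` words before or `right` words after node."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            context = tokens[max(0, i - left):i] + tokens[i + 1:i + 1 + right]
            counts.update(context)
    return counts

# Invented sample text.
tokens = "we waste time and they save time but extra time is rare".split()
previous_word = window_collocates(tokens, "time", left=1, right=0)
symmetric = window_collocates(tokens, "time")  # default: 2 left, 2 right
print(previous_word)
print(symmetric)
```

Setting left=1 and right=0 reproduces what a concordance sorted on the previous word would show. Widening the window picks up non-adjacent words as well.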

15 Measuring significance
The significance of any co-occurrences needs to be established
–Raw co-occurrence frequency counts mean nothing
–Need to be compared to something else
Need to compare a given co-occurrence with random chance, or with some other co-occurrence
More detail next time
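As a rough illustration of "comparing with random chance", one common approximation treats the collocate as scattered evenly through the corpus and asks how often it would land in the window slots around the node. The slides defer the actual measures to the next session, so the sketch below is an assumption about the general idea, not the method the lecture goes on to use. All names and figures are invented.

```python
def expected_cooccurrence(f_node, f_collocate, corpus_size, span):
    """Co-occurrences expected if the collocate were distributed at random:
    its relative frequency times the number of window slots around the node."""
    return (f_collocate / corpus_size) * f_node * span

def observed_over_expected(observed, f_node, f_collocate, corpus_size, span):
    """Ratio above 1 means the pair co-occurs more often than chance predicts."""
    return observed / expected_cooccurrence(f_node, f_collocate, corpus_size, span)

# Invented figures for illustration only.
expected = expected_cooccurrence(f_node=100, f_collocate=500,
                                 corpus_size=1_000_000, span=4)
ratio = observed_over_expected(observed=20, f_node=100, f_collocate=500,
                               corpus_size=1_000_000, span=4)
print(expected, ratio)
```

A ratio on its own is still not a significance test. It only puts the raw count next to a chance baseline.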

16 Collocation and synonymy
Collocation is good evidence in discussing (near) synonymy
Lots of studies take near synonyms and look to see if the nature of their relationship can be characterised by their distribution
In other words: what words does each member of the synonym set collocate with?
Especially useful for language learners
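The question on this slide can be turned into a small comparison: collect the collocates of each near synonym, then separate the shared ones from the word-specific ones. The sketch assumes the simplest window (the immediately following word). The function names and the sample text are invented and are not drawn from any corpus.

```python
from collections import Counter

def right_collocates(tokens, node):
    """Count the word immediately following each occurrence of node."""
    return Counter(tokens[i + 1]
                   for i in range(len(tokens) - 1)
                   if tokens[i] == node)

def compare_synonyms(tokens, word_a, word_b):
    """Split next-word collocates into shared and word-specific sets."""
    a = set(right_collocates(tokens, word_a))
    b = set(right_collocates(tokens, word_b))
    return {"shared": a & b, word_a: a - b, word_b: b - a}

# Invented sample text.
tokens = "sheer luck and pure luck but sheer weight and pure gold".split()
profile = compare_synonyms(tokens, "sheer", "pure")
print(profile)
```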

17 Example of sheer and synonyms (from Partington book)
three senses (LDOCE)
–pure, ‘nothing but’, eg sheer luck
–steep, sheer drop
–thin, sheer stockings
(Cobuild) use sheer to emphasize completeness of state
92 occurrences of sheer (in meaning 1) in his corpus

18 collocations of sheer
expression of magnitude of weight or volume to right (20%)
–volume, weight, numbers, mass, scale, quantity, size
–almost always with article the
expression of force, strength or energy (22%)
–energy, exertion, force, muscle, strength, power, pressure, fury, pace, intensity
–usually with the, or a preposition but no article
expression of persistence (14%)
–persistence, irreversibility, obstinacy, indomitability, insistence, reliability, integrity, hard work
–left context: through, because of, out of, expressing causation, but not the

19 collocations of sheer
nouns expressing strong emotion (11%)
–fun, joy, panic, inspiration, enjoyment, terror
nouns expressing extreme personal qualities (11%)
–beauty, glamour, brutality, thuggery, madness, folly
nouns expressing extreme ability or lack of same (8%)
–expertise, competence, virtuosity, gamesmanship

20 Synonyms of sheer - pure
LDOCE definitions, 5 meanings of which two overlap:
–not mixed with anything
–complete, thorough
Corpus has 135 examples
Larger variety of syntactic environments (sheer was always modifying a noun) including predicative, which sheer does not occur in
–*? The drop was sheer
–* His fury was sheer

21 Synonyms of sheer - pure
Religious-moral context; sense of unmixed
–doctrine, faith, goodness; chemicals, gold
But, many examples where it has an emphasizing function, like sheer
–accident, chance, comedy, guesswork, honesty, idiocy, malice, nostalgia, pleasure, selfishness, talent, theatre, vulnerability, whim, wickedness
–often with proper nouns (unlike sheer)
No examples of pure collocating with items expressing magnitude, force or persistence
Some overlap with sheer
–personal qualities, emotion (though generally less extreme ones)
Only few examples of pure in prepositional phrase expressing causation; causes can be sheer, but states are pure

22 Other synonyms of sheer
Partington does similar analysis of complete and absolute
Shows that each of the “synonyms” has more typical uses and patterns, though there is some overlap
But there is also clear evidence of complementary usage

23 Connotation and semantic prosody
Collocation can also be used to illustrate connotation
–“secondary implications of a word” (Lyons 1977)
Three distinct uses of the term
–marker of a particular speech variety (eg lovely)
–cultural implications (words used to describe women show what society thinks of them)
–marker of speaker’s evaluation (firm ~ stubborn)
“Semantic prosody” (Sinclair 1987)
–use of a certain word spreads its connotation over the whole utterance

24 Some examples
object of commit is often something bad (foul, deception, offence)
if something is described as rife, it is not good (crime, disease, mistakes), and describing it as rife expresses a negative connotation (speculation is rife)
both the above exemplify “unfavourable prosody”, but other prosodies are possible
good example: claim vs admit responsibility for an atrocity

25 More power to your elbow
Examples given in last few slides were largely subjective
More interesting if we can back up observations with calculations of statistical significance
Next time we will look at some simple statistical measures

