Presentation on theme: "Corpus Linguistics Richard Xiao"— Presentation transcript:
1 Corpus Linguistics: Corpus analysis (1)
Richard Xiao (email@example.com)
2 Outline of the session
Lecture
- Concordance
- Patterning
- Semantic prosody
- Wordlist
- Cluster (lexical bundle, MWU, n-gram)
Lab
- WST Concord and Wordlist
- AntConc
- Online concordancers
3 Who reads a corpus?
- A corpus is usually too large for anyone to read, e.g. the BNC is very, very large…
- It took 4 years to build
- It contains over 100 million (100,106,008) words of modern English
- It comprises 4,124 texts
- There are six and a quarter million sentence units in the whole corpus
- Each word is automatically assigned a part of speech code; there are 65 parts of speech identified
- It occupies 1.5 gigabytes of disk space, the equivalent of more than 1,000 high capacity floppy disks
- The whole corpus printed in small type on thin paper would take up 10 metres of shelf space
- Reading the whole corpus aloud at a rate of 150 words a minute, eight hours a day, 365 days a year, would take nearly 4 years
- A computer can scan in a few seconds more text than you can read in your whole life…
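The reading-time figure above can be checked with a few lines of arithmetic. This is only a sketch: the word count and reading rate come from the slide, and the variable names are illustrative.

```python
# Figures taken from the slide: BNC size and the assumed reading schedule.
bnc_words = 100_106_008
words_per_minute = 150
hours_per_day = 8
days_per_year = 365

# Words read in one day of reading aloud.
words_per_day = words_per_minute * 60 * hours_per_day

# Total reading time in days and in years.
days_needed = bnc_words / words_per_day
years_needed = days_needed / days_per_year

print(f"{days_needed:.0f} days, about {years_needed:.1f} years")
```

The result is a little under four years, which agrees with the "nearly 4 years" on the slide.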
4 Concordance
- A comprehensive index of the words used in a text or a corpus
- A set of concordance lines
- The most common concordance format is the KWIC (Key Word in Context) concordance
- In a KWIC concordance, your search word, i.e. the node word, sits in a central position, with all lines vertically aligned around the node
- Can be sorted to reveal patterns of usage
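To make the KWIC idea concrete, here is a minimal Python sketch of a concordancer. It is not how WordSmith or AntConc work internally; the function names (`kwic`, `format_kwic`, `sort_right`) and the sample text are invented for illustration, and the context is measured in characters rather than words.

```python
import re

def kwic(text, node, width=30):
    """Find every whole-word occurrence of `node` (case-insensitive) and
    return (left context, node, right context) triples."""
    text = " ".join(text.split())  # collapse line breaks and repeated spaces
    pattern = re.compile(r"\b" + re.escape(node) + r"\b", re.IGNORECASE)
    hits = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        hits.append((left, m.group(0), right))
    return hits

def format_kwic(hits, width=30):
    """Right-align the left context so that the node lines up vertically."""
    return [f"{left:>{width}}{node}{right}" for left, node, right in hits]

def sort_right(hits):
    """Sort concordance lines alphabetically by their right context."""
    return sorted(hits, key=lambda h: h[2].strip().lower())

# Invented sample text, for illustration only.
sample = ("The gong sounded on the stroke of seven. "
          "They declared on the stroke of lunch. "
          "The goal came on the stroke of half-time.")

hits = sort_right(kwic(sample, "stroke"))
for line in format_kwic(hits):
    print(line)
```

Sorting on the right context groups together the lines that share a following word, which is what makes recurrent patterns visible.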
5 Concordancer
- A concordancer is the software that displays concordances
- Unicode-compliant concordancers:
  - Concord, WordSmith Tools (GBP 50)
  - MonoConc (USD 85)
  - AntConc (free)
  - Xaira (free)
  - Multilingual Corpus Tool (MLCT) (free)
10 Online concordancers
- English (free): http://corpus.byu.edu/bnc/ (COCA)
- Chinese (free)
- Sketch Engine: corpus query system of multilingual data, incorporating word sketches, grammatical relations, and a distributional thesaurus (30 days free trial)
12 Collocation is syntagmatic
- Langue (language system): paradigmatic
- Parole (utterance): syntagmatic
       famous boots. On the stroke of full time the
      Stoke the lead on the stroke of half-time with a goal
    Smith sin-binned on the stroke of half-time, added a
  clinched their win on the stroke of lunch after resuming
  chase by declaring on the stroke of lunch. <p> With a lead
    expectant crowd, on the stroke of midday. The bird
    hour began not upon the stroke of midnight but upon the
   of midnight but upon the stroke of noon. There was,
  booked in advance. On the stroke of seven, a gong summons
            Promptly on the stroke of six 'clock, the chooks
      from Edinburgh on the stroke of the Millennium.
13 Example of pattern meaning
- “on the stroke of X”: X = a temporal point
- “It is/was adj. that…” (construction grammar?)
  - certain, likely, possible, probable, etc.
  - apparent, clear, evident, obvious, plain, etc.
  - fantastic, marvellous, appropriate, logical, encouraging, exciting, reassuring, etc.
  - appalling, unjust, annoying, etc.
  - critical, important, necessary, vital, etc.
  - amazing, funny, interesting, intriguing, etc.
- Possibility, necessity; evidentiality; evaluation
14 Pattern meaning
- A large number of different adjectives occur in the pattern between is/was and that
- Probability
  - “It was important to establish this because it was possible that strontium and calcium in fossils might have reacted chemically with the rock in which the fossils were buried.” (New Scientist)
- Evaluation: used to evaluate propositions (statements) rather than things or people
  - “But a lot of health authorities say they will not allow these drugs on NHS prescription as they cannot afford them at around £90 a month. It is scandalous that the rich can buy the drugs privately, but tough luck if you are poor.” (The Sun)
15 Meaning arising from collocation
- “There are always semantic relations between node and collocates, and among the collocates themselves.” (Stubbs 2002: 225)
- Collocational meaning arising from the semantic relations between node and collocates: semantic prosody (also called “discourse prosody”)
- Collocational meaning arising from the semantic relations among collocates of a node: semantic preference
16 What is semantic prosody?
- “consistent aura of meaning with which a form is imbued by its collocates” (Louw 1993: 157)
- “a form of meaning which is established through the proximity of a consistent series of collocates.” (Louw 2000: 57)
- “the spreading of connotational colouring beyond single word boundaries” (Partington 1998: 68)
- “When the usage of a word gives an impression of an attitudinal or pragmatic meaning, this is called a semantic prosody” (Sinclair 1999)
- This kind of meaning is “prosody” in the sense that it stretches over more than one unit (word)
17 Semantic prosody
- The primary function of SP is to express speaker/writer attitude or evaluation (Louw 2000: 58)
- Attitudinal, affective, evaluative and pragmatic meaning
- Typically negative, with relatively few of them bearing an affectively positive meaning
- Unsurprising: contented human beings utter much less than discontented ones
- It is unrequited love, not requited love, that forms most of the subject matter for the greatest love poetry in English!
18 Semantic prosody
- SET IN: occurs primarily with subjects which refer to unpleasant states of affairs
  - …before bad weather sets in…
  - …the fact that misery can set in…
  - …desperation can set in…
  - …stagnation seemed to have set in…
  - …before rigor mortis sets in…
- BREAK OUT: it is bad things that break out
  - …violence broke out…
  - …riots broke out…
  - …war broke out…
  - …real disagreements have broken out…
  - …a storm of protest broke out…
19 Semantic prosody
Collocates of CAUSE
- damage, problems, pain, disease, distress, trouble, concern, degradation, harm, pollution, suffering, anxiety, death, fear, stress, symptoms
- These examples of ‘bad company’ collocate with cause so frequently that the central and typical use of cause shows a negative affective meaning (近墨者黑？ "one who stays near ink is stained black", i.e. bad company rubs off)
Collocates of consequences
- In the sense of result: serious, disastrous, adverse, dire, damaging, negative, unintended, unfortunate, tragic, fatal, severe
- In the sense of importance: important, significant, far-reaching, profound
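Collocate lists like those above come from counting the words that occur within a fixed span of the node. The sketch below shows that counting step only. It uses raw co-occurrence frequency, does not lemmatise (so it matches the word form "cause", not the lemma CAUSE), and lets the window cross sentence boundaries; the function names and the sample sentence are invented for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())

def collocates(tokens, node, span=4):
    """Count words occurring within `span` tokens to the left or right of
    each occurrence of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            left = tokens[max(0, i - span):i]
            right = tokens[i + 1:i + 1 + span]
            counts.update(left + right)
    return counts

# Invented sample sentence, for illustration only.
sample = "Smoking can cause disease. Stress may cause pain and cause problems."
tokens = tokenize(sample)
top = collocates(tokens, "cause", span=2)

for word, freq in top.most_common(5):
    print(word, freq)
```

On a real corpus the raw counts would be dominated by function words, so concordancers usually rank collocates with an association measure rather than frequency alone.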
21 Semantic prosody
- The negative (or less frequently positive) prosody that belongs to a lexical item is the result of the interplay between the item and its typical collocates
- The item does not appear to have an affective meaning until it appears in the context of its typical collocates
- If a word has typical collocates with an affective meaning, it may take on that affective meaning even when it is used with other atypical collocates
- The consequence of a word frequently keeping ‘bad company’ is that the use of the word alone may become enough to indicate something unfavourable (cf. Partington 1998: 67)
22 Semantic prosody
- Is semantic prosody a type of connotative meaning?
- “Semantic prosodies are not merely connotational” as the force behind semantic prosodies is “more strongly collocational than the schematic aspects of connotation.” (Louw 2000: 49-50)
- In my view, connotation can be collocational or non-collocational; semantic prosody can only be collocational
23 Semantic prosody
- Semantic prosody is strongly collocational in that it operates beyond the meanings of individual words
- Both personal and price are quite neutral, but when they co-occur, a negative prosody may result: personal price most frequently refers to something undesirable
- In the Bank of English (BoE), with over 550 million words of written and spoken texts, all 20 instances of “personal price” are evaluatively negative
24 “Personal price”
- Typically negative and high; refers to something undesirable
- Barclays’ slogan to promote their personal financial services in 2003: “The personal loan with the personal price”
25 Semantic preference
- ‘a lexical set of frequently occurring collocates [sharing] some semantic feature’ (Stubbs 2002: 449)
- large typically collocates with items from the same semantic set indicating ‘quantities and sizes’: number(s), scale, part, quantities, amount(s)
- ‘absence/change of state’ is a common feature of the collocates of maximizers such as utterly, totally, completely and entirely
26 Semantic preference
- Semantic preference and semantic prosody are two distinct yet interdependent collocational meanings
- Semantic prosody is a further level of abstraction of the relationship between lexical units (Sinclair 1996, 1998; Stubbs 2001)
  - Collocation (the relationship between a node and individual words)
  - Colligation (the relationship between a node and grammatical categories, e.g. “very” tends to collocate with adjectives and adverbs)
  - Semantic preference (semantic sets/fields of collocates)
  - Semantic prosody (affective meanings of a given node with its typical collocates)
27 Semantic preference
- Semantic preference and semantic prosody have different operating scopes (Partington 2004: 151)
  - Semantic preference can be viewed as a feature of the collocates while semantic prosody is a feature of the node word
- The two also interact (Partington 2004: 151)
  - Semantic prosody ‘dictates the general environment which constrains the preferential choices of the node item’
  - Semantic preference ‘contributes powerfully’ to building semantic prosody
- End of concordance versus patterning, collocation and collocational meaning
28 Wordlist
- A list of words in a corpus and their frequency
- Can become very meaningful when compared with other lists: “keyword analysis”
- “A type is not a token.”
  - Token: an occurrence of any given word form (6 tokens)
  - Type: a (unique) word form (5 types; “a” is repeated)
- Type-token ratio (TTR): the number of types divided by the number of tokens, multiplied by 100
  - Lexical density: a low TTR indicates a text is not very lexically rich
  - Useful when comparing samples of roughly equal length
- Standardized type-token ratio (STTR)
  - It is difficult to compare the TTR of a smaller corpus against a larger one
  - As a corpus gets bigger, the number of new word types being counted declines
  - To remedy the issue of comparing TTRs of corpora of different sizes, WordSmith can calculate the TTR for every 1,000 words (the default setting can be adjusted) and produce an average TTR
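The wordlist, TTR and STTR definitions above can be sketched in Python as follows. This illustrates the definitions only and is not WordSmith's implementation: the function names are invented, tokens are lowercased so that "A" and "a" count as one type, and dropping the final incomplete chunk in `sttr` is an assumption of this sketch.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())

def wordlist(tokens):
    """Frequency list: (word, count) pairs, most frequent first."""
    return Counter(tokens).most_common()

def ttr(tokens):
    """Type-token ratio as a percentage: types / tokens * 100."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens) * 100

def sttr(tokens, chunk=1000):
    """Standardized TTR: the mean TTR over consecutive chunks of `chunk`
    tokens. The trailing incomplete chunk is ignored (an assumption here)."""
    chunks = [tokens[i:i + chunk]
              for i in range(0, len(tokens) - chunk + 1, chunk)]
    if not chunks:  # text shorter than one chunk: fall back to plain TTR
        return ttr(tokens)
    return sum(ttr(c) for c in chunks) / len(chunks)

tokens = tokenize("A type is not a token.")
print(wordlist(tokens))          # "a" has frequency 2, the rest 1
print(len(tokens), len(set(tokens)))
print(round(ttr(tokens), 2))
```

For the slide's example sentence this gives 6 tokens and 5 types, so the TTR is 5 / 6 × 100, roughly 83.3.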
31 Practice
- Make a wordlist of the following text using the wordlist function in WST or AntConc
  - The Stephen text (local copy available)
  - A book written by the hippie guru Stephen Gaskin
- Browse through the frequency list. Can you see any pattern in the list?
32 Cluster
- Also called lexical bundle, n-gram, multi-word unit (MWU)
- Groups of N words which appear in sequence in the text
- Presented using frequency lists
- Good way to identify recurrent/specific expressions for a corpus
- Tools
  - WordSmith: Concord, Wordlist (Index)
  - AntConc: N-gram
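Counting clusters amounts to sliding a window of N words across the token sequence and tallying each sequence. The sketch below counts all n-grams in a text and, optionally, only those containing a given word. The function names and the sample are invented for illustration, and the tools named above may define cluster boundaries differently (for example, by stopping at punctuation).

```python
from collections import Counter

def ngrams(tokens, n=3):
    """Count every sequence of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n])
                   for i in range(len(tokens) - n + 1))

def clusters_with(tokens, word, n=3):
    """Count only the n-grams that contain `word`."""
    return Counter({gram: count
                    for gram, count in ngrams(tokens, n).items()
                    if word in gram})

# Invented sample, already tokenized and lowercased.
tokens = "you know what i mean you know what i think".split()

all_clusters = ngrams(tokens, n=3)
know_clusters = clusters_with(tokens, "know", n=3)

for gram, count in know_clusters.most_common():
    print(" ".join(gram), count)
```

The first function corresponds to clusters over the whole text, the second to clusters around a search term.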
34 Clusters in WordSmith
- The Stephen text
- Clusters with WST Concord: the search term
- Clusters with WST Wordlist (Index): the whole corpus
- Questions
  - What are the most frequent 3-word clusters with “know” in the Stephen text?
  - What are the most frequent 3-word clusters in the whole text?
  - Are they all “expected” phrases?