2 1. Definitions
Firth (1951), Modes of Meaning: « Words shall be known by the company they keep ».
Collocation designates both a process or a state (the act of collocation or the state of being collocated) and the result of the process (an arrangement or juxtaposition, especially of linguistic elements such as words).
3 Crystal (1991), A Dictionary of Linguistics and Phonetics: « a habitual co-occurrence of individual lexical items ».
Co-occurrence may be fortuitous, whereas collocation reflects collective usage.
Collocation is a type of lexical constraint: « the language user has available to him or her a large number of semi-preconstructed phrases that constitute single choices » (Sinclair 1991).
4 A collocation is an arbitrary and recurrent word combination (Smadja, 1990).
A contrastive view of collocation, as expressed by F.J. Hausmann (1990): « […] the idiosyncrasy of a collocation is definitively revealed only from the standpoint of another language, which combines different words to express the same fact ».
(idiosyncrasy: a structural or behavioral characteristic peculiar to an individual or a group)
5 2. Types of collocations
Halliday (1966: 151, 157) argues that the collocational patterns of lexical items can lead to generalizations at the lexical level.
If certain items belong to the same set, then they can be regarded as “a single lexical item”:
6 A strong argument, he argued strongly, the strength of his argument and his argument was strengthened [can all be regarded] as instances of one and the same syntagmatic relation. What is abstracted is an item strong, having the scatter strong, strongly, strength, strengthened, which collocates with argue (argument).
7 Sinclair (1991) proposes two principles:
- The grammatical level is represented by the “open-choice principle”, which sees “language text as the result of a very large number of complex choices ... the only restraint [being] grammaticalness” (cf. Colorless green ideas sleep furiously).
- The “idiom principle” represents the lexical level and accounts for “the restraints that are not captured by the open-choice model”.
8 Three factors determine the categorization of a lexical combination:
- the degree of probability that the items will co-occur
- the degree of fixity of the combination (i.e. grammatical restrictions)
- the degree to which the meaning of the combination can be derived from the meaning of its constituent parts
9 The terms idiom and collocation (as well as their shadings) are used by different linguists with different definitions.
Most linguists would agree that kick the bucket is an idiom, whereas [make / reach / take] a decision are collocations.
10 An example: the verb carry in OH
- [person, animal] porter [bag, shopping, load, news, message]
- [vehicle, pipe, wire, vein] transporter
- [wind, tide, current, stream] emporter
- comporter [warning, guarantee, review, report]
- supporter [weight, load, traffic]
- l'emporter dans [state, region, constituency]
- remporter [battle, match]
- the motion was carried by 20 votes to 13: la motion l'a emporté par 20 votes contre 13
Idioms:
- to be carried away by sth: être emballé[!] par qch
- to get carried away[!]: s'emballer[!], se laisser emporter
11 Woods, E. & McLeod, N. (1990) Using English Grammar, Prentice Hall.
Woods & McLeod suggest the following continuum (from most to least predictable/fixed):
- Idioms (do not allow for substitution of their elements, nor for grammatical or syntactic alterations)
- Collocations (roughly predictable word combinations with some restrictions)
- Colligations (generalisable classes of collocations, for which at least one construct is specified by category rather than as a distinct lexical item)
- Free combinations (compositional and productive)
12 Oxford Dictionary of Current Idiomatic English, Vol. 2 - English Idioms (1983), Oxford University Press
- presents a continuum from idiom to non-idiom
- distinguishes between pure idioms (totally fixed) and figurative idioms (allowing for some variation)
 - blow a fuse, as a figurative idiom, can only be used in the active form.
 - the idiomatic sense of blow one's own [horn / trumpet] is not activated in the absence of own.
- Collocations (non-idioms) are divided between restricted (or semi-idioms) and open
13 Restricted collocations allow a degree of lexical variation (one element has a figurative sense not found outside that limited context, whereas the other appears in a familiar, literal sense; cf. carry a motion).
In open collocations, elements are freely combinable and are used in a common literal sense.
14 Word Combinations (Howarth, 1993: A Phraseological Approach to Academic Writing)
Functional expressions:
(1) More haste less speed. (proverb)
(2) Unaccustomed as I am to public speaking .. (speech formula)
(3) You name it, we've got it. (slogan)
(4) When in Rome. (abbreviated proverb)
Composite units:
(5) blow a trumpet (open collocation)
(6) blow a fuse (restricted collocation)
(7) blow your own trumpet (figurative idiom)
(8) blow the gaff (pure idiom) = vendre la mèche
15 Cruse, D. A. (1986) Lexical Semantics, Cambridge University Press.
Cruse distinguishes between idioms (“lexically complex” units constituting a “single minimal semantic constituent”, e.g. kick the bucket) and collocations (“sequences of lexical items which habitually co-occur”, each lexical item being a “semantic constituent”).
He also introduces bound collocations (expressions “whose constituents do not like to be separated”) as a “transitional area bordering on idiom”.
16 Benson, M., Benson, E. & Ilson, R. (1986) Lexicographic Description of English (Studies in Language Companion Series, No 14), John Benjamins Publishing Company.
Benson, M., Benson, E. & Ilson, R. (1986) The BBI Combinatory Dictionary of English, John Benjamins Publishing Company.
17 B, B & I distinguish between grammatical and lexical collocations.
Grammatical collocations have a node followed by a subordinate unit (which is often a preposition): refer to, reliance on, proud of.
In lexical collocations, both components have equal lexical status (ADJ-N, VB-N, ADV-ADJ, VB-ADV).
18 Sinclair, J. McH. (1991) Corpus, Concordance, Collocation, Oxford University Press.
Sinclair defines two types of collocation (upward / downward), depending on the relative frequency of the two words, considered in the order in which they occur.
« give sb an edge » is a downward collocation, because « give » is more frequently used than « edge ».
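The upward / downward distinction can be sketched as a simple frequency comparison between node and collocate. This is only an illustration: the function name, the "neutral" label for equal frequencies and the frequency figures are hypothetical and are not taken from Sinclair or from any corpus.

```python
def collocation_direction(node_freq: int, collocate_freq: int) -> str:
    """Classify a node/collocate pair by relative frequency.

    downward: the collocate is less frequent than the node
    upward:   the collocate is more frequent than the node
    """
    if collocate_freq < node_freq:
        return "downward"
    if collocate_freq > node_freq:
        return "upward"
    return "neutral"  # assumption: label for equal frequencies


# Hypothetical corpus frequencies, for illustration only.
give_freq, edge_freq = 50_000, 2_000

print(collocation_direction(give_freq, edge_freq))  # node "give", collocate "edge": downward
print(collocation_direction(edge_freq, give_freq))  # node "edge", collocate "give": upward
```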
19 Clas (1994): « Collocations et langues de spécialité », in Meta, XXXIX, 4.
- V+N: prononcer un discours (support verb)
- N+ADJ: rude épreuve, marque distinctive
- ADV+ADJ: grièvement blessé
- VB+ADV: recommander chaudement
- N+V: la cloche sonne, le chat miaule
- Quantity marking of the noun: un troupeau de vaches, une pincée de sel
20 Critique:
- The first category is too restrictive: set a record would be a collocation, but not [beat / break / hold] a record.
- The last two categories allow almost no variation.
21 Example of lexicalization choices (http://pie.usna.edu/explore.html)
Out of 55 nouns that co-occur with « emergency » at least 10 times in the BNC, only 14 can be found in both OH and RC:
- brake, measure, operation, repair for collocations
- case, center, exit, landing, powers, ration, room, service, services, ward for compound nouns.
22 What makes a collocation worth learning for an EFL learner?
- Collocations that involve a verb and its typical object (drive a car, read a book) can usually be inferred.
- Some verbs generate an infinite number of collocates (buy a car, buy a book…).
- What makes the collocation worth memorizing is the fact that the verb takes on another meaning (buy a story, buy time).
23 What makes a collocation remarkable (salient) is the fact that one of its components has few collocates (cf. the Tact Z-score formula).
Consequently, it makes more sense for an EFL learner to learn « downward collocations » grouped under the collocate rather than the node (record / beat, break, hold, set).
24 Mel’čuk’s lexical functions
Lexical functions are the main principle underlying Mel’čuk’s Meaning-Text Theory.
They are meant to describe « institutionalized » lexical relations.
Wanner (1996) gives examples of such relations: aircraft and crew, sheep and flock, bachelor and confirmed, mountain and peak, influence and exert, attention and pay.
25 The list includes both syntagmatically and paradigmatically related pairs of words.
Mel’čuk admits that even though all phrases covered by LFs are collocations, his model does not cover some collocations in which the logical relation between the components cannot be readily inferred (as with assurance maladie and assurance vie).
26 LFs only cover bigrams.
Their aim is to cover syntagmatic and paradigmatic relations between words within a formalized notation system.
The concept is meant to be applied to a wide variety of languages.
Standard LFs include 36 syntagmatic LFs that belong to 4 distinct categories: nominal, adjectival/adverbial, prepositional and verbal.
27 Nominal LF:
28. Centr [Lat. centrum – ‘the center or culmination of’]
Centr(crisis) = the peak
Centr(desert) = the heart
28 Adjectival/Adverbial LFs
29. Magn [Lat. magnus – ‘big, great’]
Magn(naked) = stark
Magn(thin) = as a rake
33. Bon [Lat. bonus – ‘good’]
Bon(aid) = valuable
Bon(proposal) = tempting
29 Prepositional LFs
35. Locin [being in ‘place’]
Locin(height) = at [a height of…]
36. Locab [moving away from ‘place’]
Locab(height) = from [a height of…]
37. Locad [moving into ‘place’]
Locad(height) = to [a height of…]
31 Dirk Siepmann: Collocation, Colligation and Encoding Dictionaries. Part I: Lexicological Aspects, IJL (4):
Linguistic ‘intertextuality’: the meaning of one text and its constituent elements depends on millions of other texts using similar or identical elements.
Textual meaning is thus created by the interplay of two types of repetition:
(a) collocation (in the largest possible sense, including colligation and phraseology)
(b) cohesion.
32 The subject of collocation has been approached from two main angles:
- the semantically-based approaches (Benson 1986, Mel’cuk 1998, Hausmann 2003), which assume a particular meaning relationship between the constituents of a collocation
- the frequency-oriented approach (Sinclair 1991)
33 A few of Siepmann’s opinions
Only the frequency-based approach can provide a heuristic for discovering the entire class of co-occurrences; in a way, it is safe from refutation, but empty.
By contrast, the semantically-based approach is fragmentary: it cannot account for all possible cases.
Heuristic
Etymology: German heuristisch, from New Latin heuristicus, from Greek heuriskein to discover; akin to Old Irish fo-fúair he found
Date: 1821
Involving or serving as an aid to learning, discovery, or problem-solving by experimental and especially trial-and-error methods (heuristic techniques; a heuristic assumption); also: of or relating to exploratory problem-solving techniques that utilize self-educating techniques (as the evaluation of feedback) to improve performance (a heuristic computer program).
34 A purely pragmatic approach relying on the extralinguistic context cannot explain a large number of co-occurrences operating at the level of semantic features.
What is needed is an extension of the semantically-based approach that will take account of strings of regular syntactic composition which form a sense unit with a relatively stable meaning.
35 ‘Lexical bundles’ (Biber et al. 1999) such as je sais que c’est or it's been will not be included among the class of collocations. Although such sequences may perform similar or identical functions across a range of texts, they have no meaning ‘by themselves’.
36 there are good […] reasons for subsuming under the notion of collocation such colligational patterns as regarde où tu vas, dans les colonnes de (+ name of newspaper or magazine) or si elle est prise à temps (referring to an illness), which have so far been regarded as free sequences of words subject only to general rules of syntax and semantics.
37 Are collocations always binary?
It is accepted wisdom among European researchers that collocations are binary units, and this is probably true for the majority of the class (e.g. take a step, launch an appeal).
[…] threatening to this view are irreducible three-element collocations such as the following:
(2) the car holds the road well
(3) avoir un geste déplacé -> (?)avoir un geste
recevoir un accueil chaleureux -> (?)recevoir un accueil
38 hold the road (subject: tyre), tomber à gros flocons (subject: neige), emporter la conviction (subject: argument) or eine Kurve machen (subject: Straße).
[With such collocations] it [is] difficult to identify a standard lexical function (in the sense of Mel’cuk) that can provide a systematic link between the verb and the noun; this is because the entire collocation is semantically dependent on a specific subject.
39 Directionality
- The assumption of directionality (or of a hierarchical relationship between the constituents of the collocation) seems obvious with items such as table + lay / set or money + withdraw.
- Even such textbook examples of collocational theory as célibataire + endurci (‘confirmed bachelor’) may be viewed as bidirectional, since the adjective endurci combines with any noun carrying the semantic feature [+ figé dans son comportement] (‘set in one's behaviour’): criminel, catholique, Parisien.
40 Berry-Rogghe’s Z-score
The Z-score is an indication of the probability that two words will co-occur within a certain span.
P = total frequency of the collocate / length of the text
E = P × length of the mini-text
Standard deviation = SQRT(E × (1 - P))
Z-score = (frequency of the collocate in the mini-text - E) / standard deviation
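The four steps of the Z-score formula can be sketched in Python as follows. This is a minimal sketch, not the TACT implementation: the function and parameter names are hypothetical, and the figures in the usage lines are invented for illustration.

```python
import math


def berry_rogghe_z(collocate_total_freq: int,
                   collocate_span_freq: int,
                   text_length: int,
                   mini_text_length: int) -> float:
    """Z-score of a collocate, following the four steps of the formula.

    collocate_total_freq: frequency of the collocate in the whole text
    collocate_span_freq:  frequency of the collocate inside the mini-text
                          (all the spans around the node word)
    """
    p = collocate_total_freq / text_length   # P
    expected = p * mini_text_length          # E
    std_dev = math.sqrt(expected * (1 - p))  # standard deviation
    return (collocate_span_freq - expected) / std_dev


# Invented figures: a collocate occurring 100 times in a 100,000-word text,
# 5 of them inside a 1,000-word mini-text.
z = berry_rogghe_z(100, 5, 100_000, 1_000)
print(round(z, 3))
```

A collocate that is rare in the whole text but present in the mini-text gets a small E and therefore a high Z-score, which is why low-frequency collocates tend to rank at the top of the list.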
42 Concordance of « lit » in the expression faire le lit de
6286   infectieux semblent faire le lit des localisations
7774   | dont on | sait qu'ils font le lit du cancer.
21884  et cartilagineuses qui feront le lit de l' arthrose.
21939  qui | vont | faire le lit de l' arthrose.
27952  | personnalité peuvent faire le lit de véritables maladies
21146  détérioration dentaire et fait le lit de l' ATS
32987  | vieillissement artériel fait le lit de l' ATS
8847   de l' oreillette gauche faisant le lit des troubles rythmiques ;
17440  organes des sens, | peut faire le lit de délires d'
43 Z-scores of the collocates of « lit ». BEFORE 2, AFTER 0. Mini-text: 268. Total Text:
Collocate   Collocate Freq.   Type Freq.   Z-score
repos       13                300          64.077
au          46                9517         39.336
feront      1                 11           25.781
garder      2                 47           24.903
le          35                36247        13.646
faire       5                 1124         12.384
faisant     1                 178          6.263
font        1                 244          5.300
fait        2                 2185         3.120
44 The Mutual Information (MI) score
The word post co-occurs with many words, among which are "the", "office" and "mortem".
f(office) = 5237, f(the) = […], f(mortem) = 51
(f = overall frequency in the Birmingham Corpus)
45 The joint frequencies for those three words are as follows: j(the) = 1583, j(office) = 297, j(mortem) = 51.
The relative frequencies can be compared with what would be expected under the null hypothesis.
46 THE NULL HYPOTHESIS
The word post has no effect whatsoever on its lexical environment, and the frequencies of the words surrounding post will be exactly the same whether post is present or not.
The expected co-occurrence of post and the is calculated as:
(f(post) * span) * relative_freq(the) = (2579 * 8) * (1 / 20) = 20632 / 20 ≈ 1031
47 The MI score is the logarithm of the ratio between observed co-occurrence and expected co-occurrence.
For post and the, it is log10(1583 / 1031) ≈ 0.19.
The expected joint frequency for post and office is (f(post) * span) * relative_freq(office) = 2579 * 8 * 5237 / 20m ≈ 5.4.
The observed joint frequency is 297. Hence the MI score is about log10(297 / 5.4) = log10(55) ≈ 1.74.
For mortem, the expected joint frequency is 2579 * 8 * 51 / 20m ≈ 0.05, so the MI score is log10(51 / 0.05) ≈ 3.
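The calculation for post + the and post + mortem can be reproduced with a short Python sketch. The function names are hypothetical, the corpus size of 20 million words is read off the "20m" in the slide, and the base-10 logarithm is an assumption based on the worked figures (a base-2 logarithm would give larger values with the same ranking).

```python
import math

CORPUS_SIZE = 20_000_000  # "20m" in the slide
F_POST = 2579             # overall frequency of the node word "post"
SPAN = 8                  # span used in the slide


def expected_cooccurrence(node_freq: int, span: int, relative_freq: float) -> float:
    """Expected joint frequency under the null hypothesis."""
    return node_freq * span * relative_freq


def mi_score(observed: float, expected: float, base: float = 10) -> float:
    """Logarithm of the observed/expected ratio (base 10 assumed here)."""
    return math.log(observed / expected, base)


# "the": relative frequency 1/20, joint frequency 1583
expected_the = expected_cooccurrence(F_POST, SPAN, 1 / 20)
print(round(expected_the, 1), round(mi_score(1583, expected_the), 2))

# "mortem": overall frequency 51, joint frequency 51
expected_mortem = expected_cooccurrence(F_POST, SPAN, 51 / CORPUS_SIZE)
print(round(expected_mortem, 3), round(mi_score(51, expected_mortem), 2))
```

The rare word mortem, which occurs only next to post, ends up with an MI score close to 3, far above that of the.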
48 The mutual-information score for a two-word collocation is the base-2 logarithm of the ratio of the probability of occurrence of the two-word collocation to the product of the probabilities of occurrence of the first word and of the second word: MI(x, y) = log2( P(x, y) / (P(x) * P(y)) ).
T-scores differ from mutual information scores in being scaled by an estimate of the variance (they tend to correct skewed MI scores that are due to a low number of occurrences).
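The effect of the scaling can be illustrated with a short sketch. The slides do not give the T-score formula; the version below, t = (observed - expected) / sqrt(observed), is a common formulation in corpus linguistics and is an assumption here, as is the function name. The figures are the observed and expected joint frequencies quoted earlier for post + the and post + mortem.

```python
import math


def t_score(observed: float, expected: float) -> float:
    """T-score, assuming the common approximation in which the variance
    is estimated by the observed joint frequency."""
    return (observed - expected) / math.sqrt(observed)


# Observed and expected joint frequencies from the post example.
t_the = t_score(1583, 1031)
t_mortem = t_score(51, 0.05)

print(round(t_the, 2), round(t_mortem, 2))
```

On this measure the frequent pair post + the scores higher than post + mortem, the reverse of the MI ranking: dividing by the square root of the observed frequency penalizes pairs that rest on few occurrences.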