1 Building and using field-specific corpora to enhance ESP teaching Manuela Reguzzoni firstname.lastname@example.org
2 The session focus: how to build a pedagogic corpus and how to analyse it. An example: a tentative investigation into a Maritime English (ME) pedagogic corpus.
3 What is a 'corpus'? A corpus [kaw-puhs] is a collection of language texts saved to disk in machine-readable form, typically as plain-text (e.g. ASCII) files.
4 General corpora are huge
– The Bank of English (http://mycobuild.com/about-collins-corpus.aspx): over 500 million words of present-day English, with a 56-million-word sub-corpus for teaching.
– The British National Corpus (http://www.natcorp.ox.ac.uk/): a 100-million-word corpus of British English.
– 180 million words (Aston University ACORN corpus) are equivalent to: 12.5 million text messages; 2,000 to 3,000 books; 15 years of a daily newspaper; 7 years of non-stop conversation.
5 Corpus size
Do all corpora need to be large? Not necessarily. The mini-corpus, or sub-corpus, approach has increasingly gained recognition and users for pedagogical purposes (Willis, 1998; Lewis, 2000; Tribble, 2000).
How large should a pedagogic (ESP) corpus be? The amount of language required depends on the purpose of the research (Barnbrook, 1996; Coxhead, 2000).
Should a pedagogic corpus cover 'all' the target discourse? Not necessarily. e.g.
6 Field-specific pedagogic corpora
– do not need to be large;
– can be created with the teaching materials we use in class;
– can be centred on the language we expose our learners to, the language we think our learners need to know.
7 Why is the creation of a pedagogic corpus useful? A corpus can:
– provide data-based evidence;
– produce a word frequency list;
– show common language patterns;
– allow comparisons with other corpora;
– improve material selection/design.
8 Building your corpus, step 1: make your materials machine-readable.
9 Building your corpus, step 2: prepare your corpus before loading it.
10 Complicating factors
Compounds such as "recipe-books" or "engine room" may be considered either as a single type or as two separate words. "Cook's" could mean "of the cook", or it could be a contracted form ("cook is"). The program you use could treat "s" as a separate type, distinct from "cook" and "cooks". The apostrophe in "cook's" is ambiguous, and the program could simply ignore it. Letters like "I" and "a" can be miscounted: "I" can be a personal pronoun (I am a teacher) and "a" an indefinite article (this is a ship), but both can also be used as labels in ordered lists. Before indexing your corpus, you have to sort out these problematic aspects.
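The decisions above can be made explicit before indexing. The following is a minimal Python sketch, assuming a simple regular-expression tokenizer; the function names `tokenize` and `merge_compounds` and their parameters are hypothetical and do not reproduce how AntConc or WinATA define a token. Each option maps onto one complicating factor: hyphenated forms, the ambiguous apostrophe, capital letters, list labels such as "a)", and open compounds such as "engine room".

```python
import re


def tokenize(text, keep_hyphens=True, split_clitics=False, fold_case=True):
    """Split raw text into word tokens under explicit (hypothetical) rules.

    keep_hyphens:  count "recipe-books" as one token instead of two.
    split_clitics: turn "cook's" into "cook" + "'s" instead of one token.
    fold_case:     count "She" and "she" as the same type.
    """
    # Normalise the typographic apostrophe so both spellings of "cook's" match.
    text = text.replace("\u2019", "'")
    # Heuristic: drop ordered-list labels such as "a)", "I." or "3." at the
    # start of a line, so they are not counted as the article "a" or the
    # pronoun "I".
    text = re.sub(r"^[ \t]*(?:[a-z]|[ivx]+|\d+)[.)][ \t]+", "", text,
                  flags=re.IGNORECASE | re.MULTILINE)
    if fold_case:
        text = text.lower()
    hyphenated = r"(?:-[A-Za-z]+)*" if keep_hyphens else ""
    tokens = re.findall(r"[A-Za-z]+" + hyphenated + r"(?:'[A-Za-z]+)?", text)
    if split_clitics:
        split = []
        for tok in tokens:
            stem, apostrophe, rest = tok.partition("'")
            split.append(stem)
            if apostrophe:
                split.append("'" + rest)
        tokens = split
    return tokens


def merge_compounds(tokens, compounds):
    """Join known open compounds, e.g. ("engine", "room"), into single tokens."""
    merged, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in compounds:
            merged.append(" ".join(pair))
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


sample = "The cook's recipe-books are in the engine room.\na) She is a ship."
print(tokenize(sample))
print(tokenize(sample, keep_hyphens=False, split_clitics=True))
print(merge_compounds(tokenize(sample), {("engine", "room")}))
```

With the default options "recipe-books" and "cook's" stay single tokens, while `keep_hyphens=False, split_clitics=True` splits them. Whichever rules you choose, the point is to apply them consistently to the whole corpus before indexing it.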
11 A practical example: "Why is a Ship called She?" by Sir John Martin, Lieutenant Governor of Guernsey (Annual Report R.I.N.A., 1976, vol. 118)
A text of only 97/98 running words, or tokens.
Possible complicating factors:
– capital letters/small letters: e.g. A/a, she/She;
– poly-words: e.g. good-looking, decked out;
– ME field-specific lexis: 3 or 13 types? e.g. ship; gang; waist; stays; paint; decked out; handle; helm; topsides; bottom; port; heads; buoys.
12 Gang: number of persons working together; team (of seamen).
Waist: 1. part of the body between the ribs and the hips; 2. upper deck between forecastle and poop (in old wooden ships).
Stay: 1. strip of bone, plastics or metal used to stiffen a garment; 2. a heavy rope/cable used as support.
To deck out: 1. to decorate; 2. to cover with decks.
To handle: 1. to control, to treat; 2. to manoeuvre.
Helm: wheel or tiller for moving the rudder of a craft.
Topsides: 1. upper parts; 2. the sides of a ship above the waterline.
Bottom: 1. part of the body on which a person sits, lower part of something; 2. the lower, horizontal part of the hull.
To head: to move, to sail towards something.
Port: artificial harbour designed to accommodate and look after ships.
Buoy (pronounced as "boy"): floating device or float made of buoyant material moored in water.
13 Building your corpus, step 3: choose a software tool to analyse the corpus, for example AntConc by Laurence Anthony (http://www.antlab.sci.waseda.ac.jp/index.html).
15 The AntConc tools
The word list tool counts all the running words in the corpus (called tokens) and presents the distinct words (types) in an ordered list. This allows you to find which words are the most frequent in a corpus.
The concordance tool allows you to see how words and phrases are commonly used in a corpus of texts. The concordance table shows the search word in the middle, with its co-text to the left and to the right.
The file view tool shows the source text and can be used to see where particular examples come from. The search-term hits are highlighted throughout the text.
The collocates tool shows the collocates of a search term. (Collocation is the phenomenon whereby certain words tend to co-occur with other words.)
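As a rough illustration of what the word list, concordance and collocates tools compute, here is a minimal Python sketch over a list of tokens. It is not AntConc's own code: the function names and the window size `span` are assumptions, and `collocates` ranks words by raw co-occurrence frequency only, whereas AntConc can also rank collocates by statistical measures.

```python
from collections import Counter


def word_list(tokens):
    """Word list: every type with its frequency, most frequent first."""
    return sorted(Counter(tokens).items(), key=lambda item: (-item[1], item[0]))


def concordance(tokens, node, span=4):
    """KWIC lines: the node word in the middle, `span` tokens of co-text each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - span):i])
            right = " ".join(tokens[i + 1:i + 1 + span])
            lines.append(f"{left:>30} [{tok}] {right}")
    return lines


def collocates(tokens, node, span=4):
    """Raw frequency of each word within `span` tokens left or right of the node."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            counts.update(tokens[max(0, i - span):i])
            counts.update(tokens[i + 1:i + 1 + span])
    return counts.most_common()


tokens = ("this is of course important in all ships "
          "an alteration of course towards a vessel abeam").split()
print(word_list(tokens)[:3])
for line in concordance(tokens, "course", span=3):
    print(line)
print(collocates(tokens, "course", span=1))
```

Lining up the concordance lines for "course" in this way is what makes its two uses (the adverbial "of course" and the navigational "alteration of course") visible side by side.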
16 Building your corpus, step 4: define your objective(s) and start your research.
17 An exhaustive listing of electronic resources and tools available on the Internet concerning corpus linguistics: David Lee's site (http://www.uow.edu.au/~dlee/CBLLinks.htm).
18 AN EXAMPLE: A TENTATIVE INVESTIGATION INTO A PEDAGOGIC CORPUS OF MARITIME ENGLISH (ME)
19 MARITIME ENGLISH sub-registers: set languages (SeaSpeak and IMO Standard Phrases), shipbuilding, seamanship, cargo handling, meteorology and oceanography, marine engineering, electricity, electronics, automation, port operations, marine pollution, safety of life at sea, international rules and regulations, marine insurance, shipping, business transactions, catering and tourism.
20 ME: the state of the art
– very little, if anything, is known about ME;
– research is almost non-existent;
– no field-specific corpora are available.
21 The purpose of the investigation
– identifying the main characteristics of the teaching materials that I used;
– finding the right lexical items to focus on;
– devising better-targeted learning tasks;
– reducing the learners' difficulties in building up their ME mental lexicon and in using it appropriately.
22 The software
– WinATA (Aston Text Analyser);
– FREQUENCY and RANGE (Heatley, Nation and Coxhead, 2002);
– WordClassifier (Denies, Goethals and EET Project Team, 1996), e.g. word classification with WordClassifier.
23 Corpus statistics
– Sub-corpora: 14
– Texts: 185
– Average length of texts: 280 running words
– Pages: 96
– Tokens/running words: 51,823 (WinATA count)
– Types: 5,831
– Hapax legomena: 2,528
– Types occurring less than 9 times: 5,013
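The figures above can be computed with a short script. This minimal Python sketch assumes the corpus has already been tokenized (one token list per text) and takes hapax legomena to mean types occurring exactly once; `corpus_stats`, `threshold` and the key names are hypothetical. Because tokenization rules differ between tools, a script like this would not necessarily reproduce the WinATA count exactly.

```python
from collections import Counter


def corpus_stats(texts, threshold=9):
    """Summary figures for a corpus given as a list of token lists, one per text."""
    freq = Counter(tok for text in texts for tok in text)
    n_tokens = sum(freq.values())
    return {
        "texts": len(texts),
        "tokens": n_tokens,                                         # running words
        "types": len(freq),                                         # distinct words
        "hapax_legomena": sum(1 for c in freq.values() if c == 1),  # frequency exactly 1
        "rare_types": sum(1 for c in freq.values() if c < threshold),
        "avg_text_length": n_tokens / len(texts) if texts else 0.0,
        "type_token_ratio": len(freq) / n_tokens if n_tokens else 0.0,
    }


texts = [["the", "ship", "is", "a", "ship"], ["the", "port", "side"]]
print(corpus_stats(texts))
```

As a consistency check on the slide's own figures: 51,823 tokens / 185 texts ≈ 280 running words per text, matching the reported average length; 5,831 types / 51,823 tokens gives a type-token ratio of about 0.11; and the 2,528 hapax legomena are about 43% of all types.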
24 Sub-corpora description (token count / type count)
– Basic Ship Terminology: 1,771 / 455
– Ship Types: 1,255 / 416
– Ship Particulars: 1,010 / 314
– Manning: 2,057 / 541
– The History of the Ship: 2,323 / 747
– Famous Ships: 5,956 / 1,659
– Shipbuilding: 1,235 / 509
– Miscellanea: Structural Elements and Shipboard Plants: 2,583 / 775
– Technical Specification (4): 9,482 / 1,984
– IMO/Classification Societies: 2,958 / 874
– Marine Pollution: 3,642 / 1,115
– Marine Meteorology: 6,134 / 1,515
– Port Operations: 3,153 / 750
– Collision Regulations: 8,264 / 997
25 Stages in the investigation
Stage 1
– Producing a frequency list
– Comparing the most frequent words in the MEPC (ME pedagogic corpus) with those from other lists
– Identifying the function words present/not present in the corpus
– Finding the coverage of the most common words
Stage 2
– Identifying the maritime lexical items in the corpus
– Analysing the main features of the field-specific lexical items
– Classifying the technical words
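Stage 1 rests on two simple computations: coverage (the share of all running words accounted for by the N most frequent types) and a comparison of the corpus's most frequent words with an external list such as the GSL. The minimal Python sketch below assumes both the corpus and the reference list are already available as lists of lower-cased words; the function names are hypothetical, and the three-word reference list in the usage lines is a toy stand-in, not data from the GSL, the CIC or the Bank of English.

```python
from collections import Counter


def coverage(tokens, top_n):
    """Share of running words (tokens) accounted for by the top_n most frequent types."""
    if not tokens:
        return 0.0
    covered = sum(count for _, count in Counter(tokens).most_common(top_n))
    return covered / len(tokens)


def compare_top_words(tokens, reference_list, top_n=50):
    """Split the corpus's top_n words into those found in a reference list and the rest."""
    top = [word for word, _ in Counter(tokens).most_common(top_n)]
    reference = set(reference_list)
    shared = [word for word in top if word in reference]
    corpus_only = [word for word in top if word not in reference]
    return shared, corpus_only


tokens = "the ship the port the ship bow".split()
toy_reference = ["the", "port", "and"]  # hypothetical stand-in for a real word list
print(f"{coverage(tokens, 2):.0%}")     # share covered by the 2 most frequent types
print(compare_top_words(tokens, toy_reference, top_n=3))
```

Words that turn up in the corpus-only list are the natural candidates for Stage 2, where the maritime lexical items are identified and classified.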
26 The most and the least frequent words across different lists
The 50 most frequent words in:
– the General Service List (GSL), adapted from West by Bauman (http://jbauman.com/gsl.html);
– the Cambridge International Corpus (CIC): 330,000 words of written data;
– the COBUILD Bank of English: 196 million words of written corpus.
27 ME vocabulary
– hardly unique per se;
– mainly general words taking on different meanings and roles through polysemy and homonymy, and through compounding.
28 Polysemy and homonymy
– Polysemes and homonyms account for one fifth of all types.
– GE/ME (General English/Maritime English) differences:
  in meaning;
  in grammatical functions: adverbs or prepositions -> adjectives; verbs -> nouns.
29 Shifts
In meaning:
– bank: a financial institution; the bank of a river; a bank of fog; a row of objects (e.g. a bank of oars, a bank of tubes).
– floor: a horizontal subdivision in a building; a vertical plate in the ship's bottom.
– air draught: a current of air; the maximum height of the ship's parts above the water surface.
– port: an artificial harbour; an opening in the hull; the left side of the ship.
In grammatical functions:
– bow: in GE a noun (a knot with two loops, a weapon, or a device for playing a musical instrument) or a verb (indicating a body motion); in ME a noun (the fore end of a ship).
– after: in GE a time relater (preposition/adverb); in ME an adjective (the after end of the ship).
30 Compounding (1): usual types of connection
– noun plus noun, e.g. ballast water, radio officer;
– present participle plus noun, e.g. mooring ropes, navigating cadet;
– past participle plus noun, e.g. compressed air, I-shaped beam.
31 Compounding (2): common semantic relationships (Blakey, 1987: 146)
– B of A: cylinder cover, hatchway
– B with/has A: salt water, shipowner
– B contains A: wheelhouse, storeroom
– B in/on/at A: port operations, bow thruster
– B is made of/from A: copper wire, air-cushion
– B operated by A: hand pump
– B uses A: steam turbine, water plant
– B shaped like A: needle valve, I-beam
– B invented by A: Diesel engine, Beaufort wind scale
32 Compounding (3)
– adjectives (deep tank, double bottom, forecastle, parallel middle body, strong beam, upper deck)
– nominalised adjectives (deck longitudinals)
– adjectival compounds (oil tight, watertight)
– reverse combinations (depth moulded, length overall)
– ordinal numbers (first mate, third engineer)
– prepositions ('tween deck, upkeep, overhaul)
– the names of seasons (summer load line)
– proper nouns turned into common nouns (Jacob's ladder, Samson post)
– eponyms or names of inventors to describe a product (Diesel engine, Beaufort scale, Plimsoll marks)
– place names to indicate an important event or convention (York-Antwerp Convention, Florida Act)
– geographical names (North Atlantic loadline)
33 Compounding (5): poly-words, i.e. fixed collocations with specialized unitary meaning, written as:
– one word (bulkhead, shipowner);
– words with spaces in between (water ballast, bracket floor);
– words joined by hyphens (I-beam);
– words linked by prepositions (round of deck, turn of the bilge, length between perpendiculars);
– the possessive case (ship's cook);
– combined devices (men-of-war).
34 ME multi-word items (fixed collocations with specialized unitary meaning)
– condense information (Hatch & Brown, 1995: 191);
– create new meanings different from those of each of the parts making up the combination (Barlow, 1996: 12);
– create unique meanings;
– are the only acceptable referential forms available to point to areas of experience shared by the target maritime community (there are no other words to point to the concepts they represent);
– do not serve other frames of reference;
– are to be considered as single words (though written with hyphens or with spaces in between);
– have stable relationships, having frozen into fixed forms;
– can be seen as extreme forms of fixed collocation (Becker, 1975: 8; Schmitt and McCarthy, 1997: 43).
35 Other relevant lexical aspects
– clippings (bosun for boatswain, fcsl for forecastle);
– initialisms (A.B.S.);
– acronyms (SOLAS: Safety Of Life At Sea; MARPOL: MARine POLlution).
36 Metaphors
– Metaphorical use of animal names in fixed collocations with specialized unitary meaning (cat's walk, dog watch, crow's nest, donkeyman).
– Metaphorical use of the language in connection with the word ship (she/her -> backbone, ribs).
38 ME lexical classification
– few unique field-specific lexical items;
– lexical items also belonging to other ESP fields;
– multi-word sense segments or compounds (common words occurring together to form unique field-specific single meanings);
– polysemes and homonyms (common words used with special, unique meanings in the frame of reference);
– function words and general service words.
39 THE PEDAGOGIC WASH-BACK
– greater attention to the most frequent and the least frequent words in the texts;
– a different approach to designing learning tasks:
  sense-segment-based lexical activities;
  matching old words to new meanings;
  exploring the multiple meanings of words;
  analysing and manipulating the different relationships and combinations.
40 Activity 1: Look at the following table and decide what the meaning of course is in the different instances.
This is of course important in all ships.
The captain's watch, and, of course, the bell itself, …
… an alteration of course towards a vessel abeam …
… sufficient sea-room, alteration of course alone may be the solution.
41 Activity 2: Read the following examples and guess the different meanings of the word current in context. Then check by using a dictionary.
1. Evaluate current, nearby port and hurricane haven locations that may be considered for tropical cyclone avoidance.
2. Current and lighting are supplied by the generators.
3. Winds of hurricane force opposing any ocean current can quickly create very steep, short period waves.
4. Plot current/forecast positions of all active/suspected tropical cyclone activity.
5. The service speed as well as the optimum size of tanker is very much related to current market economics.
6. The developing storm drifts westwards with the current of free air and it deviates from the equator after arriving at the western margin of the semi-permanent 'high'.
7. The current state of the environment is one of the most serious problems facing mankind today.
42 Activity 3: Using a dictionary, find the different uses and meanings of the word after. Then read the following bits of sentences and identify the different meanings.
… on course and look after all the equipment used.
… not going to Liverpool after all, not yet anyway.
The after perpendicular (A.P.) is a …
Every deck is named after an Italian city (Genoa, …
… deviates from the equator after arriving at the western …
… if a witch was after her.
… died a few days after she was registered and …
However, by 12 hours after landfall, tornadoes tend to …
… peak tanks and the after peak tanks.
… have patterned their ships after the shapes of waterfowl.
Standing on the after davit, he was trying …
… vertical line through the after edge of the rudderpost.
43 Activity 4: All the words listed below contain ship, but in each list there are two odd words out. Cross them out and justify your decision. Provide an example for each word. Translate the words into Italian.
1. amidships 2. athwartships 3. battleships 4. lightship 5. seamanship 6. shipboard 7. shipbuilder 8. warship
1. shipmasters 2. ship-owner 3. ship-repairing 4. relationship 5. shipwreck 6. shipwright 7. shipyard 8. steamship
44 Activity 5: Identify the relationships in the following compounds and fill in the table.
Compounds: after peak tank, cylinder cover, salt water, needle valve, I-beam, ship owner, wheelhouse, storeroom, hatchway, steam turbine, water plant, hand pump, steam turbine, air-cushion, Beaufort wind scale, port operations.
Table columns: B of A; B with/has A; B contains A; B is made of/from A; B in/on/at A; B uses A; B operated by A; B shaped like A; B invented by A.
45 Activity 6: Gapped compounds. Complete the compound words in this passage.
A general cargo _______ is a single- or two-deck ship. The hull is divided up into a number of water-__________ compartments by decks and _________ heads. At the fore and after ends of the hull are the fore _________ tanks and the ________ peak tanks. There are usually four or five holds in between. The holds also have __________ decks, i.e. decks dividing up cargo space. A traditional dry cargo _______ has her engine ______ and bridge ____________ amidships so that there are three holds forward of the engine _____________ and two holds aft of it. Above the main _________ at the fore end, forward of n°1 hold, there is the __________ castle and right forward is the _________ staff. The derricks are supported by masts and by a _______________ post. They are stowed fore-and-__________ when the ship is at sea. There are two _________ boats, one on the port ___________ amidships, another on the ___________________ side amidships, abaft the funnel (the funnel is always abaft the bridge). The poop is at the after end of the ship and there is an ensign ______________ right aft.
46 Task aiming at developing learner autonomy (created with WordClassifier)
Read the following lists of words. They are all the words (381) from the module Basic Ship Terminology that you have studied. Their difficulty ranges from 0 (fairly common) to 5 (less common).
Work on your own. Underline all the words that you recognize and whose meaning you can remember. Count them and see how good you are and how much you have learnt.
Work with a partner and create as many compound words as you can.
Form a group of four and compare your lists. If you like, you can turn this activity into a competition. (The winner is the team of two students who have produced the most compound words. The group decides whether the words are correct or not and assigns the scores. If you cannot reach an agreement, ask your teacher.)