(Re)searching the Corpus of Research Articles for discipline- and genre-specific language 1 Winnie Cheng Department of English EDC Workshop 13 Oct 2014.

1 (Re)searching the Corpus of Research Articles for discipline- and genre-specific language
Winnie Cheng, Department of English
EDC Workshop, 13 Oct 2014

2 Aims of workshop
1. To examine the Corpus of Research Articles (CRA), comparing sixteen academic research article sections in order to identify discipline- and genre*-specific grammar and phraseologies. (Task)
2. To suggest pedagogical implications for capstone and research project report writing for PolyU students, and implications for future research.
* genre: a particular type of art, writing, music etc, which has certain features that all examples of this type share (Longman Dictionary of Contemporary English, http://www.ldoceonline.com/dictionary/genre)

3 Outline
1. Concepts, approaches and methods of corpus research
2. Data: Corpus of Research Articles (CRA)
3. Corpus analysis tools
4. Description, interpretation and explanation of findings
5. Conclusions
6. Implications

4 Defining a corpus
A collection of pieces of authentic language text (including transcripts of spoken data)
Sampled to be representative of a particular language or language variety
Stored in electronic form and machine-readable

5 Concepts, approaches and methods of corpus research
The phraseological tendency of language
Lexical verbs
Combining genre (academic research articles) analysis and corpus research
Use of different corpus analytical programs/tools

7 Meaning is Created
I couldn’t believe that I could actually understand what I was reading. The phenomenal power of the human mind. According to a researcher at Cambridge University, it doesn’t matter in what order the letters in a word are, the only important thing is that the first and last letter be in the right place. The rest can be a total mess and you can still read it without a problem. This is because the human mind does not read every letter by itself, but the word as a whole. Amazing, huh? Yeah, and I always thought spelling was important!

8 Meaning is Created
… the human mind does not read every word in a clause by itself, but the co-selection of words as a whole.

9 Patterns of word co-occurrence and meaning
Words do not create meaning in isolation.
“You shall know a word by the company it keeps” (Firth, 1957: 11).
Meaning is created by the co-selection of words.

10 The phraseological tendency of language
To fully describe the meaning and use of language:
The way in which words are co-selected by speakers and writers
Word co-occurrences evident in texts

11 Aboutness
‘aboutness’ (Phillips, 1983): the phraseology of the language contained in a text or a corpus that is specific to a discipline or a profession

12 General vs. specialised corpora
‘general corpus’: “language as a whole” (Sinclair, 2001: xi)
‘specialised corpus’: “characteristics of the genre” (ibid: xi)

14 Online RCPCE Profession-specific Corpora
Hong Kong Corpus of Spoken English (1 million words, prosodically transcribed)
Hong Kong Corpus of Surveying and Construction Engineering (5.7 million)
Hong Kong Engineering Corpus (9.2 million)
Hong Kong Financial Services Corpus (7.3 million)
Hong Kong Budget Speeches Corpus 1997 – 2010 (176,515)
Hong Kong Policy Address Speeches Corpus 1997 – 2009 (153,198)
Corpus of Research Articles (5.7 million)
ConcGramOnline©, Chris Greaves

15 Corpus of Research Articles (CRA)
5.6 million words
39 disciplines
20 articles in 20 popular journals with high impact factors in 2007 for each discipline
16 research article sections

16 39 disciplines in CRA
1 Accounting & Finance
2 Anthropology
3 Applied Biology & Chemical Technology
4 Applied Linguistics
5 Applied Mathematics
6 Applied Physics
7 Applied Social Sciences
8 Archaeology
9 Building & Real Estate
10 Building Services Engineering
11 Civil & Structural Engineering
12 Computing
13 Design
14 Economics
15 Education
16 Electrical Engineering
17 Electronic & Information Engineering
18 Geography
19 Health Technology & Informatics
20 History
21 History of Art
22 Hotel & Tourism Management
23 Industrial & Systems Engineering
24 Land Surveying & Geoinformatics
25 Law
26 Linguistics
27 Literature
28 Logistics
29 Management & Marketing
30 Mechanical Engineering
31 Music
32 Nursing
33 Optometry
34 Philosophy
35 Politics
36 Psychology
37 Rehabilitation Sciences
38 Sociology
39 Textiles & Clothing

19 16 sections in CRA → 16 sub-corpora
Sections
1. Abstract
2. Application
3. Conclusion
4. Directions
5. Discussion
6. Implications
7. Introduction
8. Limitations
9. Literature Review
10. Method and Results
11. Method, Results and Discussion
12. Results and Discussion
13. Method
14. Recommendations
15. Results
16. Summary

20 Three studies
1. To identify section-specific verbs in CRA
2. To identify the phraseological patterns of the most frequent verbs in one section
3. To conduct a concordance study of a phraseology in different sections of CRA

21 Study 1
Aim of study: To identify section-specific verbs in CRA
Corpus analysis tool: Wmatrix 3 (Rayson, 2013)

22 Verb (grammar)
a word or group of words that expresses an action (such as eat), an event (such as happen) or a state (such as exist)
Oxford Learner’s Dictionary, http://www.oxfordlearnersdictionaries.com/definition/english/verb

23 Auxiliary verb (grammar)
a verb such as be, do and have used with main verbs to show tense, etc. and to form questions and negatives
In the question ‘Do you know where he has gone?’, ‘do’ and ‘has’ are auxiliaries.
Oxford Learner’s Dictionary, http://www.oxfordlearnersdictionaries.com/definition/english/verb

24 Wmatrix 3

25 Wmatrix 3
A web interface to the CLAWS7 corpus annotation tools for automatic part-of-speech (POS) tagging

26 CLAWS 7 Part-of-speech (POS) tagging
Constituent Likelihood Automatic Word-tagging System (CLAWS) 7
POS tagger with a tagset of more than 160 tags
Assigns a POS tag to each word
Makes it possible to extract different word classes automatically

27 Tags for lexical verbs
VV0 base form of lexical verb (e.g. give, work)
VVD past tense of lexical verb (e.g. gave, worked)
VVG -ing participle of lexical verb (e.g. giving, working)
VVGK -ing participle catenative (e.g. going in, be going to)
VVI infinitive (e.g. to give..., It will work...)
VVN past participle of lexical verb (e.g. given, worked)
VVNK past participle catenative (e.g. bound in, be bound to)
VVZ -s form of lexical verb (e.g. gives, works)
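
The eight lexical-verb tags above are enough to pull a ranked verb list out of any POS-tagged text. The following is a minimal sketch in Python, not the Wmatrix procedure itself. It assumes a hypothetical input format in which every token is written as word_TAG and tokens are separated by spaces; the function name and the sample sentence are invented for illustration.

```python
from collections import Counter

# The eight CLAWS7 lexical-verb tags listed on the slide.
LEXICAL_VERB_TAGS = {"VV0", "VVD", "VVG", "VVGK", "VVI", "VVN", "VVNK", "VVZ"}


def top_lexical_verbs(tagged_text, n=20):
    """Return the n most frequent (word, tag) pairs carrying a lexical-verb tag.

    Assumption: tokens look like word_TAG (e.g. "shown_VVN"); this format is
    hypothetical and may differ from what a given tagger exports.
    """
    counts = Counter()
    for token in tagged_text.split():
        # Split on the LAST underscore so multiword units such as
        # "based_on_VVN" keep their internal underscore.
        word, sep, tag = token.rpartition("_")
        if sep and tag in LEXICAL_VERB_TAGS:
            counts[(word.lower(), tag)] += 1
    return counts.most_common(n)


# Invented sample, for illustration only.
sample = ("The_AT results_NN2 are_VBR shown_VVN using_VVG data_NN "
          "based_on_VVN tests_NN2 shown_VVN")
print(top_lexical_verbs(sample, n=3))
```

Running the same function over each of the 16 section sub-corpora separately would give one ranked list per section, which is the kind of comparison the next slides present.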

28 Top 20 lexical verbs of 16 sections of CRA
Section-specific (genre-specific) lexical verbs

29 Top 20 lexical verbs of 16 sections
Columns, left to right: Abstract, Application, Conclusion, Directions
1  using VVG; used VVN
2  based_on VVN; let VV0; used VVN; include VVI
3  used VVN; see VV0; provide VVI
4  used_to VVN; shown VVN; based_on VVN; increasing VVG
5  found VVN; shows VVZ; made VVN; using VVG
6  compared VVN; consider VV0; provides VVZ; needed VVN
7  show VV0; obtained VVN; shown VVN; provides VVZ
8  associated VVN; defined VVN; found VVN; determine VVI
9  developed VVN; used VVN; suggests VVZ; conducted VVN
10  obtained VVN; given VVN; help VVI
11  provides VVZ; described VVN; associated VVN; obtained VVN
12  examines VVZ; follows VVZ; make VVI; seems VVZ
13  shown VVN; used_to VVN; suggest VV0; Experienced VVN
14  discussed VVN; found VVN; considered VVN; require VVI
15  related VVN; note VV0; presented VVN; examine VVI
16  suggest VV0; moving VVG; use VVI; explore VVI
17  applied VVN; implies VVZ; provide VVI; continue VVI
18  showed VVD; exists VVZ; developed VVN; improve VVI
19  provide VV0; based_on VVN; need VV0; explored VVN
20  provide VVI; expected VVN; understand VVI; shown VVN

30 Columns, left to right: Discussion, Implications, Introduction, Limitations
1  see VV0; shown VVN; using VVG; test VVI
2  using VVG; needs VVZ; used VVN; explore VVI
3  given VVN; suggest VV0; see VV0; using VVG
4  used VVN; provide VVI; based_on VVN; represent VV0
5  made VVN; need VV0; found VVN; considered VVN
6  found VVN; see VV0; associated VVN; conducted VVN
7  make VVI; presented VVN; developed VVN; based_on VVN
8  seen VVN; suggests VVZ; shown VVN; learning VVG
9  based_on VVN; developed VVN; considered VVN; given VVN
10  associated VVN; identify VVI; made VVN; solve VV0
11  seems VVZ; appears VVZ; related VVN; stay VVI
12  suggests VVZ; need VVI; provide VVI; examine VVI
13  shown VVN; consider VVI; given VVN; include VVI
14  found VVD; shows VVZ; provides VVZ; indicate VVI
15  considered VVN; associated VVN; used_to VVN; taken VVN
16  defined VVN; show VV0; provide VV0; includes VVZ
17  see VVI; tend VV0; found VVD; affect VV0
18  provide VVI; used VVN; applied VVN; suggest VV0
19  use VVI; needed VVN; described VVN; needs VVZ
20  became VVD; improve VVI; called VVN; determine VVI

31 Columns, left to right: Literature Review, Method, Method and Results, Method, Results and Discussion
1  using VVG; used VVN; using VVG
2  used VVN; using VVG; see VV0
3  based_on VVN; obtained VVN; found VVN
4  based_on VVN; shown VVN; used VVN
5  shown VVN; given VVN; shows VVZ; shown VVN
6  given VVN; see VV0; shows VVZ
7  found VVN; obtained VVN; reported VVN; show VV0
8  defined VVN; used_to VVN; given VVN
9  associated VVN; measured VVN; increased VVN; made VVN
10  shows VVZ; defined VVN; found VVN; associated VVN
11  related VVN; described VVN; induced VVN; presented VVN
12  considered VVN; included VVN; note VV0; compared VVN
13  obtained VVN; Performed VVN; based_on VVN; defined VVN
14  described VVN; made VVN; used_to VVN; obtained VVN
15  developed VVN; used VVD; forcing VVG; remains VVZ
16  made VVN; use VV0; asked VVN; indicate VV0
17  used_to VVN; considered VVN; requires VVZ; use VV0
18  use VVI; calculated VVN; identified VVN; seen VVN
19  reported VVN; presented VVN; suggests VVZ; indicates VVZ
20  provides VVZ; conducted VVN; discussed VVN; based_on VVN

32 Columns, left to right: Recommendations, Results, Results and Discussion, Summary
1  work VVI; using VVG; based_on VVN
2  uses VVZ; see VV0; shown VVN; using VVG
3  attract VVI; shows VVZ; provide VVI
4  used VVN; found VVN; used VVN; found VVN
5  using VVG; compared VVN; see VV0; used VVN
6  based_on VVN; shown VVN; obtained VVN; achieve VVI
7  provide VVI; used VVN; show VV0; given VVN
8  explore VVI; showed VVD; observed VVN; related VVN
9  attract VVI; observed VVN; seen VVN; seem VV0
10  experience VV0; associated VVN; found VVN; become VVI
11  suggest VV0; reported VVD; based_on VVN; presented VVN
12  exposed VVN; based_on VVN; compared VVN; found VVD
13  snow VVI; related VVN; reported VVN; considered VVN
14  act VVI; seen VVN; given VVN; expect VV0
15  raised VVN; presented VVN; associated VVN; expected VVN
16  anchored VVN; given VVN; increases VVZ; shows VVZ
17  recommended VVN; show VV0; presented VVN; applied VVN
18  used_to VVN; revealed VVD; expected VVN; provides VVZ
19  use VV0; indicated VVD; measured VVN; understand VVI
20  expect VV0; found VVD; made VVN; labeling VVG

33 Study 2
Aim of study: To identify the phraseological patterns of frequent verbs in one section (Recommendations)
Corpus analysis tool: WordSmith Tools (Scott, 2012)

35 Recommendations
The three most frequent genre-specific verbs in the Recommendations sub-corpus
1. work VVI [infinitive (e.g. to give..., It will work...)]
2. uses VVZ [-s form of lexical verb (e.g. gives, works)]
3. attract VVI

36 WordSmith Tools 6.0 (Scott, 2012)

37 WordSmith 6.0 (Scott, 2012)
Three main functions: Concord, WordList, KeyWords
Using the Concord function, create 3-, 4-, and 5-word clusters of:
1. work VVI
2. uses VVZ
3. attract VVI
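
The idea behind a cluster list can be sketched in a few lines of Python. This is an illustration of the general technique (counting contiguous n-word sequences that contain a chosen node word), not WordSmith's own implementation. The function name and the sample sentence are invented, and the sketch matches on word form only, so it does not distinguish ‘work’ VVI from ‘work’ as a noun.

```python
from collections import Counter


def clusters(tokens, node, n):
    """Count contiguous n-word sequences that contain the node word."""
    node = node.lower()
    lowered = [t.lower() for t in tokens]
    counts = Counter()
    for i in range(len(lowered) - n + 1):
        gram = lowered[i:i + n]
        if node in gram:
            counts[" ".join(gram)] += 1
    return counts


# Invented sample text, for illustration only.
text = ("a tool to attract people to work in the restaurant and "
        "ways to attract others to work in the restaurant")
tokens = text.split()

for n in (3, 4, 5):
    print(n, clusters(tokens, "work", n).most_common(3))
```

Sorting each counter by frequency gives ranked lists of 3-, 4- and 5-word clusters of the kind shown on the following slides.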

38 3-, 4-, 5-word clusters of ‘work’ VVI in Recommendations
3-word clusters: to work in; work in the; work in food; work as restaurant; work in restaurant; to work as; endorsement to work; people to work; reasons to work; others to work
4-word clusters: to work in the; work in the restaurant; to work as restaurant; work in food service; work as restaurant food; workers endorsement to work; work in restaurant service; to work in restaurant; to work in food; top reasons to work
5-word clusters: to work in the restaurant; to work as restaurant food; to attract people to work; to work in restaurant service; to work in food service; service workers endorsement to work; reasons to work as restaurant; to attract others to work; the top reasons to work; work in restaurant service was

39 3-, 4-, 5-word clusters of ‘uses’ VVZ in Recommendations
3-word clusters: system uses barcode; palette system uses; uses barcoded tags; uses barcode printed; uses material culture; uses a printed; that uses material; that extensively uses; movement that uses; extensively uses barcoded
4-word clusters: room that extensively uses; system uses barcode printed; palette system uses barcode; uses barcoded tags as; uses barcode printed cards; uses material culture to; that uses material culture; that extensively uses barcoded; uses a printed code; the palette system uses
5-word clusters: system uses barcode printed cards; that extensively uses barcoded tags; room that extensively uses barcoded; palette system uses barcode printed; uses barcoded tags as physical; uses material culture to enhance; uses barcode printed cards for; that uses material culture to; the palette system uses barcode; uses a printed code to

40 3-, 4-, 5-word clusters of ‘attract’ VVI in Recommendations
3-word clusters: to attract people; to attract service; to attract others; tool to attract; ways to attract; attract service workers; attract others to; attract people to; help to attract
4-word clusters: will help to attract; ways to attract service; tool to attract people; to attract service workers; to attract people to; to attract others to; recruiting tool to attract; help to attract others; explore ways to attract; attract service workers endorsement
5-word clusters: to attract others to work; to attract people to work; significant recruiting tool to attract; ways to attract service workers; will help to attract others; tool to attract people to; to attract service workers endorsement; to explore ways to attract; recruiting tool to attract people; attract service workers endorsement to

41 Study 3
Concordance study of a three-word cluster ‘by using the’ in different sections of CRA
A three-word concgram ‘to/identify/and’ in different sections of CRA
Illustrating concordances …
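
A concordance lists every occurrence of a search item with a few words of context on either side (key word in context, KWIC). The sketch below shows one simple way to build such lines in Python for a contiguous phrase such as ‘by using the’. It is an illustration only: the function name, the context width and the sample text are all invented, and real concordancers offer sorting and alignment options not shown here.

```python
def kwic(tokens, phrase, width=5):
    """Return key-word-in-context lines for a contiguous phrase.

    tokens: list of word tokens; phrase: space-separated search phrase;
    width: number of context words kept on each side (an arbitrary default).
    """
    target = [w.lower() for w in phrase.split()]
    lowered = [t.lower() for t in tokens]
    k = len(target)
    lines = []
    for i in range(len(lowered) - k + 1):
        if lowered[i:i + k] == target:
            left = " ".join(tokens[max(0, i - width):i])
            node = " ".join(tokens[i:i + k])
            right = " ".join(tokens[i + k:i + k + width])
            # Right-align the left context so the node words line up.
            lines.append(f"{left:>40}  {node}  {right}")
    return lines


# Invented sample text, for illustration only.
text = ("The model was validated by using the finite element method and "
        "the results were obtained by using the same data set")
tokens = text.split()

for line in kwic(tokens, "by using the", width=4):
    print(line)
```

Running the function separately on each section sub-corpus would allow the contexts of the same phrase to be compared across sections.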

42 30 concordance lines of SARS

44 Phraseological variation
Clusters/n-grams/bundles/chunks, i.e. patterns of contiguous words such as ‘you know’, ‘in terms of’, ‘a lot of’, ‘work hard’, etc.
But what about patterns with phraseological variations, e.g.
– ‘a lot of business people’ and ‘a lot of different types of people’
– ‘work hard’, ‘work very hard’, ‘hard work’, and ‘how hard I had to work’?

45 How can the phraseological tendency of a language be objectively and formally identified?
Is there a means to fully and automatically extract phraseologies which exhibit variation from a corpus?

46 How to uncover phraseologies (1)
n-gram (bi-grams, tri-grams, etc.): contiguous words which constitute a pattern of use and which recur in a corpus, e.g., in terms of, in terms of the
skipgram: non-contiguous word co-occurrences of limited membership which constitute a pattern of use and which recur in a corpus, e.g., a lot of business people, a lot of different types of people
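
To make the contrast with contiguous n-grams concrete, the sketch below searches for two words that co-occur within a small span, with or without intervening words and in either order. It is a simplified illustration in Python, not the algorithm used by ConcGram or any other tool named in this workshop. The function name, the span value and the sample sentence (built from the ‘work hard’ variants mentioned earlier) are assumptions made for the example.

```python
def co_occurrences(tokens, word_a, word_b, span=5):
    """Find pairings of word_a and word_b at most `span` tokens apart.

    Either word may come first, and other words may intervene.
    Returns a list of (start_index, end_index, text) tuples.
    """
    lowered = [t.lower() for t in tokens]
    a, b = word_a.lower(), word_b.lower()
    hits = []
    for i, tok in enumerate(lowered):
        if tok not in (a, b):
            continue
        other = b if tok == a else a
        # Look ahead up to `span` positions for the partner word.
        last = min(i + span, len(lowered) - 1)
        for j in range(i + 1, last + 1):
            if lowered[j] == other:
                hits.append((i, j, " ".join(tokens[i:j + 1])))
    return hits


# Invented sample built from the variants mentioned in the slides.
tokens = "we work hard and it is hard work and I had to work very hard".split()

for start, end, text in co_occurrences(tokens, "work", "hard", span=2):
    print(start, end, text)
```

A contiguous bigram search for ‘work hard’ would find only the first of these three instances; the span-based search also retrieves ‘hard work’ and ‘work very hard’.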

48 ‘because/so’ in British National Corpus
1 won't know is that she's never bothered to ask because she's not talking so it's okay while there but
2 and you've got to have the front door [unclear] because there's a bar at back so these are special
3 the taxi. He goes well, let me read it. Because, because I'm a complete stranger so I don't have to spend
4 you see, J Julie's likely to do quite a lot because she's got to stay there so you've got to
5 home, Rowan's mother wouldn't let her have it because it was too revealing and so Penny was stuck with
6 the morning Yeah, no it wouldn't be tomorrow because I think my mum's working so Yeah It doesn't
7 No, it's not going to cost her any more, because it's included in the plan, so it's not going to
8 with Chris and Chris insisted that he did it. Because he's got a plan of the site so he wants to know
9 give you a bit of my advice [unclear] on a lead, because er you haven't had the call so you ought to be
10 scratch in Alan's well equipped kitchen. But because Linda has to stop half way through so that other
11 the movement and people need labels. I think, because the society does want to categorise people so
12 1960s were in the lowest housing class. This was because they generally had low and insecure incomes, so
13 he, was he so naughty to you? so Richard's crying because he'd been hitting him the face. He's howling and
14 get the land. Er, so I just make that point because of the debate last week. Thank you. Thank you.
15 I'll put those down, so let's find some of these because obviously you won't have met them all, maybe.
16 hundreds of years. So he brought his family over because negotiations were taking so long, and he
17 at this time. So it must be that one Mm mm because the other chap comes about half past eight in the
18 that's fine, so I'll have to get it in soon because I won't be able to get him in till about for
19 [unclear] So to get this You had to pay this, because when it came to the end of the quarter, you had
20 worried so I thought well I might as well go up because I shall start to worry and things get out of
21 effort so that they will fear losing their jobs because the alternative jobs are less well paid (see
22 that. So you, you've got to think about those, because if you want to survive, and you also want to go
23 it. So I think that that's an important point, because I do believe that weight is placed by the
24 so then we can talk about lobbying Parliament, because we can't do it without them. We need a focus
25 so before anybody jumps for it, think about it, because it's boring. Now down to business I would like

49 A sample concordance of ‘political/Hong Kong’ in the Western Media Corpus 2006-2008

50 ‘Asia/world/city’

51 Findings

52 ‘by using the’ (Application)

53 ‘by using the’ (Results)

54 ‘by using the’ (Conclusions)

55 A three-word concgram ‘to/identify/and’ in different sections of CRA

56 ‘to/identify/and’ (Abstract)

57 ‘to/identify/and’ (Results)

58 ‘to/identify/and’ (Conclusions)

59 ‘to/identify/and’ (Implications)

60 TASK
Concordance of ‘findings’ in
Hotel and Tourism Management (119,327 words)
Optometry (75,338 words)

61 Conclusions
Disciplines: 39
Genre: Research articles
Corpus: Corpus of Research Articles
Sub-corpus: 16 sections of research articles
Corpus analysis programs/tools
Corpus analysis methods (frequencies of words, clusters, concgrams; concordance analysis)

62 Conclusions
Combining genre analysis and corpus research in the study of the grammar and phraseologies of research articles in 39 disciplines enables us to explore the language patterns (grammar and phraseologies) in different sections of research articles.

63 Implications for teaching and learning
Data-driven learning in capstone projects for UG students
Data-driven learning in research publication for PhD students
Grammar/part-of-speech (CLAWS7)
Phraseologies

64 Implications for future research
To compare patterns of grammar and phraseologies in the CRA across 39 disciplines

66 Corpus of Research Articles

67 Any questions?
What are some possible uses for you?

68 References
Phillips, M. (1983). Lexical macrostructure in science text. Unpublished PhD thesis, Department of English, Faculty of Arts, University of Birmingham.
Rayson, P. (2013). Wmatrix 3. Lancaster: UCREL.
Scott, M. (2012). WordSmith Tools version 6. Liverpool: Lexical Analysis Software.
Sinclair, J. McH. (1987). Collocation: A progress report. In R. Steele & T. Threadgold (eds.), Language topics: Essays in honour of Michael Halliday (pp. 319-331). Amsterdam: John Benjamins.

