Corpus analysis (2) Corpus Linguistics Richard Xiao

1 Corpus analysis (2) Corpus Linguistics Richard Xiao

2 Outline of the session
Lecture
–Keyword
–Reference corpus
–Key keyword
Practical
–WST keyword
–AntConc keyword
–Wmatrix keyword / key concept
–Extra: keyword analysis with CQPweb

3 What is a keyword?
Keywords are those words whose frequency is exceptionally high (positive keywords) or low (negative keywords) in comparison with a reference corpus
–Keywords usually refer to positive keywords
–But negative keywords are equally interesting (see Xiao and McEnery 2005)
In WordSmith, negative keywords appear at the very end of your listing, in a different colour
They are omitted automatically from a keywords database for key keyword analysis and a keyword plot

4 Why keyword analysis?
Indicating the aboutness (Scott 1999) of a particular text or corpus
–Content analysis, discourse analysis
Also revealing the salient features which are functionally related to a particular genre (Xiao and McEnery 2005)
–Genre analysis, stylistic analysis

5 How to do keyword analysis
Make a wordlist of the target corpus
Locate or make a wordlist of a reference corpus
–Scott (2005) In search of a bad reference corpus
–The reference corpus is usually larger than the target corpus
–The appropriateness of a reference corpus depends on your research questions!
Compare the frequency of each item in the two wordlists to extract keywords – done automatically
Analyse and interpret keywords – you will do it!
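
The comparison step can be sketched in a few lines of Python. This is a minimal illustration only, not how WST, AntConc or Wmatrix are implemented. It assumes the log-likelihood statistic (one commonly used keyness measure; the slides do not say which measure is applied), and the function names, the `min_freq` rule and the toy frequencies are all made up for the example. Positive keywords come out at the top of the list and negative keywords at the very end.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """Log-likelihood (G2) for a word seen a times in a target corpus of
    c tokens and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the target corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2

def keywords(target, reference, min_freq=3, threshold=6.63):
    """Return (word, signed keyness) pairs, positive keywords first.
    threshold=6.63 is the 1 d.f. critical value for p < 0.01.
    Assumption: min_freq is checked against the larger of the two counts."""
    c, d = sum(target.values()), sum(reference.values())
    rows = []
    for word in set(target) | set(reference):
        a, b = target.get(word, 0), reference.get(word, 0)
        if max(a, b) < min_freq:
            continue
        g2 = log_likelihood(a, b, c, d)
        if g2 < threshold:
            continue
        # Negative sign marks words that are unusually rare in the target.
        sign = 1 if a / c > b / d else -1
        rows.append((word, sign * g2))
    return sorted(rows, key=lambda r: r[1], reverse=True)

# Toy wordlists (invented counts, not taken from the speeches or the BNC).
toy_target = Counter({"britain": 30, "the": 50, "of": 20})
toy_reference = Counter({"britain": 5, "the": 500, "of": 495})
toy_keywords = keywords(toy_target, toy_reference)

for word, score in toy_keywords:
    print(f"{word}\t{score:.1f}")
```

In the toy data, "the" has the same relative frequency in both wordlists and is therefore not key, while "of" is a negative keyword because it is proportionally much rarer in the target.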

6 Keywords in the party speeches
Target corpus – just one text
–David Cameron's speech at the Conservative conference (10 October 2012, Manchester)
–Local copy available (David_speech Unicode text) - download and unzip the file into a file folder:
Reference corpus
–The 100-million-word BNC: download and unzip (local copy available)
Tool
–WST Keyword

7 Wordlist of David's speech

8 Creating keyword list

9 Keyword extraction in progress Warning: It can take time if you have loaded two large wordlists

10 Keywords in David's speech Callout: Negative keyword What do these keywords tell us?

11 Keyword: Plot view

12 What company do keywords keep?

13 Why marriage?

14 Key clusters Similar to word clusters, but only keywords are used.

15 Key keywords
A key keyword is one which is "key" in more than one of a number of related texts
–The more texts it is "key" in, the more "key key" it is
–Can avoid extracting keywords which are unusually frequent in only a small number of files
Can be created automatically, and is as simple to extract as keywords
n.b. Negative keywords are omitted automatically from a key keyword list
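
The idea of a key keyword can be sketched as a simple count over per-text keyword lists. This is an illustration under stated assumptions, not WordSmith's implementation: `key_keywords`, `min_texts` and the toy lists are hypothetical, and each list is assumed to hold positive keywords only, since negative keywords are left out of a key keyword list.

```python
from collections import Counter

def key_keywords(keyword_lists, min_texts=2):
    """Count, for each word, the number of texts in which it is key, and
    keep the words that are key in at least min_texts texts."""
    counts = Counter()
    for kws in keyword_lists.values():
        counts.update(set(kws))  # each text contributes at most once per word
    rows = [(w, n) for w, n in counts.items() if n >= min_texts]
    # Most widely key first; ties broken alphabetically.
    return sorted(rows, key=lambda r: (-r[1], r[0]))

# Toy keyword lists for three related texts (invented for illustration).
toy_lists = {
    "speech_a": {"britain", "deficit", "tax"},
    "speech_b": {"britain", "tax", "schools"},
    "speech_c": {"britain", "nhs"},
}
toy_key_keywords = key_keywords(toy_lists)
print(toy_key_keywords)
```

Words that are key in only one text ("deficit", "schools", "nhs") drop out, which is how the procedure avoids keywords that are unusually frequent in only a small number of files.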

16 Making a batch wordlist Specify a folder where you can write

17 Batch making keyword lists

18 Batch making keyword lists Specify a folder where you can write

19 Making a KW database

20 Key keywords Callout: key coverage of the corpus An "associate" is a keyword that is key in the same text as a given key keyword
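
Associates can be sketched in the same spirit: for a chosen key keyword, collect the other keywords of every text in which it is key. This is a hedged illustration rather than the WordSmith procedure; `associates` and the toy keyword lists are hypothetical names and data.

```python
from collections import Counter

def associates(keyword_lists, node):
    """Return the keywords that are key in the same texts as `node`,
    with the number of texts they share, most frequent first."""
    counts = Counter()
    for kws in keyword_lists.values():
        kws = set(kws)
        if node in kws:
            counts.update(kws - {node})  # every co-key word except the node itself
    return sorted(counts.items(), key=lambda r: (-r[1], r[0]))

# Toy keyword lists for three related texts (invented for illustration).
toy_texts = {
    "speech_a": {"britain", "deficit", "tax"},
    "speech_b": {"britain", "tax", "schools"},
    "speech_c": {"britain", "nhs"},
}
toy_associates = associates(toy_texts, "tax")
print(toy_associates)
```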

21 Keyword in AntConc Callouts: target corpus, reference corpus

22 Keyword in AntConc Keywords in David's speech (in relation to Ed's speech)

23 Wmatrix: Keywords and key concepts
POS and semantic tagging
Keyword / key concept analysis in Cameron's speech in comparison with Miliband's speech
Copy and paste the speeches into two separate text files
– – labour-party-conference labour-party-conference
Save the two texts as David_speech.txt and Ed_speech.txt

24 Wmatrix: Keywords and key concepts Log in using your zhejiangxx account –

25 Tagging Wizard

26 Tagging in progress

27 Tagging result

28 Labour frequency list

29 KWIC concordance

30 My folders Upload and tag Ed's speech …and click on My folders Warning: Your folder view may look different!

31 Open the David_speech folder and select Ed_speech in the "Keyword compared to" dropdown box

32 Keyword list to download!

33 Keyword cloud – even more interesting!

34 David's key concepts (Key concepts compared to)

35 Keyword analysis in online corpora
Using Lancaster's CQPweb to compare British English (LOB + FLOB) and American English (Brown + Frown)
Log in to CQPweb –
Similar analysis can be done at BSFU's CQPweb corpus hub (different corpora)
–http:// /cqp/
–Account: ID=pass=test

36 Creating subcorpora

37 Creating subcorpus BrE

38 Creating subcorpus AmE

39 Making wordlists

40 Wordlist available now

41 Computing keywords You can adjust the statistical measure, cut-off point, and minimum frequency according to your research purposes.
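
The effect of these settings can be sketched as a filter over an already scored keyword list. This is an illustration only and does not reproduce CQPweb's own options: the function names and toy rows are hypothetical, and the scores are assumed to be non-negative chi-square or log-likelihood values with one degree of freedom, for which the upper-tail p-value equals erfc(sqrt(score / 2)).

```python
import math

def p_value_1df(score):
    """Upper-tail p-value for a chi-square or log-likelihood score
    with 1 degree of freedom (score must be non-negative)."""
    return math.erfc(math.sqrt(score / 2))

def apply_settings(rows, p_cutoff=0.01, min_freq=3):
    """Keep (word, frequency, score) rows that meet the minimum frequency
    and the significance cut-off, strongest keyness first."""
    kept = [r for r in rows
            if r[1] >= min_freq and p_value_1df(r[2]) <= p_cutoff]
    return sorted(kept, key=lambda r: r[2], reverse=True)

# Toy scored rows (invented numbers, not output from any of the tools).
toy_rows = [("britain", 30, 116.1), ("rise", 4, 5.2), ("deficit", 2, 40.0)]
strict = apply_settings(toy_rows)                           # p <= 0.01, freq >= 3
loose = apply_settings(toy_rows, p_cutoff=0.05, min_freq=2)

print([r[0] for r in strict])
print([r[0] for r in loose])
```

With the stricter settings only "britain" survives; relaxing the cut-off to 0.05 and the minimum frequency to 2 also admits "deficit" and "rise", so the choice of settings directly shapes the list to be interpreted.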

42 Keywords in BrE and AmE
