Presentation transcript: "Corpus Linguistics", Richard Xiao
1 Corpus Linguistics: Corpus analysis (2)
Richard Xiao
firstname.lastname@example.org
2 Outline of the session
Lecture: keyword; reference corpus; key keyword
Practical: WST keyword; AntConc keyword; Wmatrix keyword / key concept
Extra: keyword analysis with CQPweb
3 What is a keyword?
Keywords are words whose frequency is exceptionally high (positive keywords) or exceptionally low (negative keywords) in comparison with a reference corpus.
The term "keywords" usually refers to positive keywords.
But negative keywords are equally interesting (see Xiao and McEnery 2005).
In WordSmith, negative keywords appear at the very end of your listing, in a different colour.
They are omitted automatically from a keywords database for key keyword analysis and from a keyword plot.
4 Why keyword analysis?
Indicating the ‘aboutness’ (Scott 1999) of a particular text or corpus: content analysis, discourse analysis
Revealing the salient features that are functionally related to a particular genre (Xiao and McEnery 2005): genre analysis, stylistic analysis
5 How to do keyword analysis
Make a wordlist of the target corpus.
Locate or make a wordlist of a reference corpus.
See Scott (2005), “In search of a bad reference corpus”.
The reference corpus is usually larger than the target corpus.
The appropriateness of a reference corpus depends on your research questions!
Compare the frequency of each item in the two wordlists to extract keywords (done automatically).
Analyse and interpret the keywords (you will do it!).
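The automatic comparison step can be sketched in Python as follows. This is only an illustration, not what WordSmith or AntConc does internally. It assumes log-likelihood as the keyness statistic (the slides do not name one) and a cut-off of 6.63, the chi-square critical value for p < 0.01 with one degree of freedom. The function names and the toy wordlists are invented for the example.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness score for one word.

    a, b: frequency of the word in the target / reference corpus
    c, d: total number of tokens in the target / reference corpus
    """
    # Expected frequencies if the word were equally common in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(target, reference, threshold=6.63):
    """Compare two wordlists (Counters) and return (word, score, sign) tuples.

    threshold=6.63 is an assumed cut-off (p < 0.01); adjust as needed.
    """
    c = sum(target.values())
    d = sum(reference.values())
    results = []
    for word in set(target) | set(reference):
        a = target.get(word, 0)
        b = reference.get(word, 0)
        score = log_likelihood(a, b, c, d)
        if score >= threshold:
            # Positive: relatively more frequent in the target corpus.
            sign = "positive" if a / c > b / d else "negative"
            results.append((word, round(score, 2), sign))
    # Highest keyness first.
    return sorted(results, key=lambda r: -r[1])


# Toy wordlists (made-up counts, for illustration only).
target_wordlist = Counter({"britain": 20, "the": 80})
reference_wordlist = Counter({"britain": 5, "the": 995})

for word, score, sign in keywords(target_wordlist, reference_wordlist):
    print(word, score, sign)
```

With real data, the two Counters would be built from the tokenised target and reference corpora. The choice of statistic and threshold affects which words are reported as key.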
6 Keywords in the party speeches
Target corpus (just one text): David Cameron's speech at the Conservative conference (10 October 2012, Birmingham)
Local copy available (David_speech Unicode text); download and unzip the file into a file folder:
Reference corpus: the 100-million-word BNC; download and unzip (local copy available)
Tool: WST Keyword
14 Key clusters
Similar to word clusters, but only keywords are used.
15 Key keywords
A key keyword is one which is "key" in more than one of a number of related texts.
The more texts it is "key" in, the more "key key" it is.
This avoids extracting keywords that are unusually frequent in only a small number of files.
Key keywords can be created automatically and are as simple to extract as keywords.
N.B. Negative keywords are omitted automatically from a key keyword list.
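The idea of counting how many texts a word is "key" in can be sketched as below. This is an illustration under assumptions, not WordSmith's implementation: it assumes the positive keywords of each text have already been extracted, and the file names, keyword sets and the min_texts parameter are hypothetical.

```python
from collections import Counter


def key_keywords(keywords_by_text, min_texts=2):
    """Return (word, number_of_texts) for words that are key in at least min_texts texts.

    keywords_by_text maps a text name to the positive keywords found in that text.
    """
    counts = Counter()
    for kws in keywords_by_text.values():
        # Count each word once per text, however often it occurs there.
        counts.update(set(kws))
    selected = [(w, n) for w, n in counts.items() if n >= min_texts]
    # Most widely "key" first; ties broken alphabetically.
    return sorted(selected, key=lambda item: (-item[1], item[0]))


# Hypothetical per-text keyword lists (negative keywords already excluded).
keywords_by_text = {
    "speech_a.txt": ["economy", "britain", "deficit"],
    "speech_b.txt": ["economy", "britain", "schools"],
    "speech_c.txt": ["economy", "nhs"],
}

for word, n_texts in key_keywords(keywords_by_text):
    print(word, n_texts)
```

Words that are key in only one file ("deficit", "schools", "nhs" in this toy data) drop out, which is the filtering effect described on the slide.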
16 Making a batch wordlist
Specify a folder where you can write
20 Key keywords
key coverage of the corpus
An "associate" is a keyword that appears in the same text
21 Keyword in AntConc
target corpus
reference corpus
22 Keyword in AntConc
Keywords in David's speech (in relation to Ed's speech)
23 Wmatrix: Keywords and key concepts
POS and semantic tagging
Keyword / key concept analysis of Cameron’s speech in comparison with Miliband’s speech
Copy and paste the speeches into two separate text files
Save the two texts as David_speech.txt and Ed_speech.txt
24 Wmatrix: Keywords and key concepts
Log in using your zhejiangxx account
35 Keyword analysis in online corpora
Using Lancaster’s CQPweb to compare British English (LOB + FLOB) and American English (Brown + Frown)
Log in to CQPweb
Similar analysis can be done at BFSU’s CQPweb corpus hub (different corpora)
Account: ID=pass=test