Corpus Linguistics Richard Xiao

Presentation transcript:

Corpus analysis (2) Corpus Linguistics Richard Xiao lancsxiaoz@googlemail.com

Outline of the session
Lecture
- Keyword
- Reference corpus
- Key keyword
Practical
- WST keyword
- AntConc keyword
- Wmatrix keyword / key concept
- Extra: keyword analysis with CQPweb

What is a keyword?
- Keywords are words whose frequency is exceptionally high (positive keywords) or exceptionally low (negative keywords) in comparison with a reference corpus
- "Keywords" usually refers to positive keywords
- But negative keywords are equally interesting (see Xiao and McEnery 2005)
  - In WordSmith they appear at the very end of your listing, in a different colour
  - They are omitted automatically from a keywords database for key keyword analysis and a keyword plot

Why keyword analysis?
- Indicating the ‘aboutness’ (Scott 1999) of a particular text or corpus
  - Content analysis, discourse analysis
- Also revealing the salient features that are functionally related to a particular genre (Xiao and McEnery 2005)
  - Genre analysis, stylistic analysis

How to do keyword analysis
- Make a wordlist of the target corpus
- Locate or make a wordlist of a reference corpus
  - Scott (2005) “In search of a bad reference corpus” http://www.methodsnetwork.ac.uk/redist/pdf/es1_05scott.pdf
  - The reference corpus is usually larger than the target corpus
  - The appropriateness of a reference corpus depends on your research questions!
- Compare the frequency of each item in the two wordlists to extract keywords – done automatically
- Analyse and interpret keywords – you will do it!
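The comparison step can be sketched in a few lines of Python. This is only an illustration of the idea, not what WordSmith does internally: the slides do not name the statistic, so log-likelihood (one commonly used keyness measure) is assumed here, and the function names, the rough tokenizer, and the use of a signed score to separate positive from negative keywords are all hypothetical choices.

```python
import math
import re
from collections import Counter


def wordlist(text):
    """Lower-case the text and count word tokens (a very rough tokenizer)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def log_likelihood(a, b, c, d):
    """Log-likelihood for a word seen a times in c target tokens
    and b times in d reference tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the target corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    total = 0.0
    if a > 0:
        total += a * math.log(a / e1)
    if b > 0:
        total += b * math.log(b / e2)
    return 2 * total


def keywords(target, reference):
    """Return (word, signed keyness) pairs, most positive first.

    The sign is an illustrative convention: positive when the word is
    relatively more frequent in the target, negative when less frequent.
    """
    c, d = sum(target.values()), sum(reference.values())
    scored = []
    for word in set(target) | set(reference):
        a, b = target[word], reference[word]
        sign = 1 if a / c >= b / d else -1
        scored.append((word, sign * log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Called as keywords(wordlist(target_text), wordlist(reference_text)), this puts candidate positive keywords at the top of the list and candidate negative keywords at the bottom, which mirrors the layout of the WordSmith listing.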

Keywords in the party speeches
- Target corpus – just one text
  - David Cameron's speech at the Conservative conference (10 October 2012, Manchester) http://www.bbc.co.uk/news/uk-politics-15189614
  - Local copy available (David_speech Unicode text) - download and unzip the file into a folder: www.fass.lancs.ac.uk/projects/corpus/data/workshop3texts.zip
- Reference corpus
  - The 100-million-word BNC: download and unzip (local copy available) www.lexically.net/downloads/version4/BNC_World.zip
- Tool
  - WST Keyword

Wordlist of David’s speech

Creating keyword list

Keyword extraction in progress Warning: It can take time if you have loaded two large wordlists

Keywords in David’s speech What do these keywords tell us? Negative keyword

Keyword: Plot view
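A plot of this kind shows where in the text each hit of a keyword falls. A rough text-only sketch of the idea, assuming a simple tokenizer; the function names and the 40-character width are illustrative and have nothing to do with WordSmith's own code:

```python
import re


def plot_positions(text, keyword):
    """Return each hit's position as a fraction (0.0 to just under 1.0)
    of the way through the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return []
    return [i / len(tokens) for i, tok in enumerate(tokens) if tok == keyword.lower()]


def plot_line(positions, width=40):
    """Draw a one-line text plot: '|' marks a segment containing a hit."""
    cells = ["-"] * width
    for pos in positions:
        cells[min(int(pos * width), width - 1)] = "|"
    return "".join(cells)
```

A keyword whose marks are spread along the whole line runs through the text; one whose marks bunch together is tied to a single passage.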

What companies do keywords keep?

Why “marriage”?

Key clusters Similar to word clusters, but only keywords are used.

Key keywords
- A key keyword is one which is "key" in more than one of a number of related texts
- The more texts it is "key" in, the more "key key" it is
- This avoids extracting keywords that are unusually frequent in only a small number of files
- Key keywords can be created automatically and are as simple to extract as keywords
- N.B. Negative keywords are omitted automatically from a key keyword list
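The counting behind key keywords can be sketched as follows, assuming you already have one set of positive keywords per text. The function name, the input format, and the min_texts parameter are hypothetical, chosen only for this illustration:

```python
from collections import Counter


def key_keywords(keyword_lists, min_texts=2):
    """Count how many texts each word is key in.

    keyword_lists maps a text name to the set of its (positive) keywords.
    Words key in fewer than min_texts texts are dropped.
    """
    counts = Counter()
    for words in keyword_lists.values():
        counts.update(set(words))
    return [(word, n) for word, n in counts.most_common() if n >= min_texts]
```

The higher the count, the more "key key" the word is across the collection.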

Making a batch wordlist Specify a folder where you can write

Batch making keyword lists

Batch making keyword lists Specify a folder where you can write

Making a KW database

Key keywords
- Key coverage of the corpus
- An "associate" is a keyword that appears in the same text

Keyword in AntConc target corpus reference corpus

Keyword in AntConc Key words in David's speech (in relation to Ed's speech)

Wmatrix: Keywords and key concepts
- POS and semantic tagging
- Keyword / key concept analysis of Cameron’s speech in comparison with Miliband’s speech
- Copy and paste the speeches into two separate text files
  - http://www.bbc.co.uk/news/uk-politics-15189614
  - http://www.labour.org.uk/ed-milibands-speech-to-labour-party-conference
- Save the two texts as David_speech.txt and Ed_speech.txt
  - www.fass.lancs.ac.uk/projects/corpus/data/workshop3texts.zip

Wmatrix: Keywords and key concepts
- Log in using your zhejiangxx account
- http://ucrel.lancs.ac.uk/wmatrix3.html

Tagging Wizard

Tagging in progress

Tagging result

Labour frequency list

KWIC concordance
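A KWIC (keyword in context) concordance lines up every hit of a search word with a few words on either side. A minimal sketch of the idea, not Wmatrix's own implementation; the function name kwic and the width parameter are illustrative assumptions:

```python
import re


def kwic(text, node, width=3):
    """Return (left context, node, right context) for each hit of `node`."""
    tokens = re.findall(r"\w+", text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines
```

Reading down the left and right contexts is what lets you interpret why a word such as "Labour" is frequent in a speech.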

“My folders”
- Upload and tag Ed’s speech …and click on “My folders”
- Warning: Your folder view may look different!

Open the David_speech folder and select Ed_speech in the “Keyword compared to” dropdown box

Keyword list to download!

Keyword cloud – even more interesting!

David’s key concepts (“Key concepts compared to”)

Keyword analysis in online corpora
- Using Lancaster’s CQPweb to compare British English (LOB + FLOB) and American English (Brown + Frown)
- Log in to CQPweb: http://cqpweb.lancs.ac.uk
- Similar analysis can be done at BFSU’s CQPweb corpus hub (different corpora): http://124.193.83.252/cqp/
  - Account: ID=pass=test

Creating subcorpora

Creating subcorpus BrE

Creating subcorpus AmE

Making wordlists

Wordlist available now

Computing keywords
- You can adjust the statistical measure, cut-off point, and minimum frequency according to your research purposes.
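The effect of the cut-off point and minimum frequency can be illustrated with a small filter. This sketch assumes log-likelihood scores, for which the chi-square critical values at one degree of freedom are the usual cut-offs; the function name, the row format, and the default settings are hypothetical and are not CQPweb's own options:

```python
# Chi-square critical values (1 degree of freedom) often used as cut-offs
# for log-likelihood keyness scores.
CRITICAL_VALUES = {0.05: 3.84, 0.01: 6.63, 0.001: 10.83, 0.0001: 15.13}


def filter_keywords(scored, p=0.01, min_freq=3):
    """Keep (word, freq, keyness) rows that pass both thresholds.

    The absolute value of the score is compared with the cut-off, so a
    negative keyword survives as long as its frequency reaches min_freq.
    """
    cutoff = CRITICAL_VALUES[p]
    return [
        (word, freq, score)
        for word, freq, score in scored
        if freq >= min_freq and abs(score) >= cutoff
    ]
```

A stricter p value or a higher minimum frequency shortens the list, so it is worth trying a few settings before interpreting the results.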

Keywords in BrE and AmE