Combining intuition with corpus linguistic analysis: A study of lexical chunks in four Chinese undergraduate students’ writing
Maria Leedham, FLaRN 2010



BACKGROUND TO STUDY 2

Chunking through intuition: Study 1
RQ: To what extent can NSs and NNSs chunk NNS speech?
Data: transcripts of 2 intermediate-level Japanese students’ speech
- students were recorded 3 times with a 2-month gap between each
- total of approx. 1500 words across the 6 transcripts
Method
Step 1: 3 NS linguists asked to underline chunks in the 6 transcripts (training, examples and practice given first)
Step 2: Japanese students asked to identify chunks in their own transcripts
Step 3: author chunks transcripts with assistance from WordSmith Tools
(Leedham, 2006)

Example of chunked transcript from Study 1
Key: italics = words classified by the NNS as a chunk; underline = words that 2 or 3 of the 3 NSs classified as a chunk
1 ahh…first err I, I learned, learnt? (mmhmm I learnt) err (2.0) I should err.. I
2 should be more positive? (right) positive… in UK because ahh…when, when I
3 went to London err… last Sunday (mhmm) ahh (2.0) some, some of the
4 underground line (mm) line was no service (oh dear) ((speaker laughs)) I was
5 really surprised and, because it can, cannot be (mm) in Japan (mm) you know,
6 sun- in, in Sunday, on? (mm) on Sunday many, many people (mm) come to
7 London (mm) and go around some place (mm)... so everyone need to, need a
8 train (mm) so, but maybe four or five lines… was not, no service (mm) so…
9 I… I have to think err what I should do ((speaker laughs)) and no, I’ve never, I
10 have never been to London that, so, this was the first time I’ve been to London
11 (mm) so…
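The key's "2 or 3 out of the 3 NSs" rule is a majority vote over word-level judgements, and agreement between any two raters can be summarised with a chance-corrected statistic. The sketch below is only an illustration: the slides do not say how reliability was measured, so the use of Cohen's kappa, the function names and the 0/1 labels (invented, not taken from the study) are all assumptions.

```python
def majority_chunk_mask(ratings, threshold=2):
    """Flag a word as part of a chunk when at least `threshold` raters marked it."""
    n_words = len(ratings[0])
    return [sum(r[i] for r in ratings) >= threshold for i in range(n_words)]


def cohen_kappa(a, b):
    """Cohen's kappa for two raters' binary (0/1) word-level labels."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    p_a, p_b = sum(a) / n, sum(b) / n
    # Agreement expected by chance: both say 1, or both say 0.
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:  # both raters used one identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)


words = "so this was the first time I've been to London".split()

# Hypothetical labels (1 = word marked as inside a chunk); invented for illustration.
rater_1 = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0]
rater_2 = [0, 1, 1, 1, 1, 1, 0, 0, 0, 0]
rater_3 = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

mask = majority_chunk_mask([rater_1, rater_2, rater_3])
majority_words = [w for w, keep in zip(words, mask) if keep]
kappa_12 = cohen_kappa(rater_1, rater_2)
kappa_13 = cohen_kappa(rater_1, rater_3)
```

With labels like these, two raters who largely overlap get a positive kappa, while raters who mark different stretches of the same utterance can fall below zero, which is one way the low reliability reported for Study 1 might show up numerically.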

Findings from Study 1
Findings:
- little inter- or intra-rater reliability
- many ‘missing’ chunks (e.g. ‘of course’, ‘you know’) both across and within raters
- frustrating and time-consuming task for NSs
- BUT… the Japanese students could do this task AND also offered insights into when/why… (e.g. student M: “I used to say that but now I know it’s not usual”.)
- the more time spent looking for chunks, the more will be found
Coda:
- a further recording, transcribing & awareness-raising cycle suggests that this resulted in uptake
- both students found it highly motivating to record and analyse transcripts of their talk

Chunking through intuition: Study 1
Method
Step 1: 3 NS linguists asked to underline chunks in the 6 transcripts (training, examples and practice given first)
Step 2: Japanese students asked to identify chunks in their own transcripts
Step 3: author chunks transcripts with assistance from WordSmith Tools v.5

STUDY 2

Outline
1. Research questions
2. The students and the texts
3. The two methods
4. Findings
4.1 Method 1
4.2 Method 2
5. Conclusions and Implications

Research Questions
1. What can a study of lexical chunks reveal about these Chinese students’ writing?
2. What does each method contribute?

The Students
Wei    Male    BSc Engineering
Feng   Female  BSc Food Science with Business
Ping   Female  BA Hospitality, Leisure & Tourism Management (HLTM)
Hong   Male    BA HLTM
Criteria
- L1 Chinese (Mandarin or Cantonese)
- All secondary education in home country
- Contributions from years 1 & 2 and year 3 of undergraduate study

The texts
Reference corpora

Combining intuition and corpus searches
Method 1: Manual analysis
- Read all 4 Chinese students’ texts
- Read twice, with 6 months between
- Read equivalent, randomly-selected English students’ texts
- Noted ‘salient’ features, then searched corpora of the individual’s texts, the discipline, all Chinese students’ writing, all English students’ writing
Method 2: Key n-gram searches
- Used WordSmith Tools, v.5 (Scott, 2008)
- Searched for key n-grams in the corpus of texts from each student, using relevant discipline corpus from L1 English as reference
- Setting p= , deleted short n-grams within longer n-grams
- Compiled key n-gram lists
- Looked at concordance lines and texts for more context
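The n-gram steps in Method 2 can be sketched in a few lines of Python. This is a minimal illustration rather than a description of what WordSmith Tools does internally: the whitespace tokenisation, the function names and the example sentence are assumptions, and "deleted short n-grams within longer n-grams" is read here as dropping any n-gram that occurs as a contiguous part of a longer n-gram on the same list.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-word sequences in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(tokens, sizes=(2, 3, 4)):
    """Frequency of every n-gram for each requested size."""
    counts = Counter()
    for n in sizes:
        counts.update(ngrams(tokens, n))
    return counts


def is_contained(short, long):
    """True when `short` appears as a contiguous run inside `long`."""
    k = len(short)
    return any(long[i:i + k] == short for i in range(len(long) - k + 1))


def drop_subsumed(grams):
    """Keep only n-grams that are not part of a longer n-gram in the list."""
    grams = list(grams)
    return [g for g in grams
            if not any(len(h) > len(g) and is_contained(g, h) for h in grams)]


# Invented example sentence; real input would be one student's texts.
tokens = "on the other hand it is suggested that on the other hand".split()
counts = count_ngrams(tokens)

candidates = [("on", "the", "other", "hand"), ("the", "other", "hand"), ("it", "is")]
kept = drop_subsumed(candidates)
```

Here `kept` retains ‘on the other hand’ and ‘it is’ but drops ‘the other hand’, since it only repeats part of the longer sequence.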

Formulaic sequences in sample of Wei’s writing (Engineering)
Introduction
A design methodology for a gearbox is presented in this report. The input horse power, the input speed and net reductions in the gearbox are the parameters to be specified. A gearbox takes an input shaft rotating and converts it via a gear train into up to three outputs, the process of designing a gearbox is to figure out which ratios are needed and to implement those ratios in the form of positioning various sizes of connected gears. The specification of the gearbox depends on its area of application. In this report, a gearbox is designed for a commercial meat slicer which has its final shaft rotating at between 80 and 100 rev/min. The input of the meat slicer is a constant speed AC motor running at 1800 rev/min and delivering 1.2 kW. A few points have to be considered on this system, the size of the gearbox is severe restricted, since it has to go onto a work surface where there is severe competition for space. And the motor may be in-line or at right angles to the grinder. Furthermore, the duty is expected to be up to 6 hours per day.


Idiosyncratic language
In one word computer based tools contribute an…
In one word the overall system can be described… (Wei, years 2 & 3)
In light of this, it is suggested that buying IHG…
In light of this, it can be suggested that…
In light of this, it is recommended that buying IHG… (Ping, year 3, in 1 text)
… but simply writing a responsible tourism policy is no longer enough. It is a must to show practical action,… (Hong, Year 1)
a winning city, the authorities of Liverpool have to rebuild its image to get rid of the negative picture. (Hong, Year 2)
…and boost its marketing campaigns in order to catch the world’s eyes on Scotland. (Hong, year 3)

Vague language
In catering services, restaurants in Oxford and Bath are more or less the same. (Hong, Year 1)
From those tables, the same thing as section 3.1 could be found … (Wei, Year 1)
…a measurement system for measuring low-lever force, a kind of cantilever rig which is called…
A kind of variable inductance sensor has been chosen…
…Furthermore, with processing data, a kind of filter is always needed to separate certain… (Wei, year 2, same assignment)
At that time, I found that this hotel is a little bit out of my expectation. (Hong, Year 2)

Vague language
L1 English students use: ‘a bit of a’ + N, e.g. ‘a bit of a problem’, ‘a bit of a shock’, ‘a bit of a dog’s breakfast’
Often this is from reflective writing:
‘The conclusion was also a bit of a victim in my editings, bringing it down to one small sentence for each of the areas of discussion’. (6101c Cybernetics Year 3 essay)

Chunks with – and without – ‘I’ & ‘we’
From the experiment, it was known that the mechanical properties of carbon steel AN and carbon steel N….
It was found out the mechanical properties of carbon steel AN was incorrect in this experiment,… (Wei, Year 1)
Meanwhile, if we clipped the current probe round one of the motor supply leads, and connected it to Ch1 of the oscilloscope, we could get two copies of the transient starting current of the motor from the oscilloscope. From these two copies, we could calculated…

Chunks with – and without – ‘I’ & ‘we’: L1 English students

Linkers
This can create a positive image for Scotland, on the other hand, (Ping, Year 3)
…In other words, people are buying expectations... (Hong, Year 3)
As a consequence, it can attract many travelers… (Hong, Year 2)
On the contrary, the predominance of SMEs... (Ping, Year 2)
First of all, the dimension of the brake disc is decided. (Wei, Year 3)
What is more, Bath is served by a large number of local bus services… (Hong, Year 1)
References to data
‘as shown in table’ (Wei x 2, Ping x 2)
‘according to’ (Wei x 4)
‘as illustrated in table + NUMBER’ (Ping x 2)

Summary of method 1 findings
Salient chunks in the Chinese students’ writing were:
- Idiosyncratic chunks (‘in light of the’)
- Vague language (‘a bit of’), though note English students’ use of ‘a little bit of’
- High use of chunks with ‘we’ and low use of chunks with ‘I’, partly due to English students’ reflective writing
- Use of favoured linkers (‘on the other hand’)
- Reference to data in tables and figures (‘according to the equation’)
BUT… very difficult to intuit chunks in unfamiliar disciplines


Method 2: Key n-gram searches
- Used WordSmith Tools, version 5 (Scott, 2008)
- Searched for key n-grams (= ‘key clusters’) in the corpus of texts from each of the 4 students
- Relevant discipline corpus from L1 English used as reference corpus
- P= , deleted short n-grams within longer n-grams
- Compiled a key n-gram list for each student
- Grouped these key n-grams into themes
- Looked at concordance lines for more context
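An n-gram is ‘key’ when it is unusually frequent in the study corpus relative to the reference corpus. The slides do not say which keyness statistic was chosen, so the sketch below assumes log-likelihood in the two-term form widely used for corpus comparison; the function names and all counts are hypothetical.

```python
import math


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Two-term log-likelihood keyness for one n-gram.

    freq_* are the n-gram's counts; size_* are the corpus sizes in tokens.
    """
    total = freq_study + freq_ref
    combined = size_study + size_ref
    # Counts expected if the n-gram were equally likely in both corpora.
    expected_study = size_study * total / combined
    expected_ref = size_ref * total / combined
    ll = 0.0
    for observed, expected in ((freq_study, expected_study), (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll


def is_overused(freq_study, size_study, freq_ref, size_ref):
    """True when the relative frequency is higher in the study corpus."""
    return freq_study / size_study > freq_ref / size_ref


# Hypothetical counts: an n-gram in one student's texts vs. a discipline reference corpus.
same_rate = log_likelihood(10, 1_000, 100, 10_000)
higher_rate = log_likelihood(30, 1_000, 100, 10_000)
```

The statistic is zero when the relative frequencies match and grows as they diverge. It does not indicate direction by itself, which is why `is_overused` is kept separate: a high score can reflect underuse as well as overuse.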

N-grams

Idiosyncratic language
Ping's year 2 proposal
‘aim of the’, ‘of the assignment is to design’, ‘to develop an understanding of’ (Wei)

Discipline-specific n-grams
“Marriott Liverpool city centre”, “the Liverpool tourism industry”, ‘the tourism industry’ (Hong)
‘the hospitality industry’, ‘recruitment and selection’, ‘in the hospitality industry’ (Ping)
Passive voice
‘be worked out’, ‘can be calculated’ (Wei)
‘there will be’, ‘it is believed that’ (Ping)
References to data
‘with reference to appendix’, ‘please see appendix’ (Ping)
‘in the appendix’, ‘briefing sheet in appendix’, ‘is shown as’, ‘tables of data’, ‘were recorded as below’, ‘was calculated with eq.’ (Wei)

Favoured linkers decrease over time
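One way to track a favoured linker across years of study is to normalise its frequency by text length, so that longer assignments do not inflate the counts. The sketch below is an assumption about how such a comparison could be run, not the procedure behind this slide; the function names, the per-1,000-words basis and the sample sentences are all invented for illustration.

```python
import re


def tokenize(text):
    """Lower-case word tokens; punctuation is discarded."""
    return re.findall(r"[a-z']+", text.lower())


def per_thousand(phrase, text):
    """Occurrences of `phrase` per 1,000 tokens of `text`."""
    tokens = tokenize(text)
    target = tokenize(phrase)
    k = len(target)
    hits = sum(tokens[i:i + k] == target for i in range(len(tokens) - k + 1))
    return 1000 * hits / len(tokens) if tokens else 0.0


# Invented sample sentences standing in for one student's texts by year.
texts_by_year = {
    "Year 1": "On the other hand, costs rose. On the other hand, sales fell.",
    "Year 3": "Costs rose while sales fell.",
}

rates = {year: per_thousand("on the other hand", text)
         for year, text in texts_by_year.items()}
```

Comparing `rates` across years then shows whether use of the linker falls as the student progresses.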

Summary of method 2 findings
Many of the same findings from method 1:
- idiosyncratic chunks
- some linkers, esp. ‘on the other hand’
- low use of chunks with ‘I’
- references to data
Also…
- discipline-specific chunks
- Easy to compare one student’s texts with the discipline reference corpus & each L1 reference corpus
- Similar findings occur within the Chinese students overall
NB Keyness measures difference


Intuitive reading
Finds semantically whole units (formulaic sequences)
Plus
- A person can recognise single instances that a computer would miss
- The text is read as a complete document, as intended by the writer
Minus
- Time-consuming and tiring
- Problem of inter-rater reliability
- Problem of intra-rater consistency
- Hard to replicate

Key n-grams analysis
Finds frequent chunks (n-grams)
Plus
- Large quantities of data can be analysed quickly
- Accurate
- Easily replicable
Minus
- Single chunks are missed
- Arbitrary parameters
- Conflation of writing from lots of individuals
- Sense of text as complete document is lost

Combining methods…
Combine the two methods through a recursive process of reading texts and checking the sequences in a corpus, also searching for key n-grams for less intuitive sequences.
“ultimately, the most revealing insights… will be gained from a closer look at the texts, the speakers, and the situational variables; quantitative analysis alone can never provide a satisfactory picture” (Simpson, 2004:41).

References
Foster, P. (2001). "Rules and routines: A consideration of their role in the task-based language production of native and non-native speakers", in M. Bygate, P. Skehan, and M. Swain (eds.), Task-Based Learning: Language Teaching, Learning and Assessment. Longman: London.
Heuboeck, A., Holmes, J. & Nesi, H. The BAWE Corpus Manual. Retrieved from
Leedham, M. (2006). “Do I speak better? – A longitudinal study of lexical chunking in the spoken language of two Japanese students”. In The East Asian Learner.
Scott, M. (2008). WordSmith Tools v.5. Oxford University Press.
Wray, A. (2002). Formulaic Language and the Lexicon. Cambridge University Press.
BAWE corpus: ESRC project number RES