Using TEI mark-up and pragmatic classification in the construction and analysis of the British Telecom Correspondence Corpus. Ralph Morton, Coventry University.

Outline
- Project background: British Telecom, New Connections, original aims, initial outcomes
- Working with Text Encoding Initiative (TEI) compliant XML in corpus mark-up
- Pragmatic classification
- British Telecom Correspondence Corpus: uses and future research

BT (British Telecom)
BT and its predecessors operated the main telephone network in the United Kingdom between 1912 and 1984, first as a government department and then as a public corporation. BT traces its history back to the founding of the Electric Telegraph Company in 1846. British Telecom was privatised in 1984, and one condition of the privatisation was the preservation of its public records.

BT Archives
A public archive located in Holborn and established in 1986, BT Archives ‘preserves the records of BT and its predecessors and promotes access to the records and their content internally as a corporate resource, and externally to national and international communities’.

New Connections Project
A JISC-funded collaboration between Coventry University, BT Heritage and The National Archives. The project aimed ‘to catalogue, digitise and develop a searchable online archive of almost half a million photographs, images, documents and correspondence assembled by BT over 165 years.’

The project promoted easier access to the archives (not just two days a week, and not limited to Holborn High Street) and engagement with the material in new ways. Three research projects were attached to New Connections, one of which is the British Telecom Correspondence Corpus (BTCC).

British Telecom Correspondence Corpus: original aims
- to identify and transcribe around 500 letters
- to collect contextual information for those letters and encode it using TEI-compliant XML
- to use corpus analyses to gain new insights into how English business correspondence changed from the mid-nineteenth to the late twentieth century

Corpus vs. archive
Leech (1991: 11): ‘ultimately, the difference between an archive and a corpus must be that the latter is designed or required for a particular “representative” function’.
Hunston (2002: 28): ‘“being representative” inevitably involves knowing what the character of the whole is’.

The character of the BT Archive as a whole is unknown at the item level. In cataloguing, BT archivists ‘describe the context and function of the folder in relation to the history of BT and its predecessors and not the individual authors of letters’ (Sian Wynn-Jones, personal communication, 05/03/2013); ‘archives are organised not classified’ (David Hay, personal communication, 2013). We were provided with ‘sufficient’ Category C files to fulfil our initial request for 500 letters, and the letters were extracted through manual examination of the digitised folders.

One advantage of letters is that they are rich in contextual data. Because the character of the archive as a whole is unknowable, we took an approach normally reserved for spoken language, which ‘exists in unknowable quantities and in an unknowable range of varieties’ (Hunston, 2002: 29): we selected a number of factors to control and sampled accordingly. This is a purposive approach.

Sampling
- Balance between decades as far as possible, including every letter from underrepresented decades, so that the corpus is ‘internally contrastive’ (Sinclair 2005)
- A variety of authors: historically interesting letters, but also day-to-day letters, so as not to preserve only the histories of prominent individuals (Nurmi 1999: 54; Prescott 2012)
- Both handwritten and typed letters
- Chains of correspondence, where available (Dossena 2004)

Basic metadata extraction
Time constraints meant that the initial sampling metadata was very basic:
- date
- author
- recipient
- general topic of the letter
- whether the letter was part of a chain
- format (handwritten, typed, etc.)
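The sampling criteria and basic metadata fields above can be sketched as a small data structure plus a decade-balanced selection. This is a minimal illustration under stated assumptions, not the procedure actually used for the BTCC: the `LetterMeta` record and its field names, the per-decade `quota`, and random selection within well-represented decades are all hypothetical.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LetterMeta:
    """Basic sampling metadata for one letter (hypothetical field names)."""
    letter_id: str
    date: str        # ISO yyyy-mm-dd
    author: str
    recipient: str
    topic: str
    in_chain: bool   # part of a chain of correspondence?
    fmt: str         # e.g. "handwritten", "typed"

    @property
    def decade(self) -> str:
        """Decade label derived from the date, e.g. '1865-03-02' -> '1860s'."""
        return self.date[:3] + "0s"


def sample_by_decade(letters, quota, seed=0):
    """Purposive sample balanced by decade (assumed procedure).

    Decades with `quota` or fewer candidate letters are kept whole, so every
    letter from an underrepresented decade is included; larger decades are
    cut down to `quota` letters by seeded random selection.
    """
    rng = random.Random(seed)
    by_decade = defaultdict(list)
    for letter in letters:
        by_decade[letter.decade].append(letter)

    selected = []
    for decade in sorted(by_decade):
        pool = by_decade[decade]
        if len(pool) <= quota:
            selected.extend(pool)  # underrepresented decade: keep everything
        else:
            selected.extend(rng.sample(pool, quota))
    return selected
```

A real selection would also weigh the other criteria (variety of authors, format, chains of correspondence), which this sketch deliberately leaves out.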

Initial findings
- 612 letters (150 OCR, 462 manual transcription)
- … authors
- 132,917 words

Detailed metadata
We begin to get an idea of the kinds of document the archive contains.
Occupation: Secretary is the most frequently listed occupation (27 letters), with a further 34 letters from variations on this role (assistant secretary, honourable secretary, under secretary, deputy secretary...). Caution is needed when generalising about jobs, as in the progression from Secretary to ‘Director General’ to ‘Deputy Chairman of the Post Office Board’. This is a possible area for investigation. We see a similar variety in the next most common occupation, Director: there are 17 letters from ‘Director’ and 13 letters from variations on this title (e.g. deputy managing director).
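A simple keyword grouping shows how role variants can be counted together, and also why the caution above is needed. In this hypothetical sketch, `ROLE_KEYWORDS` and the whole-word matching rule are assumptions. Note that a title such as ‘Director General’ is grouped under director, even though it may denote the same post as Secretary at a different date.

```python
import re
from collections import Counter

# Hypothetical role families: a title joins the first family whose keyword
# occurs in it as a whole word; otherwise it is counted as "other".
ROLE_KEYWORDS = ("secretary", "director")


def role_family(title: str) -> str:
    text = title.lower()
    for keyword in ROLE_KEYWORDS:
        if re.search(rf"\b{keyword}\b", text):
            return keyword
    return "other"


def count_roles(titles):
    """Return (exact title counts, role-family counts)."""
    exact = Counter(" ".join(t.lower().split()) for t in titles)
    families = Counter(role_family(t) for t in titles)
    return exact, families


exact, families = count_roles([
    "Secretary", "Assistant Secretary", "Under Secretary",
    "Director", "Deputy Managing Director", "Director General",
])
print(families)  # "Director General" is counted under "director"
```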

Companies
The correspondence originates from 137 separate companies. Aside from the Post Office/BT, the letters come predominantly from communications companies and press organisations. There are also letters from government departments, law firms, charities, universities and district councils, and miscellaneous one-off letters from organisations such as the National Rifle Association and the Belgian Citizen Band Association.

Gender
257 male and 16 female authors (106 where gender is not clear from the letter or from outside sources). Surprising? ‘[women] were eventually deemed capable of replacing men in labour intensive activities like sorting. But more intellectually demanding positions such as clerical posts in the Chief Engineer’s Department remained firmly closed to them’ (Duncan Campbell-Smith, 2012: 246).

Working with TEI
“The Text Encoding Initiative (TEI) is a consortium which collectively develops and maintains a standard for the representation of texts in digital form.” We apply mark-up both to text-internal features and to header information.

Correspondence SIG TEI proposal
The proposal focuses on header information:
- the sender
- the place of sending
- the date of sending
- the recipient
- the place of receiving
- the date of receiving
- context?

correspDesc in the BTCC
Example correspondent details: Henry Schütz-Wilson, Assistant Secretary, The Electric & International Telegraph Company, Telegraph Street, London EC.

Fields recorded in our metadata
- Resource creation: BT/Post Office file references (maintaining links)
- Transcription information: project description
- Letter metadata:
  - … and …
  - occupation
  - company, department
  - … and …: location info; yyyy-mm-dd, n=“decade”
  - topic, function
  - format (handwritten, typed, etc.)
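A sketch of how sender and recipient details might be serialised as a TEI `correspDesc` using Python's standard `xml.etree.ElementTree`. The element names (`correspDesc`, `correspAction`, `persName`, `roleName`, `orgName`, `placeName`, `date`) come from TEI P5, but the encoding choices are assumptions rather than the BTCC's schema. In particular, putting the decade in `@n` on `<date>` is one reading of the n=“decade” note above, and the Schütz-Wilson details are assumed to belong to the sender.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialise TEI as the default namespace


def tei(tag: str) -> str:
    """Return a TEI-namespaced tag name in ElementTree's {uri}tag form."""
    return f"{{{TEI_NS}}}{tag}"


def corresp_action(kind, person=None, role=None, org=None, place=None, date=None):
    """Build a <correspAction type="sent"|"received"> element.

    `date` is an ISO yyyy-mm-dd string; its decade (e.g. "1860s") is copied
    into @n, an assumed encoding of the n="decade" convention.
    """
    action = ET.Element(tei("correspAction"), {"type": kind})
    if person:
        pers = ET.SubElement(action, tei("persName"))
        pers.text = person
        if role:
            pers.text += " "
            ET.SubElement(pers, tei("roleName")).text = role
    if org:
        ET.SubElement(action, tei("orgName")).text = org
    if place:
        ET.SubElement(action, tei("placeName")).text = place
    if date:
        ET.SubElement(action, tei("date"), {"when": date, "n": date[:3] + "0s"})
    return action


def corresp_desc(sent, received):
    """Wrap the sent and received actions in a <correspDesc> element."""
    desc = ET.Element(tei("correspDesc"))
    desc.extend([sent, received])
    return desc


# Details from the example slide, assumed to be the sender; recipient empty.
example = corresp_desc(
    corresp_action(
        "sent",
        person="Henry Schütz-Wilson",
        role="Assistant Secretary",
        org="The Electric & International Telegraph Company",
        place="Telegraph Street, London EC",
    ),
    corresp_action("received"),
)
print(ET.tostring(example, encoding="unicode"))
```

Passing `date="yyyy-mm-dd"` to `corresp_action` adds a `<date>` element whose `@when` holds the full date and whose `@n` holds the derived decade.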

Text-internal mark-up
‘Dear Sir, In reply to your favor of yesterday I hasten to forward you the full particulars connected with the matter between us and the International Commissioners and Copy of the Correspondence which took place on the subject. In haste, Yours faithfully L. Walter Courtenay’
Mark-up allows us to look at individual textual features of a letter, e.g. the use of …, the use of letter titles and references, and letter openers and closers, including salutations.
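Once openers and closers are marked up, salutations can be extracted programmatically. The sketch below assumes the letter above is encoded with TEI `<opener>`, `<closer>`, `<salute>` and `<signed>` elements. How the BTCC actually segments ‘In haste, Yours faithfully’ is not stated, so this encoding is illustrative only.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# The letter above, with an assumed TEI encoding of its opener and closer.
LETTER = """\
<text xmlns="http://www.tei-c.org/ns/1.0"><body><div type="letter">
  <opener><salute>Dear Sir,</salute></opener>
  <p>In reply to your favor of yesterday I hasten to forward you the full
  particulars connected with the matter between us and the International
  Commissioners and Copy of the Correspondence which took place on the
  subject.</p>
  <closer>
    <salute>In haste, Yours faithfully</salute>
    <signed>L. Walter Courtenay</signed>
  </closer>
</div></body></text>"""


def _text(el):
    """All text inside an element, whitespace-normalised (None if absent)."""
    if el is None:
        return None
    return " ".join("".join(el.itertext()).split())


def salutations(tei_xml: str) -> dict:
    """Opening salutation, closing salutation and signature of one letter."""
    root = ET.fromstring(tei_xml)
    return {
        "opening": _text(root.find(".//tei:opener/tei:salute", NS)),
        "closing": _text(root.find(".//tei:closer/tei:salute", NS)),
        "signed": _text(root.find(".//tei:closer/tei:signed", NS)),
    }


print(salutations(LETTER))
```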

Most frequent by decade

Formal links between openings and closings

“yours faithfully”, “obedient servant”

Marconi uses a relatively familiar opening, ‘Dear Mr Preece’, in combination with a variety of familiar and formal closing salutations:
- ‘believe me dear sir yours very truly’
- ‘with best regards for you and all your family I remain dear sir yours very truly’
- ‘I remain dear sir yours very sincerely’
- ‘I remain dear sir yours very truly and sincerely’
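Comparing openings and closings across decades amounts to counting normalised salutation formulas. A minimal sketch, assuming each letter has already been reduced to a (decade, opening, closing) triple. The keyword list `FORMULAS` and its first-match rule are hypothetical simplifications, so a mixed closing such as ‘yours very truly and sincerely’ is counted under only one formula.

```python
import re
from collections import Counter, defaultdict

# Hypothetical closing formulas, checked in this order; first match wins.
FORMULAS = ("faithfully", "obedient servant", "sincerely", "truly")


def normalise(salute: str) -> str:
    """Lower-case, replace punctuation with spaces and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", salute.lower()).split())


def closing_formula(closing: str) -> str:
    """Map a closing salutation to the first matching formula, or 'other'."""
    text = normalise(closing)
    for formula in FORMULAS:
        if formula in text:
            return formula
    return "other"


def formulas_by_decade(letters):
    """letters: iterable of (decade, opening, closing) triples."""
    counts = defaultdict(Counter)
    for decade, _opening, closing in letters:
        counts[decade][closing_formula(closing)] += 1
    return dict(sorted(counts.items()))


def opening_closing_pairs(letters):
    """How often each normalised opening co-occurs with each closing formula."""
    return Counter(
        (normalise(opening), closing_formula(closing))
        for _decade, opening, closing in letters
    )


print(closing_formula("I am, Sir, Your obedient Servant"))  # obedient servant
print(closing_formula("I remain dear sir yours very truly and sincerely"))  # sincerely
```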

One way into the analysis
Salutations are to be used as a starting point. There are many other factors to consider:
- context: the relationship between sender and addressee, and the population of authors
- the content and function of the letter
Salutations may even provide clues to the nature of the correspondence, e.g. Nesfield’s “demi-official” correspondence (1917: 191).

Letter function
One of the challenges in approaching the analysis of the BT corpus is how to make meaningful comparisons across so many different years, authors and subject matters. To address this, we categorised the letters pragmatically by function, looking at how these functions are realised and how they have changed or remained stable over time.

Definitions were generated through close examination of the letters.
Primary functions: Advice, Suggestion, (Instruction?), Request, Application, Offer, Confirmation, Agreement, Acceptance, Rejection, Outlining, Detailing, Setting Out, Report, Notification, Expressive, Query, Clarification, Reiteration, Correction, Explanation, Complaints, Reminder, Thanking, Enclosing, Forwarding, Copying, Acknowledging, Arguing, Disputing, Arranging, Planning, Instructing, Personal Update, Proposal, Expenditure Review, Commissive, Promise.
Secondary functions: Thanking, Apology, Acknowledgement, Expressing, Query, Request, Offer, Advice, Suggestion, Direction, Instruction, Recommendation, Discussion, Informing, Stating, Agreeing, Conceding, Noting Change, Restating, Explanation, Invitation, Report, Notification, Enclosure, Approval.
This long list was narrowed down to a more manageable list of 19 functions, which was tested at a workshop at Coventry University: six participants were asked to identify the main function (and any component functions) of a sample of letters.

Frequent problems
- lack of context
- what is the “main” function? (e.g. complaints)
- form/function conflict
- overlap (e.g. offer vs. application)

Problem cases: author-identified functions
‘Dear Sir Robert: When you spoke to me on the telephone on July 18th, I had already received a telegram from Pattison summarizing his conversation with you a few days ago… …particularly in the matter of possible reference to technical detail, and that the objectives to be stressed should relate to a greater degree to exploration of general policy considerations. However, I need not amplify on these matters in this letter since we are doing our best to get the formal reply to the British despatch out as soon as we can. I am sending this personal note to you, however, to express my appreciation for your own letter…’
Here the author explicitly states the function of the letter: ‘to express my appreciation for your own letter’.

Problem cases: multifunctionality
In some letters the authors address multiple independent concerns, each with an equal claim to being the primary function of the letter. For example, 1917_05_19_WM_##, reproduced in full:
‘Sir, I am directed to refer to your communication dated 16th May, 1917, No /17 relating to Mr.T.Gilbert, a Telegraph Operator employed at the Ware Post Office and to inform you that instructions have this day been given for this man to be posted to the Royal Engineers, Signal Service, Bletchley through Area Headquarters. I am also to ask you to forward to this Department for countersignature on behalf of the Director of Recruiting, your stock of enlistment Forms 27/Gen.No./6112 (D.R.l.c.) a copy of which was attached by you. It is hoped that this will obviate any further difficulty of the nature referred to in your communication. I am, Sir, Your obedient Servant, W. MacDonald’
The informative part of the letter does not contextualise the directive; the two sections address independent concerns.

Clarifications
Three further rounds of inter-rater testing followed: one with two participants and two with three participants. Pre-discussion agreement improved from c. 80% (two raters) and 60% (three raters) to 82% (two raters) and 70% (three raters).
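Agreement figures like these can be computed in several ways, and the slides do not say which was used. This sketch assumes ‘agreement’ means that all raters chose the same main function for a letter. It also shows Cohen's kappa for two raters, a chance-corrected measure that was not reported here but often accompanies raw percentage agreement. The labels in the usage example are hypothetical.

```python
from collections import Counter


def percent_agreement(ratings):
    """Share of letters on which all raters chose the same main function.

    `ratings` holds one tuple per letter, one label per rater
    (an assumed definition of "agreement").
    """
    agreed = sum(1 for labels in ratings if len(set(labels)) == 1)
    return agreed / len(ratings)


def cohen_kappa(a, b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[label] * cb[label] for label in ca) / (n * n)
    if expected == 1:  # both raters used one identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels for five letters and two raters.
pairs = [("Directive", "Directive"), ("Offer", "Application"),
         ("Query", "Query"), ("Informative", "Informative"),
         ("Thanking", "Thanking")]
print(percent_agreement(pairs))  # 0.8
rater1, rater2 = zip(*pairs)
print(round(cohen_kappa(rater1, rater2), 3))  # 0.762
```

Raw agreement rewards coincidental matches on common labels, which kappa discounts; with only ten categories and small samples the two can diverge noticeably.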

Final categories
1. Application
2. Commissive
3. Complaint
4. Declination
5. Directive
6. Informative
7. Notification
8. Offer
9. Query
10. Thanking

[Table: number of letters per primary function (Application, Commissive, Complaint, Declination, Directive, Informative, Notification, Offer, Query, Thanking) by decade from the 1850s, with totals]
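A table of this kind is a simple cross-tabulation. A sketch, assuming each letter has been assigned one primary function from the ten final categories and a decade label. The `Function` enum mirrors the category list; the input format and the sample data in any usage are hypothetical.

```python
from collections import Counter
from enum import Enum


class Function(Enum):
    """The ten final function categories."""
    APPLICATION = "Application"
    COMMISSIVE = "Commissive"
    COMPLAINT = "Complaint"
    DECLINATION = "Declination"
    DIRECTIVE = "Directive"
    INFORMATIVE = "Informative"
    NOTIFICATION = "Notification"
    OFFER = "Offer"
    QUERY = "Query"
    THANKING = "Thanking"


def function_by_decade(letters):
    """Cross-tabulate (decade, Function) pairs into rows with a Total column."""
    counts = Counter(letters)
    table = {}
    for decade in sorted({d for d, _ in counts}):
        row = {f.value: counts[(decade, f)] for f in Function}
        row["Total"] = sum(row.values())
        table[decade] = row
    return table


def print_table(table):
    """Print the cross-tabulation as tab-separated text."""
    header = [f.value for f in Function] + ["Total"]
    print("Decade\t" + "\t".join(header))
    for decade, row in table.items():
        print(decade + "\t" + "\t".join(str(row[h]) for h in header))
```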

Analysis
Data-driven analysis, using frequency lists, keywords and clusters as starting points:
- by decade (diachronic)
- by function

Keywords, Application letters: york, new, my, call, times, application, salary, experiments, beg, years, grove, opening, hoping, convenience, request

Keywords, Directive letters: committee, landlord, if, tenant, bbc, calls, be, ireland, rayner, television, northern, majesty, should, ita
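Keyword lists like these are usually produced by comparing a subcorpus (e.g. all Application letters) against the rest of the corpus. The slides do not name the keyness statistic or the tool used. This sketch assumes log-likelihood (G2) keyness, a common choice, with a naive lower-case tokenizer, and also counts recurrent n-word clusters within each letter.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Naive tokenizer: lower-cased alphabetic word forms."""
    return re.findall(r"[a-z]+", text.lower())


def log_likelihood(a, b, c, d):
    """Log-likelihood (G2) for a word with frequency a in a target corpus of
    c tokens and frequency b in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    if a:
        ll += a * math.log(a / e1)
    if b:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(target_texts, reference_texts, top=15):
    """Words relatively more frequent in the target texts, ranked by G2."""
    tf = Counter(w for t in target_texts for w in tokens(t))
    rf = Counter(w for t in reference_texts for w in tokens(t))
    c, d = sum(tf.values()), sum(rf.values())
    scored = [
        (word, log_likelihood(a, rf[word], c, d))
        for word, a in tf.items()
        if a / c > rf[word] / d  # positive keywords only
    ]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top]


def clusters(texts, n=3, min_freq=2):
    """Recurrent n-word clusters, counted within (not across) letters."""
    grams = Counter()
    for text in texts:
        toks = tokens(text)
        grams.update(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    return [(" ".join(g), f) for g, f in grams.most_common() if f >= min_freq]
```

With so little data, a single letter can dominate a subcorpus's keyword list, so scores like these are best read as pointers for close reading rather than findings.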

- Inter-rater reliability is far from perfect.
- There are issues with multifunctional letters and component functions.
- There is not much data, so individual letters can skew results.
But there are some promising preliminary results. We need to examine patterns in the corpus as a whole and back up any claims with close reading. The analysis could be supplemented by qualitative approaches such as the analysis of rhetorical moves (see Biber et al. 2007).

BTCC: current use and future directions
Who uses it?
- currently, me
- the corpus will be available on request or through, e.g., the Oxford Text Archive
- BT Archives have the data and are looking to incorporate it into their Digital Archive

For what?
- Linguistic analysis: somewhat exploratory. We cannot generalise, but we plan to expand the corpus to examine the findings further.
- Historical study: transcriptions, metadata.

Ways in which historical research is being transformed by digital methods:
- the opportunity to read historical documents in new ways
- making material that is publicly available but practically restricted accessible in practice
- providing historical archives with transcriptions and detailed item-level metadata
- potentially bringing together material from related but separate physical archives in the form of a digital resource (e.g. the Post Office)

Thank you!

References
Biber, D., Connor, U. and Upton, T. A. (eds.) (2007) Discourse on the Move: Using Corpus Analysis to Describe Discourse Structure. Amsterdam: John Benjamins.
Dossena, M. (2004) ‘Towards a corpus of nineteenth-century Scottish correspondence’, Linguistica e Filologia 18.
Hunston, S. (2002) Corpora in Applied Linguistics. Cambridge: Cambridge University Press.
Leech, G. (1991) ‘The state of the art in corpus linguistics’, in Aijmer, K. and Altenberg, B. (eds.) English Corpus Linguistics: Studies in Honour of Jan Svartvik.
Nesfield, J. C. (1917) Junior Course of English Composition. London: MacMillan and Co. Ltd.
Prescott, A. (2012) ‘Making the Digital Human: Anxieties, Possibilities, Challenges’, delivered at the Digital Humanities Summer School, Merton College, Oxford. Available online from …anxieties.html [20th March 2013].