6th April 2003 PALC 2003 Building a corpus to investigate the presentation of speech, thought and writing in Spoken British English Dan McIntyre, John.

2 6th April 2003, PALC 2003
Building a corpus to investigate the presentation of speech, thought and writing in Spoken British English
Dan McIntyre, John Heywood, Tony McEnery, Elena Semino and Mick Short
Department of Linguistics and Modern English Language, Lancaster University, UK

3 Aims of the project
To investigate the forms and functions of speech, thought and writing presentation (ST&WP) in spoken data.
To compare the presentation of ST&WP in a corpus of spoken data with the findings from an equivalent corpus of written texts.
To further test the model of speech and thought presentation outlined in Leech and Short (1981).

4 What is speech, thought and writing presentation?
Prototypically, the presentation in a posterior discourse of what was said, thought or written in a (supposed) anterior discourse.
Direct speech: [DS] Shut up, you silly old fool, [RS] she said.
Indirect speech: [RS] She told him [IS] that he should shut up.
Representation of a speech act: [RSA] She commanded him.
Key: speaker's words; reporter's words.

5 Selecting the corpus data
120 transcripts, approximately 260,000 words.
Texts taken from the British National Corpus (BNC) and Centre for North West Regional Studies (CNWRS) oral history archives at Lancaster University.
CNWRS interview tapes digitised to be time-aligned with text by Softsound Ltd, Cambridge, UK.
BNC sound files identified where possible.

6 The ST&WP categories
Main categories

Speech   Thought  Writing  Speech example
FDS      FDT      FDW      Shut up you silly old fool!
RS/DS    RT/DT    RW/DW    Shut up you silly old fool!, she said.
FIS      FIT      FIW      He should shut up, the silly old fool!
RS/IS    RT/IT    RW/IW    She said that he should shut up.
RSA      RTA      RWA      She commanded him.
RV       RI       RN       She shouted at him.
RU                         We used to call them idiots in those days.
A                          She looked at him.

7 ST&WP category features
Category features

P    toPic
#    problematic tag
1+   repetitions
e    embedded
g    grammatical negative
a    marked absence of ST&WP
h    hypothetical
i    inferred
q    quotation
r    iterative
v    interrogative
p    imperative
u    unfinished
1+   level of embedding

8 ST&WP category features
Category features

g    grammatical negative    She said he wasn't a silly old fool.
h    hypothetical            If you behave foolishly, people will say that you're a silly old fool.
p    imperative              Stop being a silly old fool! she said.

9 Annotating the corpus for ST&WP
We use the element and mark the ST&WP category within the attribute cat.
Tags designed for concordancing using Wordsmith Tools.
15 fields to mark ST&WP categories.
x used as a placeholder for empty positions.
(Diagram labels: element; attribute; attribute value; fields 1-15.)
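As a rough illustration of the fixed-width idea on this slide, the sketch below pads a run of codes to 15 fields with x and reads the filled positions back. Only the field count and the x placeholder come from the slide. The assumption that each field holds exactly one character, the ordering of codes within the value, and the function names `pad_fields` and `filled_fields` are all hypothetical.

```python
FIELD_COUNT = 15   # the slide states 15 fields
PLACEHOLDER = "x"  # the slide states x fills empty positions


def pad_fields(codes: str) -> str:
    """Pad a run of one-character codes to 15 fields with the placeholder."""
    if len(codes) > FIELD_COUNT:
        raise ValueError(f"at most {FIELD_COUNT} fields, got {len(codes)}")
    return codes + PLACEHOLDER * (FIELD_COUNT - len(codes))


def filled_fields(value: str) -> list[tuple[int, str]]:
    """Return (field number, code) pairs for non-empty fields, counting from 1."""
    if len(value) != FIELD_COUNT:
        raise ValueError(f"expected {FIELD_COUNT} fields, got {len(value)}")
    return [(n, ch) for n, ch in enumerate(value, start=1) if ch != PLACEHOLDER]


# Hypothetical layout: category letters first, then one feature code (h).
value = pad_fields("FDSh")
print(value)
print(filled_fields(value))
```

A value of constant length keeps every field at a predictable position, which is presumably what makes the tags convenient to search with a concordancer.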

14 A sample extract from a marked-up file
Then they went to Hereford and there were Quakers there and he had a hard time of it they didn't like Catholics and I can remember they sent me I was a manageress in the laundry here and they sent me to Kendal when we opened a laundry at Kendal and I was staying at a lodging in Kendal and the man was th they were Quakers and I said to the young lady, I said Would you mind if you made my dinner on Friday it doesn't matter if it's only bread and butter, but no meat, because we don't eat meat on a Friday or no bacon just bread anything plain it doesn't matter what it is but no meat and the old man says I'm sorry for thee and I thought oh but he was a Quaker. Anyway she says shut up you silly old fool

15 Preliminary results: comparative numbers and percentages of speech tags in the Spoken and Written Corpora, in relation to total number of discourse presentation tags

Category    Written Corpus    Spoken Corpus
FDS         927 (10.79%)      199 (2.26%)
DS          2047 (23.83%)     1975 (22.48%)
Total       2974 (34.62%)     2174 (24.75%)
FIS         157 (1.82%)       83 (0.94%)
IS          1114 (12.97%)     590 (6.71%)
(N)RSA      1398 (16.27%)     1349 (15.35%)
N/RV        391 (4.55%)       858 (9.76%)

Spoken Corpus: Total tags = 34,927; A = 21,467; RU = 255; RS = 2,774; Ambiguities = 1,149; ST&WP tags = 8,783
Written Corpus: Total tags = 16,533; N = 3,601; Ambiguities = 885; ST&WP tags = 8,588
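The percentages on this slide appear to be each tag count divided by that corpus's ST&WP tag total (8,588 written; 8,783 spoken), cut to two decimal places rather than rounded. That reading is inferred from the figures and is not stated on the slide. A minimal sketch of the arithmetic, with the function name `truncated_percent` being an illustrative choice:

```python
WRITTEN_STWP_TAGS = 8588  # "ST&WP tags" total for the Written Corpus
SPOKEN_STWP_TAGS = 8783   # "ST&WP tags" total for the Spoken Corpus


def truncated_percent(count: int, total: int) -> str:
    """Percentage of total, cut (not rounded) to two decimal places."""
    # Integer arithmetic avoids floating-point surprises when truncating.
    hundredths = count * 10000 // total
    return f"{hundredths // 100}.{hundredths % 100:02d}"


# (written count, spoken count) for three rows of the speech table
speech_counts = {
    "FDS": (927, 199),
    "DS": (2047, 1975),
    "FIS": (157, 83),
}

for category, (written, spoken) in speech_counts.items():
    print(
        category,
        truncated_percent(written, WRITTEN_STWP_TAGS),
        truncated_percent(spoken, SPOKEN_STWP_TAGS),
    )
```

Under this reading, DS in the Written Corpus works out to 2047 / 8588 = 23.8356...%, shown as 23.83% on the slide where rounding would give 23.84%.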

16 Preliminary results: comparative numbers and percentages of thought tags in the Spoken and Written Corpora, in relation to total number of discourse presentation tags

Category    Written Corpus    Spoken Corpus
FDT         69 (0.80%)        8 (0.09%)
DT          38 (0.44%)        162 (1.84%)
Total       107 (1.24%)       170 (1.93%)
FIT         275 (3.20%)       11 (0.12%)
IT          201 (2.34%)       855 (9.73%)
(N)RTA      114 (1.32%)       363 (4.13%)
N/RI        1355 (15.77%)     1530 (17.42%)

Spoken Corpus: Total tags = 34,927; A = 21,467; RU = 255; RT = 1,109; Ambiguities = 488; ST&WP tags = 8,783
Written Corpus: Total tags = 16,533; N = 3,601; Ambiguities = 885; ST&WP tags = 8,588

17 Preliminary results: comparative numbers and percentages of writing tags in the Spoken and Written Corpora, in relation to total number of discourse presentation tags

Category    Written Corpus    Spoken Corpus
FDW         32 (0.37%)        140 (1.59%)
DW          109 (1.26%)       71 (0.80%)
Total       141 (1.64%)       211 (2.40%)
FIW         32 (0.37%)        23 (0.26%)
IW          74 (0.86%)        37 (0.42%)
(N)RWA      215 (2.50%)       295 (3.35%)
NW/RN       41 (0.47%)        234 (2.66%)

Spoken Corpus: Total tags = 34,927; A = 21,467; RU = 255; RW = 145; Ambiguities = 295; ST&WP tags = 8,783
Written Corpus: Total tags = 16,533; N = 3,601; Ambiguities = 885; ST&WP tags = 8,588

18 Where next?
Further refinement of ST&WP annotation.
ST&WP and prosodic discontinuities (e.g. voice quality).
Combination of quantitative and qualitative analyses.
Comparison of findings from the two corpora.

