

LIN 3098 – Corpus Linguistics Albert Gatt. In this lecture  Corpora for the study of genre/register variation revisit the concept of representativeness.




1 LIN 3098 – Corpus Linguistics Albert Gatt

2 In this lecture
• Corpora for the study of genre/register variation
  - revisit the concept of representativeness and balance
  - external vs. internal criteria: Biber (1993)
  - introduce the multi-dimensional approach to register/genre variation (Biber 1988)

3 Part 1 The concept of register/genre

4 A preliminary example
• Compare the following:
  - It is hard to resolve this problem.
  - I find it hard to resolve this problem.
• Is one intuitively more “formal”? Why?

5 A preliminary example
• Extraposed to-clause: It is hard to resolve this problem.
  - It (expletive)
  - Verb be
  - An adjective (hard) or participle (boring)
  - Clause starting with to + infinitive verb
• Tends to be associated with a formal, “anonymous” style.
• Tends to be “static”: the adjective or participle denotes a state, not a dynamic event.

6 A preliminary example
• Extraposed to-clause: It is hard to resolve this problem.
  - It (expletive)
  - Verb be
  - An adjective (hard) or participle (boring)
  - Clause starting with to + infinitive verb
• If our intuitions are correct, we would expect the distribution of this clause to vary across genres and registers.
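As a rough illustration of how one might begin checking that prediction in a corpus, the sketch below matches the four components of the extraposed to-clause with a regular expression and reports a rate per 1,000 words, so that texts of different lengths can be compared. The pattern and the function names (`find_extraposed_to`, `rate_per_1000_words`) are hypothetical. Because no part-of-speech tagging is used, the pattern cannot verify that the third slot is an adjective or participle, so it also matches strings such as "it is time to go".

```python
import re

# Hypothetical surface pattern: expletive "it" + "is"/"was" + one word
# (ideally an adjective or participle) + "to" + the next word (ideally an infinitive).
# No POS tags are used, so the contents of the slots are not actually checked.
EXTRAPOSED_TO = re.compile(r"\bit\s+(?:is|was)\s+(\w+)\s+to\s+(\w+)", re.IGNORECASE)


def find_extraposed_to(text):
    """Return (adjective-or-participle, verb) pairs for each candidate match."""
    return [(m.group(1).lower(), m.group(2).lower()) for m in EXTRAPOSED_TO.finditer(text)]


def rate_per_1000_words(text):
    """Candidate matches per 1,000 words, so texts of different lengths are comparable."""
    words = re.findall(r"\w+", text)
    if not words:
        return 0.0
    return 1000 * len(find_extraposed_to(text)) / len(words)


print(find_extraposed_to("It is hard to resolve this problem."))      # [('hard', 'resolve')]
print(find_extraposed_to("I find it hard to resolve this problem."))  # []
```

Counting such rates separately for texts from each genre would then show whether the construction really clusters in the more formal ones.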

7 What is a register?
• Would you consider the following to be registers?
  1. recipe English
  2. legal Maltese
  3. specialised language used by ship-builders
• What are the crucial characteristics of register?

8 Defining register
• Possible definitions (see overview in Paolillo 2000):
  - register = “a field of discourse” or “topic”
  - register = “a combination of all the parameters of the communicative situation”
  - register = “an occupationally determined variety of language”

9 Defining genre
• In discourse analysis and related fields, genre is given a “sociologically oriented” definition:
  - “A socially ratified way of using language in connection with a particular type of social activity”
  - this suggests “typical” settings in which language is used, e.g. interview, lecture, story…

10 Why is this relevant?
• Reminder (see lecture 2): general-purpose corpora aim for balance and representativeness
  - how genre/register are defined affects the structure and the uses of the corpus
  - corpus-based studies of variation across/within registers need a well-defined notion of register

11 Balance and representativeness
• Balance: refers to the range of types of text in the corpus
  - e.g. the BNC’s construction was based on an a priori classification of texts by domain, time and medium
• Representativeness: refers to the extent to which the corpus contains the full range of variation in the language
• Representativeness depends on balance as a prerequisite

12 Biber (1993) on achieving balance
• Biber distinguishes:
  - external criteria: the social and communicative contexts in which a particular sample of text/speech is produced; external criteria define registers or genres
  - internal criteria: linguistic (e.g. lexico-grammatical) features that distinguish texts; internal criteria define text types

13 External vs. internal
• Example: academic writing vs. spoken conversation
• Some external criteria of differentiation:
  - primary channel (spoken/written/…)
  - type of addressee
  - factuality
• Some internal criteria of differentiation:
  - more use of personal pronouns in spoken discourse
  - more use of passives in academic writing
  - …
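To make the idea of an internal criterion concrete, here is a minimal sketch that measures one of the features above, 1st/2nd person pronouns, as a rate per 1,000 word tokens. The pronoun list, the tokenizer and the two sample sentences are illustrative assumptions, not Biber's feature definitions; a passive count is left out because it would need tagging or parsing rather than simple word matching.

```python
import re

# Illustrative list of 1st/2nd person forms (an assumption, not Biber's exact feature definition).
PERSONAL_PRONOUNS = {"i", "me", "my", "mine", "we", "us", "our", "ours",
                     "you", "your", "yours"}


def pronoun_rate(text):
    """1st/2nd person pronouns per 1,000 word tokens."""
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in PERSONAL_PRONOUNS)
    return 1000 * hits / len(tokens)


# Invented sample sentences standing in for conversation and academic writing.
conversation = "I think you should come with me. Do you want to?"
academic = "The samples were analysed and the results are reported below."
print(round(pronoun_rate(conversation), 1))  # 363.6
print(pronoun_rate(academic))                # 0.0
```

The number is only meaningful relative to other texts: the internal criterion is the difference in rates, not any single value.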

14 Which should come first?
• Biber’s argument: “in defining the population for a corpus, register/genre distinctions [i.e. external criteria] take precedence over text-type distinctions. […] identification of the salient text-type distinctions in a language requires a representative corpus of texts…”

15 Biber’s external criteria
1. Primary channel: written/spoken/scripted
2. Format: published/unpublished
   - includes various publication formats
3. Setting: institutional/other/private-personal

16 Biber’s external criteria
4. Addressee/receiver
   a. Plurality: unenumerated/plural/individual/self
   b. Presence: present/absent
   c. Interactiveness: none/little/extensive
   d. Shared knowledge: general/specialised/personal

17 Biber’s external criteria
5. Addressor:
   a. Demographic variation: age, sex, etc.
   b. Acknowledgement: acknowledged individual/institution
6. Factuality: factual-informational/intermediate/imaginative
7. Purposes: persuade, entertain, edify, inform, instruct…
8. Topics [cf. the “Domain” definition in BNC texts]
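One way to picture these external criteria is as a record attached to every text sample before any linguistic analysis happens. The sketch below encodes criteria 1 to 4 and 6 to 8 as a frozen dataclass; all class and field names are hypothetical, the addressor parameters (criterion 5) are omitted for brevity, and the values chosen for the two example texts (e.g. "intermediate" factuality for a chat) are illustrative assumptions.

```python
from dataclasses import dataclass, fields
from enum import Enum


class Channel(Enum):
    WRITTEN = "written"
    SPOKEN = "spoken"
    SCRIPTED = "scripted"


class PublicationFormat(Enum):
    PUBLISHED = "published"
    UNPUBLISHED = "unpublished"


class Setting(Enum):
    INSTITUTIONAL = "institutional"
    OTHER = "other"
    PRIVATE_PERSONAL = "private-personal"


class Factuality(Enum):
    FACTUAL_INFORMATIONAL = "factual-informational"
    INTERMEDIATE = "intermediate"
    IMAGINATIVE = "imaginative"


@dataclass(frozen=True)
class Addressee:
    plurality: str         # "unenumerated", "plural", "individual" or "self"
    present: bool          # present vs. absent
    interactiveness: str   # "none", "little" or "extensive"
    shared_knowledge: str  # "general", "specialised" or "personal"


@dataclass(frozen=True)
class SituationalProfile:
    """External description of one text sample (addressor details omitted)."""
    channel: Channel
    publication_format: PublicationFormat
    setting: Setting
    addressee: Addressee
    factuality: Factuality
    purposes: frozenset = frozenset()
    topic: str = ""


def differing_criteria(a, b):
    """Names of the external criteria on which two text samples differ."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


research_article = SituationalProfile(
    Channel.WRITTEN, PublicationFormat.PUBLISHED, Setting.INSTITUTIONAL,
    Addressee("unenumerated", False, "none", "specialised"),
    Factuality.FACTUAL_INFORMATIONAL, frozenset({"inform"}), "linguistics")
chat_with_friend = SituationalProfile(
    Channel.SPOKEN, PublicationFormat.UNPUBLISHED, Setting.PRIVATE_PERSONAL,
    Addressee("individual", True, "extensive", "personal"),
    Factuality.INTERMEDIATE, frozenset({"entertain"}))
print(differing_criteria(research_article, chat_with_friend))
```

Making the records frozen reflects the a priori character of these categories: they are assigned when the sampling frame is designed, not derived from the texts' linguistic features.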

18 The logic behind genre/register comparison
• A priori distinction between different genres/registers, each adequately sampled to be representative
• Given these externally based distinctions, the question is: which linguistic features are characteristic of different genres?

19 Part 2 The multifeature/multidimensional framework (Biber 1988, Biber 1995)

20 Biber (1988, 1995)
• Compared twenty-one genres in spoken and written British English
• Used a precompiled list of 67 linguistic features, comparing:
  - the extent to which these features “cluster together” across genres, e.g. a high relative frequency of personal pronouns goes together with a high relative frequency of questions
  - the extent to which these clusters are more clearly present in some genres than in others
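"Clustering together" is a statement about correlation: across many texts, when one feature's rate goes up, so does the other's. The minimal sketch below computes a Pearson correlation between two features measured over the same texts. The five pairs of rates are invented toy numbers, not Biber's data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two features measured over the same texts."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least two values")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant feature")
    return cov / (sx * sy)


# Invented per-1,000-word rates for five hypothetical texts (not Biber's data).
pronouns = [62.0, 55.0, 12.0, 8.0, 30.0]
questions = [9.0, 7.5, 0.5, 0.0, 3.0]
print(round(pearson(pronouns, questions), 3))
```

A value close to +1 means the two features rise and fall together; a value close to -1 means they pattern as opposites, which is what later produces the “poles” of a dimension.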

21 Primary goals
1. identify the main dimensions (clusters of features) of variation underlying all registers
2. find similarities and differences between different registers

22 Dimensions
• Dimension: group of features that are empirically determined to co-occur in text
• Functional interpretation: given a set of features forming a dimension (e.g. personal pronouns + questions), the crucial question is: how do we interpret it functionally?
  - e.g. the cluster containing personal pronouns and questions shows a high level of interpersonal focus in the text

23 Factor analysis
• The MF/MD approach uses factor analysis, a statistical technique, to group together related features based on their co-occurrence
  - the resulting clusters of features (“factors”) are then interpreted and given a label
  - this is the process of identification and functional interpretation of dimensions
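The sketch below gives a feel for what "grouping features by co-occurrence" means numerically. It builds the correlation matrix of four features and finds, by power iteration, the axis along which they vary together most strongly, reporting each feature's loading on it. This is a principal-components-style stand-in, not Biber's exact procedure, which extracts several factors and rotates them; the data are invented toy numbers and the function names are hypothetical.

```python
import math


def correlation_matrix(columns):
    """Pearson correlations between every pair of feature columns (one column per feature)."""
    def standardize(col):
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        return [(v - mean) / sd for v in col]

    z = [standardize(c) for c in columns]
    n = len(columns[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]


def first_component_loadings(corr, iterations=500):
    """Loadings on the largest-eigenvalue axis of the correlation matrix (power iteration)."""
    k = len(corr)
    vec = [1.0 / (i + 1) for i in range(k)]  # arbitrary non-uniform starting vector
    for _ in range(iterations):
        nxt = [sum(corr[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    # The Rayleigh quotient gives the eigenvalue for the converged unit vector.
    eigenvalue = sum(vec[i] * corr[i][j] * vec[j] for i in range(k) for j in range(k))
    return [v * math.sqrt(eigenvalue) for v in vec]


features = ["pronouns", "questions", "nouns", "long_words"]
# Invented per-1,000-word rates: one list per feature, one value per hypothetical text.
data = [
    [62.0, 55.0, 12.0, 8.0, 30.0],
    [9.0, 7.5, 0.5, 0.0, 3.0],
    [150.0, 170.0, 290.0, 310.0, 220.0],
    [40.0, 45.0, 95.0, 100.0, 70.0],
]
loadings = first_component_loadings(correlation_matrix(data))
for name, loading in zip(features, loadings):
    print(f"{name:>10}: {loading:+.2f}")
```

With these toy numbers, pronouns and questions load with one sign and nouns and long words with the opposite sign. The overall sign of a factor is arbitrary; what the analyst interprets is which features share a pole.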

24 Biber’s methodology
1. Identify the grammatical features based on a review of existing literature
2. Tag all relevant features in the corpus texts
3. Post-edit the texts to ensure accuracy
4. Count the frequency of each feature in each text
5. Apply factor analysis to compute co-occurrence patterns among features
6. Interpret the resulting dimensions functionally
7. Compare different registers to see how strongly each dimension is represented in them
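Step 4 hides a practical detail: raw counts depend on text length, so before the counts are correlated they are usually normed to a common basis and then standardized across the corpus. The sketch below assumes a basis of 1,000 words and uses the population standard deviation; both are choices made for this illustration. The counts and text lengths are invented.

```python
import math


def normalize(count, n_words, basis=1000):
    """Convert a raw feature count into a rate per `basis` words."""
    return basis * count / n_words


def z_scores(rates):
    """Standardize one feature's rates across all texts (mean 0, SD 1)."""
    n = len(rates)
    mean = sum(rates) / n
    sd = math.sqrt(sum((r - mean) ** 2 for r in rates) / n)  # population SD (an assumption)
    if sd == 0:
        return [0.0] * n
    return [(r - mean) / sd for r in rates]


# Invented raw counts of one feature and the lengths of three hypothetical texts.
counts = [12, 40, 3]
lengths = [2000, 4000, 1500]
rates = [normalize(c, w) for c, w in zip(counts, lengths)]
print(rates)  # [6.0, 10.0, 2.0]
print(z_scores(rates))
```

Standardizing puts features that are common (e.g. nouns) and rare (e.g. pied-piping) on the same scale, so that no feature dominates simply because it is frequent.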

25 Types of features
• Lexical features
  - type-token ratio (the number of different word types divided by the number of word tokens)
  - word length
• Lexical semantic features
  - e.g. word classes like hedges (probably, possibly…), speech act verbs (declare), etc.
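Both lexical features above are simple to compute once a text is tokenized. The sketch below uses a crude regular-expression tokenizer (an assumption; real studies use proper tokenization). Because the type-token ratio falls as texts get longer, the optional hypothetical `first_n` parameter restricts the calculation to a fixed number of tokens, which keeps texts of different lengths comparable.

```python
import re


def tokens(text):
    """Crude word tokenizer: lower-cased runs of word characters."""
    return re.findall(r"\w+", text.lower())


def type_token_ratio(text, first_n=None):
    """Distinct word types divided by word tokens, optionally over the first `first_n` tokens only."""
    toks = tokens(text)
    if first_n is not None:
        toks = toks[:first_n]
    return len(set(toks)) / len(toks) if toks else 0.0


def mean_word_length(text):
    """Average length in characters of the word tokens."""
    toks = tokens(text)
    return sum(len(t) for t in toks) / len(toks) if toks else 0.0


sample = "The cat saw the dog"
print(type_token_ratio(sample))  # 0.8
print(mean_word_length(sample))  # 3.0
```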

26 Types of features
• Grammatical feature classes
  - nouns, prepositional phrases, attributive and predicative adjectives, etc.
• Syntactic features
  - relative clauses, that-complements, pied-piping constructions (Which car does he like?), conditional subordination (should you ever…)

27 The dimensions identified
• Involved vs. informational production
• Narrative vs. non-narrative production
• Elaborated vs. situation-dependent reference
• Overt expression of persuasion
• Abstract vs. non-abstract style
NB. Many of these dimensions define “poles of opposition”.

28 Dimension 1: involved vs. informational
• Features at the involved pole: 1st & 2nd person pronouns, questions, reductions, stance verbs, hedges, emphatics, adverbial subordination
  - typical of conversations and letters (high personal involvement)
• Features at the informational pole: nouns, adjectives, prepositional phrases, long words
  - typical of informational exposition, e.g. official documents and academic writing
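A common way to place a text on a dimension like this is to take the standardized frequencies of the features at one pole, add them up, and subtract the sum for the features at the opposite pole. The sketch below is a simplified version of that idea. The feature keys are hypothetical names for the features listed above, and the z-scores are invented values for one imaginary conversational text.

```python
# Hypothetical keys for the Dimension 1 features listed above.
INVOLVED = ["first_second_person_pronouns", "questions", "reductions",
            "stance_verbs", "hedges", "emphatics", "adverbial_subordination"]
INFORMATIONAL = ["nouns", "adjectives", "prepositional_phrases", "long_words"]


def dimension_score(z, positive, negative):
    """Sum of z-scores at one pole minus the sum of z-scores at the opposite pole."""
    return sum(z[f] for f in positive) - sum(z[f] for f in negative)


# Invented z-scores for one hypothetical conversational text.
text_z = {**{f: 1.0 for f in INVOLVED}, **{f: -0.5 for f in INFORMATIONAL}}
print(dimension_score(text_z, INVOLVED, INFORMATIONAL))  # 9.0
```

A large positive score places the text toward the involved end and a large negative score toward the informational end. Because both poles enter the same number, a text is scored on the opposition as a whole rather than on either set of features alone.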

29 Dimension 2: narrative vs. non-narrative
• Features at the narrative pole: past tense, perfect aspect, 3rd person pronouns, speech act verbs
  - typical of fiction
• Features at the non-narrative pole: present tense, attributive adjectives
  - typical of broadcasts, telephone conversations, professional letters

30 Dimension 3: elaborated vs. situation-dependent reference
• Features at the elaborated pole: wh-relative clauses, pied-piping, phrasal coordination
  - typical of “elaborated”, situation-independent language: official documents, professional letters, written exposition
• Features at the situation-dependent pole: time adverbials, place adverbials
  - typical of “situation-dependent language”, e.g. broadcasts, fiction, personal letters

31 Dimension 4: overt expression of persuasion
• Features: modals, conditional subordination
  - define an overtly persuasive type of text, e.g. editorials, professional letters
• Lack of any of the above
  - language which does not overtly seek to persuade

32 Dimension 5: abstract vs. non-abstract style
• Features: agentless passives, by-passives, …
  - an “abstract style”: technical prose, academic prose, official documents
• Lack of any of the above
  - language which is typically not abstract: conversation, public speeches, broadcasts…

33 Biber’s main argument
• No one dimension is enough to characterise the properties of a particular register
  - dimensions are coherent, correlated groupings of features
  - every register can be defined in terms of the relative prominence of all 5 dimensions
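If every text has a score on every dimension, a register can be characterised by the average of those scores over its texts, giving one profile per register. The minimal sketch below computes such profiles. The function name, the register labels and the scores are invented for illustration, and only two dimensions are used to keep it short.

```python
from collections import defaultdict


def register_profiles(texts):
    """Average each dimension score over all texts of the same register.

    `texts` is an iterable of (register, {dimension: score}) pairs; every text is
    assumed to have a score on every dimension.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for register, scores in texts:
        counts[register] += 1
        for dimension, score in scores.items():
            sums[register][dimension] += score
    return {reg: {dim: total / counts[reg] for dim, total in dims.items()}
            for reg, dims in sums.items()}


# Invented dimension scores for four hypothetical texts (two dimensions only).
texts = [
    ("conversation", {"D1": 30.0, "D2": -2.0}),
    ("conversation", {"D1": 40.0, "D2": 0.0}),
    ("academic prose", {"D1": -15.0, "D2": -3.0}),
    ("academic prose", {"D1": -17.0, "D2": -5.0}),
]
for register, profile in register_profiles(texts).items():
    print(register, profile)
```

Comparing registers then means comparing whole profiles. Two registers can be close on one dimension and far apart on another, which is why a single dimension is not enough.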

34 Biber’s main argument
• Biber finds no evidence of an absolute difference between spoken and written language
  - e.g. conversations often display characteristics similar to some written genres, such as letters
• It is better to identify different types of speech (broadcast, scripted, spontaneous) and view their similarities to and differences from different types of writing

35 Summary
• Biber’s MF/MD approach has proved highly influential in the study of register and genre
• Crucially, it relies on an a priori definition of:
  - features (“what to look for”)
  - registers (“situationally defined uses of language”)

36 References
• Paolillo, J. C. (2000). Formalising formality. Journal of Linguistics, 36: 215–259.
• Biber, D. (1993). Representativeness in corpus design. Literary and Linguistic Computing, 8(4): 243–258.
• Biber, D. (1995). On the role of computational, statistical and interpretive techniques in multi-dimensional analysis of register variation. Text, 15(3): 314–370.




