Evidentiality and Epistemicity in a Corpus of Scientific Biomedical Papers from the British Medical Journal. A focus on “evidence” and “cause/s” *I. Riccioni, *R. Bongelli, *C. Canestrari, *C. Buldorini, **R. Pietrobon, *Andrzej Zuczkowski * University of Macerata (Italy) ** Duke University, Durham, North Carolina (USA) ECitS Conference, 5-7 September 2012 University of Kent, Canterbury, UK
INTRODUCTION As researchers, we analyse linguistic communication, mostly through qualitative and quantitative analysis of the syntactic, semantic, and pragmatic levels. Our theoretical and methodological background integrates aspects from Conversation Analysis (interruptions, overlaps, negotiation, politeness, etc.); Discourse Analysis (speech acts, for example giving advice in trouble-talk contexts, etc.); and Text Theory (in particular, J. S. Petőfi’s structural model of communication).
We have been working for several years on different types of oral corpora, recorded and transcribed (naturally occurring conversations, political discourses, humorous interactions, doctor-patient dialogues, psychotherapeutic sessions, etc.). We have also been working on the communication of certainty and uncertainty in different types of written texts (academic, biomedical, literary and so on). In 2009 we became involved in the project A Corpus of Scientific Biomedical Texts Spanning over 168 Years Annotated for Uncertainty, together with an American colleague from Duke University, North Carolina, Professor Ricardo Pietrobon, a surgeon interested in “research on research” and “scientific writing”. https://sites.google.com/site/biouncertainty/ http://goo.gl/zTBPI
The communication of Uncertainty in a corpus of scientific biomedical texts spanning over 168 years Aims: ◦ identify lexical and morphosyntactic markers of uncertainty and their linguistic scope in a corpus of 80 papers randomly selected from the BMJ from 1840 to 2007; ◦ detect their trends over time.
Scope: “…the general term that we shall use to describe the semantic ‘influence’ which such words have on neighbouring parts of a sentence. It deserves attention because of its close connection with the ordering of elements.”
LITERATURE BACKGROUND The topic of certainty/uncertainty in communication is related, more or less directly, to what the linguistic literature calls epistemicity and evidentiality (and to related topics/concepts such as subjectivity, modality and hedging or mitigation). This area of study has attracted a great deal of interest over the past three decades or so, inevitably resulting in a multitude of terms and conflicting definitions (see Dendale and Tasmowski 2001).
EPISTEMICITY Epistemicity refers to those linguistic markers that, according to different authors, reveal the speaker’s/writer’s: ◦ attitude regarding the reliability of the information (e.g. Dendale and Tasmowski 2001, González 2005); ◦ judgment of the likelihood of the proposition (e.g. Nuyts 2001b, Plungian 2001, Cappelli 2007, Cornillie 2007); ◦ commitment to the truth of the message (e.g. Sanders and Spooren 1996, De Haan 1999, González 2005).
A piece of information is communicated as certain when the speaker’s/writer’s commitment to its truth is at the maximum or a high level, as in examples (1) and (2):
(1) “These workers showed that there is an inverse correlation between the height of the hyperbilirubinaemia and the amount of bile excreted in the faeces” (Aetiology of physiological jaundice of the newborn, 1951)
(2) “All the ill effects of ruptured perineum and prolapsus uteri are relieved with certainty by a simple plastic operation” (Vesico-vaginal and rectovaginal fistula, 1861)
A piece of information is communicated as uncertain when the speaker’s/writer’s commitment to its truth is at the minimum or a low level, as in examples (3) and (4):
(3) “the evidence suggests that it is not likely to have been wrong in more than a small proportion” (Lung Cancer, 1956)
(4) “Perhaps, however, the strongest proof of the importance of local rest is furnished by those cases in which a pleural effusion has occurred on the affected side.” (On the importance of rest in the treatment of acute phthisis, )
EVIDENTIALITY With the term evidentiality, scholars generally refer to the coding of ◦ sources of information and ◦ modes of knowing (Chafe 1986, Nuyts 2001a, 2001b, Plungian 2001, Cornillie 2007, Papafragou et al. 2007) i.e. the linguistic markers that reveal how speakers/writers gain access to the piece of information they communicate (Willett, 1988).
If a doctor says (5) “I see a cyst”, he explicitly communicates the information source; though in the sentence there is no epistemic marker, the verb I see is enough to implicitly communicate Certainty.
EVIDENTIALITY & EPISTEMICITY Evidentiality and epistemicity seem to be two sides of the same coin, in that: ◦ When a piece of information is communicated as (if it were) certain (epistemicity) by writers, at the same time it is also communicated as (if it were) known (evidentiality) to them (and vice versa). ◦ When a piece of information is communicated as (if it were) uncertain, at the same time it is also communicated as (if it were) believed by them (and vice versa).
KUB THEORY The multitude of evidential and epistemic markers (lexical and morpho-syntactic) can be traced back and reduced to three main macro-markers:
◦ I know
◦ I do not know
◦ I do not know whether (believe)
These reflect the three basic evidential and epistemic territories of information (adapting Kamio’s terminology (1991, 1994)): the Known, the Unknown, and the Believed (KUB).
The Known is all that writers say they know (perceive, remember etc.) in a broad sense. From an epistemic viewpoint such markers communicate Certainty. The Believed is all that writers say they do not know if/whether (impressions, opinions, suppositions etc.). From an epistemic viewpoint such markers communicate Uncertainty. The Unknown is when writers communicate what they do not know, i.e. when the information is unknown to them.
Known
◦ Lexical markers: certain verbs (I remember…), adverbs (surely…), verbal expressions (I have no doubt…)
◦ Morphosyntactic markers: declarative sentences in the present, past and future indicative with no lexical evidential or epistemic marker
Unknown
◦ Lexical markers: negative form of the verbs of the Known (I don’t remember…), adjectives (unknown…)
◦ Morphosyntactic markers: literal questions
Believed
◦ Lexical markers: uncertain verbs (I suppose…), verbal expressions (It is possible…), adverbs (perhaps…), adjectives (likely…), modal verbs
◦ Morphosyntactic markers: modal verbs in conditional and subjunctive moods, if clauses, epistemic future
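The markers listed above can be read as a lookup from surface forms to the three KUB territories. The following is a minimal sketch in Python, assuming a tiny marker list copied from the examples on this slide. The name `kub_territory`, the order in which territories are checked, and the regular-expression matching are illustrative assumptions, not the annotation procedure used in the project.

```python
import re

# Lexical markers copied from the examples listed above. The lists are
# deliberately tiny and illustrative; they are NOT the project's marker set.
LEXICAL_MARKERS = {
    "Unknown": ["i don't remember", "unknown"],
    "Believed": ["i suppose", "it is possible", "perhaps", "likely"],
    "Known": ["i remember", "surely", "i have no doubt"],
}


def kub_territory(sentence: str) -> str:
    """Return 'Known', 'Unknown' or 'Believed' for a sentence (hypothetical helper)."""
    # Lower-case and normalise curly apostrophes so "don’t" matches "don't".
    text = sentence.lower().replace("\u2019", "'")
    # Territories are checked in dictionary order: Unknown, Believed, Known.
    for territory, markers in LEXICAL_MARKERS.items():
        for marker in markers:
            if re.search(r"\b" + re.escape(marker) + r"\b", text):
                return territory
    # Morphosyntactic fallback 1: a literal question counts as Unknown.
    if text.rstrip().endswith("?"):
        return "Unknown"
    # Morphosyntactic fallback 2: a declarative sentence with no marker is Known.
    return "Known"
```

On the examples quoted in this presentation, the sketch assigns “I see a cyst” to the Known (a declarative with no marker), example (3) to the Believed (through “likely”), and example (10) to the Unknown (through “unknown”). Modal verbs, moods, if clauses and the epistemic future are not handled.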
THE PRESENT STUDY For this conference we carried out a pilot study on how evidence, causality and their relationships are communicated in BMJ papers (i.e. whether they are communicated as certain or uncertain, in declarative or hypothetical structures, etc.). In particular, we focused on: ◦ the term “evidence”; ◦ the terms “cause/causes”; ◦ their relations. The method combined qualitative and quantitative analysis, the latter performed with WordSmith Tools version 5 (Scott 2008).
EVIDENCE Out of the 80 papers we extracted and analysed all 102 fragments where the term “evidence” occurred in a sentence. Our analysis criteria included: type of sentence (affirmative, negative, interrogative); whether the sentence is communicated as Certain-Known, Uncertain-Believed, or Unknown; type of evidence.
Affirmative - Certain - Direct observation
(6) “Auscultation of the chest revealed evidence of increased activity in the right upper lobe.” (The treatment of pulmonary tuberculosis by nitrogen compression, 1914)
Negative - Uncertain - Medical practice
(7) “This is simply a conjecture, however, which though possible does not seem probable, and has as yet, so far as my experience goes, no evidence to support it.” (The treatment of ringworm of the scalp by the x rays, 1905)
CAUSE Out of the 80 papers we extracted and analysed all 103 fragments where the term “cause/s” occurred in a sentence. The analysis criteria included: type of sentence (affirmative, negative, hypothetical, interrogative); whether the causal relation is communicated as Certain-Known, Uncertain-Believed, or Unknown.
Affirmative - Certain
(8) “Koch has thus added to our conviction that the bacillus is the cause of the symptoms, seeing that, as he remarks, it is impossible to suppose that an organism can develop in such enormous numbers at the expense of the vital fluid, without exerting a serious influence upon the system.” (Remarks on micro-organisms, 1880)
Affirmative - Uncertain
(9) “When confronted with a case of this kind, we must avoid the administration of any drug likely to cause either undue contraction or relaxation of the organ. Absolute rest is the best treatment.” (The determinants of abortion and how to combat them, 1907)
Affirmative - Unknown
(10) “…the cause of this symptom is unknown, but sleep is an important factor” (Do asthmatics suffer bronchoconstriction during rapid eye movement sleep?, 1986)
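The extraction step described for “evidence” and “cause/s” (collecting every sentence in which the term occurs) can be approximated in a few lines. This is only a sketch in Python under stated assumptions: the study itself used WordSmith Tools, and the sentence splitter, the function name `fragments_with_term` and the two patterns below are hypothetical stand-ins rather than the authors' settings.

```python
import re
from typing import List

# Naive sentence splitter: break on whitespace that follows ., ! or ?
# (an assumption; abbreviations such as "e.g." would be split wrongly).
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")

# Whole-word patterns, so "because", "caused" or "causative" are not matched.
EVIDENCE = r"\bevidence\b"
CAUSE = r"\bcauses?\b"


def fragments_with_term(text: str, pattern: str) -> List[str]:
    """Return every sentence of `text` in which `pattern` occurs (case-insensitive)."""
    term = re.compile(pattern, re.IGNORECASE)
    return [s.strip() for s in SENTENCE_END.split(text) if term.search(s)]


# Sentences taken from examples (6), (9) and (10) of this presentation.
sample = (
    "Auscultation of the chest revealed evidence of increased activity "
    "in the right upper lobe. Absolute rest is the best treatment. "
    "The cause of this symptom is unknown, but sleep is an important factor."
)
evidence_hits = fragments_with_term(sample, EVIDENCE)
cause_hits = fragments_with_term(sample, CAUSE)
```

Here `evidence_hits` holds only the first sentence and `cause_hits` only the last. Classifying each fragment by sentence type and KUB territory remains a manual, qualitative step.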
EVIDENCE AND CAUSALITY Out of the 80 papers we extracted 42 fragments where a relation between evidence and causality is explicit. We found 7 different types of relations:
Type 1. evidence is insufficient to establish a causal link: 14 (33.3%);
(11) “…Recently there has been much experimental data to show the causative relation of adrenalin to these degenerative changes, but it has not been definitely settled whether this is a direct effect or due to increased tension.” (An address of the treatment of chronic degenerative lesions of the heart and aorta, 1909)
Type 2. evidence establishes a causal link: 9 (21.4%);
(12) “…This was pretty conclusive evidence that the organism was the cause of the disease, and that it constituted the true infective element; because any other material that might be supposed to accompany it in the blood of the diseased animal must have been got rid of by the successive cultivations in chicken-broth.” (Remarks on micro-organisms, 1880)
Type 3. there is no evidence of a causal link: 9 (21.4%);
Type 4. evidence denies a causal link: 3 (7.1%);
Type 5. evidence suggests the existence of a causal link: 3 (7.1%);
Type 6. evidence shows a weak causal link: 3 (7.1%);
Type 7. evidence suggests the non-existence of a causal link: 1 (2.4%).
MAIN RESULTS Analysis of the terms “evidence” and “cause/s” shows that in the BMJ corpus they are mainly communicated as Certain-Known and in affirmative sentences. Out of the 11 different types of evidence we identified, the most common are: ◦ direct observation: 27 (23.5%); ◦ lab analysis, clinical exams, histological analysis: 25 (21.7%); ◦ statistical analysis: 14 (12.2%). Out of the 7 different relations between evidence and causality we identified, the most common are: ◦ evidence is insufficient to establish a causal link: 14 (33.3%); ◦ evidence establishes a causal link: 9 (21.4%); ◦ there is no evidence of a causal link: 9 (21.4%).
CONCLUSION & FUTURE STEPS By the end of the project we started in 2009, we expect to have significantly improved our knowledge of the historical evolution of the communication of certainty, uncertainty, evidence, causality and their relationships in the writing of scientific papers over a 168-year span. We now plan on: ◦ performing a qualitative analysis of the other terms related to evidence and causality (so far, we have only identified their numerical occurrences using WordSmith Tools); ◦ verifying the significance of a trend observed in the distribution of the terms related to evidence and causality over the period we consider.
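The percentages reported for the seven relation types follow directly from the counts over the 42 fragments. A short Python check (the dictionary labels are shortened paraphrases of the type names and are not part of the original coding scheme):

```python
# Counts per relation type as reported on this slide (Types 1 to 7).
counts = {
    "insufficient evidence for a causal link": 14,
    "evidence establishes a causal link": 9,
    "no evidence of a causal link": 9,
    "evidence denies a causal link": 3,
    "evidence suggests a causal link": 3,
    "evidence shows a weak causal link": 3,
    "evidence suggests no causal link": 1,
}

total = sum(counts.values())  # should equal the 42 extracted fragments

# Share of each type, as a percentage rounded to one decimal place.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

for label, share in shares.items():
    print(f"{label}: {counts[label]} ({share}%)")
```

The counts sum to 42, and the rounded shares reproduce the reported figures (Type 1 is 33.3% at one decimal place).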
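The presentation does not say which test will be used to verify the significance of the trend. One possible approach, sketched here purely as an assumption, is a Spearman rank correlation between publication year and term frequency, with a permutation test for the p-value. All function names are hypothetical and no corpus data is included.

```python
import random
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def spearman_rho(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))


def permutation_p_value(xs, ys, n_perm: int = 2000, seed: int = 0) -> float:
    """Two-sided p-value: share of shuffles with |rho| at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(spearman_rho(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(spearman_rho(xs, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

With one publication year and one term count per paper, `spearman_rho(years, counts)` would give the direction and strength of the trend, and `permutation_p_value(years, counts)` an estimate of how likely such a trend is under random ordering.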
Results: Preliminary results on the corpus data show no significant difference in the use of the different uncertainty markers over the years. The results of the NLP experiments show that most of the uncertainty markers can be recognized with good accuracy (Bongelli et al. 2012a; Bongelli et al. 2012b). At the moment, we are working on these results and on the identification of the scope of the uncertainty markers. In their grammar, Quirk et al. (1985) define scope as “…the general term that we shall use to describe the semantic ‘influence’ which such words have on neighbouring parts of a sentence. It deserves attention because of its close connection with the ordering of elements.”