Published by Paulina Lynch. Modified over 9 years ago.
1
A probabilistic approach to language structure
Annarita Felici and Paul Pal
Royal Holloway, University of London
Helsinki, 2-4 June 2008
A.Felici@rhul.ac.uk, P.Pal@rhul.ac.uk
2
2-4 June 2008, QITL3
Outline
  Field of investigation
  Research goals
  Data
  Probabilistic analysis
  Information theory
  Entropy results
3
Field of investigation
Repetitive language structure in multilingual legal text
  EU normative statements in translation
Languages of investigation
  English, French, German and Italian
4
Field of investigation: legal norms
Deontic norms (from the Greek deon = duty)
  Obligations, prohibitions, permissions and authorizations
Constitutive performatives
  The uttering of a performative is, or is part of, the doing of a certain kind of action or speech act (Austin 1962)
  Uttering a sentence = doing things
5
Other norm types
Logical necessity
  Necessary requirements or competences
Non-binding norms
  Guidelines, correct procedure
6
Research goals
1. To evaluate the degree of prescriptive standardization in French, German and Italian with reference to English
2. To predict translation equivalents in French, German and Italian
7
English legal drafting is highly standardized
The EU and the main English drafting manuals suggest modal verbs for prescriptive norms (Coode 1843, Driedger 1976, Dickerson 1975, Thornton 1996)
Text types under investigation are repetitive and reusable
Text types under investigation can be more or less binding
under the conditions that:
8
Data
Multilingual parallel corpus
  Origin: EU
  Corpus size: 1,404,723 words
  Text type: normative
  Type of docs: Secondary Legislation (Regulations, Decisions, Directives, Recommendations)
  Years: 2001-04
  Languages: English, French, German, Italian
9
Probabilistic analysis: information theory
To measure the number of linguistic alternatives available when translating a repetitive normative statement from English into French, German and Italian
= Quantifying information by reducing uncertainty
  More alternatives = more uncertainty (high entropy)
  Fewer alternatives = more standardization, certainty (low entropy)
10
Probabilistic variables
Categories of expression
Linguistic forms
English modals
  Entry point for parallel retrieval
  shall, must, may, can, should
11
Categories of expression
  Constitutive norms and performatives
  Logical necessity
  Permissions and authorizations
  Capability
  Non-binding norms
12
Linguistic forms
  Indicative (pres.)
  Modal verbs (mv)
  Verbal periphrasis (vp)
  Lexicalized modal expressions (le)
  Ellipses (0-correspondence)
13
Linguistic forms
Linguistic equivalents used in constitutive and performative norms
14
Linguistic forms
Linguistic equivalents used to convey permissions and authorizations
15
Given the English system of modality, what is the relative probability of choosing an equivalent modal verb in the translation of may or must, and a different linguistic form as the equivalent of shall?
Is the probability of a choice in one system affected by a choice in another?
16
Information theory
The information value or content h(p) depends on the probability of occurrence (p) of an event (Shannon 1949)
  $h(p) = -\log(p) = \log(1/p)$
Entropy
  Degree of uncertainty (= shortage of information due to the large number of alternatives)
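The information content h(p) defined on this slide can be sketched in a few lines of Python. The slide does not state the base of the logarithm; base 2 (bits) is an assumption here, and the function name `information_content` is purely illustrative.

```python
import math


def information_content(p: float, base: float = 2.0) -> float:
    """Return h(p) = -log(p) = log(1/p) for an event of probability p.

    The base-2 default is an assumption; the slide leaves the base open.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in the interval (0, 1]")
    return math.log(1.0 / p, base)


# Rarer events carry more information; a certain event carries none.
for p in (1.0, 0.5, 0.25, 0.01):
    print(f"p = {p:<5} h(p) = {information_content(p):.3f}")
```

A certain event (p = 1) contributes no information, which is why a fully standardized translation choice adds nothing to the uncertainty of the system.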
17
Probabilistic analysis
The frequency of occurrence ($n_i$) of each linguistic form is associated with a category
A probability variable ($p_i$) is derived from the estimated proportion of a particular linguistic form
18
Probabilistic analysis
In English
  $p_1 = p_{mv \to \mathrm{shall}} = n_{\mathrm{shall}} / n$; $p_2 = p_{mv \to \mathrm{must}} = n_{\mathrm{must}} / n$; $p_3 = p_{mv \to \mathrm{should}} = n_{\mathrm{should}} / n$; $p_4 = p_{mv \to \mathrm{can}} = n_{\mathrm{can}} / n$; $p_5 = p_{mv \to \mathrm{may}} = n_{\mathrm{may}} / n$
In French, German and Italian
  $p_1 = p_{\mathrm{indicative}} + p_{mv} + p_{vp} + p_{me} + p_{\mathrm{ellipses}}$; $p_2 = p_{\mathrm{indicative}} + p_{mv} + p_{vp} + p_{me} + p_{\mathrm{ellipses}}$; and so on.
19
Linguistic forms and frequencies of occurrence in the EU Regulation for the selected categories of 1) constitutive norms and 2) permissions and authorizations
20
Probabilistic approach
The sum of these probabilities produces different information values
The expected information content of a system is the sum of the information contents weighted by the probabilities for each possible outcome
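Written as a formula, the probability-weighted sum described here is the Shannon entropy. The base-2 logarithm and the convention $0 \log 0 = 0$ are assumptions; the slides do not state either explicitly.

```latex
H \;=\; \sum_{i} p_i \, h(p_i)
  \;=\; -\sum_{i} p_i \log_2 p_i ,
\qquad \sum_{i} p_i = 1 ,
\qquad 0 \log_2 0 := 0 .
```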
21
Entropy: extrema
Variations in the language-specific p(i) values of linguistic forms produce distribution profiles reflecting the characteristics of the corresponding language.
Mathematically it can be shown that:
  If all the p(i) values are equal (equi-probable situation), the profile is a uniform distribution and results in maximum entropy.
  If only one probability p(i) is maximum and the remaining p(i) values are zero, the entropy is minimum (e.g. English).
  All other distributions lie between these two limits (e.g. French, German and Italian).
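The two limits can be checked numerically with a minimal sketch. Base-2 logarithms are assumed, the helper name `entropy` is illustrative, and the `skewed` distribution is a made-up example rather than data from the corpus.

```python
import math


def entropy(probs, base=2.0):
    """Shannon entropy of a discrete distribution; zero terms contribute 0."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * math.log(1.0 / p, base) for p in probs if p > 0)


uniform = [0.2] * 5                       # five equi-probable forms
degenerate = [1.0, 0.0, 0.0, 0.0, 0.0]    # one form always chosen
skewed = [0.6, 0.1, 0.1, 0.1, 0.1]        # hypothetical intermediate case

print(f"uniform:    {entropy(uniform):.3f}")
print(f"degenerate: {entropy(degenerate):.3f}")
print(f"skewed:     {entropy(skewed):.3f}")
```

With five forms the maximum is log2(5), roughly 2.32, and the minimum is 0; any other profile falls strictly between the two.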
22
A concrete example
Regulation document in English, French, German and Italian + a fictitious language.
One category of expression: e.g. the constitutive norms.
5 linguistic forms for this category.
Total number of modal verbs and alternatives: 2075.
23
Constitutive norm
Frequency of occurrences of expression modes in 4 real languages and one fictitious language

      English  French  German  Italian  Fictitious
mv       1382      58      94       81         276
ind         0    1223    1125     1192         276
vp          0       8      71        6         276
me          0      43      46       51         276
el          0      50      46       52         276
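As a sketch of how the frequencies on this slide turn into proportions, the snippet below reads the flattened digit runs as five columns per row. That split is an assumption, chosen because it makes every real-language column total 1382. The names `counts` and `proportions` are illustrative. Note that these column totals differ from the 2075 quoted on the preceding slide, so the normalisation behind the presentation's own entropy figures cannot be recovered from this table alone.

```python
# Counts per linguistic form, keyed by language (assumed split of the digit runs).
counts = {
    "English":    {"mv": 1382, "ind": 0,    "vp": 0,   "me": 0,   "el": 0},
    "French":     {"mv": 58,   "ind": 1223, "vp": 8,   "me": 43,  "el": 50},
    "German":     {"mv": 94,   "ind": 1125, "vp": 71,  "me": 46,  "el": 46},
    "Italian":    {"mv": 81,   "ind": 1192, "vp": 6,   "me": 51,  "el": 52},
    "Fictitious": {"mv": 276,  "ind": 276,  "vp": 276, "me": 276, "el": 276},
}


def proportions(column):
    """Turn raw counts n_i into estimated probabilities p_i = n_i / n."""
    total = sum(column.values())
    return {form: n / total for form, n in column.items()}


for language, column in counts.items():
    shares = proportions(column)
    dominant = max(shares, key=shares.get)
    print(f"{language:<11} n = {sum(column.values()):<5} "
          f"dominant form: {dominant} ({shares[dominant]:.3f})")
```

Under this reading the present indicative dominates in French, German and Italian, while English relies on the modal verb alone.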
24
Histogram of 5 modes of expression
25
Comparison based on entropy
Computed entropy of constitutive norms
  EN: $H = 0 + H_{mv} + 0 + 0 + 0 = 0.405$
  FR: $H = H_{ind} + H_{mv} + H_{vp} + H_{me} + H_{el} = 0.857$
  GE: $H = H_{ind} + H_{mv} + H_{vp} + H_{me} + H_{el} = 1.08$
  IT: $H = H_{ind} + H_{mv} + H_{vp} + H_{me} + H_{el} = 0.88$
  FI: $H = H_{ind} + H_{mv} + H_{vp} + H_{me} + H_{el} = 2.32$
26
Computed entropy of constitutive norms (English, French, German, Italian and Fictitious)
27
Entropy results
1. In the EU Regulation, according to the 5 categories of expression (1. Constitutive and performative norms, 2. Logical necessity, 3. Permissions and authorizations, 4. Capability, 5. Non-binding norms)
2. In the EU Secondary Legislation overall, according to the 4 types of documents (Regulations, Decisions, Directives, Recommendations)
28
Entropy in the EU Regulation
29
Entropy results: EU Regulation
Logical necessity, permissions and authorizations, and capability (lower entropy)
  Quite standardized in the 4 languages = almost equivalent translations
Constitutive performative norms (higher entropy)
  Translation is more difficult to predict
  Definitions, const. statements, obligations
  FR: lower entropy than IT
  DE: higher entropy (VP sein/haben…zu)
30
Entropy results: EU Regulation
Non-binding norms
  A fair amount of variation among the 4 languages
  FR/IT: higher entropy
  DE: lower entropy (should is most likely translated with sollen, Soll-Vorschriften)
31
Overall entropy across the 4 types of EU documents
32
Entropy results: EU Secondary Legislation
Regulations and Decisions (lower entropy)
  Direct applicability of the norms = more precision and standardization
  FR looks more standardized than IT and DE
Directives (higher entropy than Regulations and Decisions)
  Binding only as to the result to be achieved
Recommendations (higher entropy)
  Non-binding: more freedom
  DE: sollen
33
Conclusions
Given certain conditions, it is possible to predict with some certainty the occurrence of a particular factor
If applied to repetitive texts, entropy analysis can enhance research in language testing and evaluation, and in the development of automated translation tools
34
References
Austin, J. L. 1962. How to do things with words. Oxford: Oxford University Press.
Coode, G. 1843. Legislative Expressions. Appendix to the Report of the Poor Law Commissioners on Local Taxation. Published separately 1845, 2nd Ed. 1852.
Driedger, E. A. 1976. The Composition of Legislation. Legislative Forms and Precedents (2nd Ed.). Ottawa: The Department of Justice.
Shannon, C. and W. Weaver. 1963 (1949). The mathematical theory of communication. Urbana: University of Illinois Press.
Thornton, G. C. 1996. Legislative Drafting (4th Ed.). London: Butterworths.
http://publications.europa.eu/code/en/en-6000000.htm