On translation units and automatic processing Patricia Fernández Carrelo University of Deusto CliP 2006, London, 29 June–1 July.

1 On translation units and automatic processing Patricia Fernández Carrelo University of Deusto CliP 2006, London, 29 June–1 July

2 Natural Language Processing: main lexical problems
- Disambiguation
- Multiword expressions
- All levels of language
- Point of view: monolingual, multilingual, interlingual

3 Interlingual task: translation (I)
- Problem: text segmentation
- Machine translation: need for objective criteria for segmentation

4 Interlingual task: translation (II)
- Multiword segments
- Multiword expressions
  - Of the same order of magnitude as the number of single words (Jackendoff 1997)
  - 41% of the entries in WordNet 1.7 (Fellbaum 1999)

5 Linguistic levels
- Lexicology (and terminology): degree of lexicalization
- Morphology and syntax: components (order, cooccurrence, inflection...)
- Semantics: decomposability, other relationships
- Pragmatics: context, equivalent words
- Text analysis

6 Points of view for analysing
- Traditional linguistics: since 1957...
- Computational linguistics: "a pain in the neck" (Sag et al. 2002)
- Translation and machine translation: need for better approaches

7 Names and definitions for MWE (I)
- "Idiosyncratic interpretations that cross word boundaries (or spaces)" (Sag et al. 2002)
- "A sequence of words that acts as a single unit at some level of linguistic analysis" (Calzolari et al. 2002)
- "Any phrase that is not entirely predictable on the basis of standard grammar rules and lexical entries" (LinGO Lab, Stanford University)

8 Names and definitions for MWE (II)
English:
- Multiword expressions (MWE) or units (MWU) (Cowie, 1985)
- Multi-word lexemes (MWL) (Gates, 1988)
- Multiword lexical unit (Zgusta, 1967)
- Complex lexemes and lexical units (Lipka, 1983)
Basque:
- lexia konplexuak (Abaitua, 2002)
- hitz anitzeko unitate lexikalak (HAUL) (IXA Group)
Spanish:
- expresiones o unidades multipalabra, multiverbales (Alvar Ezquerra, 2000), poliléxicas (Benson, 1985)
- expresiones pluriverbales (Casares, 1992 [1950])
- unidades pluriverbales lexicalizadas y habitualizadas (Haensch et al., 1982)
- unidad léxica pluriverbal (Hernández, 1989)
- unidades fraseológicas (UFS) o fraseologismo (Zuluaga, 1980)
- lexías complejas (Abaitua, 1997)

9 Classification criteria and linguistic description
- Cooccurrence and/or need of some components
- Syntactic and semantic transparency
- Formal and semantic compositionality
- Frozen or fixed status
- Selectional restrictions
- Violation of some general syntactic patterns or rules
- Degree of lexicalization
- Degree of conventionality
- Idiomaticity

10 Taxonomy (I) (Sag et al., 2002)
- Lexicalized phrases
  - Fixed expressions
  - Semi-fixed expressions
    - Non-decomposable idioms
    - Compound nominals
    - Proper names
    - Multiword terminology
  - Syntactically flexible expressions
    - Verb-particle constructions
    - Decomposable idioms
    - Light verbs
- Institutionalized phrases (collocations)

11 Taxonomy (II)
Fixed expressions (Spanish – English – Basque):
- Adverbial phrases:
  - Al pie de la letra – to the letter – hitzez hitz
  - De improviso – suddenly – ziplo
- Prepositional phrases:
  - A causa de – because of – (r)en ondorioz*
  - En torno a – around – inguruan
- Multiword conjunctions:
  - Mientras tanto – meanwhile – bitartean
  - Con tal de que – so long as – ba...*
- Latin expressions: ad hoc, sine dubio, sine die...
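Because fixed expressions admit no internal variation, they can in principle be treated as single entries in a lexicon and matched as whole strings. The following is a minimal sketch of that idea, assuming whitespace tokenization and a tiny hypothetical lexicon (`FIXED_EXPRESSIONS`) built only from the Spanish and English examples on this slide; the function name `segment` and the example sentence are illustrative, not part of the presentation.

```python
# Hypothetical mini-lexicon built from the fixed expressions on this slide
# (Spanish expression as a token tuple -> English equivalent).
FIXED_EXPRESSIONS = {
    ("al", "pie", "de", "la", "letra"): "to the letter",
    ("de", "improviso"): "suddenly",
    ("a", "causa", "de"): "because of",
    ("en", "torno", "a"): "around",
    ("mientras", "tanto"): "meanwhile",
    ("con", "tal", "de", "que"): "so long as",
}
MAX_LEN = max(len(key) for key in FIXED_EXPRESSIONS)


def segment(tokens):
    """Greedy longest-match segmentation: fixed expressions become single units."""
    units = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, down to two-word candidates.
        for n in range(min(MAX_LEN, len(tokens) - i), 1, -1):
            candidate = tuple(t.lower() for t in tokens[i:i + n])
            if candidate in FIXED_EXPRESSIONS:
                units.append(" ".join(tokens[i:i + n]))
                i += n
                break
        else:
            # No multiword entry starts here: keep the single word as a unit.
            units.append(tokens[i])
            i += 1
    return units


sentence = "Siguió las instrucciones al pie de la letra".split()
print(segment(sentence))
```

For the example sentence, the five tokens of "al pie de la letra" come back as one unit, while the other words stay separate. Exact string matching of this kind only suits fully fixed expressions; it would miss anything that inflects or allows intervening words.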

12 Taxonomy (III)
Semi-fixed expressions:
- Non-decomposable idioms: kick the bucket / estirar la pata
- Compound nominals: viaje de novios – honeymoon – eztei-bidaia
- Proper names: the (Oakland) Raiders (these pose problems of their own)
- Multiword terminology: mayoría absoluta – absolute majority – erabateko gehiengo

13 Taxonomy (IV)
Syntactically flexible expressions:
- Verb-particle constructions
  - Non-compositional: write up, look up / acordarse de, constar de / posposizioak (Basque postpositions)
  - Compositional: break up
- Decomposable idioms: spill the beans – revelar un secreto
- Light verbs (English / Spanish / Basque): make, do, have, give / hacer, tener, ser, dar / egin, izan, eman
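Syntactic flexibility is what makes these expressions harder to process than fixed ones: in English, for instance, the particle can be separated from its verb ("look the word up"). The sketch below illustrates one simple way to allow for that, by searching for the particle within a small window after the verb. The verb-particle pairs come from the slide; the `LEMMAS` table, the `MAX_GAP` limit, and the function name are assumptions made for illustration.

```python
# Verb-particle pairs from the slide (verb lemma -> particle).
VERB_PARTICLE = {"look": "up", "write": "up", "break": "up"}

# Hypothetical table of inflected forms; a real system would use a lemmatizer.
LEMMAS = {
    "looked": "look", "looks": "look",
    "wrote": "write", "writes": "write",
    "broke": "break", "breaks": "break",
}

# Assumed maximum number of words allowed between verb and particle.
MAX_GAP = 3


def find_verb_particle(tokens):
    """Return (lemma, particle, verb_index, particle_index) for each match."""
    matches = []
    for i, token in enumerate(tokens):
        lemma = LEMMAS.get(token.lower(), token.lower())
        particle = VERB_PARTICLE.get(lemma)
        if particle is None:
            continue
        # Look for the particle at most MAX_GAP words after the verb.
        window_end = min(i + 2 + MAX_GAP, len(tokens))
        for j in range(i + 1, window_end):
            if tokens[j].lower() == particle:
                matches.append((lemma, particle, i, j))
                break
    return matches


print(find_verb_particle("She looked up the word".split()))
print(find_verb_particle("She looked the word up".split()))
```

Both word orders yield a match for "look ... up", which plain string matching on "look up" would not. The window approach over-generates (it cannot tell idiomatic "look up a word" from literal "look up at the sky"), so it is only a starting point.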

14 Taxonomy (V)
Institutionalized phrases (collocations):
- Pay attention – poner/prestar atención – arreta eman
- Heavy smoker – fumador empedernido – erretzaile amorratua
- Red wine – vino tinto – ardo beltza
(Examples from Testuteka, http://paginaspersonales.deusto.es/abaitua/deli/testuteka/index.html)

15 Multiword expression as translation unit
- Translation units: difficulty in definition and classification
- Vázquez-Ayora (1977): simple, diluted (multiple-to-one equivalents, Nida), fractionary
- "In fact there are good reasons for keeping the TU (in the sense of translation atom) in MT as small - and hence as manageable - as possible" (Bennett, 1994)

16 Methods for processing
Symbolic:
- Words-with-spaces
- Hierarchical lexicon with default constraint inheritance
- Circumscribed constructions
- Lexical selection
- Information about frequency
- Example: Villavicencio et al. 2004
Statistical:
- F. Smadja: Xtract
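To give a concrete feel for the statistical family of methods, the sketch below scores adjacent word pairs by pointwise mutual information (PMI). This is a deliberately simpler, swapped-in association measure and does not reproduce Xtract, which the slide cites. The toy corpus, the `min_count` threshold, and the function name `bigram_pmi` are all assumptions for illustration.

```python
import math
from collections import Counter


def bigram_pmi(sentences, min_count=2):
    """Score adjacent word pairs by PMI; pairs seen fewer than min_count times are skipped."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    total_unigrams = sum(unigrams.values())
    total_bigrams = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / total_bigrams
        p_w1 = unigrams[w1] / total_unigrams
        p_w2 = unigrams[w2] / total_unigrams
        # PMI compares the observed pair probability with what independence predicts.
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Toy corpus (invented) reusing collocations from the taxonomy slides.
corpus = [
    "he is a heavy smoker",
    "a heavy smoker drinks red wine",
    "she likes red wine",
]

scores = bigram_pmi(corpus)
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 2))
```

On a corpus this small the scores mean very little; the point is only the shape of the computation: count words and pairs, filter out rare pairs, and rank the rest by how much more often they co-occur than chance would predict.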

17 Conclusions
- MWEs as translation units
- Approach from translation and, especially, from machine translation
- Linguistic definition and precision for better processing

18 That's all, folks! ¡Eso es todo, amigos! Agur Ben-Hur!
