1 Advanced Encoding for Multilingual Access in a Terminological Data Base: A Matter of Balance
Marie-Claude L'Homme, Patrick Leroyer, Benoit Robichaud
Observatoire de linguistique Sens-Texte (OLST), Département de linguistique et de traduction, Université de Montréal
Centlex, Aarhus School of Business, Aarhus University

2 Outline
Objectives
• Access to translations of specialized collocations encoded in a terminological database
The terminological database: the DiCoInfo
• Current contents and structure
• Current functionalities and limitations for translation needs
A model for accessing specialized collocations
• The linguistic apparatus
• The technical apparatus
Challenges and future work
L’Homme – Leroyer – Robichaud / TKE 2010 Dublin

3 Objectives
Implementing new translation functionalities in an existing terminological database:
• Direct access to data for L1-L2 translation: what is the translation of a specific collocation?
• In particular, giving users access to translations of collocations, e.g. send a file as an attachment -> envoyer, transmettre un fichier en pièce jointe
• Defining a method that enriches the database without having to translate collocations one by one
• Access functionalities should not presuppose technical linguistic knowledge on the part of the user

4 The DiCoInfo: a dynamic, polyfunctional tool (1): an overview
• An XML database containing terms related to the fields of computing and the Internet
• Approx. 1,000 entries in French and 400 in English
• Based mainly on the lexical framework of Explanatory Combinatorial Lexicology (ECL, Mel’čuk et al.; 1995)
• Descriptions based on corpora (2 million words in French; 1 million words in English)

5 The DiCoInfo: a dynamic, polyfunctional tool (2): the term record

6 The DiCoInfo: a dynamic, polyfunctional tool (3): the user interface
[Screenshot labels: Search Language, Search Mode, Search Precision]

7 The DiCoInfo: a dynamic, polyfunctional tool (4): narrowing searches
• Words starting with the string program (14)
• Terms starting with the word program (6)

8 The DiCoInfo as a translation aid: current functionalities
For L1-L2 translation:
• L1 reception phase:
◦ Comprehensive coverage of the domain
◦ Headwords and definitions
◦ Access to lists of semantically related items
• L2 production phase:
◦ Presentation of grammatical data
◦ Actantial structures and linguistic forms of actants
◦ Contexts for pragmatic and stylistic information (professional discourse)
• L1 > L2 translation phase:
◦ Equivalents to the headwords
◦ Equivalents to collocations

9 One DiCoInfo database = one dictionary
[Diagram labels: DiCoInfo database, Lexicographic team, Dictionary, Interface]

10 One DiCoInfo database = several dictionaries
[Diagram labels: DiCoInfo database, Lexicographic team, Search engine, L1 & L2 Production Dictionary, L1<>L2 Translation Dictionary, LSP-learning Dictionary, Other dictionary applications]

11 The DiCoInfo as a translation tool: potential improvements
Extensive lists of collocations
◦ Comprehensiveness leads to lists of collocations that are not discriminated according to specific situations or user needs
◦ E.g. file has a long list of collocates (create, delete, compress, generate, use, edit a file, etc.); its French equivalent fichier has more than 100 collocates
Limited multilingual assistance
◦ Equivalence is established at the level of headwords, but not at the level of lexical relationships (including collocations)
◦ E.g. there is a formal link between attachment and pièce jointe, but not between send something as an attachment and envoyer qqch. en pièce jointe
For the L2 production phase, the translator needs direct access to translations of collocations

12 Accessing translations of collocations: the model
Two components:
• A linguistic apparatus based on Explanatory Combinatorial Lexicology: lexical functions (LFs)
◦ Encodings and formalization
◦ Explanation-based grouping
• A technical apparatus based on advanced search functions
◦ Searching for lexical relations and expressions
◦ Displaying equivalences

13 The linguistic apparatus (1)
A lexical function (LF) encodes:
1. The syntactic relationship between the base and the collocate (space bar):
• Verb + 1st complement: press the ~
• Verb + 1st complement: release the ~
• Verb + 2nd complement: insert ... (a space) with the ~
2. The argument structure of the base (space bar: ~ used by someone (arg1) to act on something (arg2)):
• 1st argument: press the ~
• 1st and 2nd arguments: insert something with the ~
3. The general and abstract meaning of the collocate:
• Typical uses: press, release a space bar; insert something with a space bar
• Creation: create, define a password; write, develop a program

14 The linguistic apparatus (2)
A lexical function (LF) is written f(x) = y, where:
• f = function
• x = keyword
• y = value
Real1(space bar) = press the ~
FinReal1(space bar) = release the ~
Labreal12(space bar) = insert … with the ~
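The f(x) = y notation can be read as a lookup from a function name and a keyword to a set of collocate patterns. The following is a minimal sketch of that reading, not the DiCoInfo's actual encoding: the table layout and the names `LF_TABLE`, `apply_lf` and `realize` are assumptions made for illustration, and only the three space bar examples from the slide are used as data.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory table: (LF name f, keyword x) -> values y.
# The three entries are the examples given on the slide.
LF_TABLE: Dict[Tuple[str, str], List[str]] = {
    ("Real1", "space bar"): ["press the ~"],
    ("FinReal1", "space bar"): ["release the ~"],
    ("Labreal12", "space bar"): ["insert ... with the ~"],
}


def apply_lf(f: str, x: str) -> List[str]:
    """Return y = f(x): the collocate patterns stored for keyword x under LF f."""
    return LF_TABLE.get((f, x), [])


def realize(pattern: str, keyword: str) -> str:
    """Replace the ~ placeholder with the keyword to obtain the full collocation."""
    return pattern.replace("~", keyword)


if __name__ == "__main__":
    for pattern in apply_lf("Real1", "space bar"):
        print(realize(pattern, "space bar"))
```

Keeping the ~ placeholder in the stored value mirrors the slide notation and lets the same pattern be reused wherever the keyword has to be spelled out.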

15 The linguistic apparatus (3)

16 The technical apparatus: Equivalents of collocates (1)
Searching the database:
• Find the term records that describe the searched term in a lexical relation
• Find the term records of the equivalents

17 The technical apparatus: Equivalents of collocates (2)
Linking the equivalents in the interface:

18 The technical apparatus: Equivalents of collocations (1)
Searching the database:
• Find the term records in which (i) a first word appears as the headword and (ii) a second word appears as a collocate
• Find the equivalent term records
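The two search steps can be sketched as follows. This is an illustration under stated assumptions, not the DiCoInfo implementation: the `TermRecord` structure, the function `translate_collocation`, and the labels "LF-x" and "LF-y" are hypothetical (the slides do not give the LFs for these collocations), and the sketch assumes that collocates in the two languages are paired when they are encoded with the same lexical function. The data come from the send a file / envoyer, transmettre un fichier example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TermRecord:
    headword: str
    language: str
    equivalents: List[str]  # headwords of equivalent records in the other language
    collocates: Dict[str, List[str]] = field(default_factory=dict)  # LF label -> collocates


# Illustrative records; "LF-x" and "LF-y" are placeholder labels, not real LF names.
RECORDS = [
    TermRecord("file", "en", ["fichier"], {"LF-x": ["send"], "LF-y": ["create"]}),
    TermRecord("fichier", "fr", ["file"], {"LF-x": ["envoyer", "transmettre"]}),
]


def translate_collocation(
    headword: str, collocate: str, records: List[TermRecord]
) -> List[Tuple[str, str]]:
    """Return (collocate, headword) pairs in the other language for a collocation."""
    by_headword = {r.headword: r for r in records}
    results: List[Tuple[str, str]] = []
    for rec in records:
        # Step (i): the first word must be the headword of the record.
        if rec.headword != headword:
            continue
        # Step (ii): the second word must be listed as a collocate; keep its LF labels.
        lfs = [lf for lf, cols in rec.collocates.items() if collocate in cols]
        # Then follow the headword-level links to the equivalent term records.
        for eq in rec.equivalents:
            target = by_headword.get(eq)
            if target is None:
                continue
            # Assumption: collocates filed under the same LF are translations.
            for lf in lfs:
                for target_collocate in target.collocates.get(lf, []):
                    results.append((target_collocate, target.headword))
    return results


if __name__ == "__main__":
    print(translate_collocation("file", "send", RECORDS))
```

Under this assumption no collocation is translated by hand: a collocate added to one record becomes retrievable from the other language as soon as it carries an LF that is also encoded in the equivalent record, and a query returns nothing when the equivalent record has no collocate under that LF.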

19 The technical apparatus: Equivalents of collocations (2)
Linking the equivalents in the interface:

20 The technical apparatus: Side effects

21 Challenges (1)
Different syntactic structures
• Different LFs according to the syntactic function of the keyword:
◦ Real12: Search the Internet for information
◦ Labreal12: Chercher de l’information dans Internet
Split actants
• partition: ~ created by user1 to act on data1 or software1
◦ Labreal12 1: Save data on a partition
◦ Labreal12 2: Install a program on a partition
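The first challenge can be made concrete with a small sketch. If collocates are paired only when both languages use the same LF, search the Internet (Real12) and chercher ... dans Internet (Labreal12) never meet. The correspondence table below is one possible workaround shown purely for illustration; it is an assumption, not a solution described in the slides, and `match_collocates` and `LF_CORRESPONDENCES` are hypothetical names.

```python
from typing import AbstractSet, Dict, FrozenSet, List

# Collocates of the keyword Internet, taken from the slide's two examples.
EN_INTERNET: Dict[str, List[str]] = {"Real12": ["search"]}
FR_INTERNET: Dict[str, List[str]] = {"Labreal12": ["chercher"]}

# Hypothetical table of LF pairs treated as equivalent across languages.
LF_CORRESPONDENCES: AbstractSet[FrozenSet[str]] = {frozenset({"Real12", "Labreal12"})}


def match_collocates(
    source: Dict[str, List[str]],
    target: Dict[str, List[str]],
    collocate: str,
    correspondences: AbstractSet[FrozenSet[str]] = frozenset(),
) -> List[str]:
    """Return target collocates whose LF is identical or declared as corresponding."""
    matches: List[str] = []
    for src_lf, src_cols in source.items():
        if collocate not in src_cols:
            continue
        for tgt_lf, tgt_cols in target.items():
            same = src_lf == tgt_lf
            declared = frozenset({src_lf, tgt_lf}) in correspondences
            if same or declared:
                matches.extend(tgt_cols)
    return matches


if __name__ == "__main__":
    # Strict matching on identical LFs finds nothing for this pair.
    print(match_collocates(EN_INTERNET, FR_INTERNET, "search"))
    # With the hypothetical correspondence, the French collocate is retrieved.
    print(match_collocates(EN_INTERNET, FR_INTERNET, "search", LF_CORRESPONDENCES))
```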

22 Challenges (2)

23 Concluding remarks
We proposed a model to retrieve translations of collocations that:
• Meets user needs
• Is transparent (does not presuppose special linguistic or technical knowledge)
• Does not require all collocations to be translated individually

24 Future work
Extension of the coverage of English terminology
• Adding English collocations to the database
Extension to other languages, in particular Spanish, which is currently under development
Extension to other subject fields
• Ongoing project in the field of climate change
Extension of search capabilities
• To allow users to discover collocates based on an onomasiological search

25 Go raibh maith agat (Thank you)
References
L’Homme, M.-C. (2008). Le DiCoInfo. Méthodologie pour une nouvelle génération de dictionnaires spécialisés. Traduire 217, pp.
L’Homme, M.-C. et al. (2009). Le manuel du DiCoInfo.
L’Homme, M.-C. and P. Leroyer (2009). Combining the semantics of collocations with situation-driven search paths in specialized dictionaries. Terminology 15(2), pp.
Leroyer, P. (2007). Terminologie et dictionnaires: la porte des utilisateurs. In Quirion, J.: Terminologies, Approches Transdisciplinaires. Actes en ligne. Gatineau: Université du Québec en Outaouais.
Mel’čuk, I., A. Clas and A. Polguère (1995). Introduction à la lexicologie explicative et combinatoire. Louvain-la-Neuve (Belgique): Duculot / Aupelf - UREF.
Mel’čuk, I. et al. ( ). Dictionnaire explicatif et combinatoire du français contemporain. Montréal: Presses de l’Université de Montréal.