Austrian Academy of Sciences


1 Austrian Academy of Sciences
Cognitive corpus-based LSP lexicography: research and implementation issues. A case study on the Multilingual Glossary on Risk Management
Gerhard Budin
University of Vienna
Austrian Academy of Sciences
8th of April, 2011

2 Our empirical research landscape

3 The Making of… a Multilingual Glossary on Risk Management

4 Motivations and Methods: Terminologies for Risk Communication
The role of LSP lexicography in domain communication:
- Increase the “transparency” of terms
- Help negotiate a common understanding of terms in intra-, inter- and transdisciplinary as well as transcultural discourse
- Help increase the consistency of risk discourse (written and spoken) and improve understanding in target audiences
- Reduce unnecessary synonyms, disambiguate polysemes, help separate homonyms
- Help create risk terminologies in many languages
- Support knowledge sharing and knowledge transfer in cooperative work environments
- Support cross-cultural discourse (e.g. translation and parallel texts)

5

6 The Domains of Risk Management
Multidisciplinary, diverse, and fragmented, or transdisciplinary, overlapping, converging, integrated, and complementary
There is a need to mediate between different approaches, cultures, and discourses:
- Technological, engineering, research, science
- Administration, legislation, monitoring
- Social, sociological, political, cultural
- Domain approaches (financial, ecological, chemical, safety, geographical, planning and forecasting, health, etc.)

7 WIN Project (FP6 2004-2009): WP “Human Language Interoperability”
Objectives:
- WP 2200 is designed to support international risk management and risk communication processes (within the WIN project and beyond)
Achieved results (with ongoing work):
- A large collection of parallel corpora with risk-related texts and lexical resources (fr, en, de, es, ro, fi, hu, ru)
- Multilingual index with conceptual structure
- Bibliography and codes of sources
- Risk ontology
- Multilingual online terminology database

8 Integrative R&D Approach
A combination of theoretical approaches and their methods, in order to achieve a result targeted at the needs of the project consortium and the cooperation partners:
- Quantitative (computational) and qualitative (intellectual) methods of corpus analysis
- Lexicographical and terminographical methods (word/text-oriented and concept/knowledge-oriented)
- Text linguistics and translation studies
- Cross-cultural comparative approach and knowledge system approach, multi-domain communication
- Knowledge engineering, computational semantics/Web 2.0 (ontologies, frame semantics, etc.)
- Cognitive science approach (media pedagogy: eLearning, specific learner support, interactive approach (mental lexicon), usability engineering)

9 Motivation and Convergence of Research Interests and Contexts
Interest in cognitive science research applied to terminology management, ontology engineering, translation technologies, and the design and implementation of eLearning systems:
- Research Cluster 1 “Translation – Cognition – Technologies” at the Center for Translation Studies, University of Vienna
- Interdisciplinary Research Platform on Cognitive Science: Cluster on Cognitive Linguistics
- Research Priority 1 “Lexicology, Terminology, and Parallel Corpora” at the Institute for Corpus Linguistics and Text Technology, Austrian Academy of Sciences

10 Research contexts in several projects
Previous and ongoing projects:
- Dynamont: Methodology for Creating Dynamic Ontologies (BMVIT, national research programme “Semantic Systems”): multi-dimensional ontology modelling
- WIN (Wide Area Information Network on Risk Management) / MGRM (Multilingual Glossary on Risk Management): IP (Integrated Project) in FP6, focusing on creating a multilingual terminology and ontology of risk management, i.e. a risk ontology for natural hazards
- Montific: Multilingual ONTology for Internal Financial Control, an LLP project (Leonardo da Vinci II): building a “learning ontology” for an eLearning environment
- STABILITY AND ADAPTATION OF CLASSIFICATION SYSTEMS IN A CROSS-CULTURAL PERSPECTIVE (European Science Foundation: COST A 31 project): cognitive linguistics, i.e. how “classifiers” are embodied in language, incl. ontologies
- TES4IP: Terminology Services for the Intellectual Property Domain (Bridge project funded by FFG, Austrian Research Agency): term extraction, multi-word term recognition, named entity recognition, legal vocabularies and legal ontologies
-> Ongoing study: Cognitive Ontologies: Designing, Generating and Using Domain Ontologies

11 Ontology Engineering and Cognitive Science
Cognitive aspects have been of interest in a variety of ontology engineering approaches:
- Barry Smith: an epistemological focus combined with work on domain ontologies (mainly biomedical); criticizes the epistemological foundations of terminology theory in elaborating his foundational theory of ontology
- Aldo Gangemi: DOLCE (Descriptive Ontology for Linguistic and Cognitive Engineering), a foundational theory of ontology; many projects, also on tools and on domain ontologies
- Many others (Guarino, Sheth, Obrst, Noy, et al.) have also done research on these aspects
Some criticism: in the classical scenario, ontology evaluation focuses (only) on syntactic evaluation for computational uses.

12 “Cognitive Ontologies”
Conceptual clarification:
- Ontologies of cognitive processes: in neuroscience research, similar to other biomedical ontologies (cognitive atlas, neuropsychiatric phenomena, ontology of cognitive objects, etc.)
- Ontologies with a focus on their cognitive aspects: DOLCE and other cognitively oriented approaches; constructivist epistemology for ontology building, concerning the relation to “reality”
The two concepts are increasingly converging.

13 Our own research
Our previous and ongoing projects have focused on the cognitive adequacy of domain ontologies and their use in knowledge acquisition in learning situations:
- Terminology studies as a contribution from this perspective (related research by Nistrup Madsen/Erdman Thomsen 2005, 2009, etc.)
- Using DOLCE design patterns for multi-dimensional conceptual modelling in ontology building: the DYNAMONT project
- From domain corpora to terminologies, and from there to domain ontologies for eLearning scenarios: the MONTIFIC project
- For domain experts: the WIN/MULTH/MGRM project

14 Moving up (and down) the Ontology Spectrum
The challenge: from the linguistic-cultural diversity of discourse and free-form lexical structures to a unified, formalized, axiomatized ontology, and back again, to support human understanding and social processes such as collaborative learning
The method: an integrative, multi-level modelling approach that specifies the steps in a process-oriented workflow framework (with variable, combinable steps depending on concrete needs) for:
- Gradual semantic enrichment
- Gradual semantic formalization
- Multi- and cross-lingual referencing/alignment for text management
- Constant interaction between full texts and lex-term resources
The technology: a multi-component workbench (i.e. Dynamont-WB, incl. ProTerm as a central element), using XML, RDF, OWL, SKOS, WordNet + GlobalWordnet, MLIF (containing TBX, TMX, XLIFF, LMF, TMF, etc.), FrameNet, etc.
The advantage: full exploitation of all types of language resources (LR) and knowledge organization systems (KOS), providing a framework not only for their semantic enrichment and formalization as ontologies but also for ontology-based multilingual authoring, text generation and translation
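To make the step from a multilingual terminology entry to an RDF/SKOS representation more concrete, here is a minimal sketch. It assumes that a glossary concept is held as a plain Python record, and it emits SKOS statements in N-Triples syntax using only the standard library. The namespace `http://example.org/mgrm/`, the concept identifiers and the `Concept` fields are all hypothetical. This illustrates the general idea and is not the export used by the Dynamont workbench.

```python
from dataclasses import dataclass, field

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/mgrm/"  # hypothetical namespace for glossary concepts


@dataclass
class Concept:
    concept_id: str                      # hypothetical concept identifier
    pref_labels: dict[str, str]          # language code -> preferred term
    alt_labels: dict[str, list[str]] = field(default_factory=dict)  # language code -> synonyms
    broader: list[str] = field(default_factory=list)                # ids of broader concepts


def escape_literal(text: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def to_ntriples(concept: Concept) -> list[str]:
    """Return one N-Triples line per SKOS statement about the concept."""
    subject = f"<{BASE}{concept.concept_id}>"
    lines = [f"{subject} <{RDF_TYPE}> <{SKOS}Concept> ."]
    for lang, label in sorted(concept.pref_labels.items()):
        lines.append(f'{subject} <{SKOS}prefLabel> "{escape_literal(label)}"@{lang} .')
    for lang, labels in sorted(concept.alt_labels.items()):
        for label in labels:
            lines.append(f'{subject} <{SKOS}altLabel> "{escape_literal(label)}"@{lang} .')
    for other in concept.broader:
        lines.append(f"{subject} <{SKOS}broader> <{BASE}{other}> .")
    return lines


if __name__ == "__main__":
    flood = Concept(
        concept_id="C2-001",  # hypothetical id under theme C2 (floods)
        pref_labels={"en": "flood", "fr": "inondation", "de": "Hochwasser"},
        alt_labels={"de": ["Überschwemmung"]},
        broader=["C0-001"],   # hypothetical broader concept
    )
    print("\n".join(to_ntriples(flood)))
```

Writing the triples by hand keeps the sketch dependency-free. In practice, an RDF library such as rdflib would handle escaping and would also provide other serializations such as Turtle or RDF/XML.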

15 The global risk communication scenario
Several projects since 1994 covering the following activities:
- Thesaurus building
- Creating multilingual terminology databases
- Creating multilingual text corpora
- Lexicographical glossary
- Semantic enrichment (e.g. conceptual links, frame semantics)
- Collection and analysis of relevant knowledge organization systems
- Annotation of resources
- Mark-up of resources (TBX, etc.)
- Ontology building
- Communication design

16 From texts and terminologies to ontologies - and back to texts
Using the risk scenario:
- Termbase: export as XML
- Domain models, meta-models -> patterns
- Text corpus
- Term extraction: comparative testing of ProTerm, MultiTerm Extract, MultiCorpora
- Aligning with the termbase
- Conversion to RDF
- Ontology import -> editor
- Mappings (GMT, XML, RDF, OWL, UML, comma-delimited, RDB, for different kinds of lex-term resources, FN->OWL, etc.)
The MULTH-WIN project as an example of methods integration
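The term extraction tools compared above each use their own methods, which are not described here. As a generic illustration only, the sketch below ranks single-word candidates by dividing their relative frequency in a domain corpus by their relative frequency in a general reference corpus. This is sometimes called a "weirdness" ratio. The crude tokenizer, the add-one smoothing, the `min_freq` threshold and the toy corpora are all assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case letter-only tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def weirdness(domain_text: str, reference_text: str,
              min_freq: int = 2) -> list[tuple[str, float]]:
    """Rank domain words by domain relative frequency divided by
    (add-one smoothed) reference relative frequency, highest first."""
    dom = Counter(tokenize(domain_text))
    ref = Counter(tokenize(reference_text))
    dom_total = sum(dom.values())
    ref_total = sum(ref.values())
    vocab = len(set(dom) | set(ref))  # combined vocabulary size for smoothing
    scores = {}
    for word, freq in dom.items():
        if freq < min_freq:
            continue  # rare words give unstable ratios, so skip them
        dom_rel = freq / dom_total
        ref_rel = (ref[word] + 1) / (ref_total + vocab)  # add-one smoothing
        scores[word] = dom_rel / ref_rel
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    domain = ("The flood risk rose as the backwater raised the flood level. "
              "Flood risk assessment reduces risk.")
    reference = ("The weather was fine and the market rose. "
                 "The level of the debate was high.")
    for word, score in weirdness(domain, reference):
        print(f"{word}\t{score:.3f}")
```

In this toy example, "flood" and "risk" rank highest. A function word such as "the" still scores above 1, which shows why real pipelines add stop-word lists, part-of-speech patterns for multi-word terms, and much larger reference corpora.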

17

18

19 Terminological frame semantics
INTERVENTION (ACTOR(S), ACTIVITIES/PHASES):
RISK DETECTING (PRE-EVENT)
- R-ASSESSMENT
- R-PERCEPTION (X is risk)
- EXPERIENCE (statistics, case studies)
- OBSERVATION (monitoring)
- METHOD
- SATELLITE
- PROGNOSES
- R-ANALYSIS
- R-FEATURES
- SITUATION/CONTEXT (danger/hazard)
- SIMULATION (course of events)
- PROBABILISTIC METHODS (safety)
- RELIABILITY
- R-IDENTIFICATION (DAMAGE)
- R-SOURCE
- DAMAGE CAUSE
- VULNERABILITY (DAMAGE TARGET)
- SUSCEPTIBILITY (capacity/people)
Rothkegel

20 Terminological frame semantics
I. Pre-event
B. Public awareness and planning
II. In-event
C. Events and response
afflux/Hochwasser durch Aufstau
BE [[TYPE=flood], [PLACE=], [TIME=]],
HAVE [CAUSE [[ORIGIN=], [NIEDERSCHLAG [TYPE=]], [STAU [TYPE= Aufstau]]], DAMAGE [TARGET=, SOURCE=, DEGREE=]],
HAPPEN [STATES=, PROCESSES=]]
backwater/Rückstau
HAVE [CAUSE [[ORIGIN=], [NIEDERSCHLAG [TYPE=]], [STAU [TYPE= Rückstau]]],
(German slot names: NIEDERSCHLAG = precipitation; STAU = damming, backing up of water)
Rothkegel
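The slot notation above (BE, HAVE and HAPPEN, with sub-slots such as TYPE, PLACE, CAUSE and DAMAGE) can be read as a nested template in which some slots are filled and others are still open. The minimal sketch below stores such a frame as nested Python dicts, with `None` marking an empty slot. This encoding and the helper names `open_slots` and `fill` are our own assumptions and are not Rothkegel's formalism. The sketch lists the open slot paths and fills a slot without modifying the template. This is the kind of check that could support the completion of frame-based entries.

```python
import copy
from typing import Any, Iterator

Frame = dict[str, Any]  # nested slots; None marks an unfilled slot

# The afflux frame from the slide, encoded as nested dicts (hypothetical encoding)
AFFLUX: Frame = {
    "BE": {"TYPE": "flood", "PLACE": None, "TIME": None},
    "HAVE": {
        "CAUSE": {
            "ORIGIN": None,
            "NIEDERSCHLAG": {"TYPE": None},  # precipitation
            "STAU": {"TYPE": "Aufstau"},     # type of damming
        },
        "DAMAGE": {"TARGET": None, "SOURCE": None, "DEGREE": None},
    },
    "HAPPEN": {"STATES": None, "PROCESSES": None},
}


def open_slots(frame: Frame, prefix: tuple[str, ...] = ()) -> Iterator[str]:
    """Yield the dotted paths of all terminal slots that are still empty."""
    for name, value in frame.items():
        path = prefix + (name,)
        if isinstance(value, dict):
            yield from open_slots(value, path)
        elif value is None:
            yield ".".join(path)


def fill(frame: Frame, path: str, value: str) -> Frame:
    """Return a copy of the frame with the terminal slot at `path` filled."""
    result = copy.deepcopy(frame)
    *parents, leaf = path.split(".")
    node = result
    for name in parents:
        node = node[name]  # raises KeyError for an unknown slot
    if leaf not in node or isinstance(node[leaf], dict):
        raise KeyError(f"no terminal slot {path!r}")
    node[leaf] = value
    return result


if __name__ == "__main__":
    print(list(open_slots(AFFLUX)))
    located = fill(AFFLUX, "BE.PLACE", "river valley")  # hypothetical filler
    print(list(open_slots(located)))
```

Returning a filled copy leaves the template reusable for further entries, such as the backwater/Rückstau frame, which shares the CAUSE structure.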

21

22 Ordnance Survey

23 Dynamont architecture, tools and workflows

24

25 The Glossary
The paper version of the glossary is used by risk managers and civil engineers, but also by teachers, students, translators, journalists, etc. Generally, the purpose of such multilingual conceptual glossaries is to improve domain communication and to facilitate mutual understanding across linguistic boundaries. The concepts of risk management and their definitions presented in this glossary were carefully selected from a large body of technical literature and authentic text corpora in the respective languages. These sources are referenced in the bibliography.
The multilingual glossary presented here includes 8 languages: English and French as the main pivot languages, as well as German, Spanish, Romanian, Finnish, Hungarian, and Russian. It comprises about 230 central concepts of risk management, with about 400 definitions and about 1400 terms representing these concepts in each language (including synonyms and hyperonyms), and it indicates the conceptual relations between the entries.

26 The Glossary
The following themes are used as the macro-structure of the glossary:
A. Risk assessment and technology assessment
B. Public perception of risk, planning, preparation and alarm
C0. Risk events, equipment and operations, general terms
C1. Fire: events, equipment and operations
C2. Floods: events, equipment and operations
C3. Oil spills: events, equipment and operations
Each glossary entry follows the same micro-structure, with the following information elements:
- A concept number combined with a theme from the macro-structure
- The equivalent terms in the 8 languages, accompanied by grammatical information
- The definitions of the concept in each language, including multiple definitions that may differ from each other, accompanied by the textual source of each definition and by structural semantic information on the concept
- Related terms and expressions
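The micro-structure described above maps naturally onto a small record type. The sketch below shows one possible encoding using Python dataclasses. The field names, the `theme.number` label format and the sample entry are hypothetical, and the language codes are the ISO 639-1 codes for the glossary's eight languages. The sketch also reports which languages still lack a term, a check that matters when a database is extended to further languages.

```python
from dataclasses import dataclass, field

# ISO 639-1 codes for the glossary's eight languages (pivot languages first)
LANGUAGES = ("en", "fr", "de", "es", "ro", "fi", "hu", "ru")
THEMES = {"A", "B", "C0", "C1", "C2", "C3"}  # macro-structure themes


@dataclass
class Term:
    text: str
    grammar: str = ""  # grammatical information, e.g. "noun, fem."


@dataclass
class Definition:
    text: str
    source: str  # textual source of the definition


@dataclass
class Entry:
    number: int  # concept number within the theme
    theme: str   # one of THEMES
    terms: dict[str, list[Term]] = field(default_factory=dict)          # lang -> terms
    definitions: dict[str, list[Definition]] = field(default_factory=dict)
    related: list[str] = field(default_factory=list)                    # related terms

    def __post_init__(self) -> None:
        if self.theme not in THEMES:
            raise ValueError(f"unknown theme {self.theme!r}")

    @property
    def label(self) -> str:
        """Hypothetical display label combining theme and concept number."""
        return f"{self.theme}.{self.number}"

    def missing_languages(self) -> list[str]:
        """Languages for which the entry has no term yet, in glossary order."""
        return [lang for lang in LANGUAGES if not self.terms.get(lang)]


if __name__ == "__main__":
    entry = Entry(
        number=1,
        theme="C2",
        terms={
            "en": [Term("flood", "noun")],
            "fr": [Term("inondation", "noun, fem.")],
            "de": [Term("Hochwasser", "noun, neut."), Term("Überschwemmung", "noun, fem.")],
        },
        definitions={"en": [Definition("<definition text>", "<source>")]},  # placeholders
    )
    print(entry.label, entry.missing_languages())
```

Keeping several terms and definitions per language in lists reflects the glossary's allowance for synonyms and for multiple, possibly diverging definitions.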

27

28 Research issues
- Experimental settings
- User studies, user modelling
- Data modelling
- Corpus analysis
- Multilingual, multi-domain, cross-cultural
- Knowledge dynamics: dynamic knowledge representations
- Cognitive studies

29 Conclusions and Outlook I
- The online terminology database is in continuous use
- 8-language glossary version produced in February 2011
Next steps in 2011 (work in progress!):
- Database to be extended from 5 to 8 languages
- Full-text corpora to be extended
- Promotion of the glossary in different user communities
- Term extraction, research
- Extension into more languages
- More scientific publications

30 Conclusions and outlook II
Research perspectives:
- Further research on cognitive ontologies
- User modelling, usability of terminological databases and LSP dictionaries
- Corpus-linguistic research: semantic annotation modelling
- Multilingual, multi-domain, cross-cultural issues

31 Selected References
Budin, G. (2010). Socio-terminology and computational terminology: toward an integrated, corpus-based research approach. In: De Cillia, R. et al. (eds.). Discourse, Politics, Identity. Tübingen: Stauffenburg, 21-31.
Budin, G. (2007). Semantic systems supporting cross-disciplinary environmental communication. In: Hryniewicz, O.; Studzinski, J.; Szediw, A. (eds.). Environmental Informatics and Systems Research. Vol. 2: Workshop and Application Papers. EnviroInfo. Aachen, 23-30.
CEDIM, Center for Disaster Management and Risk Reduction Technology, c/o University of Karlsruhe (2005). Glossar: Begriffe und Definitionen aus den Risikowissenschaften.
Gangemi: DOLCE
Gréciano, G. (2001). L'harmonisation de la terminologie en Sciences du Risque. In: Proceedings of Security Conference, Montpellier XII. Council of Europe-FER. Strasbourg, France.
Gréciano, G. (2001). Les sciences du risque: convergences interculturelles. In: Proceedings of Risk Conference, Strasbourg X. Council of Europe-FER. Strasbourg, France.
Gréciano, G. (2001). Pour un glossaire combinatoire plurilingue du Risque. In: Proceedings of Risk Conference, Mèze V. Council of Europe-FER. Strasbourg, France.
Massué, J.P. (2001). "Mobilisation de la Communauté scientifique au service de l'amélioration de la gestion des risques". Mèze, FER-EUR-OPA. Strasbourg.
Nistrup Madsen/Erdman Thomsen 2005, 2009

32 Acknowledgements
GLOSSAIRE MULTILINGUE DE LA GESTION DU RISQUE (Multilingual Glossary of Risk Management)
French / German / English / Spanish / Romanian / Finnish / Hungarian / Russian
Edited by Gertrud Gréciano, Gerhard Budin, Danielle Candel, John Humbley, with the support of the Commission of the European Union, the Universities of Strasbourg, Vienna and Helsinki, the Alsace Region, the Délégation générale à la langue française et aux langues de France (General Delegation for the French Language and the Languages of France), and the Austrian Academy of Sciences.
Authors: Gertrud Gréciano (Strasbourg), Gerhard Budin (Vienna), Annely Rothkegel (Chemnitz), Ulrike Hass (Essen)
Translators: Cornelia Cujba (Iasi), Attila Frigyer (Budapest), Luis Gonzalez (Caracas-Paris), Csilla Höfler-Bornemisza (Vienna), Annikki Liimatainen (Helsinki), Alexei Milko (Strasbourg-Moscow)
Scientific and technical cooperation: Steffi Baumann (Chemnitz), Aban Budin (Vienna), Christian Burghard (Chemnitz), Dimitrij Dobrovolskij (Moscow-Vienna), Eva Haas (Munich-Ispra), Natalia Jonkova (Moscow), Andra Moga (Iasi-Vienna), Maren Runte (Essen), Julia Steuber (Essen), Virginie Tombeux (Paris), Elena Volgina (Moscow)

33 Thank you for your attention
Gerhard Budin
Center for Translation Studies, University of Vienna
Institute of Corpus Linguistics and Text Technology, Austrian Academy of Sciences

