Accessing distributed linguistic resources An XML based architecture Laurent Romary Laboratoire Loria, Nancy (F) Samuel Cruz-Lara, Patrice Bonhomme, Christophe.

Presentation transcript:

Accessing distributed linguistic resources An XML based architecture Laurent Romary Laboratoire Loria, Nancy (F) Samuel Cruz-Lara, Patrice Bonhomme, Christophe de Saint Rat

Overview
• Objectives
• General Network organization
• Role of XML in the architecture
• Implementation
• Perspectives

Objectives
• Distributed access to linguistic resources
  – Linguistic resources: multilingual texts (books, newspaper articles), mono- or multilingual dictionaries, transcriptions of spoken data, etc.
  – Usages
    – Researchers: linguists, lexicographers
    – Professionals: translators, teachers
    – Larger public: information on language use

Objectives - cont.
– Distributed servers
  – Local maintenance of resources
    – Linguistic competence (Finnish!)
    – Specific philological and/or scholarly competencies (historical manuscripts, transcriptions of ethnographic work, etc.)
    – Copyright aspects (local agreements with editors)
  – Distribution and allocation of load
    – Large amount of data
    – Main processing done on the server side

General context
• National
  – Silfide project
    – CNRS and Agence des Universités Francophones
    – Registering and distributing French linguistic resources
• European
  – MLIS/Elan project
    – EU - DG XIII funding
    – Networking existing LR access environments

General Network Organization

User scenario (workflow)
• User connection
• Selection of servers
  – Server profiles
• Selection of resources
  – Header queries
• Content queries
  – Concordances, word lists, statistics, etc.

Servers: two main sets of functionalities
• Local access servers
  – User identification (User DB)
  – Query broadcast - Result set merging
• Resource servers
  – Query interpretation (resource DB)
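The split between access servers (query broadcast, result set merging) and resource servers (query interpretation) can be sketched as follows. This is only an illustration under stated assumptions: `make_resource_server` and `broadcast` are hypothetical names, the resource servers are simulated by in-memory dictionaries, and substring matching stands in for real query interpretation.

```python
from concurrent.futures import ThreadPoolExecutor


def make_resource_server(name, texts):
    """Hypothetical stand-in for a remote resource server and its resource DB."""
    def answer(query):
        # "Query interpretation" is reduced here to a case-insensitive substring test.
        return [(f"{name}:{rid}", text)
                for rid, text in sorted(texts.items())
                if query.lower() in text.lower()]
    return answer


def broadcast(query, servers):
    """Access-server role: send one query to every server, then merge the results."""
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        partial_results = list(pool.map(lambda server: server(query), servers))
    merged = [record for result in partial_results for record in result]
    return sorted(merged)  # deterministic order for the merged result set


servers = [
    make_resource_server("nancy", {"t1": "Le temps passe vite", "t2": "Time flies"}),
    make_resource_server("leiden", {"t1": "Time and tide", "t2": "De tijd vliegt"}),
]
merged = broadcast("time", servers)
```

In a real deployment the callables would be network requests, so the thread pool mainly hides latency while the slowest server answers.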

An extensive use of XML
– Linguistic resources are semi-structured documents (cf. Abiteboul, Buneman, etc.)
– Linguistic resources have long (but not everywhere) been encoded in SGML
  – Cf. TEI: Text Encoding Initiative
– Historical links between the TEI and XML
  – C. M. Sperberg-McQueen, Steve DeRose, Henry Thompson, etc.

XML and linguistic resources
• Being able to isolate sub-documents
  – E.g. dictionary entries, concordance lines, etc.
• Being able to filter|merge|sort data extensively
  – E.g. combining results extracted from various (and probably heterogeneous) documents
• Introducing flexibility in document presentation (cf. variety of usages): XSL
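As a rough illustration of isolating sub-documents and merging them across heterogeneous sources, the sketch below pulls dictionary entries out of two differently structured documents and sorts them by headword. The element names (`entry`, `form`, `def`) and both sample documents are assumptions made for the example; they are loosely TEI-like but are not taken from the slides.

```python
import xml.etree.ElementTree as ET

# Two small dictionary fragments with different outer structures (invented data).
DICT_A = """<dictionary>
  <entry><form>arrow</form><def>a pointed projectile</def></entry>
  <entry><form>time</form><def>a measured duration</def></entry>
</dictionary>"""
DICT_B = """<lexicon><body>
  <entry><form>fly</form><def>to move through the air</def></entry>
</body></lexicon>"""


def entries(xml_text):
    """Isolate every <entry> sub-document, wherever it sits in the tree."""
    return ET.fromstring(xml_text).findall(".//entry")


def merge_sorted(*documents):
    """Merge entries from several documents and sort them by headword."""
    merged = [entry for doc in documents for entry in entries(doc)]
    return sorted(merged, key=lambda entry: entry.findtext("form"))


headwords = [entry.findtext("form") for entry in merge_sorted(DICT_A, DICT_B)]
```

Each isolated `entry` element stays a well-formed XML fragment, so it could be handed to a stylesheet for presentation without the rest of its source document.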

Document structure - XML … … … … …

Document structure

XML in the network architecture
• Why?
  – Coherence between the content and the “glue”
  – E.g. combining results and user information
• How?
  – At the user level
    – User identification
    – Workspace
  – At the information flow level
    – Queries
    – Result sets

An umbrella document: SIL
• SIL: Silfide Interface Language

User Information ( )
• : user name
  Patrice Bonhomme
• : organization information
  – Attribute status=public|private etc.

Workspace ( )
• : List of preferences
• +: List of resources
• ?: access history

Queries ( )
• A query language combining:
  – Constraints on the XML structure (à la XPath)
  – Constraints on the linguistic content
  ELAN Common Query Language, to be implemented (or interfaced) by all servers
• Note: to be merged with recent proposals on XQL
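The idea of combining a structural constraint with a constraint on linguistic content can be sketched as below. This is not the ELAN Common Query Language, whose syntax the slides do not show: the function `run_query`, the sample corpus and its element names are all assumptions, with an ElementTree path standing in for the structural part and a regular expression for the content part.

```python
import re
import xml.etree.ElementTree as ET

# Invented sample corpus; element and attribute names are illustrative only.
CORPUS = """<text>
  <div type="chapter"><p>Time flies like an arrow.</p><p>Fruit flies like a banana.</p></div>
  <div type="notes"><p>Time is short.</p></div>
</text>"""


def run_query(xml_text, path, pattern):
    """Return the text of elements matching both the path and the regex."""
    root = ET.fromstring(xml_text)
    regex = re.compile(pattern, re.IGNORECASE)
    return [element.text
            for element in root.findall(path)      # structural constraint
            if element.text and regex.search(element.text)]  # content constraint


hits = run_query(CORPUS, ".//div[@type='chapter']/p", r"\btime\b")
```

Here the path restricts the search to paragraphs inside chapter divisions, so the paragraph in the notes division is never tested against the pattern even though it contains the word.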

Query Language: example

Result sets ( )
• : metadata information about the result (cf. query)
• : a list of elementary results/records
  Time flies like an arrow
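A result set made of a metadata part plus a list of elementary records could look roughly like the sketch below, which builds one for a concordance query on the slide's example sentence. The element names (`resultSet`, `header`, `records`, `record`, and so on) are hypothetical, since the actual SIL element names did not survive in this transcript.

```python
import xml.etree.ElementTree as ET


def concordance(text, keyword, width=2):
    """Return (left context, match, right context) for each keyword occurrence."""
    tokens = text.split()
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


def build_result_set(query, lines):
    """Wrap concordance lines in a metadata header plus a list of records."""
    root = ET.Element("resultSet")            # hypothetical element names throughout
    header = ET.SubElement(root, "header")
    ET.SubElement(header, "query").text = query
    ET.SubElement(header, "count").text = str(len(lines))
    records = ET.SubElement(root, "records")
    for left, match, right in lines:
        record = ET.SubElement(records, "record")
        ET.SubElement(record, "left").text = left
        ET.SubElement(record, "match").text = match
        ET.SubElement(record, "right").text = right
    return root


lines = concordance("Time flies like an arrow", "flies")
result = build_result_set("flies", lines)
xml_out = ET.tostring(result, encoding="unicode")
```

Keeping the query in the header means a merged result set can still say which query produced it, which matters once results from several servers are combined.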

Putting things together
[Diagram labels: SilUI/XML, SilWS/XML; Query: SilQL/XML; Broadcast; Result: SilRS/XML]

Implementation
• Main technical choices
  – Access servers implemented as Java servlets within an HTTP server
  – Resource servers interfaced through a servlet
• A single element of centralization: the Network Management Unit (NMU)
  – CORBA connection to query and administer the NMU

Administration
[Diagram: Web Browser, Client Applet and NMU; Server 1 and Server 2, each containing RS_status, NmuClientServlet, Dispatcher and ResourceServlet; links labelled CORBA and HTTP / XML]


Cache capabilities
[Diagram: Silfide servers (one BroadcastServlet, two QueryServlets), each with a cache; ElanQueryHandler drivers giving access to DB Leiden and DB Birmingham (connection + native/SilRS); message flows labelled SIL/CQL/XML and SIL/RS/XML]
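One way to picture a cache sitting between a query servlet and a remote resource database is the sketch below. It is an assumption-laden illustration, not the Silfide implementation: `CachingQueryHandler` and `fake_birmingham_db` are invented names, and the cache key is simply the query text with whitespace collapsed.

```python
class CachingQueryHandler:
    """Wrap a query function with a cache keyed on the normalized query text."""

    def __init__(self, backend):
        self._backend = backend   # hypothetical callable: query text -> result set
        self._cache = {}
        self.backend_calls = 0    # counts how often the backend was actually reached

    def query(self, query_text):
        key = " ".join(query_text.split())  # collapse whitespace only
        if key not in self._cache:
            self.backend_calls += 1
            self._cache[key] = self._backend(key)
        return self._cache[key]


def fake_birmingham_db(query_text):
    # Stand-in for a remote resource DB; not a real Silfide or ELAN interface.
    return [f"hit for {query_text!r}"]


handler = CachingQueryHandler(fake_birmingham_db)
first = handler.query("time  flies")
second = handler.query("time flies")
```

A cache like this only pays off if the same queries recur and the underlying resources change rarely; invalidation is left out of the sketch.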

Conclusions
• Experiment
  – A first network with Nancy (FR), Birmingham (UK), Leiden (NL) [, Pisa (IT)]
  – Check demo availability at
• Genericity of the model
  – Coping with other distributed information environments

Perspectives
– Specific problems associated with linguistic resources
– Clusters of documents (e.g. multilingual alignment): RDF?
– On-line editing/annotation of documents
– Aiming at a moving target
  – XSL: self-contained filtering mechanisms
  – XQL: real DB + query engines associated with XML?
– Still: experimenting is VERY useful to understand problems and make things evolve