Accessing distributed linguistic resources An XML based architecture Laurent Romary Laboratoire Loria, Nancy (F) Samuel Cruz-Lara, Patrice Bonhomme, Christophe.


1 Accessing distributed linguistic resources An XML based architecture Laurent Romary Laboratoire Loria, Nancy (F) Samuel Cruz-Lara, Patrice Bonhomme, Christophe de Saint Rat

2 Overview
• Objectives
• General network organization
• Role of XML in the architecture
• Implementation
• Perspectives

3 Objectives
• Distributed access to linguistic resources
  – Linguistic resources: multilingual texts (books, newspaper articles), monolingual or multilingual dictionaries, transcriptions of spoken data, etc.
  – Usages
    Researchers: linguists, lexicographers
    Professionals: translators, teachers
    Larger public: information on language use

4 Objectives - cont.
  – Distributed servers
    Local maintenance of resources
      – Linguistic competence (Finnish!)
      – Specific philological and/or scholarly competencies (historical manuscripts, transcriptions of ethnographic work, etc.)
      – Copyright aspects (local agreements with editors)
    Distribution and allocation of load
      – Large amount of data
      – Main processing done on the server side

5 General context
• National
  – Silfide project
    CNRS and Agence des Universités Francophones
    Registering and distributing French linguistic resources
• European
  – MLIS/Elan project
    EU - DG XIII funding
    Networking existing LR access environments

6 General Network Organization

7 User scenario (workflow)
• User connection
• Selection of servers
  – Server profiles
• Selection of resources
  – Header queries
• Content queries
  – Concordances, word lists, statistics, etc.

8 Servers: two main sets of functionalities
• Local access servers
  – User identification (User DB)
  – Query broadcast - Result set merging
• Resource servers
  – Query interpretation (resource DB)
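The division of labour on this slide (an access server broadcasts a query, resource servers interpret it, the access server merges the result sets) can be sketched roughly as follows. This is only an illustration, not the Silfide implementation: the resource servers are stand-in Python functions rather than remote servlets, and the names `make_resource_server`, `broadcast`, and the hit fields are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def make_resource_server(name, text):
    """Hypothetical stand-in for a resource server holding one short text."""
    words = text.split()

    def answer(query):
        # "Query interpretation": here simply a case-insensitive word match.
        return [
            {"server": name, "position": i, "match": word}
            for i, word in enumerate(words)
            if word.lower() == query.lower()
        ]

    return answer


def broadcast(query, servers):
    """Send the query to every server, then merge the result sets into one list."""
    with ThreadPoolExecutor(max_workers=max(1, len(servers))) as pool:
        result_sets = list(pool.map(lambda server: server(query), servers))
    merged = [hit for result_set in result_sets for hit in result_set]
    # Merge order is an assumption: by server name, then by position in the text.
    merged.sort(key=lambda hit: (hit["server"], hit["position"]))
    return merged


if __name__ == "__main__":
    nancy = make_resource_server("Nancy", "Time flies like an arrow")
    leiden = make_resource_server("Leiden", "Fruit flies like a banana")
    for hit in broadcast("flies", [nancy, leiden]):
        print(hit)
```

In a real deployment the stand-in functions would be replaced by HTTP requests to each resource server, and the merge step would operate on XML result sets rather than dictionaries.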

9 An extensive use of XML
  – Linguistic resources are semi-structured documents (cf. Abiteboul, Buneman, etc.)
  – Linguistic resources have long (but not everywhere) been encoded in SGML
    Cf. TEI: Text Encoding Initiative
  – Historical links between the TEI and XML
    MC Sperberg-McQueen, Steve de Rose, Henry Thompson, etc.

10 XML and linguistic resources
• Being able to isolate sub-documents
  – E.g. dictionary entries, concordance lines, etc.
• Being able to filter|merge|sort data extensively
  – E.g. combining results extracted from various (and probably heterogeneous) documents
• Introducing flexibility in document presentation (cf. variety of usages): XSL
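As a rough illustration of isolating sub-documents and then filtering and sorting them, the sketch below pulls individual entries out of a small dictionary document with Python's standard `xml.etree.ElementTree`. The sample document and its element and attribute names (`dictionary`, `entry`, `form`, `sense`, `lang`) are invented for the example and are not taken from the slides or from the TEI.

```python
import xml.etree.ElementTree as ET

# Hypothetical dictionary fragment; all names here are illustrative only.
DICTIONARY_XML = """
<dictionary>
  <entry lang="fr"><form>maison</form><sense>house</sense></entry>
  <entry lang="fr"><form>arbre</form><sense>tree</sense></entry>
  <entry lang="en"><form>cat</form><sense>chat</sense></entry>
</dictionary>
"""


def isolate_entries(xml_text, lang):
    """Return each entry of the given language as a standalone XML string."""
    root = ET.fromstring(xml_text)
    # Filter: keep only the entries whose lang attribute matches.
    entries = root.findall(f"./entry[@lang='{lang}']")
    # Sort: order the isolated entries by headword.
    entries.sort(key=lambda entry: entry.findtext("form", default=""))
    # Each serialized entry is a self-contained sub-document.
    return [ET.tostring(entry, encoding="unicode").strip() for entry in entries]


if __name__ == "__main__":
    for fragment in isolate_entries(DICTIONARY_XML, "fr"):
        print(fragment)
```

Because every returned fragment is well-formed on its own, fragments from several documents could be concatenated under a new root element, which is one simple way to picture the merging mentioned on the slide.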

11 Document structure - XML … … … … …

12 Document structure

13 XML in the network architecture
• Why?
  Coherence between the content and the “glue”
  E.g. combining results and user information
• How?
  At the user level
  – User identification
  – Workspace
  At the information flow level
  – Queries
  – Result sets

14 An umbrella document: SIL
• SIL: Silfide Interface Language

15 User Information ( )
• : user name
  Patrice Bonhomme bonhomme@loria.fr
• : organization information
  Attribute status=public|private etc.

16 Workspace ( )
• : List of preferences
• +: List of resources
• ?: Access history

17 Queries ( )
• A query language combining:
  – Constraints on the XML structure (à la XPath)
  – Constraints on the linguistic content
  ELAN Common Query Language to be implemented (or interfaced) by all servers
• Rem: To be merged with recent proposals on XQL
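The idea of combining a structural constraint with a content constraint can be sketched as below. This is not the ELAN Common Query Language, whose syntax is not given in the transcript: the structural part uses the limited XPath subset supported by Python's `xml.etree.ElementTree`, the content part is a plain regular expression, and the sample document and the `run_query` helper are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample text; element and attribute names are illustrative only.
SAMPLE = """<text>
  <div type="chapter"><p>Time flies like an arrow.</p><p>Nothing here.</p></div>
  <div type="notes"><p>Fruit flies like a banana.</p></div>
</text>"""


def run_query(xml_text, path, pattern):
    """Return the text of elements matching both the path and the pattern."""
    root = ET.fromstring(xml_text)
    regex = re.compile(pattern)
    return [
        element.text
        for element in root.findall(path)              # structural constraint
        if element.text and regex.search(element.text)  # content constraint
    ]


if __name__ == "__main__":
    # Paragraphs inside chapters (structure) that contain the word "flies" (content).
    print(run_query(SAMPLE, ".//div[@type='chapter']/p", r"\bflies\b"))
```

Keeping the two constraints as separate parameters mirrors the slide's distinction; a real query language would presumably express both in a single query document.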

18 Query Language: example

19 Result sets ( )
• : metadata information about the result (cf. query)
  10 20
• : a list of elementary results/records
  Time flies like an arrow
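A result set of this shape (metadata about the result, followed by a list of elementary records) might be assembled as in the sketch below. The element names of the original slide were lost in extraction, so `resultSet`, `meta`, `query`, `count`, `records`, and `record` are placeholders chosen for the example, not the SIL vocabulary.

```python
import xml.etree.ElementTree as ET


def build_result_set(query, records):
    """Serialize a query and its matching records as one XML result-set document."""
    root = ET.Element("resultSet")  # placeholder name, not the SIL element
    # Metadata section: which query produced the result and how many records it has.
    meta = ET.SubElement(root, "meta")
    ET.SubElement(meta, "query").text = query
    ET.SubElement(meta, "count").text = str(len(records))
    # Record section: one element per elementary result.
    body = ET.SubElement(root, "records")
    for text in records:
        ET.SubElement(body, "record").text = text
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_result_set("flies", ["Time flies like an arrow"]))
```

Carrying the query inside the metadata keeps each result set self-describing, which helps when result sets from several servers are merged later.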

20 Putting things together
[Figure: labels SilUI/XML, SilWS/XML, Query, SilQL/XML, Broadcast, Result, SilRS/XML]

21 Implementation
• Main technical choices
  – Access servers implemented as Java servlets within an HTTP server
  – Resource servers interfaced through a servlet
• A single element of centralization: the Network Management Unit (NMU)
  – CORBA connection to query and administer the NMU

22 Administration
[Figure: Web Browser and NMU Client Applet; Server 1 and Server 2, each with RS_status, NmuClientServlet, Dispatcher and ResourceServlet; NMU; links labelled CORBA and HTTP / XML]

23 [Figure: Web Browser and NMU Client Applet; Server 1 and Server 2, each with RS_status, NmuClientServlet, Dispatcher and ResourceServlet; NMU; links labelled CORBA and HTTP / XML]

24 Cache capabilities
[Figure: a Silfide server with BroadcastServlet; two Silfide servers with QueryServlet and cache; DB Leiden and DB Birmingham, each reached through an ElanQueryHandler driver (connection + native/SilRS) with cache; links labelled SIL/CQL/XML and SIL/RS/XML]
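The slide does not say how the caches work, so the following is only one plausible reading: a query handler that remembers result sets keyed by the query text and contacts the underlying database only on a miss. The class name `CachingQueryHandler`, the `backend` callable, and the keying strategy are all assumptions made for this sketch.

```python
class CachingQueryHandler:
    """Hypothetical query handler that caches result sets by query text."""

    def __init__(self, backend):
        self._backend = backend  # callable standing in for the database driver
        self._cache = {}         # query text -> previously computed result set
        self.backend_calls = 0   # counts how often the backend was actually hit

    def query(self, query_text):
        if query_text not in self._cache:
            # Cache miss: ask the backend once and remember the answer.
            self.backend_calls += 1
            self._cache[query_text] = self._backend(query_text)
        return self._cache[query_text]


if __name__ == "__main__":
    corpus = ["Time flies like an arrow", "Fruit flies like a banana"]
    handler = CachingQueryHandler(
        lambda word: [line for line in corpus if word in line.split()]
    )
    print(handler.query("arrow"))
    print(handler.query("arrow"))  # served from the cache
    print(handler.backend_calls)
```

A production cache would also need an eviction and invalidation policy, since the resources behind each server can change; none is shown here.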

25 Conclusions
• Experiment
  A first network with Nancy (FR), Birmingham (UK), Leiden (NL) [, Pisa (IT)]
  Check demo availability at http://www.loria.fr/projets/MLIS/ELAN
• Genericity of the model
  – Coping with other distributed information environments

26 Perspectives
  – Specific problems associated with linguistic resources
  – Clusters of documents (e.g. multilingual alignment): RDF?
  – On-line editing/annotation of documents
  – Aiming at a moving target
    XSL: self-contained filtering mechanisms
    XQL: real DB+query engines associated with XML?
  – Still: experimenting is VERY useful to understand problems and make things evolve

