1 MPI WP2/3 Report Metadata Integrated Resource Domain Portal Creation
Peter Wittenburg, MPI for Psycholinguistics, Nijmegen, NL
INTERA WP2 Summary, November 2004

2 What is Metadata?
Diagram labels: Annotation Resource; Sound Resource; Video Resource; Metadata Description
Metadata Description (about): Language, Researcher, Modalities, Content Type, Informant Name, Age, Microphone Type, Resource Pointers, etc.
Primary Functions of MD
  visibility of resources
  searching/browsing
  organization of corpus
  management of corpus
  event documentation
  etc.
Emerging Functions of MD
  metadata is a virtual fingerprint of the resource
  can be used instead of the resource
  ready for the Semantic Web: virtual resource domains

3 Metadata Process
Resource Creation (Language Resource)
  the creation process is iterative, mostly very complex and dependent on the resource type
  can be any type of Language Resource (Annotated Media, Lexica, Grammars, etc.)
MD Creation (Metadata Description)
  IMDI provides a core description and special extensions for resource types
  the creation process is comparatively simple; any time the resource is updated, some MD information has to be updated as well
Large Collection of LR
  can be grouped to large distributed LR collections
Large Catalogue of MD
  can be grouped to large distributed MD catalogues
Content Search; MD searching for resources possible

4 Strategic Goals and Impact
strategic goals are about survival after project lifetime
  stimulate the idea of building a joint metadata domain
  "critical mass" idea
  ISO standardization based
impact
  from a few subcontractors to over 50 institutions world-wide
  ISO TC37/SC4 standardization activity (ISO, ->industry)
  LIRICS: adaptation of relevant tools to ISO DCR
  DAM-LR: bring the DELAMAN archives into Data-GRID
  web-based exploration and commentary frameworks
  MPI, CMU, U Melbourne, etc. working on this
but
  metadata creation is hard, it also means organizing, cleaning …
  needs more evangelization and benefits

5 DAM-LR/DELAMAN GRID
EMELD, ELAR, INL, MPI, Lund, ANLC, AILLA, AMPM, LACITO, PARADISEC

6 Stabilization and Framework
IMDI 3.04 now stable and part of ISO standardization efforts
  all categories are in ISO DCR (WP3)
  DCR is key element on the way to Semantic Web
IMDI infrastructure now mature and stable (open source, free)
  professional IMDI Editor (creating correct IMDI XML)
  CV editor
  IMDI browser (can operate in linked IMDI XML domains)
  gateway to OLAC and Dublin Core
  HTML browsing
  Google-like and complex searching
  Access Rights Management
  portal creation
  web-based Ingestion (not Intera, in progress)
  web-based exploration (not Intera, in progress)

7 WP3 Issues
Getting Metadata into the Semantic Web Framework
  just this whole week ISO TC37/SC4 meeting in Pisa
  IMDI is in the ISO DCR
  all ISO and ISO compliant
  localization of IMDI in DCR (Se, Gr, D, E, Fr, Nl, It, Sp)
  ISO DCR is based on XML (not RDF)
  SYNTAX tool at LORIA is web-accessible
next steps:
  integrate OLAC(DC) and TEI (LIRICS)
  link tools with SYNTAX via Web-services
  already done for a lexicon tool
still deep discussions (is_a, has_a relation)
  separate relation repositories (in RDF/OWL of course)
  different layers of DCRs remains an issue

8 WP3 DCR

9 IMDI Editor
also supports node creation and profiles

10 Corpus Structure Building

11 IMDI Browser
also supports lexica, catalogue metadata and profiles

12 Structured IMDI Search

13 HTML Browsing

14 Unstructured Search

15 Access Rights Management

16 MD Infrastructure/Portal
Diagram labels: Browsing & Searching (IMDI Browser & IE); IMDI Domain via INTERNET; corpus structure generation; MPI; BAS; Metadata Editing (IMDI Editor, Excel); S (repeated node label); Corpus exploitation (WP4)
INTERA Review November 2003, HRELP Workshop London November 2003

17 INTERA Domain
State INTERA sub-contracts
Columns: Partner; Subcontractor; Corpus; Type
MPI; BAS; Smartkom; multimodal; integrated
MPI; BAS; Verbmobil and others; Speech, text
MPI; Meertens; Dialect Corpus; speech
MPI; U Florence; Lablita, CORAL ROM; speech, text
MPI; Semantics ext
MPI; Dutch Spoken Corpus, Gesture corpus, ESF Second Learner Corpus, PMOLL Corpus, various others; sign, speech, text
USAAR; DFKI; Negra, Tiger; annotated text; to be integrated
USAAR; CLPP; Bulg HPSG treebank
USAAR; U Iasi; 1984; text
LORIA; ATILF; Frantext, etc
ELDA; catalogue resources; various
ILSP/ILC; textual corpora

18 IMDI Domain
Europe: ELRA Paris; University of Uppsala; INALF Nancy; DFKI Saarbrücken; University of Saarland; Bavarian Speech Archive Munich; Meertens Institute Amsterdam; University of Florence; ILSP Athens; ILC Pisa; University of Madrid; Max-Planck-Institute Nijmegen; University of Kiel; University of Bochum; Free University of Berlin; University of Bonn; University of Bielefeld; University of Helsinki; Phonogrammarchiv Vienna; University of Groningen; Kotus Project Helsinki; Sweden's National Dialect Archive Lund; European Sign Language Communities (Se, UK, NL, D); University of Utrecht; University of Uppsala; University of Stavanger; University of Lund; University of Leipzig; University of Erfurt; University of Leiden; University of Frankfurt
International: Federal University of Rio de Janeiro; University of Colorado; University of Buenos Aires; University of Kansas; University of Victoria; University of Sydney; University of Melbourne; E Michigan University; Wayne State University; AILLA Austin
Big problem: integration and portal effort

19 MD Creation Problems
contracts are difficult: much overhead for little money
  no broad experience with MD creation
  much interaction necessary over all aspects
  no standard contract form: adaptations needed
  institutes often wanted more money than expected
  rather chaotic situation in some cases as basis
  in some cases no familiarity with XML
  problems with changing student assistants
  special wishes wrt MD (IMDI flexible enough)
  MPI expected stepwise availability: delivery at the end is the practice
Conclusions
  strong support for the ENABLER declaration necessary
  creating MD remains extra work

20 Portal Creation – XML Browsing
Task: creation of a web site that offers all options for a selected domain of IMDI resources
just get the URLs and create a root node
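
The root-node step can be pictured as writing one small XML file that links to the top node of each repository. A minimal sketch in Python, under stated assumptions: the element names (METATRANSCRIPT, Corpus, Name, CorpusLink) are modeled on IMDI corpus files but are simplified and not validated against the IMDI schema, and the build_root_node helper and the example.org URLs are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET


def build_root_node(portal_name, links):
    """Return XML text for a root node that links to each repository.

    links is a list of (display_name, url) pairs. The element names are
    a simplified assumption modeled on IMDI corpus files, not the schema.
    """
    root = ET.Element("METATRANSCRIPT", {"Type": "CORPUS"})
    corpus = ET.SubElement(root, "Corpus")
    ET.SubElement(corpus, "Name").text = portal_name
    for display_name, url in links:
        # One link element per repository entry point.
        link = ET.SubElement(corpus, "CorpusLink", {"Name": display_name})
        link.text = url
    return ET.tostring(root, encoding="unicode")


# Hypothetical repository entry points (placeholders, not real addresses).
xml_text = build_root_node(
    "Example Portal",
    [("Archive A", "http://example.org/a/top.imdi"),
     ("Archive B", "http://example.org/b/top.imdi")],
)
print(xml_text)
```

A browser that understands linked nodes only needs this one file as its entry point; everything else is reached by following the links.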

21 Portal Creation – Searching
harvest all data by traversing links and validate
create a fast index file (using Java Library DBMS)
just select a button in the browser
so: simple, everyone can set up a portal
Diagram labels: Portal Node; IMDI Repositories; Fast Index
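
The harvesting step above amounts to a graph traversal followed by index construction. A rough sketch in Python, under stated assumptions: the repository is simulated by an in-memory dict (FAKE_REPO) instead of HTTP fetches, the node layout (Corpus, CorpusLink, Session, Description) is a simplified stand-in for IMDI files, validation is reduced to an XML well-formedness check, and a plain dict plays the role of the fast index file. None of these names come from the IMDI tools themselves.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict, deque

# Simulated repository: node id -> XML text (hypothetical test data).
FAKE_REPO = {
    "root": "<Corpus><CorpusLink>a</CorpusLink><CorpusLink>b</CorpusLink></Corpus>",
    "a": "<Session><Description>Dutch dialect speech</Description></Session>",
    "b": "<Corpus><CorpusLink>c</CorpusLink><CorpusLink>broken</CorpusLink></Corpus>",
    "c": "<Session><Description>gesture video Dutch</Description></Session>",
    "broken": "<Session><Name>oops</Session>",  # deliberately malformed
}


def harvest(start, fetch):
    """Breadth-first traversal of linked nodes.

    Returns (sessions, rejected): sessions maps node id to description
    text, rejected lists nodes that failed the well-formedness check.
    """
    sessions, rejected = {}, []
    seen, queue = {start}, deque([start])
    while queue:
        url = queue.popleft()
        try:
            node = ET.fromstring(fetch(url))
        except ET.ParseError:
            rejected.append(url)
            continue
        if node.tag == "Session":
            sessions[url] = node.findtext("Description", default="")
        for link in node.iter("CorpusLink"):
            target = (link.text or "").strip()
            if target and target not in seen:
                seen.add(target)
                queue.append(target)
    return sessions, rejected


def build_index(sessions):
    """Inverted index: lower-cased word -> set of node ids containing it."""
    index = defaultdict(set)
    for url, text in sessions.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


sessions, rejected = harvest("root", FAKE_REPO.__getitem__)
index = build_index(sessions)
print(sorted(index["dutch"]), rejected)
```

Keeping the list of rejected nodes separate from the index means that one broken file does not stop the harvest, and the portal operator can report it back to the provider.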

22 Portal Creation – HTML Support
install Tomcat server and IMDI-Web-Interface
software traverses tree to establish database
large index file is created under the cover
give an HTML entry point (HTTP server)
Diagram labels: Web Client; TOMCAT Server; Web-Server MPI; Web-Server BAS; IMDI-Web-Interface; Database; Portal Site; IMDI Provider; IMDI Provider

23 Portal Creation – DC/OLAC Gateway
the database can be used to fulfill the OAI protocol for metadata harvesting; any record can be served
Diagram labels: DC Service Provider; OAI-PMH; Servlet; Portal Node; Fast Index; IMDI Repositories
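
To serve a record through OAI-PMH, a gateway has to express it in unqualified Dublin Core, the oai_dc format that the protocol requires every repository to support. The Python sketch below shows one possible mapping. The two namespace URIs are the standard ones; the input field names, the FIELD_TO_DC table and the to_oai_dc helper are illustrative assumptions, not the mapping used by the IMDI gateway.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Assumed mapping from record fields to Dublin Core element names.
FIELD_TO_DC = {
    "name": "title",
    "description": "description",
    "language": "language",
    "researcher": "creator",
    "date": "date",
}


def to_oai_dc(record):
    """Render a metadata record (a dict) as an oai_dc XML fragment."""
    ET.register_namespace("oai_dc", OAI_DC_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for field, dc_name in FIELD_TO_DC.items():
        value = record.get(field)
        if value:  # skip fields the record does not provide
            ET.SubElement(root, f"{{{DC_NS}}}{dc_name}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical record as it might come out of the portal database.
print(to_oai_dc({"name": "Session 1", "language": "Dutch", "date": "2004-11-01"}))
```

A harvester would request such a record with the protocol's GetRecord verb and metadataPrefix=oai_dc; fields with no Dublin Core counterpart are simply left out, which is why a gateway of this kind loses some detail.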

24 Dissemination
Dissemination / Events
  International Metadata Workshop, Nijmegen, November 02
  Open Forum on Metadata Registries, Santa Fe, January 03
  Lexicon Workshop, Munich, February 03
  Workshop on Resource Storage and Access, Göttingen, February 03
  International Workshop on LR Archiving, London, March 03
  Sign Language Workshop, Nijmegen, May 03
  International E-Meld Workshop, Ypsilanti, July 03
  International Linguistic Congress, Prague, July 03
  ENABLER Workshop, Paris, August 03
  DRH Meeting, Cheltenham, September 03
  International PARADISEC Archiving Workshop, Sydney, October 03
  HRELP Archiving Workshop, London, November 03
  etc.
LREC 2004: demonstration of infrastructure and MD domain
Two Metadata Flyers (MPI, U Lund) distributed at various occasions
Web-Site Design
several training workshops done

25 INTERA Portal Screenshots
