MPI WP2/3 Report: Metadata, Integrated Resource Domain, Portal Creation. Peter Wittenburg, MPI for Psycholinguistics, Nijmegen, NL. INTERA WP2 Summary.


1 MPI WP2/3 Report
Metadata, Integrated Resource Domain, Portal Creation
Peter Wittenburg, MPI for Psycholinguistics, Nijmegen, NL
INTERA WP2 Summary, November 2004

2 What is Metadata?
[Diagram: a Metadata Description (Language, Researcher, Modalities, Content Type, Informant Name, Age, Microphone Type, Resource Pointers, etc.) is "about" the Annotation Resource, Sound Resource and Video Resource]
Primary Functions of MD
- visibility of resources
- searching/browsing
- organization of corpus
- management of corpus
- event documentation
- etc.
Emerging Functions of MD
- metadata is virtual fingerprint of the resource
- can be used instead of resource
- ready for the Semantic Web: virtual resource domains
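The fields named on this slide can be pictured as a small record type. The Python sketch below is only an illustration: the class and attribute names are assumptions modelled on the slide labels (they are not the IMDI schema), and all sample values are invented. It shows the "fingerprint" idea, where a search runs over the descriptions and never opens the resources themselves.

```python
from dataclasses import dataclass, field


@dataclass
class Informant:
    name: str
    age: int


@dataclass
class MetadataDescription:
    # Attribute names follow the slide labels; they are NOT the IMDI schema.
    language: str
    researcher: str
    modalities: list[str]
    content_type: str
    informants: list[Informant] = field(default_factory=list)
    microphone_type: str = ""
    # Pointers to the described resources, keyed by a hypothetical role name.
    resource_pointers: dict[str, str] = field(default_factory=dict)


def find(catalogue, language=None, modality=None):
    """Search the descriptions only; the resources are never opened."""
    hits = []
    for md in catalogue:
        if language is not None and md.language != language:
            continue
        if modality is not None and modality not in md.modalities:
            continue
        hits.append(md)
    return hits


# Invented sample values, for illustration only.
catalogue = [
    MetadataDescription("Dutch", "Researcher A", ["speech"], "interview",
                        [Informant("Informant 1", 34)], "headset",
                        {"sound": "session1.wav"}),
    MetadataDescription("Dutch", "Researcher B", ["speech", "gesture"], "narrative",
                        resource_pointers={"video": "session2.mpg"}),
]
```

Calling `find(catalogue, modality="gesture")` returns only the second description, without touching the video file it points to.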

3 Metadata Process
[Diagram: Language Resource Creation produces a Language Resource; MD Creation produces a Metadata Description; resources form a Large Collection of LR (Content Search); descriptions form a Large Catalogue of MD (MD Search)]
- Language Resource: can be any type of language resource (annotated media, lexica, grammars, etc.)
- Language Resource Creation: the creation process is iterative, mostly very complex, and dependent on the resource type
- Metadata Description: IMDI provides a core description and special extensions for resource types
- MD Creation: the creation process is comparatively simple; any time the resource is updated, some MD information has to be updated as well
- Large Collection of LR: can be grouped to large distributed LR collections
- Large Catalogue of MD: can be grouped to large distributed MD catalogues
- MD Search: searching for resources possible

4 Strategic Goals and Impact
- strategic goals are about survival after project lifetime
- stimulate the idea of building a joint metadata domain
- "critical mass" idea
- ISO standardization based
- impact
- from few subcontractors to over 50 institutions world-wide
- ISO TC37/SC4 standardization activity (ISO, -> industry)
- LIRICS: adaptation of relevant tools to ISO DCR
- DAM-LR: bring the DELAMAN archives into Data-GRID
- web-based exploration and commentary frameworks: MPI, CMU, U Melbourne, etc. working on this
- but metadata creation is hard; it also means organizing, cleaning …
- needs more evangelization and benefits

5 DAM-LR/DELAMAN GRID
[Diagram nodes: MPI, AILLA, EMELD, ANLC, LACITO, ELAR, PARADISEC, AMPM, Lund, INL]

6 Stabilization and Framework
IMDI 3.04 now stable and part of ISO standardization efforts
- all categories are in ISO DCR (WP3)
- DCR is key element on the way to Semantic Web
IMDI infrastructure now mature and stable (open source, free)
- professional IMDI Editor (creating correct IMDI XML)
- CV editor
- IMDI browser (can operate in linked IMDI XML domains)
- gateway to OLAC and Dublin Core
- HTML browsing
- Google-like and complex searching
- Access Rights Management
- portal creation
- web-based ingestion (not Intera, in progress)
- web-based exploration (not Intera, in progress)

7 WP3 Issues
Getting Metadata into the Semantic Web Framework
- just this whole week ISO TC37/SC4 meeting in Pisa
- IMDI is in the ISO DCR
- all ISO and ISO compliant
- localization of IMDI in DCR (Se, Gr, D, E, Fr, Nl, It, Sp)
- ISO DCR is based on XML (not RDF)
- SYNTAX tool at LORIA is web-accessible
- next steps: integrate OLAC (DC) and TEI (LIRICS)
- link tools with SYNTAX via Web services; already done for a lexicon tool
- still deep discussions (is_a, has_a relation)
- separate relation repositories (in RDF/OWL of course)
- different layers of DCRs remains an issue

8 WP3 DCR

9 IMDI Editor
also supports node creation and profiles

10 Corpus Structure Building

11 IMDI Browser
also supports lexica, catalogue metadata and profiles

12 Structured IMDI Search

13 HTML Browsing

14 Unstructured Search

15 Access Rights Management

16 MD Infrastructure/Portal
[Diagram labels: corpus structure generation; Metadata Editing (IMDI Editor, Excel); IMDI Domain via Internet (BAS, MPI); Browsing & Searching (IMDI Browser & IE); Corpus exploitation (WP4)]
(slide footers: INTERA Review, November 2003; HRELP Workshop, London, November 2003)

17 State of INTERA sub-contracts
Partner | Subcontractor | Corpus | Type | INTERA Domain
MPI | BAS | Smartkom | multimodal | integrated
MPI | BAS | Verbmobil and others | speech, text | integrated
MPI | Meertens | Dialect Corpus | speech | integrated
MPI | U Florence | Lablita | speech text | integrated
MPI | U Florence | CORAL ROM | Semantics ext | integrated
MPI | | Dutch Spoken Corpus | speech text | integrated
MPI | | Gesture corpus | multimodal | integrated
MPI | | ESF Second Learner Corpus | speech text | integrated
MPI | | PMOLL Corpus | speech text | integrated
MPI | | various others | sign speech text | integrated
USAAR | DFKI | Negra, Tiger | annotated text | to be integrated
USAAR | CLPP Bulg | HPSG | treebank | to be integrated
USAAR | U Iasi | 1984 | text | to be integrated
LORIA | ATILF | Frantext, etc. | text | to be integrated
ELDA | | catalogue resources | various | integrated
ILSP/ILC | | textual corpora | various | integrated

18 IMDI Domain
Europe: ELRA Paris; INALF Nancy; DFKI Saarbrücken; University of Saarland; Bavarian Speech Archive Munich; Meertens Institute Amsterdam; University of Florence; ILSP Athens; ILC Pisa; University of Madrid; Max-Planck-Institute Nijmegen; University of Kiel; University of Bochum; Free University of Berlin; University of Bonn; University of Bielefeld; University of Helsinki; Phonogrammarchiv Vienna; University of Groningen; Kotus Project Helsinki; Sweden's National Dialect Archive Lund; European Sign Language Communities (Se, UK, NL, D); University of Utrecht; University of Uppsala; University of Stavanger; University of Lund; University of Leipzig; University of Erfurt; University of Leiden; University of Frankfurt; …
International: Federal University of Rio de Janeiro; University of Colorado; University of Buenos Aires; University of Kansas; University of Victoria; University of Sydney; University of Melbourne; E Michigan University; Wayne State University; AILLA Austin; …
Big problem: integration and portal effort

19 MD Creation Problems
Conclusions
- contracts are difficult: much overhead for little money
- no broad experience for MD creation
- much interaction necessary over all aspects
- no standard contract form: adaptations needed
- institutes often wanted more money than expected
- rather chaotic situation in some cases as basis
- in some cases no familiarity with XML
- problems with changing student assistants
- special wishes with respect to MD (IMDI flexible enough)
- MPI expected stepwise availability; in practice, delivery comes at the end
- strong support for the ENABLER declaration necessary
- creating MD remains extra work

20 Portal Creation: XML Browsing
Task: creation of a web site that offers all options for a selected domain of IMDI resources
- just get the URLs and create a root node
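The "get the URLs and create a root node" step can be pictured with a short Python sketch. Everything in it is an assumption made for illustration: the element names `PortalRoot` and `Link`, the function `make_root_node`, and the example.org URLs are hypothetical and do not reproduce the IMDI XML format.

```python
import xml.etree.ElementTree as ET


def make_root_node(portal_name, repositories):
    """Build a root node that simply links to each repository's top node.

    `repositories` maps a display name to the URL of that repository's
    top-level metadata file. The element names are illustrative only.
    """
    root = ET.Element("PortalRoot", {"Name": portal_name})
    for name, url in repositories.items():
        link = ET.SubElement(root, "Link", {"Name": name})
        link.text = url
    return ET.tostring(root, encoding="unicode")


# Hypothetical repository URLs.
root_xml = make_root_node("Demo portal", {
    "Repository A": "http://example.org/a/top.xml",
    "Repository B": "http://example.org/b/top.xml",
})
```

The design point is that the portal owns only this one small file; the linked metadata stays with its providers.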

21 Portal Creation: Searching
[Diagram labels: Portal Node, IMDI Repositories, Fast Index]
- harvest all data by traversing links and validate
- create a fast index file (using Java Library DBMS)
- just select a button in the browser
- so: simple, everyone can set up a portal
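The harvest-then-index idea on this slide can be sketched in a few lines of Python. This is not the INTERA implementation (the slide mentions a Java library DBMS): the functions `harvest`, `build_index` and `search`, the `fetch` callable, and the in-memory demo nodes are all assumptions chosen to keep the example self-contained.

```python
from collections import deque


def harvest(root_url, fetch):
    """Breadth-first traversal of linked metadata nodes.

    `fetch(url)` is a hypothetical callable returning a dict with
    'links' (child URLs) and 'fields' (descriptive key/value pairs),
    or None when a node cannot be read or fails validation.
    """
    seen, queue, records = {root_url}, deque([root_url]), {}
    while queue:
        url = queue.popleft()
        node = fetch(url)
        if node is None:
            continue  # skip unreadable or invalid nodes
        if node.get("fields"):
            records[url] = node["fields"]
        for child in node.get("links", []):
            if child not in seen:  # guards against cyclic links
                seen.add(child)
                queue.append(child)
    return records


def build_index(records):
    """Inverted index: lower-cased word -> set of record URLs."""
    index = {}
    for url, fields in records.items():
        for value in fields.values():
            for word in str(value).lower().split():
                index.setdefault(word, set()).add(url)
    return index


def search(index, query):
    """Return the records that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    return set.intersection(*(index.get(w, set()) for w in words))


# Invented demo nodes standing in for remote metadata files.
DEMO = {
    "portal": {"links": ["repo-a", "repo-b"], "fields": {}},
    "repo-a": {"links": ["a/s1"], "fields": {}},
    "repo-b": {"links": ["b/s1", "portal"], "fields": {}},  # links back to the root
    "a/s1": {"links": [], "fields": {"Title": "Dialect interview", "Modality": "speech"}},
    "b/s1": {"links": [], "fields": {"Title": "Gesture study", "Modality": "multimodal speech"}},
}

records = harvest("portal", DEMO.get)
index = build_index(records)
```

Building the index once at harvest time is what makes later queries cheap: `search(index, "gesture speech")` is a set intersection and never revisits the repositories.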

22 Portal Creation: HTML Support
[Diagram labels: Database, Web-Server BAS, Web-Server MPI, TOMCAT Server, IMDI-Web-Interface, Web Client, Portal Site, IMDI Provider]
- install Tomcat server and IMDI-Web-Interface software
- software traverses the tree to establish the database
- a large index file is created behind the scenes
- provide an HTML entry point (HTTP server)

23 Portal Creation: DC/OLAC Gateway
[Diagram labels: Portal Node, IMDI Repositories, Fast Index, Servlet, OAI-PMH, DC Service Provider]
- the database can be used to fulfill the OAI protocol for metadata harvesting; any record can be served
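To make the gateway idea concrete, the Python sketch below serializes one stored record as an `oai_dc` metadata block of the kind an OAI-PMH response carries. The two namespace URIs are the standard OAI and Dublin Core ones; the field-to-element mapping `FIELD_TO_DC`, the function `to_oai_dc`, and the sample record are assumptions for illustration and do not reproduce the actual IMDI-to-DC mapping.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical mapping from stored field names to Dublin Core elements.
FIELD_TO_DC = {
    "Title": "title",
    "Language": "language",
    "Date": "date",
    "Description": "description",
}


def to_oai_dc(record):
    """Serialize the mappable fields of one record as an oai_dc block."""
    ET.register_namespace("oai_dc", OAI_DC)
    ET.register_namespace("dc", DC)
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for field_name, dc_name in FIELD_TO_DC.items():
        value = record.get(field_name)
        if value:  # unmapped or empty fields are simply left out
            ET.SubElement(root, f"{{{DC}}}{dc_name}").text = value
    return ET.tostring(root, encoding="unicode")


# Invented sample record; "Researcher" has no mapping and is dropped.
dc_xml = to_oai_dc({
    "Title": "Dialect interview",
    "Language": "Dutch",
    "Researcher": "Researcher A",
})
```

Fields without a Dublin Core counterpart are dropped here, which reflects the general trade-off of such a gateway: the richer description stays in the source repository while harvesters see a reduced view.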

24 Dissemination
Dissemination / Events
Event | Place | Date
Intern Metadata Workshop | Nijmegen | November 02
Open Forum on Metadata Registries | Santa Fe | January 03
Lexicon Workshop | Munich | February 03
Workshop on Resource Storage and Access | Göttingen | February 03
Intern Workshop on LR Archiving | London | March 03
Sign Language Workshop | Nijmegen | May 03
Intern E-Meld Workshop | Ypsilanti | July 03
Intern Linguistic Congress | Prague | July 03
ENABLER Workshop | Paris | August 03
DRH Meeting | Cheltenham | September 03
Intern PARADISEC Archiving Workshop | Sydney | October 03
HRELP Archiving Workshop | London | November 03
etc.
- LREC 2004: demonstration of infrastructure and MD domain
- two metadata flyers (MPI, U Lund) distributed at various occasions
- Web-Site Design
- several training workshops done

25 INTERA Portal Screenshots


