
Aug 2-5, 2002 EMELD Workshop 2002 1 Overview & Update Helen Aristar Dry The LINGUIST List & Eastern Michigan University EMELD Workshop on The Digitization.


1 Overview & Update
Helen Aristar Dry
The LINGUIST List & Eastern Michigan University
EMELD Workshop on The Digitization of Lexical Data
Aug. 2-5, 2002

2 What Is E-Meld?
“Electronic Metastructure for Endangered Languages Data”
• 5-year collaborative project, begun Sept. 2001
• Participants:
    • The LINGUIST List (Eastern Michigan U., Wayne State U., U. of Arizona)
    • The Linguistic Data Consortium (University of Pennsylvania)
    • The Endangered Languages Fund (Yale University, Haskins Laboratories)
• Funded by NSF

3 The LINGUIST List
16,500 subscribers
106 different countries
4 European mirror sites: Tübingen, Stockholm, Edinburgh, Moscow

4 Objectives
To aid in …
• …the preservation of Endangered Languages data and documentation
• …the development of infrastructure for linguistic archives

5 Components
• Metadata server facilitating access to language resources
• Promulgation of best practice in:
    • Language identification
    • Resource description
    • Markup or annotation
• Involvement of linguistic community in deciding best practice
• Query Room, where questions can be addressed to native speakers
• Demonstration project: texts and lexicons from 10 EL’s marked up according to best practice

6 Languages
• Mocovi (Guaicuruan), 7000 speakers [Grondona]
• Biao Min (Mienic), 21,000 speakers [Solnit]
• Ega (Kwa), 300 speakers [Gibbon, Connell]
• Cambap (Mambiloid), 30 speakers [Connell]
• Lakota (Macro-Siouan) [Whalen]
• Tofa (Turkic) [Harrison]
• Two from: Alamblak, Dadibi, Mapos Buang, Takaulu Kalagan, Tuwali Ifugao [SIL]
• Two from Post-Docs, as yet to be determined

7 Outreach
• Workshops
    • 2001 – Santa Barbara, CA
        • focus: metadata, markup, language codes
    • 2002 – Ann Arbor/Ypsilanti, MI
        • focus: lexicon markup & metadata
    • 2003, 2004: workshops
    • 2005, 2006: “digital institutes”

8 Project Emphasis: Breadth
• Widest access to information
• Web-based tools
• Open standards
• Simple interfaces

9 2001-2 Progress
• Metadata Collection:
    • Search facility
    • Metadata editor
• Language Identification
• Query Room
• Markup Ontology (U. of Arizona)
Slide callouts:
ORE
Ethnologue + LL Codes: used throughout LL site
OLAC Service Provider (ELF & Rosetta)

10 Markup
• Focus: morphosyntactic markup
• Objective: a system which allows:
    • Field workers to submit data in different markups
    • Searcher to retrieve all relevant data despite varying markups
• No “gold standard” in linguistic markup
• Instead: ontology to serve as “interlanguage” for translation among markups
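
The "interlanguage" idea on this slide can be illustrated with a minimal Python sketch. It assumes each markup scheme is just a table from tags to shared concepts. The scheme names, tags, and concept labels below are all hypothetical and are not taken from the project's ontology.

```python
# Hypothetical tag inventories: two field workers mark the same categories differently.
# The concept names on the right stand in for entries in a shared ontology.
TO_CONCEPT = {
    "scheme_a": {"PST": "PastTense", "PL": "PluralNumber", "1": "FirstPerson"},
    "scheme_b": {"past": "PastTense", "plur": "PluralNumber", "1p": "FirstPerson"},
}


def to_concepts(scheme, tags):
    """Map the tags of one markup scheme onto ontology concepts."""
    table = TO_CONCEPT[scheme]
    return [table[tag] for tag in tags]


def translate(tags, source, target):
    """Translate tags between two schemes by going through the ontology."""
    from_concept = {concept: tag for tag, concept in TO_CONCEPT[target].items()}
    return [from_concept[concept] for concept in to_concepts(source, tags)]


def search(corpus, concept):
    """Return ids of records whose markup, in any scheme, expresses the concept."""
    return [
        record_id
        for record_id, scheme, tags in corpus
        if concept in to_concepts(scheme, tags)
    ]
```

With this layout, adding a further markup scheme means adding one table to the ontology mapping rather than one translator per pair of schemes, which is presumably the appeal of an interlanguage over pairwise conversion.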

11 Markup
• Tool to translate common markup formats (RDF, Shoebox, Word) into XML
• Tool to help linguist identify aspects of markup with concepts in the ontology
• More on this today from Langendoen, Lewis, and Farrar

12 Data Input Tool
• Web-based
• Potentially portable
• Creates database input, to be output as XML
• Can be customized to fit individual language
• More on this tomorrow from Martha Ratliff & Zhenwei Chen

13 Affiliation w/OLAC
• Resource identification
• OLAC Service Provider
• OLAC = Open Language Archives Community
• Part of Open Archives Initiative
• Multi-disciplinary initiative to promote multi-archive searching via http protocols

14 OLAC Metadata Set
• Contributor
• Coverage
• Creator
• Date
• Description
• Format
• Identifier
• Language
• Publisher
• Relation
• Rights
• Source
• Subject
• Title
• Type
Based on Dublin Core Set of 15 Elements
With 2 refinements: Subject.language, Type.linguistic
Type.linguistic: Draft of controlled vocabulary
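
A metadata record built from this element set might look roughly like the following Python sketch. It only checks field names against the 15 elements and 2 refinements listed on the slide and serializes them to XML. The `record` wrapper, the lowercase element names, and the `refine` attribute are illustrative assumptions, not the actual OLAC schema, and the sample values are made up.

```python
import xml.etree.ElementTree as ET

# The 15 elements and 2 refinements named on the slide.
ELEMENTS = [
    "Contributor", "Coverage", "Creator", "Date", "Description",
    "Format", "Identifier", "Language", "Publisher", "Relation",
    "Rights", "Source", "Subject", "Title", "Type",
]
REFINEMENTS = {"Subject.language", "Type.linguistic"}


def make_record(fields):
    """Serialize (name, value) pairs to XML, rejecting names outside the set.

    The XML layout is a hypothetical one chosen for illustration only.
    """
    record = ET.Element("record")
    for name, value in fields:
        base, _, refinement = name.partition(".")
        if base not in ELEMENTS or (refinement and name not in REFINEMENTS):
            raise ValueError(f"not in the metadata set: {name}")
        element = ET.SubElement(record, base.lower())
        if refinement:
            element.set("refine", refinement)
        element.text = value
    return ET.tostring(record, encoding="unicode")


# Made-up sample values for a lexicon resource.
sample = make_record([
    ("Title", "Sample lexicon"),
    ("Subject.language", "Ega"),
    ("Type.linguistic", "lexicon"),
])
```

Keeping the refinements as qualifiers on `Subject` and `Type` means a consumer that only understands the 15 base elements can still read the record.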

15 [Diagram: data providers (Data Provider (Archive), Data Provider 2: Individual, Data Provider 3 (Archive)) and the OLAC Service Provider (LINGUIST List), linked by arrows labeled "Metadata" and "http: GET or POST"]
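
The "http: GET or POST" link in the diagram can be sketched in Python as follows, assuming the service provider harvests with Open Archives Initiative protocol requests. The base URL is hypothetical, and the `oai_dc` metadata prefix is an assumption about what a given data provider offers. The sketch only builds the request and does not contact any server.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Request verbs defined by the Open Archives Initiative harvesting protocol.
VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}


def harvest_request(base_url, verb, use_post=False, **arguments):
    """Build (but do not send) a harvesting request as either GET or POST."""
    if verb not in VERBS:
        raise ValueError(f"unknown verb: {verb}")
    query = urlencode({"verb": verb, **arguments})
    if use_post:
        # POST carries the same arguments form-encoded in the request body.
        return Request(
            base_url,
            data=query.encode("ascii"),
            headers={"Content-Type": "application/x-www-form-urlencoded"},
        )
    # GET carries the arguments in the query string.
    return Request(f"{base_url}?{query}")


# Hypothetical data provider address.
BASE_URL = "http://archive.example.org/oai"
get_request = harvest_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc")
post_request = harvest_request(BASE_URL, "ListRecords", use_post=True, metadataPrefix="oai_dc")
```

Passing either request object to `urllib.request.urlopen` would send it. That step is left out here because the address is a placeholder.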

16 On LINGUIST
• OLAC Search: http://linguistlist.org/olac/
    • 18 archives, 30,000+ records
• Metadata Editor (ORE): http://linguistlist.org/olac/ore/
    • Form-based editor
    • Creates OLAC metadata in XML
    • Makes it available to OLAC search engine
• Language Lookup: http://linguistlist.org/languages

