Nov 21, 2005, University of Texas at Austin: The E-MELD Project. Helen Aristar Dry & Anthony Aristar, The LINGUIST List, Eastern Michigan U & Wayne State U.

1 The E-MELD Project
Helen Aristar Dry & Anthony Aristar
The LINGUIST List, Eastern Michigan U & Wayne State U
Nov 21, 2005, University of Texas at Austin

2 E-MELD: Electronic Metastructure for Endangered Languages Documentation
- 5-year NSF project, 2001-6
- Linguist List, ELF, LDC
- Goal: to aid in
  - the preservation of endangered languages data
  - the development of infrastructure for electronic archives

3 Summary of the problem (2001)
EL resources were/are:
- Difficult to find
- Difficult to use
- Difficult to preserve
Needed:
- More uniformity in naming, cataloguing, annotating, i.e., interoperable standards
- More knowledge of how to create digital resources that last

4 Problems with EL resources
- Difficult to find
  - At distributed sites
  - Language names ambiguous
  - No central catalog of resources or cataloging information (metadata)
  - Lack of interoperability among archives
- Difficult to display accurately
  - Idiosyncratic character encoding
  - Specific fonts needed

5 Problems with EL resources, 2
- Difficult to compare
  - Non-standard terminology
  - Idiosyncratic markup & annotation schemes
- Difficult to manipulate or reuse
  - Specific software needed (incl. specific software version), e.g. MSWord 1.0
  - Meaning represented via formatting, which was not documented
    - bold represents “headword”
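The point about undocumented formatting can be illustrated with a minimal sketch. It assumes a made-up legacy dictionary line in which bold marks the headword and italics mark the part of speech, and rewrites it so that each piece of meaning is named explicitly. The sample entry, the function name `to_explicit_entry`, and the element names `entry`, `headword`, `pos` and `gloss` are all hypothetical; they are not taken from E-MELD or any published schema.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical legacy line: bold = headword, italics = part of speech.
# Nothing in the file itself records that convention.
LEGACY_ENTRY = "<b>kitabu</b> <i>n.</i> book"


def to_explicit_entry(line: str) -> ET.Element:
    """Turn presentation-only markup into named, self-describing fields."""
    match = re.fullmatch(r"<b>(.+?)</b>\s*<i>(.+?)</i>\s*(.+)", line.strip())
    if match is None:
        # Refuse to guess when the line does not follow the assumed layout.
        raise ValueError(f"unrecognized entry layout: {line!r}")
    headword, pos, gloss = match.groups()

    # Element names below are illustrative assumptions, not a standard.
    entry = ET.Element("entry")
    ET.SubElement(entry, "headword").text = headword
    ET.SubElement(entry, "pos").text = pos
    ET.SubElement(entry, "gloss").text = gloss
    return entry


print(ET.tostring(to_explicit_entry(LEGACY_ENTRY), encoding="unicode"))
```

Once the fields are named, the entry no longer depends on a reader knowing what bold was supposed to mean.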

6 Problems with EL resources, 3
Impermanent: vulnerable to
- Deterioration of the physical media
- Hardware obsolescence
- Software obsolescence

7 PHONOGRAMMARCHIV - AUSTRIAN ACADEMY OF SCIENCE (slide from Dietrich Schüller, Director)

8 Toward a Solution: E-MELD Components
- Involve linguistics community in developing standards
- Promote consensus about:
  - Language Identification
  - Metadata
  - Annotation and markup
- Teach and facilitate implementation of “best practices” in the creation of digital language documentation

9 Promoting consensus: annual workshops
- 2001, Santa Barbara, CA: The Need for Standards
- E-MELD 2002, Ann Arbor, MI: Digitizing Lexical Information
- E-MELD 2003, Lansing, MI: Digitizing Texts
- E-MELD 2004, Detroit, MI: Databases and Best Practice
- E-MELD 2005, Cambridge, MA: Linguistic Ontologies & Terminology

10 2006 E-MELD Workshop on Digital Language Documentation
Michigan State University, June 20-22, 2006
In conjunction with the 2006 Summer Meeting of the Linguistic Society of America
Topic: Electronic Archiving and Digital Tools: Current State & Future Directions
Please come!

11 Finding resources: metadata
- OLAC metadata standards (subcommunity of OAI)
- OLAC search engine on LL site: http://linguistlist.org/olac
- OLAC metadata editor on LL site: http://linguistlist.org/olac/ore
- XSL Stylesheets for transformation / presentation of OLAC metadata
- Ethnologue/LL language codes proposed as ISO standard

12 Using resources: comparing and finding annotation
- Ontologies developed (as interlanguage between markups and as search aids)
  - GOLD: General Ontology for Linguistic Description (morphosyntax)
  - OPF: Ontology of Phonetic Features (based on Ladefoged & Maddieson)
- ODIN Project: mining interlinear glossed text on the web (Will Lewis et al.)

13 Using resources: Tools
- Tools to encourage use of the ontology:
  - OntoElan: text annotation (modification of MPI’s Elan)
  - OntoGloss: stand-off annotation tool
  - FIELD: lexical input
- Tool to encourage use of Unicode:
  - CharWrite: input of Unicode characters
- Facility to encourage use of OLAC metadata:
  - Stylesheet library
  - ORE

14 Facilitating ‘Best Practices’ in resource creation
- Creation of reference website: School of Best Practices in Digital Language Documentation, http://emeld.org/school/
- Addressed to the individual linguist who creates language documentation

15 What should the linguist do?
To ensure that digital data endure long into the future:
1. Create an archival copy: put the materials into an enduring file format.
2. Deposit the materials with an archive that will make a practice of periodically migrating them to new storage media as needed.
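As one hedged illustration of step 1, the sketch below re-encodes a text file from a known legacy character encoding into UTF-8 plain text. The slide does not name a specific archival format, so treating Unicode plain text as the "enduring file format" is an assumption here, as are the function names `reencode_to_utf8` and `make_archival_copy` and the choice of Latin-1 for the sample input.

```python
from pathlib import Path


def reencode_to_utf8(raw: bytes, legacy_encoding: str) -> bytes:
    """Decode bytes from a known legacy encoding and re-encode them as UTF-8."""
    # Strict decoding raises UnicodeDecodeError on byte sequences that are
    # invalid in the declared encoding, rather than silently replacing them.
    text = raw.decode(legacy_encoding, errors="strict")
    return text.encode("utf-8")


def make_archival_copy(source: Path, target: Path, legacy_encoding: str) -> None:
    """Write a UTF-8 copy of `source` to `target`; the original is not modified."""
    target.write_bytes(reencode_to_utf8(source.read_bytes(), legacy_encoding))


# Illustrative input: a name stored as Latin-1 bytes (assumed encoding).
legacy_bytes = "Schüller".encode("latin-1")
print(reencode_to_utf8(legacy_bytes, "latin-1").decode("utf-8"))
```

The legacy encoding has to be known in advance; this sketch does not try to detect it, and a wrong guess with a single-byte encoding can still decode without error while producing the wrong characters.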

16 Organization of the School
- Entrance Hall: orientation
- Classroom: lessons & tutorials
- Reading Room: bibliography
- Work Room: online work
- Tool Room: links to tools
- Help (incl. Ask an Expert)
- Case Studies: documentation of 10 ELs digitized according to best practices

17 Currently School has:
Documentation from 12 ELs: Mocovi, Kayardild, Monguor, Potawatomi, Tofa, Ega, Saliba, Navajo, Biao Mien, W. Sissala, (Chorote), (Nivacle)

18 Current Initiatives
- Identify and record metadata for legacy documentation
- Improve the ontology (GOLD): incorporate suggestions from 2005 E-MELD workshop
- Finish prototyped software

19 Future: finish prototyped software
- OntoElan: ontology-aware modification of MPI’s Elan annotation tool
- OntoGloss: ontology-aware stand-off annotation tool
- CharWrite: downloadable tool for web-input of Unicode characters
- FIELD: Field Input Environment for Linguistic Data
All but OntoGloss available through the School of Best Practices website

20 Current Initiatives: School of BP
Make the School even more practical
- Distinguish between good, better, best practice
- Emphasize
  - Explicit ‘how-to’ pages
  - Different paths for different user types
  - Advice from experts, e.g. “equipment on a budget” page, Ask-An-Expert

21 Practices in resource creation
- Good practice: ensure preservation
- Better practice: ensure long-term intelligibility
  - “We don’t want to create another Rosetta Stone” (Whalen, 2003)
- Best practice: promote interoperability

22 School of Best Practices in Digital Language Documentation http://emeld.org/school/

23 Future Directions
- MultiTree
- LL-MAP

24 What is MultiTree?
- 3-year grant
- Database of all hypothesized language relations
- Ultimately linked to GIS database
- Interface to allow linguists to input updates
- Panel of experts to assess input

25 LL-MAP
- Collect geographically linked linguistic data
- Build this into a GIS system, allowing layers of information to be built into a single map
Then…
- Build tools for querying, annotating and discussing this data
- Build tools which allow new language data from linguists and anthropologists to be incorporated into this system

