Customizing the IMDI metadata schema for endangered languages Heidi Johnson (AILLA) Arienne Dwyer (DOBES)

1 Customizing the IMDI metadata schema for endangered languages Heidi Johnson (AILLA) Arienne Dwyer (DOBES)

2 Introduction IMDI: International Standards for Language Engineering Metadata Initiative DOBES: Volkswagen Foundation’s Documentation of Endangered Languages initiative AILLA: the Archive of the Indigenous Languages of Latin America

3 Types of resources Audio and video recordings in various digital formats Annotation text files, e.g. transcriptions and translations Standalone texts, e.g. dictionaries, poetry Wide range of genres: from verbal art to scholarly analyses

4 Bundles of resources Session (IMDI, 2001): resources resulting from a linguistic elicitation session - recordings and annotations. Only models one kind of resource production - a recording session. Collections will include a greater variety of resources, in sets of related materials.

5 Types of bundles Canonical bundle: the original session. A digitized recording, in different formats, and some textual annotation files, also in different formats. Minimal bundle: a single file. Examples: dictionary, poem, recording of uninterpretable chants. Meta-bundle: a bundle containing other bundles. Example: a book about a set of annotated recordings.
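The three bundle types above can be sketched as a small data structure. This is only an illustration, not part of the IMDI schema: the `Bundle` class, its field names, the `bundle_kind` function, and the file names are all hypothetical, and treating "more than one file" as the test for a canonical bundle is a simplifying assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Bundle:
    """Hypothetical stand-in for an archive bundle (not an IMDI type)."""
    name: str
    files: List[str] = field(default_factory=list)          # resource file names
    children: List["Bundle"] = field(default_factory=list)  # nested bundles


def bundle_kind(bundle: Bundle) -> str:
    """Classify a bundle using the three types named on the slide."""
    if bundle.children:
        return "meta-bundle"   # contains other bundles
    if len(bundle.files) == 1:
        return "minimal"       # a single file, e.g. a dictionary
    if len(bundle.files) > 1:
        return "canonical"     # a recording plus annotation files
    return "empty"             # assumption: neither files nor children


# Illustrative file names only.
session = Bundle("session", files=["story.wav", "story.mp3", "story.txt"])
poem = Bundle("poem", files=["poem.pdf"])
book = Bundle("book", files=["book.pdf"], children=[session])
```

Letting a meta-bundle hold both its own files and child bundles mirrors the slide's example of a book about a set of annotated recordings.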

6 Bundle elements Current: –Name of bundle –Date and place of production Proposed: –Resource relations –Date archived –Last modified

7 Major subschemas Project Collector Content Participants Resources References

8 The Content Subschema Genre is the top-level category: –Interaction: conversation, interview … –Explanation: description, recipe … –Performance: narrative, poem, oratory … –Teaching: primer, textbook … –Analysis: grammar, dictionary …
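One way to picture Genre as the top-level category is a two-level lookup table. The sketch below includes only the sub-genres named on the slide (the slide's lists are open-ended), and the names `GENRES` and `genre_of` are hypothetical rather than IMDI terms.

```python
from typing import Optional

# Top-level genres mapped to the example sub-genres named on the slide.
# The slide's lists end in "…", so this table is deliberately incomplete.
GENRES = {
    "Interaction": ["conversation", "interview"],
    "Explanation": ["description", "recipe"],
    "Performance": ["narrative", "poem", "oratory"],
    "Teaching": ["primer", "textbook"],
    "Analysis": ["grammar", "dictionary"],
}


def genre_of(subgenre: str) -> Optional[str]:
    """Return the top-level genre for a sub-genre, or None if unlisted."""
    wanted = subgenre.strip().lower()
    for genre, subgenres in GENRES.items():
        if wanted in subgenres:
            return genre
    return None
```

For example, `genre_of("dictionary")` yields `"Analysis"`, while a sub-genre missing from the table yields `None` instead of a guess.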

9 Other Content categories Modality: speech, writing, gesture Communication context: –Interactivity –Planning –Involvement Languages Task Description Keys

10 AILLA’s Content Keys Register: a characterization of how the discourse reflects the social context. Example: honorific speech Style: about poetic and stylistic effects. Examples: parallelism, metered verse.
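Content keys of this kind are Key/Value pairs, so they can be modeled as a plain list of pairs in which a key may repeat. The following is a minimal sketch using the Register and Style examples from the slide; `content_keys` and `group_keys` are hypothetical names, and nothing here reflects how IMDI tools actually store keys.

```python
from typing import Dict, List, Tuple

# Key/Value pairs taken from the slide's examples; a key may occur more than once.
content_keys: List[Tuple[str, str]] = [
    ("Register", "honorific speech"),
    ("Style", "parallelism"),
    ("Style", "metered verse"),
]


def group_keys(pairs: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collect all values recorded under each key, preserving input order."""
    grouped: Dict[str, List[str]] = {}
    for key, value in pairs:
        grouped.setdefault(key, []).append(value)
    return grouped


grouped = group_keys(content_keys)
```

Because the keys are ordinary strings, an archive can add a local key without changing the shape of the surrounding record.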

11 The Project subschema Current elements: –Name: a nickname or acronym –Title: official title –ID: a unique identifier –Contact information Proposed element: –Funder: name of funding organization

12 The Collector subschema AILLA renames this Depositor, since this is the individual we have to keep track of (e.g. for Level 3 access permission). When the Depositor is not also the Collector, Collector can be listed under Participants.

13 The Participants subschema Type: functional role, e.g. creator Role: family relationship Name/Full name Language(s) Ethnic group, age, sex Education Anonymous: True if participant’s Full name is reserved; False otherwise
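The Anonymous flag can be read as a switch on which name an archive is allowed to display. The sketch below assumes that the short Name is shown whenever the Full name is reserved; that display rule, the `Participant` class, and `display_name` are assumptions made for illustration, and the sample values are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """Hypothetical subset of the Participants elements listed on the slide."""
    type: str                # functional role, e.g. "creator"
    name: str                # short name or code
    full_name: str
    anonymous: bool = False  # True if the full name is reserved


def display_name(participant: Participant) -> str:
    """Assumed rule: show the short name when the full name is reserved."""
    if participant.anonymous:
        return participant.name
    return participant.full_name


# Placeholder values, not real archive data.
reserved = Participant("creator", "Speaker A", "Example Full Name", anonymous=True)
public = Participant("creator", "Speaker B", "Another Full Name")
```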

14 AILLA additions to Participants Origin: Place (country, region, etc) of origin of the creator of the primary resource in the bundle (e.g. the speaker whose voice is recorded). Occupation: Can be relevant in assessing accuracy of some kinds of data.

15 The Resources subschema Resources contains information about formats and provenance of files in a bundle. Media Files: audio, video, etc. Annotation Files: text files. Proposal: call them all Media Files, to reduce redundancy in the database. (All have URL, size, etc. elements.)

16 Text resources Current elements: –Type: type of annotation, e.g. phonetic transcription. –Content encoding: annotation encoding scheme, e.g. EUROTYP. –Character encoding: character set(s) used in a text file.

17 Text resources 2 Proposed elements: –Transcription type –Translation (aka Glossing) type –Software: used to produce transcriptions, translations, other annotations (e.g. Shoebox) Describe Annotator in Participants (along with Translator, etc.)

18 Proposed subschema Place: composed of several elements: –Continent –Country –Region –Subregion (address) Repeated at least twice, in Bundle and in Participants (Origin). Might also be useful in the Language subschema.
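Because Place recurs in more than one subschema, it can be defined once and reused. The sketch below is a minimal illustration under stated assumptions: the class names, the choice of which elements are optional, and the sample place are not taken from the proposal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Place:
    """The four Place elements; treating the last two as optional is an assumption."""
    continent: str
    country: str
    region: Optional[str] = None
    subregion: Optional[str] = None  # the slide glosses this as an address

    def label(self) -> str:
        """Join the elements that are present, from broadest to narrowest."""
        parts = [self.continent, self.country, self.region, self.subregion]
        return ", ".join(part for part in parts if part)


@dataclass
class BundleRecord:       # hypothetical: Place as used in Bundle
    name: str
    place: Place


@dataclass
class ParticipantRecord:  # hypothetical: Place as used in Participants (Origin)
    name: str
    origin: Place


# Illustrative values only.
cusco = Place("South America", "Peru", "Cusco")
bundle = BundleRecord("example session", place=cusco)
speaker = ParticipantRecord("Speaker A", origin=cusco)
```

Defining the structure once means a change to the Place elements applies to every subschema that uses it.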

19 Conclusion IMDI schema is a flexible tool. Customization through Key/Value pairs allows local modifications. Most of the proposed changes are terminological, moving from the DOBES in-house terminology to more general usage.

