OLAC State of the Archives A summary of implementation practices during the first year.

Slides:



Advertisements
Similar presentations
Ali Alshowaish. dc.coverage element articulates limitations in the scope of the resource, typically along the following lines: geographical, temporal,
Advertisements

DC2001, Tokyo DCMI Registry : Background and demonstration DC2001 Tokyo October 2001 Rachel Heery, UKOLN, University of Bath Harry Wagner, OCLC
Agents and Authority Linking Breakout sessions Oct 4 2-4, Oct 5 3-4:30 Goals: Explore issues in Agent discovery. Do we need an Agents working group? If.
OLAC Metadata Steven Bird University of Melbourne / University of Pennsylvania OLAC Workshop 10 December 2002.
IRCS Workshop on Open Language Archives IMDI & Endangered Languages Archives Heidi Johnson / AILLA.
IRCS Workshop on Open Language Archives 1 OLAC Access Vocabulary Heidi Johnson / AILLA.
Outreach Jeff Good UC Berkeley. OLAC's Needs Maximal involvement from the whole community –The more data providers involved the more useful the services.
The OLAC Metadata Set Gary Simons Workshop on The Digitization of Language Data: The Need for Standards June 2001.
IUFRO International Union of Forest Research Organizations Eero Mikkola Results of WP2 – Report Introduction to the work of WP2: Metadata, Keywords and.
METSÄNTUTKIMUSLAITOS SKOGSFORSKNINGSINSTITUTET FINNISH FOREST RESEARCH INSTITUTE Expert evaluation Jarmo Saarikko (Metla team) NEFIS WP5 meeting.
Metadata workshop, June The Workshop Workshop Timetable introduction to the Go-Geo! project metadata overview Go-Geo! portal hands on session.
Metadata vocabularies and ontologies Dr. Manjula Patel Technical Research and Development
UKOLN, University of Bath
Configuration management
Configuration management
METSÄNTUTKIMUSLAITOS SKOGSFORSKNINGSINSTITUTET FINNISH FOREST RESEARCH INSTITUTE Expert evaluation - details - Jarmo Saarikko (Metla team)
Copyright Irwin/McGraw-Hill Data Modeling Prepared by Kevin C. Dittman for Systems Analysis & Design Methods 4ed by J. L. Whitten & L. D. Bentley.
RDA: A New Standard Supporting Resource Discovery Presentation given at the CLA conference session The Future of Resource Discovery: Promoting Resource.
The Language Archive – Max Planck Institute for Psycholinguistics Nijmegen, The Netherlands Metadata Component Framework Possible Standardization Work.
8/28/97Information Organization and Retrieval Metadata and Data Structures University of California, Berkeley School of Information Management and Systems.
PREMIS What is PREMIS? o Preservation Metadata Implementation Strategies When is PREMIS use? o PREMIS is used for “repository design, evaluation, and archived.
1 CS 502: Computing Methods for Digital Libraries Lecture 17 Descriptive Metadata: Dublin Core.
The RDF meta model: a closer look Basic ideas of the RDF Resource instance descriptions in the RDF format Application-specific RDF schemas Limitations.
Introducing HTML & XHTML:. Goals  Understand hyperlinking  Understand how tags are formed and used.  Understand HTML as a markup language  Understand.
By Carrie Moran. To examine the Metadata Object Description Schema (MODS) metadata scheme to determine its utility based on structure, interoperability.
ORGANIZING AND STRUCTURING DATA FOR DIGITAL PROJECTS Suzanne Huffman Digital Resources Librarian Simpson Library.
Data Exchange Tools (DExT) DExT PROJECTAN OPEN EXCHANGE FORMAT FOR DATA enables long-term preservation and re-use of metadata,
SC32 WG2 Metadata Standards Tutorial Metadata Registries and Big Data WG2 N1945 June 9, 2014 Beijing, China.
1 © Netskills Quality Internet Training, University of Newcastle Metadata Explained © Netskills, Quality Internet Training.
8/28/97Organization of Information in Collections Introduction to Description: Dublin Core and History University of California, Berkeley School of Information.
Final Search Terms: Archiving (digital or data) Authentication (data) Conservation (digital or data) Curation (digital or data) Cyberinfrastructure Data.
Metadata: An Overview Katie Dunn Technology & Metadata Librarian
Jan 9, 2004 Symposium on Best Practice LSA, Boston, MA 1 Metadata Helen Aristar Dry Eastern Michigan University LINGUIST List.
Semantics and Syntax of Dublin Core Usage in Open Archives Initiative Data Providers of Cultural Heritage Materials Arwen Hutt, University of Tennessee.
PREMIS Rathachai Chawuthai Information Management CSIM / AIT.
1 Metadata –Information about information – Different objects, different forms – e.g. Library catalogue record Property:Value: Author Ian Beardwell Publisher.
LIS654 lecture 5 DC metadata and omeka tables Thomas Krichel
Use of Administrative Data Seminar on Developing a Programme on Integrated Statistics in support of the Implementation of the SNA for CARICOM countries.
BEN METADATA SPECIFICATION Isovera Consulting Feb
Evidence from Metadata INST 734 Doug Oard Module 8.
1 Dublin Core & DCMI – an introduction Some slides are from DCMI Training Resources at:
WP5: Legal Aspects Overview. 2 Content  Task Overview  Results  Privacy  IPR  Copyright  Trademark.
21-05-xxxx IEEE MEDIA INDEPENDENT HANDOVER DCN: xxxx Title: LB #1b Comment Summary Date Submitted: March, 2007 Presented at.
1 CS 430: Information Discovery Lecture 5 Descriptive Metadata 1 Libraries Catalogs Dublin Core.
The RDF meta model Basic ideas of the RDF Resource instance descriptions in the RDF format Application-specific RDF schemas Limitations of XML compared.
Document Computing Technologies for Managing Electronic Document Collections Ross Wilkinson... [et al.] Circulation Counter [RES3H] ZA4080.D
Application Profiles Application profiles -- are schemas which consist of data elements drawn from one or more namespaces, combined together by implementers,
Differences and distinctions: metadata types and their uses Stephen Winch Information Architecture Officer, SLIC.
Statistical Data and Metadata Exchange SDMX Metadata Common Vocabulary Status of project and issues ( ) Marco Pellegrino Eurostat
Module Metadata. Non-Controversial Changes  Changed ModuleMetadata from a Resource to a Type  Added url element – globally unique, referenceable identifier.
Describing resources II: Dublin Core CERN-UNESCO School on Digital Libraries Rabat, Nov 22-26, 2010 Annette Holtkamp CERN.
21-07-xxxx IEEE MEDIA INDEPENDENT HANDOVER DCN: xxxx Title: Multiple MIH User Issues Date Submitted: November, 12-16, 2007.
Dublin Core Basics Workshop Lisa Gonzalez KB/LM Librarian.
Attributes and Values Describing Entities. Metadata At the most basic level, metadata is just another term for description, or information about an entity.
Geospatial metadata Prof. Wenwen Li School of Geographical Sciences and Urban Planning 5644 Coor Hall
Some basic concepts Week 1 Lecture notes INF 384C: Organizing Information Spring 2016 Karen Wickett UT School of Information.
Professional Development Programme: Design and Development of Institutional Repository Using DSpace Nipul G Shihora INFLIBNET Centre Gandhinagar
AGLS Metadata Standard
TRSS Terminology Registry Scoping Study
The Use of EAD in Archival Based Repositories
Data Management: Documentation & Metadata
Chapter 2 Database Environment.
Attributes and Values Describing Entities.
WebDAV Design Overview
LB93 Unresolved RFI Comments
Attributes and Values Describing Entities.
Suggested comment resolution on Power save clause
PLE Comment Resolution Update
The role of metadata in census data dissemination
Clarification on CID3778 and Mesh TIM element
Presentation transcript:

OLAC State of the Archives A summary of implementation practices during the first year

Overview Review of archives descriptions Review of element usage –How it was used –Problem practices –Suggestions for improvement –Changes already anticipated Summary: recommendations for implementation aspects in need of guidance

Archives descriptions Some good, some really lacking (none in the middle for reviews submitted) Most often missing: –Curator –Contact information –Access terms and instructions More thorough completion needed as a requirement for registration?

The Elements 15 elements from DCMES 9 additional elements unique to OLAC

Contributor & Creator General meaning of elements clear Distinction between these two not consistent Problematic practices: –Multiple names in single element instance –Name entry form not ready for sort –Quotation marks enclosing corporate names "Institute for Slovene Language ""Fran Ramovs"", Slovene Academy for Sciences and Arts, Ljubljana, Slovenia" –Inconsistency in corporate name forms

Contributor & Creator (cont.) Is it a problem to have so much information loaded in one element content? "Alexandra Jarosov, Slovak Academy of Sciences, Bratislava editorship, corrections Vladimir Benko; Comenius University, Bratislava Suggestions –One name per element instance –Surname, firstname order; Main unit, subunit order –No quotesif name is a translation from its usual form or not usually given in English, use the lang attribute –Means to identify the first author: can the order of instances of an element be significant? Creator and Contributor developments –OLAC Role as an extension applicable to Contributor element through a coded attribute

Coverage Used creatively by one archive for extent information Good potential for use of existing vocabularies as extensions to improve consistency

Date Lots of kinds of dates Problematic practices: –Refining terminology (recorded on, donated on) incorporated into the element text –Coded year value given then mm/dd/yy value given in element text Date developments –DCQ has 8 refining terms for Date: created, valid, available, issued, modified, dateAccepted, dateCopyrighted, dateSubmitted

Description Wide variety of usea catch all concept: –Prose description of resourcean abstract –Lists of subject terms –Description of container/location –Extent –Condition –Access requirements and assistance Case 1 Telephone conversations Material type: 45 minute cassette Condition: good

Description (cont.) Case 2 pronunciation Hub5-LVCSR, EARS 1500 Number of CDs: 0 Recommended applications: speech recognition Member license: [a URL] Nonmember license: [a URL] Online documentation: [a URL] Readme file: [a URL]

Description (cont.) Other perhaps more suitable elements: –Format (for extent) –Subject Description developments: –DCQ has 2 refining terms: tableOfContents, abstract

Format and its refinements Several different semi-controlled vocabularies in evidence Most often used for IMT (sometimes coded, but not clear if repository really meant a Type code, not an Internet Media Type code) Also for medium and extent information (however, no instance for either DC qualifier was actually specified) OLAC had 5 refinements (cpu, encoding, markup, os, sourcecode) but each of these was used very little, if at all Format developments: –OLAC extensions to Format: OS, CPU, Sourcecode, –Markup and character set encoding awaiting attention

Identifier Definition: An unambiguous reference to the resource within a given context Problem practices: –Many non-unique Identifier URLs: In a few archives, multiple resources were identified with the same URL, usually availability info or a further description, but not the resource itself Often apparently mistaken as the only place one could put a URL associated with this resource Sometimes incomplete (relative?) paths given –Some identifiers seemed useful only to the archives, but were not relevant for resource discovery or request for access

Identifier (cont.) Where does availability information belong? –It was placed here as well as in Description, Publisher, Relation, Source, and Rights elements by various archives - Available as a refinement pertains to Date, not to other aspects of availability Identifier would probably benefit from a more thorough best practice document

Language and Subject.Language (grouped here because of structural similarity) Two of the cleanest elements –It contained language name or code –It was usually repeated for multiple languages Relatively low use of the attribute supplying OLAC language code Clarification between these still needed for some archives

Publisher Usually a publisher or the archives itself, sometimes with URL, sometimes URL given in separate instance of element Problem practice: –One archive used it for host publication information, which should be in Relation

Relation IsPartOf and hasPart used most frequently, either through refinement code or noted in element content Previously, See and Recording on most frequent of un-coded relationships (Previously could utilize Replaces) DCQ offers many qualified terms for relation

Rights Not extensively used Not clearly understood Definition: Information about rights held in and over the resource. Comment: Typically, a Rights element will contain a rights management statement for the resource, or reference a service providing such information. Rights information often encompasses Intellectual Property Rights (IPR), Copyright, and various Property Rights. If the Rights element is absent, no assumptions can be made about the status of these and other rights with respect to the resource.

Rights (cont.) Problem practice: –Copyright statement should be in text of element, not in the code Rights developments: –With the Access extension on Rights, OLAC is integrating access and permitted use –Leave the work to the content of the element or a referral to additional information Additional good practice guidance is needed regarding parameters of protection: duration of restriction, entity with authority to override, expectations placed on users

Source Should refer to another resource from which the described resource is derived Problem practices: –Identifier was used (repeated) for what clearly had to be a Source resource URL (based on contextual content of record) –Source was used to give information on the linguistic consultant, with lengthy description. The whole would have been more appropriate in Description element. (Contributor was used also) –Source was used to specify an entity responsible for development, creation, donation, etc. of the resource (in one a Ph.D. granting institution is named, another the gov. agency responsible, others, SIL is named)

Subject Problem practice: –Element should be repeated for multiple subject terms Subject developments –OLAC extensions for Language, Linguistic field, Discourse type –DCQ offers LCSH, MESH (vocabularies), DDC, LCC, UDC (classifications)

Type Exhibited perhaps the most different archive- specific interpretations of its use Evidence of different vocabularies for type used by numerous archives When coded, the codes were generally applied correctly Abused by some poor mappings Type developments –OLAC maintaining best practice application of DC Type vocabulary

Type.Linguistic Confusion in use evident Metadata better placed elsewhere –Description A Comparison of Poman and Yuman (MA Thesis) –Subject Grammar, morphology, verbal suffixes Type.Linguistic developments –OLAC Linguistic Type extension for Type significantly changed

Type.Functionality Not used Highly desiredmetadata placed in: –Type Speech analysis, Speech editing, Speech processing –Description Recommended applications: speech recognition, spoken dialogue systems Functionality development: suggestion for a new element

MORE WORK More thorough best practice guidelines for: –Description –Identifier –Rights –Subject qualifiers and extensions (dealing with overlap, use of multiple schemes) –Type and its extensions The definitions and controlled vocabularies have to be in order FIRST