Findings from the Mellon Metadata Harvesting Initiative Martin Halbert, Joanne Kaczmarek, and Kat Hagedorn Monday 18-Aug-2003 ECDL 2003.

1 Findings from the Mellon Metadata Harvesting Initiative Martin Halbert, Joanne Kaczmarek, and Kat Hagedorn Monday 18-Aug-2003 ECDL 2003

2 Mellon Metadata Initiative – ECDL 2003 – Trondheim, Norway
Overview
Highlights of the Mellon projects
Findings regarding metadata harvesting
Questions about the context of metadata and metadata harvesting
Next steps, subsequent research projects

3 Highlights of the Projects

4 Andrew W. Mellon Foundation
Mellon is a major U.S. private philanthropic foundation that has been involved with the OAI-PMH from the beginning
Sought to foster projects exploring how the OAI-PMH could be used by libraries and other organizations supporting research to make metadata concerning scholarly collections more visible to users
Funded seven projects in 2001 with a total of US $1.5M

5 Seven Projects
1. University of Illinois at Urbana-Champaign
2. The University of Michigan (OAIster)
3. Emory University (MetaArchive)
4. SOLINET / ASERL (AmericanSouth)
5. The Research Libraries Group (RLG)
6. University of Virginia
7. (Woodrow Wilson International Center for Scholars at the Smithsonian)

6 Highlights of Projects
OAIster and UIUC Repository harvested millions of records and developed sophisticated search tools
Emory and SOLINET MetaScholar projects harvested focused collections, enhanced existing OSS harvesting tools, and formed teams of scholars and librarians to study the process and context of metadata harvesting for research portals
Other projects examined internal uses of OAI-PMH for cultural scholarship

7 Findings Concerning Metadata Harvesting

8 Metadata Harvesting Findings: Slow Adoption of the OAI-PMH
Most institutions with cultural materials collections had not yet implemented the protocol during the project period
This is due to many reasons: lack of institutional priority, insufficient technical staff, little organizational understanding of the benefits of the protocol
However, both Emory and Illinois found that having centralized regional centers provide relatively modest OAI technical expertise to other libraries was very effective in fostering adoption of the protocol

9 Metadata Harvesting Findings: Problems with Institutional Metadata
Wide variations in implementation of Unqualified Dublin Core (UDC) descriptive metadata elements
Duplication of records between collaborating institutions, difficult to de-dupe due to lack of unique inter-institutional identifiers
Format incompatibilities/collisions, especially between Encoded Archival Descriptions (EAD) and UDC record perspectives
Inconsistent access restrictions to content lead to confusion among users
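To illustrate the de-duplication problem, here is a minimal Python sketch of one possible approach when records lack shared identifiers: group records by a normalized fingerprint. The choice of title, creator, and date as comparison fields, the record layout, and the sample identifiers are all assumptions made for illustration; the slides do not say how any project de-duplicated its records.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(value):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", value or "")
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def fingerprint(record):
    """Build a comparison key from title, creator, and date (an assumed field choice)."""
    return tuple(normalize(record.get(field, "")) for field in ("title", "creator", "date"))


def group_duplicates(records):
    """Return groups of records that share a fingerprint (candidate duplicates)."""
    groups = defaultdict(list)
    for record in records:
        groups[fingerprint(record)].append(record)
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical records from two institutions describing the same item.
records = [
    {"id": "oai:a.example:1", "title": "Letters from the Front, 1863",
     "creator": "Smith, John", "date": "1863"},
    {"id": "oai:b.example:9", "title": "Letters from the front 1863",
     "creator": "SMITH, JOHN", "date": "1863"},
    {"id": "oai:b.example:10", "title": "Plantation ledger",
     "creator": "Unknown", "date": "1850"},
]
duplicates = group_duplicates(records)
```

Exact-match fingerprints only catch records that differ in case, accents, or punctuation. Records whose descriptions differ more substantially would need fuzzy matching or manual review.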

10 Metadata Harvesting Findings: Problems with Inst. Metadata (cont.)
No controlled vocabulary in effect for any UDC field, nor would this make sense for most fields
Although universal systems such as US Library of Congress Subject Headings (LCSH) exist, they are not granular enough for most repositories
No uniform mechanism in place to express dates or locations (coverage), which can mean many things in UDC, and no authority control for creator field
96% of institutional repositories using Eprints software do not use standard controlled vocabularies
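As a rough illustration of what remediating free-text dates can involve, the Python sketch below pulls four-digit years out of a date string and reduces them to a year range. The function names and the 1000-2099 year window are assumptions for this example, not a method reported by the projects, and the approach deliberately ignores months, days, and uncertainty markers such as "ca.".

```python
import re

# Four-digit years from 1000 to 2099 that are not part of a longer digit run.
YEAR = re.compile(r"(?<!\d)(1[0-9]{3}|20[0-9]{2})(?!\d)")


def extract_years(date_value):
    """Return every four-digit year found in a free-text date string."""
    return [int(match) for match in YEAR.findall(date_value)]


def normalize_date(date_value):
    """Map a free-text date to a (start_year, end_year) tuple, or None if no year is found."""
    years = extract_years(date_value)
    if not years:
        return None
    return (min(years), max(years))


# Hypothetical date values of the kind a harvester might encounter.
samples = ["1863-07-04", "July 4, 1863", "1861-1865", "ca. 1920s", "n.d."]
normalized = {value: normalize_date(value) for value in samples}
```

A range is used rather than a single year so that spans such as "1861-1865" keep both endpoints for later date-limited searching.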

11 Metadata Harvesting Findings: Need for Metadata Gardening
The best way to make metadata effective cross-institutionally is to coordinate the entire life cycle of metadata production
Uncoordinated harvesting is relatively easy to do, but the resulting metadata aggregation then suffers from all the problems previously described and needs remediation (which may be effectively impossible)
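To show why uncoordinated harvesting is relatively easy, here is a minimal Python sketch that parses one OAI-PMH ListRecords response carrying unqualified Dublin Core. The XML namespaces are the standard OAI-PMH 2.0, oai_dc, and Dublin Core ones; the function name, the returned record layout, and the sample response are assumptions for illustration. Fetching the response over HTTP and looping on the resumption token are left out.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def parse_list_records(xml_text):
    """Return (records, resumption_token) parsed from one ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.findall("./oai:ListRecords/oai:record", NS):
        identifier = rec.findtext("./oai:header/oai:identifier", default="", namespaces=NS)
        fields = {}
        # Every Dublin Core element is repeatable, so collect values in lists.
        for element in rec.findall("./oai:metadata/oai_dc:dc/*", NS):
            name = element.tag.split("}", 1)[-1]
            fields.setdefault(name, []).append((element.text or "").strip())
        records.append({"identifier": identifier, "fields": fields})
    token = root.findtext("./oai:ListRecords/oai:resumptionToken", default="", namespaces=NS)
    return records, token.strip() or None


# Hypothetical response body, trimmed to the elements this sketch reads.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample finding aid</dc:title>
          <dc:date>1863</dc:date>
          <dc:subject>Letters</dc:subject>
          <dc:subject>Civil War</dc:subject>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>token-123</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE)
```

The parser accepts whatever each repository put in its fields without checking it, which is how the inconsistencies described on the earlier slides pass straight into the aggregation.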

12 Metadata Harvesting Findings: Need for Metadata Gardening (cont.)
Coordinated gardening of metadata is the long-standing solution to this problem
Examples include virtually any community of information users that has come up with consistent standards for the metadata it shares
The problem is that new information communities are still forming, having been enabled by the OAI-PMH
Mature information communities are mature precisely because they have well-understood standards and practice in using and sharing information

13 Findings Concerning Metadata Context

14 Metadata Context
Metadata without a context is useless, much like encrypted information without the key
Metadata is considered useful precisely because it is created in particular contexts by particular communities
OAI-PMH only prescribes UDC format
UDC is some context, and is (probably?) better than nothing, but many groups inaccurately thought that it was enough context to build robust discovery systems around

15 Metadata Context Findings: Recovering Context
Different opinions among the projects over how to recover context for aggregated heterogeneous metadata
OAIster made some efforts to normalize some UDC metadata fields after harvesting (UDC type field)
Illinois developed a mechanism for displaying the original EAD context of records disaggregated from finding aid series information
Emory/SOLINET AmericanSouth has a team of nationally renowned scholars studying how online scholarship can contextualize metadata and vice versa
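As an illustration of post-harvest normalization of the type field, the Python sketch below maps free-text type values onto terms from the DCMI Type Vocabulary. The synonym table and the function name are hypothetical; the slides do not describe the mapping OAIster actually used.

```python
# Terms of the DCMI Type Vocabulary.
DCMI_TYPES = [
    "Collection", "Dataset", "Event", "Image", "InteractiveResource",
    "MovingImage", "PhysicalObject", "Service", "Software", "Sound",
    "StillImage", "Text",
]

# Lookup from a lowercased, whitespace-free form to the canonical term.
CANONICAL = {term.lower(): term for term in DCMI_TYPES}

# Hypothetical synonyms; a real table would be built from the harvested values.
SYNONYMS = {
    "photograph": "StillImage",
    "photo": "StillImage",
    "article": "Text",
    "manuscript": "Text",
    "book": "Text",
    "audio": "Sound",
    "video": "MovingImage",
    "film": "MovingImage",
}


def normalize_type(raw_value):
    """Return the DCMI Type term for a free-text type value, or None if unmapped."""
    key = "".join(raw_value.split()).lower()  # "Still Image" -> "stillimage"
    if key in CANONICAL:
        return CANONICAL[key]
    return SYNONYMS.get(key)


# Hypothetical harvested type values.
harvested = ["still image", " TEXT ", "Photograph", "thesis"]
mapped = [normalize_type(value) for value in harvested]
```

Returning None for unmapped values keeps them visible for review instead of forcing them into the wrong category.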

16 Metadata Context Findings: Harvesters vs. other Discovery Systems
How do we understand harvesters vs. online catalogs, Google, and commercial databases?
How do we articulate the difference to users?
What information should we aggregate and make searchable? Metadata and crawled web content?
Very different information realms need to be bridged through new federated search mechanisms

17 Next Steps and Subsequent Research

18 Next Steps for Emory, Michigan, and Illinois
All of these projects learned a great deal during the Mellon Metadata Harvesting Initiative that has informed their subsequent planning for new services
All of these projects are in the process of being mainstreamed using various strategies
All of these projects continue to grapple with metadata quality and context issues

19 Next Steps: Illinois
Additional research is being undertaken on the integration of EAD and OAI
Beginning a three-year collaboration with the research libraries of other Committee on Institutional Cooperation (CIC) institutions to study the potential of OAI-PMH to facilitate resource sharing
NSF grant to develop digital libraries for scientific communities in connection with National Science Digital Library (NSDL)
Institute of Museum and Library Services (IMLS) grant to develop an OAI-based registry of IMLS projects

20 Next Steps: Michigan
Working on further techniques for metadata remediation
– De-duplication
– Normalization of more UDC fields
– Further tailoring of metadata for research purposes
Exploring use of OAIster in connection with campus courseware initiatives

21 Next Steps: Emory
Undertaking further modeling of scholarly portals based on metadata harvesting, with application to an international Irish Literature portal
New grant from the Mellon Foundation to build on previous projects
– Experiments in semantic clustering of metadata using support vector machines
– Exploration of combining metadata harvesting and web crawling
– Developing frameworks for federating loosely-coupled digital library components

22 Appreciation
Enormous thanks go to the Andrew W. Mellon Foundation for advancing the understanding of metadata harvesting applications through these projects
Mellon continues to be a driving force in the United States and internationally for research into digital library experiments benefiting scholarly communication

23 Contacts
Martin Halbert
Kat Hagedorn
Joanne Kaczmarek
