Building Quality Assurance into Metadata Creation An Analysis Based on the Learning Objects and e-Prints Communities of Practice Jane Barton, Centre for Digital Library Research Sarah Currier, Centre for Academic Practice University of Strathclyde, UK Jessie M.N. Hey, Intelligence, Agents, Multimedia Group and University Library University of Southampton
Researchers from two countries discuss two communities
Scope of Paper Metadata creation for two parallel communities: learning object repositories and open e-Print archives The content of the metadata record, not the structure Human-generated metadata only Assuring the quality of this process Metadata will only support effective discovery if it is accurate, consistent, sufficient, and thus reliable (Greenberg and Robertson (2002) Semantic Web construction: an inquiry of authors' views on collaborative metadata generation. Proceedings of DC2002, 45-52.)
… in the beginning (LO community) …the authoring of metadata itself will be straightforward for most course designers. Because metadata files are machine-writable, authors will simply access a form into which they enter the appropriate metadata information. (Downes, 2001)
… in the beginning (e-Prints Community) Physicists deposited academic papers in global arXiv Interoperability framework created: Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) Emphasis on examining and changing the culture within academia to encourage deposit of e-prints Wider goal of changing the unsustainable economics of scholarly communication Focus on participation - anything perceived as a barrier between academics and institutions tends to be played down (e.g. metadata creation issues)
… but is it really so simple? assumptions in e-learning & e-prints Internet culture: mediation by controlling authorities detrimental & undesirable Time-consuming, costly, barrier to uptake of technology (tedious and difficult) Only authors/users understand their resources Deus ex machina?
… but is it really so simple? some case studies Quality of author-generated metadata? Higher Level Skills for Industry Project (HLSI) - University of Huddersfield e-Prints service providers: UPS and Arc Collaboration between authors and specialists? Bolton Woods Local History Project e-Prints data providers: TARDIS Specialist help needed? Scottish electronic Staff Development Library (SeSDL) ePrints UK and TARDIS
Quality Control?: The HLSI Project 6,500 learning objects with IEEE LOM metadata records created by authors: The same metadata records for many or all components of a content package Inconsistent terminology Description of facets and characteristics of the educational object and not of the content Over-use of software default values Information scientists brought in; as of June 2003, 2,500 metadata records re-edited, taking ca 550 hours and costing ca £6,500 (£2.60 each) (Ryan and Walmsley, 2003; Ryan, B. (2003) Creating, Using and Re-using Learning Objects. HLSI Project. [ppt presentation] Online: http://www.cetis.ac.uk/groups/20010809144711/FR20030807121739)
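The per-record figure on this slide can be checked with simple arithmetic. The sketch below uses only the approximate numbers quoted above (2,500 records, ca 550 hours, ca £6,500). The variable names are illustrative, and the projection to all 6,500 records is a hypothetical extrapolation that assumes the remaining records would cost the same to fix; it is not a figure reported by the HLSI project.

```python
# Approximate figures quoted on the slide ("ca" values).
records_edited = 2500
hours_spent = 550
cost_gbp = 6500

# Cost and time per re-edited record.
cost_per_record = cost_gbp / records_edited
minutes_per_record = hours_spent * 60 / records_edited

# Hypothetical extrapolation (an assumption, not a project result):
# what re-editing every record would cost at the same unit rate.
total_records = 6500
projected_total_cost = cost_per_record * total_records

print(f"cost per record: £{cost_per_record:.2f}")
print(f"editing time per record: {minutes_per_record:.1f} minutes")
print(f"projected cost for all {total_records} records: £{projected_total_cost:.0f}")
```

The unit cost matches the £2.60 quoted on the slide, and the time works out to a little over 13 minutes per record.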
Quality Control?: UPS Preprint Service UPS (Universal Preprint Service Prototype) Slightly pre-OAI; used NCSTRL+ Protocol to harvest ca. 200,000 records from existing archives, made available through single user interface: The lack of quality of the metadata available in the UPS Prototype project has an important, baleful influence on the creation of cross-archive services as well as on the quality of services that can be created. (Van de Sompel, H. et al., 2000)
Quality Control?: Arc search service Arc search service: first prototype using OAI The effort of maintaining a quality federation service is highly dependent on the quality of the data providers. Some are meticulous in maintaining exacting metadata records that need no corrective actions. Other data providers have problems maintaining even a minimum set of metadata and the records harvested are useless. (Liu, X. et al., 2001)
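The slide does not define what "a minimum set of metadata" is. As a rough sketch, a service provider could flag harvested records that lack a chosen set of fields before indexing them. Everything here is an assumption for illustration: the required-field list, the record layout (field name mapped to a list of values), and the sample records are hypothetical, with field names borrowed from Dublin Core element names.

```python
# Assumed minimum set; a real service would choose its own.
REQUIRED_FIELDS = ("title", "creator", "date", "identifier")


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or blank in a record.

    `record` maps a field name to a list of string values.
    """
    missing = []
    for field in required:
        values = record.get(field, [])
        # A field counts as present only if at least one value is non-blank.
        if not any(str(value).strip() for value in values):
            missing.append(field)
    return missing


# Hypothetical harvested records.
records = [
    {
        "title": ["A paper"],
        "creator": ["Smith, J."],
        "date": ["2001-05-01"],
        "identifier": ["oai:example.org:1"],
    },
    {"title": [""], "identifier": ["oai:example.org:2"]},
]

for record in records:
    print(record["identifier"][0], missing_fields(record))
```

A report like this only measures sufficiency; it says nothing about whether the values present are accurate or consistent.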
Quality Control? … an aside Even when there's a positive benefit to creating good metadata, people steadfastly refuse to exercise care and diligence in their metadata creation. Take eBay: every seller there has a damned good reason for double-checking their listings for typos and misspellings. Try searching for "plam" on eBay. Right now, that turns up nine typoed listings for "Plam Pilots". Misspelled listings don't show up in correctly spelled searches and hence garner fewer bids and lower sale-prices. You can almost always get a bargain on a Plam Pilot at eBay. (Doctorow, 2002: Metacrap: Putting the Torch to the Seven Straw Men of the Meta-Utopia)
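The "plam" example suggests one kind of check a metadata tool could run at entry time: compare each word against a known vocabulary and suggest the closest term when there is no exact match. This is a minimal sketch using Python's standard difflib module. The vocabulary, the cutoff value, and the function name are assumptions for illustration, not something described in the presentation.

```python
import difflib


def possible_typos(listing_title, vocabulary, cutoff=0.7):
    """Map each word not in the vocabulary to its closest known term, if any."""
    known = sorted(vocabulary)  # sorted for deterministic results
    suggestions = {}
    for word in listing_title.lower().split():
        if word in vocabulary:
            continue  # exact match, nothing to flag
        matches = difflib.get_close_matches(word, known, n=1, cutoff=cutoff)
        if matches:
            suggestions[word] = matches[0]
    return suggestions


# Hypothetical controlled vocabulary for the example.
vocabulary = {"palm", "pilot", "cradle", "with"}
print(possible_typos("Plam Pilot with cradle", vocabulary))
```

A check of this kind only catches near-misses against terms the vocabulary already contains, so it supports rather than replaces human review.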
Collaboration?: Findings from the Bolton Woods Local History Project Study compared resource authors' & information scientists' metadata: Authors did not have a good understanding of purpose or value of metadata Authors understood the context of resources and focused on these elements Information specialists understood the purpose of metadata and included a wider range of metadata elements, but "struggled" with contextual aspects of the metadata Neither handled pedagogic aspects of the resources well (O'Beirne, 2002)
Collaboration?: The TARDIS project – Targeting Academic Resources for Deposit and Disclosure UK JISC funded FAIR Programme – cluster of projects exploring different aspects Pilot departments' metadata errors suggested modifying approach Exploring self-archiving and mediated deposit together Trialling simpler interface to GNU EPrints software for author-generated metadata Testing value of: targeted help; more logical field order; examples created by information specialists; fields required for good citation Mediated service for daunted authors also being trialled and evaluated.
Collaboration?: support from Semantic Web-based DC research … … the integration of expert and author generated descriptive metadata can advance and improve the quality of metadata for web content, which in turn could provide useful data for intelligent web agents, ultimately supporting the development of the Semantic Web. […] If such partnerships are well planned and evaluated, they could make a significant contribution to achieving the Semantic Web. (Greenberg and Robertson (2002) Semantic Web construction: an inquiry of authors' views on collaborative metadata generation. Proceedings of DC2002, 45-52.)
Specialists needed?: Scottish electronic Staff Development Library SeSDL Taxonomy Evaluation involved 6 users subject classifying resources: Out of 106 classifications, only 35% had agreement of more than one user E.g.: Resource defining VLE and MLE was classified under "Student-Centred Learning" and "Collaborative Learning" by one user Without adequate user support, classification is likely to be so inconsistent as to make the browse tree unusable The whole exercise has given me more admiration and respect for librarians. (user) (Currier, 2001)
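The slide does not say exactly how the 35% agreement figure was calculated. One plausible reading is: collect every distinct (resource, term) classification and count the share that more than one user chose. The sketch below implements that reading. The function name and the sample assignments are hypothetical (including the term "Virtual Learning Environments"); they are not SeSDL evaluation data.

```python
from collections import defaultdict


def agreement_rate(assignments):
    """Summarise agreement across users' subject classifications.

    `assignments` is an iterable of (user, resource, term) tuples.
    Returns (number of distinct classifications,
             share of those chosen by more than one user).
    """
    users_by_classification = defaultdict(set)
    for user, resource, term in assignments:
        users_by_classification[(resource, term)].add(user)

    total = len(users_by_classification)
    agreed = sum(1 for users in users_by_classification.values() if len(users) > 1)
    return total, (agreed / total if total else 0.0)


# Made-up assignments for illustration only.
sample = [
    ("u1", "VLE and MLE definition", "Virtual Learning Environments"),
    ("u2", "VLE and MLE definition", "Virtual Learning Environments"),
    ("u3", "VLE and MLE definition", "Student-Centred Learning"),
    ("u3", "VLE and MLE definition", "Collaborative Learning"),
]

total, rate = agreement_rate(sample)
print(f"{total} distinct classifications, {rate:.0%} chosen by more than one user")
```

On the slide's numbers, 35% of 106 classifications corresponds to roughly 37 on which two or more users agreed.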
Specialists needed?: ePrints UK and TARDIS TARDIS examined current diverse subject classification practices of e-Print archives – is experimenting with simple standard and additional specialised subject community options and mediated entry ePrints UK is experimenting with use of an automatic subject-classification Web service offered by OCLC
Specialists needed?: Research in commercial database abstracting & indexing services shows … … that authors may lack knowledge of indexing and cataloguing principles and practices, and are more likely to generate insufficient and poor quality metadata that may hamper resource discovery (Greenberg and Robertson (2002) Semantic Web construction: an inquiry of authors' views on collaborative metadata generation. Proceedings of DC2002, 45-52.)
Let's revisit those assumptions in e-learning & e-prints Some expert mediation may be beneficial. (Metadata does not control access to resources, it provides access to resources) Cost-benefit analysis necessary – metadata metrics. Authors'/users' expertise can be incorporated into the process; but metadata specialists have a role to play. Not all problems are resolvable by machine.
Conclusion The metadata creation process is not trivial and needs appropriate planning and management to assure quality and thus enable sharing and reuse of resources Further research is needed to understand how this can best be achieved What constitutes good quality metadata? Who should create metadata and how? How can metadata tools support the process? How can support and training be facilitated?
Resources This paper is built on: Quality Assurance for Digital Learning Object Repositories: How Should Metadata Be Created? (Currier, Barton: ALT-C 2003): http://metadata.cetis.ac.uk/guides/usage_survey For further info / discussion on LO metadata, see CETIS Metadata SIG: http://metadata.cetis.ac.uk/ For e-Prints metadata developments, see the Focus on Access to Institutional Resources (FAIR) Programme and TARDIS: http://tardis.eprints.org
Our help required! I am a great believer in working towards quality assurance. I just never get there. University of Washington faculty member September 2003