Moving Forward with the OpenDOAR Directory Peter Millington SHERPA Technical Development Officer University of Nottingham, England
Outline Brief introduction to OpenDOAR –What it is. Project time line OAI-PMH harvesting exercise –Modus operandi –Results for re-use policies –Technical issues & performance Conclusions & Recommendations Prototype policy generator tool Questions & Feedback
What is OpenDOAR? Directory of Open Access Repositories Coverage –Institutional & Subject-based repositories; Funders' OA archives –Not covering: OA journals – see DOAJ. Authoritative evaluated data –More than auto-harvested OAI data –Proactive - more than data supplied by repository administrators –Periodic review for currency and functionality Target users –Search service providers, OA stakeholders, end-users –Active dialogue with providers, administrators, funders, etc.
OpenDOAR Project Time Line Started early 2005 –University of Nottingham & University of Lund –Funded by: OSI, JISC, CURL & SPARCEurope First public version –January 2006 –Data built on work by Tim Brody, Southampton, & others –380 repositories (04-May-2006) Developing Version 2 –Additional fields & views –Due summer 2006
Harvesting Modus Operandi Aims –Familiarisation with OAI-PMH –Investigation of repositories' policies OAI-PMH protocol –315 Repositories in OpenDOAR with an OAI Base URL –verb=Identify – policies from eprints.xsd schema –Timings recorded & technical glitches noted Microsoft Excel Macros –Prompted for operator interventions –Such events would hamper auto-harvesting PHP –Firewall problems – needed to use HTTP proxy server –PHP functions would not handle HTTPS
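The harvesting step described on this slide (one verb=Identify request per OAI Base URL, with timings and glitches recorded) can be sketched roughly as follows. This is a minimal illustration in Python rather than the Excel macros and PHP used in the exercise; the helper names `build_identify_url`, `http_fetch` and `timed_identify`, the 30-second timeout and the example.org address are all assumptions made for the sketch.

```python
import time
import urllib.request
from urllib.parse import urlencode


def build_identify_url(base_url):
    """Append the OAI-PMH verb=Identify argument to an OAI base URL."""
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + urlencode({"verb": "Identify"})


def http_fetch(url, proxy=None, timeout=30):
    """Fetch a URL, optionally via an HTTP proxy (hypothetical helper)."""
    handlers = []
    if proxy:
        # Route plain-HTTP requests through the given proxy server.
        handlers.append(urllib.request.ProxyHandler({"http": proxy}))
    opener = urllib.request.build_opener(*handlers)
    with opener.open(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def timed_identify(base_url, fetch=http_fetch):
    """Return (body, elapsed_seconds, error) for one Identify request."""
    url = build_identify_url(base_url)
    start = time.perf_counter()
    try:
        body, error = fetch(url), None
    except Exception as exc:
        # Record the glitch and carry on, so one bad repository
        # does not stop an unattended harvesting run.
        body, error = None, str(exc)
    return body, time.perf_counter() - start, error
```

Passing the fetch function in as a parameter keeps the timing and error-logging logic testable without network access, for example `timed_identify("http://example.org/oai", fetch=lambda url: "<OAI-PMH/>")`.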
eprints.xsd Policy Criteria content –Text and/or a URL linking to text describing the content of the repository –It would be appropriate to indicate the language(s) of the metadata/data in the repository metadataPolicy –Text and/or a URL linking to text describing policies relating to the use of metadata harvested through the OAI interface dataPolicy –Text and/or a URL linking to text describing policies relating to the data held in the repository –This may also describe policies regarding downloading data (full-content) submissionPolicy –Text and/or a URL linking to text describing policies relating to the submission of content to the repository (or other accession mechanisms)
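As a rough illustration of how these four elements might be read out of an Identify response, the sketch below walks the XML and collects the text and URL given for each policy. The sample response, the repository name and the example.org URLs are invented for the illustration, and the child element names `text` and `URL` are an assumption about the schema's layout; element names are matched by local name so that the exact namespace URI does not matter.

```python
import xml.etree.ElementTree as ET

POLICY_FIELDS = ("content", "metadataPolicy", "dataPolicy", "submissionPolicy")

# Invented sample response, trimmed to the parts relevant here.
SAMPLE_IDENTIFY = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <Identify>
    <repositoryName>Example Repository</repositoryName>
    <description>
      <eprints xmlns="http://www.openarchives.org/OAI/1.1/eprints">
        <content><text>Research papers and theses, in English.</text></content>
        <metadataPolicy>
          <text>Metadata may be re-used for non-commercial purposes.</text>
        </metadataPolicy>
        <dataPolicy><URL>http://repository.example.org/policies.html</URL></dataPolicy>
        <submissionPolicy/>
      </eprints>
    </description>
  </Identify>
</OAI-PMH>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def extract_policies(identify_xml):
    """Return {policy name: {'text': [...], 'url': str or None}}."""
    policies = {name: {"text": [], "url": None} for name in POLICY_FIELDS}
    root = ET.fromstring(identify_xml)
    for elem in root.iter():
        if local_name(elem.tag) != "eprints":
            continue
        for child in elem:
            name = local_name(child.tag)
            if name not in policies:
                continue
            for part in child:
                value = (part.text or "").strip()
                if not value:
                    continue
                if local_name(part.tag) == "text":
                    policies[name]["text"].append(value)
                elif local_name(part.tag) == "URL":
                    policies[name]["url"] = value
    return policies
```

Calling `extract_policies(SAMPLE_IDENTIFY)` yields an entry for every one of the four fields, with empty values where the repository supplied nothing, which makes missing policies easy to spot.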
Metadata Policy Results
No policy info for two thirds of repositories –Technical problems with 9% –No data provided for 40% –Undefined for 17% - EPrints default settings Policies given –Nearly all permit re-use for non-commercial purposes –A third seem to allow commercial re-use Many policies copied from other repositories –e.g. CogPrints Issues for service providers –Lack of easily accessible policy statements –Prohibited re-sale of metadata – Why prohibited?
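The breakdown on this slide (technical problems, no data provided, undefined, policy given) amounts to classifying each harvested record and tallying the categories. A minimal sketch follows, assuming each record is a dictionary with hypothetical `error` and `policy` keys and that the default setting shows up as the literal word "undefined"; the actual placeholder text used by repositories may differ.

```python
from collections import Counter

# Assumption: placeholder wording left in place by default configurations.
PLACEHOLDERS = {"undefined"}


def classify(record):
    """Assign one harvested record to a results category."""
    if record.get("error"):
        return "technical problem"
    text = record.get("policy")
    if text is None or not text.strip():
        return "no data provided"
    if text.strip().lower() in PLACEHOLDERS:
        return "undefined"
    return "policy given"


def tally(records):
    """Return the percentage of records in each category, rounded."""
    counts = Counter(classify(record) for record in records)
    total = len(records)
    return {category: round(100 * n / total) for category, n in counts.items()}
```

For example, four records with one in each category produce 25 for every category. The same tally can be run separately on the metadata policy and the full data policy fields.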
[Full] Data Policy Results
Full Data Policy Results Also no policy info for two thirds of repositories –Technical problems with 9% –No data provided for 42% –Undefined for 17% Policies given –Re-sale of full items nearly universally prohibited –Unclear policy in ~7% of cases –7% prohibit harvesting by robots Prohibited harvesting by robots –Total prohibition prevents full text indexing and analysis –Transient harvesting should be permitted – e.g. CalTech
Content Policies Repository Type –Institutional or departmental repository –Multi-institution subject-based repository Subject Specialities –Up to three, or many Type of Material –e.g. Research papers, Theses, etc Publication Status –Pre-prints (not peer-reviewed) –Final peer-reviewed drafts (post-prints) –Published versions Individual tagging with peer-review and publication status Principal Languages –Up to three
Submission Policies Eligible Depositors –Role and/or Organisation unit –Or their delegated agents Deposition Rules –Who can deposit what – usually own work only –Mandatory deposition of metadata Moderation (vetting) –What, if anything, is vetted by the administrator –e.g. eligibility, relevance, valid layout. Exclusion of spam Content Quality Control (Peer review) –Responsibility for the validity and authenticity of the content –Not checked, or checking by internal subject specialists. Copyright Policy –Responsibility for copyright clearance –Dealing with proven copyright violations
Interim Conclusions The eprints.xsd is not working –Not used at all – or left undefined –Muddled entries – e.g. items under wrong heading Why? –Lack of awareness of its existence –Unsupported by repository software package –Insufficient guidance – possible language issues –Some policies not covered – e.g. preservation But… –Copying indicates a desire for model policies –Plenty of good examples on which to base models –Would be very useful to service providers, advocates, etc.
Recommendations For Repository Administrators –Ensure the eprints.xsd schema is in your OAI configuration –Put real policy info in the schema – not just undefined –Fix any technical issues –Avoid using HTTPS For OpenDOAR –Encourage repository administrators to improve matters –Provide model policies –Provide a policy generator tool for administrators Future Work –Update eprints.xsd or replace with something new –Re-analyse annually to monitor progress
OpenDOAR Policy Generator Aims –Capturing policies using standard formulae –Tool to help administrators formulate their policies Analysis of policies –Identification of recurring phrases and concepts –Natural language cluster analysis Selection of statements & options –Appropriate to the policy type –And meaningful OpenDOAR policy recommendations –Minimum options – achieving OA goals but restricted –Optimum options – refinements for more use or better quality
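One simple way to approach the "identification of recurring phrases" step is to count word sequences that appear in more than one repository's policy text. The sketch below uses plain n-gram counting, which is a deliberately simpler stand-in for the natural language cluster analysis mentioned on the slide, not a description of the method actually used; the function names and the default thresholds are assumptions.

```python
import re
from collections import Counter


def ngrams(text, n=3):
    """Return the set of n-word phrases found in one policy text."""
    words = re.findall(r"[a-z0-9'-]+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def recurring_phrases(policies, n=3, min_docs=2):
    """List phrases that occur in at least min_docs different policies."""
    doc_counts = Counter()
    for text in policies:
        # A set per document means each policy counts a phrase once,
        # however often it repeats the phrase internally.
        doc_counts.update(ngrams(text, n))
    return sorted(p for p, count in doc_counts.items() if count >= min_docs)
```

Phrases that recur across many repositories (for example wording about non-commercial re-use) are candidates for the standard statements and options offered by a generator tool.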
Proposed Minimum Metadata Policy Anyone may access the metadata free of charge. The metadata may be re-used in any medium –without prior permission for not-for-profit purposes –provided the OAI Identifier and/or a link to the original metadata record are given. The metadata must not be re-used in any medium –for commercial purposes without formal permission.
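To show how a generator tool might emit this wording for a repository's OAI configuration, the sketch below wraps the three statements in a `metadataPolicy` element with one `text` child per statement. The element layout is an assumption about the eprints description format, and `policy_element` and `render` are hypothetical helper names; the output is a fragment only, without the surrounding description container or namespace declarations.

```python
import xml.etree.ElementTree as ET

# The proposed minimum metadata policy, one statement per entry.
MIN_METADATA_POLICY = [
    "Anyone may access the metadata free of charge.",
    "The metadata may be re-used in any medium without prior permission "
    "for not-for-profit purposes, provided the OAI Identifier and/or a link "
    "to the original metadata record are given.",
    "The metadata must not be re-used in any medium for commercial purposes "
    "without formal permission.",
]


def policy_element(name, statements):
    """Build a policy element holding one <text> child per statement."""
    element = ET.Element(name)
    for statement in statements:
        ET.SubElement(element, "text").text = statement
    return element


def render(element):
    """Serialise an element to an XML string."""
    return ET.tostring(element, encoding="unicode")
```

Calling `render(policy_element("metadataPolicy", MIN_METADATA_POLICY))` produces a single XML fragment that an administrator could adapt; the same helper would serve for the data and submission policies by changing the element name and statements.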
Proposed Minimum Full Data Policy Anyone may access full items free of charge. Single copies of full items can be: –Reproduced & displayed or performed in any format or medium –for personal research or study, educational, or not-for-profit purposes –without prior permission or charge. Full items must not be harvested by robots –except transiently for full-text indexing or citation analysis. Full items must not be sold commercially –in any format or medium –without formal permission of the copyright holders.
Proposed Minimum Submission Policy Items may only be deposited by accredited members of the organisation, or their delegated agents. Authors/Depositors may archive only their own work. The administrator only vets items for the exclusion of spam. The validity and authenticity of the content of submissions is the sole responsibility of the depositor. Any copyright violations are entirely the responsibility of the authors/depositors. If the repository receives proof of copyright violation, the relevant item will be removed immediately.
Optimum Policy Ideas Metadata Policy –Allow re-sale of metadata –Increased visibility outweighs exploitation Full Data Policy –Allow multiple copying – for educational purposes –Allow full harvesting – LOCKSS-like preservation Submission Policy –Mandatory deposition of metadata –Mandatory deposition of thesis full texts
What Next? Consultation –SHERPA partners –Other interested parties Policy generator –End-user testing – volunteers needed –Ideas for output – e.g. text for EPrints configuration Refining recommended policies –Ideas for minimum and optimum options –Feedback on our proposals Aiming for release summer 2006
Any Questions or Feedback? Contact Peter Millington
OpenDOAR Organisation The OpenDOAR Team –University of Nottingham, England Bill Hubbard, Gareth Johnson, Peter Millington –University of Lund, Sweden Lars Bjørnshauge, Kristoffer Lundqvist, Salam Baker Shanawa Our Funders –Open Society Institute (OSI) –Joint Information Systems Committee (JISC) –Consortium of Research Libraries (CURL) –SPARCEurope