Presentation on theme: "Moving Forward with the OpenDOAR Directory"— Presentation transcript:
1 Moving Forward with the OpenDOAR Directory
Peter Millington, SHERPA Technical Development Officer, University of Nottingham, England
Update Seminar for Imperial College, 2nd Dec 04. Bill Hubbard, SHERPA
2 Outline
Brief introduction to OpenDOAR
  What it is
  Project time line
OAI-PMH harvesting exercise
  Modus operandi
  Results for re-use policies
  Technical issues & performance
  Conclusions & Recommendations
Prototype ‘policy generator’ tool
Questions & Feedback
3 What is OpenDOAR?
Directory of Open Access Repositories
Coverage
  Institutional & subject-based repositories; funders’ OA archives
  Not covering: OA journals – see DOAJ
Authoritative evaluated data
  More than auto-harvested OAI data
  Proactive – more than data supplied by repository administrators
  Periodic review for currency and functionality
Target users
  Search service providers, OA stakeholders, end-users
  Active dialogue with providers, administrators, funders, etc.
5 OpenDOAR Project Time Line
Started early 2005
  University of Nottingham & University of Lund
  Funded by: OSI, JISC, CURL & SPARC Europe
First public version
  January 2006
  Data built on work by Tim Brody, Southampton, & others
  380 repositories (04-May-2006)
Developing Version 2
  Additional fields & views
  Due summer 2006
6 Harvesting Modus Operandi
Aims
  Familiarisation with OAI-PMH
  Investigation of repositories’ policies
OAI-PMH protocol
  315 repositories in OpenDOAR with an OAI Base URL
  verb=Identify – policies from eprints.xsd schema
  Timings recorded & technical glitches noted
Microsoft Excel Macros
  Prompted for operator interventions
  Such events would hamper auto-harvesting
PHP
  Firewall problems – needed to use HTTP proxy server
  PHP functions would not handle HTTPS
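As an illustration of the harvesting step on this slide, here is a minimal Python sketch (not the Excel/PHP tooling used in the exercise) that issues one `verb=Identify` request per OAI base URL, records the elapsed time, and notes any failure instead of stopping. The function names `identify_url`, `make_fetcher` and `timed_identify`, the proxy address format, and the 30-second timeout are assumptions made for this example.

```python
import time
import urllib.request
from urllib.parse import urlencode


def identify_url(base_url):
    """Append verb=Identify to an OAI base URL."""
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + urlencode({"verb": "Identify"})


def make_fetcher(proxy=None, timeout=30):
    """Return a fetch(url) function, optionally routed through an HTTP proxy.

    `proxy` is a hypothetical address such as "http://proxy.example.org:8080".
    """
    handlers = []
    if proxy:
        handlers.append(urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
    opener = urllib.request.build_opener(*handlers)

    def fetch(url):
        with opener.open(url, timeout=timeout) as response:
            return response.read()

    return fetch


def timed_identify(base_url, fetch):
    """Issue one Identify request; record the timing and any technical glitch."""
    started = time.perf_counter()
    try:
        body, error = fetch(identify_url(base_url)), None
    except Exception as exc:  # a glitch is recorded rather than raised
        body, error = None, f"{type(exc).__name__}: {exc}"
    return {
        "base_url": base_url,
        "seconds": time.perf_counter() - started,
        "body": body,
        "error": error,
    }
```

A harvesting run would call `timed_identify(url, make_fetcher(proxy))` for each base URL and keep the returned records. Passing the fetch function in as a parameter keeps the timing and error-recording logic testable without network access.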
7 eprints.xsd Policy Criteria
content
  Text and/or a URL linking to text describing the content of the repository
  It would be appropriate to indicate the language(s) of the metadata/data in the repository
metadataPolicy
  Text and/or a URL linking to text describing policies relating to the use of metadata harvested through the OAI interface
dataPolicy
  Text and/or a URL linking to text describing policies relating to the data held in the repository
  This may also describe policies regarding downloading data (full-content)
submissionPolicy
  Text and/or a URL linking to text describing policies relating to the submission of content to the repository (or other accession mechanisms)
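The four elements above can be read out of an Identify response with a few lines of Python. This is a sketch under stated assumptions: that the eprints description uses the namespace `http://www.openarchives.org/OAI/1.1/eprints`, and that each policy element carries its content in `text` and/or `URL` child elements. The function name `extract_policies` is invented for this example.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of the eprints description container.
EPRINTS_NS = "http://www.openarchives.org/OAI/1.1/eprints"
POLICY_FIELDS = ("content", "metadataPolicy", "dataPolicy", "submissionPolicy")


def _values(node, tag):
    """Collect the non-empty text of all child elements with the given tag."""
    children = node.findall(f"{{{EPRINTS_NS}}}{tag}")
    return [c.text.strip() for c in children if c.text and c.text.strip()]


def extract_policies(identify_xml):
    """Map each policy field to its text/URL values, or None if absent."""
    root = ET.fromstring(identify_xml)
    eprints = root.find(f".//{{{EPRINTS_NS}}}eprints")
    policies = {}
    for field in POLICY_FIELDS:
        node = None if eprints is None else eprints.find(f"{{{EPRINTS_NS}}}{field}")
        if node is None:
            policies[field] = None  # element missing from the response
        else:
            policies[field] = {"text": _values(node, "text"), "url": _values(node, "URL")}
    return policies
```

A field that maps to `None` was not supplied at all, while a field whose only text is ‘undefined’ was left at a default setting. The two cases are kept distinct because the results slides report them separately.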
9 Metadata Policy Results
No policy info for two thirds of repositories
  Technical problems with 9%
  No data provided for 40%
  ‘Undefined’ for 17% – EPrints default settings
Policies given
  Nearly all permit re-use for non-commercial purposes
  A third seem to allow commercial re-use
  Many policies copied from other repositories, e.g. CogPrints
Issues for service providers
  Lack of easily accessible policy statements
  Prohibited re-sale of metadata – Why prohibited?
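The three ‘no policy info’ categories on this slide add up to 9% + 40% + 17% = 66%, which is the ‘two thirds’ in the headline. The sketch below shows one way such a breakdown could be tallied from harvest outcomes. The category rules and the names `classify_policy` and `tally` are assumptions for illustration, not the method actually used in the exercise.

```python
from collections import Counter


def classify_policy(error, policy_text):
    """Assign one harvest outcome to a results category (assumed rules)."""
    if error:
        return "technical problem"
    if policy_text is None or not policy_text.strip():
        return "no data"
    if policy_text.strip().lower() == "undefined":
        return "undefined"
    return "policy given"


def tally(outcomes):
    """Return the percentage of outcomes per category, rounded to whole numbers.

    `outcomes` is an iterable of (error, policy_text) pairs.
    """
    counts = Counter(classify_policy(error, text) for error, text in outcomes)
    total = sum(counts.values())
    return {category: round(100 * n / total) for category, n in counts.items()}
```

Calling `tally` on the (error, metadataPolicy text) pairs from a harvesting run would give a table in the same shape as the slide.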
11 Full Data Policy Results
Also no policy info for two thirds of repositories
  Technical problems with 9%
  No data provided for 42%
  ‘Undefined’ for 17%
Policies given
  Re-sale of full items nearly universally prohibited
  Unclear policy in ~7% of cases
  7% prohibit harvesting by robots
Prohibited harvesting by robots
  Total prohibition prevents full text indexing and analysis
  Transient harvesting should be permitted – e.g. CalTech
12 Content Policies
Repository Type
  Institutional or departmental repository
  Multi-institution subject-based repository
Subject Specialities
  Up to three, or ‘many’
Type of Material
  e.g. Research papers, Theses, etc.
Publication Status
  Pre-prints (not peer-reviewed)
  Final peer-reviewed drafts (post-prints)
  Published versions
  Individual tagging with peer-review and publication status
Principal Languages
  Up to three
13 Submission Policies
Eligible Depositors
  Role and/or Organisation unit
  Or their delegated agents
Deposition Rules
  Who can deposit what – usually own work only
  Mandatory deposition of metadata
Moderation (vetting)
  What, if anything, is vetted by the administrator
  e.g. eligibility, relevance, valid layout. Exclusion of spam
Content Quality Control (Peer review)
  Responsibility for the validity and authenticity of the content
  Not checked, or checking by internal subject specialists
Copyright Policy
  Responsibility for copyright clearance
  Dealing with proven copyright violations
14 Interim Conclusions
The eprints.xsd is not working
  Not used at all – or left ‘undefined’
  Muddled entries – e.g. items under wrong heading
Why?
  Lack of awareness of its existence
  Unsupported by repository software package
  Insufficient guidance – possible language issues
  Some policies not covered – e.g. preservation
But…
  Copying indicates a desire for model policies
  Plenty of good examples on which to base models
  Would be very useful to service providers, advocates, etc.
15 Recommendations
For Repository Administrators
  Ensure the eprints.xsd schema is in your OAI configuration
  Put real policy info in the schema – not just ‘undefined’
  Fix any technical issues
  Avoid using HTTPS
For OpenDOAR
  Encourage repository administrators to improve matters
  Provide model policies
  Provide a ‘policy generator’ tool for administrators
Future Work
  Update eprints.xsd or replace with something new
  Re-analyse annually to monitor progress
16 OpenDOAR Policy Generator
Aims
  Capturing policies using standard formulae
  Tool to help administrators formulate their policies
Analysis of policies
  Identification of recurring phrases and concepts
  Natural language cluster analysis
Selection of statements & options
  Appropriate to the policy type
  And meaningful
OpenDOAR policy recommendations
  Minimum options – achieving OA goals but restricted
  Optimum options – refinements for more use or better quality
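To make the idea of ‘standard formulae’ concrete, here is a hypothetical Python sketch of how a generator might assemble a metadata policy from a fixed set of statements and one option. It is not the OpenDOAR tool itself. The first three statements follow the wording of the proposed minimum metadata policy later in the deck. The `commercial_allowed` wording, the dictionary keys, and the function name `generate_metadata_policy` are placeholders invented for this example.

```python
# Standard statements keyed by concept. The first three follow the proposed
# minimum metadata policy; "commercial_allowed" is an invented placeholder.
METADATA_STATEMENTS = {
    "access": "Anyone may access the metadata free of charge.",
    "non_commercial_reuse": (
        "The metadata may be re-used in any medium without prior permission "
        "for not-for-profit purposes provided the OAI Identifier and/or a link "
        "to the original metadata record are given."
    ),
    "commercial_restricted": (
        "The metadata must not be re-used in any medium for commercial "
        "purposes without formal permission."
    ),
    "commercial_allowed": (
        "The metadata may be re-used in any medium for commercial purposes."
    ),
}


def generate_metadata_policy(allow_commercial=False):
    """Assemble a policy text from standard statements.

    allow_commercial=False gives the 'minimum' option; True gives an
    'optimum'-style variant that permits commercial re-use.
    """
    keys = ["access", "non_commercial_reuse"]
    keys.append("commercial_allowed" if allow_commercial else "commercial_restricted")
    return "\n".join(METADATA_STATEMENTS[key] for key in keys)
```

Because every generated policy is built from the same small set of statements, service providers could compare policies across repositories by statement key instead of parsing free text.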
24 Proposed Minimum Metadata Policy
Anyone may access the metadata free of charge.
The metadata may be re-used in any medium
  without prior permission for not-for-profit purposes
  provided the OAI Identifier and/or a link to the original metadata record are given.
The metadata must not be re-used in any medium
  for commercial purposes without formal permission.
25 Proposed Minimum Full Data Policy
Anyone may access full items free of charge.
Single copies of full items can be:
  Reproduced & displayed or performed in any format or medium
  for personal research or study, educational, or not-for-profit purposes
  without prior permission or charge.
Full items must not be harvested by robots
  except transiently for full-text indexing or citation analysis.
Full items must not be sold commercially
  in any format or medium
  without formal permission of the copyright holders.
26 Proposed Minimum Submission Policy
Items may only be deposited by accredited members of the organisation, or their delegated agents.
Authors/Depositors may archive only their own work.
The administrator only vets items for the exclusion of spam.
The validity and authenticity of the content of submissions is the sole responsibility of the depositor.
Any copyright violations are entirely the responsibility of the authors/depositors.
If the repository receives proof of copyright violation, the relevant item will be removed immediately.
27 Optimum Policy Ideas
Metadata Policy
  Allow re-sale of metadata
  Increased visibility outweighs ‘exploitation’
Full Data Policy
  Allow multiple copying – for educational purposes
  Allow full harvesting – LOCKSS-like preservation
Submission Policy
  Mandatory deposition of metadata
  Mandatory deposition of thesis full texts
28 What Next?
Consultation
  SHERPA partners
  Other interested parties
Policy generator
  End-user testing – volunteers needed
  Ideas for output – e.g. text for EPrints configuration
Refining recommended policies
  Ideas for minimum and optimum options
  Feedback on our proposals
  Aiming for release summer 2006
29 Any Questions or Feedback?
http://www.opendoar.org/
Contact: Peter Millington
30 OpenDOAR Organisation
The OpenDOAR Team
  University of Nottingham, England
    Bill Hubbard, Gareth Johnson, Peter Millington
  University of Lund, Sweden
    Lars Bjørnshauge, Kristoffer Lundqvist, Salam Baker Shanawa
Our Funders
  Open Society Institute (OSI)
  Joint Information Systems Committee (JISC)
  Consortium of Research Libraries (CURL)
  SPARC Europe