Welcome to ChALS 2003 (Chalmers Annual Library Seminars), 24 Sep 2003. e-Print Repositories for research visibility: a journey from there to here. Pauline Simpson, Southampton Oceanography Centre, University of Southampton, England. Scholarly Communication and OAI.
University of Southampton. Research-led multidisciplinary university: 20,000 students, 5,000 staff (1,500 researchers). Restructured Aug 2003 from 5 faculties and 65 departments into 3 Faculties (Law, Arts and Social Sciences; Medicine, Health and Life Sciences; Engineering, Science and Math) and 20 Schools: Education; Humanities; Law; Management; Social Sciences; Winchester School of Art; Biological Sciences; Health Care Innovation; Health Professions & Rehab; Medicine; Nursing & Midwifery; Chemistry; Civil Engineering & Environmental Engng; Electronics and Computer Sciences; Engineering Sciences; Geography; Institute Sound & Vibration; Mathematics; Ocean and Earth Sciences (SOC); Physics and Astronomy.
Southampton Oceanography Centre. SOC is one of the world's leading centres for research and education in marine and earth sciences, for the development of marine technology, and for the provision of large scale infrastructure and support for the marine research community.
Road map. Guide us through: Scholarly Communication; Open Archives Initiative; e-Print Archives (subject and institutional); TARDis (Targeting Academic Research for Deposit and Disclosure).
Information space: building a global collaboratory. The academic world is increasingly global and collaborative and needs the tools to support this: "... center without walls, in which researchers can perform their research without regard to geographical location: interacting with colleagues, accessing instrumentation, sharing data and computational resource, and accessing information in digital libraries" (Kouzes et al. 1996, Collaboratories: doing science on the internet, Computer, 29(8), 40-46).
How to get there. Developing an infrastructure for data, the GRID: other people will wish to use the same data, so we need tools to preserve and access it. Developing an infrastructure for documents through hybrid libraries: traditional and digital holdings; commercial and open (free and interoperable) access; bibliographic and full text.
Scholarly Communication, present model (primary channel). [Diagram with nodes labelled PUB, SUB, LIB, A and R.] Bibliometrics (citation analysis, impact factors). Evaluation (RAE, tenure, promotion). Research funding proposals.
Crisis in Scholarly Communication: alternate models. Open Access Journals; Open Archives Initiative. Open = freely accessible (open access journals). Open = interoperable (Open Archives Initiative). The Case for Institutional Repositories: a SPARC position paper, prepared by Raym Crow, July 2002. Supplemented by: SPARC Institutional Repository Checklist and Resources Guide, October 2002.
Changing Publishing Paradigm. [Diagram: information flow through the Open Archives model. Authors and Readers are linked through OAI data providers and OAI service providers, shown alongside the PUB, SUB, LIB nodes of the present model. Labels: Publish; Archive/access; Hybrid roles; Citation analysis.]
What are Open archives? Electronic repositories of e-Prints, usually internet based, for free access and dissemination. Both institutional and discipline based archives that allow public access to content and employ the Open Archives Initiative Metadata Harvesting Protocol. N.B. some e-Print archives are not OAI registered but are still open.
e-Prints: variable definitions. e-Prints are electronic copies of any research output (journal article, book section, conference paper, technical report etc.). Preprints: unpublished papers before they are refereed. Postprints: papers after they have been refereed. Also narrower and broader definitions. Narrow: peer-reviewed articles (original definition, Stevan Harnad). Broad: research + learning + datasets + multimedia + internal admin documents etc.
Variable definitions: spelling, and what's in a name? Eprints; ePrints; eprints; E-Prints; e-prints; e-Prints (Oxford English Dictionary). Archive: wrong connotations? Alternatives: repository; depository; service.
e-Print origins. Invisible university culture: draft publications were exchanged on paper; high energy physics (50 to 1000 authors) needed electronic transmission. Evolving digital environment: ARPA Internet 1970s, Web 1990s. Culture + technology fix = the first archive. Electronic preprint archives are author self-archiving systems. arXiv (Los Alamos, now at Cornell) was set up in 1991 by Paul Ginsparg for the high energy physics community; it now covers physics (incl. Atmospheric and Oceanic Physics), Math, Computing Science and nonlinear science.
Subject based archives. Early e-Print services were subject based and hosted by a single institution. They relied on distributed researchers remotely depositing their papers using the self archiving protocol. Despite the success of Los Alamos (now arXiv), uptake by other subject communities was cautious. Successful examples: Cogprints (1997), Chemistry Preprints Server, RePEc WoPEc (economics), etc. Many of the subject based archives were started by individual enthusiasts.
arXiv recent weekly usage. [Chart legend: red, average number of connections; blue, average number of hosts connecting (divide by 10 for correct number); green, average number of new hosts (divide by 10).] Growing by 30,000 articles per month.
Major e-Print Drivers. Crisis in scholarly publication. Growing call for Open Access: Budapest Open Access Initiative, http://www.soros.org/openaccess, launched 14 Feb 2002 by George Soros's Open Society Institute. A worldwide coordinated movement dedicated to freeing online access: OAI based self archiving and alternative journals. Open societies need open access. Scholars should be able to deposit their refereed journal articles in open electronic archives which conform to OAI standards.
Support: Stevan Harnad, Univ Southampton, leading advocate of self archiving and now of the institutional model (Cogprints; September98 email list). International Scholarly Communications Alliance: worldwide organisations collaborate with scholars and publishers to establish equitable access to scholarly and research publications. Funding: Mellon Foundation, $1.5m for seven USA OAI projects; Budapest Open Access Initiative, Soros Foundation Open Information Society, $1m over 3 years; NSF funding grants for OAI projects (NSDL), $7M. Focus: interoperability infrastructure (OAI).
Origins of the Open Archives Initiative. Oct 1999: 1st meeting, Santa Fe Convention; Universal Preprint Service prototype, renamed Open Archives Initiative; Dienst Protocol; Metadata Harvesting Protocol. Early 2000: the Cambridge Meetings. Aug 2000: support from Digital Library Federation, Coalition for Networked Information and NSF; Steering Committee formed. Late 2000: mission statement.
The OAI defines two participants. Data Providers adopt the OAI technical framework as a means of exposing metadata about their content (held in repositories); they may be OAI conformant, OAI registered, or OAI namespace-registered. Service Providers harvest metadata from Data Providers using the OAI protocol and use the metadata as the basis for value added services. The roles are conceptually different, but in reality Data Providers can offer both a service directly to users and metadata for automated harvesters; data providers need to offer value added services as well.
Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). Based on HTTP as carrier protocol. Responses are encoded in XML. The Open Archives Metadata Set = Dublin Core Metadata Element Set (unqualified): data providers must supply Dublin Core data via OAI, so that all harvesters can use their data. Open question: does harvesting simple DC mean loss of rich metadata from the original record? But we now have a significant solution for open (interoperable) archives: it lays down rules which make search services for many distributed archives possible.
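To make the slide's three points concrete (HTTP carrier, XML responses, unqualified Dublin Core), here is a minimal Python sketch of what a harvester does. It is an illustration under stated assumptions, not any particular archive's implementation: the base URL `http://archive.example.org/oai`, the record identifier and the sample record are invented, and the response is a hand-written fragment rather than output from a real server. The `ListRecords` verb, the `oai_dc` metadata prefix and the namespace URIs are those of OAI-PMH 2.0.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace URIs used by OAI-PMH 2.0 responses carrying unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request: an ordinary HTTP URL with query parameters."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def extract_titles(xml_text):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text)
    pairs = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", namespaces=NS)
        pairs.append((identifier, title))
    return pairs


# Hand-written sample response (hypothetical archive and record, trimmed for brevity).
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:archive.example.org:42</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An invented example paper</dc:title>
          <dc:creator>Example, A.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

if __name__ == "__main__":
    print(list_records_url("http://archive.example.org/oai"))
    print(extract_titles(SAMPLE_RESPONSE))
```

Because every data provider answers the same request in the same XML shape, one harvester loop of this kind can be pointed at many archives, which is the interoperability the slide describes.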
OAI Archive Model. [Diagram: Author; Open Repositories (Data Providers): institutional servers, disciplinary servers, journals (e.g., PLoS model); Interoperability Standards (OAI-PMH); Value-added Services (Service Providers): workflow applications, integrated scholarly communities, search tools; Reader.]
Supporting software. Many enabling technologies, standards, and protocols to support institutional repositories already exist, e.g. the OAI-PMH protocol to enable interoperability. The World Wide Web is taken for granted as part of the infrastructure. Archiving software: initially only one package was freely available to implementers, Eprints.org.
eprints.org. GNU EPrints software from the IAM group, University of Southampton, is free. Pioneered by Prof. Stevan Harnad to further the cause of self-archiving. EPrints 2 (GNU EPrints) developed by Chris Gutteridge.
Other e-Prints software emerging. DSpace: joint project of MIT Libraries and Hewlett Packard Company (Nov 2002), http://www.dspace.org. CDSWare: CERN Document Server software, http://cdsware.cern.ch. ARNO: Academic Research in the Netherlands Online (Tilburg, Amsterdam, Twente), http://www.uba.uva.nl/arno. bPress: Univ California (eScholarship), http://www.cdlib.org. Other own software (arXiv, Max Planck etc).
[Diagram: data providers and harvesters.] Data providers: CogPrints (GNU EPrints), 1600 records; www.orgprints.org (GNU EPrints), 264 records; arXiv (custom software), 230,000 records; D-Space @ MIT (D-Space software), 769 records. Harvester #1 (Psychology Service): 500 Cogprints, 169 D-Space. Harvester #2 (Physics Aggregator): 150,000 arXiv, 162 D-Space. Harvester #3 (General Service): 230,000 arXiv, 769 D-Space, 264 OrgPrints, 1600 CogPrints, plus 150,162 improved records from the physics aggregator. Label: Institutional repositories.
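The arithmetic behind the diagram can be checked with a small Python sketch. This is a toy model only: the record counts are the ones quoted on the slide, while the dictionary layout, the short provider names and the function name `harvested_total` are illustrative assumptions, not part of any harvester software.

```python
# Record counts held by each data provider, as quoted on the slide.
REPOSITORIES = {
    "CogPrints": 1600,
    "OrgPrints": 264,
    "arXiv": 230_000,
    "DSpace@MIT": 769,
}

# Number of records each harvester takes from each provider, as quoted on the slide.
HARVESTERS = {
    "Psychology service": {"CogPrints": 500, "DSpace@MIT": 169},
    "Physics aggregator": {"arXiv": 150_000, "DSpace@MIT": 162},
    "General service": {
        "arXiv": 230_000,
        "DSpace@MIT": 769,
        "OrgPrints": 264,
        "CogPrints": 1600,
    },
}


def harvested_total(name):
    """Sum the records one harvester collects directly from data providers."""
    selection = HARVESTERS[name]
    for repo, count in selection.items():
        # A harvester cannot take more records than the provider exposes.
        if count > REPOSITORIES[repo]:
            raise ValueError(f"{name} takes {count} records from {repo}, more than it holds")
    return sum(selection.values())


if __name__ == "__main__":
    for name in HARVESTERS:
        print(name, harvested_total(name))
```

The physics aggregator's total of 150,162 matches the figure the slide gives for the improved records passed on to the general service; the general service's own direct harvest is counted separately here.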
Service Providers (some)
Arc: Search engine
Callima: Search engine
citeBase Search: Search engine with citation ranking
CYCLADES: Search engine
DP9: Search engine, deep web
iCite: Citation indexing system for physics
My.OAI: Search engine
NCSTRL: Unified access, computer sciences
OAIster: Search engine
Perseus: Search engine in humanities
Scirus: Search engine, Elsevier
TORII: Unified access, physics-computer
Ack: David Prosser
Service provider: find the pearls.
Entering another phase: institutional repositories. In 2000, a complementary model to the subject archives: e-Print archives based on research output from one institution. Reawakening to the value of greater access to an institution's research. Essential increase in visibility of our intellectual output. A preservation role (like our traditional archivists).
Institutional repositories, early adopters: Australian National University; Aalborg University; Humboldt-Universität; Lund Universitet; National University of Ireland; University of Glasgow; California Digital Library; MIT; University of Southampton; University of Cambridge; University of Tilburg; Université de Montréal; LMU München; Utrecht University; CERN; University of Bath; University of Nottingham; Caltech; Academy of Sciences Belarus; Hong Kong University; Netherlands (DARE). Ack: David Prosser.
Benefits of an Institutional Repository. Provides institutional information asset management. Defines institutional sources of research. Identifies the institution's value to funding sources. Raises the profile of the institution. Institutional research becomes more visible, has more impact and is available in electronic form, so is cited more (Lawrence: Nature). Contributes to national and global initiatives which will ensure an international audience for the institution's latest research. (Other universities are developing their own archives which, together, will be searchable by global search tools.)
Information community, taking a lead role (1). Professional skills and expertise map to the e-Print support and maintenance profile: positioned in the scholarly communication process (recorders of institutional scientific output; publishers on behalf of institution); collection and dissemination of scholarly resources; deliverers of seamless systems, e-resources etc; resource discovery mechanisms in digital environment (e.g. Z39.50).
Information community, taking a lead role (2). Database expertise; records management; work with metadata and preservation; apply standards uniformly; IPR issues; central service provider; interact at all levels of the institution; network culture; end user of free research corpus.
UK Programme 2002. UK Higher Education Funding Council: JISC FAIR Programme (Focus on Access to Institutional Resources). Inspired by the vision of the Open Archives Initiative (OAI) that digital resources can be shared between organisations based on a simple mechanism allowing metadata about these resources to be harvested into services. To support the disclosure of institutional assets: to support access to and sharing of institutional content within Higher Education and Further Education, and to allow intelligence to be gathered about the technical, organisational and cultural challenges of these processes…
FAIR Programme. £3 million on 14 projects starting August 2002, covering museums and images; e-Prints; e-theses; IPR; institutional portals. TARDis: Targeting Academic Research for Deposit and dISclosure. SHERPA: broader, Consortium of Research Libraries, filling archives and joint infrastructure. HaIRST: a testbed for Scotland. ePrints-UK: harvesting UK e-Print archives, also investigating automated subject indexing using Dewey classification (with OCLC software in USA). eFAIR Cluster: exchange of experiences and work; includes e-Theses projects; overlap in work areas.
Univ of Southampton e-Print Archive. Project funding for 30 months, Aug 2002 to 2005: Targeting Academic Research for Deposit and dISclosure (TARDis). Staff: Project Manager, Research Assistants x 2, Admin Officer. Implement a university e-Print archive as a sustainable product, e-Prints Soton. Evaluate self and mediated archiving, measured against discipline culture. Document the technical, organisational and cultural issues of archiving. Feed back into the EPrints software design.
TARDis Work Plan. Early institutional e-Print archives have had problems with acquisition of content, possibly because of the self archiving protocol and discipline culture. Investigate the barriers: technical (hardware and software); discipline culture; depositors' concerns. Implementation: policy considerations; advocacy; sustainability.
Barriers: hardware and skills set. Hardware and software requirements for GNU EPrints: Apache WWW server; Unix / RedHat Linux (any computer capable of running GNU/Linux or similar operating system); Perl programming language and modules; MySQL (open source software). Different skill sets are needed for other software, e.g. DSpace requires Java skills.
Software Configuration (GNU EPrints v2.3). "Everything should be made as simple as possible, but not simpler." (Albert Einstein). GNU EPrints was originally intended for self archiving; re-engineer for institutional repository. Simplify the deposit process: reflect the look and feel of host web interface; additional metadata fields for institutional structure (Faculties, Schools, Departments, Research Groups), language, ISBN/ISSN?, corporate author; on-screen help. Information management standards: citation formats; metadata fields to describe all document types, presented in logical order; global subject classification or thesaurus; deposit types and document formats.
GNU EPrints: requested software development. Batch import; export to personal bibliographic software (EndNote); authentication; non-techie configuration; automated subject classification; automated metadata quality control; automated metadata from full text; full text searching; OpenURL compliance; etc.
Document Formats: multidisciplinary needs. Defaults: HTML, PDF, PostScript, ASCII. May want to subtract HTML: unless carefully checked, HTML output from Word is unsatisfactory. Add special document preparation formats (LaTeX) or common formats such as RTF. Accept all formats: all research output, including imagery, PowerPoint, streaming videos etc. Open source utility programs are available to convert from non supported to supported formats. Must ensure we have the viewers for users to download, e.g. a PostScript viewer.
Subject Classification / Thesaurus. Early survey showed that all archives used either LoC or a cut-down version, their own categories, or a published thesaurus (JEL). GNU EPrints Version 2 installed Library of Congress as default subject classification: an established global scheme often used in University Libraries; top level headings with subheadings to third level. Sufficient granularity? Not a deposit friendly tool. Possible to load an additional classification or subject based thesaurus? None at all, relying on title, keywords, abstract or faculty structure for retrieval? But how can broad subject areas be harvested from a multidisciplinary archive without classification?
Barrier: University culture. Survey: no central database record of University research output is maintained. Retrospective central research publications listings are collated from individual departments and made available on the web (University Research Report). In interviews, researchers want from an archive: to enter a record only once and use it for multiple purposes; export from the e-Print repository for multiple purposes (listings, web pages, University Research Report! etc); import of existing School databases and listings; definitive bibliographic records, not just full text; own branding.
E-Publishing on the University Web. Survey: researchers' attitude to e-Publishing on the web. Snapshot: looked at web sites, personal and schools.
Addressing authors' concerns. Work load (central bureaucracy, new systems to learn (change overload), file format conversion): assisted submission, the library will do it! (medium term). Quality control, loss of peer review: authors continue to submit articles to high impact traditional journals and also contribute to e-print archives. Undermining the status quo: some editors paid by publishers; reputations made within the present system; dislike of anti-publisher stance; self archiving complements the status quo.
Addressing authors' concerns (continued). Visibility compared with web pages: standard search engines do pick up metadata from the archive, but the search must be specific, e.g. "Hall agent technology" will be found, whereas finding a paper from a subject search presents thousands of results (not efficient yet); DP9 is an OAI Gateway Service for Web Crawlers to mine the deep web. Ingelfinger rule (prior publication): publishers gradually changing. Authentication, probity (Life Sciences): JISC project using TARDis as testbed. Preservation: implicit; secure storage, migration. Copyright!
IPR, particularly Copyright. Traditionally authors sign over copyright, whether they own it or not! Univ Southampton does not claim copyright on authored works other than course material. We need to encourage/assist authors to: place articles with open access publishers; negotiate agreement with publisher to retain e-Print right; deposit postprint (pre-journal version) in archive (Harnad-Oppenheim strategy). FAIR Project ROMEO Copyright Transfer Agreement List: http://www.lboro.ac.uk/departments/ls/disresearch/romeo/index.html
Publishers' attitudes changing. Nature Publishing Group, 19 Sep 2003.
Publishers making themselves OAI compliant. Institute of Physics: "We are pleased to confirm that we have adopted this standard here at Institute of Physics Publishing and metadata records for our article abstracts are now available in Dublin Core. They can be harvested from our server on request." (August 2002). How many library catalogues are OAI compliant?
Archive Implementation: policy decisions. Software; centralised or distributed databases (by document type, university grouping); collection policy (research output from whom?); file formats; deposit agreements; authentication of depositors; metadata quality control level; administrative/operational load; sustainability. Copyright / IPR: institutional policy of non-transfer, or negotiate to retain the right to distribute for free for scholarly scientific purposes, in particular the right to self archive publicly online on the www. Long term archiving / preservation: a global problem covering all digital assets, not just e-Prints; UK, Digital Preservation Centre; Stanford USA, LOCKSS; investigating international federated preservation facility.
Implementation: Advocacy. "If you build it they will come." (Costner, Field of Dreams). The biggest challenge is encouraging user participation: contribute content; search/use the repository. Means: leaflets; e-Print archive demonstrator; advocacy web site; briefing paper to management (buy-in); literature, e.g. SPARC leaflet; institutional magazines; presenting at departmental meetings and university committees; special advocacy events; carrots (USB stick, pens etc).
Where are we now? e-Prints Soton: new configuration; pre-trial with friends; feedback on font, colour, school names, cut and paste, LoC!, Unix systems, browsers etc. Pilot in two Schools: Ocean and Earth Sciences (60 papers already); Social Sciences. Researchers' buy-in is the biggest challenge: demonstrate real value (save them time); build a bibliographic database of university research output, not just full text; school branding (Lund example, but from a central database).
Information space, a national vision: e-Prints + data + e-learning (e-Banks UK). End of the journey? When data and documents are linked and easily accessible, they will be an integral part of the academic work space, just as the World Wide Web is today. But the Web will acquire meaning and become the Semantic Web. Open Archive protocols and metadata standards are a part of this journey.
Thank You. Implementing e-Prints is an emerging challenge for the information community. Pauline Simpson, Southampton Oceanography Centre, University of Southampton, UK. firstname.lastname@example.org
To keep up to date: Peter Suber keeps up to date with all these activities in the Free Online Scholarship Movement. See his Open Access News blog (previously FOS Newsletter), http://www.earlham.edu/~peters/fos/aboutblog.htm#namechange, and the Timeline he produced to record the real momentum of archiving!