Access Changes Everything The Benefits of Open Access and Open Semantics for Researchers Leslie Carr Intelligence, Agents and Multimedia Group University of Southampton
Salutary Warning A scholar is just a library's way of making another library –Daniel Dennett, Consciousness Explained
Thanks to Tim Brody and Stevan Harnad (Southampton University)
Outline Open Access –Visionary Foundations –Rationale: Research Impact –Effect of Open Access on Research Impact –Tools and Services –Initiatives Semantic Web –Introduction –Resource Description –Examples Concluding Thoughts
H. G. Wells, World Brain: The Idea of a Permanent World Encyclopaedia Encyclopédie Française, August, 1937 encyclopaedias of the past sufficed for the needs of a cultivated minority –universal education was unthought of –gigantic increase in recorded knowledge –more gigantic growth in the numbers of human beings requiring accurate and easily accessible information
Permanent World Encyclopaedia Discontented with the role of universities and libraries in the intellectual life of mankind Universities multiply but do not enlarge their scope –thought & knowledge organization of the world No obstacle to the creation of an efficient index to all human knowledge, ideas and achievements
Vannevar Bush, As We May Think Atlantic Monthly, July 1945 Director of the Office of Scientific Research and Development in USA, coordinating 6,000 American scientists during WWII Turns to making our bewildering store of knowledge more accessible For many years inventions have extended man's physical powers rather than the powers of his mind.
Memex The Memex (never built) was to be a mechanised device to allow a library user to –consult all kinds of written material –organize it in any way the user wanted –add private comments and link documents together at will. A personal library station which held all written articles and journals on microfilm. –system of levers allowed users to add links –create trails
Doug Engelbart Inventor of the mouse, was inspired by Bush's article. Computers were too expensive to be used interactively and for non-numeric tasks Augment project (1962) to develop computer tools to augment human capabilities and productivity
Ted Nelson Hypertext is more than text (1965) Literature is a system of interconnected documents Project Xanadu was a global literature: a repository of documents, their multiple versions and their interconnections.
Stevan Harnad, Scholarly Skywriting, Psychological Science (1990). Internet provides improvements in storing and communicating ideas. The reward is improvement in generating ideas: research. Greatest reward is the possibility of much greater intellectual productivity in one lifetime.
Tim Berners-Lee Inventor of the WWW (1990) Intended as a tool for physicists at CERN Aim was to help quickly share research results in collaborative projects Achieved through simple document, communications and linking standards. –simple standards caused rapid adoption
Paul Ginsparg Creator of the Los Alamos preprint archive (1991) Now contains 280,000 articles –High Energy Physics –Computing –Maths –Quantitative Biology Founder of the Open Archives Initiative
Various Visions Wells : a centralised, managed global knowledge repository to combat fragmenting academic authority. Bush : a cross-disciplinary scholarly paradigm to combat fragmenting scientific knowledge. Engelbart : computers augment productivity Nelson : computers create a global literature Harnad : Internet to boost personal research impact Berners-Lee : low-impact, standards-based document dissemination for scientific research Ginsparg : Web to speed up personal scientific communication against publication delays
Fast Forward to Open Access The Optimal and Inevitable for Researchers. –The entire full-text refereed corpus online –On every researcher's desktop, everywhere –24 hours a day –All papers citation-interlinked –Fully searchable, navigable, retrievable –For free, for all, forever Stevan Harnad, Les Carr OpCit International DLI Project Proposal (1999)
Open Archives Initiative Initially UPS: Universal Preprint Service –discussions initiated by Los Alamos HEP archive (Paul Ginsparg) –Inaugural meeting October 1999, Santa Fe Protocols to facilitate exchange of metadata –HTTP / XML Schema / Dublin Core Data provider / service provider distinction
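The data provider / service provider split can be sketched in code. This is a minimal sketch, assuming the harvesting protocol that grew out of this initiative (OAI-PMH), its `ListRecords` verb and its `oai_dc` Dublin Core metadata prefix. The base URL, the sample record and the helper names are hypothetical and do not come from the slides.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical data-provider endpoint (an assumption, not a real archive).
BASE_URL = "http://archive.example.org/oai"

# Namespace of the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the HTTP GET request a service provider sends to harvest metadata."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


# Simplified, made-up fragment of the kind of record a data provider returns.
SAMPLE_RECORD = f"""
<record xmlns:dc="{DC_NS}">
  <dc:title>An Example Eprint</dc:title>
  <dc:creator>Example, A.</dc:creator>
  <dc:date>2004</dc:date>
</record>
""".strip()


def extract_dublin_core(xml_text):
    """Collect the Dublin Core fields of one record into a dict of lists."""
    root = ET.fromstring(xml_text)
    fields = {}
    for element in root:
        # ElementTree spells namespaced tags as "{namespace}localname".
        if element.tag.startswith("{" + DC_NS + "}"):
            name = element.tag.split("}", 1)[1]
            fields.setdefault(name, []).append(element.text)
    return fields


harvest_url = list_records_url(BASE_URL)
metadata = extract_dublin_core(SAMPLE_RECORD)
print(harvest_url)
print(metadata)
```

The sketch does no network access: the service provider side only builds the request and parses a canned response, which is enough to show where HTTP, XML and Dublin Core each fit.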
EPrint Archiving Software A simple, turnkey environment for setting up an OAI compliant archive –Self archiving –Institutional archives (other software available: DSpace, Fedora etc)
The Literature: As We Imagine Integrated Available
Twin Peaks Problem Access Have-Nots Harvards financial firewalls Impact
The Research-Impact Cycle Open access to research output maximizes research access maximizing (and accelerating) research impact (hence also research productivity and research progress and their rewards)
Impact cycle begins: Research is done –Researchers write pre-refereeing Pre-Print –Submitted to Journal –Pre-Print reviewed by Peer Experts (Peer-Review) –Pre-Print revised by article's Authors –Refereed Post-Print Accepted, Certified, Published by Journal –Researchers can access the Post-Print if their university has a subscription to the Journal 12-18 Months New impact cycles: New research builds on existing research
Impact cycle begins: Research is done –Researchers write pre-refereeing Pre-Print –Pre-Print is self-archived in University's Eprint Archive –Submitted to Journal –Pre-Print reviewed by Peer Experts (Peer-Review) –Pre-Print revised by article's Authors –Refereed Post-Print Accepted, Certified, Published by Journal –Post-Print is self-archived in University's Eprint Archive –Researchers can access the Post-Print if their university has a subscription to the Journal 12-18 Months New impact cycles: Self-archived research impact is greater (and faster) because access is maximized (and accelerated)
Research Impact I. measures the size of a research contribution to further research (publish or perish) II. generates further research funding III. contributes to the research productivity and financial support of the researcher's institution IV. advances the researcher's career V. promotes research progress
Online or Invisible? (Lawrence 2001) average of 336% more citations to online articles compared to offline articles published in the same venue Lawrence, S. (2001) Free online availability substantially increases a paper's impact Nature 411 (6837): 521. http://www.neci.nec.com/~lawrence/papers/online-nature01/
Research Assessment, Research Funding, and Citation Impact Correlation between RAE ratings and mean departmental citations +0.91 (1996) +0.86 (2001) (Psychology) RAE and citation counting measure broadly the same thing Citation counting is both more cost-effective and more transparent (Eysenck & Smith 2002) http://psyserver.pc.rhbnc.ac.uk/citations.pdf
Time-Course of Citations (red) and Usage (hits, green) Witten, Edward (1998) String Theory and Noncommutative Geometry Adv. Theor. Math. Phys. 2 : 253. 1. Preprint or Postprint appears. 2. It is downloaded (and sometimes read). 3. Eventually citations may follow (for more important papers). 4. This generates more downloads, etc.
Usage Impact is correlated with Citation Impact (Physics ArXiv: hep, astro, cond, quantum; math, comp) http://citebase.eprints.org/analysis/correlation.php (Quartiles Q1 (lo) - Q4 (hi))
All: r=.27, n=219328; Q1 (lo) r=.26, n=54832; Q2 r=.18, n=54832; Q3 r=.28, n=54832; Q4 (hi) r=.34, n=54832
hep: r=.33, n=74020; Q1 (lo) r=.23, n=18505; Q2 r=.23, n=18505; Q3 r=.30, n=18505; Q4 (hi) r=.50, n=18505
(correlation is highest for high-citation papers/authors) Most papers are not cited at all Average UK downloads per paper: 10 (UK site only: 18 mirror sites in all)
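The r values on this slide are correlation coefficients between downloads and citations. As a minimal sketch of how such a figure could be computed, assuming Pearson's r is the statistic meant (the slide does not say which one is used), the function below works on two equal-length lists. The sample numbers are invented for illustration and are not Citebase data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)


# Invented per-paper counts, for illustration only.
downloads = [3, 10, 4, 25, 8, 40]
citations = [0, 2, 0, 5, 1, 9]

r = pearson_r(downloads, citations)
print(f"r = {r:.2f}")
```

Splitting the papers into citation quartiles, as the slide does, would amount to calling the same function once per quartile.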
Some old and new scientometric (publish or perish) indices of research impact Peer-review quality-level and citation-counts of the journal in which the article appears citation-counts for the article citation-counts for the researcher co-citations, co-text, semantic web (cited with whom/what else?) citation-counts for the preprint usage-measures (hits, webmetrics) time-course analyses, early predictors, etc. etc.
The Budapest Open Access Initiative Two open-access strategies: Gold and Green
The two open-access strategies: Gold and Green Gold: Open-Access Publishing (OApub) (BOAI-2) 1. Create or Convert 23,000 open-access journals (1000 exist currently) 2. Find funding support for open-access publication costs ($500-$1500+) 3. Persuade the authors of the annual 2,500,000 articles to publish in new open-access journals instead of the existing toll-access journals Green: Open-Access Self-Archiving (OAarch) (BOAI-1) 1. Persuade the authors of the annual 2,500,000 articles they publish in the existing toll-access journals to also self-archive them in their institutional open-access archives.
Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities http://www.zim.mpg.de/openaccess-berlin/berlindeclaration.html The pertinent passages: Open access [means]: 1. free... [online, full-text] access 2. A complete version of the [open-access] work... is deposited... in at least one online repository... to enable open access, unrestricted distribution, [OAI] interoperability, and long-term archiving. [W]e intend to... encourag[e]... our researchers/grant recipients to publish their work according to the principles of... open access.
What is needed for open access now: 1. Universities: Adopt a university-wide policy of making all university research output open access (via either the gold or green strategy) 2. Departments: Create and fill departmental OAI-compliant open-access archives 3. University Libraries: Provide digital library support for research self-archiving and open-access archive-maintenance. 4. Promotion Committees: Require a standardized online CV from all candidates, with refereed publications all linked to their full-texts in the open-access journal archives and/or departmental open-access archives 5. Research Funders: Mandate open access for all funded research (via either the gold or green strategy). Fund (fixed, fair) open-access journal peer-review service charges. Assess research and researcher impact online (from the online CVs). 6. Publishers: Become either gold or green.
Green Light RoMEO Directory of Publishers who have given their Green Light to Self-Archiving http://www.sherpa.ac.uk/romeo.php http://romeo.eprints.org Proportion of journals already formally giving their green light to author/institution self-archiving (already 83%) continues to grow:
Green light to self-archive | Journals | % | Publishers | %
Total | 10,673 | (100%) | 88 | (100%)
Neither yet | 1,793 | 17% | 37 | 42%
Preprint | 3,253 | +30% (=83%) | 7 | +8% (=58%)
Postprint | 1,772 | +17% (=53%) | 14 | +16% (=50%)
Postprint and Preprint | 3,855 | 36% | 30 | 34%
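The percentages in the RoMEO figures can be checked from the raw counts. The sketch below recomputes each category's share and the overall "green" share (everything except "Neither yet"). The counts are transcribed from this slide; the function and variable names are illustrative only.

```python
# Counts transcribed from the RoMEO figures on this slide.
journals = {
    "Neither yet": 1793,
    "Preprint": 3253,
    "Postprint": 1772,
    "Postprint and Preprint": 3855,
}
publishers = {
    "Neither yet": 37,
    "Preprint": 7,
    "Postprint": 14,
    "Postprint and Preprint": 30,
}


def shares(counts):
    """Percentage of the total in each category, rounded to whole percent."""
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}


def green_share(counts):
    """Percentage that permits some form of self-archiving."""
    total = sum(counts.values())
    green = total - counts["Neither yet"]
    return round(100 * green / total)


print(sum(journals.values()), shares(journals), green_share(journals))
print(sum(publishers.values()), shares(publishers), green_share(publishers))
```

The cumulative figures in parentheses on the slide come from adding categories from the bottom row upwards: postprint-and-preprint, then postprint, then preprint.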
Percentage of green PUBLISHERS grew from 42% to 58% between 2003 and 2004 Percentage of green JOURNALS grew from 55% to 83% between 2003 and 2004
OAIster, a cross-archive search engine, now covers over 250 OAI Archives (about half of them Eprints.org Archives) indexing over 3 million items (but not all research papers, and not all full-texts). http://oaister.umdl.umich.edu/o/oaister/ …but there are 2.5 million journal articles published per year!
Declaration of Institutional Commitment to implementing the Berlin Declaration on open-access provision Our institution hereby commits itself to adopting and implementing an official institutional policy of providing open access to our own peer-reviewed research output -- i.e., toll-free, full-text online access, for all would-be users webwide -- in accordance with the Budapest Open Access Initiative and the Berlin Declaration UNIFIED OPEN-ACCESS PROVISION POLICY: (OAJ) Researchers publish their research in an open-access journal if a suitable one exists; otherwise (OAA) Researchers publish their research in a suitable toll-access journal and also self-archive it in their own research institution's open-access research archive. To sign: http://www.eprints.org/signup/sign.php A JISC survey (Swan & Brown 2004) "asked authors to say how they would feel if their employer or funding body required them to deposit copies of their published articles in one or more… repositories. The vast majority... said they would do so willingly." http://www.jisc.ac.uk/uploaded_documents/JISCOAreport1.pdf
Archiving: More than Articles Metadata collection and distribution Basis of OAI But extra effort for researcher
Semantic Web W3C activity to improve Web resources –By providing metadata –Formal descriptions of resources –Based on strict standards RDF - Resource Description Framework RDF(S) - Schema Language for defining types of resources and types of properties OWL - Ontology language for more complex relationships
Old Web Service Web server sends a document to a user
Modern Web Services Web server sends data to a program [Diagram: invoice data tree with labels item, name, price, ref; number=1, number=2; type = info; id = xyz]
Semantic Web Semantic web provides resources to users and their semantics to computers [Diagram: invoice data tree with labels item, name, price, ref; number=1, number=2; type = info; id = xyz]
RDF: Metadata Data about data –information about documents title, author, journal, date, keywords –information about people role, history, salary, expertise –information about exhibits catalogue number, price, date, artist –information about metadata validity, purpose, compiler, authority
Catalogue information. artist, title of the image or picture, date acquired, dimensions. Syntactic content. primitive features, e.g. colour, texture and shapes. Semantic content. what it's supposed to represent, e.g. painting of a landscape or a representation of happiness. Height Width Title Artist Content: Some hills, a lake and the sun Represents: peace, tranquility
RDF Model http://www.w3c.org/Intro.html Tim Berners-Lee Author
RDF Model subject: http://www.w3c.org/Intro.html predicate: Author object: Tim Berners-Lee
RDF Model subject: http://www.w3c.org/Intro.html predicate: author object: the author resource, with properties name: Tim Berners-Lee and email: email@example.com
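The subject / predicate / object model on the RDF slides can be illustrated with plain Python tuples. This is a minimal sketch, not an RDF library: the `Triple` type, the `_:author1` node label and the helper functions are hypothetical names introduced here, and the predicates are written as bare words rather than full URIs.

```python
from collections import namedtuple

# One RDF statement: a subject, a predicate and an object.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

DOC = "http://www.w3c.org/Intro.html"
AUTHOR_NODE = "_:author1"  # hypothetical label for the unnamed author resource

# The graph from the slide: the document has an author,
# and that author has a name and an email address.
graph = [
    Triple(DOC, "author", AUTHOR_NODE),
    Triple(AUTHOR_NODE, "name", "Tim Berners-Lee"),
    Triple(AUTHOR_NODE, "email", "email@example.com"),
]


def objects(graph, subject, predicate):
    """All objects of statements matching the given subject and predicate."""
    return [t.object for t in graph
            if t.subject == subject and t.predicate == predicate]


def author_names(graph, document):
    """Follow document -> author -> name across two statements."""
    names = []
    for author in objects(graph, document, "author"):
        names.extend(objects(graph, author, "name"))
    return names


print(author_names(graph, DOC))
```

Making the author a resource of its own, rather than a plain string, is what lets further statements such as the email address attach to it.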
Semantic Web Examples Example Projects –CS AKTive Space –Web Photos Ontologies –Role of ontologies –How they dovetail in with OAI –DSpace / SIMILE –Bridging the semantic gap
CS AKTive Space Integrating info from –Eprint archives –Home pages –Funding agencies
Web Conference Photo Attendees upload photos for public display Can then be publicly annotated List of known people collected –community
Web Photo RDF Model Ontologies used –Dublin Core –Friend-of-a-Friend –Creative Commons Rights Management –Geographical Locations –Calendar Events
Simile DSpace / MIT / HP / W3C Semantic Web and Digital Library project Many resources in many sites catalogued with different schemes for different purposes Use ontologies to switch between domains and perform cross-domain searches
Simile Scenario Started on ARTstor island –SUBJECT: Abstract Roamed around island SUBJECT: Abstract, CREATOR: Gorky Travelled over Gorky bridge to OCW island CREATOR: Gorky, IS PART OF:... Found resource not on ARTstor island Travelled over Graham bridge To another part of ARTstor island (Taken from DSpace User Group slides)
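The idea of using ontologies to switch between domains can be sketched as a mapping from shared concepts to each collection's own field names. Everything below is an assumption made for illustration: the two toy catalogues, their field names and the `cross_domain_search` helper are hypothetical and do not reflect how SIMILE is actually implemented.

```python
# Two toy catalogues that describe their items with different field names.
collections = {
    "artstor": [
        {"id": "art-1", "CREATOR": "Gorky", "SUBJECT": "Abstract"},
        {"id": "art-2", "CREATOR": "Graham", "SUBJECT": "Abstract"},
    ],
    "ocw": [
        {"id": "course-1", "discusses_artist": "Gorky", "topic": "Modern Art"},
    ],
}

# The shared "ontology": one concept name, mapped to each catalogue's field.
concept_to_field = {
    "artstor": {"creator": "CREATOR", "subject": "SUBJECT"},
    "ocw": {"creator": "discusses_artist", "subject": "topic"},
}


def cross_domain_search(concept, value):
    """Find items in every catalogue whose field for this concept equals value."""
    hits = []
    for name, records in collections.items():
        field = concept_to_field[name].get(concept)
        if field is None:
            continue  # this catalogue has no field for the concept
        for record in records:
            if record.get(field) == value:
                hits.append((name, record["id"]))
    return hits


# One query on the shared concept crosses the "bridge" between catalogues.
print(cross_domain_search("creator", "Gorky"))
print(cross_domain_search("subject", "Abstract"))
```

The catalogues keep their own schemes; only the small mapping table has to be agreed on for a single query to reach both.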
Semantic Web raison d'être Bridging between resources Through shared semantics of metadata Made possible by ontologies
Lessons for Open Access Collect and organise metadata –and explain to authors the benefits of their investments Researchers become responsible maintainers of their output –For sharing with their community –For sharing with posterity Build value-added services that build on shared agreements about meaning
Final Thoughts Open access improves science Network effect –more participants -> better services Just do it! –But start with small steps