Image BioInformatics Research Group Department of Zoology University of Oxford, UK Oxford University Press 16 February 2012 The.

Image BioInformatics Research Group, Department of Zoology, University of Oxford, UK http://ibrg.zoo.ox.ac.uk Oxford University Press, 16 February 2012 The Future of Scholarly Publishing © David Shotton, 2012 Published under the Creative Commons Attribution-Noncommercial-Share Alike 3.0 Licence e-mail: david.shotton@zoo.ox.ac.uk David Shotton

We live in interesting times! Richard O’Beirne (Online Publishing Manager, Oxford Journals) 11 January 2012 “I've just read the Force11 White Paper – The Future of Research Communication and e-Scholarship. Very thought provoking!  “Would you be interested in speaking at OUP about it?”  “It would be great to identify actionable things we can do to move in the direction you describe.” 6 February 2012 “I see another Elsevier boycott movement is starting up...  “My feeling is that traditional commercial publishing could have its own Arab Spring, and the landscape could change very quickly...”

Outline Boycotts, enemies, prophecies and warnings  what others are saying about scholarly publishing Background - a brief introduction to my own recent activities  Adventures in Semantic Publishing – an exemplar  The SPAR (Semantic Publishing and Referencing) Ontologies  The Open Citations Corpus, MIIDI and Open Research Reports Academic publishers and the digital revolution – where are we now? The Force11 White Paper The Five Stars of Online Journal Articles What should OUP do?  Peer review, open access, enhanced content, datasets and metadata How to kick-start the revolution at OUP  Embrace the new technologies, and get developers involved

The Elsevier boycott A 20% increase in four days Today’s figure: 6,155 individuals

Why are people choosing to boycott Elsevier? They charge exorbitantly high prices for journals and for open access, e.g.  European J Cell Biology - institutional annual subscription, €2,038  J Web Semantics – to publish one article open access, $3,000 They force libraries to buy very large “bundles” of subscriptions in order to obtain access to the essential journals, thus increasing their income They make obscene levels of profit  an astounding 36% profit on revenues of >£2 billion in 2010 They support measures such as  The Stop Online Piracy Act (SOPA),  The Preventing Real Online Threats to Economic Creativity and Theft of Intellectual Property Act (PIPA)  The Research Works Act that aim to restrict the free exchange of information and would threaten, for example, PubMed Central as a repository for Open Access journal articles, including 13,481 OUP articles

Is OUP an enemy of science? Dr Mike Taylor is in the Department of Earth Sciences, University of Bristol

In the article, he says:

Essential reading from a true prophet of our times... http://michaelnielsen.org/blog/is-scientific-publishing-about-to-be-disrupted/ Michael cites examples of other industries that failed to appreciate a technological disruption, and as a consequence died

Remember the PDP-11? PDP-11s were 16-bit minicomputers manufactured by the Digital Equipment Corporation (DEC) from 1970 to the 1990s. They were the computers to work on, leaving IBM mainframes in the dust. In the 1980s, 32-bit microcomputers started to out-perform them. Unlike IBM, DEC did not see what was coming. After a long, lingering decline, DEC ceased business in 1998. Similar stories are now unfolding for the newspaper and music industries. Is academic publishing any better off?

Quotations from Michael Nielsen’s article (in 2009!) “The only way for an organization to adapt to disruptive technologies is to make drastic, painful changes.” “The last people to know an industry is dead are the people in it.” “An early sign of impending disruption is when there’s a sudden flourishing of startups serving an overlapping customer need, but whose organizational architecture is radically different  ChemSpider, Mendeley, SciVee, JOVE, OpenWetWare, WordPress.” “One element in this new flourishing scientific publication ecosystem is the rise of scientific blogs as a serious medium for research  e.g. Tim Gowers’s Polymath Project.” “When new technologies are being developed, the organizations that win are  those that aggressively take risks,  put visionary technologists in key decision-making positions, and  attain a deep organizational mastery of the relevant technologies.” “Unfortunately, few scientific publishers are attempting to become technology-driven in this way. The only major examples I know of are NPG and PLoS.”

Some background – The Semantic Web and RDF Skilful marketing of Semantic Web concepts under the banner of ‘Linked Data’ has recently brought them into widespread acceptance and use (e.g. BBC) The principles are quite simple  Facts are expressed as relationships - subject – predicate – object ‘triples’  The syntax is defined by W3C’s Resource Description Framework (RDF)  Entities and their relationships are identified by unique URIs  Their meaning is defined in publicly available ontologies Examples: cerif:Project dcterms:title “The Open Citations Project”. cerif:Project foaf:homePage. Such statements can be combined into interconnected information networks (RDF graphs) – forming ‘linked data’  in which the truth content of each original statement is maintained  thereby creating a web of knowledge, the Semantic Web
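
The triple idea above can be sketched in a few lines of Python without any RDF library. This is only an illustration: each statement is a (subject, predicate, object) tuple, a graph is a set of such tuples, and combining graphs is set union, so every original statement survives unchanged. The example.org URIs and the helper names `merge` and `describe` are hypothetical placeholders, not part of the talk.

```python
# Each fact is a (subject, predicate, object) triple.
# The example.org URIs below are hypothetical placeholders.
PROJECT = "http://example.org/project/open-citations"

graph_a = {
    (PROJECT, "dcterms:title", "The Open Citations Project"),
}
graph_b = {
    (PROJECT, "foaf:homepage", "http://example.org/open-citations-home"),
}


def merge(*graphs):
    """Combine graphs: a graph is a set of triples, so merging is set union."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """Collect every (predicate, object) pair stated about one subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)


linked = merge(graph_a, graph_b)
for predicate, obj in describe(linked, PROJECT):
    print(predicate, "->", obj)
```

Because the merged graph is a plain union, statements published independently about the same URI line up automatically, which is the property the slide calls 'linked data'.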

The Web of Linked Data has grown over the past four years Credits: Linking Open Data cloud diagrams by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

Exemplar semantic publishing

What do I mean by semantic publishing? The use of simple Web and Semantic Web technologies  to enhance the meaning of on-line published research articles  to provide access to published data in actionable form  to link articles with their cited references and other information sources  to link articles to the research datasets that underpin them  to provide machine-readable summaries of an article’s content  to facilitate integration of semantically related scientific information from heterogeneous distributed resources so that data, information and knowledge can more easily be found, extracted, combined and reused

Exemplar semantic enhancements Exemplar semantic enhancements to a research article by Reis et al. (2008) published in PLoS Neglected Tropical Diseases 2: e228 Enhanced article available at: http://dx.doi.org/10.1371/journal.pntd.0000228.x001 That work is described in: Shotton D, Portwin K, Klyne G, Miles A (2009). Adventures in semantic publishing: exemplar semantic enhancement of a research article. PLoS Computational Biology 5: e1000361. http://dx.doi.org/10.1371/journal.pcbi.1000361

The article we chose to semantically ‘enliven’

About real people suffering real disease – not just academic The study area, a favela in Salvador, Brazil, from the air Tiny houses are crowded together This photo is in dry weather, but in the rainy season the path along the stream bed along the top becomes an open sewer

The enhanced paper by Reis et al. (2008) http://dx.doi.org/10.1371/journal.pntd.0000228.x001

Highlighting of terms in the text

Links from organism names to the uBio taxonomic database Embedded link: http://www.ubio.org/browser/details.php?namebankID=2481308

Interactive figures Compare the original static image - http://bit.ly/xRVior  hard to superimpose the individual panels of the image mentally... with our interactive version of the original figure, http://imageweb.zoo.ox.ac.uk/pub/2008/datavisualisationwidgets/overlay/fig3/, in which individual panels can be dragged and superimposed on one another

Image fusion with Google Maps

Re-orderable reference list

Different types of enhancements created Better integration of the paper into the Web  Provision of hyperlinks to relevant Web sites  Live DOI links to full text of cited papers  Machine-readable metadata and reference files (RDF N3 and RDFa) Additions to the paper  The datasets in the table and figures downloadable in actionable form  Semantic mark-up of terms in the text, with links to authorities  Enhanced Portuguese Abstract; Re-orderable reference list  Interactive figures, and the Supporting Claims Tooltip (exemplars) Analysis of the content of the paper  Document summarization, including tag cloud and study summary  Citation frequency analysis and citation typing; marked-up references Data fusion (mashup) services  Geo-temporal mashups with Google Maps  Integration with relevant disease incidence data in other publications

Citation typing

The annotated reference list The first three references from the reference list of our enhanced version of Reis et al. (2008), with the citation typing display turned on CiTO, the Citation Typing Ontology - http://purl.org/spar/cito/

Clustering of CiTO relationships by similarity Rhetorical  Positive: Agrees with, Confirms, Credits, Supports  Neutral: Cites, Cites as related, Discusses, Reviews, Extends  Negative: Corrects, Qualifies, Disagrees with, Disputes, Refutes, Critiques, Parodies, Ridicules Factual  Cites as authority, Cites as evidence, Obtains background from, Obtains support from, Contains assertion from, Uses data from, Uses method in, Cites as data source, Cites for information, Documents, Updates, Includes excerpt from, Includes quotation from, Plagiarizes, Cites as metadata document, Cites as source document, Shares authors with

Uses of CiTO, the Citation Typing Ontology To permit the existence of a citation between a citing work and a cited work to be recorded in RDF cito:cites.  Even this simple statement that a citation exists opens significant possibilities, for example in enabling the easy creation of citation networks simply by combining the RDF citation lists from several papers To permit the nature of the citation between a citing work and a cited work to be characterized,  both factually reviews, sharesAuthorsWith, usesMethodIn, etc  and rhetorically confirms, corrects, refutes, etc CiTO is now part of SPAR - Semantic Publishing and Referencing Ontologies, a suite of eight generic OWL 2 DL ontologies covering all scholarly publishing  Available from http://purl.org/spar/
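
A rough sketch of the point about citation networks: if each paper publishes its reference list as typed citation statements, a network falls out simply by merging the lists. The paper labels (`paperA` and so on) and the function names are hypothetical; the CiTO property names are ones mentioned in this talk.

```python
from collections import defaultdict

# Typed citations as (citing work, CiTO property, cited work).
# Paper identifiers are hypothetical labels, not real articles.
paper_a_refs = [
    ("paperA", "cito:usesDataFrom", "paperC"),
    ("paperA", "cito:confirms", "paperC"),
    ("paperA", "cito:cites", "paperD"),
]
paper_b_refs = [
    ("paperB", "cito:refutes", "paperC"),
    ("paperB", "cito:extends", "paperA"),
]


def build_network(*reference_lists):
    """Merge per-paper citation lists into one directed, typed network."""
    network = defaultdict(lambda: defaultdict(set))
    for refs in reference_lists:
        for citing, relation, cited in refs:
            network[citing][cited].add(relation)
    return network


def cited_by(network, paper):
    """List the works that cite `paper`, in alphabetical order."""
    return sorted(citing for citing, targets in network.items() if paper in targets)


network = build_network(paper_a_refs, paper_b_refs)
print(cited_by(network, "paperC"))
```

Keeping the relation type on each edge means the same structure answers both the plain question (who cites this work?) and the typed one (who refutes it?).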

Ontologies for scholarly publishing

http://purl.org/spar/ SPAR – Semantic Publishing and Referencing Ontologies

The SPAR Ontologies These SPAR ontologies are described at http://purl.org/spar/ and in my blog Open Citations and Semantic Publishing at http://opencitations.wordpress.com CiTO, the Citation Typing Ontology http://purl.org/spar/cito enables characterization of the nature or type of citations, both factually and rhetorically FaBiO, the FRBR-aligned Bibliographic Ontology http://purl.org/spar/fabio is an ontology for describing bibliographic entities (books, articles, etc.) BiRO, the Bibliographic Reference Ontology http://purl.org/spar/biro is an ontology to define bibliographic records and references, and their compilation into bibliographic collections and reference lists, respectively (FaBiO and BiRO classes are structured according to the FRBR schema of Works, Expressions, Manifestations and Items)

The SPAR Ontologies, continued C4O, the Citation Counting and Context Characterization Ontology http://purl.org/spar/c4o allows the characterization of bibliographic citations in terms of their number (both locally and globally), and their textual context DoCO, the Document Components Ontology http://purl.org/spar/doco provides a structured vocabulary of document components, both structural (e.g. heading, paragraph) and rhetorical (e.g. Abstract, Introduction) PRO, the Publishing Roles Ontology http://purl.org/spar/pro is an ontology for the roles of agents (e.g., author, editor, publisher, librarian) in the publication process, and the times during which those roles are held PSO, the Publishing Status Ontology http://purl.org/spar/pso is an ontology for the temporal status of a document (e.g. draft, under review, published, Version of Record) during the publication process PWO, the Publishing Workflow Ontology http://purl.org/spar/pwo describes the steps in the workflow associated with the publication of a document or other publication entity

Bibliographic information encoded in RDF using SPAR
# The citing paper, Reis et al., 2008
a fabio:JournalArticle ; # expression
frbr:realizationOf [ a fabio:ResearchPaper ] ; # work
pso:holds [ a pso:StatusInTime ; pso:withStatus pso:peer-reviewed ] ;
cito:cites ; # Reference [6]; Ko et al., 1999
frbr:part [ a biro:BibliographicReference ; biro:references ; c4o:hasInTextCitationFrequency "10"^^xsd:nonNegativeInteger ] ;
cito:obtainsBackgroundFrom ;
cito:usesDataFrom ;
cito:confirms ;
cito:extends ;
cito:sharesAuthorsWith .
# Reference [6], the cited paper, Ko et al., 1999
dcterms:bibliographicCitation "Ko AI, Reis MG, Ribeiro Dourado CM, Johnson WD Jr, Riley LW (1999). Urban epidemic of severe leptospirosis in Brazil. Salvador Leptospirosis Study Group. Lancet 354: 820-825." ;
prism:publicationDate "1999-09-04"^^xsd:date ;
cito:isCitedBy ; # The citing paper, Reis et al., 2008
c4o:hasGlobalCitationFrequency [ a c4o:GlobalCitationCount ; c4o:hasGlobalCountValue "309"^^xsd:integer ; c4o:hasGlobalCountDate "2011-09-07"^^xsd:date ; c4o:hasGlobalCountSource ] .
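
As a hedged illustration of how typed citation statements like these could be written out, here is a small Python sketch that emits them in N-Triples, a line-based RDF syntax, rather than the Turtle used on the slide. The CiTO namespace is the one given in this talk. The DOI-based URI for the citing paper is an assumption built from the DOI quoted elsewhere in the talk, and the URI for the cited paper is a hypothetical placeholder.

```python
CITO = "http://purl.org/spar/cito/"

# Assumption: the citing paper is identified by its DOI-based URI.
REIS_2008 = "http://dx.doi.org/10.1371/journal.pntd.0000228"
# Hypothetical placeholder for the cited paper's URI.
KO_1999 = "http://example.org/ko-1999"


def ntriple(subject, predicate, obj):
    """Format one statement whose subject, predicate and object are all URIs."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def citation_statements(citing, cited, relations):
    """One N-Triples line per CiTO relation between two works."""
    return [ntriple(citing, CITO + relation, cited) for relation in relations]


lines = citation_statements(
    REIS_2008,
    KO_1999,
    ["cites", "obtainsBackgroundFrom", "usesDataFrom",
     "confirms", "extends", "sharesAuthorsWith"],
)
print("\n".join(lines))
```

N-Triples is chosen here only because every statement is a complete line, which makes the output easy to concatenate across papers.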

Structured summaries of articles

Enhancing metadata – the Reis et al. (2008) exemplar http://dx.doi.org/10.1371/journal.pntd.0000228.x001

Summary information from Reis et al. 2008 Impact of Environment and Social Gradient on Leptospira Infection in Urban Slums PLoS Neglected Tropical Diseases 2(4): e228. Summary at http://dx.doi.org/10.1371/journal.pntd.0000228.x002 Limitations: 1. Hand crafted 2. No data model 3. Not in RDF

MIIDI http://www.miidi.org/ MIIDI is a Minimal Information standard for an Infectious Disease Investigation. I held an international MIIDI workshop in September 2009 to get an initial draft. In January 2011, Tanya Gray started work with me to develop MIIDI properly. She has now developed MIIDI into a validated XML data model, and has created a MIIDI Form that permits easy metadata entry conforming to the MIIDI standard  http://www.miidi.org:8080/input-form/ The MIIDI standard can be used not only to create structured metadata for  journal articles, but also to describe  data sets,  mathematical models,  experimental workflows and  software relevant to an infectious disease investigation, providing metadata to accompany data repository deposit

The MIIDI XML data model

The MIIDI input form for Research Investigation information

The MIIDI input form - Research Investigation findings

MIIDI Study details

The MIIDI input form for Journal Article information

MIIDI output formats – HTML (also XML, JSON, RDF)

The Open Citations Corpus

We have processed the reference lists from all 204,637 articles in the Open Access Subset of PMC (as of 24 January 2011), and encoded all the citations as linked open data. These reference lists contain 6,325,178 individual references, some unique, but many from different citing articles to the same highly cited papers. These refer to 3,373,961 unique papers, mostly outside the Open Access Subset  ~ 20% of all PubMed papers published between 1950 and 2010  includes ALL the highly cited papers in every biomedical field These citations are now encoded as Open Linked Data, and are freely available under a CC0 waiver from http://opencitations.net/data/ via a SPARQL endpoint. We would now like to expand the corpus to include reference citations from articles that presently lie outside the Open Access subset of PMC  specifically, all OUP journal article reference lists
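
The difference between individual references and unique cited papers can be made concrete with a toy example. The sketch below is not the Open Citations pipeline. It only shows, on hypothetical data, how the two figures and the most cited paper fall out of a set of reference lists.

```python
from collections import Counter

# Toy corpus: citing article -> list of cited papers (hypothetical labels).
reference_lists = {
    "article1": ["p1", "p2", "p3"],
    "article2": ["p2", "p3"],
    "article3": ["p3", "p4"],
}


def corpus_stats(reference_lists):
    """Count individual references, unique cited papers, and the top-cited paper."""
    counts = Counter(
        cited for refs in reference_lists.values() for cited in refs
    )
    return {
        "references": sum(counts.values()),   # every entry in every list
        "unique_cited": len(counts),          # distinct papers referred to
        "most_cited": counts.most_common(1)[0],
    }


stats = corpus_stats(reference_lists)
print(stats)
```

In the corpus described above, the same two quantities are 6,325,178 references and 3,373,961 unique papers.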

Viewing citation networks at http://opencitations.net

The outward citation network of Reis et al. (2008)

Open Research Reports

Using the citation data on disease articles
Top Papers for Open Research Reports
Disease name    Number of papers cited    PubMed IDs of 20 most highly cited papers (with number of times cited), columns 1 2 3 4
Cholera    1,993    109523014715242645442836362251643219924
Dengue fever    3,858    17510324449665979421372617341557793832
HIV/AIDS    54,432    9516219122121678631019539414861274279883
Leprosy    1,147    1123400270176047181815894530131290189312
Leptospirosis    940    1129264047146522023712712204271502870226
Malaria    25,290    123688642301236479114678184013412893887101
Measles    1,719    117423912216262740191579884318897439213
Pneumonia    6,901    899508660156990795311463916491052495247
Schistosomiasis    3,036    158663104912973350461679038243467564440
Trypanosomiasis    5,864    16020726108160207257510215027574309235
Tuberculosis    16,091    96342301179157152831274279883838181480
Amyotrophic lateral sclerosis    2,380    844617046170236593211386269221521734922
Spinal muscular atrophy    555    78130122810339583201192556420907488415
Total excluding ALS and SMA    121,271
Total    124,206
Average    9,554
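
The totals on this slide can be re-derived from the per-disease counts. The short Python check below uses only the "number of papers cited" figures from the slide; the variable names are illustrative.

```python
# Number of papers cited per disease, as listed on the slide.
papers_cited = {
    "Cholera": 1993,
    "Dengue fever": 3858,
    "HIV/AIDS": 54432,
    "Leprosy": 1147,
    "Leptospirosis": 940,
    "Malaria": 25290,
    "Measles": 1719,
    "Pneumonia": 6901,
    "Schistosomiasis": 3036,
    "Trypanosomiasis": 5864,
    "Tuberculosis": 16091,
    "Amyotrophic lateral sclerosis": 2380,
    "Spinal muscular atrophy": 555,
}

# The slide reports one total without these two diseases (ALS and SMA).
EXCLUDED = {"Amyotrophic lateral sclerosis", "Spinal muscular atrophy"}

total = sum(papers_cited.values())
total_excluding = sum(n for d, n in papers_cited.items() if d not in EXCLUDED)
average = round(total / len(papers_cited))

print(total_excluding, total, average)  # 121271 124206 9554
```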

The problem of access to the biomedical literature The free access to biomedical journals in developing countries offered by the HINARI Programme, set up in 2002 by WHO together with major publishers, is at risk The Lancet Editorial, 22 January 2011: DOI:10.1016/S0140-6736(11)60066-4 “When news came last week that several large publishers—including Elsevier (our publisher), Lippincott Williams & Wilkins, and Springer—had withdrawn journals from HINARI’s Bangladesh programme (and other countries too, such as Kenya and Nigeria), there was a collective cry of betrayal.” “Elsevier says that Bangladesh is a country that could move to a ‘discounted commercial agreement’, and that there will be other countries too.” “Our view is that any country designated as “low human development” by the UN justifies a clear and unambiguous commitment by all publishers to full and free access to research results through HINARI.”

EJE Euro 1455 The real question: How do we get from here...

... to here?

Our vision: Open Research Reports Tackling the most cited papers for major infectious diseases first, to create a structured digital summary encapsulating the basic facts in each infectious disease article, using the MIIDI standard to determine its content To use tools such as the MIIDI Input Form to facilitate this For a traditional subscription-access publisher, this is clearly a disruptive technology The question is, whether to ignore it, to fight it, or to facilitate it To publish these Open Research Reports in both human- and machine-readable form in an open access ‘instant’ journal, Open Research Reports in Infectious Disease, using Annotum

Open Research Report for Reis et al. (2008) in Annotum

Scholarly journal publishing today

Scholarly publishing really hasn’t changed much in 346 years: 4th Aug 1666, 1st Jan 1888, 19th March 2012

The structure and presentation of a journal article We still have a linear narrative, with a title and abstract at the start, a set of logically arranged sections, and a reference list at the end. We now publish journals on-line as well as on paper. However, in an age of multimedia, smartphones and 24/7 social networks, scholars and researchers continue to communicate their thoughts and research results primarily by means of the selective distribution of ink on paper, or at best via electronic facsimiles of the same  The norm is to publish online journal articles as static PDF files, mimicking the printed page This is totally antithetical to the spirit of the Web, and ignores its great potential

So, what’s so special about the Web?
In the print world  articles are finite, edited, privately reviewed and published as finished documents – ‘versions of record’  closed world – if not in literature...  discussions of content have to be pre-composed; authors must anticipate objections  publication costs are high  papers are of finite length  paper does not scale or link  references do not ‘work’  data are hard to include  if an article has been published, the knowledge ‘exists’ – it’s up to the scholar to find it in the library
On the Web  information is extensive, scattered, of variable quality, incomplete – articles are just part of the picture  open world – other sources (blogs)  discussions and peer review can occur both before and after publication, as an ongoing process  publication costs are very low  there are no practical limits to size  links are everything, the Web scales  references take you to cited papers  data are easy to link to  if a journal article does not embrace Web technologies, it will be ignored and the knowledge effectively lost

Publishing technology – an analogy Academic publishers are thus, at this mid-point in the digital revolution, in an ill-defined transitional state (a ‘horseless carriage’ state) that lies somewhere between the world of print and paper and the world of the web and computers, with the former still exercising significantly more influence than the latter We started here: We’re now here (online): Great – that’s a significant start

Publishing technology – an analogy... but this is really where we need to be!

The Force11 White Paper and the Five Stars of Online Journal Articles

The Force11 White Paper The Force11 White Paper is the output of a workshop on the Future of Research Communication and e-Scholarship held at Schloss Dagstuhl, Germany, in August 2011, available at http://force11.org/white_paper It summarizes key problems facing scholarly publishing today, and presents a vision that addresses these problems, proposing concrete steps that key stakeholders can take to improve the state of scholarly publishing.

The Force11 diagnosis – technologies and formats Existing formats needlessly limit, inhibit and undermine effective knowledge transfer  Rethink the unit and form of the scholarly publication Improved knowledge dissemination mechanisms produce information overload  Develop tools and technologies that better support the scholarly lifecycle Claims are hard to verify and results are hard to reuse  Add data, software and workflows into the publication as first-class research objects

The Force11 diagnosis – business models and credit There is a tension between commercial publishing and the provision of unfettered access to scholarly information  Derive new financially sustainable models of open access Traditional business models of publishing are being threatened  Derive new business models for science publishers and libraries Current academic assessment models don’t adequately measure the merit of scholars and their work over the full breadth of their research outputs  Derive new methods and metrics for evaluating quality and impact that extend beyond traditional print outputs to embrace the new technologies However, it’s hard to distill specific action points from the Force11 White Paper So I’ve come up with the Five Stars of Online Journal Articles

Tim Berners-Lee’s Five Stars of Linked Open Data (from Berners-Lee, W3C Design Issues, Linked Data, 2009: http://www.w3.org/DesignIssues/LinkedData.html) ★ Make your data available on the web (in whatever format), but with an open licence, to be Open Data ★★ Make them available as machine-readable structured data (e.g. Excel instead of image scan of a table) ★★★ As (2), but use non-proprietary formats (e.g. CSV instead of Excel) ★★★★ All of the above, plus: Use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff ★★★★★ All the above, plus: Link your data to other people’s data to provide context

The Five Stars of Online Journal Articles ★ Peer review Ensure your article is peer reviewed as openly as possible, to provide assurance of its scholarly value, quality and integrity ★ Open access Ensure others have cost-free open access both to read and to reuse your published article, to ensure its greatest possible readership and usefulness ★ Enriched content Use the full potential of Web technologies and Web standards to provide interactivity and semantic enrichment to the content of your online article ★ Available datasets Ensure that all the data supporting the results you report are published under an open license, with sufficient metadata to enable their re-interpretation and reuse ★ Machine-readable metadata Publish machine-readable metadata describing both your article and your cited references, so that these descriptions can be discovered and reused automatically

The Five Stars of Online Journal Articles Shotton D (2012). The Five Stars of Online Journal Articles – an article evaluation framework. D-Lib Magazine 18 (1/2), p 1. doi:10.1045/january2012-shotton Available datasets: 0 No published data; 1 Figures and tables available for download; 2 Article data downloadable in actionable form; 3 Underlying datasets available; 4 Data available to peer-reviewers Axes: Peer review, Machine-readable metadata, Available datasets, Enriched content, Open access The proposed Five Stars of Online Journal Articles are complementary, forming a constellation arranged along five independent axes within a multi-dimensional publishing universe, each of which can be evaluated on its own merits The degree of achievement along each of these publishing axes can vary, equivalent to the different stars within the constellation shining with varying luminosities

Reis et al.’s Five Stars, before and after enhancement Reis et al. (2008) PLoS Neglected Tropical Diseases 2: e228 As originally published (doi:10.1371/journal.pntd.0000228): overall rating 9 out of 20 After semantic enhancement (doi:10.1371/journal.pntd.0000228.x001): overall rating 16 out of 20
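
An "out of 20" rating is consistent with five axes each scored from 0 to 4, as the Available datasets scale suggests. The sketch below works under that assumption. The axis scores in `example` are purely illustrative values chosen to show the arithmetic; they are not the published scores for Reis et al., and the function and key names are hypothetical.

```python
AXES = (
    "peer_review",
    "open_access",
    "enriched_content",
    "available_datasets",
    "machine_readable_metadata",
)
MAX_PER_AXIS = 4  # assumption: every axis uses the same 0-4 scale


def overall_rating(scores):
    """Sum the five axis scores and return (total, maximum possible)."""
    missing = set(AXES) - set(scores)
    if missing:
        raise ValueError(f"missing axes: {sorted(missing)}")
    for axis in AXES:
        if not 0 <= scores[axis] <= MAX_PER_AXIS:
            raise ValueError(f"{axis} must be between 0 and {MAX_PER_AXIS}")
    return sum(scores[axis] for axis in AXES), MAX_PER_AXIS * len(AXES)


# Illustrative values only, not the actual assessment of any article.
example = {
    "peer_review": 2,
    "open_access": 4,
    "enriched_content": 1,
    "available_datasets": 1,
    "machine_readable_metadata": 2,
}
total, maximum = overall_rating(example)
print(f"Overall rating: {total} out of {maximum}")
```

Since the axes are described as independent, the single total is a convenience summary; the per-axis values carry the real information.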

So what can OUP do? OUP has significant advantages A university publishing house – indeed a ‘department’ of Oxford University  with a primary loyalty to scholarship, rather than to financial profit  and to researchers, rather than to shareholders An organization that has already made bold moves into Open Access  its Open Access articles are under a Creative Commons attribution license that permits non-commercial text mining and reuse Oxford Journals is (hopefully)  small and agile enough to change  yet large enough to influence others in the publishing world, leading by example Let’s take a look at what OUP could do, relative to each of the Five Stars

★ Peer review Responsive peer review At the very least, ensure all OUP journals permit authors to make substantive responses to reviewers’ comments before articles are accepted or rejected  e.g. at present, Rheumatology only permits appeals after rejection Post-publication peer comments Permit readers to post attributed comments after publication of the article Open peer review Consider adopting an entirely transparent open peer review process  Submitted manuscripts immediately put on the journal's website  Reviews and comments from readers are welcomed, and are considered alongside the formal peer reviews solicited from experts by the journal  All the reviews, the author’s responses, and the original and final versions of the article are published, and the appointed reviewers and editors are acknowledged by name in the final version

★ Open access

Expand the number of your journals that are fully Open Access Expand the number of your journals in which authors can elect, for payment of a fee, that their article should be published as OA under a Creative Commons attribution license Provide an API so that the text of OA articles can be accessed programmatically Free the reference lists of all OUP journal articles from the copyright restrictions that surround the body text, and make these data fully open under a Creative Commons CCZero open data waiver that does permit commercial re-use  See the Panton Principles of open scientific data (http://pantonprinciples.org/)  and the OKF definition of what is truly ‘open’ (http://opendefinition.org/) so they can be included in the Open Citations Corpus for everyone’s benefit  CrossRef are very willing to cooperate in this, working with publishers Publish citation data for all OUP articles in RDF at an OUP SPARQL endpoint

★ Enriched content Encourage authors to provide Web links to information and sites of direct relevance to the article, for example to authors’ home pages, suppliers' catalogues and biological databases. Ensure there are links to all cited articles. Provide semantic enrichment of the text:  Key terms and concepts within the text identified and linked to external resources, e.g. definitions and databases  Mouse-over pop-ups providing information pulled by live Web services  References with citation typing using CiTO Provide ‘lively’ content: interactive figures, semantic lenses revealing numerical data beneath graphs, pop-ups providing excerpts from cited papers relevant to the textual citation contexts, re-orderable reference lists, etc. Enable data fusions (“mash-ups”) with pre-existing information from other articles, databases or elsewhere on the Web  e.g. publish KML files to enable visualization of geographical location data within the article in data fusions with Google Maps.
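
To make the KML suggestion concrete, here is a minimal sketch that writes a KML document with one placemark per location, using only the Python standard library. The function name `make_kml` is hypothetical, and the coordinates are an approximate city-centre position for Salvador, Brazil, used as a placeholder rather than study data.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def make_kml(placemarks):
    """Build a KML document from (name, longitude, latitude) tuples."""
    ET.register_namespace("", KML_NS)  # serialize KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    document = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for name, lon, lat in placemarks:
        placemark = ET.SubElement(document, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML expects longitude first, then latitude.
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


# Placeholder location: approximate centre of Salvador, Brazil.
kml_text = make_kml([("Salvador (approximate, illustrative)", -38.5, -12.97)])
print(kml_text)
```

A file like this, published alongside the article, is what a mapping service would load to place the article's locations on a map.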

★ Available datasets The longevity of data submitted as supplementary files is notoriously short Anderson et al. BMC Bioinformatics 2006 7:260 doi:10.1186/1471-2105-7-260

★ Available datasets Supplementary files are also difficult to find, and hard to cite So set in place policies that authors should archive and publish the datasets underpinning their OUP journal articles in an appropriate data repository  For example, the Dryad Data Repository (http://datadryad.org/)  Several OUP journals have already signed up as part of our Dryad-UK Project For data in the figures and tables of research articles, ensure these are made available in actionable form, e.g. as CSV or spreadsheet files, not as images  ideally in a generic CSV text format rather than in a proprietary spreadsheet format such as Microsoft Excel Make the datasets associated with the article available to peer-reviewers  Dryad will facilitate this
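
As a small illustration of "actionable form", the sketch below writes a table out as generic CSV text with Python's standard `csv` module. The two rows reuse per-disease counts quoted earlier in this talk; the column names and the function name `table_to_csv` are assumptions made for the example.

```python
import csv
import io


def table_to_csv(header, rows):
    """Serialize a table to CSV text that any spreadsheet or script can read."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = table_to_csv(
    ["disease", "papers_cited"],  # hypothetical column names
    [("Cholera", 1993), ("Leptospirosis", 940)],
)
print(csv_text)
```

Unlike an image of the same table, this text can be parsed back into rows and values without retyping.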

★ Machine-readable metadata Ensure article metadata are available in machine-readable form, ideally as RDF Such RDF should  be embedded within the XHTML of the article’s Web page using RDFa  and also be available as downloadable RDF/XML or Turtle files The metadata should cover (in increasing order of complexity!)  The bibliographic record of the journal article (i.e. its own citation)  The references cited by the OUP journal article (i.e. the reference list)  A structured summary of the content of the OUP journal article, for example using MIBIO, the generic (non-disease) version of MIIDI that we have developed for other journal articles in the life science domain These metadata should be published under a CCZero open data waiver

Metadata for describing bibliographic entities – next steps The National Library of Medicine DTD has become the de facto standard for many publishers, including OUP, to create XML mark-up for journal articles The most recent version of the NLM DTD is the Journal Article Tag Suite (JATS), now a NISO standard (NISO Z39.96) In collaboration with Deborah Lapeyre of Mulberry, who created it, we plan to map the Journal Article Tag Suite to RDF, using SPAR and other appropriate ontologies (Dublin Core, PRISM, FRBR), and to publish this mapping as open data  We would welcome the collaboration of OUP in doing this OUP should ensure that the XML and/or RDF structural markup of its journal articles is available to readers, not thrown away during creation of a PDF

How to move forward How will OUP become the Ferrari of publishers? I have made 18 suggestions for moving Oxford Journals firmly into the new Web world Start with some of the low-hanging fruit ★ P - Author responses and post-publication comments ★ O - An API to permit text mining of OUP open access articles ★ E - Richer web links, and KML files for Google Map mashups ★ A - Encourage dataset submission to Dryad ★ M - Publish JATS markup and reference lists as RDF I would be delighted to collaborate with you to facilitate these changes

Let developers create cool apps over OUP content Let’s arrange an OUP Hackathon and invite all the bright minds

Remember the Elsevier Grand Challenge? Elsevier gave text mining access to its entire life sciences corpus, and encouraged competitors to invent new creative ways of using its content The winner, Reflect (http://reflect.ws/), is a Web service that provides on-the-fly entity extraction, markup and database links for protein and gene names What about an OUP Grand Challenge, to see what clever minds can do with your content OUP GRAND CHALLENGE Steve Wan, a runner-up, used his CSIBS text mining system to automate the creation of ‘citations in context’, and has since come to work with me to implement this across the PubMed Central Open Access subset

A word of wisdom from the past “Come writers and critics who prophesize with your pen And keep your eyes wide, the chance won’t come again And don’t speak too soon for the wheel’s still in spin And there’s no tellin’ who that it’s namin’, For the loser now will be later to win For the times they are a-changin’.” Bob Dylan, 1964 Semantic enrichments and open publishing are already happening. Continuing to do the same old thing is not an option at the present time!
