Evan Owens Chief Information Officer, Publishing American Institute of Physics JATS Conference 2 November 2010 The Evolving Information Ecosystem of Publishing
This Presentation The Past & Present Standards The Future New Challenges
The World View in the 1990s
How to prepare for the electronic publishing future:
– Create a version of record in SGML full text
– Make the perfect master file
– Prepare to publish simultaneously to print and online
Multiple outputs was the perceived benefit of SGML
How did you make that happen?
– Write your own DTD
– Work with your vendors
– Set up SGML-based production processes
A very document-centric view
But what place did standards have in this picture?

Journal Article Standards
A much-cited paper on the history of journal standards:
A Decade of DTDs and SGML in Scholarly Publishing: What Have We Learned? Bruce Rosenblum and Irina Golfman, Extreme Markup Languages 2002
“The AAP and 12083 DTDs were important projects. They laid the structural foundations for subsequent DTDs used in journal publishing. They did not succeed, however, in their goal of becoming industry-standard DTDs. This goal was not reached because, while these DTDs were generalized for the needs of the industry, they did not meet the specific business requirements of individual organizations within the scholarly publishing community.”
AAP Serial DTD (Z39.59, 1983 to 1987)
ISO 12083 (ANSI 1988, ISO 1993; last updated 1995)
NLM Tag Suite (v1 2003…v3 2010)
NISO JATS (in progress)

Standards are Great: Everyone Should Have One!
Standards
Role of standards
– Codify existing practices
– Enable new practices or technologies
Success of standards
– Technical value
– Business / political
   Must meet real biz needs
   Costs must align with benefits
Conventional wisdom in the 90s:
– SGML succeeds best in highly concentrated industries with strong exchange requirements; e.g., aviation, auto, defense
– Scholarly Publishing was a highly fragmented industry

What has Changed in the Ecosystem?
Rise of aggregations
Move away from proprietary delivery platforms
Publishers now managing current and back content
– Early online, current online, digitized back file
Exchange of data has changed business needs
– CrossRef for metadata
– Multiple hosting, preservation for full text
– Text mining will drive future
Enormous amounts of content flowing around
– Every publishing deal now includes “and also send to X, Y, Z”
Business conditions are now ripe for standardization

Early Adopters
Typesetting service providers saw the need for standards well before their customers:
Vendor A (1990s) produced content in their internal house DTD then exported to the customer DTD
Vendor B (various) produced content in the Elsevier DTD because they could, then exported to the customer DTD
Vendor C (2010) would rather produce content in NLM then export to the customer’s DTD
Vendor A (an early adopter) produced all content in SGML/XML workflows and just discarded it if the customer wanted only the PDF returned

Why Adopt NLM / JATS Now?
Preaching to the choir...
Delivery platform requirement
Business need for compatibility
Leverage the experience in the design
Concentrate on your specific customizations
– Rather than reinventing the wheel
Good documentation
University of Chicago Press moved to NLM when it moved to a shared delivery platform
AIP will move to JATS in

Where are We Now?
Is the battle over? Every problem solved? Just implement NLM / JATS and all your publishing problems will be solved?
We may have won this battle, but the real challenges of truly digital publishing are just starting to appear.
For the first decade, online journal publishing was like old wine in new bottles; now we are seeing real innovations.

SIDEBAR: Books versus Journals
Strong metadata exchange needs (e.g. Amazon)
– Strong standards and groups
Came later to online and electronic publishing
E-Book readers are intrinsically different:
– External to publisher’s platform
– Forces standards conformance
EPUB standard
– Focus was packaging rather than text structuring
– But is evolving quickly
A different ecosystem, but the boundaries are beginning to blur
Perhaps we (books and journals) will meet in the middle?

The Future
Current and Future Trends in Journal Publishing
Articles, not issues
Rapid publication with limited prepress
Multimedia and “supplemental” stuff
Multiple “manifestations” and “expressions”
– HTML, PDF, app, reader
– Article, Podcast
Revisions (?)
Comments, annotations, blogs
Magazine-like features
Semantics, text mining
Information, not articles

Ecosystem: The XML Instance
We have come a long way!
Mechanics are easier
– Unicode, MathML, table models, etc.
Managing the structure of the content
– Much of this conference
– XML Versioning Workshop at Balisage 2008
Managing the instances
– Version & validation checking
But the journal publishing world is becoming less static, less document-centric... and a lot more complicated!

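As an illustration of the "version & validation checking" point, here is a minimal Python sketch of an intake check on one article instance. It assumes the instance records its tag-suite version in the `dtd-version` attribute of the root `article` element, as NLM/JATS instances do. The set of accepted versions and the function name `check_instance` are hypothetical, and the check covers only well-formedness and the declared version, not full validation against the DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical house policy: tag-suite versions this pipeline accepts.
ACCEPTED_VERSIONS = {"2.3", "3.0"}


def check_instance(xml_text):
    """Return (ok, message) for a single article instance.

    Checks well-formedness, the root element name, and the declared
    dtd-version. It does NOT validate the instance against a DTD.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, f"not well-formed: {err}"
    if root.tag != "article":
        return False, f"unexpected root element: {root.tag}"
    version = root.get("dtd-version")
    if version is None:
        return False, "no dtd-version attribute"
    if version not in ACCEPTED_VERSIONS:
        return False, f"unsupported dtd-version: {version}"
    return True, f"ok (dtd-version {version})"


# Made-up sample instance, reduced to the bare skeleton.
SAMPLE = '<article dtd-version="3.0"><front/><body/></article>'
print(check_instance(SAMPLE))
```

A check like this is cheap enough to run on every incoming file before anything is loaded into a content store; a validating parser would still be needed for structural validation.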
Ecosystem: Content and Metadata
The XML instance as pseudo-database:
What metadata goes inside and what lives outside?
– Descriptive (bibliographic)
– Provenance (process history)
– Structural (components)
– Technical (formats, versions)
Is the XML instance just a piece of a larger system?
– How does it fit into a larger information architecture?
– Is the XML instance where this information should live?
An implementation / design decision

Ecosystem: Reference Linking
Connecting XML documents to external resources
Do we rewrite the XML or externalize the links?
– An implementation question only?
ApJ, NASA ADS, bibcodes
– Linking identifiers that could be pre-calculated
– Resolution could be added afterwards
CrossRef and DOI linking
– Backfill problem: early or late binding
– Dynamic resolution solutions; e.g., Elsevier, AIP
– Externalizes big parts of the document

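The "externalize the links" option with late binding might look roughly like the Python sketch below. The XML keeps only stable reference ids, and a separate link table maps (article key, reference id) to a DOI at delivery time. Everything here is an assumption for illustration: the article key `art-001`, the `LINKS` table, the function `resolve_links`, and the DOIs (which use a made-up prefix). It is not a description of how Elsevier or AIP implement dynamic resolution.

```python
import xml.etree.ElementTree as ET

# Made-up article fragment: references carry ids but no embedded links.
ARTICLE = """<article><back><ref-list>
<ref id="r1"><mixed-citation>Author A, J. Example 1, 1 (2001)</mixed-citation></ref>
<ref id="r2"><mixed-citation>Author B, J. Example 2, 2 (2002)</mixed-citation></ref>
</ref-list></back></article>"""

# Hypothetical external link layer, keyed by (article key, reference id).
# The DOI is a placeholder with an invented prefix.
LINKS = {("art-001", "r1"): "10.9999/example.1"}


def resolve_links(article_key, xml_text, links):
    """Late binding: look up each reference in the external table at
    delivery time. Unresolved references map to None."""
    root = ET.fromstring(xml_text)
    resolved = {}
    for ref in root.iter("ref"):
        ref_id = ref.get("id")
        doi = links.get((article_key, ref_id))
        resolved[ref_id] = f"https://doi.org/{doi}" if doi else None
    return resolved


print(resolve_links("art-001", ARTICLE, LINKS))

# Backfill: a DOI registered later is added to the table only;
# the stored XML instance is never rewritten.
LINKS[("art-001", "r2")] = "10.9999/example.2"
print(resolve_links("art-001", ARTICLE, LINKS))
```

The trade-off is the one the slide names: the stored instance stays stable, but a meaningful part of the delivered document now lives outside it.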
Ecosystem: Semantic Enrichment
An old-school example: updating classification schemes
– Do you update the instances retroactively?
Some approaches to semantic enrichment:
– Known entity identification
– Generic entity extraction
   Resolution/identification done later
– Inline markup; e.g.,
   Entities are known in advance
– Completely externalized solutions
   In a separate delivery system or repository
   In a search engine or XML database, not in the content

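To make the "completely externalized" approach concrete, the following Python sketch stores known-entity matches as stand-off annotations (character offsets plus an entity id) instead of inserting inline markup. The sample text, the `KNOWN` entity table, its identifiers, and the `annotate` function are all invented for illustration; a production system would use a real vocabulary and more careful matching than plain substring search.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int        # offset of the first character of the match
    end: int          # offset one past the last character
    entity_type: str
    entity_id: str


# Made-up content and a hypothetical table of entities known in advance.
TEXT = "Graphene shows high electron mobility."
KNOWN = {
    "Graphene": ("material", "mat:0001"),
    "electron": ("particle", "par:0002"),
}


def annotate(text, known):
    """Find every occurrence of each known term and record it as a
    stand-off annotation. The text itself is left unchanged."""
    found = []
    for term, (entity_type, entity_id) in known.items():
        pos = text.find(term)
        while pos != -1:
            found.append(Annotation(pos, pos + len(term), entity_type, entity_id))
            pos = text.find(term, pos + len(term))
    return sorted(found, key=lambda a: a.start)


for ann in annotate(TEXT, KNOWN):
    print(ann, repr(TEXT[ann.start:ann.end]))
```

Because the annotations live outside the content, a revised classification scheme can be applied by regenerating them rather than by retroactively editing every instance. The cost is that the offsets must be recomputed whenever the text changes.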
Ecosystem: Identity Management
ORCID (Open Researcher and Contributor ID)
– Logistical issues:
   Known in advance or applied retroactively?
   Future publications and/or historical?
   Store in article instances or an external layer?
Larger identity management issues:
– Bibliographic identity
– Business identity (author, reviewer, subscriber, etc.)
– Community identity (ORCID, social networking, etc.)
Another potential use of layered information architectures
– Feels like an RDF kind of problem!

Some Things to Think About
Content management strategy
– Standards, standards, standards
– Versioning, formats, validation, necessary metadata
Information lifecycle should inform everything
– Not just publish once and we’re done
– Formats change, needs change, even content changes
Content is going to come at us from many directions
– User-contributed, not just the formal publishing process
Information architecture strategy
– Think beyond just fixed documents
– Plan for interactions with external systems

NLM’s Contribution to Our Industry
Evan Owens Chief Information Officer, Publishing American Institute of Physics Questions? Comments?