Stephen Rhind-Tutt, President ITHAKA Sustainable Scholarship Conference September 2011
The Challenge
By 2020 the Web will contain…???
–90% of published works prior to 1923
–Majority of works published to 2020
–> 20 billion pages of […], phone logs, databases, blogs and websites (currently 12 billion)
–> 10 billion photographs
–> 40 million pages of facsimiles of manuscripts
–> 50 million audio files
–> 500 million video files
A Darwinian environment
What I remember of the environment in the early 1990s
–SilverPlatter MEDLINE (>$10m in sales)
–Royalties to the NLM (<$200k)
–Seven other vendors also making $$$
–SilverPlatter ERIC ($1.5m in sales)
–Royalties to Dept. of Education (<$100k)
–Many other vendors
–SilverPlatter SEC Online
–No royalties going back to the SEC
Environment in 2011
–PubMed provides free access to the world
–ERIC offered free to the world
–SEC filings offered free to the world
–What's happened to the vendors?
Environment in 2011
–Ovid and others continue to profit from public domain MEDLINE
–New entrants: SilverChair, Collexis…
–SEC filings continue to sell: Bloomberg, Yahoo and many new entrants
–Aries Systems moved into publisher services
–CSC provides free access to all for ERIC with a 5 year contract for $29m
What's going on?
This is a commodity…
This is not a commodity
Information isn't a commodity! Black & White Grayscale 24 bit color 48 bit color 100 dpi 600 dpi JPG TIFF Citation MARC Record Dirty OCR % rekeying Semantic Indexing Thumbnails 100 dpi Page Collection Letter Facsimiles Transcriptions EAD Finding Aid Repository Mobile Web TCP-IP
Information isn't a commodity
Source: Data, Information, Knowledge, and Wisdom, Gene Bellinger, Durval Castro, Anthony Mills. thinking.org/
Who, What, When, Where? Therefore Why?
Evolution of tasks Fading Growing Typesetting Printing Print monograph Print directory Public domain reprints Simple, one database search Rare and unpublished material Linking Licensing Free materials Semantic indexing Process integration Unified search software Workflow tools Warehousing Community building Asset management Commissioning? Editorial? Quality? Selection? Speed?
With literally billions of pages…
What tools will we need?
–Beyond paper
–Higher editorial value
–High functionality
–Semantically organized
–More comprehensive
–Individually customizable
–Discipline, community centric
–Web/network centric
ASP experience…
Add value to public domain
–Rare, hard to find materials
–Contextual essays and supporting material
–Semantic Indexing
–Unique functionality
Go beyond public domain
–Publish copyright material
–Persuade publishers to release key content for electronic publication
–Commission new material ourselves
The American Civil War Research Database
Women and Social Movements
–Collaboration between the Center for the Historical Study of Women and Gender at SUNY Binghamton and ASP
–Original site is free; new content is for a fee.
–Usage across the free site dipped only slightly; more usage following commercial launch.
–Added video, audio, > 200k pages, new functionality.
Be of the web Music Newspapers Websites Monographs Primary Works Journals
Building the network…
Unhelpful
–Legal warnings not to link
–Changing links constantly
–Disabling links
–No permanent URLs
–No crawling
–Randomly changing URLs
–Insisting on one interface and one access point
–Unattached pages
Helpful
–Visibility
–Permanent URLs
–RSS feeds
–OpenURL
–Design for multiple interfaces
–Open to crawling
–Published open APIs
–Welcome linking
–Ask others to do the same
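As one illustration of the "Helpful" column, the sketch below shows roughly what an OpenURL link looks like when built in Python. It is not from the talk. The resolver address, the function name build_openurl and the sample citation are all hypothetical placeholders. The key names follow the OpenURL 1.0 (Z39.88-2004) key/encoded-value format for journal articles.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical resolver address (an assumption for this sketch);
# each library or vendor would supply its own link resolver URL.
RESOLVER_BASE = "https://resolver.example.org/openurl"


def build_openurl(article_title, journal_title, issn, year, volume,
                  start_page, base=RESOLVER_BASE):
    """Return an OpenURL 1.0 link describing one journal article."""
    params = [
        ("url_ver", "Z39.88-2004"),                      # OpenURL version
        ("ctx_ver", "Z39.88-2004"),                      # ContextObject version
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),  # journal metadata format
        ("rft.genre", "article"),
        ("rft.atitle", article_title),
        ("rft.jtitle", journal_title),
        ("rft.issn", issn),
        ("rft.date", str(year)),
        ("rft.volume", str(volume)),
        ("rft.spage", str(start_page)),
    ]
    # urlencode percent-encodes the values so the link survives copy and paste.
    return f"{base}?{urlencode(params)}"


# Placeholder citation data, invented purely for illustration.
link = build_openurl(
    article_title="Building the network",
    journal_title="Journal of Examples",
    issn="1234-5679",
    year=2011,
    volume=7,
    start_page=42,
)
print(link)

# Round-trip check: the citation can be recovered from the link alone.
fields = parse_qs(urlsplit(link).query)
print(fields["rft.atitle"][0])
```

Because the citation travels in the link itself, any interface that understands OpenURL can route the reader to a copy they are entitled to, which fits the "design for multiple interfaces" and "welcome linking" points above.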