OCoLR 20041025 #53928015 OCLCR
Making data work harder
Lorcan Dempsey, OCLC Members Council, 17 May 2005
May 2005 Members Council
Web hub services                          OWC             Presentation  Examples
A comprehensive discovery experience      Yes
Predictable, often immediate, fulfilment  In progress
Data works hard                           Being improved  Yes           Curioser, FAST
Open to intermediate consumers            In progress
Co-created with users                     Not yet         Yes           WorldCat Wiki
Making data work hard
- The user experience: from search to rich browse
- Capturing user contribution
- Data mining
Context: value
- Amazoogle: we can add significant value. We should be looking for organizational frameworks within which we can do this.
- ROI: libraries invest in data but do not extract as much value from it as they might. Unless we release more value, the argument for this investment becomes weaker.
- The user experience
- Management intelligence
Top Sets for Fiction
Records  Key
1,296    defoe, daniel\1661 1731/robinson crusoe
1,267    carroll, lewis\1832 1898/alices adventures in wonderland
971      cervantes saavedra, miguel de\1547 1616/don quixote
828      stevenson, robert louis\1850 1894/treasure island
689      twain, mark\1835 1910/adventures of huckleberry finn
624      twain, mark\1835 1910/adventures of tom sawyer
618      swift, jonathan\1667 1745/gullivers travels
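The keys in the table above look like normalized author, dates, and title strings, with the record count giving the size of each set. The following is only a minimal sketch of how such keys might be built and counted. The normalization rules, the `normalize` and `work_key` helpers, and the sample records are assumptions made for illustration; they are not the algorithm OCLC Research used.

```python
import re
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop apostrophes, and replace any other character that is
    not an ASCII letter, digit, comma, or space with a space.
    (Assumed rules; accented letters would need extra handling.)"""
    text = text.lower().replace("'", "")
    text = re.sub(r"[^a-z0-9, ]+", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def work_key(author: str, dates: str, title: str) -> str:
    """Build a key shaped like: author\\dates/title (hypothetical helper)."""
    return f"{normalize(author)}\\{normalize(dates)}/{normalize(title)}"


# Made-up sample records: (author, dates, title)
records = [
    ("Defoe, Daniel", "1661-1731", "Robinson Crusoe"),
    ("DEFOE, DANIEL", "1661-1731", "Robinson Crusoe."),
    ("Carroll, Lewis", "1832-1898", "Alice's Adventures in Wonderland"),
]

# Records that share a key fall into the same set; count the set sizes.
set_sizes = Counter(work_key(*record) for record in records)

for key, count in set_sizes.most_common():
    print(count, key)
```

Records whose author and title differ only in case or punctuation collapse onto one key, which is what lets the largest sets be ranked by record count.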
FRBR & FAST
FRBR
- Interim FRBR in OWC
- FRBR in research projects
  - FictionFinder
  - Curioser
  - xISBN
  - Algorithm
  - Top 1000
- FRBR in FirstSearch: late this year
- Curioser ….
FAST
- Moving FAST headings into OpenWorldCat
- Experiment: mapping Yahoo! categories to FAST headings
- Recognized value …
WIKI in WorldCat
- Capture user input in structured ways
Extending Wiki's utility
Wiki:
- supported markup: wikitext
- page editing: a single text block
- searches: full text searching
- collections managed: one per wiki
MetaWiki:
- supported markup: wikitext; structured data (e.g., MARC, METS, DC…)
- page editing: a single text block, or field level
- searches: full text searching; fielded searching
- collections managed: one/multiple per MetaWiki
Built on top of standards (OAI, OpenURL, SRU)
Management intelligence: data mining
Data
- Bibliographic data
- Transaction logs
- …
Need to mine this data for intelligence that creates value for libraries and users
OCLC Research is undertaking a number of data-mining projects aimed at:
- Knowing more about the characteristics of library collections
- Creating interesting and useful data displays
- Generating intelligence to support library decision-making
Know Your Audience!
- Holdings represent selection decisions by librarians … implies there are about 1 billion individual selection decisions in the WorldCat holdings file
- Selections are made to serve the interests of a library's target community …
- Associate target community (audience level) to particular library profiles, e.g., ARL, non-ARL academic, public, K-12 school …
- Implies: we can infer a material's audience level from holdings patterns, which in turn can support:
  - Collection management
  - Readers' advisory services
  - Reference services
  - Information retrieval
Paper forthcoming!
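One way to picture inferring audience level from holdings patterns is a weighted average over the types of libraries that hold an item. This is a rough sketch only: the numeric weights, the profile names used as dictionary keys, and the `audience_level` function are all hypothetical, and the forthcoming paper may define the measure quite differently.

```python
# Hypothetical weights per library profile (0 = general/juvenile,
# 1 = specialist/research). These numbers are assumptions for illustration.
PROFILE_WEIGHT = {
    "k12_school": 0.1,
    "public": 0.3,
    "non_arl_academic": 0.6,
    "arl": 0.9,
}


def audience_level(holdings: dict[str, int]) -> float:
    """Average the profile weights, weighted by how many libraries of each
    profile hold the item. `holdings` maps profile name -> holding count."""
    total = sum(holdings.values())
    if total == 0:
        raise ValueError("item has no holdings")
    weighted = sum(PROFILE_WEIGHT[profile] * count
                   for profile, count in holdings.items())
    return weighted / total


# Made-up holdings patterns for two items.
picture_book = {"k12_school": 40, "public": 60}
research_monograph = {"arl": 30, "non_arl_academic": 10}

print(round(audience_level(picture_book), 3))
print(round(audience_level(research_monograph), 3))
```

An item held mostly by school and public libraries scores low, and one held mostly by research libraries scores high, so the score can serve as a proxy for intended audience.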
The Implications of Google Libraries …
- Potentially covers about one third of print books in WorldCat
- ~60 percent of total G5 books held by only one of the Google 5
- Less than 5 percent held by all of the Google 5
- ~20 percent of total G5 print books out of copyright
Paper forthcoming …
Last Copy: Identifying At-Risk Materials
- ~23 million WorldCat records have only a single holding attached
- Libraries need to know what portions of their collections are:
  - Rare …
  - Rare and valuable …
  - Last copy (artifact and/or content)
- Identification of rare materials is essential intelligence in support of storage, digitization, and preservation decision-making
- Data-mining study of Vanderbilt holdings in WorldCat:
  - Identified 23,000 items held uniquely by Vanderbilt
  - ~60% are print books
  - ~60% produced prior to 1950; ~25% produced after 1970
Paper forthcoming!
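The single-holding idea above can be sketched as a simple filter over a holdings file. The data layout (record id mapped to a set of holding-library symbols), the function names, and the sample symbols below are assumptions for illustration, not the method used in the Vanderbilt study.

```python
def single_holding_records(holdings: dict[str, set[str]]) -> dict[str, str]:
    """Map each record held by exactly one library to that library's symbol."""
    return {
        record_id: next(iter(libraries))
        for record_id, libraries in holdings.items()
        if len(libraries) == 1
    }


def uniquely_held_by(holdings: dict[str, set[str]], library: str) -> list[str]:
    """List, in sorted order, the records that only `library` holds."""
    singles = single_holding_records(holdings)
    return sorted(rec for rec, holder in singles.items() if holder == library)


# Made-up holdings data: record id -> symbols of holding libraries.
holdings = {
    "rec1": {"LIB_A"},
    "rec2": {"LIB_A", "LIB_B"},
    "rec3": {"LIB_C"},
    "rec4": {"LIB_A"},
}

print(uniquely_held_by(holdings, "LIB_A"))
```

A single holding marks a candidate last copy only; deciding whether the item is rare, valuable, or unique in content still needs further evidence.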
Looking at Library Print Book Collections … Systematically
- 32 million print books, representing 26 million distinct works
- Half of print books published after 1977; more than 80% still in copyright
- Rareness is common! Only a third of print books have more than five holdings; half have two or fewer
- Only about 120,000 works had both print book and e-book manifestations
- OCLC/Ithaka collaboration: use WorldCat to characterize the system-wide print book collection, i.e., aggregate print book holdings in WorldCat
- Intelligence of this kind can help establish digitization priorities and inform preservation planning
More information: http://www.oclc.org/research/presentations/lavoie/cni2005.ppt
Thank you!
OCLC Research: http://www.oclc.org/research/