OpenWeb: Expanding access to Digital Collections Marshall Breeding Director for Innovative Technologies and Research Vanderbilt University

1 OpenWeb: Expanding access to Digital Collections
Marshall Breeding
Director for Innovative Technologies and Research, Vanderbilt University
http://staffweb.library.vanderbilt.edu/breeding
Redefining Libraries: Web 2.0 and other Challenges
May 2007, Xiamen, China

2 The Invisible Web
A great amount of information cannot be found on the Web since it is locked inside databases.
Search engines are getting better at unlocking database content, but it involves help from site administrators.
Goal: Move TV News from the Invisible Web to the Open Web

3 History and Background of the Archive
Conceived by Nashville insurance executive Paul Simpson
Established by Vanderbilt University in August 1968, initially as a 3-month experiment that grew into a permanent institution.

4 A Unique Archive
The largest, most comprehensive collection of national broadcast news available to the general public.
Vanderbilt has systematically archived completed news programs since Aug 5, 1968
A large amount of unique material
Material available at costs affordable by scholars and researchers
Only resource where researchers can search across all the major national news networks.

5 An extensive collection
Over 825,000 abstracts in our news database
~30,000 hours of regular nightly news programs
~10,000 hours of special news broadcasts

6 Videotape loan service
Lend material from the collection on VHS format
Compilation of requested news segments
Duplications of complete programs
Service fees based on affiliation: Vanderbilt & Sponsors, Educational, Others
Material provided for viewing only. Tapes must be returned.
All licensing of materials borrowed must be negotiated with the original network

7 Open Web Project

8 Problem:
Hard to find us on the Web unless users already know we exist
Searches on news content terms do not lead searchers to our Web site
Content in our TV-NewsSearch trapped in a closed database

9 Project Goals
Provide better service
Expand use of the collection
Increase Web site activity
Boost service fee income

10 Considerations and constraints
Maintain value of paid subscription product.
Keep control of database content. Valuable intellectual property built through 35 years of manual labor.

11 Business Model
TV News Archive must operate on a sustainable business plan
Mandate to eliminate VU subsidy
Income:
–Institutional subscription to online service: CNN streaming video
–Stipend from the Library of Congress
–Service fees for videotape loan service

12 Project strategy
Increase discovery through managed exposure of metadata to the Internet search engines
Initial focus on Google since it represents the majority of Web search activity.

13 Troubling statistic
Where do you typically begin your search for information on a particular topic?
College Students Response:
89% Search engines (Google 62%)
2% Library Web Site (total respondents: 1%)
2% Online Database
1% E-mail
1% Online News
1% Online bookstores
0% Instant Messaging / Online Chat
Source: OCLC. Perceptions of Libraries and Information Resources (2005) p. 1-17.

14 Library Discovery Model
[Diagram labels: Library Web Site / Catalog; Web; Library as search destination]

15 Web Discovery
[Diagram labels: TV-NewsSearch Database; Search and Retrieval + e-commerce request system; TV News Web site; Web]
Successful search terms: “tv news”, “vanderbilt tv archive”, “vanderbilt television news archive”, “news archives”

16 OpenWeb Strategy
[Diagram labels: TV-NewsSearch Database; Search and Retrieval + e-commerce request system; Generate 805,000+ Static Pages; TV News Web site; OpenWeb Mirror Site; Web]
Successful search terms: all words and phrases in the TV-NewsSearch Database

17 Implementation Details
Create OpenWeb mirror site
–Static Web page for each database record
–Design each page to maximize content terms exposed to Google
–Funnel users to existing site
–Not meant to be an alternative interface
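
The page design described on this slide might look roughly like the following Python sketch. It is not the archive's actual template: the record fields (`id`, `network`, `date`, `headline`, `abstract`) and the `MAIN_SITE_URL` address are hypothetical, chosen only to show content terms placed in the title, description, and body, with a single link funneling the visitor back to the existing site.

```python
from html import escape
from urllib.parse import urlencode

# Hypothetical address of the existing search site (not taken from the slides).
MAIN_SITE_URL = "https://tvnews.example.org/record"


def render_record_page(record):
    """Return a static HTML page for one database record (assumed field names)."""
    title = escape(
        f"{record['network']} news, {record['date']}: {record['headline']}"
    )
    abstract = escape(record["abstract"])
    # Truncate before escaping so an HTML entity is never cut in half.
    summary = escape(record["abstract"][:150])
    # The only outbound link points at the existing site, so the static page
    # acts as a landing page rather than an alternative interface.
    link = escape(f"{MAIN_SITE_URL}?{urlencode({'id': record['id']})}")
    return (
        "<!DOCTYPE html>\n"
        "<html><head>\n"
        f"<title>{title}</title>\n"
        f'<meta name="description" content="{summary}">\n'
        "</head><body>\n"
        f"<h1>{title}</h1>\n"
        f"<p>{abstract}</p>\n"
        f'<p><a href="{link}">View or request this item in the archive</a></p>\n'
        "</body></html>\n"
    )
```

Putting the descriptive terms in the title and heading as well as the body is one plausible way to "maximize content terms exposed"; the slides do not say which elements the real pages used.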

18 Generating the Open Web
Perl script to systematically query each record and generate html page
Create browse page to link all the record pages
Processes entire database in about 2 hours
Refresh weekly
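
The original generator was a Perl script that is not shown in the slides. As a rough illustration of the same two steps (one page per record, then a browse page linking them all), here is a minimal Python sketch. It assumes the records arrive as dictionaries with hypothetical `id`, `title`, and `abstract` fields; a real run would read them from the database.

```python
from html import escape
from pathlib import Path


def generate_site(records, out_dir):
    """Write one static page per record plus a browse page linking to all of them.

    `records` is any iterable of dicts with assumed keys: id, title, abstract.
    Returns the number of record pages written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    links = []
    for rec in records:
        name = f"{rec['id']}.html"
        title = escape(rec["title"])
        body = escape(rec["abstract"])
        (out / name).write_text(
            f"<html><head><title>{title}</title></head>"
            f"<body><h1>{title}</h1><p>{body}</p></body></html>\n",
            encoding="utf-8",
        )
        links.append(f'<li><a href="{name}">{title}</a></li>')
    # The browse page gives a crawler a path from one URL to every record page.
    (out / "browse.html").write_text(
        "<html><body><ul>\n" + "\n".join(links) + "\n</ul></body></html>\n",
        encoding="utf-8",
    )
    return len(links)
```

With hundreds of thousands of records, a single browse page would be very large; splitting it by year or month would be a natural refinement, though the slides do not say how the real browse page was organized.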

19 Helping out Googlebot
Google SiteMap protocol
XML index that tells Google about your site
Limit of 50,000 links per index
Multiple sitemaps can be tied together in a sitemap index
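
To illustrate the 50,000-link limit and the sitemap index, here is a minimal Python sketch that splits a list of page URLs into sitemap files and builds an index pointing at them. It uses the namespace of the current sitemaps.org protocol, which may differ from the schema in use in 2005; the file names and `base_url` are assumptions.

```python
import xml.etree.ElementTree as ET

# Namespace of the current sitemaps.org protocol (an assumption for this sketch).
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file limit mentioned on the slide


def build_sitemaps(urls, base_url, max_urls=MAX_URLS):
    """Return {filename: xml_text} for each sitemap file plus a sitemap index."""
    files = {}
    index = ET.Element("sitemapindex", xmlns=NS)
    for n, start in enumerate(range(0, len(urls), max_urls), start=1):
        urlset = ET.Element("urlset", xmlns=NS)
        for url in urls[start:start + max_urls]:
            ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
        name = f"sitemap-{n}.xml"  # hypothetical naming scheme
        files[name] = ET.tostring(urlset, encoding="unicode")
        # Each sitemap file gets one entry in the index.
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = base_url + name
    files["sitemap-index.xml"] = ET.tostring(index, encoding="unicode")
    return files
```

At 50,000 links per file, a mirror of roughly 805,000 pages would need 17 sitemap files plus the index.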

20 Google Webmaster’s account
Provides an interface to:
–Submit sitemaps
–Register sitemaps
–Monitor googlebot’s access to sitemaps
–Monitor how Google indexes your site
–Monitor how users access your site through Google
–Statistics, etc
–Constantly evolving functionality.

21 OpenWeb Progress
Initial planning: Jun 2005
Generate pages: Jul 2005
Submit html index to Google: Jul 2005
Submit XML sitemap: Aug 2005

22 Monitoring Activity
Analysis of Public Web logs
Analysis of OpenWeb logs
Impact on searching?
Impact on videotape requests?
Write a script to trace each request to determine origin.
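
The tracing script itself is not shown in the slides. One possible approach, sketched here in Python, is to read the referrer field of each Web server log entry and count where requests came from. The sketch assumes the logs are in Combined Log Format and that the mirror lives at the hypothetical host `openweb.example.org`.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Tail of a Combined Log Format line: "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "[^"]*"$'
)

OPENWEB_HOST = "openweb.example.org"  # hypothetical host name of the mirror site


def classify(referer):
    """Map a referrer URL to a coarse origin label."""
    host = urlparse(referer).hostname or ""
    if not host:
        return "direct"  # referrer missing or logged as "-"
    if host == OPENWEB_HOST:
        return "openweb"
    if "google" in host.split("."):
        return "google"
    return "other"


def count_origins(lines):
    """Count request origins across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = LOG_RE.search(line.strip())
        if match:
            counts[classify(match.group("referer"))] += 1
    return counts
```

Restricting the input to the log lines for videotape request pages would give a rough answer to the "impact on videotape requests" question; following a visitor from Google through the mirror to a request would additionally require grouping lines by session or IP address.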

23 Google Analytics
Full-featured Web site use analysis utility
Specializes in measuring site goals and conversions
Depends on data sent to Google via Javascript rather than Web server logs

24 Results
Significant improvement in the interest in the archive and in the use of the collection

25 Google Web Manager’s Account

26

27 Loan service income

28 New User Registration

29 Questions / Discussion
For further information contact:
Marshall Breeding
Director for Innovative Technology and Research
marshall.breeding@vanderbilt.edu

