YAHOOGLE! GREAT NEWS FOR SEARCHERS! Michael Hunter Reference Librarian Hobart and William Smith Colleges For Rochester Regional Library Council Member.

Presentation transcript:

YAHOOGLE! GREAT NEWS FOR SEARCHERS! Michael Hunter Reference Librarian Hobart and William Smith Colleges For Rochester Regional Library Council Member Libraries’ Staff Sponsored by the Rochester Regional Library Council Supported by Regional Bibliographic Databases and Resources Sharing (RBDB) funds granted by the New York State Library 2006

For Today …
- The Landscape of Search Today
- A Look Under the Hoods
  - Database
  - Ranking
  - Personalization
  - Search Features
  - E-text initiatives
  - Foreign language content
- Yahoogle and World Politics

Web 2006: Who’s crawling the Web?
- Yahoo
  - Owns AlltheWeb, Altavista, Inktomi, Overture
- Google
- MSN
- AskJeeves owns Teoma
- Gigablast
- NOTE: Ownership is different from database affiliation

Most popular services (as of July 2005)
- Google 48%
- Yahoo 29% (up 20% from 2004)
- MSN 8% (up 30% from 2004)
- All others 15% (AOL, AJ, Net, Gig)
- Study by Harris Interactive (must purchase)
- Reported in Search Day, 7/12/05 rticle.php/

The Landscape of Search: Search Engine Overlap
- Results compared from 12,500 random queries from the largest engines
- 85% were unique to one engine
- 11% were shared by any two
- 3% were shared by any three
- 1% were shared by all
- Study by Dogpile, U Pittsburgh and Penn State
- CompareSearchEngines.dogpile.com/OverlapAnalysis
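As a rough illustration of how overlap percentages like these can be derived, the sketch below counts how many engines returned each distinct URL and reports the share of URLs found by exactly one, two, or three engines. The engine names and result lists are made-up sample data, not figures from the Dogpile study, and the study's own method may differ.

```python
from collections import Counter

def overlap_shares(results_by_engine):
    """Return {k: share of distinct URLs returned by exactly k engines}."""
    engines_per_url = Counter()
    for urls in results_by_engine.values():
        # set() so a URL listed twice by one engine is counted once
        engines_per_url.update(set(urls))
    total = len(engines_per_url)
    urls_per_k = Counter(engines_per_url.values())
    return {k: urls_per_k[k] / total for k in sorted(urls_per_k)}

# Hypothetical sample data: three engines, five distinct URLs
sample = {
    "engine_a": ["u1", "u2", "u3"],
    "engine_b": ["u3", "u4"],
    "engine_c": ["u3", "u5", "u2"],
}
shares = overlap_shares(sample)
```

Here three of the five URLs appear in only one engine, one in two engines, and one in all three.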

Yahoo! Inc.
- 1994 Jerry Yang, David Filo (Stanford)
  - “Yet Another Hierarchical Officious Oracle”
  - Human-edited subject directory
- 1996 IPO
- 2001 Major financial crisis and massive layoffs
- 2002 Crawler results from Google added as a separate service (subject directory continues)
- 2004 Replaced Google with its own crawler

Yahoo! Inc.
- Revenue – Commercial sites pay for inclusion, advertising and pay-for-placement
- Paradigm – Portal “your home on the Web”
- Ethos – Human input from the beginning

Google Inc.
- 1998 Sergey Brin, Larry Page (Stanford)
  - Play on “googol” (a coined word for 1 followed by 100 zeros)
- 2001-present Acquired/developed
  - User-responsive ranking algorithms
  - Application Programming Interfaces (API)
  - Usenet archive, News, Froogle, Blogger, Video Search and Store, Local, Mobile, Print, Base
- 2004 IPO

Google Inc.
- Revenue – Advertising, pay for placement (Sponsored Links, Adwords); Content (Video store, Google Base)
- Paradigm – “Clean” look (hidden portal)
- Ethos – Technology-based from the beginning; prides itself on a minimum of direct human input in results ranking

A Few Yahoogle Metrics
- Keynote Systems study Dec., ’05
  - % reporting task success for Local search: Y – 82%, G – 83%
  - % reporting task success for Image search: Y – 66%, G – 71%
- Study of News site popularity by Greg Jarboe Oct., ’05
  - #1 Yahoo, #2 CNN, #3 MSNBC, #14 Google

A Look Under the Hoods: Database, Ranking, Personalization, Search Features, E-text, Foreign Language Content

Database Size
- August End of “Size Wars”?
  - Y “19 billion web documents”
  - G page count over 8 billion; page count removed soon afterwards
- March 15, 2006 – Searching for a filtered word with “Strict Filtering” activated
  - Y 44.3 billion (web index)
  - G 25.3 billion (web index)

Database Freshness
- G Aims to completely refresh its entire database every 3 weeks.
- G News sites, blog sites and other rapidly changing sites are crawled every hour or sooner
- G Sites that change little are re-crawled less frequently
- Study at U. of Dusseldorf (eprints.rclis.org/archive/ /01/jis_ preprint.pdf): G “best overall”; Y updating “more chaotic”

Depth of Indexing
- Y includes first 500 K of each page crawled
- G includes first 101 K of each page crawled
- G includes “partially indexed” or “unindexed” pages

What types of pages in Google are unindexed?
- Dead or inaccurate links
- Duplicate pages
- Database-generated URLs
- Pages with robots.txt or noindex meta tags
- Pages on an intranet
- Pages “waiting” to be indexed fully
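To make the robots.txt item concrete, here is a minimal sketch of how a well-behaved crawler can check a site's exclusion rules before fetching a page, using Python's standard `urllib.robotparser` module. The rules, the crawler name `ExampleBot` and the URLs are invented for illustration; the sketch does not cover noindex meta tags, which live inside the page itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an example site
rules = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

def may_crawl(url, agent="ExampleBot"):
    """True if the parsed rules allow `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)

blocked = may_crawl("http://example.com/private/report.html")
allowed = may_crawl("http://example.com/public/index.html")
```

A page under `/private/` is skipped by any crawler that honors the file, which is one reason it may never be fully indexed.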

Non-html filetypes
- No reliable current 3rd party data available
- Search term: bush (3/12/06)
  - .pdf: Y 5.9m, G 20m
  - .doc: Y 719k, G 1m
  - .ppt: Y 1k, G 1k

Results Ranking
- Relevancy and currency increasingly important
- Yahoo
  - Purchased AV, AlltheWeb
  - Blends hits from its Directory with crawler results
  - Discloses little about its relevancy processing

Results Ranking at Google: PageRank, Hilltop and more!
- PageRank’s link-based processing:
  - Layer I
    - Do others think your site is of value as demonstrated by linking to you?
  - IF SO …
  - Layer II
    - Are these “others” in turn linked to by sites recognized through linkage within “web communities”?

PageRank’s Multi-layered processing
- A Favorable Ranking Scenario
  - A .com site selling prosthetics
  - linked TO by a local orthopedic association
  - in turn linked TO by a national orthopedic group
  - in turn linked TO by The National Institutes of Health
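The layered idea can be illustrated with a small textbook-style PageRank computation. This is a simplified sketch, not Google's production ranking: the damping factor of 0.85, the iteration count and all page names are assumptions chosen for the example. It compares a shop at the end of a chain of endorsements with a rival shop linked from a single page that nobody links to.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # Split this page's rank evenly among the pages it links to
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Page with no outgoing links: spread its rank over all pages
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank

# Hypothetical link graph mirroring the scenario on the slide
web = {
    "nih": ["national_group"],
    "national_group": ["local_association"],
    "local_association": ["prosthetics_shop"],
    "hobby_page": ["rival_shop"],
}
ranks = pagerank(web)
```

In this toy graph `prosthetics_shop` ends up with a higher score than `rival_shop`, because the page linking to it is itself linked to by well-connected pages.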

The trouble with PR …
- PR allocates a value of authority to a page based on the number and quality of sites that link to it
- A site with a high PR score MAY contain a page matching a query but not be “authoritative” for the topic of that query.
- Hilltop determines the authority of a page relative to the query or search term(s). A single page will rank differently depending on the query.

How does Hilltop do this?
- Identifies “expert documents”
  - Widely recognized, high quality directories of links or subject metasites
  - Open Directory, UK’s RDN, WWW Virtual Library et al.
- Runs terms from a given query against these expert documents
- Filters out duplicates and affiliated sites
- Creates a subset relevant to the query

How does Hilltop do this?
- Runs the query in the main Google database
- Assigns a LocalScore to these results based on the linkage to the subset created from “expert documents”
- Final ranking based on this and PR, on-the-page factors and more.

The trouble with Hilltop …
- Dependent on “expert documents”
- Most effective with broad subject queries
- Must find a minimum of 2 “expert documents” linked to a page or results returned are zero
- PR and other ranking processes then take over
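The Hilltop steps can be sketched in a few lines. This is a deliberately simplified illustration, assuming that an expert document is just some text plus a list of outgoing links and that the LocalScore is a plain count of relevant experts linking to a page; the published algorithm scores experts and targets in more detail. The function name, data layout and sample experts are all hypothetical.

```python
def hilltop_local_scores(query, experts, min_experts=2):
    """experts: list of dicts with 'text' (str) and 'links' (list of URLs)."""
    terms = set(query.lower().split())
    scores = {}
    for expert in experts:
        words = set(expert["text"].lower().split())
        if not terms <= words:
            continue  # this expert does not mention every query term
        for url in set(expert["links"]):
            scores[url] = scores.get(url, 0) + 1
    # Keep only pages endorsed by at least `min_experts` relevant experts
    return {url: s for url, s in scores.items() if s >= min_experts}

# Hypothetical expert documents
experts = [
    {"text": "orthopedic prosthetics suppliers directory",
     "links": ["shop.example", "clinic.example"]},
    {"text": "prosthetics and orthotics resources",
     "links": ["shop.example"]},
    {"text": "gardening links",
     "links": ["shop.example", "seeds.example"]},
]
local = hilltop_local_scores("prosthetics", experts)
no_experts = hilltop_local_scores("astronomy", experts)
```

Only `shop.example` is linked from two relevant experts, so it alone receives a score; a query with no matching experts yields nothing, which is the point at which other ranking processes would have to take over.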

Personalization
- Re-orders search results based on user’s past searches and click tracks
- Ranking will change, depending on user profiles
- Requires setting up a (free) account
- Personalized home page is offered
- Complex profiles are problematic
  - e.g. “Movies, computer hardware, the Internet, general news, astronomy”
  - SEARCH: cars
  - Which categories take precedence over others?
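One way to picture profile-based re-ordering is a boost applied to each result according to how strongly its category features in the user's profile. The sketch below is a hypothetical scheme, not either engine's actual method: the category labels, weights and base scores are invented, and the precedence question raised above is answered here simply by whichever weight is largest.

```python
def personalize(results, profile):
    """results: list of (title, category, base_score); profile: {category: weight}."""
    def score(item):
        _title, category, base = item
        # Unknown categories get no boost
        return base * (1.0 + profile.get(category, 0.0))
    ranked = sorted(results, key=score, reverse=True)
    return [title for title, _category, _base in ranked]

# Hypothetical results for the query "cars"
results = [
    ("Used car listings", "shopping", 0.9),
    ("'Cars' film review", "movies", 0.8),
    ("Auto industry headlines", "general news", 0.7),
]
profile = {"movies": 0.5, "general news": 0.2}

personal_order = personalize(results, profile)
default_order = personalize(results, {})
```

With this profile the film review moves to the top, while a user with an empty profile still sees the car listings first.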

Yahoo! Personalization
- No statement concerning a user’s search records if My Yahoo! is terminated by the user
- Search log data for all My Yahoo! searches kept (via cookies)
- Yahoo 360 creates an online identity: photos, restaurant reviews, personal blog and more
- Yahoo!’s privacy policy: privacy.yahoo.com

Google Personalization
- Search records personally associated with a user are deleted if service is dropped
- Search log data for all Google searches kept (via cookies)
- Google’s privacy policy:
- Bookmark entire web pages

Search Features Yahoo!

Yahoo! Three ways in
- Portal home page (all services)
- search.yahoo.com: Crawler only
- dir.yahoo.com: Subject directory only

Yahoo!
- Tabs for Images, Audio, Video, Local, News, Shopping
- Advanced Search Features
- Vertical Search Engines
  - Music, health, finance, shopping and over 20 more
- Don’t forget the Subject Directory (now further down on the search page)
- Alerts for news, weather, sports and more

Yahoo!’s Contextual Searching - Y!Q
- Selected web pages or highlighted sections analyzed for word frequency and “concept extraction” and used as basis for a search
- Results give basis for query in “context selection box”
- Refinements include removing unwanted terms/phrases and “more like this” link
- Requires download of free toolbar: toolbar.yahoo.com

Yahoo/OCLC toolbar
- Searchers may restrict their results to the Open WorldCat database, currently at 57 million records
- Displays library holdings in the searcher’s vicinity
- Download (free) at

A lot to Yahoo! about
- RSS feeds
  - Offered as part of My Yahoo
  - User-friendly Reader/Aggregator provided; limited to 250,000 Yahoo-selected feeds
  - Yahoo content as RSS: News, Ask Yahoo, Buzz Index (popular searches), News Groups
- Video search (beta) video.search.yahoo.com
  - Advanced search features: KW, format, file size, length, content filter
- Creative commons search.yahoo.com/cc
  - Content that is free to share or modify

Search Features Google

Google Alerts
- Automated running of user-created saved searches once a day or once a week
- Examines the top 10 news results and the top 20 web search results and e-mails you any that you haven’t seen before
- Requires a profile for each alert
- Available in RSS format
- Alerts also available for Google News ONLY
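The "only what you haven't seen before" behaviour amounts to comparing each run's top results against a remembered set. The following is a minimal sketch under that assumption; the function name, the stored `seen` set and the sample result lists are hypothetical and say nothing about how Google implements the service.

```python
def new_hits(news_results, web_results, seen, news_depth=10, web_depth=20):
    """Return unseen URLs from the top news and web results, and remember them."""
    candidates = news_results[:news_depth] + web_results[:web_depth]
    fresh = []
    for url in candidates:
        if url not in seen and url not in fresh:
            fresh.append(url)
    seen.update(fresh)  # these will not be reported again
    return fresh

# Hypothetical ranked result lists for one saved search
seen = set()
news = ["n%d" % i for i in range(15)]
web = ["w%d" % i for i in range(30)]

first_run = new_hits(news, web, seen)
second_run = new_hits(news, web, seen)
```

The first run reports the top 10 news and top 20 web results; an identical second run reports nothing, so no alert would be sent.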

Search by Number
- Enter number in main search box for
  - UPS Tracking #
  - FAA airplane registration #
- Enter number preceded by prefix for
  - FedEx tracking “fedex xxxxxxx”
  - Patent “patent xxxxxxx”
  - FCC equipment id’s “fcc xxxxxxxxxx”
- Current weather at US airports, from FAA’s Air Traffic Control System
  - 3-letter code with airport “roc airport”
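The prefixed forms lend themselves to a simple dispatch on the first word of the query. The sketch below shows one hypothetical way to recognize them; the labels and function name are invented, and the unprefixed UPS and FAA cases are left out because recognizing them would depend on number formats not given here.

```python
import re

# Prefixes named on the slide, mapped to illustrative labels
PREFIXES = {
    "fedex": "FedEx tracking",
    "patent": "Patent",
    "fcc": "FCC equipment ID",
}

def classify(query):
    """Label a query as one of the special lookups or an ordinary search."""
    parts = query.strip().lower().split()
    if len(parts) == 2 and parts[0] in PREFIXES:
        return PREFIXES[parts[0]]
    if len(parts) == 2 and parts[1] == "airport" and re.fullmatch(r"[a-z]{3}", parts[0]):
        return "Airport conditions"
    return "Ordinary web search"

label_fedex = classify("fedex 1234567")
label_airport = classify("roc airport")
label_other = classify("nancy sinatra")
```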

Google Answers
- Fee-based answer service
- User sets fee ($2.50-up) and time frame for question (Guidelines offered)
- Searchable archive available
- Comments can be added (by anyone) to unanswered questions
- Users rate answers

Google Answers: Who are the “researchers”?
- Must be 18 years old
- Write an essay on why you want to be a researcher
- Answer 5 sample questions
- Training manual available at researchertraining.html

Google’s API (Application Program Interface)
- Free programs for developers and researchers interested in incorporating Google in their applications
  - Iterative searches on a topic (SDI)
  - Search via non-html interfaces
  - Games that play with Web information
- Daily limit of 1,000 queries
- Uses SOAP (Simple Object Access Protocol) that is XML-based
- More at google.com/apis/index.html

Froogle
- Locates information about products for sale online
- Gives URLs of sites offering the item
- Provides links to exact page in the site where you can make the purchase

Froogle
- Ranking follows normal Google ranking processes
- Paid placements always clearly marked
- “Sort by price”
- Access at http://froogle.google.com or via Google home page

Google Earth earth.google.com
- Geographic search application
- Originally Keyhole 3D, now a free Google download
- Images taken by satellites and aircraft “sometime in the last 3 years”
- “Fly to” accepts an address or co-ordinates, returns a view from 3,000 ft. above, with zoom capabilities

Google Local for Mobile google.com/gmm
- Free download
- Unique ID associated with your phone
- Simplified version of the web-based Local Search
- Emphasis on maps and directions
- Point-to-point directions limited to a certain area
- Business listings offer address and phone number only
- Does not support all mobile phones

Google Video Search/Store video.google.com
- Index of closed captioning and text descriptions from selected TV and other video content after Dec
- Results can include snippet, description, source, date, duration and hyperlink
- Search results can be sorted by Free or For Sale
- Purchasing information prominent

Google Base base.google.com
- Allows users to send G information they would like for it to have
- A place to list and describe anything you would like including
  - Events & Activities, Jobs, Products
  - Services, Personal Profiles, Reviews
  - Vehicles, Want Ads, News
- Selling possible

Google Desktop 3.0’s “Search Across Computers”
- Allows users to search across all their computers
- Requires user to install and configure the feature
- G uploads files from your computers, indexes them and transfers them to your other computers and deletes them from its servers
- All computers involved must be online at the time

Google Desktop 3.0’s “Search Across Computers”
- If one computer is not online no data transfer can occur and the files remain on G’s servers for up to 30 days, when they are deleted.
- If service is deactivated “some personal account information” may stay on G servers for up to 60 days
- Deletions delayed due to G’s backup processes

Yahoogle and Blogs
- Search restricted to
  - Alito: Y 210, G 23
  - Nancy Sinatra: Y 201, G 116
- G’s blogsearch.google.com (all blogs)
  - Nancy Sinatra: 13,482
- Y has no separate blog search function

Electronic Text Initiatives

Yahoo!’s Open Content Alliance (10/3/05)
- Large scale E-text initiative
- Members include Yahoo, Internet Archive, National Archives (UK), RLG, LC, 8 US and 6 Canadian Universities
- Over 25,000 Digitized copies of public domain AND copyrighted works
- Works under copyright only available if permission granted by owner
- Yahoo plans to include the content in its database or subject directory

Yahoo! and content for $$$

Google Print’s 2 divisions: Publisher Program and Library Project
Publisher Program
- Publishers authorize G. to scan and make searchable the full text of their books
- Users see only the full page containing their search terms
- Link to purchase copy

Google Print’s 2 divisions: Publisher Program and Library Project
Library Project
- Scan and make searchable 15 million books, in and out of copyright, from Harvard, Stanford, Oxford, U. Michigan and NYPL
- For works in copyright, users see only a few sentences around search terms
- Users may browse full text of public domain works
- NOTE: Not possible to print ANY material from either Google Print project

Library Project in 2005
- June – Assoc. of American Publishers questions legality of Library Project
- August 15 – G. “temporarily halts” scanning in-copyright works; continues scanning public domain works
- September 20 – Authors Guild files a formal complaint against G. in NY Federal District Court alleging “massive copyright infringement”

Library Project in 2006
- August 11 – University of California signs with Google to scan “several million” of the UC system’s 34 million titles
- Google – For works in copyright, only the equivalent of an electronic “library catalog record” has been created. No infringement has occurred.

Google and content for $$$ Video Store and Google Base

Foreign Language Content

Language Features
- Advanced Search – by language: G 35, Y 37 (same plus Persian, Thai)
- Advanced Search – by location (country): G 83, Y 27

Language Features
- G “Translate this page” for Spanish, German, French, Italian, Portuguese and, in beta, Japanese, Korean and Chinese (Simplified); service in Language Tools
- Y “Translate this page” for same languages PLUS Greek, Dutch and Russian; service at

Content Comparisons
Searching by language
  Imre Kertesz (Hungarian): Y 751, G 575
  Jose Saramago (Portuguese): Y 172,000, G 292,000
  Moussaoui (Arabic): Y 96, G 216

Google by language or google.??

Links to country-specific services at bottom of Language Tools
Imre Kertesz: by Hungarian language 575; on google.co.hu 449,000
Jose Saramago: by Portuguese language 292,000; on google.pt 1,470,000

Search and Politics: National and International

Child Online Protection Act of 1998
Justice Dept: Parental controls and filters insufficient to protect children against online pornography. Stricter governmental controls needed
Aug., 2005: G, Y, Microsoft and AOL were issued subpoenas for all data relating to search terms and the sites users visited between June 1 and July 31, 2005

Child Online Protection Act of 1998
Y, MSN and AOL “have provided some of the information requested and taken steps to guard users’ privacy”
G refused
To date no request for IP address or other data linking search behavior to individual users

Implications
For Users: Invasion of privacy/search behavior, online identity, 1st Amendment
For Search Engine Industry:
  R&D focused on offering search results customized to an individual
  Requires tracking individual’s search behavior
  Can privacy be guaranteed?
Hearing US Dist. Court March 13

Search, Censorship and China
Chinese Government blocks access to politically sensitive and/or offensive sites
Jan: Access to G cut off or degraded by Government
Search terms blocked included Taiwan’s independence, Tiananmen Square, democracy, human rights in China

Search, Censorship and China
When a service maintains an office or other facility in a country, it is bound to the laws of that country.
G removed content from Google.cn to comply with demands
Y has complied with government demands for several years
MSN removed a blog critical of the Chinese government

Search, Censorship and China
House Subcommittee on Africa, Global Human Rights and International Operations
G, Y, MSN, AOL at hearing on Feb. 16
The point?? Leaders of the search industry should voluntarily set best practices for dealing with repressive regimes
If not, Congress may do it

Yahoogle!
Yahoo! strengths
  Media
  Vertical (specialized) engines
  Popular Culture
  Online communities
  Local Search
  Portal format
Google strengths
  Overall ranking
  Blog search
  International news
  Clean interface
  Somewhat larger database (???)

Thank You and Good Luck! Michael Hunter Reference Librarian Hobart and William Smith Colleges Geneva, NY (315)