Sometimes Google Isn’t Enough: Finding Information on the Invisible Web
Shirley McDonald and Hilda Donaldson
First, a definition of the Visible (Surface) Web:
“It’s made up of HTML Web pages that the search engines have chosen to include in their indices. It’s no more complicated than that.” (Sherman and Price)
Static Web Pages
Fixed, or static, pages do not change and can be linked to from other pages. Ex:
Dynamic Web Pages
Dynamic pages are generated only in response to a specific query and do not exist after that query.
The Invisible, Deep, or Hidden Web
- Web sites or information that Google or other popular search engines cannot index
- Web sites specifically excluded by the search engine
Invisible (Deep or Hidden) Web, by the numbers (Bergman)
- Public information on the Deep Web is 400 to 550 times larger than on the surface Web
- 550 billion individual documents, vs. one billion on the surface Web
- Quality content is 1,000 to 2,000 times greater than on the surface Web
- 95% of the Deep Web is accessible to the public (no fees or subscriptions required)
Hidden Web Sites
- Opaque Web: material that can be, but is not, included in search engine results. Ex: new material that has been added but not yet picked up.
- Private Web: sites intentionally excluded from search engine results. Ex: password-protected sites.
- Proprietary Web: sites that require user registration. Ex: eBay, New York Times.
- Pay per click. Ex: overture.com, FindWhat.com.
Content of Databases
Information stored in tables (Access, Oracle, SQL Server, DB2) and accessible only by query. Examples:
- Phone books, people finders
- Patents, laws
- Items for sale in a Web store or Web-based auctions
- Digital exhibits
- Multimedia and graphical files
- Stock and bond prices
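To show why database content stays hidden, here is a minimal Python sketch using the standard-library sqlite3 module. It assumes a hypothetical in-memory phone-book table; the table name, columns, sample rows, and the results_page function are all illustrative, not taken from any real site. The records exist only as rows in a table, and an HTML page is built only when someone submits a query, so there is no fixed page for a crawler to reach by following links.

```python
import sqlite3
from html import escape

# Hypothetical phone-book database held in memory (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phone_book (name TEXT, phone TEXT)")
conn.executemany(
    "INSERT INTO phone_book VALUES (?, ?)",
    [("Ada Lovelace", "555-0101"), ("Alan Turing", "555-0102")],
)

def results_page(last_name: str) -> str:
    """Build an HTML page on the fly, only for the query that was submitted."""
    rows = conn.execute(
        "SELECT name, phone FROM phone_book WHERE name LIKE ? ORDER BY name",
        ("%" + last_name,),
    ).fetchall()
    items = "".join(f"<li>{escape(n)}: {escape(p)}</li>" for n, p in rows)
    return f"<html><body><ul>{items}</ul></body></html>"

# The page exists only as the return value of this call; nothing is stored
# at a fixed URL, so a crawler following links never encounters it.
page = results_page("Turing")
print(page)
```

A search engine spider would need to fill in the search form with the right name to see this page, which is exactly what ordinary crawlers do not do.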
Examples of Hidden Sites
- Pages in searchable databases: medical (WebMD.com), patent, scientific, legal (Lexis and Westlaw), reference
- Pages requiring login or registration: Blackboard, New York Times
- Government publications or databases: ERIC
- Online databases: Gale Research
- PDF files, audio, video, any new format
More Examples
- Dictionaries and thesauri
- Sites that require forms to be filled out (ex: travel directions, job hunting)
- Product catalogs and library catalogs
- Newspaper and magazine archives
- Dynamic Web pages (ex: airline flight checkers, MapQuest)
- Interactive tools (ex: calculators)
How are pages excluded from search engines?
- Google’s PageRank™ puts pages at the top of the hit list according to how many other pages link to them (popularity), with links from highly ranked pages counting for more; pages with few inbound links sink far down the list
- Webmasters who have figured out how to manipulate PageRank™ can move their pages to the top of the hit list
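The popularity idea behind PageRank™ can be sketched with the simplified textbook formula, computed by repeated averaging ("power iteration"). This is a minimal Python illustration, not Google’s actual ranking system, which combines many other signals. The four-page link graph is hypothetical, and the damping factor of 0.85 is the value suggested in the original PageRank paper.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Simplified PageRank by power iteration.

    links maps each page to the pages it links to. Every page in the
    example has at least one outbound link, so dangling pages are not handled.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal scores
    for _ in range(iterations):
        # Every page keeps a small baseline score...
        new_rank = {p: (1.0 - damping) / n for p in pages}
        # ...and passes the rest of its score, split evenly, to the pages it links to.
        for page, outlinks in links.items():
            share = rank[page] / len(outlinks)
            for target in outlinks:
                new_rank[target] += damping * share
        rank = new_rank
    return rank

# Hypothetical four-page web: "D" links out, but nothing links to "D".
web = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
scores = pagerank(web)
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

Page C, which three pages link to, ends up with the highest score, while page D, which no page links to, stays at the baseline and ranks last. This is also why buying or trading links was an effective way to manipulate the ranking.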
- Faulty typing and/or judgment
- Search engine spiders and crawlers cannot see a site unless another site links to it
- Search engines can primarily see text pages in HTML form; this will change as search engines become more capable of retrieving the “hidden” Web
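The point about links can be made concrete with a minimal crawler sketch in Python. It assumes a hypothetical link graph (the site names are invented) and models a crawler as a breadth-first search that starts from one known page and follows links outward. A page that links to others but is never linked to is simply never discovered.

```python
from collections import deque

def crawl(seed: str, links: dict[str, list[str]]) -> set[str]:
    """Breadth-first crawl: visit the seed, then every page reachable by links."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

# Hypothetical link graph. "orphan.example" links out to another site,
# but no page links to it, so a crawler starting elsewhere never finds it.
site_links = {
    "home.example": ["news.example", "library.example"],
    "news.example": ["home.example"],
    "library.example": ["catalog.example"],
    "catalog.example": [],
    "orphan.example": ["home.example"],
}
found = crawl("home.example", site_links)
print(sorted(found))
print("orphan.example" in found)
```

Real crawlers also accept direct URL submissions and start from many seeds, but the principle is the same: what no link points to stays invisible.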
Use of blocking techniques by the webmaster or server:
- Password protection
- HTML blocking in the web page (e.g., a robots meta tag)
- A listing on the server of blocked pages (e.g., a robots.txt file)
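The last two techniques are commonly implemented as a `<meta name="robots" content="noindex">` tag inside a page and a `robots.txt` file at the root of the site. Below is a minimal Python sketch of how a crawler might consult such a listing before fetching, using the standard library’s urllib.robotparser. The robots.txt contents, the "ExampleBot" crawler name, and the example.com URLs are hypothetical. Note that honoring robots.txt is voluntary on the crawler’s part, whereas password protection is enforced by the server.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the server's listing of blocked areas.
robots_txt = """\
User-agent: *
Disallow: /private/
Disallow: /members/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A compliant crawler checks each URL against the rules before fetching it.
for url in ["https://www.example.com/index.html",
            "https://www.example.com/private/report.html",
            "https://www.example.com/members/login.html"]:
    allowed = parser.can_fetch("ExampleBot", url)
    print(f"{url} -> {'crawl' if allowed else 'blocked'}")
```

Only the first URL is crawlable here; the pages under /private/ and /members/ stay out of the search engine’s index even though they are on the public Web.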
Searching the Invisible Web
Use the same kinds of tools as on the visible Web:
- Directories: subject guides compiled by human editors
- Search engines
- Specialized databases
Directories to Search the Invisible Web
- Big Hub
- Complete Planet: The Deep Web Directory (70,000 searchable databases and specialty search engines)
- Digital Librarian: A Librarian’s Choice of the Best of the Web
More Directories
- IncyWincy: The Invisible Web Search Engine. Offers Web search, directory search, metasearch, and news. Note: Kids & Teens, Reference
- Invisible Web Directory
- Infomine: Scholarly Internet Resource
- Invisible Web Directory
- Librarian’s Index to the Internet
- Open Directory Project (dmoz) (want to edit?)
- ProFusion: The Original Meta-Search Engine
Search Engines for the Invisible Web
- AlltheWeb: find it all
- Bright Planet
- Direct Search: SearchCenter (59 pages!); updates available through ResourceShelf
- IxQuick: the world’s most powerful metasearch engine
More Search Engines
- Search-22
- Search Adobe PDF Online
- Turbo10
- Vivisimo (Vivisimo Clustering)
Specialized Databases
- Library of Congress
- LookSmart’s Find Articles (over 900 publications)
- National Science Digital Library
- Singing Fish: audio and video
Choosing the Best Search
- NoodleTools (mation/5locate/adviceengine.html): a great chart that connects the information need to the search strategy
- How to Choose a Search Engine or Directory
Access to the Hidden Web Is Constantly Improving
- “Google Scholar Offers Access to Academic Information,” by Danny Sullivan (November 18)
- Google has made arrangements with publishers to get into password-protected sites; sometimes only the abstract is shown
- Includes the libraries of Oxford, Stanford, Michigan, Harvard, and the New York Public Library
Issues
- “Let a Thousand Googles Bloom,” by Lawrence Lessig: raises legal and copyright questions
- “Does Google move augur commercialization of libraries?” (Detroit Free Press)
Alternative to Google Scholar
- “Internet Archive to Build Alternative to Google,” by Mark Chillingworth: “Ten major international libraries have agreed to combine their digitized book collections in a free text-based archive hosted online by the not-for-profit Internet Archive.”
- Open Access
Bibliography
Bergman, Michael K. “The Deep Web: Surfacing Hidden Value.” (8 November 2004).
Cadwallader, Joy. “Searching the Invisible Web.” (4 November 2004).
Chillingworth, Mark. “Internet archive to build alternative to Google.” Information World (30 December 2004). http://www.iwr.co.uk/IWR/
Cohen, Laura. “How to Choose a Search Engine or Directory.” (4 November 2004).
“Does Google move augur commercialization of libraries?” (15 December 2004).
Grimes, Brad. “Expand your Web search horizons: six tips for finding the info you want by searching hidden corners of the Web.” PC World (June).
“Invisible Web: What it is, Why it exists, How to find it, and Its inherent ambiguity.” (4 November 2004).
Lessig, Lawrence. “Let a Thousand Googles Bloom.” lesig12Jan12,1, story?ctrack=1 (13 January 2005).
McLaughlin, Laurianne. “Beyond Google: the web is so full of useful info that no search engine can find it all. But a multitude of specialty sites deliver shopping advice, reference databases, leisure-time ideas, and more – fast.” PC World (April 2004).
Niederlander, Mary. “More on Searching: The Hidden Web or Invisible Web Resources.” (4 November 2004). http://www.librarysupportstaff.com/hiddenweb.html
O’Leary, Mick. “Invisible Web Discovers Hidden Treasures.” Information Today (January).
“Search Engines 101 – Search Engines Explained.” (4 November 2004).
“Searching the Hidden Web.” (4 November 2004).
Sherman, Chris and Gary Price. “The invisible web: uncovering sources search engines can’t see.” Library Trends (Fall).
Smith, C. Brian. “Invisible Web: Explore hidden troves of information.” (4 November 2004).
Sullivan, Danny. “Google Scholar Offers Access to Academic Information.” (1 December 2004).
Vine, Rita. “Going beyond Google for faster and smarter web searching.” Teacher Librarian (October 2004).