When Google Isn’t Enough! Finding Information on the Invisible Web Yaacov Taube


1 When Google Isn’t Enough! Finding Information on the Invisible Web Yaacov Taube

2 What is the Visible (Surface) Web? “It’s made up of HTML Web pages that the search engines have chosen to include in their indices. It’s no more complicated than that.” Sherman and Price.

3 What is the Visible (Surface) Web? A collection of webpages Searchable with “search engines” What you and I think of as the “Internet” is actually only a small portion of the Internet

4 What is the Visible (Surface) Web? High volume Mass appeal High value Small percentage of web content –Exception: Google Books and Google Scholar

5 What is the Invisible Web? What search engines do not search Searchable Databases –Tens of Thousands –Accessible and searchable via the Internet –Results often dynamically generated in specific response to your request (eBay, MapQuest, etc.)

6 What is the Invisible Web? Excluded Pages –Excluded per search engine –Excluded per webpage by the owner of the site Typically databases –Businesses –Governments –Schools –Libraries –Associations

7 What is the Invisible Web? Academic Never been indexed or linked Uniquely generated pages Proprietary Confidential Protected by username & password Constitutes the majority of the webpages on the Internet

8 Visible vs. Invisible Web The Invisible Web is about 550 times larger than the visible web and is growing much faster. The deep Web consists of about 91,000 terabytes. The surface Web is only about 167 terabytes. The Library of Congress contains about 11 terabytes. Quality content is 1,000 to 2,000 times greater than on the surface web. 95% of the Deep Web is accessible to the public (no fees or subscription required). (Based on extrapolations from a study done at the University of California, Berkeley.)
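As a quick check on the figures quoted on this slide, the short sketch below divides the terabyte counts by one another. It uses only the numbers from the slide; the function and variable names are made up for illustration. The deep-to-surface ratio comes out near 545, which is consistent with the "about 550 times" claim.

```python
# Figures quoted on the slide, in terabytes.
DEEP_WEB_TB = 91_000
SURFACE_WEB_TB = 167
LIBRARY_OF_CONGRESS_TB = 11


def size_ratio(larger_tb: float, smaller_tb: float) -> float:
    """Return how many times larger the first collection is than the second."""
    return larger_tb / smaller_tb


deep_vs_surface = size_ratio(DEEP_WEB_TB, SURFACE_WEB_TB)
deep_vs_loc = size_ratio(DEEP_WEB_TB, LIBRARY_OF_CONGRESS_TB)

print(f"Deep Web vs. surface Web: about {deep_vs_surface:.0f}x")         # about 545x
print(f"Deep Web vs. Library of Congress: about {deep_vs_loc:.0f}x")     # about 8273x
```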

9 What is on the Invisible Web Opaque Web Private Web Proprietary Web Pay per click

10 Why can’t Google find it? Requires payment Requires registration Dynamically generated Very new Website specifically stops spiders

11 Opaque Web Fixed, or could be indexed, but is not Deemed not important enough Too new and therefore not linked Never makes max results cutoff No one ever linked or submitted URL

12 Private Web Deliberately excluded –Password –Special coding in website stops spiders Only for select individuals –Employees –Students –Researchers
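The slide does not say what the "special coding" is. One common mechanism, assumed here, is a robots.txt file that asks crawlers to stay out of certain paths. The sketch below uses Python's standard urllib.robotparser to show how a well-behaved spider would read such a file; the rules, the example.com URLs and the "ExampleBot" user agent are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: every crawler is asked to skip /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "ExampleBot") -> bool:
    """Return True if the given robots.txt rules let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


public_ok = is_crawlable(ROBOTS_TXT, "https://example.com/public/index.html")
private_ok = is_crawlable(ROBOTS_TXT, "https://example.com/private/report.html")

print("public page crawlable:", public_ok)
print("private page crawlable:", private_ok)
```

A page excluded this way can still be opened by anyone who has the link; it simply never shows up in a search engine that honors the file.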

13 Proprietary Web Protected –Password –Registration (N.Y. Times, eBay, banks, etc.) –Terms of Use Anyone can access if you –Pay –Register –Agree to terms

14 Pay per click Search Engine Marketing tools Ex: overture.com, FindWhat.com

15 When do I use …. Portal or Directory? Search Engine? Invisible Web?

16 Portal or Directory You have a general topic You know little about the subject You do not know keywords You want someone or something to have sorted out the junk You need an exploratory overview

17 Search Engine You are looking for something specific You have keywords You are pretty sure the information is –advertised or –otherwise generally disseminated

18 Tips for search engines Use a toolbar Determine the key words/phrases most likely to be in your document and nowhere else Learn and use Boolean Operators Scan results Question the results
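To make the Boolean operators tip concrete, here is a minimal sketch of how AND, OR and NOT narrow or widen a result set. It is a toy model built on Python sets, not how any particular search engine works, and the three sample documents are invented for illustration.

```python
import re

# Hypothetical mini-collection of documents, keyed by document id.
DOCS = {
    1: "invisible web databases are not indexed by search engines",
    2: "search engines index the surface web",
    3: "library catalogs are searchable databases",
}


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each word to the set of document ids that contain it."""
    index: dict[str, set[int]] = {}
    for doc_id, text in docs.items():
        for word in set(re.findall(r"[a-z]+", text.lower())):
            index.setdefault(word, set()).add(doc_id)
    return index


INDEX = build_index(DOCS)


def term(word: str) -> set[int]:
    """Documents containing the word (empty set if the word is unknown)."""
    return INDEX.get(word.lower(), set())


both = term("databases") & term("search")      # databases AND search
either = term("databases") | term("surface")   # databases OR surface
without = term("databases") - term("engines")  # databases NOT engines

print(sorted(both), sorted(either), sorted(without))
```

AND shrinks the result list, OR grows it, and NOT removes unwanted matches, which is why choosing words "most likely to be in your document and nowhere else" pays off.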

19 Invisible Web You are pretty sure the information is in a specific database Need something authoritative Speed The information is dynamically generated You are familiar with the database –Search techniques –Protocols –Access requirements

20 Searching the Invisible Web Directories – subject guide compiled by human editors Specialized Search Engines –http://library.albany.edu/internet/choose.html Special Databases (Library of Congress, LookSmart’s Find Articles (over 900 publications), National Science Digital Library, Singing Fish – audio and video)

21 Special Databases Library of Congress –http://catalog.loc.gov LookSmart’s Find Articles (over 900 publications) –http://www.findarticles.com National Science Digital Library –http://www.nsdl.org Singing Fish – audio and video –http://www.singingfish.com

22 Types of Databases Information stored in tables (Access, Oracle, SQL Server, DB2) and accessible only by query. Examples: Phone books, People finders, Patents, laws Items for sale in a Web store or Web-based auctions Digital exhibits Multimedia and graphical files Stock and bond prices
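The point that table-based information is "accessible only by query" can be sketched with a small in-memory database. The example below uses Python's built-in sqlite3 module as a stand-in for the products named on the slide; the table, its rows and the search_catalog helper are hypothetical. The result list exists only as the answer to a specific request, so there is no fixed page for a spider to find and index.

```python
import sqlite3


def search_catalog(conn: sqlite3.Connection, keyword: str) -> list[tuple[str, int]]:
    """Return (title, year) rows whose title contains the keyword."""
    cursor = conn.execute(
        "SELECT title, year FROM books WHERE title LIKE ? ORDER BY year",
        (f"%{keyword}%",),
    )
    return cursor.fetchall()


# Hypothetical catalog held entirely in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [
        ("Sample Guide to Web Searching", 2003),
        ("Sample Patent Handbook", 1999),
        ("Sample Web Directory", 2001),
    ],
)

# The "results page" is generated only when someone asks this question.
results = search_catalog(conn, "web")
for title, year in results:
    print(year, title)
```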

23 Types of Hidden Info Pages in searchable databases: medical (WebMD.com), patent, scientific, legal (Lexis and Westlaw), reference Pages requiring login or registration: social sites, New York Times, web-based applications, calendars, Google Docs, etc. Government publications or databases: ERIC, usa.gov Online databases: Gale Research PDF files, audio, video, any new format

24 More hidden stuff Dictionaries and thesauri Sites that require forms to be filled out (ex: travel directions, job hunting) Product catalogs and library catalogs Newspaper and magazine archives Dynamic web pages (ex: airline flight checkers, MapQuest) Interactive tools (ex: calculators & measurement converters)

25 Access to invisible web is improving … Google Books Google Scholar

26 Maybe Consider … Specialized Databases such as Dialog, LexisNexis, Factiva, etc. (not cheap) Use an Information Professional

27 To Conclude … Focus and continue doing what you do best and what you have been trained for, and let an Information Professional find the info you need. He is trained to do it faster, more effectively and efficiently than you or one of your employees. (www.aiip.org)

