Best Web Directories and Search Engines: Order Out of Chaos on the World Wide Web

1 Best Web Directories and Search Engines: Order Out of Chaos on the World Wide Web

2 Different Ways to Search on the Web
- Input URLs, surf links
- Subject directories
- Search engines
- Metasearch engines

3 Web Directories
- Small, selective databases
- Created by humans, not machines
- Editors select sites and place them into categories for easy retrieval
- User browses categories and follows links to sites

4 How Directories Work
- Browse subject categories
  - Funnel: from categories to web sites
  - Directories: Yahoo!, LookSmart, Open Directory
  - Example path: Health > Fitness > Yoga > Most popular sites > yogabasics (http://www.yogabasics.com/)
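The funnel on slide 4 can be modeled as a tree of editor-created categories that the user walks down one level at a time. A minimal Python sketch, assuming a hypothetical nested-dictionary layout (DIRECTORY and browse are illustrative names; Yahoo!, LookSmart, and Open Directory did not necessarily store their categories this way):

```python
# Hypothetical directory: each category maps to subcategories, and the
# innermost level holds the list of sites the editors chose.
DIRECTORY = {
    "Health": {
        "Fitness": {
            "Yoga": {
                "Most popular sites": ["http://www.yogabasics.com/"],
            },
        },
    },
}


def browse(directory, path):
    """Follow a category path (e.g. Health > Fitness > Yoga) down the tree."""
    node = directory
    for category in path:
        node = node[category]  # KeyError if the editors never created this category
    return node


print(browse(DIRECTORY, ["Health", "Fitness", "Yoga", "Most popular sites"]))
# ['http://www.yogabasics.com/']
```

Browsing only ever reaches what editors placed in the tree, which is why directories stay small and selective.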

5 Directory Search Boxes
- Use when
  - subject categories don’t match your topic (e.g., winemaking in Slovenia)
- Be aware that a directory
  - first searches its own select database
  - may automatically default to a search engine if it finds little in its own database
    - LookSmart defaults to WiseNut
    - Open Directory defaults to Google
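The fallback behavior on slide 5 (search the directory's own small database first, then hand the query to a partner engine) can be sketched as follows. The function name, the min_hits threshold, and the stand-in engine are all hypothetical; the slide does not say how much counts as "little" for LookSmart or Open Directory.

```python
def directory_search(query, own_index, fallback_engine, min_hits=3):
    """Search the directory's own entries first; defer to an engine if too few match."""
    words = query.lower().split()
    # A site matches when every query word appears in its editor-written description.
    hits = [url for url, description in own_index.items()
            if all(word in description.lower() for word in words)]
    if len(hits) >= min_hits:
        return "directory", hits
    # Too little found in the select database: default to the search engine.
    return "engine", fallback_engine(query)


own_index = {"http://www.yogabasics.com/": "Yoga basics: poses and breathing"}


def stand_in_engine(query):
    """Hypothetical placeholder for a partner engine such as WiseNut or Google."""
    return ["engine result for: " + query]


print(directory_search("yoga poses", own_index, stand_in_engine, min_hits=1))
# ('directory', ['http://www.yogabasics.com/'])
print(directory_search("winemaking Slovenia", own_index, stand_in_engine))
# ('engine', ['engine result for: winemaking Slovenia'])
```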

6 Why Use Directories?
- Identify major and quality sites
- Get an overview and general information on a topic
- Enjoy serendipity in discovery as you explore a small, focused database

7 Small, Selective Directories
- Librarian’s Index to the Internet: 14,000+
- INFOMINE: 120,000+
- Academic Info: 25,000+
- WWW Virtual Library

8 Large, Less Selective Directories
- Yahoo!: 3,000,000+
- Open Directory: 3,800,000+
- LookSmart: 2,500,000+
HyperResearch Guide

9 Web Search Engines

10 What Are Search Engines?
- Spiders, crawlers
  - Visit and read web pages, follow their links
- Index, catalog
  - A giant book containing every web page the spider finds
  - Spiders update pages and add new ones to the index
- Search interface
  - Sifts through the index to find matches to the words in the search box
  - Ranks pages for relevancy
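The three parts on slide 10 can be sketched together in a few lines of Python. This is a toy under stated assumptions: WEB is a hypothetical in-memory stand-in for the live Web (a real spider fetches pages over HTTP), crawl and search are illustrative names, and results are simply sorted by URL because ranking comes up on slide 16.

```python
from collections import deque

# Hypothetical miniature web: URL -> (page text, outgoing links).
WEB = {
    "a.example": ("wine culture in slovenia", ["b.example", "c.example"]),
    "b.example": ("roman wine history", ["a.example"]),
    "c.example": ("yoga basics", ["d.example"]),
    "d.example": ("slovenia travel", []),
}


def crawl(web, seed):
    """Spider: visit a page, read its text into the index, then follow its links."""
    index = {}                 # URL -> text: the "giant book" of pages found
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        if url in index or url not in web:
            continue           # already read, or a dead link
        text, links = web[url]
        index[url] = text
        queue.extend(links)
    return index


def search(index, query):
    """Interface: return the URLs whose text contains every query word."""
    words = query.lower().split()
    return sorted(url for url, text in index.items()
                  if all(word in text.split() for word in words))


index = crawl(WEB, "a.example")
print(search(index, "slovenia"))  # ['a.example', 'd.example']
```

Note that d.example is found only because the spider followed a link from c.example; a page that nothing links to, and that nobody submits, never enters the index.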

11 Search Engines Are the Same and Different
- Search engines are the same
  - Consist of crawlers, index, interface
- Search engines are different
  - Low overlap of database contents
    - 50% of pages in one are not found in another
  - Pages for inclusion are found internally, by following links, or by submission
  - Remember: Google doesn’t know it all!
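The overlap point on slide 11 is a set difference: of the pages engine A has indexed, what share is missing from engine B? A small sketch with made-up page sets (illustrative only, not real engine data):

```python
def share_missing(engine_a, engine_b):
    """Fraction of the pages indexed by engine_a that engine_b does not have."""
    if not engine_a:
        return 0.0
    return len(engine_a - engine_b) / len(engine_a)


# Hypothetical page sets for two engines.
engine_a = {"p1", "p2", "p3", "p4"}
engine_b = {"p3", "p4", "p5", "p6", "p7"}

print(share_missing(engine_a, engine_b))  # 0.5: half of A's pages are not in B
print(share_missing(engine_b, engine_a))  # 0.6
```

The measure is not symmetric, so "50% missing" depends on which engine you start from; either way, switching engines surfaces pages the first one never indexed.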

12 How Search Engines Work
- Spiders capture web pages
- Web pages build the index (database)
- Interface finds words in the database
- Engine ranks and describes results
- How engines and directories differ

13 Spiders Capture Web Pages
- Spider “reads” page text into the index
  - Amount of text read per page: Google ~ 100K, Yahoo ~ 500K
- Spider follows page links and reads new pages into the index
- Spider returns to sites (every month or two) and looks for changes
  - Dead sites removed, current sites updated
  - New sites added (through new links found or submitted by others)
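One revisit pass of the kind described on slide 13 can be sketched as a function that rebuilds the index from what is currently live. Assumptions: live_web is a hypothetical snapshot mapping each URL to (text, links), and in this single pass newly discovered pages are added but not crawled further.

```python
def recrawl(index, live_web, submitted=()):
    """One revisit pass: drop dead pages, refresh live ones, add linked or submitted pages."""
    # Current sites updated: keep only URLs still live, storing their current text.
    # Dead sites removed: anything missing from live_web is not carried over.
    refreshed = {url: live_web[url][0] for url in index if url in live_web}
    # New sites: links found on the live pages, plus URLs submitted by others.
    candidates = [link for url in refreshed for link in live_web[url][1]]
    candidates.extend(submitted)
    for url in candidates:
        if url in live_web and url not in refreshed:
            refreshed[url] = live_web[url][0]
    return refreshed


old_index = {"a.example": "old text", "gone.example": "page that no longer exists"}
live_web = {
    "a.example": ("updated text", ["new.example"]),
    "new.example": ("fresh page", []),
    "sub.example": ("submitted page", []),
}
print(recrawl(old_index, live_web, submitted=["sub.example"]))
# {'a.example': 'updated text', 'new.example': 'fresh page', 'sub.example': 'submitted page'}
```

Between passes the index reflects the Web as it was at the last visit, which is why results can point to changed or vanished pages.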

14 Web Pages Build Database
- Current web size: over 15 billion pages
- Each database has different pages
- No engine’s database covers it all
  - Google ~ 29% (4.3 billion+)
  - Yahoo! ~ 20% (3 billion+)
  - HotBot ~ 20% (3 billion+)
  - Teoma ~ 13% (2 billion)
- As pages are updated, the nature of the database changes
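The percentages on slide 14 are each engine's index size divided by the 15-billion-page estimate. Using the index sizes listed on slide 19, a quick check in Python reproduces them:

```python
WEB_SIZE = 15e9  # "over 15 billion pages"

# Index sizes as listed on slide 19.
INDEX_SIZES = {"Google": 4.3e9, "Yahoo!": 3e9, "HotBot": 3e9, "Teoma": 2e9}


def coverage(pages, web_size=WEB_SIZE):
    """Share of the estimated Web that an index of this size covers."""
    return pages / web_size


for engine, pages in INDEX_SIZES.items():
    print(f"{engine}: {coverage(pages):.0%}")
# Google: 29%
# Yahoo!: 20%
# HotBot: 20%
# Teoma: 13%
```

Because 15 billion is a lower bound ("over"), these shares are upper bounds on each engine's real coverage.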

15 Interface Finds Words in Database
- Provides keyword search box
- Offers search options to affect results
  - Assumes AND between words: Iraq WMD
  - Uses “quotes” for PHRASE searches: “Cold War”
  - Allows FIELD searching: ti:Russian mafia, url:russianmafia
- Offers Simple and Advanced Search
- Teoma
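The AND default and PHRASE searching on slide 15 can both be answered from an inverted index that records where each word occurs on each page. This is a minimal sketch assuming a simple positional index (build_index, and_search, and phrase_search are illustrative names); the slide does not describe any engine's internals, and FIELD searching (ti:, url:) is left out to keep it short.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Inverted index: word -> {url: [positions of that word on the page]}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for pos, word in enumerate(tokenize(text)):
            index[word].setdefault(url, []).append(pos)
    return index


def and_search(index, query):
    """Implicit AND: pages containing every word, in any order (Iraq WMD)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], {}))
    for word in words[1:]:
        result &= set(index.get(word, {}))
    return result


def phrase_search(index, phrase):
    """Quoted phrase: the words must also appear next to each other ("Cold War")."""
    words = tokenize(phrase)
    matches = set()
    for url in and_search(index, phrase):
        for start in index[words[0]][url]:
            if all(start + i in index[word][url] for i, word in enumerate(words)):
                matches.add(url)
                break
    return matches


pages = {
    "p1": "The Cold War ended in 1991",
    "p2": "A war in a cold climate",
}
idx = build_index(pages)
print(sorted(and_search(idx, "cold war")))  # ['p1', 'p2']
print(phrase_search(idx, "Cold War"))       # {'p1'}
```

The AND query matches both pages because each contains both words somewhere; only the phrase query insists on adjacency, which is why quoting narrows results.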

16 Engine Ranks, Describes Results
- How “relevance” is determined
  - Location and frequency of search words
    - In the title tag, near the top of the indexed text
  - Site popularity ~ how many “clicks” a site gets
  - Link popularity ~ how often others link to a site
  - Subject popularity ~ link popularity within subject communities (Teoma)
- Results described, keywords highlighted
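The location, frequency, and link-popularity factors on slide 16 can be combined into a toy relevance score. The weights below (title_weight, top_weight, link_weight, top_n) are invented for illustration; no engine's actual formula is implied, and click popularity and Teoma's subject popularity are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    url: str
    title: str
    body: str
    links_to: list = field(default_factory=list)


def inlink_counts(pages):
    """Link popularity: how many other pages link to each URL."""
    counts = {p.url: 0 for p in pages}
    for p in pages:
        for target in set(p.links_to):
            if target in counts and target != p.url:
                counts[target] += 1
    return counts


def score(page, words, inlinks, title_weight=3.0, top_weight=1.0,
          link_weight=0.5, top_n=20):
    """Toy relevance: frequency, title and near-top bonuses, plus link popularity."""
    body = page.body.lower().split()
    title = page.title.lower().split()
    s = 0.0
    for w in words:
        s += body.count(w)            # frequency of the search word
        if w in title:
            s += title_weight         # location: in the title tag
        if w in body[:top_n]:
            s += top_weight           # location: near the top of the page
    return s + link_weight * inlinks[page.url]


def rank(pages, query):
    """Order pages by descending toy score."""
    words = query.lower().split()
    inlinks = inlink_counts(pages)
    ranked = sorted(pages, key=lambda p: score(p, words, inlinks), reverse=True)
    return [p.url for p in ranked]


pages = [
    Page("a.example", "Yoga Basics", "yoga poses for beginners", ["b.example"]),
    Page("b.example", "Fitness", "running swimming and some yoga", ["a.example"]),
    Page("c.example", "Links", "see yoga basics", ["a.example"]),
]
print(rank(pages, "yoga"))  # ['a.example', 'b.example', 'c.example']
```

All three pages mention "yoga" once near the top; a.example wins on its title and its two inbound links, and b.example edges out c.example on its single inbound link.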

17 How Engines and Directories Differ
- Computers vs. people
  - Spiders select documents
  - Editors select documents
- Quantity vs. quality
  - Engines large, non-judgmental
  - Directories small, want “best”, “most important”
- Technology vs. human factor
  - Software ranks items
  - Editors organize pages into subjects, categories

18 Why Use Search Engines?
- You need specific rather than general information
  - What role did the Romans play in developing a wine culture in Slovenia?
- You need a large, comprehensive database in contrast to a directory
Note: remember, search engine crawlers grab anything and everything; selectivity depends on you and the engine’s ranking system

19 Top Search Engines
- Google: 4.3 billion+
- Yahoo (Inktomi): 3 billion+
- HotBot (Inktomi): 3 billion+
- Teoma: 2 billion+
HyperResearch Guide

20 Metasearch Engines

21 Technologies that search several search engines at the same time

22 Pros
- Increase results when one search engine produces little
- Save time by searching several engines at once
- Show results of several engines on one page
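A metasearch engine fans one query out to several engines at the same time and merges the answers onto a single page. A minimal sketch, assuming each engine is a hypothetical Python function returning a ranked list of URLs; round-robin interleaving with duplicate removal is one simple merge strategy among many, not how Vivisimo, Dogpile, or Ez2find actually work.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import zip_longest


def metasearch(query, engines):
    """Query several engines concurrently, then interleave results without duplicates."""
    with ThreadPoolExecutor() as pool:
        # map() preserves the order of `engines` in the returned result lists.
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    for tier in zip_longest(*result_lists):  # each engine's 1st result, then 2nd, ...
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


# Hypothetical stand-in engines.
def engine_a(query):
    return ["x.example", "y.example"]


def engine_b(query):
    return ["y.example", "z.example", "w.example"]


def engine_c(query):
    return []  # this engine finds nothing; the others still contribute


print(metasearch("yoga", [engine_a, engine_b, engine_c]))
# ['x.example', 'y.example', 'z.example', 'w.example']
```

The sketch also hints at the drawbacks on slide 23: the same query string goes to every engine unchanged, so engine-specific syntax is not translated.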

23 Cons
- Retrieve too many hits
- Retrieve less relevant results
  - Cannot read individual search syntax well
    - Cannot tell if syntax requires terms in upper or lower case (OR vs. or, AND vs. and)
    - Cannot tell if title, URL searching is allowed, etc.
- Contain a mix of major search engines, not all of them

24 Top Metasearch Engines
- Vivisimo
  - Clusters results into subject folders
- Dogpile
  - Refines results, covers major engines
- Ez2find
  - Includes most major engines

25 A Few Words About the Web and Search Engines

26 What’s In Search Engines?
- Business, commercial information
- Organizational publications
- Government resources
- Some magazine, newspaper articles
- Some scholarly information
  - Teaching materials, unpublished articles
- Books, articles whose copyright has expired

27 What’s Not in Search Engines
- Most books and periodical articles
  - Current, past research, fiction, non-fiction
- Reference materials
  - Best current encyclopedias, handbooks, business advisory services, etc.
- Bulk of human knowledge and research
- Where can some of this information be found?
  - In libraries, in print or via subscription databases available in libraries, institutions, organizations

28 Widening Google and Yahoo’s Eyes for Scholarship
- OAIster
  - University of Michigan and Yahoo project
  - 3,000,000 scholarly documents
  - 277 institutions involved
- Open WorldCat
  - OCLC and Google project
  - 2,000,000 books in Google index
  - Goal: open access to 54 million books

29 Search Tips
- Check “advanced” search and options
- Learn about AND, OR, ANY, ALL, PHRASE
- Know how to search in titles, URLs
- Spell it right
- Switch engines, get different results
- Keep up to date about search engines
  - Newspapers and magazines
  - Library web sites

30 Learn to Evaluate Web Sites
- Accuracy
  - Is the information reliable? Where is it from?
  - What does the URL tell you (.com, .org, .gov, .edu)?
- Authority
  - Author’s credentials? Address, email given?
- Content and Currency
  - Purpose of site: inform, sell, propagandize? Date?
- Documentation
  - Are sources given, footnotes?
  - Are other links given?
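The URL question on slide 30 can be partly automated by pulling out the top-level domain. The sketch below uses Python's standard urllib.parse; domain_hint and HINTS are illustrative names, and the descriptions are general conventions only (anyone can register a .com or .org), so the suffix is a first clue, not a verdict on accuracy.

```python
from urllib.parse import urlparse

# The four suffixes named on the slide, with what they conventionally suggest.
HINTS = {
    "com": "commercial",
    "org": "organization",
    "gov": "U.S. government",
    "edu": "educational institution (mostly U.S.)",
}


def domain_hint(url):
    """Return a URL's top-level domain and what it usually suggests about the publisher."""
    host = urlparse(url).hostname or ""
    tld = host.rsplit(".", 1)[-1]
    return tld, HINTS.get(tld, "unknown; check authority and documentation instead")


print(domain_hint("http://www.yogabasics.com/"))  # ('com', 'commercial')
print(domain_hint("https://www.loc.gov/"))        # ('gov', 'U.S. government')
```

The remaining criteria (authority, currency, documentation) still require reading the page itself.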

31 Find and Evaluate
- Use Google and find the Web site titled: The Burmese Mountain Dog
- Evaluate this site for
  - Accuracy
  - Authority
  - Content and Currency
  - Documentation
- Is it a trustworthy Web site?