Best Web Directories and Search Engines Order Out of Chaos on the World Wide Web.


1 Best Web Directories and Search Engines Order Out of Chaos on the World Wide Web

2 How People Search on the Web
- Input URLs, surf links
- Subject directories
- Search engines
- Metasearch engines

3 Web Directories
- Small, selective databases
- Created by humans, not machines
- Editors select and place sites into categories for easy retrieval
- User browses categories and links to sites

4 Why Use Directories?
- Identify quality, major sites
- Get overview, general information on topic
- Serendipity in discovery as a result of browsing a smaller, more focused file

5 High Quality Directories
- Librarian’s Index to the Internet
- Infomine
- Academic Info
- WWW Virtual Library

6 Top Directories – Less Selective
- Yahoo!: 1,800,000+
- Open Directory: 2,600,000+
- LookSmart: 2,500,000+
HyperResearch Guide

7 How Directories Work
- Browse subject categories
  - Funnel: category to topic, web site to page
  - Example (Yahoo!, LookSmart, Open Directory): Health > Fitness > Yoga > Most popular sites > yogaclass.com (http://www.yogaclass.com/)

8 Directory Search Boxes
- When to use Yahoo Search
  - Subject categories don’t match topic
  - Want broad search of Web
- Why results are different
  - Directory searches only Yahoo’s selected sites
  - Search box combines Yahoo directory sites and full Web search results (from engine)

9 Top Web Directories
- Yahoo!: 1,800,000+
- Open Directory: 2,600,000+
- LookSmart: 2,500,000+
HyperResearch Guide

10 Web Search Engines

11 What Are Search Engines
- Software
  - Captures web sites, pages
  - Indexes full text of web pages
  - Provides interface to search web pages
- Database
  - Large, billions of pages (unlike directories)
  - Computer built (robots, spiders)
  - No selectivity, no evaluation

12 Why Use Search Engines?
- Have already identified major sites from directory
- Could find very little in directory
- Want everything, comprehensive information on a topic
Note: need to judge quality of sites since engines are NOT selective

13 How Search Engines Work
- Spiders comb, “capture” web pages
- Software builds database
- Words from web pages “indexed”
- Search interface finds words on pages
- Engine ranks, describes results
- How engines and directories differ

14 Spiders Comb, Capture Web Pages
- Software decides which web pages to collect
- Spiders check for updated pages
- Spiders remove dead sites
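The spider behaviour on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the "web" is a hypothetical in-memory dictionary rather than real network requests, a value of `None` stands in for a dead site, and the names `crawl` and `web` are invented for this example.

```python
from collections import deque

def crawl(web, seed):
    """Follow links breadth-first from a seed page and collect live pages.

    `web` maps each URL to the list of URLs it links to, or to None if the
    page is dead. This toy structure is an assumption for illustration.
    """
    database = {}
    queue = deque([seed])
    seen = {seed}
    while queue:
        url = queue.popleft()
        links = web.get(url)
        if links is None:
            continue  # dead or unknown page: leave it out of the database
        database[url] = links
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return database

web = {
    "home.example": ["a.example", "b.example"],
    "a.example": ["home.example"],
    "b.example": None,  # dead site
}
database = crawl(web, "home.example")
print(sorted(database))
```

Running the crawl again later and comparing the two databases is one simple way a spider could notice updated or vanished pages.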

15 Spider Software Builds Database
- Current web size: over 15 billion pages
- No engine’s database covers it all
  - Google covers 29% (4.3 billion+)
  - AlltheWeb covers 21% (3.2 billion+)
  - HotBot covers 20% (3 billion+)
  - Teoma covers 10% (1.5 billion)
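The coverage percentages follow directly from dividing each index size by the estimated web size. The sketch below reproduces that arithmetic with the slide's figures, assuming AlltheWeb's index is 3.2 billion pages, as the later Top Search Engines slide states. The function name `coverage_percent` is made up for this example.

```python
# Figures taken from the slides; the 15 billion total is the slide's estimate.
WEB_SIZE = 15_000_000_000

indexed_pages = {
    "Google": 4_300_000_000,
    "AlltheWeb": 3_200_000_000,
    "HotBot": 3_000_000_000,
    "Teoma": 1_500_000_000,
}

def coverage_percent(pages, web_size=WEB_SIZE):
    """Return the share of the web an index covers, as a whole percent."""
    return round(100 * pages / web_size)

for engine, pages in indexed_pages.items():
    print(f"{engine}: {coverage_percent(pages)}%")
```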

16 Words from Web Pages “Indexed”
- An “index” is a list of words in the database linked to the Web pages that contain them
- Some engines index the full text of a document
- Some index part of the text
  - First 100 words in document
  - Words in abstract, or title of document
- How an engine indexes affects search results
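A small sketch can show why the indexing method affects results. It builds a word-to-pages index twice, once from the full text and once from only the first few words. The page texts, the `.example` URLs, and the `build_index` name are hypothetical, and the cutoff is 3 words rather than 100 to keep the example tiny.

```python
from collections import defaultdict

def build_index(pages, max_words=None):
    """Map each word to the set of URLs whose (possibly truncated) text has it."""
    index = defaultdict(set)
    for url, text in pages.items():
        words = text.lower().split()
        if max_words is not None:
            words = words[:max_words]  # partial indexing: only the first words
        for word in words:
            index[word].add(url)
    return index

pages = {
    "a.example": "yoga classes for beginners and advanced students",
    "b.example": "fitness tips including a short note on yoga",
}

full_index = build_index(pages)
partial_index = build_index(pages, max_words=3)

print(sorted(full_index["yoga"]))
print(sorted(partial_index["yoga"]))
```

With full-text indexing both pages are found for "yoga"; with the truncated index the second page is missed because the word appears late in its text.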

17 Search Interface Finds Web Pages
- Provides keyword search box
- Offers simple or advanced searching
- Offers search options to affect results:
  - Most assume AND between words: Russian mafia
  - Most accept “quotes” to search a PHRASE: “Russian mafia”
  - Most allow FIELD searches: ti:Russian mafia
- AlltheWeb
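The difference between an implied AND and a quoted phrase can be illustrated as follows. This is a simplified sketch and not how any particular engine works; the sample pages and the function names `and_search` and `phrase_search` are invented for the example.

```python
def and_search(pages, query):
    """Return URLs whose text contains every query word, in any order."""
    terms = query.lower().split()
    hits = []
    for url, text in pages.items():
        words = text.lower().split()
        if all(term in words for term in terms):
            hits.append(url)
    return sorted(hits)

def phrase_search(pages, phrase):
    """Return URLs whose text contains the words adjacent and in order."""
    target = phrase.lower().split()
    n = len(target)
    hits = []
    for url, text in pages.items():
        words = text.lower().split()
        if any(words[i:i + n] == target for i in range(len(words) - n + 1)):
            hits.append(url)
    return sorted(hits)

pages = {
    "a.example": "the russian mafia in the 1990s",
    "b.example": "a russian novel about the sicilian mafia",
}

print(and_search(pages, "Russian mafia"))
print(phrase_search(pages, "Russian mafia"))
```

The AND search matches both pages, while the phrase search matches only the page where the two words sit side by side.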

18 Engine Ranks, Describes Results
- Software lists most “relevant” items first
  - Word popularity: word repetitions, location
  - Site popularity: visits to the web site
  - Link popularity: how often the site is linked to
- Results described
  - Few words to a paragraph
  - Sometimes stars, other indicators of relevancy
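As a rough illustration of these ranking signals, the sketch below combines word repetitions, word location (a title bonus), and link popularity into one score. The weights (5 for a title match, 2 per inbound link) are arbitrary assumptions, and real engines use far more elaborate, unpublished formulas. All names and sample data are hypothetical.

```python
def score(page, term, inbound_links):
    """Combine repetition, location, and link popularity into one number."""
    term = term.lower()
    body_words = page["body"].lower().split()
    title_words = page["title"].lower().split()
    repetitions = body_words.count(term)            # word repetitions
    title_bonus = 5 if term in title_words else 0   # word location
    return repetitions + title_bonus + 2 * inbound_links  # link popularity

def rank(pages, term, inbound):
    """Return URLs ordered from highest to lowest score."""
    return sorted(
        pages,
        key=lambda url: score(pages[url], term, inbound.get(url, 0)),
        reverse=True,
    )

pages = {
    "a.example": {"title": "Yoga for beginners",
                  "body": "yoga poses and yoga breathing"},
    "b.example": {"title": "Fitness news",
                  "body": "a short note on yoga"},
}
inbound = {"a.example": 1, "b.example": 3}

print(rank(pages, "yoga", inbound))
```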

19 How Engines and Directories Differ
- Computers vs people
  - Engine spiders, not editors, select documents
- Quantity vs quality
  - Engines big: want all, accept anything
  - Directories small: want “best”, “important”
- Technology vs human judgment
  - Engine software ranks, no human evaluation

20 Top Search Engines
- Google: 4.2 billion+
- AlltheWeb: 3.2 billion+
- HotBot (Inktomi): 3 billion+
- Teoma: 1.5 billion+
HyperResearch Guide

21 Metasearch Engines

22 Technologies that search several search engines at the same time

23 Pros
- Increase results when a single search engine produces little
- Save time by searching several engines at once
- Show results of several engines on one page
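The core idea of a metasearch engine, sending one query to several engines and showing the merged results on one page, can be sketched like this. The two "engines" are hypothetical stand-in functions returning canned results, not real search APIs, and ordering by how many engines returned a URL is just one possible choice.

```python
def engine_a(query):
    """Hypothetical engine: returns a fixed result list for illustration."""
    return ["yogaclass.example", "fitness.example"]

def engine_b(query):
    """Hypothetical engine: returns a fixed result list for illustration."""
    return ["yogaclass.example", "poses.example"]

def metasearch(query, engines):
    """Query every engine and merge the results, removing duplicates.

    Returns (url, [engine names]) pairs, with URLs found by more engines
    listed first and ties broken alphabetically.
    """
    merged = {}
    for name, search in engines.items():
        for url in search(query):
            merged.setdefault(url, []).append(name)
    return sorted(merged.items(), key=lambda item: (-len(item[1]), item[0]))

results = metasearch("yoga", {"engine_a": engine_a, "engine_b": engine_b})
for url, sources in results:
    print(url, sources)
```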

24 Cons
- Retrieve too many hits
- Retrieve less relevant results
  - Do not individualize search syntax for all the engines they search
    - Do not know whether to use “and”, AND, or +; “or” or OR; cannot interpret phrase searches, title searches, etc.
- Exclude certain large engines like Google

25 Top Metasearch Engines
- Dogpile
  - Refines results, covers major engines
- Vivisimo
  - Categorizes results, narrows topics
- Ez2find
  - Includes most major engines

26 A Few Words About the Web and Search Engines

27 What’s In Search Engines?
- Business, commercial information
- Organizational publications
- Government resources
- Some magazine, newspaper articles
- Some scholarly information
  - Teaching materials, unpublished articles
- Books, articles whose copyright has expired

28 What’s Not in Search Engines
- Books under copyright
  - Most fiction, non-fiction in existence
- Journal, magazine, newspaper articles
  - Most current and past research
- Reference materials
  - Recent, quality, expensive encyclopedias, handbooks, business advisory services, etc.
- In short
  - Bulk of human knowledge and research

29 Search Tips
- Check “advanced” search and options
- Learn about AND, OR, ANY, ALL, PHRASE
- Know how to search in titles, URLs
- Spell it right
- Switch engines, get different results
- Keep up to date about search engines
  - Newspapers and magazines
  - Library web sites

30 Evaluating Web Sites
- Accuracy
  - Is information reliable?
  - What does URL tell you (.com, .org, .gov, .edu)?
- Authority
  - Author’s credentials? Address, email given?
- Content and Currency
  - Purpose of site: inform, sell, propagandize? Date?
- Documentation
  - Are sources given, footnotes?
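Reading the domain ending from a URL is easy to automate. The sketch below uses Python's standard `urllib.parse` to pull out the host name and look up its last label. The descriptions in `DOMAIN_HINTS` are general conventions supplied for this example, and the ending is only a hint about who publishes a site, never proof of reliability.

```python
from urllib.parse import urlparse

# Conventional meanings of the endings named on the slide (a hint only).
DOMAIN_HINTS = {
    "com": "commercial",
    "org": "organization, often non-profit",
    "gov": "U.S. government",
    "edu": "U.S. educational institution",
}

def domain_hint(url):
    """Return a rough description of the publisher type based on the URL ending."""
    host = urlparse(url).hostname or ""
    suffix = host.rsplit(".", 1)[-1]
    return DOMAIN_HINTS.get(suffix, "unknown")

print(domain_hint("http://www.yogaclass.com/"))
```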

31 Find and Evaluate
- Use Google to find the Web site titled: The Burmese Mountain Dog
- Evaluate this site for
  - Accuracy
  - Authority
  - Content and Currency
  - Documentation
- Is it a trustworthy Web site?

