1 Unit 3 Web Search Engines

2 Can You Find the Answers?
- Connect to Google
- Search for items on Iran. Records: ________
- Combine Iran with nuclear weapons. Records: ________
- Combine Iran with the phrase nuclear weapons. Records: ________
- Use Advanced Search: combine Iran with the phrase nuclear weapons so that all the words appear in the title of documents. Records: ________

3 Unit 3 Web Search Engines
- How People Search on the Web
- What Are Search Engines?
- How Search Engines Work
- What’s in Search Engines?
- How to Find Search Engines
- Search Basics

4 Three Ways People Search
- Surf
  - No direction, no clear idea or issue
  - Consult people, news, magazines, Web for ideas
- Browse
  - Have some idea, but vague, flexible, unclear
  - Consult reference sources, Web directories for direction, topic, theme
- Search in-depth
  - Have defined topic, narrow focus
  - Consult databases, search and metasearch engines

5 What Are Search Engines?
- Software
  - Captures web sites, pages
  - Indexes full text of web pages
  - Provides interface to search web pages
- Database
  - Large, billions of pages (unlike directories)
  - Computer built (robots, spiders)
  - No selectivity, no evaluation

6 How Search Engines Work
- Spiders comb, “capture” web pages
- Software builds database
- Words from web pages “indexed”
- Search interface finds words on pages
- Engine ranks, describes results
- How engines and directories differ

7 Spiders Comb, Capture Web Pages
- Software decides which web pages to collect
- Spiders check for updated pages
- Spiders remove dead sites

8 Spider Software Builds Database
- Current web size: over 15 billion pages
- No engine’s database covers it all
  - Google covers 22% (3.3 billion+)
  - AlltheWeb covers 21% (3.2 billion+)
  - HotBot (Inktomi) covers 20% (3 billion+)
  - Teoma covers 10% (1.5 billion)
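The coverage percentages follow from dividing each engine's database size by the estimated web size. A minimal sketch of that arithmetic, taking the AlltheWeb figure as 3.2 billion (as the Top Search Engines slide lists it); the function and variable names are illustrative only:

```python
# Figures from the slides, in billions of pages.
WEB_SIZE_BILLIONS = 15.0
indexed_billions = {
    "Google": 3.3,
    "AlltheWeb": 3.2,  # assumed to be billions, per the Top Search Engines slide
    "HotBot (Inktomi)": 3.0,
    "Teoma": 1.5,
}


def coverage_percent(indexed: float, total: float = WEB_SIZE_BILLIONS) -> int:
    """Share of the web an engine has indexed, rounded to a whole percent."""
    return round(100 * indexed / total)


for engine, size in indexed_billions.items():
    print(f"{engine}: {coverage_percent(size)}%")
```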

9 Words from Web Pages “Indexed”
- “Index” means creating lists of words for database and linking words to web pages
- Some index full text in document
- Some index part of text
  - First 100 words in document
  - Words in abstract, or title of document
- How indexing works affects search results
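The idea of linking words to web pages can be sketched as a small inverted index. This is an illustration under simple assumptions, not how any particular engine works: the page texts and URLs are made up, and `max_words` stands in for "index only the first N words".

```python
import re
from collections import defaultdict
from typing import Optional


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str], max_words: Optional[int] = None) -> dict[str, set[str]]:
    """Map each word to the set of page URLs that contain it.

    If max_words is given, only the first max_words words of each page
    are indexed (partial-text indexing).
    """
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        words = tokenize(text)
        if max_words is not None:
            words = words[:max_words]
        for word in words:
            index[word].add(url)
    return dict(index)


# Hypothetical pages for illustration.
pages = {
    "a.example/mafia": "Russian mafia activity reported in several cities",
    "b.example/history": "A history of Russian literature and the mafia novel",
}

full_index = build_index(pages)                   # full-text indexing
partial_index = build_index(pages, max_words=3)   # first three words only

print(sorted(full_index["mafia"]))
print(sorted(partial_index["mafia"]))
```

With full-text indexing both pages are found for "mafia"; with partial indexing only the page that mentions it early is found, which is one way indexing affects search results.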

10 Search Interface Finds Web Pages
- Provides keyword search box
- Offers simple or advanced searching
- Offers search options to affect results:
  - Most assume AND between words: Russian mafia
  - Most accept “quotes” to search a PHRASE: “Russian mafia”
  - Most allow FIELD searches: ti:Russian mafia
- AlltheWeb

11 Engine Ranks, Describes Results
- Software lists most “relevant” items first
  - Word popularity: word repetitions, location
  - Site popularity: visits to the web site
  - Link popularity: how often link cited
- Results described
  - Few words to a paragraph
  - Sometimes stars, other indicators of relevancy
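"Word popularity" can be illustrated with a toy scoring function that counts how often the query words repeat and gives extra weight to words in the title. The weight of 3 and the sample pages are assumptions made for this sketch; real engines combine many more signals, including site and link popularity.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query: str, title: str, body: str, title_weight: int = 3) -> int:
    """Count query-word repetitions; words in the title count title_weight times."""
    title_words = tokenize(title)
    body_words = tokenize(body)
    return sum(
        title_weight * title_words.count(term) + body_words.count(term)
        for term in tokenize(query)
    )


def rank(query: str, pages: list[dict]) -> list[dict]:
    """Return pages ordered from highest to lowest score."""
    return sorted(pages, key=lambda p: score(query, p["title"], p["body"]), reverse=True)


# Hypothetical pages for illustration.
pages = [
    {"url": "one.example", "title": "City news", "body": "The mafia trial continues"},
    {"url": "two.example", "title": "Russian mafia",
     "body": "Russian mafia groups and the mafia economy"},
]

for page in rank("Russian mafia", pages):
    print(page["url"], score("Russian mafia", page["title"], page["body"]))
```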

12 How Engines and Directories Differ
- Computers vs people
  - Engine spiders, not editors, select documents
- Quantity vs quality
  - Engines big: want all, accept anything
  - Directories small: want “best”, “important”
- Technology vs human judgment
  - Engine software ranks, no human evaluation

13 Top Search Engines
- Google: 3.3 billion+
- AlltheWeb: 3.2 billion+
- HotBot (Inktomi): 3 billion+
- Teoma: 1.5 billion+

14 Directories, Search Engines and Defaults
- If directories find little, they default to engines
  - Yahoo defaults to Google
  - Open Directory defaults to Google
  - Looksmart defaults to Inktomi (HotBot)
- Some search engines borrow directories
  - Google uses “Open Directory”
- Learn the source of information when using a directory search box or search engine’s directory

15 Metasearch Engines
Technologies that search several search engines at the same time

16 Pros
- Increase results when a search engine produces little
- Save time by searching several engines at once
- Show results of several engines on one page

17 Cons
- Retrieve too many hits
- Retrieve less relevant results
  - Do not individualize search syntax for each engine
    - Do not know whether to use and, AND, +, OR, or or; cannot interpret phrases, etc.
- Exclude certain large engines like Google
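Showing the results of several engines on one page comes down to merging ranked lists. A minimal sketch, assuming each engine has already returned an ordered list of URLs (the engine names and URLs here are made up, and no real engine is contacted):

```python
from itertools import zip_longest


def merge_results(results_by_engine: dict[str, list[str]]) -> list[tuple[str, list[str]]]:
    """Interleave ranked URL lists round-robin.

    Each URL appears once, paired with the engines that returned it.
    """
    found_by: dict[str, list[str]] = {}
    # Walk rank 1 of every engine, then rank 2, and so on.
    for rank_row in zip_longest(*results_by_engine.values()):
        for engine, url in zip(results_by_engine, rank_row):
            if url is None:  # this engine has no result at this rank
                continue
            found_by.setdefault(url, []).append(engine)
    return list(found_by.items())


# Hypothetical ranked results from two engines.
results = {
    "engine_a": ["x.example", "y.example"],
    "engine_b": ["y.example", "z.example", "w.example"],
}

merged = merge_results(results)
for url, engines in merged:
    print(url, engines)
```

Note that the sketch sends nothing to the engines; a real metasearch tool would also have to translate the query into each engine's own syntax, which is the weakness listed under Cons.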

18 Top Metasearch Engines
- Vivisimo
  - Categorizes results, narrows topics
- Ez2find
  - Includes most major engines
- Dogpile
  - Refines results, covers major engines

19 What’s In Search Engines?
- Business, commercial information
- Organizational publications
- Government resources
- Magazine, newspaper excerpts
- Some scholarly information
  - Teaching materials, unpublished articles
- Books, articles whose copyright expired

20 What’s Not in Search Engines
- Books under copyright
  - Most fiction, non-fiction in existence
- Journal, magazine, newspaper articles
  - Most current and past research
- Reference books
  - Most recent, quality publications
- In short
  - Bulk of human knowledge and research

21 How to Find Search Engines
- Word of mouth, hearsay
- Newspaper, magazine articles
- Library web pages
  - Guides to search engines
    - HyperResearch

22 Search Basics
- Identify, select keywords
  - Effects of internet use on children
    - Internet, children, effects
- Combine keywords to focus results
  - Use OR, AND
  - Use phrase searching
  - Limit search to field like title or URL

23 OR Broadens
- Retrieves an article if it contains either keyword
- Use to connect similar words
- Use to increase results

24 OR Expands Results
- Internet: 15
- Internet OR Web: 50
- Internet OR Web OR digital: 90
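OR behaves like a set union: a record is retrieved if it appears under any of the keywords, so each added term can only keep or grow the result count. A small sketch with a made-up index of record numbers (not the counts on the slide):

```python
# Hypothetical index: keyword -> set of record numbers containing it.
index = {
    "internet": {1, 2, 3},
    "web": {3, 4, 5},
    "digital": {5, 6},
}


def search_or(index: dict[str, set[int]], *keywords: str) -> set[int]:
    """Return records that contain at least one of the keywords."""
    result: set[int] = set()
    for keyword in keywords:
        result |= index.get(keyword.lower(), set())
    return result


print(len(search_or(index, "Internet")))
print(len(search_or(index, "Internet", "Web")))
print(len(search_or(index, "Internet", "Web", "digital")))
```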

25 AND Narrows
- Use to connect two different ideas
- AND between keywords means both terms must be in record
- Use to decrease results

26 AND Reduces Results
- Children: 2,956,000
- Children AND Internet: 1,756
- Children AND Internet AND Homework
  - Children internet homework: 26
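AND behaves like a set intersection: a record is retrieved only if it appears under every keyword, so each added term can only keep or shrink the result count. A small sketch with a made-up index of record numbers (not the counts on the slide):

```python
# Hypothetical index: keyword -> set of record numbers containing it.
index = {
    "children": {1, 2, 3, 4, 5, 6},
    "internet": {2, 3, 4, 7},
    "homework": {3, 8},
}


def search_and(index: dict[str, set[int]], *keywords: str) -> set[int]:
    """Return records that contain every one of the keywords."""
    postings = [index.get(keyword.lower(), set()) for keyword in keywords]
    if not postings:
        return set()
    return set.intersection(*postings)


print(len(search_and(index, "Children")))
print(len(search_and(index, "Children", "Internet")))
print(len(search_and(index, "Children", "Internet", "Homework")))
```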

27 AND, OR and Search Engine Syntax
- Use help or tips
  - AND, and, OR, or, “+”, “-”?
  - Does engine default to AND or OR?
  - Do AND or OR have to be upper case?
  - Use ADVANCED SEARCH to learn options
- Is there a pull-down menu box?
  - AND can mean “All the words”
  - OR can mean “Any of the words”

28 Phrase Searching
- Two words in consecutive order
  - Juvenile delinquency
  - Russian mafia
- How does computer recognize “phrase”?
  - Pull-down menu: EXACT PHRASE
  - Quotation marks: “drug abuse”
- Phrase searches reduce hits, improve relevancy
  - Russian mafia: 23,234
  - “Russian mafia”: 789
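A phrase match requires the words to appear next to each other and in order, which is why it returns fewer hits than a plain AND. A minimal sketch of that check on made-up sentences; the function names are illustrative only:

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(text: str, phrase: str) -> bool:
    """True if the words of phrase occur consecutively, in order, in text."""
    words = tokenize(text)
    target = tokenize(phrase)
    n = len(target)
    if n == 0:
        return False
    # Slide a window of n words across the text and compare.
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


print(contains_phrase("The Russian mafia expanded", "Russian mafia"))
print(contains_phrase("Russian police arrested mafia members", "Russian mafia"))
```

The second sentence contains both words, so an AND search would retrieve it, but the phrase search does not.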

29 Field Searching
- Common document “fields”
  - Author, Title, Subject, Abstract, Text, URL
- Limits search to words in particular fields
  - Learn syntax: title:, ti:, url:
    - ti: Russian mafia
    - url: russianmafia
  - Use ADVANCED SEARCH
  - Use pull-down menu (in the title, in the URL)
- Reduces hits, improves relevancy
  - Russian mafia (all the words) = 23,234
  - Russian mafia (in title) = 254
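Field searching restricts the match to one part of the record. A minimal sketch, assuming each document is stored with separate title, url, and text fields; the documents are made up and the matching is a simple case-insensitive substring test:

```python
# Hypothetical records with separate fields.
docs = [
    {
        "title": "Russian Mafia in the 1990s",
        "url": "http://a.example/russianmafia",
        "text": "The Russian mafia grew quickly",
    },
    {
        "title": "Organized crime overview",
        "url": "http://b.example/crime",
        "text": "The Russian mafia is one of several groups",
    },
]


def field_search(docs: list[dict], field: str, keywords: list[str]) -> list[dict]:
    """Return documents whose given field contains every keyword."""
    terms = [keyword.lower() for keyword in keywords]
    return [doc for doc in docs if all(term in doc[field].lower() for term in terms)]


print(len(field_search(docs, "text", ["Russian", "mafia"])))   # anywhere in the text
print(len(field_search(docs, "title", ["Russian", "mafia"])))  # title only
print(len(field_search(docs, "url", ["russianmafia"])))        # URL only
```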

30 Can You Find the Answers?
- Connect to Google
- Search for items on Iran. Records: 11,00,000
- Combine Iran with nuclear weapons. Records: 790,000
- Combine Iran with the phrase nuclear weapons. Records: 428,000
- Use Advanced Search: combine Iran with the phrase nuclear weapons so that all the words appear in the title of documents. Records: 274
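The four searches in this exercise can also be written as query strings. The sketch below builds the search URLs without sending them; it assumes Google's `allintitle:` operator as the typed equivalent of the Advanced Search "in the title" option, and the labels are illustrative only. Result counts will differ from those on the slide.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.google.com/search"


def search_url(query: str) -> str:
    """Build a search URL for the query (nothing is sent over the network)."""
    return f"{BASE_URL}?{urlencode({'q': query})}"


# The exercise steps expressed as query strings.
queries = {
    "Iran alone": "Iran",
    "all the words (implicit AND)": "Iran nuclear weapons",
    "with the phrase": 'Iran "nuclear weapons"',
    "all words in title (assumed operator)": 'allintitle:Iran "nuclear weapons"',
}

for label, query in queries.items():
    print(f"{label}: {search_url(query)}")
```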

31 Homework
- Use major search engines
  - AlltheWeb, Google, Teoma
- Use a metasearch engine: Vivisimo
- Practice using AND, OR, phrase, field searching

