How Search Engines Work General Search Strategies Dr. Dania Bilal IS 587 SIS Fall 2007.


1 How Search Engines Work: General Search Strategies
- Dr. Dania Bilal
- IS 587, SIS, Fall 2007

2 Fun Quiz
- Take the search engine quiz located at http://websearch.about.com/library/quizzes/search_engine_quiz/blsearchenginequiz.htm
- Record the number of incorrect answers.
- Share the results of the quiz with a classmate.

3 How Do Search Engines Work?
- They collect information from selected web sites.
- They employ special software robots, called spiders, to crawl web pages.
- Spiders build lists of the words found on web sites.
  - While a spider is building its lists, it is said to be Web crawling.
- Spiders store the lists in the engine’s database.
- The engine’s indexing software builds an index of words.
- Information is matched against query input and retrieved (processing algorithm).
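The steps on this slide (collect words, build an index, match a query) can be sketched in a few lines. This is a minimal illustration, not how any particular engine is implemented: the `PAGES` dictionary stands in for crawled pages, and the names `tokenize`, `build_index`, and `search` are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical crawled pages: URL -> page text (stand-ins for real web pages).
PAGES = {
    "http://example.org/a": "Spiders crawl web pages and build word lists",
    "http://example.org/b": "The index maps words to pages",
    "http://example.org/c": "Spiders store word lists in the database",
}

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Build an inverted index: word -> set of URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every query word (an implicit AND)."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

index = build_index(PAGES)
print(sorted(search(index, "spiders lists")))
```

Matching the query against the index, rather than rescanning every page, is what makes retrieval fast once the spiders have done their work.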

4 How Do Spiders and Crawlers Work?
- They begin with popular, heavily used web servers.
- Starting from a popular site, they collect the words on its pages and follow every link found within the site.
  - In this way, spiders travel across pages and the most widely used portions of the Web.
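Following every link from a starting page amounts to a breadth-first traversal. The sketch below assumes a tiny in-memory "web" (the `WEB` dictionary is invented for illustration) so that it runs without network access; a real spider would fetch each page over HTTP.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical mini-web: URL -> HTML. A real spider would fetch these pages.
WEB = {
    "/home": '<a href="/news">News</a> <a href="/about">About</a>',
    "/news": '<a href="/home">Home</a> <a href="/story">Story</a>',
    "/about": "<p>No links here</p>",
    "/story": '<a href="/news">Back</a>',
}

class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start):
    """Visit pages breadth-first, following each link once."""
    visited = []
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(WEB.get(url, ""))
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited

print(crawl("/home"))  # ['/home', '/news', '/about', '/story']
```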

5 How Do Spiders and Crawlers Work?
- A search engine company (e.g., Google) builds a dedicated server of URLs so that spiders can collect information quickly.
- More than one spider is used to crawl web pages at a time.
  - Google uses 3-4 spiders and collects over 100 pages per second.
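Running several spiders at once can be pictured as a small pool of workers sharing one list of URLs. In this rough sketch, Python threads stand in for spiders, and `fetch` is a hypothetical placeholder that performs no network access.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    """Stand-in for an HTTP fetch; returns fake page text for the URL."""
    return f"content of {url}"

# Hypothetical URL list, as a dedicated URL server might hand out.
urls = [f"http://example.org/page{i}" for i in range(8)]

# Four workers play the role of four spiders crawling at the same time.
with ThreadPoolExecutor(max_workers=4) as pool:
    pages = dict(zip(urls, pool.map(fetch, urls)))

print(len(pages))  # 8
```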

6 How Do Spiders and Crawlers Work?
- When no dedicated URL server is used, the search engine company relies on an ISP for the domain names (translated into addresses) to use for crawling the web, which leads to:
  - Delay in gathering information
  - Delay in updating information
  - Lack of control over URL addresses

7 Google Spider and How It Works
- A spider looks at the HTML, XML, or other coding used to build a web page and collects information from the meta tags.
- It indexes words within the actual text of a page.
- It records where the words were found (URL, title, headings, etc.).
- It disregards initial articles.
- It disregards pages that should not be crawled or indexed.
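As an illustration of a spider recording where each word was found, the sketch below parses a small page with Python's standard `html.parser`. The `PageIndexer` class is hypothetical, and it simplifies the slide's rule by skipping the articles "a", "an", and "the" wherever they occur.

```python
import re
from html.parser import HTMLParser

ARTICLES = {"a", "an", "the"}  # simplification: skipped wherever they appear

class PageIndexer(HTMLParser):
    """Collects (word, location) pairs from meta tags, title, headings, body."""
    def __init__(self):
        super().__init__()
        self.location = "body"
        self.entries = []

    def add_words(self, text, location):
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            if word not in ARTICLES:
                self.entries.append((word, location))

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") in ("keywords", "description"):
            self.add_words(attrs.get("content") or "", "meta")
        elif tag == "title":
            self.location = "title"
        elif tag in ("h1", "h2", "h3"):
            self.location = "heading"

    def handle_endtag(self, tag):
        if tag in ("title", "h1", "h2", "h3"):
            self.location = "body"

    def handle_data(self, data):
        self.add_words(data, self.location)

html = """<html><head><title>The Spider Guide</title>
<meta name="keywords" content="spider, crawler"></head>
<body><h1>A Crawler</h1><p>The spider reads pages.</p></body></html>"""

indexer = PageIndexer()
indexer.feed(html)
print(indexer.entries)
```

Keeping the location alongside each word is what later allows an engine to weight a title hit differently from a body hit.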

8 Google Spider and How It Works
- It uses the Robots Exclusion Protocol to disregard pages.
  - Implemented in the meta-tag section at the beginning of a Web page.
  - Tells a spider to leave the page alone: neither index the words on the page nor follow its links.
- Source: Franklin, C. How Internet Search Engines Work. http://computer.howstuffworks.com/search-engine.htm
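The page-level exclusion described here is commonly written as a robots meta tag, for example `<meta name="robots" content="noindex, nofollow">`. Below is a small sketch of how a well-behaved spider might read it; the names `RobotsMetaParser` and `robots_policy` are made up for illustration. Site-wide rules usually live in a separate `robots.txt` file, which this sketch does not cover.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Reads the robots meta tag and records what a spider may do."""
    def __init__(self):
        super().__init__()
        self.may_index = True
        self.may_follow = True

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            directives = {d.strip().lower() for d in content.split(",")}
            if "noindex" in directives or "none" in directives:
                self.may_index = False
            if "nofollow" in directives or "none" in directives:
                self.may_follow = False

def robots_policy(html):
    """Return (may_index, may_follow) for a page's HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.may_index, parser.may_follow

page = '<head><meta name="robots" content="noindex, nofollow"></head>'
print(robots_policy(page))  # (False, False)
```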

9 How Do Search Engines Store Indexed Words?
- The process varies among engines.
- Words are stored with the number of times they appear on a page (posting).
- A weight is assigned to each word. Words appearing near the top of a page may have more weight than those appearing in subheadings, links, meta tags, the title, etc.
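A posting that stores both a count and a weight can be sketched as follows. The weights in `LOCATION_WEIGHTS` are invented for illustration only; each engine uses its own formula.

```python
from collections import Counter

# Hypothetical weights per location; real engines keep their formulas private.
LOCATION_WEIGHTS = {"top": 3.0, "title": 2.0, "heading": 1.5, "body": 1.0}

def postings_for_page(entries):
    """entries: list of (word, location). Returns word -> (count, weight)."""
    counts = Counter(word for word, _ in entries)
    weights = Counter()
    for word, location in entries:
        weights[word] += LOCATION_WEIGHTS.get(location, 1.0)
    return {word: (counts[word], weights[word]) for word in counts}

entries = [("spider", "top"), ("spider", "body"), ("index", "body")]
print(postings_for_page(entries))
# {'spider': (2, 4.0), 'index': (1, 1.0)}
```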

10 How Do Search Engines Store Indexed Words?
- Information is encoded to save space.
- Information is indexed.
  - An index of words is built by the automatic indexer (indexing software).
  - A hash table is created with an assigned weight or value for each word indexed.
  - Hashing spreads popular entries (e.g., words beginning with the letter M) evenly among less popular ones (e.g., the letter X) for quick retrieval.
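To see why hashing helps, compare grouping words by first letter with grouping them by a hash value. The sketch below uses `zlib.crc32` as a stand-in hash function and an invented word list. With so few words the hash buckets will not be perfectly even, but they do not depend on how popular a letter is.

```python
import zlib
from collections import Counter

# Invented sample: many words start with "m", few with "x".
words = ["map", "mail", "music", "movie", "money", "market",
         "xml", "menu", "model", "month", "xray", "media"]

def first_letter_bucket(word):
    """Alphabetical grouping: the bucket is the word's first letter."""
    return word[0]

def hash_bucket(word, n_buckets=4):
    """Hashed grouping: the bucket is a checksum of the word, modulo n_buckets."""
    return zlib.crc32(word.encode("utf-8")) % n_buckets

by_letter = Counter(first_letter_bucket(w) for w in words)
by_hash = Counter(hash_bucket(w) for w in words)

print(by_letter)  # the "m" bucket holds 10 words, the "x" bucket only 2
print(by_hash)    # bucket sizes depend on the hash, not on the first letter
```

Python's built-in `dict` is itself a hash table, so a mapping such as `{"spider": 4.0}` already stores a weight per word in the way the slide describes.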

11 Using General Directories
- Yahoo and its family
- Browsing the directory
  - Directory database
  - Small; human-selected and human-indexed
- Searching using keywords
  - Search database
  - Larger, non-selective database
  - Spider and machine indexing

12 Yahoo
- Yahoo.com
  - Works like a search engine rather than a directory
  - Searches the web
  - Exercise: search under my name and see how Yahoo processes the query while you are typing it
- The directory is found under "More" or at http://search.yahoo.com/dir

13 Yahoo Search Engine
- Search
  - Web
  - Images
  - Videos
  - Local information
  - Shopping
  - More…

14 Yahoo Advanced Search
- Advanced Search feature
  - Shown on screen after you perform a search, or by going directly to http://search.yahoo.com/web/advanced?ei=UTF-8&p=dr+dania+bilal&fr=yfp-t-471
  - Lots of search features to explore
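The Advanced Search address on this slide carries the query in its query string. The sketch below rebuilds that string with Python's `urllib.parse` to show how the spaces in a query become `+` signs. Reading `p` as the search phrase and `ei` as the input encoding is an assumption drawn from the URL itself, not from Yahoo documentation.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

base = "http://search.yahoo.com/web/advanced"
# Parameter names and values copied from the slide's URL; meanings are assumed.
params = {"ei": "UTF-8", "p": "dr dania bilal", "fr": "yfp-t-471"}

url = f"{base}?{urlencode(params)}"
print(url)

# Decoding goes the other way: "+" turns back into a space.
decoded = parse_qs(urlsplit(url).query)
print(decoded["p"])  # ['dr dania bilal']
```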

15 Yahoo Advanced Search Features
- Boolean
- Phrase
- Currency
- Domain
- File format
- Country
- Language
- Other

16 Yahoo Advanced Search Features
- Exercise
  - Perform a search on a topic of your choice
  - Use Boolean equivalents
    - All the words = AND
    - The exact phrase = phrase; proximity search
    - Any of these words = OR
    - None of these words = NOT
  - Choose the part of the page to search
  - Choose a language other than English
  - Report results in class
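The Boolean equivalents in this exercise map naturally onto set operations. This is a toy sketch over three invented documents, not a description of how Yahoo evaluates queries; all function names are hypothetical.

```python
import re

# Hypothetical document collection: id -> text.
DOCS = {
    1: "search engines crawl the web",
    2: "web directories are human selected",
    3: "search the web directory",
}

def words(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def all_words(terms):
    """'All the words' = AND."""
    return {d for d, t in DOCS.items() if all(w in words(t) for w in terms)}

def any_words(terms):
    """'Any of these words' = OR."""
    return {d for d, t in DOCS.items() if any(w in words(t) for w in terms)}

def none_of(terms, candidates):
    """'None of these words' = NOT, applied to an existing result set."""
    return candidates - any_words(terms)

def exact_phrase(phrase):
    """'The exact phrase': the words must appear adjacent and in order."""
    target = words(phrase)
    if not target:
        return set()
    n = len(target)
    hits = set()
    for d, t in DOCS.items():
        ws = words(t)
        if any(ws[i:i + n] == target for i in range(len(ws) - n + 1)):
            hits.add(d)
    return hits

print(all_words(["search", "web"]))                          # {1, 3}
print(any_words(["crawl", "human"]))                         # {1, 2}
print(none_of(["engines"], all_words(["search", "web"])))    # {3}
print(exact_phrase("web directory"))                         # {3}
```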

17 Yahoo Search Services
- For searching a specific content area, such as:
  - Web Search: Find anything from across the Web
  - Answers: Ask questions and get answers from real people
  - Audio Search: Find over 50 million audio files from across the Web
  - Creative Commons Search: Find Creative Commons content that you can share or re-use in your own works
  - Directory Search: Search or browse Yahoo!'s categorized guide to the Web
  - Image Search: Find over 1.6 billion photos and illustrations from all over the Web
  - Job Search: Search for jobs, post your resume and more on Yahoo! HotJobs
  - Local: Find everything in your area from dry cleaners to day spas
  - Maps: Find maps and driving directions for anywhere you want to go
  - Mobile Search: Find whatever, wherever you are
  - My Web (Beta): The newest way to save, share and organize any page you want on the Web
  - News Search: Search for news stories and related photos, videos and audio clips

18 Yahoo Next (http://next.yahoo.com/)
- Cutting-edge technology at Yahoo
- Blogs, Web 2.0, use of alltheweb, Yahoo Maps, podcasts, audio, and all other features that are in beta testing

19 Yahoo Preferences
- Customize Yahoo to fit your needs
- Go to Preferences from the Web search page
- Edit preferences based on your needs
- Edited preferences are saved in the browser on your desktop

20 General Search Strategies in Search Engines

21 Strategies
- Boolean
- Boolean equivalents
- Proximity and phrase searching
- Searching within a field
- Search limits

22 Yahoo Search Strategies
- Explore Yahoo’s help page
- Read the Search Tips
- Read about the search limit parameters, such as:
  - intitle:
  - url:
  - inurl:
- Read how to use Boolean equivalents and other search parameters
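Field limits such as `intitle:` and `inurl:` restrict a term to one part of a page. The sketch below shows the idea on two invented pages. It uses plain substring matching and hypothetical function names, and makes no claim about how Yahoo parses these operators.

```python
# Hypothetical pages with separate URL, title, and text fields.
PAGES = [
    {"url": "http://example.org/search-tips", "title": "Search Tips",
     "text": "how to search the web"},
    {"url": "http://example.org/engines", "title": "Engines",
     "text": "tips for choosing search engines"},
]

def parse_query(query):
    """Split a query into (field, term) pairs; field is None for plain terms."""
    parsed = []
    for token in query.lower().split():
        field, sep, term = token.partition(":")
        if sep and field in ("intitle", "inurl") and term:
            parsed.append((field, term))
        else:
            parsed.append((None, token))
    return parsed

def matches(page, field, term):
    """Check a term against the field it is limited to (page text by default)."""
    if field == "intitle":
        return term in page["title"].lower()
    if field == "inurl":
        return term in page["url"].lower()
    return term in page["text"].lower()

def search(query):
    parsed = parse_query(query)
    return [p["url"] for p in PAGES if all(matches(p, f, t) for f, t in parsed)]

print(search("intitle:tips search"))  # ['http://example.org/search-tips']
print(search("inurl:engines"))        # ['http://example.org/engines']
```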

23 General Search Engines Besides Yahoo Search

24 Engines and Information Need
- Several general search engines are available on the Web
- Select the engine(s) that best fit your need
- Visit the Web Search Guide for the latest information: http://websearch.about.com/od/generalsearchengines/General_AllPurpose_Search_Engines.htm

25 Hands-on Activity
- Browse the list of general search engines in the Web Search Guide
- Explore 4 of the engines listed
  - Wisenut, Snap.com, Lycos, Exalead
  - Search under my name in each engine
  - Compare the results by viewing the first two pages retrieved
  - How many overlaps were found among the four engines?
  - How many unique results were found in each engine?
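Once the result URLs from each engine are written down, overlap and uniqueness are simple set calculations. The result sets below are placeholders (`u1`, `u2`, ...) rather than real search results.

```python
# Placeholder results: engine -> set of URLs seen on its first two pages.
results = {
    "Wisenut": {"u1", "u2", "u3"},
    "Snap.com": {"u2", "u3", "u4"},
    "Lycos": {"u3", "u5"},
    "Exalead": {"u1", "u3", "u6"},
}

def overlap(result_sets):
    """URLs returned by every engine."""
    return set.intersection(*result_sets.values())

def unique_per_engine(result_sets):
    """For each engine, the URLs that no other engine returned."""
    unique = {}
    for engine, urls in result_sets.items():
        others = set()
        for other_engine, other_urls in result_sets.items():
            if other_engine != engine:
                others |= other_urls
        unique[engine] = urls - others
    return unique

print(overlap(results))            # {'u3'}
print(unique_per_engine(results))
```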

26 Specialized Search Engines
- The Web Search Guide has a listing of specialized search engines
- Chapter 3 of the Web companion to the textbook describes a variety of specialized engines
- Explore chapter 3 to familiarize yourself with the engines described

27 Hands-on Activity
- Find the answer or relevant information for these two queries using an appropriate specialized search engine:
  - Do squirrels hibernate?
  - Find me a list of foreign-owned companies based in the U.S., organized by state.

