5 Search Engines?
- A search engine is a website that uses software to browse the Internet.
- A search engine will retrieve a listing of World Wide Web sites related to the key words you specify.
6 How Search Engines Work
- Read pages they find on the web (spider)
- Store text in an “index”
- When you search, they look for pages with matching text
- Other factors are involved in “ranking” those pages, such as “link popularity”
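The read, store and match steps above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular search engine is built: the page texts, the example.com URLs and the function names (`tokenize`, `build_index`, `search`) are all made up for the example, and ranking is left out.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Lower-case the text and split it into simple alphanumeric words.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    # Map every word to the set of pages (URLs) that contain it.
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    # Return the pages that contain every word of the query.
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical pages a spider might have read.
pages = {
    "http://example.com/a": "Battle of Gettysburg history",
    "http://example.com/b": "Civil War overview and history",
    "http://example.com/c": "Welfare to Work programmes",
}
index = build_index(pages)
print(search(index, "gettysburg history"))
```

The search never touches the pages themselves, only the stored index, which is why results can lag behind the live web.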
7 Search Engine Indexing
- Computer-driven search tool
- Website owners submit the web address of their homepage for inclusion in the database
- Robots periodically spider the Web, detect the homepage and proceed to scan every page in the entire website
- (The first words on the homepage appear as the ‘result’ the user sees in the search engine)
8 Search Engines
- Crawler-based search engines
  - “Spiders” or “crawlers” visit websites and some of their pages periodically and add them to the index
  - They scan links and add them to the index
  - They return the information to the index or catalog
  - Search engine software sifts the index and ranks results in order of relevance
- Human-based search engines
- Mixed
- Focused crawlers: LawCrawler (http://lawcrawler.findlaw.com), part of the FindLaw service, which also has a targeted directory
- sponsored by the APA
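How a crawler follows links can be illustrated with a toy example. The sketch below assumes a tiny in-memory "website" (the `SITE` dictionary and its paths are invented) instead of real network requests, and uses Python's standard `html.parser` to pull out the links on each page.

```python
from collections import deque
from html.parser import HTMLParser

class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(site, start):
    # Breadth-first crawl: visit a page, scan its links, queue unseen ones.
    seen = {start}
    queue = deque([start])
    visited = []
    while queue:
        url = queue.popleft()
        html = site.get(url)
        if html is None:
            continue  # link points to a page that does not exist
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited

# Hypothetical site: "/orphan" exists but no page links to it.
SITE = {
    "/": '<a href="/about">About</a> <a href="/news">News</a>',
    "/about": '<a href="/">Home</a>',
    "/news": '<a href="/news/1">Story</a>',
    "/news/1": "<p>No links here</p>",
    "/orphan": "<p>Nothing links to this page</p>",
}
print(crawl(SITE, "/"))
```

The page "/orphan" is never visited because the crawler can only reach pages through links.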
9 Directories Vs Search Engines
When should you use a directory?
- When you have a broad topic
- When you want experts to recommend sites
- When you want to avoid irrelevant sites
Example topics:
- Disabilities
- Civil War
- Welfare
10 Directories Vs Search Engines
When should you use a search engine?
- When you have a narrow topic
- When you are looking for a specific website
- When you want to search for a file type or language
Examples:
- Americans with Disabilities Act
- Battle of Gettysburg
- Welfare to Work
11 Start Your Search Engines Here
- Google
- AllTheWeb
- Yahoo
- MSN
Why? See:
14 Google
- Google is the undisputed leader in search engines, with the largest database and highly relevant results
- Uses an algorithm based on site popularity
- The more inbound links a site receives from other sites that Google considers worthwhile, the higher its page rank in the results
- Keeps advertising to a minimum: no-frills design, nice clean look and no pop-up ads
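The idea that links from worthwhile sites count for more can be sketched with a simplified PageRank-style iteration. This is only an illustration of link popularity, not Google's actual algorithm: the four-page link graph, the damping value of 0.85 and the function name `link_popularity` are assumptions made for the example.

```python
def link_popularity(links, damping=0.85, iterations=50):
    # links maps each page to the list of pages it links to.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split evenly over its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank

# Hypothetical link graph: a, b and d all link to c; c links back to a.
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = link_popularity(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

Page c scores highest because it has the most inbound links, and page a outranks b and d because its single inbound link comes from the highly ranked c.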
15 AllTheWeb & AltaVista
- AllTheWeb grew out of the Norwegian search engine FAST and for a while was one of the Web’s best-kept secrets
- AltaVista launched in 1995 and was THE search engine before Google existed
- Recently, Overture acquired FAST and AltaVista
- This year, Yahoo! acquired Overture and Inktomi, making Yahoo! the largest network of major search tools on the Internet
- The future of AllTheWeb and AltaVista is now uncertain, as many results are simply retrieved from Yahoo!
16 Open Directory & Ask Jeeves
- Open Directory Project is the largest human-compiled search directory on the Web
- Each website is considered for inclusion by a human (many don’t make it), so quality is assured
- Ask Jeeves uses special natural language technology, so the user can ask a complete question instead of inputting only a few words
- It then searches its own database and supplements this with results from Teoma
- Ask Jeeves is popular with young Web users
17 Understand Limitations of Search Engines
- Search “spiders” or “crawlers” do *not* crawl in real time
- Lag times getting info to the index vary by search engine
- If a website is not submitted to the search engine it won’t be crawled
- Not every page from a website is crawled
- A webmaster can choose to not have a page crawled
- Formats like PDF, Flash, Zip files, executable programs, and others cannot be searched
- The “Invisible Web”
- If a webpage has no links pointing to it from another page the crawler can’t find it.
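One common way a webmaster keeps pages from being crawled is a robots.txt file. The sketch below uses Python's standard `urllib.robotparser` to check URLs against a made-up robots.txt; the rules, the example.com URLs and the agent name `ExampleSpider` are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: all spiders are asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def may_crawl(url, agent="ExampleSpider"):
    # True if the robots.txt rules allow this agent to fetch the URL.
    return parser.can_fetch(agent, url)

print(may_crawl("http://example.com/public/index.html"))
print(may_crawl("http://example.com/private/secret.html"))
```

A well-behaved spider runs a check like this before fetching a page, so disallowed pages never reach the index.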
18 Evaluating Web Sites Continued…
- Can you find this news reported on a legitimate news website?
- Who is the sponsor of the website?
- Are there inconsistencies or inaccuracies in the information?
- If an organization is mentioned by name, does the organization have any related information on this website?
19 Meta Search Engine
- Searches more than one search engine simultaneously (often up to fifteen)
- Each meta search engine normally searches a different combination of search engines
- Simultaneous multiple engine searching saves the user lots of time
- But meta search engines only skim the surface of each engine’s database and sometimes lack depth when searching for results
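The way a meta search engine combines results can be sketched as follows. This is an illustrative merging scheme, not the method of any named meta search engine: the engine names, result lists and the reciprocal-position scoring are assumptions. The `top_n` cut-off mirrors the point that only the top of each engine's results is used.

```python
from collections import defaultdict

def merge_results(results_by_engine, top_n=5):
    # Take only the first top_n results from each engine, then score
    # each URL by summing 1/position over the engines that returned it.
    scores = defaultdict(float)
    for urls in results_by_engine.values():
        for position, url in enumerate(urls[:top_n], start=1):
            scores[url] += 1.0 / position
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda url: (-scores[url], url))

# Hypothetical ranked results from three underlying engines.
results_by_engine = {
    "engine_a": ["x.example", "y.example", "z.example"],
    "engine_b": ["y.example", "x.example"],
    "engine_c": ["y.example", "w.example"],
}
merged = merge_results(results_by_engine)
print(merged)
```

With `top_n=1` only the first hit of each engine survives, which shows how a shallow cut-off can hide pages that a single engine would have listed further down.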
20 Top Meta Search Engines
- Kartoo
- Turbo10
- Dogpile
- Mamma
- Red Hot Chilli
- Meta Eureka
- Web Taxi
- Vivisimo
- ixquick
- iBoogie
- Metacrawler
- Supercrawler
- Search.com
- Query Server
21 Kartoo & Turbo10
- These search tools cluster sets of results on similar topics and display them on the side frame
- Kartoo is arguably the funkiest search facility on the Web, displaying results as a visual mind map
- There’s a basic and expert version for searching
- Turbo10 is unique because it has a long list of specialist databases on specific subjects
- Searches the Deep Net (others rarely go there)
- Users can also tailor their searching by selecting unusual databases of their own choosing
22 Vivisimo, Dogpile & Mamma
- Vivisimo also uses clustering technology and allows users to choose their own search engines
- Dogpile was one of the earliest meta search engines and remains very popular today
- Its major advantage lies in its search engines: Google, Yahoo!, Ask Jeeves, Teoma, About etc.
- Canadian-based Mamma began in 1996 as a Master’s thesis and was arguably the first meta search engine
- Today it is a well-respected search tool and, like Dogpile, searches the Web’s top engines such as Google, Open Directory, Teoma and others
23 Image (+ Meta) Searching
- Although pictures on websites often appear neatly embedded amongst text, each image has its own URL, which makes picture searching possible
- Google’s image search is one of the best on the Web, partly because of the size of its database
- There are also excellent picture meta search engines: iBoogie, Dogpile, ixquick and 1Banana
- Picsearch is solely a picture search engine and markets itself as family and user friendly
24 Language Translation
- Google and AltaVista offer language translation
- Google will allow you to translate a foreign language website or page and even allow you to link to the translated page from another website
- AltaVista uses Babel Fish for its translation and you can also translate blocks of text
- Some of the best websites on Bertolt Brecht and his Epic Theatre are actually in German, so this is an example of where translation tools are worthwhile if you do not read the language
25 Useful Reference Tools
- You can find free dictionaries online, such as Merriam-Webster, Oxford, Macquarie, Cambridge and Dictionary.com
- Most dictionaries also have a thesaurus tab
- The meta dictionary OneLook simultaneously searches nearly 1,000 generalist and specialist dictionaries!
- Some of the weirdest words out there are at the Strange and Unusual Dictionaries website
- Or visit RhymeZone’s Rhyming Dictionary & Thesaurus for a bit of fun!
26 More Reference Tools
- If looking for the origin of a phrase or saying, try Brewer’s Dictionary of Phrase and Fable
- There’s also the ClichéSite or the Hutchinson Dictionary of Difficult Words
- Free encyclopaedias include Encyclopedia.com, Columbia, Encarta (partly free), Wikipedia, Hutchinson & the 1911 Encyclopaedia Britannica, considered by many to be the best edition ever!
- The Wayback Machine has been archiving large portions of the Web since 1996, so if a website has suddenly disappeared, search for it here!
27 The Invisible Web
- Web information that does not get indexed by the major search engines.
- Mostly hidden in databases or excluded by a robots.txt file
- Data created on the fly from the backend (cgi-bin, etc.)
- More than ¾ of information on the Web is part of the Invisible Web (IW).
28 The Invisible Web – 4 Types
- Opaque: search engines choose not to index
- The Private Web: password protected
- The Proprietary Web: registration required (either fee or free)
- The Truly Invisible Web: can’t search certain file formats and databases
OPAQUE:
- Depth of crawl is limited, sometimes for cost reasons
- Frequency of crawl is limited, so pages that change every day are affected
- Maximum number of results
PRIVATE: password protected; a robots.txt file disallows spider access; a “noindex” meta tag prevents indexing
PROPRIETARY: either free or fee registration is required. Examples are The New York Times, The Well, The Wall Street Journal Interactive Edition.
TRULY INVISIBLE: file formats like PDF, Flash, Shockwave, etc.; dynamically generated web pages
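The “noindex” meta tag mentioned under PRIVATE can be detected with a short check. The sketch below uses Python's standard `html.parser`; the class name `RobotsMetaParser`, the function `is_indexable` and the sample pages are invented for the example, and real search engines may honour further directives.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the directives of any <meta name="robots"> tag."""
    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            for part in content.split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())

def is_indexable(html):
    # A page asks to stay out of the index if it carries "noindex".
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" not in parser.directives

# Hypothetical pages.
hidden = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
visible = '<html><head><title>Welcome</title></head></html>'
print(is_indexable(hidden), is_indexable(visible))
```

Unlike a robots.txt rule, this tag sits inside the page itself, so a spider has to fetch the page before it learns that the page should not be indexed.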
29 Examples of the IW
- Online telephone and address databases
- News engines
- Professional look-up services (AMA)
- Movie and Book Reviews
- Education databases (ERIC)
- Medical databases (Medline)