INTRODUCTION: Search Engines - What are they? How do they work? How about some tips & strategies? CHOICES: What's the difference between searches, metasearches and other online databases? RESOURCES: A categorized list of top websites for Internet exploration Written by John Griffiths for the Pelham Schools Pelham, NY 10803

2 The Web is potentially a terrific place to get information on almost any topic. Doing research without leaving your desk sounds like a great idea, but all too often you end up wasting precious time chasing down useless URLs. Almost everyone agrees that there's got to be a better way. But for now we're stuck with making the best use of the search tools that already exist on the Web.

3 The main devices we use for finding topics on the Internet are “search engines”. Search engines use software robots to survey the Web and build their databases. Web documents are retrieved and indexed. The primary method of searching is by entering a “keyword”. Most search engines now index every word on every page. If a search engine finds your keyword on a webpage, it puts that webpage on your results list.
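The indexing idea on this slide can be sketched in a few lines of Python. This is only an illustration, not how any particular search engine is built: the three pages and their addresses are made up, and the function names `build_index` and `search` are our own.

```python
import re
from collections import defaultdict

# Hypothetical pages standing in for documents a software robot has retrieved.
PAGES = {
    "example.org/cider": "How to make hard cider at home",
    "example.org/drives": "Replacing the hard drive in your computer",
    "example.org/cats": "Caring for cats and kittens",
}

def build_index(pages):
    """Index every word on every page: map each word to the pages containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

def search(index, keyword):
    """Return the results list for one keyword, sorted by address."""
    return sorted(index.get(keyword.lower(), set()))

index = build_index(PAGES)
print(search(index, "hard"))
# ['example.org/cider', 'example.org/drives']
```

Notice that the keyword "hard" brings back both the cider page and the hard drive page. The index only knows which words appear where, not what they mean.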

4 The Problem With Keyword Searching Keyword searches have a tough time distinguishing between words that are spelled the same way but mean something different (e.g., hard cider, a hard stone, a hard exam, and the hard drive on your computer). This often results in hits that are completely irrelevant to your query. Some search engines also have trouble with so-called stemming -- i.e., if you enter the word "big," should they return a hit on the word "bigger"? What about singular and plural words? What about verb forms that differ from the word you entered by only an "s" or an "ed"? Search engines also cannot return hits on words that mean the same as your keywords but were not actually entered in your query. A query on heart disease would not return a document that used the word "cardiac" instead of "heart."
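To make the stemming and synonym problems concrete, here is a small Python sketch. It assumes a deliberately crude suffix-stripping rule and a tiny hand-made synonym table; both are invented for illustration, and real search engines that handle these cases use far more careful methods.

```python
# Hypothetical synonym table; a real one would be far larger.
SYNONYMS = {"heart": {"cardiac"}, "cardiac": {"heart"}}

def crude_stem(word):
    """Strip a few common endings. Deliberately naive, for illustration only."""
    word = word.lower()
    for suffix in ("ed", "er", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # Undo consonant doubling such as "bigg" -> "big".
    if len(word) >= 3 and word[-1] == word[-2] and word[-1] not in "aeiou":
        word = word[:-1]
    return word

def exact_match(keyword, document):
    """Plain keyword search: the word must appear exactly as typed."""
    return keyword.lower() in document.lower().split()

def loose_match(keyword, document):
    """Match on the stemmed keyword or any of its listed synonyms."""
    stem = crude_stem(keyword)
    wanted = {stem} | SYNONYMS.get(stem, set())
    return any(crude_stem(w) in wanted for w in document.lower().split())

print(exact_match("big", "a bigger house"))       # False
print(loose_match("big", "a bigger house"))       # True
print(exact_match("heart", "cardiac care unit"))  # False
print(loose_match("heart", "cardiac care unit"))  # True
```

A rule this simple makes mistakes of its own (it would shorten "enter" to "ent", for instance), which is part of why stemming is hard to get right.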

5 Refining Your Search: Advanced Searches Most sites offer two different types of searches -- "basic" and "advanced." In a "basic" search, you just enter a keyword without sifting through any pulldown menus of additional options. Advanced search refining options differ from one search engine to another, but some of the possibilities include the ability to search on more than one word, to give more weight to one search term than you give to another, and to exclude words that might be likely to muddy the results. You might also be able to search on proper names, on phrases, and on words that are found within a certain proximity to other search terms.

6 Refining Your Search: Boolean Logic Boolean logic refers to the logical relationship among search terms, and is named for the British mathematician George Boole. Boolean logic consists of three logical operators: OR, AND, and NOT. Each operator can be visually described by using Venn diagrams, as shown on the following pages. NOTE: Not all search engines permit Boolean searches.

7 college OR university
Query: I would like information about college.
In this search, we will retrieve records in which AT LEAST ONE of the search terms is present. We are searching on the terms college and also university since documents containing either of these words might be relevant. This is illustrated by:
• the shaded circle with the word college, representing all the records that contain the word "college"
• the shaded circle with the word university, representing all the records that contain the word "university"
• the shaded overlap area, representing all the records that contain both "college" and "university"
OR logic is most commonly used to search for synonymous terms or concepts. Here is an example of how OR logic works:
Search terms             Results
college                  17,320,770
university               33,685,205
college OR university    33,702,660

8 The more terms or concepts we combine in a search with OR logic, the more records we will retrieve. For example:
Search terms                       Results
college                            17,320,770
university                         33,685,205
college OR university              33,702,660
college OR university OR campus    33,703,082
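OR logic behaves like the union of two sets, which Python can show directly. The record numbers below are invented for illustration; only the last calculation uses the counts quoted above, and it assumes those counts are exact.

```python
# Hypothetical record IDs for each search term.
college = {1, 2, 3, 4}
university = {3, 4, 5, 6, 7}
campus = {7, 8}

# OR keeps every record that contains at least one of the terms.
either = college | university
print(len(college), len(university), len(either))  # 4 5 7

# Adding another OR term can only keep the results the same size or grow them.
wider = either | campus
print(len(wider))  # 8

# Records in the overlap are counted once, not twice:
# |A OR B| = |A| + |B| - |A AND B|
assert len(either) == len(college) + len(university) - len(college & university)

# Applying the same arithmetic to the counts quoted above (if they are exact)
# gives the number of records that contain both words.
both_estimate = 17_320_770 + 33_685_205 - 33_702_660
print(both_estimate)  # 17303315
```

This is why "college OR university" returns far fewer records than the two individual counts added together: most records containing "college" also contain "university".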

9 poverty AND crime
Query: I'm interested in the relationship between poverty and crime.
• In this search, we retrieve records in which BOTH of the search terms are present
• This is illustrated by the shaded area overlapping the two circles, representing all the records that contain both the word "poverty" and the word "crime"
• Notice how we do not retrieve any records with only "poverty" or only "crime"
Here is an example of how AND logic works:
Search terms         Results
poverty              783,447
crime                2,962,165
poverty AND crime    1,677
NOTE: On some search engines you must type in the symbol “+” rather than “AND”.
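AND logic corresponds to the intersection of two sets. A minimal Python sketch, using made-up record numbers rather than the real counts above:

```python
# Hypothetical record IDs for each search term.
poverty = {10, 11, 12, 13, 14}
crime = {13, 14, 15, 16}
gender = {14, 20}

# AND keeps only the records that contain both terms.
both = poverty & crime
print(sorted(both))  # [13, 14]

# Each extra AND term can only keep the results the same size or shrink them.
all_three = poverty & crime & gender
print(sorted(all_three))  # [14]
```

Records 10, 11 and 12 mention only poverty, and records 15 and 16 mention only crime, so none of them appear in the results.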

10 The more terms or concepts we combine in a search with AND logic, the fewer records we will retrieve. For example:
Search terms                    Results
poverty                         783,447
crime                           2,962,165
poverty AND crime               1,677
poverty AND crime AND gender    76
A few Internet search engines make use of the proximity operator NEAR. A proximity operator determines the closeness of terms within a source document. NEAR is a restrictive AND. The closeness of the search terms is determined by the particular search engine. For example, NEAR in AltaVista (Power Search) means within 10 words. As another example, Google uses proximity searching by default.
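A proximity operator can be sketched as a check on word positions. The function below is our own illustration, not any search engine's actual rule: it assumes "NEAR" means the two terms appear at most `max_distance` words apart, and it borrows the 10-word figure mentioned above as its default.

```python
def near(text, term_a, term_b, max_distance=10):
    """True if term_a and term_b occur within max_distance words of each other."""
    words = [w.strip('.,;:!?"()').lower() for w in text.split()]
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(a - b) <= max_distance
               for a in positions_a for b in positions_b)

close = "Poverty is often linked to crime in cities."
far = "poverty " + "filler " * 20 + "crime"

print(near(close, "poverty", "crime"))  # True
print(near(far, "poverty", "crime"))    # False
```

Both sample texts would satisfy "poverty AND crime", but only the first satisfies NEAR, which is why NEAR is described as a restrictive AND.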

11 cats NOT dogs
Query: I want to see information about cats, but I want to avoid seeing anything about dogs.
• In this search, we retrieve records in which ONLY ONE of the terms is present
• This is illustrated by the shaded area with the word cats, representing all the records containing the word "cats"
• No records are retrieved in which the word "dogs" appears, even if the word "cats" appears there too
Here is an example of how NOT logic works:
Search terms     Results
cats             3,651,252
dogs             4,556,515
cats NOT dogs    81,497
NOT logic excludes records from your search results. Be careful when you use NOT: the term you do want may be present in an important way in documents that also contain the word you wish to avoid.
NOTE: On some search engines you must type in the symbol “-” rather than “NOT”.
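NOT logic corresponds to set difference, and a short Python sketch also shows why the warning above matters. The three documents are invented for illustration, and `containing` is our own helper.

```python
# Hypothetical documents, keyed by a short ID.
DOCS = {
    "a": "cats make quiet pets",
    "b": "a complete guide to cats, with one note on dogs",
    "c": "training dogs",
}

def containing(term):
    """Return the IDs of documents that contain the term as a whole word."""
    return {doc_id for doc_id, text in DOCS.items()
            if term in text.replace(",", " ").split()}

cats = containing("cats")
dogs = containing("dogs")

# NOT removes every record that mentions the unwanted term.
print(sorted(cats - dogs))  # ['a']
```

Document "b" is mostly about cats, yet it is dropped because it mentions dogs once.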

12 Web Search Strategies or

13 How do Search Engines Work?
Search engines for the general web do not really search the World Wide Web directly. Each one searches a database of the full text of web pages selected from the billions of web pages out there residing on servers. When you search the web using a search engine, you are always searching a somewhat stale copy of the real web page. When you click on links provided in a search engine's search results, you retrieve from the server the current version of the page.
Search engine databases are selected and built by computer robot programs called spiders. Although it is said they "crawl" the web in their hunt for pages to include, in truth they stay in one place. They find the pages for potential inclusion by following the links in the pages they already have in their database (i.e., already "know about"). They cannot think or type a URL or use judgment to "decide" to go look something up and see what's on the web about it. (Computers are getting more sophisticated all the time, but they are still brainless.)
If a web page is never linked to in any other page, search engine spiders cannot find it. The only way a brand new page - one that no other page has ever linked to - can get into a search engine is for its URL to be sent by some human to the search engine companies as a request that the new page be included. All search engine companies offer ways to do this.
After spiders find pages, they pass them on to another computer program for "indexing." This program identifies the text, links, and other content in the page and stores it in the search engine database's files so that the database can be searched by keyword and whatever more advanced approaches are offered, and the page will be found if your search matches its content.
Some types of pages and links are excluded from most search engines by policy. Others are excluded because search engine spiders cannot access them. Pages that are excluded are referred to as the “Invisible Web” -- what you don't see in search engine results. The Invisible Web is estimated to be two to three or more times bigger than the visible web.
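The way spiders discover pages by following links can be sketched with a tiny made-up web. The page names and the `LINKS` table are hypothetical, and a real spider would fetch pages over the network rather than read a dictionary.

```python
from collections import deque

# Hypothetical link graph: each page lists the pages it links to.
LINKS = {
    "home": ["about", "news"],
    "about": ["home"],
    "news": ["story"],
    "story": [],
    "orphan": ["home"],  # links out, but no other page links to it
}

def crawl(seeds, links):
    """Follow links outward from the pages the spider already knows about."""
    found = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in found:
                found.add(target)
                queue.append(target)
    return found

print(sorted(crawl(["home"], LINKS)))
# ['about', 'home', 'news', 'story']

# The unlinked page only gets in once someone submits its address.
print(sorted(crawl(["home", "orphan"], LINKS)))
# ['about', 'home', 'news', 'orphan', 'story']
```

The page called "orphan" never turns up in the first crawl because no known page links to it.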

14 This and the preceding page were reprinted with permission from:
What Are "Meta-Search" Engines?
In a meta-search engine, you submit keywords in its search box, and it transmits your search simultaneously to several individual search engines and their databases of web pages. Within a few seconds, you get back results from all the search engines queried. Meta-search engines do not own a database of Web pages; they send your search terms to the databases maintained for other search engines.
What's WRONG with relying on Meta-Searchers?
The idea of meta-searching is much better than the reality. You would think you would save a lot of time by searching in only one place and sparing yourself the need to learn and use several separate search engines. In fact, that is what people claim. But, in truth, meta-searchers offer a quick and dirty approach to searching that sometimes works. Take a look at these drawbacks:
• None of them searches Google (unless they pay) or Northern Light (ever). Google is the BEST search engine database and Northern Light is very important in academic research.
• Most of them dumbly pass your search terms on, without any concern for what happens to your carefully placed " " or AND, OR or AND NOT, let alone your NEAR or your + or -. (Ixquick and ProFusion handle complex searches intelligently.)
• If your search does not get what you want, you do not have the ability to refine your search as you can in what I consider the most powerful search engines around (Google, AltaVista, Northern Light & HotBot). All you can do is add a term and wonder where the meta-search engine is sending it.
• None of the meta-search engines consistently queries all of the search engines it claims to query, and you don't know for sure what it is querying until you read the results. If you use ProFusion’s advanced search, you have the best control available.
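The basic meta-search idea (one query sent to several engines at once, with the results collected afterwards) can be sketched as follows. The two "engines" here are stand-in functions invented for illustration; a real meta-searcher would contact actual search services over the network.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical engines: each takes a query and returns a ranked list of addresses.
def engine_one(query):
    return [f"one.example/{query}/1", f"shared.example/{query}"]

def engine_two(query):
    return [f"shared.example/{query}", f"two.example/{query}/1"]

ENGINES = {"EngineOne": engine_one, "EngineTwo": engine_two}

def metasearch(query, engines=ENGINES):
    """Send the same query to every engine at once and gather the results."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, query) for name, fn in engines.items()}
        return {name: future.result() for name, future in futures.items()}

results = metasearch("maps")
for name, hits in results.items():
    print(name, hits)
```

The sketch owns no database of its own, and it passes the query along word for word. An engine that expects "+" instead of "AND" would simply receive whatever you typed, which is the second drawback listed above.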

15 What are Online Databases? A database is any collection of data organized for storage in a computer’s memory. It is designed for easy access by authorized users. The data may be in the form of text, numbers, or encoded graphics. Online databases allow users to search for specific information within a web site or to access an archived database of information. Online databases are usually content specific. They can include encyclopedias and other library reference information, journal articles, collections of art, clip art or literature, etc. Searching an online database sometimes makes a lot of sense! If you wanted to find a specific map, for instance, you would come across one faster by searching an online atlas database than by using any search engine. There are several links included in the RESOURCE chapter to clip art and library reference material. If you would like to see what other types of online databases exist, I recommend this University of Richmond HotList.
University of Richmond Informational Services

16
• Search Engines
• MetaSearch
• Kid’s Engines
• Online Reference
• Clip Art Sites
• Specialty

17 Google is a very powerful search engine for everyday use. It also has a special section that allows you to search just for images.
HotBot’s greatest benefit is its advanced search feature. A template helps you create a Boolean or phrase search, or limit by media type, date, domain, etc.
All The Web allows users to search specific databases in its system: web pages, pictures, videos, and mp3 files. Its advanced search feature contains modifiable word filters.
AltaVista is one of the Internet’s largest search engines. AltaVista includes little words (such as a, to, be, not) in the search so that you may search on often-ignored words in a phrase (e.g. "Vitamin A" or "to be or not to be").

18 http://www.dogpile.com
Each engine that Ixquick queries has its unique strengths and vulnerabilities, so some but not all of each engine's top choices are likely to be relevant for you. Because engines have different vulnerabilities, irrelevant sites are unlikely to be prominently selected by multiple engines. When engines agree that a site is tops, and have reached that decision in different ways, the site is likely to be relevant.
Dogpile SM, the most used and popular metasearch engine, utilizes the Web's best search engines simultaneously, returning the most comprehensive search results from across the Internet. Dogpile combs the most popular search engines to compile a complete list of matches for a user's query and organizes the results by each individually searched resource.
The New ProFusion has over 1000 sources divided into 200+ search groups. You can search hundreds of sites for targeted content in one query! As well as providing access to the Web's most authoritative sources, ProFusion lets you search more than 500 sources from the Invisible Web, a vast resource of information neglected by traditional search engines.
Searchalot searches over 69 search engines at the same time. It saves you time and really does find what you are searching for! Searchalot even has a global search with over 237 international search engines.
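The "agreement" idea described for Ixquick, where a site picked as a top choice by several engines is more likely to be relevant, can be illustrated with a simple vote count. This is our own sketch of the general principle, not Ixquick's actual method; the engine names, site names, and the `top_n` cutoff are all assumptions.

```python
from collections import Counter

def rank_by_agreement(results_by_engine, top_n=10):
    """Order sites by how many engines placed them among their top_n results."""
    votes = Counter()
    for results in results_by_engine.values():
        for site in set(results[:top_n]):  # one vote per engine per site
            votes[site] += 1
    # Most votes first; ties broken alphabetically to keep the order stable.
    return sorted(votes, key=lambda site: (-votes[site], site))

# Hypothetical top results from three engines.
sample = {
    "EngineA": ["x.example", "y.example"],
    "EngineB": ["y.example", "z.example"],
    "EngineC": ["y.example", "x.example"],
}

print(rank_by_agreement(sample))
# ['y.example', 'x.example', 'z.example']
```

The site all three engines agree on comes first, while the site chosen by only one engine drops to the bottom.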


20 A searchable online encyclopedia
Another searchable online encyclopedia
A searchable database of all commonly found reference books including the Columbia Encyclopedia, Roget’s, Bartlett’s and more!
An extensive collection of links to general and subject specific reference resources.


22 ILOR is a search engine which provides very useful ways of handling search results. You can manage search lists and control views in unique ways.
Ask Jeeves is a search engine with the useful ability to understand natural language in most cases. You can therefore ask it questions and it will answer them with appropriate search results.
Type in a word and find its rhymes, synonyms, antonyms, definition, related words, similar sounding words, homophones, or similarly spelled words.
An engine that categorizes results into folders by concept, document and site type. This list (on the left side of the screen) helps to find the type of information you really need!

