SEARCH ENGINE By Ms. Preeti Patel Lecturer School of Library and Information Science DAVV, Indore E mail:
Search Engine: Introduction; Components; Types; Functions; Subject directories vs search engines
Introduction: Search engine
Search engines came into existence in [...]. According to the Yahoo search engine directory (2003), there are over 448 major search engines. A search engine (SE) is a searchable database of Internet files collected by a computer program (called a wanderer, crawler, robot, worm or spider).
An index is created from the collected files, e.g. title, full text, size, URL, etc. There are no selection criteria for the collection of files. An SE allows the user to enter keywords and retrieves Web documents from its database that match the keywords entered by the searcher.
The SE does not wait for someone to submit information about a site. It sends a spider (also called a crawler or web crawler) to visit publicly accessible websites, following all the links it comes across and collecting data for the search engine's indexes.
A spider discovers new sites and updates information from sites previously visited. A spider can also be used to check links within websites.
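The link-following behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "web" is a hypothetical in-memory dictionary (TOY_WEB) rather than real HTTP requests, and the function name crawl is invented for this example.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
TOY_WEB = {
    "http://a.example/": ["http://b.example/", "http://c.example/"],
    "http://b.example/": ["http://c.example/", "http://d.example/"],
    "http://c.example/": ["http://a.example/"],
    "http://d.example/": ["http://missing.example/"],
}

def crawl(seed, web):
    """Follow links breadth-first from seed.

    Returns (pages visited in order, links that point to no known page).
    """
    visited = []
    broken = []
    seen = {seed}           # URLs already queued, so no page is fetched twice
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        if url not in web:  # the link leads nowhere: report it as broken
            broken.append(url)
            continue
        visited.append(url)
        for link in web[url]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited, broken

pages, broken_links = crawl("http://a.example/", TOY_WEB)
```

The same traversal serves both purposes mentioned on the slide: the visited list is what a spider would hand to the indexer, and the broken list is a simple link check.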
Components of SE
An SE might well be called a search engine service or a search service. The components of an SE are the following:
Spider: A program that traverses the Web from link to link, identifying and reading pages.
Index: A Web database containing a copy of each web page gathered by the spider.
SE mechanism: Software that enables users to query the index and that usually returns results in relevancy-ranked order.
Types: SE
An SE downloads all the information that a page contains and then examines that information to index keywords and phrases that can be used to categorize the sites. SEs can be categorized into three types on the basis of the indexing techniques they employ:
Active SE: It collects all information by itself. It uses a program called a ‘spider’ or ‘Web robot’ to index and categorize web pages as well as websites. The spider travels around the WWW in search of new sites and adds entries to the SE's catalogue.
Passive search engines or subject directories: This type of SE is possibly more accurately referred to as a directory. It does not seek out information by itself but relies on WWW users to submit details of their favorite sites in order to build up a database. For example, the Yahoo directory has 14 main subject categories; each category has many subcategories, the subcategories have their own subcategories, and so on almost ad infinitum.
Due to the size of the web and its constant transformation, keeping up with important sites in all subject areas is humanly impossible.
Meta search engine: An increasing number of search engines has led to the creation of ‘meta’ search tools. A meta search engine does not catalogue any web page by itself. It simultaneously searches multiple search engines. When a query is put to this type of search engine, it forwards that query to other search engines.
Types of meta search engine
There are two types of meta search engine:
1. One type provides a separate list of results from each engine that was searched. With this type of meta SE, one can retrieve comprehensive, and sometimes overwhelming, results.
2. The other type is more common and returns a single list of results, often with the duplicate hits removed. This type of meta SE always brings the results back to its own site for viewing.
Examples: Metacrawler, SurfWax, Zapmeta
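The two kinds of meta search engine described above can be sketched as follows. The two "engines" here are hypothetical stand-ins that return fixed lists (no real search service is contacted), and the function names separate_lists and merged_list are invented for this illustration.

```python
# Hypothetical stand-ins for real search engines: each returns a ranked list of URLs.
def engine_a(query):
    return ["http://one.example/", "http://two.example/"]

def engine_b(query):
    return ["http://two.example/", "http://three.example/"]

ENGINES = {"A": engine_a, "B": engine_b}

def separate_lists(query, engines):
    """Type 1: forward the query and keep one result list per engine."""
    return {name: search(query) for name, search in engines.items()}

def merged_list(query, engines):
    """Type 2: forward the query and merge everything into a single list,
    dropping duplicate hits while keeping first-seen order."""
    merged = []
    seen = set()
    for search in engines.values():
        for url in search(query):
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

per_engine = separate_lists("bird migration", ENGINES)
combined = merged_list("bird migration", ENGINES)
```

In both cases the meta search engine keeps no catalogue of its own; it only forwards the query and arranges what comes back.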
According to scope, SEs can be divided into the following categories.
General search engine: It covers a range of services and facilities and facilitates Boolean search. Examples: Google, Alta Vista, etc.
Regional search engine: It refers to a country-specific search engine for locating varied resources region-wise. Examples: Euro Ferret (Europe), Excite UK, etc.
Subject-specific search engine: It does not attempt to index the entire web. It focuses on searching for websites or pages within a defined subject area, geographical area or type of resource, because it aims for depth of coverage within that defined area.
Examples:
1. Regional: WWW.123india.com
2. Regional: WWW.in.altavista.com
3. Employment: WWW.nauri.com
4. Weather: WWW.zipcode.com
5. India specific: www.khoj.com
Features of SE
When a searcher enters more than one word into a Web search engine, the space between the words has a logical meaning that directly affects the results of the search. This is known as default syntax. Example: in Alta Vista, Infoseek and Excite, a search for the words ‘bird migration’ means that the searcher will get back documents that contain either the word ‘bird’ or the word ‘migration’, or both.
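The effect of default syntax can be illustrated with a small sketch. The three documents and the function names below are hypothetical, and the AND mode is included only for comparison with the OR behaviour described above.

```python
import re

# Hypothetical mini-collection of documents.
DOCS = {
    "doc1": "Bird watching in the wetlands",
    "doc2": "Human migration in history",
    "doc3": "Seasonal bird migration routes",
}

def words_of(text):
    """Lower-case the text and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def search(query, docs, default_operator="OR"):
    """Treat each space in the query as the given Boolean operator."""
    terms = query.lower().split()
    combine = any if default_operator == "OR" else all
    return [name for name, text in docs.items()
            if combine(term in words_of(text) for term in terms)]

or_hits = search("bird migration", DOCS, default_operator="OR")
and_hits = search("bird migration", DOCS, default_operator="AND")
```

With OR as the default, all three documents match; with AND, only doc3, the one document that contains both words, is returned.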
The space between the words defaults to the Boolean OR. This is probably not what the searcher wants, which is to get back documents that contain both the words ‘bird’ and ‘migration’.
SEs return results in schematic order. Most SEs use various criteria to construct a term relevancy rating for each hit and present the search results in this order.
Criteria can include: the search term appearing in the title, URL, first heading or HTML META tag; the number of times the search terms appear in the document; search terms appearing early in the document; search terms appearing close together; etc.
SE technology is still in a developing stage. Today, SE technology organizes search results by concept, site, domain popularity and linking rather than by relevancy.
The following services are provided by SEs:
Direct Hit ranks according to the sites other searchers have chosen from their results for similar queries.
Google ranks by the number of links from pages ranked high by the service.
Inference Find ranks by concept and top-level domain.
Meta Find sorts results by keywords, alphabetically or by domain.
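The term-based relevancy criteria listed earlier (search term in the title, number of occurrences, terms appearing early in the document) can be sketched as a toy scoring function. The weights, the ten-word "early" window and the function names are assumptions chosen for illustration; no real search engine's formula is implied.

```python
import re

def tokenize(text):
    """Lower-case the text and split it into a list of words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def relevancy_score(query_terms, title, body):
    """Toy relevancy rating built from three of the criteria on the slide."""
    title_words = tokenize(title)
    body_words = tokenize(body)
    score = 0.0
    for term in query_terms:
        term = term.lower()
        if term in title_words:
            score += 3.0                  # assumed weight: term in the title
        score += body_words.count(term)   # one point per occurrence in the body
        if term in body_words[:10]:
            score += 1.0                  # assumed bonus: term appears early
    return score

def rank(query_terms, documents):
    """Sort (url, title, body) tuples from highest to lowest score."""
    return sorted(documents,
                  key=lambda d: relevancy_score(query_terms, d[1], d[2]),
                  reverse=True)

# Hypothetical documents for demonstration.
DOCUMENTS = [
    ("http://y.example/", "Gardening tips", "a bird visited the garden"),
    ("http://x.example/", "Bird migration routes", "bird migration happens every year"),
]
ranked = rank(["bird", "migration"], DOCUMENTS)
```

Link-based and popularity-based ordering, as used by the services listed above, would need data beyond the document text and is not covered by this sketch.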
SEs do not index all the documents available on the web. For example, most SEs cannot index files on password-protected sites, behind firewalls, or on sites configured by the host server to be left alone. Other web pages may not be picked up if no other pages link to them.
SEs rarely contain the most recent documents posted to the Internet; do not look for yesterday's news on a search engine.
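One common way for a host to ask spiders to leave parts of a site alone is a robots.txt file. The slides do not name this mechanism, so treating it as the intended one is an assumption. The sketch below uses Python's standard urllib.robotparser module; the rules, the agent name "ToySpider" and the URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: every spider is asked to skip /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def allowed_urls(robots_txt, user_agent, urls):
    """Return only the URLs that the given rules permit this agent to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]

candidates = [
    "http://site.example/index.html",
    "http://site.example/private/report.html",
]
indexable = allowed_urls(ROBOTS_TXT, "ToySpider", candidates)
```

A well-behaved spider would index only the URLs in indexable, which is one reason some documents never appear in an SE's database.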
The contents of databases will generally not show up in a search engine's results. A growing amount of valuable information on the Web is now generated from databases.
Some SEs allow users to view the retrieved Web sites/Web pages clustered under different topics related to the search terms.
FUNCTIONS OF SE
They search the Internet using specialized software called a crawler or robot; these software agents find web pages by following hyperlinks.
These agents send a cached version of the web pages to the repository of the search engine, and the SE keeps an index of the words they find and where (URL) they find them.
They allow users to look for words or combinations of words found in that index.
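The index of words and the URLs where they were found can be sketched as an inverted index. The pages below are hypothetical, and the function names build_index and lookup are invented for this illustration.

```python
import re
from collections import defaultdict

# Hypothetical pages as a crawler might deliver them: URL -> page text.
PAGES = {
    "http://a.example/": "Birds migrate south in winter",
    "http://b.example/": "Winter weather forecast",
    "http://c.example/": "Birds of India",
}

def build_index(pages):
    """Map each word to the set of URLs where it was found."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

def lookup(index, words):
    """Return the URLs that contain every one of the given words."""
    url_sets = [index.get(word.lower(), set()) for word in words]
    if not url_sets:
        return set()
    return set.intersection(*url_sets)

INDEX = build_index(PAGES)
single = lookup(INDEX, ["birds"])
combination = lookup(INDEX, ["birds", "winter"])
```

Looking up a single word returns every page that contains it, while a combination of words narrows the result to the pages that contain all of them.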
Diagrammatic representation of a search engine (figure; labels: crawlers, different websites, switch, indexing software in search engine, database of search engine, search user interface)
Subject Directories Vs Search Engine
A subject directory is a service that offers a collection of links to Internet resources submitted by site creators or evaluators and organized into subject categories.