Seminar presented by L. Nageswara Rao (09MA1A0546), under the guidance of Ms. Y. Sushma (M.Tech), Asst. Prof.


1 Seminar presented by L. Nageswara Rao (09MA1A0546), under the guidance of Ms. Y. Sushma (M.Tech), Asst. Prof.

2 CONTENTS IN SEARCH ENGINE: Introduction; History; Working; Types of Search Engine; Invisible Web; Advantages; Conclusion

3 INTRODUCTION A search engine is a software program that searches for sites based on the words that you designate as search terms. Search engines look through their own databases of information to find what you are looking for. “Search engine” is the popular term for an Information Retrieval (IR) system. In a web address, the first part specifies the protocol name and the second part specifies the IP address or domain name.

4 INTRODUCTION Before search engines were introduced, finding required information on the web was virtually impossible. Google was a successful company whose search engine made searching web pages easy and accurate. Search engines are classified by how they gather their listings: crawlers (spiders), human submissions, or a combination of the two. Every search engine stores many web pages in its database, but the engines with the largest number of pages are not necessarily the top search engines. The top search engines are those that return accurate information for the requested keyword.

5 HISTORY The first Web search engine was "Wandex", a now-defunct index collected by the World Wide Web Wanderer, a web crawler developed by Matthew Gray at MIT in 1993. The first "full text" crawler-based search engine was WebCrawler, which came out in 1994. Several companies entered the market spectacularly, posting record gains during their initial public offerings. Some, such as Northern Light, have taken down their public search engine and now market enterprise-only editions.

6 Timeline

Year  Engine              Event
1993  Aliweb              Launch
1994  WebCrawler          Launch
      Infoseek            Launch
      Lycos               Launch
1995  AltaVista           Launch (part of DEC)
      Excite              Launch
1996  Dogpile             Launch
      Inktomi             Founded
      Ask Jeeves          Founded
1997  Northern Light      Launch
1998  Google              Launch
1999  AlltheWeb           Launch
2000  Teoma               Founded
2003  Objects Search      Launch
2004  Yahoo! Search       Final launch (first original results)
      MSN Search          Beta launch
2005  MSN Search          Final launch
      FinQoo Meta Search
2006  Quaero              Final launch
      Kosmix              Beta launch

7 WORKING Without the use of sophisticated search engines, it would be virtually impossible to locate anything on the Web without knowing a specific URL (Uniform Resource Locator), the global address of documents and other resources on the World Wide Web. The first part of the address indicates what protocol to use, and the second part specifies the IP address or the domain name where the resource is located (for example, www.gmail.com).
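
The two parts of a web address described above can be pulled apart with Python's standard `urllib.parse` module. This is only an illustrative sketch: the helper name `describe_url` and the example address are assumptions, not part of the original slides.

```python
from urllib.parse import urlsplit


def describe_url(url):
    """Split a web address into its protocol and its host (hypothetical helper)."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,   # first part of the address, e.g. "https"
        "host": parts.hostname,     # second part: domain name or IP address
        "path": parts.path,         # which resource on that host
    }


if __name__ == "__main__":
    print(describe_url("https://www.gmail.com/mail"))
```

Note that `urlsplit` only recognises the host when the protocol is present, so a bare `www.gmail.com` would need a scheme such as `https://` in front of it first.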

8 How Do Search Engines Work? Spiders; Robots

9 SPIDERS To find information on the hundreds of millions of Web pages that exist, a search engine employs special software robots, called spiders or crawlers, to build lists of the words found on Web sites. When a spider is building its lists, the process is called Web crawling. After spiders or crawlers find pages, they pass them on to another computer program for "indexing." This program identifies the text, links, and other content in the page and stores it in the search engine's database so that the database can be searched by keyword.
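
The step where a program "identifies the text, links, and other content in the page" can be sketched with Python's built-in `html.parser`. The class name `PageParser`, the function `parse_page` and the sample HTML are assumptions made for illustration; a real indexer would also need to skip scripts and styles and handle malformed markup.

```python
import re
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collects the words and the outgoing links found in one HTML page."""

    def __init__(self):
        super().__init__()
        self.words = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Every <a href="..."> is a link the spider could follow later.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        # Lower-case the visible text and split it into simple word tokens.
        self.words.extend(re.findall(r"[a-z0-9]+", data.lower()))


def parse_page(html):
    parser = PageParser()
    parser.feed(html)
    parser.close()
    return parser.words, parser.links


if __name__ == "__main__":
    sample = '<h1>Cat Hat</h1><p>The cat sat. <a href="http://example.com/mat">Mat page</a></p>'
    print(parse_page(sample))
```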

10 Building the Index: Once the spiders have completed the task of finding information on Web pages, the search engine must store the information in a way that makes it useful. Two key components are involved in making the gathered data accessible to users: 1. The information stored with the data. 2. The method by which the information is indexed. To make for more useful results, most search engines store more than just the word and URL. The data is encoded to save storage space.
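
The slides do not say which encoding is used to save storage space. As one possible illustration (an assumption, not the method of any particular engine), the sorted document numbers for a word can be stored as gaps between neighbours, with each gap written in a variable-length byte format so that small numbers take a single byte. All function names below are hypothetical.

```python
def encode_varint(n):
    """Encode a non-negative integer in 7-bit groups, low bits first."""
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)  # high bit set: more bytes follow
        n >>= 7
    out.append(n)                      # last byte has the high bit clear
    return bytes(out)


def encode_postings(doc_ids):
    """Store a sorted list of document numbers as variable-length gaps."""
    out = bytearray()
    previous = 0
    for doc_id in doc_ids:
        out += encode_varint(doc_id - previous)
        previous = doc_id
    return bytes(out)


def decode_postings(data):
    """Reverse encode_postings: rebuild the original document numbers."""
    doc_ids, current, shift, previous = [], 0, 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            previous += current
            doc_ids.append(previous)
            current, shift = 0, 0
    return doc_ids


if __name__ == "__main__":
    ids = [3, 130, 131, 70000]
    packed = encode_postings(ids)
    print(len(packed), decode_postings(packed))
```

In this sketch the four document numbers fit in fewer bytes than four fixed-width 32-bit integers would need, which is the kind of saving the slide alludes to.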

11 The steps involved in the working process of a search engine are: 1. Document gathering, done by crawlers (spiders). 2. Document indexing, done by the indexer. 3. Searching. 4. Visualisation of results. The index allows information to be found as quickly as possible. There are quite a few ways to build an index, but one of the most effective is to use a hash table.
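
A hash-table index of the kind mentioned above can be sketched with a Python `dict`, which is itself a hash table. The sketch assumes each document is a short string identified by a number; `build_index` and `lookup` are illustrative names, not taken from the slides.

```python
def build_index(docs):
    """Map every word to the set of document numbers that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index


def lookup(index, word):
    """Return the sorted document numbers for a word (empty list if unknown)."""
    return sorted(index.get(word.lower(), set()))


if __name__ == "__main__":
    docs = {1: "the cat sat on the mat", 2: "the cat in the hat"}
    index = build_index(docs)
    print(lookup(index, "cat"), lookup(index, "hat"), lookup(index, "dog"))
```

Looking a word up costs roughly the same however many words the index holds, which is why a hash table lets information "be found as quickly as possible".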


17 Resolving a Query Consider the query (cat hat mat). 1. Select a word from the query (“cat”). 2. Retrieve the list for the word “cat”. 3. Process the list and, for each document, add weights to the accumulator based on TF, IDF and document length. 4. Repeat for the remaining query words. 5. Find the best-ranked documents and look them up in the mapping table. 6. Retrieve and summarise the documents.
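
The query-resolution steps above can be sketched as follows. The weighting formula (term frequency times a logarithmic inverse-document-frequency weight, divided by document length) is one common choice and is an assumption here; the slides name the ingredients but not the formula. The tiny index, the document lengths and the example URLs are made up for illustration.

```python
import math
from collections import defaultdict


def rank(query_terms, index, doc_lengths):
    """Score documents with one accumulator per document, best first."""
    n_docs = len(doc_lengths)
    accumulators = defaultdict(float)
    for term in query_terms:
        postings = index.get(term, {})          # list for this word: doc -> count
        if not postings:
            continue
        idf = math.log(n_docs / len(postings))  # rarer words weigh more
        for doc_id, tf in postings.items():
            accumulators[doc_id] += tf * idf / doc_lengths[doc_id]
    return sorted(accumulators.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data: word -> {document number: occurrences in that document}
INDEX = {"cat": {1: 1, 2: 1}, "hat": {2: 1}, "mat": {1: 1}}
DOC_LENGTHS = {1: 6, 2: 5, 3: 5}                # words per document
DOC_URLS = {                                    # mapping table: number -> address
    1: "http://example.com/cat-mat",
    2: "http://example.com/cat-hat",
    3: "http://example.com/dog-log",
}

if __name__ == "__main__":
    for doc_id, score in rank(["cat", "hat", "mat"], INDEX, DOC_LENGTHS):
        print(DOC_URLS[doc_id], round(score, 3))
```

Documents that contain none of the query words never get an accumulator, so they are simply absent from the ranked list.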

18 Search Engine Modules: a document processor; a query processor; a search and matching function; a ranking capability; summarising and presenting documents.

19 Tips for effective web searching ◦ For highly specific topics, or topics with unique terms or many concepts, use the search tools ◦ Go through the ‘help’ pages of search tools carefully ◦ Gather sufficient information about the search topic before searching: spelling variations, synonyms, broader and narrower terms ◦ Use specific keywords; rare or unusual words are better than common ones ◦ Enter the most important terms first, since some search tools are sensitive to word order

20 TYPES OF SEARCH ENGINE: Crawler-Based Search Engines; Human-Powered Directories; Hybrid Search Engines (or Mixed Results)

21 Spiders or Crawlers: A spider is a program that automatically fetches Web pages. Spiders are used to feed pages to search engines. It is called a spider because it crawls over the Web. Large search engines, like AltaVista, have many spiders working in parallel. Because most Web pages contain links to other pages, a spider can start almost anywhere. The behavior of a Web crawler is the outcome of a combination of policies: a selection policy that states which pages to download, a re-visit policy that states when to check for changes to the pages, a politeness policy that states how to avoid overloading Web sites, and a parallelization policy that states how to coordinate distributed Web crawlers.
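
Two of the policies listed above can be sketched in a few lines. The example is a simplification under stated assumptions: the selection policy is "breadth-first, stop after `max_pages`", the politeness policy is "wait `delay_seconds` between requests to the same host", and the re-visit and parallelization policies are left out. To keep it runnable without a network, pages come from a made-up in-memory dictionary (`FAKE_WEB`) instead of real HTTP requests; all names are hypothetical.

```python
import time
from collections import deque
from urllib.parse import urlparse


def crawl(seed, fetch_links, max_pages=100, delay_seconds=0.0):
    """Visit pages breadth-first starting from seed; return the visit order."""
    frontier = deque([seed])   # pages waiting to be fetched
    seen = {seed}              # never queue the same address twice
    visited = []
    last_hit = {}              # host -> time of the most recent request
    while frontier and len(visited) < max_pages:   # selection policy
        url = frontier.popleft()
        host = urlparse(url).netloc
        if host in last_hit:                       # politeness policy
            wait = last_hit[host] + delay_seconds - time.monotonic()
            if wait > 0:
                time.sleep(wait)
        last_hit[host] = time.monotonic()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


# Made-up link structure standing in for real pages.
FAKE_WEB = {
    "http://a.example/": ["http://a.example/about", "http://b.example/"],
    "http://a.example/about": ["http://a.example/"],
    "http://b.example/": ["http://a.example/about", "http://c.example/"],
    "http://c.example/": [],
}

if __name__ == "__main__":
    print(crawl("http://a.example/", lambda url: FAKE_WEB.get(url, [])))
```

Because most pages link to other pages, the crawl reaches every page in the example from a single seed, which matches the remark that a spider can start almost anywhere.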

22 Human-Powered Search Engines: Human-powered search engines rely on humans to submit information that is subsequently indexed and catalogued. Only information that is submitted is put into the index. This explains why a search on a commercial search engine, such as Yahoo! or Google, will sometimes return results that are in fact dead links. Since the search results are based on the index, if the index has not been updated since a Web page became invalid, the search engine treats the page as still active even though it no longer is.

23 So why will the same search on different search engines produce different results? Because not all indices are going to be exactly the same: it depends on what the spiders find or what the humans submitted. More importantly, not every search engine uses the same algorithm to search through the indices. Hybrid Search Engines or Mixed Results: Today, it is extremely common for both types of results to be presented. Usually, a hybrid search engine will favor one type of listing over another. For example, MSN Search is more likely to present human-powered listings from LookSmart.

24 Challenges faced by Web search engines: The web is growing much faster than any present-technology search engine can possibly index (see distributed web crawling). The queries one can make are currently limited to searching for keywords, which may result in many false positives. Many dynamically generated sites are not indexable by search engines; this phenomenon is known as the invisible web.

25 CONCLUSION: Search engines play an important role in accessing content over the internet: they fetch the pages that match the user's request. They have made the internet and its information just a click away. The need for better search engines only increases. Search engine sites are among the most popular websites.

26 Future Search: One of the areas of search engine research is concept-based searching. Some of this research involves using statistical analysis on pages containing the words or phrases you search for, in order to find other pages you might be interested in. The information stored about each page is greater for a concept-based search engine, and far more processing is required for each search. Many groups are working to improve both results and performance of this type of search engine.

27 QUERIES?
