
1 Web Search
Created by Ejaj Ahamed

2 What is the Web?
• The World Wide Web began in 1989 at CERN, the particle physics laboratory in Switzerland. The Web did not gain widespread popular use until browsers such as NCSA Mosaic became available in 1993, followed by Netscape in 1994. Tools to make the Web searchable appeared around the same time: the Wanderer and JumpStation both date from 1993.

3 Web challenges
• Distributed data: Documents exist across millions of decentralized servers. Computers are interconnected without any predefined topology, and bandwidth and reliability vary widely. There is no central registry of web servers, and virtual hosting makes this more complicated.
• Volatile data: Many documents change or disappear rapidly. It has been estimated that 40% of the Web changes monthly; as a result, indexes quickly become outdated or inaccurate.
• Scale: There are billions of separate documents. Growth appears to be exponential, which poses scaling issues that are difficult to cope with.

4 Web challenges (Continued)
• Lack of structure: There is no uniform structure, HTML errors are common, and up to 30% of documents are (near-)duplicates. Most HTML pages are not valid, and pages come in many formats. Much web data is repeated.
• Quality of data: There is no editorial control, so the Web contains false information, poorly written text, etc. There is also undesirable content, and filtering it is technically complex.
• Heterogeneous data: The Web contains multiple media types (images, video, VRML), languages, character sets, etc. Initially the Web was dominated by English speakers; now less than half of existing web pages are in English, as the number of non-English servers and users has grown dramatically.

5 Search engines
• Search engines are critically important for helping users find relevant information on the World Wide Web.
• People can search the Web using different search engines, which use various algorithms and techniques.
• Web searching is now also conducted by non-human searchers, including agents, softbots, and automated processes or spiders.

6 How a search engine works
• Create an index
• Receive a query: a set of search terms and commands
• Look up the query terms in the index file to find matches
• Gather the matching page entries and rank them by relevance
• Format the results
• Return the results page in HTML to the searcher's web browser
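The six steps above can be sketched end to end in a few lines of Python. This is a minimal illustration under simplifying assumptions, not how any particular engine works: documents live in an in-memory dict, query terms are combined with AND, and "relevance" is a hypothetical score that simply counts query-term occurrences. The function names (`build_index`, `search`, `format_results`) are our own.

```python
import html
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase text and split it into alphanumeric search terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Step 1, create an index: map each term to the IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, docs, query):
    """Steps 2-4: receive a query, look up matches, gather and rank them."""
    terms = tokenize(query)
    if not terms:
        return []
    # Look in the index: keep only documents that contain every query term (AND).
    matches = set.intersection(*(index.get(t, set()) for t in terms))

    # Hypothetical relevance score: total occurrences of the query terms.
    def score(doc_id):
        words = tokenize(docs[doc_id])
        return sum(words.count(t) for t in terms)

    # Highest score first; ties broken by document ID for stable output.
    return sorted(matches, key=lambda d: (-score(d), d))


def format_results(query, ranked):
    """Steps 5-6: format the ranked hits as an HTML page for the searcher's browser."""
    items = "".join(f"<li>{html.escape(d)}</li>" for d in ranked)
    return (f"<html><body><h1>Results for {html.escape(query)}</h1>"
            f"<ol>{items}</ol></body></html>")


docs = {
    "a.html": "Web search engines index the Web",
    "b.html": "Search engines rank pages",
    "c.html": "Spiders crawl the Web",
}
index = build_index(docs)
print(format_results("web", search(index, docs, "web")))
```

Here "a.html" ranks above "c.html" for the query "web" because it mentions the term twice; real engines use far richer relevance signals.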

7 Google Search Engine Architecture
Source: http://www-db.stanford.edu/~backrub/google.html

8 Indexing process
• Indexer application: gathers and stores text
• Inverted index file: contains an entry for each instance of each word, recording:
  – Location within the file (for phrase matching)
  – Enclosing field or meta tag
  – Pointer to document info
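A positional inverted index of the kind described on this slide might look like the following sketch. Each posting records the three items listed: the word's position (for phrase matching), its enclosing field, and a document ID acting as the pointer to document info. The `Posting` type, the `title`/`body` field names, and the `doc_info` table are illustrative assumptions.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Posting:
    doc_id: int    # pointer to the document-info record
    field: str     # enclosing field or meta tag, e.g. "title" or "body"
    position: int  # word offset within the field, used for phrase matching


def words_of(text):
    return re.findall(r"\w+", text.lower())


def build_inverted_index(documents):
    """documents: {doc_id: {field: text}} -> {word: [Posting, ...]}"""
    index = defaultdict(list)
    for doc_id, fields in documents.items():
        for field, text in fields.items():
            # One posting per instance of each word.
            for position, word in enumerate(words_of(text)):
                index[word].append(Posting(doc_id, field, position))
    return index


def phrase_match(index, phrase):
    """Return the (doc_id, field) pairs where the phrase's words appear consecutively."""
    words = words_of(phrase)
    if not words:
        return set()
    postings = [set(index.get(w, [])) for w in words]
    hits = set()
    for start in postings[0]:
        # Word i of the phrase must sit at position start + i in the same doc and field.
        if all(Posting(start.doc_id, start.field, start.position + i) in postings[i]
               for i in range(1, len(words))):
            hits.add((start.doc_id, start.field))
    return hits


# Hypothetical document-info table and documents.
doc_info = {1: {"url": "http://example.com/1"}, 2: {"url": "http://example.com/2"}}
documents = {
    1: {"title": "Web search", "body": "Search engines crawl the Web"},
    2: {"title": "Spiders", "body": "Web search tools"},
}
index = build_inverted_index(documents)
for doc_id, field in sorted(phrase_match(index, "web search")):
    print(doc_info[doc_id]["url"], field)
```

Storing positions costs space, but without them the index could only say that both words occur somewhere in a document, not that they form the phrase.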

9 Robot spider indexers
• Many search engines use programs called robots to gather web pages for indexing. These programs are not limited to a predefined list of web pages: they can follow links on the pages they find, which makes them a form of intelligent agent. The process of following links is called spidering.
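The spidering loop can be sketched as a breadth-first traversal: fetch a page, keep it for the indexer, extract its links, and queue any not yet seen. This is a simplified sketch; the `fetch` callback is a hypothetical stand-in for an HTTP client so the example runs offline, and a real robot would also honor robots.txt (Python's `urllib.robotparser` can parse it), throttle its requests, and handle errors.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags in an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def spider(seed_url, fetch, max_pages=100):
    """Breadth-first crawl from seed_url, following links found on each page.

    `fetch` (hypothetical) maps a URL to its HTML text, or None on failure.
    """
    queue = deque([seed_url])
    seen = {seed_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html_text = fetch(url)
        if html_text is None:
            continue
        pages[url] = html_text  # hand this page to the indexer
        parser = LinkExtractor()
        parser.feed(html_text)
        for href in parser.links:
            # Resolve relative links and drop #fragments so each page is queued once.
            absolute, _ = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages


# A tiny in-memory "site"; c.html is linked but missing, so fetch returns None.
site = {
    "http://example.com/": '<a href="a.html">A</a> <a href="b.html#top">B</a>',
    "http://example.com/a.html": '<a href="/">home</a> <a href="c.html">C</a>',
    "http://example.com/b.html": "no links here",
}
for url in spider("http://example.com/", site.get):
    print(url)
```

The `seen` set is what keeps the robot from looping forever on pages that link back to each other, such as the "home" link above.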

10 Database indexers
• Databases provide the content storage for many sites that generate web pages from them dynamically, including e-commerce catalog sites, online news, and even entertainment sites.
• Intranets often contain large amounts of text stored in databases as well.
• Databases generally have their own search functions, which may appear to take the place of a full-text search engine.

11 Database indexers (Continued)
• Work best locally
  – Most use JDBC or ODBC
  – Can index via the web
• Easiest with straightforward tables
  – Perform a join to build listings for indexing
  – Problems with legacy systems
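As an illustration of "perform a join to build listings for indexing", the sketch below uses Python's built-in `sqlite3` module as a stand-in for a JDBC or ODBC connection. The `categories`/`products` schema and its rows are hypothetical; the point is that the indexer receives one flattened text listing per row rather than the normalized tables.

```python
import sqlite3


def build_listings(conn):
    """Join normalized tables into one flat text listing per product for the indexer."""
    sql = """
        SELECT p.id, p.name, c.name, p.description
        FROM products AS p
        JOIN categories AS c ON c.id = p.category_id
        ORDER BY p.id
    """
    listings = {}
    for prod_id, name, category, description in conn.execute(sql):
        # One "document" per row: the indexer sees the joined text, not the raw tables.
        listings[prod_id] = f"{name} {category} {description}"
    return listings


# Hypothetical catalog schema, created in memory for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT,
                           category_id INTEGER REFERENCES categories(id),
                           description TEXT);
    INSERT INTO categories VALUES (1, 'Books'), (2, 'Music');
    INSERT INTO products VALUES
        (10, 'Web Search Basics', 1, 'An introduction to search engines'),
        (11, 'Spider Songs', 2, 'An album about crawling');
""")
print(build_listings(conn))
```

With straightforward tables a single join like this is enough; legacy schemas often need more joins or data cleanup before the listings make sense.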

12 Effective Site Search
• Index everything and keep it fresh
• Add synonym and spell checking
• Tweak relevance until it works for you
• Customize results pages
• Provide help for search failure
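Three of these tips (synonyms, spell checking, and help on search failure) can be sketched with the standard library alone. The synonym table, the 0.75 similarity cutoff passed to `difflib.get_close_matches`, and the help messages below are assumptions chosen for illustration; query terms are combined with OR here so that any synonym can produce a match.

```python
import difflib

# Hypothetical synonym table; a real site would curate its own.
SYNONYMS = {
    "laptop": {"notebook"},
    "notebook": {"laptop"},
    "tv": {"television"},
    "television": {"tv"},
}


def expand_query(terms):
    """Add synonyms of each query term so either wording matches."""
    expanded = set(terms)
    for term in terms:
        expanded |= SYNONYMS.get(term, set())
    return expanded


def suggest_spelling(term, vocabulary):
    """Suggest the closest indexed word for a possible misspelling, or None."""
    matches = difflib.get_close_matches(term, vocabulary, n=1, cutoff=0.75)
    return matches[0] if matches else None


def site_search(query, index):
    """index: {term: set of doc IDs}. Returns (sorted results, help message or None)."""
    terms = query.lower().split()
    results = set()
    for term in expand_query(terms):
        results |= index.get(term, set())
    if results:
        return sorted(results), None
    # Search failure: offer spelling suggestions instead of an empty page.
    suggestions = [s for t in terms if (s := suggest_spelling(t, list(index)))]
    if suggestions:
        return [], "No results. Did you mean: " + " ".join(suggestions) + "?"
    return [], "No results. Try fewer or more general keywords."


index = {"notebook": {"p1"}, "television": {"p2"}, "cable": {"p3"}}
print(site_search("laptop", index))
print(site_search("televison", index))
```

Suggestions are drawn only from words that are actually in the index, so following one always leads to at least one result.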

13 Conclusion
• Successfully searching the World Wide Web is now the basis for many of our information tasks. Search engines find the right information among a vast number of web pages, and they do so with minimal input from users, generally one or two keywords. Much work has been done to make search engines more efficient, but a substantial amount remains to be done to keep pace with the growth of the Web.

