Search Engines: The players and the field The mechanics of a typical search. The search engine wars. Statistics from search engine logs. The architecture.

Search Engines: The players and the field The mechanics of a typical search. The search engine wars. Statistics from search engine logs. The architecture of a search engine. The query engine.

Mechanics of a typical search

Results & ads returned ranked

Category of first result

Result for phrase query

Search on the Web
Corpus: the publicly accessible Web (static + dynamic).
Goal: retrieve high-quality results relevant to the user's need (not docs!).
Need:
  Informational – want to learn about something
  Navigational – want to go to that page
  Transactional – want to do something (web-mediated): access a service, downloads, shop
  Gray areas: find a good hub; exploratory search ("see what's there")
Example queries: Low hemoglobin; United Airlines; Tampere weather; Mars surface images; Nikon CoolPix; Car rental Finland; Abortion morality

Search Engines as Info Gatekeepers
Search engines are becoming the primary entry point for discovering web pages. Ranking of web pages influences which pages users will view. Exclusion of a site from search engines will cut off the site from its intended audience. The privacy policy of a search engine is important.
Introna & Nissenbaum: Defining the Web: The Politics of Search Engines.
Hindman et al: Googlearchy: How a Few Heavily-Linked Sites Dominate Politics on the Web.

Search Engine Wars: The battle for domination of the web search space is heating up! The competition is good news for users! Crucial: advertising is combined with search results! What if one of the search engines manages to dominate the space?

Yahoo! Synonymous with the dot-com boom, probably the best known brand on the web. Started off as a web directory service in 1994, acquired leading search engine technology in […]. Has very strong advertising and e-commerce partners.

Lycos: One of the pioneers of the field. Introduced innovations that inspired the creation of Google.

Google: The verb “google” has become synonymous with searching for information on the web. Has raised the bar on search quality. Has been the most popular search engine in the last few years. Had a very successful IPO in August […]. Is innovative and dynamic. Has restored the glamour that CS lost in the dot-com bust.

Live Search (was: MSN Search): Microsoft is synonymous with PC software. Remember its victory in the browser wars with Netscape. Developed its own search engine technology only recently, officially launched in Feb […]. May link web search into its next version of Windows.

Ask Jeeves: Specialises in natural language question answering. Search driven by Teoma.

Cuil The latest kid on the block Claims to have indexed 120B pages! So far, it does not rank!

Experiment with query syntax
Default is AND, e.g. “computer chess” is normally interpreted as “computer AND chess”, i.e. both keywords must be present in every hit.
“+chess” in a query means the user insists that “chess” be present in every hit.
“computer OR chess” means at least one of the keywords must be present in every hit.
“"computer chess"” (the query enclosed in double quotes) means that the phrase “computer chess” must be present in every hit.
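The query forms above can be tried out with a small sketch. This is only an illustration under simple assumptions: `matches` is a hypothetical helper (not any real engine's parser), documents are plain strings split on whitespace, and only one operator style is handled per query.

```python
def matches(query, text):
    """Return True if `text` satisfies `query` under the simplified syntax.

    Hypothetical helper for illustration; real engines tokenize and
    parse queries in far more elaborate ways.
    """
    words = text.lower().split()
    q = query.strip()

    # Phrase query: the whole query is wrapped in double quotes,
    # so the words must occur consecutively and in order.
    if len(q) >= 2 and q.startswith('"') and q.endswith('"'):
        phrase = q[1:-1].lower().split()
        n = len(phrase)
        return any(words[i:i + n] == phrase
                   for i in range(len(words) - n + 1))

    # OR query: at least one keyword must be present.
    if " OR " in q:
        terms = [t.lower() for t in q.split(" OR ")]
        return any(t in words for t in terms)

    # Default AND query: every keyword must be present.
    # A leading "+" marks a required term, which AND already enforces.
    terms = [t.lstrip("+").lower() for t in q.split()]
    return all(t in words for t in terms)


if __name__ == "__main__":
    doc = "chess on a computer"
    print(matches("computer chess", doc))      # AND: both words occur
    print(matches('"computer chess"', doc))    # phrase: not adjacent
    print(matches("computer OR chess", "chess openings"))
```

Comparing the AND and phrase results on the same document shows why a phrase query usually returns fewer hits than the default interpretation.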

Statistics from search engine logs
Statistic (Year) | AltaVista (1998) | AlltheWeb (2002) | Excite (2001)
average terms per query | | |
average queries per session | | |
average result pages viewed | | |
usage of advanced search features | 20.4% | 1.0% | 10.0%

The most popular search keywords
AltaVista (1998) | AlltheWeb (2002) | Excite (2001)
sex | free |
applet | sex |
porno | download | pictures
mp3 | software | new
chat | uk | nude

Web search users
Ill-defined queries: short length; imprecise terms; sub-optimal syntax (80% of queries without operator); low effort in defining queries.
Wide variance in: needs; expectations; knowledge; bandwidth.
Specific behavior: 85% look over one result screen only (mostly above the fold); 78% of queries are not modified (1 query/session); follow links – “the scent of information”...

Query Distribution Power law: few popular broad queries, many rare specific queries
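As a rough illustration of what a power-law query distribution implies, the sketch below assigns the query at popularity rank r a weight of 1/r^s and normalizes the weights into frequencies. The exponent s = 1 and the vocabulary of 1000 distinct queries are assumptions chosen for the example, not figures from the slides.

```python
def zipf_frequencies(n_queries, exponent=1.0):
    """Relative frequency of each query rank under a 1/rank**exponent law.

    Both arguments are illustrative assumptions, not measured values.
    """
    weights = [1.0 / rank ** exponent for rank in range(1, n_queries + 1)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    freqs = zipf_frequencies(1000)
    head = sum(freqs[:10])    # the few popular, broad queries
    tail = sum(freqs[100:])   # the many rare, specific queries
    print(f"top 10 queries: {head:.1%} of traffic")
    print(f"ranks 101-1000: {tail:.1%} of traffic")
```

Under these assumptions a handful of top-ranked queries takes a large share of traffic, while the long tail of rare queries still adds up to a substantial fraction.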

How far do people look for results? (Source: iprospect.com, WhitePaper_2006_SearchEngineUserBehavior.pdf)

Architecture of a Search Engine [Diagram components: The Web, Web spider, Indexer, Indexes, Ad indexes, Search, User]

Rate of web content change: 720K pages from 270 popular sites sampled daily from Feb 17 – Jun 14, 1999 [Cho00]. Mathematically, what does this seem to be? What does this suggest for crawling policy?

Diversity
Languages/Encodings: hundreds of languages; W3C encodings: 55 (Jul01) [W3C01]
Home pages (1997): English 82%, Next 15: 13% [Babe97]
Google (mid 2001): English: 53%, JGCFSKRIP: 30%
Document & query topic: Popular Query Topics (from 1 million Google queries, Apr 2000)
1.8% Regional: Europe | 7.2% Business
… | …
2.3% Business: Industries | 7.3% Recreation
3.2% Computers: Internet | 8% Adult
3.4% Computers: Software | 8.7% Society
4.4% Adult: Image Galleries | 10.3% Regional
5.3% Regional: North America | 13.8% Computers
6.1% Arts: Music | 14.6% Arts

Search Index - Inverted File: Also store position of word in web page (“offset”) and information on HTML structure. [Figure label: Frequency]
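A minimal sketch of an inverted file that records word offsets might look as follows. The function name, the whitespace tokenization and the sample documents are assumptions made for illustration, and the HTML structure information mentioned on the slide is left out.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each word to {doc_id: [offsets]}.

    `docs` is a dict of doc_id -> text. The offset is the word's
    position in the document; the frequency of a word in a document
    is simply the length of its offset list.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for offset, word in enumerate(text.lower().split()):
            index[word].setdefault(doc_id, []).append(offset)
    return index


if __name__ == "__main__":
    docs = {
        1: "computer chess and computer go",
        2: "chess openings",
    }
    index = build_inverted_index(docs)
    for word in ("computer", "chess"):
        for doc_id, offsets in index[word].items():
            print(word, doc_id, "freq:", len(offsets), "offsets:", offsets)
```

Keeping offsets rather than just document identifiers is what makes phrase queries possible, since adjacency can be checked from the stored positions.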

The query engine The interface between the search index, the user and the web. Algorithmic details of commercial search engines are kept as trade secrets. First step is retrieval of potential results from the index. Second step is the ranking of the results based on their “relevance” to the query.
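The two steps can be sketched as below. Since the slide notes that commercial ranking algorithms are trade secrets, the scoring here is only a stand-in: it assumes AND retrieval and ranks by the summed count of query terms in each document. All names and the sample documents are hypothetical.

```python
from collections import Counter

# Hypothetical toy collection, for illustration only.
DOCS = {
    "d1": "computer chess programs beat chess champions",
    "d2": "chess openings for beginners",
    "d3": "computer hardware reviews",
}


def retrieve(terms, docs):
    """Step 1: collect candidate documents containing every query term."""
    return [doc_id for doc_id, text in docs.items()
            if all(term in text.split() for term in terms)]


def rank(candidates, terms, docs):
    """Step 2: order candidates by a stand-in relevance score.

    The score is the total number of query-term occurrences; ties are
    broken by document id so the ordering is deterministic.
    """
    def score(doc_id):
        counts = Counter(docs[doc_id].split())
        return sum(counts[term] for term in terms)

    return sorted(candidates, key=lambda doc_id: (-score(doc_id), doc_id))


def search(query, docs):
    terms = query.lower().split()
    return rank(retrieve(terms, docs), terms, docs)


if __name__ == "__main__":
    print(search("chess", DOCS))
    print(search("computer chess", DOCS))
```

Separating retrieval from ranking keeps the expensive scoring step limited to the candidate set instead of the whole collection.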

Portal User Interface

Crawling the Web
Mode of crawl: BFS.
Frequency of crawl: important.
robots.txt gives explicit directions on what not to crawl.
Parallel machines crawl all the time.
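A breadth-first crawl that respects robots.txt can be sketched as follows. To stay self-contained, the example crawls a hypothetical in-memory link graph instead of fetching pages over the network; the URLs, the `sketch-bot` agent name and the robots.txt rules are all made up for illustration. Rule checking uses Python's standard `urllib.robotparser`.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical "web": URL -> outgoing links (no real network access).
WEB = {
    "http://example.test/": ["http://example.test/a",
                             "http://example.test/private/x"],
    "http://example.test/a": ["http://example.test/b",
                              "http://example.test/"],
    "http://example.test/b": [],
    "http://example.test/private/x": ["http://example.test/secret"],
}

# Hypothetical robots.txt content for the site above.
ROBOTS_TXT = ["User-agent: *", "Disallow: /private/"]


def bfs_crawl(seed, web, robots_lines, agent="sketch-bot"):
    """Visit pages in breadth-first order, skipping disallowed URLs."""
    rules = RobotFileParser()
    rules.parse(robots_lines)

    visited = []
    seen = {seed}
    frontier = deque([seed])           # FIFO queue gives BFS order
    while frontier:
        url = frontier.popleft()
        if not rules.can_fetch(agent, url):
            continue                   # robots.txt says not to crawl this
        visited.append(url)
        for link in web.get(url, []):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


if __name__ == "__main__":
    for url in bfs_crawl("http://example.test/", WEB, ROBOTS_TXT):
        print(url)
```

Because the disallowed page is never fetched, links that are reachable only through it are never discovered either. A production crawler would additionally schedule revisits and distribute the frontier across machines, which this sketch does not attempt.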