Search Engine Architecture

Slides:



Advertisements
Similar presentations
Chapter 5: Introduction to Information Retrieval
Advertisements

UCLA : GSE&IS : Department of Information StudiesJF : 276lec1.ppt : 5/2/2015 : 1 I N F S I N F O R M A T I O N R E T R I E V A L S Y S T E M S Week.
Web Search - Summer Term 2006 III. Web Search - Introduction (Cont.) (c) Wolfgang Hürst, Albert-Ludwigs-University.
Web- and Multimedia-based Information Systems. Assessment Presentation Programming Assignment.
Information Retrieval in Practice
Search Engines and Information Retrieval
T.Sharon - A.Frank 1 Internet Resources Discovery (IRD) Classic Information Retrieval (IR)
Architecture of the 1st Google Search Engine SEARCHER URL SERVER CRAWLERS STORE SERVER REPOSITORY INDEXER D UMP L EXICON SORTERS ANCHORS URL RESOLVER (CF.
Web Search – Summer Term 2006 III. Web Search - Introduction (Cont.) - Jeff Dean, Google's Systems Lab:
Information Retrieval Concerned with the: Representation of Storage of Organization of, and Access to Information items.
Searching the Web II. The Web Why is it important: –“Free” ubiquitous information resource –Broad coverage of topics and perspectives –Becoming dominant.
Information Retrieval in Practice
Searching The Web Search Engines are computer programs (variously called robots, crawlers, spiders, worms) that automatically visit Web sites and, starting.
1 Information Retrieval and Web Search Introduction.
1 Information Retrieval and Extraction 資訊檢索與擷取 Chia-Hui Chang, Assistant Professor Dept. of Computer Science & Information Engineering National Central.
Modern Information Retrieval Chapter 1 Introduction.
Information retrieval Finding relevant data using irrelevant keys Example: database of photographic images sorted by number, date. DBMS: Well structured.
Basic IR Concepts & Techniques ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign.
Exercise 1: Bayes Theorem (a). Exercise 1: Bayes Theorem (b) P (b 1 | c plain ) = P (c plain ) P (c plain | b 1 ) * P (b 1 )
Overview of Search Engines
Databases & Data Warehouses Chapter 3 Database Processing.
CS598CXZ Course Summary ChengXiang Zhai Department of Computer Science University of Illinois, Urbana-Champaign.
Search Engines and Information Retrieval Chapter 1.
Aardvark Anatomy of a Large-Scale Social Search Engine.
CS523 INFORMATION RETRIEVAL COURSE INTRODUCTION YÜCEL SAYGIN SABANCI UNIVERSITY.
1 Information Retrieval Acknowledgements: Dr Mounia Lalmas (QMW) Dr Joemon Jose (Glasgow)
Implicit User Feedback Hongning Wang Explicit relevance feedback 2 Updated query Feedback Judgments: d 1 + d 2 - d 3 + … d k -... Query User judgment.
Search Result Interface Hongning Wang Abstraction of search engine architecture User Ranker Indexer Doc Analyzer Index results Crawler Doc Representation.
The Anatomy of a Large-Scale Hypertextual Web Search Engine Sergey Brin & Lawrence Page Presented by: Siddharth Sriram & Joseph Xavier Department of Electrical.
Introduction to Digital Libraries hussein suleman uct cs honours 2003.
Relevance Feedback Hongning Wang What we have learned so far Information Retrieval User results Query Rep Doc Rep (Index) Ranker.
Searching the web Enormous amount of information –In 1994, 100 thousand pages indexed –In 1997, 100 million pages indexed –In June, 2000, 500 million pages.
IT-522: Web Databases And Information Retrieval By Dr. Syed Noman Hasany.
Next Generation Search Engines Ehsun Daroodi 1 Feb, 2003.
Information in the Digital Environment Information Seeking Models Dr. Dania Bilal IS 530 Spring 2005.
Database VS. Search Engine Explore the difference between database* and search results Next.
Search Engines By: Faruq Hasan.
Introduction to Information Retrieval Aj. Khuanlux MitsophonsiriCS.426 INFORMATION RETRIEVAL.
Search Engine and SEO Presented by Yanni Li. Various Components of Search Engine.
Introduction to Information Retrieval Example of information need in the context of the world wide web: “Find all documents containing information on computer.
Search Result Interface Hongning Wang Abstraction of search engine architecture User Ranker Indexer Doc Analyzer Index results Crawler Doc Representation.
1 Information Retrieval LECTURE 1 : Introduction.
Information Retrieval CSE 8337 Spring 2007 Introduction/Overview Some Material for these slides obtained from: Modern Information Retrieval by Ricardo.
Information Retrieval
Information Retrieval Transfer Cycle Dania Bilal IS 530 Fall 2007.
Augmenting (personal) IR Readings Review Evaluation Papers returned & discussed Papers and Projects checkin time.
Chapter. 3: Retrieval Evaluation 1/2/2016Dr. Almetwally Mostafa 1.
Relevance Feedback Hongning Wang
The Anatomy of a Large-Scale Hypertextual Web Search Engine (The creation of Google)
Presented By: Carlton Northern and Jeffrey Shipman The Anatomy of a Large-Scale Hyper-Textural Web Search Engine By Lawrence Page and Sergey Brin (1998)
Searching the Web for academic information Ruth Stubbings.
Information Retrieval in Practice
Information Retrieval in Practice
Information Storage and Retrieval Fall Lecture 1: Introduction and History.
Visual Information Retrieval
Search Engine Architecture
Information Retrieval (in Practice)
Chapter Five Web Search Engines
Information Retrieval and Web Search
Search Engine Architecture
Information Retrieval and Web Search
Submitted By: Usha MIT-876-2K11 M.Tech(3rd Sem) Information Technology
WIRED Week 2 Syllabus Update Readings Overview.
Thanks to Bill Arms, Marti Hearst
CSE 635 Multimedia Information Retrieval
Introduction to Information Retrieval
Search Engine Architecture
Internet Basics and Information Literacy
Information Retrieval and Web Design
Presentation transcript:

Search Engine Architecture Hongning Wang CS@UVa

Classical search engine architecture “The Anatomy of a Large-Scale Hypertextual Web Search Engine” - Sergey Brin and Lawrence Page, Computer networks and ISDN systems 30.1 (1998): 107-117. Crawler and indexer Citation count: 12197 (as of Aug 27, 2014) Citation count: 13727 (as of Aug 30, 2015) Query parser Document Analyzer Ranking model CS@UVa CS4501: Information Retrieval

CS4501: Information Retrieval User input Result display Result post-processing Query parser Ranking model Domain specific database Crawler & Indexer Document analyzer & auxiliary database CS@UVa CS4501: Information Retrieval

Abstraction of search engine architecture Indexed corpus Crawler Research attention Ranking procedure Evaluation Feedback Doc Representation Doc Analyzer Query Rep (Query) User Ranker Index results Indexer CS@UVa CS4501: Information Retrieval

CS4501: Information Retrieval Core IR concepts Information need “an individual or group's desire to locate and obtain information to satisfy a conscious or unconscious need” – wiki An IR system is to satisfy users’ information need Query A designed representation of users’ information need In natural language, or some managed form CS@UVa CS4501: Information Retrieval

CS4501: Information Retrieval Core IR concepts Document A representation of information that potentially satisfies users’ information need Text, image, video, audio, and etc. Relevance Relatedness between documents and users’ information need Multiple perspectives: topical, semantic, temporal, spatial, and etc. One sentence about IR - “rank documents by their relevance to the information need” CS@UVa CS4501: Information Retrieval

Key components in a search engine Web crawler A automatic program that systematically browses the web for the purpose of Web content indexing and updating Document analyzer & indexer Manage the crawled web content and provide efficient access of web documents CS@UVa CS4501: Information Retrieval

Key components in a search engine Query parser Compile user-input keyword queries into managed system representation Ranking model Sort candidate documents according to it relevance to the given query Result display Present the retrieved results to users for satisfying their information need CS@UVa CS4501: Information Retrieval

Key components in a search engine Retrieval evaluation Assess the quality of the return results Relevance feedback Propagate the quality judgment back to the system for search result refinement CS@UVa CS4501: Information Retrieval

Key components in a search engine Search query logs Record users’ interaction history with search engine User modeling Understand users’ longitudinal information need Assess users’ satisfaction towards search engine output CS@UVa CS4501: Information Retrieval

Discussion: Browsing v.s. Querying Browsing – what Yahoo did before The system organizes information with structures, and a user navigates into relevant information by following a path enabled by the structures Works well when the user wants to explore information or doesn’t know what keywords to use, or can’t conveniently enter a query (e.g., with a smartphone) Querying – what Google does A user enters a (keyword) query, and the system returns a set of relevant documents Works well when the user knows exactly what query to use for expressing her information need CS@UVa CS4501: Information Retrieval

Pull vs. Push in Information Retrieval Pull mode – with query Users take initiative and “pull” relevant information out from a retrieval system Works well when a user has an ad hoc information need Push mode – without query Systems take initiative and “push” relevant information to users Works well when a user has a stable information need or the system has good knowledge about a user’s need CS@UVa CS4501: Information Retrieval

CS4501: Information Retrieval What you should know Basic workflow and components in a IR system Core concepts in IR Browsing v.s. querying Pull v.s. push of information CS@UVa CS4501: Information Retrieval

CS4501: Information Retrieval Today’s reading Introduction to Information Retrieval Chapter 19: Web search basics CS@UVa CS4501: Information Retrieval