

Internet Information Retrieval
Sun Wu

Course Goal
To learn the basic concepts and techniques of internet search engines:
–How to use and evaluate search engines
–How they work
–How to design and develop a large-scale internet search engine
–Research issues in internet search engines

Outline
Basic Introduction
Data Crawling
Data Preprocessing and Mining
Index and Search Kernel (Full Text Retrieval System)
User Interface and Query Processing
Service Maintenance and Management

Basic Introduction
–Introduction to the WWW and history of search engines
–Search engine architecture
–Classification of search engines and applications
–Evaluation of search engines
–Search engine market and SEO (Search Engine Optimization)

Data Crawling
A web search engine needs to crawl web data at large scale (billions of web pages/objects) efficiently!
Techniques and design issues
–Architecture of a distributed crawling system
–Optimization of crawling efficiency
–Focused crawling
–Crawling quality optimization: URL job queue management, selection, filtering, and scheduling. Spam and porn data detection
–Incremental crawling
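
The URL job queue management, selection, filtering, and scheduling listed above can be illustrated with a small in-memory sketch. This is not the course's crawler design: the class name `UrlFrontier`, the numeric `priority` score, and the `blocked_hosts` filter are all assumptions made for illustration, and a real distributed crawler would also need per-host politeness delays and persistent storage.

```python
import heapq
from urllib.parse import urlsplit


class UrlFrontier:
    """Hypothetical URL job queue: filters, de-duplicates, and schedules URLs."""

    def __init__(self, blocked_hosts=()):
        self._heap = []                    # entries: (negated priority, insertion order, url)
        self._seen = set()                 # URLs already queued, so each is scheduled once
        self._counter = 0                  # tie-breaker that preserves insertion order
        self._blocked = set(blocked_hosts) # assumed filter list, e.g. known spam hosts

    def add(self, url, priority=0.0):
        """Queue url unless it is malformed, blocked, or a duplicate.

        Returns True when the URL was queued.
        """
        host = urlsplit(url).hostname
        if host is None or host in self._blocked or url in self._seen:
            return False
        self._seen.add(url)
        # heapq is a min-heap, so the priority is negated to pop the largest first.
        heapq.heappush(self._heap, (-priority, self._counter, url))
        self._counter += 1
        return True

    def pop(self):
        """Return the highest-priority URL, or None when the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


frontier = UrlFrontier(blocked_hosts={"spam.example"})
frontier.add("http://a.example/", priority=0.2)
frontier.add("http://b.example/", priority=0.9)
frontier.add("http://spam.example/x")   # rejected by the host filter
next_url = frontier.pop()               # the URL with the highest priority
```

How the priority is computed (for example from link analysis or page freshness) is left open here; the sketch only shows where such a score would plug in.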

Data Preprocessing and Mining
Before indexing, many data preprocessing and mining tasks have to be done. The goal of data preprocessing and mining is to
–Optimize the data quality and transform the data into a form suitable for indexing
–Extract valuable information that is useful for the search engine service
Spam detection and data filtering
–Some spam data cannot be caught in the crawling phase, so it has to be detected after crawling.

Data Preprocessing and Mining
Data partition:
–Language partition
–URL partition
–Data type partition
Redundancy removal:
–Cross-site redundancy removal
–In-site redundancy removal
Link analysis to find relationships between web content and assign ranking scores to web sites/pages.
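
The slides do not say which link-analysis method the course uses. As one well-known example of assigning ranking scores from links, the sketch below implements a basic PageRank power iteration. The function name `pagerank`, the damping factor of 0.85, the fixed iteration count, and the toy link graph are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Basic PageRank by power iteration (illustrative sketch).

    links maps each page to the list of pages it links to.
    Returns a dict mapping each page to a score; scores sum to 1.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    if not pages:
        return {}

    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives the same baseline share from the random jump.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Toy graph: "c" is linked from three pages, "d" from none.
toy_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(toy_links)
best_page = max(scores, key=scores.get)
```

On web-scale data the same computation would run over a partitioned link graph rather than an in-memory dictionary.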

Index and Search Kernel
A full-text retrieval system is needed. Hashing and inverted indexes are the basic techniques.
Inverted index architecture and techniques
Index performance optimization
Search kernel query processing optimization
Ranking techniques
Distributed indexing and searching
–Horizontal partition
–Vertical partition
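
To make the inverted index concrete, here is a minimal single-machine sketch: it maps each term to a sorted list of document ids (held in a hash table, Python's `dict`) and answers an AND query by intersecting posting lists. The function names, the simple lowercase tokenizer, and the sample documents are assumptions for illustration; ranking, compression, and distributed partitioning are not covered.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index from a dict of doc_id -> text.

    Returns a dict mapping each term to a sorted list of doc ids.
    """
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def intersect(a, b):
    """Intersect two sorted posting lists with a linear merge."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def search(index, query):
    """Return the ids of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    # Start with the shortest posting list to keep intermediate results small.
    lists = sorted((index.get(term, []) for term in terms), key=len)
    result = lists[0]
    for plist in lists[1:]:
        result = intersect(result, plist)
    return result


docs = {
    1: "web search engine",
    2: "web crawler design",
    3: "search engine ranking",
}
index = build_index(docs)
hits = search(index, "search engine")
```

Processing the shortest posting list first is one simple example of the query processing optimization mentioned above.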

User Interface and Query Processing
Process the user’s query based on the search kernel
Design issues for user friendliness
SERP (Search Engine Result Page) design
Integrated and classified search results
Query session management and interactive search
–Error correction
–Recommended and related searches
–User ranking feedback
Personalization
–Locality tuning
–Personalized search ranking adjustment
–Search result management
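
Error correction of queries can be approached in many ways, and the slides do not name one. The sketch below shows a simple option: suggest the vocabulary term with the smallest Levenshtein edit distance to a misspelled word, breaking ties by term frequency. The names `edit_distance` and `suggest`, the `max_distance` threshold, and the sample vocabulary with its counts are assumptions for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or keep on a match
            ))
        prev = curr
    return prev[-1]


def suggest(word, vocabulary, max_distance=2):
    """Return a "did you mean" candidate for word, or None.

    vocabulary maps known terms to their frequency (made-up counts here).
    A word that is already in the vocabulary needs no correction.
    """
    if word in vocabulary:
        return None
    best = None
    for term, freq in vocabulary.items():
        distance = edit_distance(word, term)
        if distance <= max_distance:
            # Prefer the smaller distance, then the higher frequency.
            key = (distance, -freq, term)
            if best is None or key < best:
                best = key
    return best[2] if best else None


vocabulary = {"search": 120, "engine": 90, "retrieval": 40, "crawler": 25}
correction = suggest("engin", vocabulary)
```

Scanning the whole vocabulary is only practical for small term lists; a production system would restrict candidates first, for example with an n-gram index.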

Service Maintenance and Management
Search engine service management
–Bot detection and service security
–Fault tolerance and load balancing
–Log analysis
–Performance optimization and query caching
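
Query caching can be sketched as a small cache in front of the search kernel. The slides do not specify an eviction policy; the example below assumes least-recently-used (LRU) eviction, and the class name `QueryCache`, its `capacity` parameter, and the stand-in `backend` function are hypothetical.

```python
from collections import OrderedDict


class QueryCache:
    """Hypothetical LRU cache for search results, keyed by normalized query."""

    def __init__(self, capacity=1000):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._entries = OrderedDict()  # oldest entry first, most recent last
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _normalize(query):
        """Lowercase and collapse whitespace so equivalent queries share a key."""
        return " ".join(query.lower().split())

    def get_or_compute(self, query, compute):
        """Return cached results, or call compute(normalized_query) and cache them."""
        key = self._normalize(query)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        result = compute(key)
        self._entries[key] = result
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used
        return result


def backend(query):
    """Stand-in for the real search kernel (assumption for this sketch)."""
    return ["result for " + query]


cache = QueryCache(capacity=2)
first = cache.get_or_compute("Web   Search", backend)   # miss, computed
second = cache.get_or_compute("web search", backend)    # hit, same normalized key
```

The `hits` and `misses` counters are included because the hit ratio is a natural figure to track in log analysis.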

Prerequisites
Solid background in data structures
Basic web programming experience

Textbooks
No textbooks are used.
Students are encouraged to use search engines to find related information, articles, papers, guides, and software on the web.

Course Requirements
No examinations
Approximately two project assignments
A solid written report plus an oral presentation is required.