Hyper-Searching the Web. Search Engines Basic Search (index) Cluster Search (themes) Meta-search (outsource) “Smarter” meta-search (themes + outsource)

Presentation transcript:

Hyper-Searching the Web

Search Engines
- Basic search (index)
- Cluster search (themes)
- Meta-search (outsource)
- “Smarter” meta-search (themes + outsource)

Basic search engine
- Examples: AltaVista, InfoSeek, HotBot, Lycos, Excite, Google, etc.
- Maintains an index of every word found
- Works by crawling, indexing, and returning results
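The "index for every word" idea can be illustrated with a small inverted index. This is only a sketch: the page ids, sample texts, and the function names `build_index` and `search` are made up for illustration, and a real engine would also crawl, tokenize, and rank far more carefully.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index


def search(index, query):
    """Return the ids of pages that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical "crawled" pages.
pages = {
    "a": "jaguar car dealer",
    "b": "jaguar habitat in the rainforest",
    "c": "used car prices",
}
index = build_index(pages)
print(sorted(search(index, "jaguar car")))
```

A query for "automobile" returns nothing here even though pages "a" and "c" are about cars, which is the synonymy problem in miniature.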

Basic search engine
- Different ranking systems are used
- Most use heuristics (the easiest solution), such as counting the number of keywords that appear
- Google uses PageRank
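A keyword-count heuristic of the kind described above might look like the following. This is a sketch under assumptions: the function names `keyword_score` and `rank` and the sample pages are hypothetical, and it does not represent any particular engine's ranking.

```python
def keyword_score(text, query):
    """Count how many times the query words occur in the text."""
    words = text.lower().split()
    return sum(words.count(term) for term in query.lower().split())


def rank(pages, query):
    """Order page ids by descending keyword count, dropping pages with no match."""
    scored = [(keyword_score(text, query), page_id) for page_id, text in pages.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [page_id for score, page_id in scored if score > 0]


# Hypothetical pages: "a" simply repeats the keyword more often.
pages = {
    "a": "car car car for sale",
    "b": "the history of the car",
    "c": "gardening tips",
}
print(rank(pages, "car"))
```

Page "a" outranks page "b" only because it repeats the word, which shows how easily a pure counting heuristic can be misled.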

Basic search engine
- No idea of the searcher’s intent, so the “best” result is hard to achieve
- Problems with synonymy and polysemy
  - synonymy, e.g. car and automobile
  - polysemy, e.g. jaguar
- One solution: store semantic relations (helps only with synonymy)
- Can’t identify concepts or author intent, e.g. the IBM site does not say “computer”

Cluster search engine
- Example: Clusty
- Clusters results into categories/themes
- Can show results that would be ranked lower in another search engine: because words have several meanings, the less-searched-for meanings also appear

Meta-search engine
- Examples: Dogpile, Surfwax, Copernic, etc.
- Sends the searcher’s query to a “database” of search engines
- Claimed to be no better than the engines in its database; often the referenced search engines are small, free, or commercial
- Users can create their own on Google, using up to 5,000 URLs as the “database”
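The "outsourcing" step can be sketched as forwarding one query to several engines and merging the ranked lists. Everything here is an assumption for illustration: `engine_one` and `engine_two` are stand-in functions with invented results, and the reciprocal-position scoring is just one simple way to merge, not how Dogpile or any other named service works.

```python
def meta_search(query, engines):
    """Send the query to each engine and merge results by summed 1/position."""
    scores = {}
    for engine in engines:
        for position, url in enumerate(engine(query), start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / position
    # Highest combined score first; ties broken alphabetically.
    return sorted(scores, key=lambda url: (-scores[url], url))


def engine_one(query):
    """Hypothetical engine returning a fixed ranked list."""
    return ["x.example", "y.example", "z.example"]


def engine_two(query):
    """Second hypothetical engine with a different ranking."""
    return ["y.example", "w.example"]


merged = meta_search("jaguar", [engine_one, engine_two])
print(merged)
```

A page that appears in both lists ("y.example") rises to the top, but the merged list can never contain anything the underlying engines did not return.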

“Smarter” meta-search engine
- Example: Clever project (not yet available online)
- Includes clustering and linguistic analysis
- Example query “cat”:
  - Cat – feline
  - Cat – power
  - Cat – equipment
  - Cat – scans
  - etc.

The Clever Project
- Uses hyperlinks to locate hubs and authorities
- “A respected authority is a page that is referred to by many good hubs; a useful hub is a location that points to many valuable authorities”

The Clever Project
- Obtains a list of web pages from a standard index and follows hyperlinks to enlarge its own database
- The resulting collection is the “root set”
- Each page gets a numerical hub score and a numerical authority score
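The mutually reinforcing hub and authority scores can be sketched with a short iteration. This is a minimal sketch assuming the root set has already been collected into a `links` dictionary; the function name `hits`, the iteration count, the normalization, and the page names are illustrative choices, not details taken from the Clever system itself.

```python
def hits(links, iterations=20):
    """Compute hub and authority scores.

    links maps each page to the list of pages it points to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}  # initial guess: every page is an equal hub
    for _ in range(iterations):
        # A page's authority is the sum of the hub scores of pages linking to it.
        auth = {p: 0.0 for p in pages}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]
        # A page's hub score is the sum of the authority scores it points to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # Normalize so the scores stay bounded between iterations.
        auth_norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        hub_norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        auth = {p: v / auth_norm for p, v in auth.items()}
        hub = {p: v / hub_norm for p, v in hub.items()}
    return hub, auth


# Hypothetical root set: three link pages pointing at two content pages.
links = {
    "hub1": ["auth1", "auth2"],
    "hub2": ["auth1", "auth2"],
    "hub3": ["auth1"],
}
hub, auth = hits(links)
print(max(auth, key=auth.get), max(hub, key=hub.get))
```

In this toy graph "auth1" ends up as the top authority because all three hubs point to it, while "hub3" scores lower as a hub because it points to only one authority.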

The Clever Project
- Similar to PageRank in how scores are determined: starts from initial guesses and recalculates repeatedly
- Useful by-product: clusters sites
- Copes with competition: competitors don’t have to acknowledge each other through hyperlinks, because hubs that point to both still tie them together

Clever vs. Google

GOOGLE
- gives initial rankings
- keeps pages independent of queries
- faster
- looks forward, “link to link”

CLEVER
- root sets per keyword
- page priority through query context
- looks forwards and backwards, “hub and authority”
- sometimes too broad, e.g. Fallingwater
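To make the "independent of queries" point concrete, here is a rough sketch of a PageRank-style calculation that scores pages from the link graph alone, before any query arrives. The damping value of 0.85, the handling of pages without outgoing links, the function name `pagerank`, and the three-page graph are all assumptions for illustration, not a description of Google's production system.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Score pages from link structure alone.

    links maps each page to the list of pages it points to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # initial guess: all pages equal
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Pass this page's rank forward along its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: "a" and "b" both link to "c", and "c" links back to "a".
links = {"a": ["c"], "b": ["c"], "c": ["a"]}
rank = pagerank(links)
print(sorted(rank, key=rank.get, reverse=True))
```

Because no query appears anywhere in the calculation, the scores can be computed once in advance, which fits the slide's note that Google's approach is faster at query time.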