Searching the World Wide Web

Searching the World Wide Web
S. Lawrence and C.L. Giles
Presented by Robert Cadwgan-Evans and Simon Munday

Introduction
Analyse the paper
  Coverage of search engines
  Size of the indexable web
Consider search and Internet development from 1998 to today
The future of searching

Paper Outline
Published April 1998; data collected in 1997
Investigates the comparative coverage of the internet by the major search engines of the time
Attempts to put a figure on the size of the web
Important because it provides a way to measure the size of the web
Introduction slide: what the paper is about, and why we picked it

Search Engine Coverage: The Test
575 queries are sent to each of six search engines: AltaVista, Excite, HotBot, Infoseek, Lycos and Northern Light
The results from every engine are combined into a list of unique results from all queries
Coverage: the percentage of the unique list that an individual engine returns in its queries
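
The coverage measure on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the paper's actual procedure: the function name `coverage_percentages`, the engine labels and the URLs are all hypothetical, and it assumes each engine's results have already been collected across all queries into one collection of URLs.

```python
def coverage_percentages(results_by_engine):
    """Return each engine's share (in %) of the pooled list of unique results.

    results_by_engine maps an engine name to the URLs it returned
    (hypothetical input format, assumed for this sketch).
    """
    # Pool every engine's results into one set of unique URLs.
    pooled = set().union(*results_by_engine.values())
    if not pooled:
        raise ValueError("no results returned by any engine")
    return {
        engine: 100.0 * len(set(urls)) / len(pooled)
        for engine, urls in results_by_engine.items()
    }


# Toy data: three made-up engines and five unique made-up URLs in total.
toy_results = {
    "EngineA": {"u1", "u2", "u3"},
    "EngineB": {"u3", "u4"},
    "EngineC": {"u5"},
}
coverage = coverage_percentages(toy_results)
print(coverage)  # {'EngineA': 60.0, 'EngineB': 40.0, 'EngineC': 20.0}
```

Because the percentages are relative to the pooled list, they describe how the engines compare with each other, not what fraction of the whole web each one covers.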

Search Engine Coverage: Results
Results of search engine coverage using this test:
Search Engine     Coverage (%)
HotBot            57.5
AltaVista         46.5
Northern Light    32.9
Excite            23.1
Infoseek          16.5
Lycos             4.41
Even the most successful of the engines, HotBot, doesn't manage to cover two thirds of the result set from all engines

Size of the Indexable Web: Method
Estimated from an analysis of the overlap between search engines
N: the set of indexable web pages
Na: the set of results returned by search engine A
Nb: the set of results returned by search engine B
N0: the set of results returned by both A and B (the overlap)
Sa: the number of pages indexed by search engine A
Writing |X| for the number of pages in a set X, the fraction of the indexable web covered by engine A can be estimated as: Pa = |N0| / |Nb|
From this fraction, the overall size of the indexable web can be estimated as: |N| = Sa / Pa
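
The overlap method above can be sketched as follows. This is an illustrative sketch only: the function name `estimate_web_size`, the toy URLs and the index size of 100 are invented for the example, and it assumes Sa means the number of pages engine A has indexed.

```python
def estimate_web_size(results_a, results_b, index_size_a):
    """Estimate the size of the indexable web from two engines' results.

    results_a, results_b: URLs returned by engines A and B.
    index_size_a: number of pages engine A has indexed (Sa, assumed known).
    """
    set_a, set_b = set(results_a), set(results_b)
    if not set_b:
        raise ValueError("engine B returned no results")
    overlap = set_a & set_b  # N0: results returned by both engines
    if not overlap:
        raise ValueError("no overlap between the engines; estimate undefined")
    coverage_a = len(overlap) / len(set_b)  # Pa = N0 / Nb
    return index_size_a / coverage_a  # N = Sa / Pa


# Toy data with hypothetical URLs: 2 of B's 8 results are also returned by A.
results_a = {"u1", "u2", "u3", "u4"}
results_b = {"u3", "u4", "u5", "u6", "u7", "u8", "u9", "u10"}
web_size_estimate = estimate_web_size(results_a, results_b, index_size_a=100)
print(web_size_estimate)  # 400.0
```

Here engine A finds a quarter of what engine B returns, so A is taken to cover a quarter of the web, and its index of 100 pages scales up to an estimate of 400.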

Size of the Indexable Web: Examples
Little overlap suggests that each engine is missing many of the other's results, so neither covers much of the web
Large overlap suggests that the result sets are almost complete, so each must contain most of the web
Works on the assumption that the engines index pages randomly and independently of each other
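
A toy example, with invented numbers, hints at why the independence assumption matters. It builds a made-up web of 1,000 pages and two hypothetical engines that each index 400 of them, then compares the overlap estimate when engine B's choices are unrelated to engine A's with the estimate when B favours the pages A already holds. The names `overlap_estimate`, `independent_b` and `correlated_b` are illustrative only, and each engine's whole index stands in for its result set.

```python
def overlap_estimate(index_a, index_b):
    """Overlap-based size estimate, using each index as its result set."""
    overlap = len(index_a & index_b)
    # Equivalent to Sa / (N0 / Nb), written to avoid a rounded intermediate.
    return len(index_a) * len(index_b) / overlap


true_web_size = 1000
engine_a = set(range(400))  # A indexes pages 0..399

# Independent case: B holds 40% of A's pages and 40% of all the other pages.
independent_b = set(range(160)) | set(range(400, 640))

# Correlated case: B holds 75% of A's pages but few of the others.
correlated_b = set(range(300)) | set(range(400, 500))

independent_estimate = overlap_estimate(engine_a, independent_b)
correlated_estimate = overlap_estimate(engine_a, correlated_b)
print(independent_estimate)        # 1000.0
print(round(correlated_estimate))  # 533
```

In this toy case the estimate is exact when the engines are independent and falls to roughly half the true size when they favour the same pages, which is one reason an overlap-based figure is better read as a lower bound.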

Size of the Indexable Web: Results
Comparison between pairs of search engines:
Search Engines                  Indexable Web (millions of pages)
Lycos and Infoseek              90
Infoseek and Excite             220
Excite and Northern Light       230
Northern Light and AltaVista    (missing)
AltaVista and HotBot            320
The paper selects the largest of these, 320 million pages, as its estimate for the size of the indexable web

Paper Summary
The paper admits that its figure is an estimate; the actual size is probably larger
Query terms were based on scientists' searching habits, not those of the general public
The estimate suggests that previous estimates of as little as 75 million pages are incorrect
Results of the test from the paper

Current Technology
Newcomers: Google, Yahoo, MSN and Ask Jeeves
The size of the web has exploded in the last 5 years [1]
Dot com boom…

Size of the Web Today
Up-to-date and accurate measurement is difficult
But current figures put the size of the web at around 11.5 billion pages [2]
Currently, 9.4 billion pages are indexed [2]
Google indexes 8 billion pages, and also takes searching further by indexing 880 million images [3]
Does a bigger index mean better-quality results?
A larger index could hamper performance [4]
Where we got the figures from

Specialized Search Engines
With such big search engines providing general results, more specialized search engines have emerged:

The Future
The Deep Web refers to the databases from which dynamic pages are created
Over 200,000 deep web sites exist [5]
Examples include eBay and Amazon
The Deep Web is 400 to 550 times larger than the “surface web” [5]

Conclusion
Estimating the size of the web is difficult, and an exact measurement is not yet possible
The paper does a good job of showing that previous estimates are far too low (even if its own is low)
Including the deep web will only make the problem harder

References
1. Search Engine Sizes, D. Sullivan, January 2005, http://searchenginewatch.com/reports/article.php/2156481
2. The Indexable Web is More than 11.5 Billion Pages, A. Gulli and A. Signorini, 2005, http://citeseer.ist.psu.edu/gulli05indexable.html
3. Google Product Descriptions, http://www.google.co.uk/press/descriptions.html
4. Accessibility of Information on the Web, S. Lawrence and C. Giles, Nature, 400:107-109, 1999
5. The Deep Web: Surfacing Hidden Value, Michael K. Bergman, 2001, http://beta.brightplanet.com/deepcontent/turtorials/DeepWeb/index.asp