Searching the World Wide Web, by S. Lawrence and C.L. Giles. Presented by Robert Cadwgan-Evans and Simon Munday.


1 Searching the World Wide Web
S. Lawrence and C.L. Giles
Presented by Robert Cadwgan-Evans and Simon Munday

2 Introduction
Analyse the paper
– Coverage of search engines
– Size of the Indexable Web
Consider search and Internet development from 1998 to today
The future of searching

3 Paper Outline
Published April 1998, data collected in 1997
Investigates the comparative coverage of the web by the major search engines of the time
Attempts to put a figure on the size of the web
Important because it provides a way to measure the size of the web

4 Search Engine Coverage: The Test
Coverage: percentage of the unique list that an individual engine returns in its queries
[Diagram: 575 queries are sent to HotBot, Northern Light, Excite, Infoseek, Lycos and AltaVista; their results are combined into a list of unique results from all queries]
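
The coverage measure described on this slide can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the authors' code: the `coverage` helper and the page identifiers are hypothetical, and the toy data is made up rather than taken from the paper.

```python
def coverage(results_by_engine):
    """Return each engine's coverage (%) of the combined unique result list."""
    # The list of unique results is the union of every engine's results.
    unique = set().union(*results_by_engine.values())
    if not unique:
        return {engine: 0.0 for engine in results_by_engine}
    return {
        # set() guards against an engine returning the same page twice.
        engine: 100.0 * len(set(results)) / len(unique)
        for engine, results in results_by_engine.items()
    }


# Hypothetical toy data: made-up page identifiers, not results from the paper.
toy_results = {
    "HotBot": {"p1", "p2", "p3"},
    "AltaVista": {"p2", "p3", "p4"},
    "Lycos": {"p5"},
}
toy_coverage = coverage(toy_results)
```

Because the denominator is the combined list rather than the whole web, a figure computed this way is relative to what the engines found together, not an absolute share of the web.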

5 Search Engine Coverage: Results
Results of search engine coverage using this test:

Search Engine     Coverage (%)
HotBot            57.5
AltaVista         46.5
Northern Light    32.9
Excite            23.1
Infoseek          16.5
Lycos             4.41

Even the most successful of the engines, HotBot, doesn't manage to cover two thirds of the result set from all engines

6 Size of the Indexable Web: Method
Estimated from an analysis of the overlap between search engines
N: set of indexable web pages
N_a: set of results returned by search engine A
N_b: set of results returned by search engine B
N_0: set of results returned by both A and B (the overlap)
An estimate of the fraction of the indexable web covered by engine A can be calculated:
P_a = N_0 / N_b
From this fraction, an estimate for the overall size of the indexable web, N, can be calculated, where S_a is the number of pages indexed by engine A:
N = S_a / P_a
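
The two formulas on this slide can be worked through as a short Python sketch. It is an illustration only: the `estimate_web_size` helper, the page identifiers and the index size of 100 are hypothetical values chosen to make the arithmetic easy to follow, not figures from the paper.

```python
def estimate_web_size(results_a, results_b, index_size_a):
    """Estimate N = S_a / P_a, where P_a = N_0 / N_b."""
    n_b = len(set(results_b))                     # size of N_b
    n_0 = len(set(results_a) & set(results_b))    # size of the overlap N_0
    if n_b == 0 or n_0 == 0:
        raise ValueError("a non-empty overlap is needed to form an estimate")
    p_a = n_0 / n_b            # estimated fraction of the web covered by A
    return index_size_a / p_a  # estimated size of the indexable web


# Hypothetical toy data: A returns 4 pages, B returns 8, and 2 are shared.
results_a = {"p1", "p2", "p3", "p4"}
results_b = {"p3", "p4", "p5", "p6", "p7", "p8", "p9", "p10"}

# P_a = 2 / 8 = 0.25, so with an assumed index size S_a = 100: N = 100 / 0.25
estimated_n = estimate_web_size(results_a, results_b, index_size_a=100)
```

The estimate treats engine B's results as a sample of the web and asks what share of that sample engine A also holds, so it is only as good as the assumption that the two engines index pages independently of each other.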

7 Size of the Indexable Web: Examples
Little overlap: each engine is missing many of the other's results, so neither covers much of the web
Big overlap: the two result sets are almost complete, so together they must contain most of the web
Works on the assumption of randomness and independence

8 Size of the Indexable Web: Results
Comparison between pairs of search engines

Search Engines                  Indexable Web (millions of pages)
Lycos and Infoseek              90
Infoseek and Excite             220
Excite and Northern Light       230
Northern Light and AltaVista    230
AltaVista and HotBot            320

Paper selects the largest of these, 320 million pages, as an estimate for the size of the indexable web

9 Paper Summary
Paper admits the size is an estimate; the actual figure is probably larger
Query terms are based on scientists' searching habits, not those of the general public
This estimate suggests that previous estimates of as little as 75 million pages are incorrect

10 Current Technology
Newcomers: Google, Yahoo, MSN and Ask Jeeves
Size of the web has exploded in the last 5 years [1]

11 Size of the Web Today
Up-to-date, accurate measurement is difficult, but current figures put the size of the web at around 11.5 billion pages [2]
Currently indexed: 9.4 billion pages [2]
Google indexes 8 billion pages, but also takes searching further, indexing 880 million images [3]
Does a bigger index mean better quality results?
A larger index could hamper performance [4]

12 Specialized Search Engines
With such large search engines providing general results, more specialized search engines have emerged.

13 The Future
The Deep Web refers to the databases from which dynamic pages are generated
Over 200,000 deep websites exist [5]
Examples include eBay and Amazon
The Deep Web is 400 to 550 times larger than the surface web [5]

14 Conclusion
Estimating the size of the web is difficult, and an exact figure is not yet possible
Paper does a good job of showing that previous estimates are far too low (even if its own is low)
The inclusion of the deep web will only make the problem harder

15 References
1. Search Engine Sizes, D. Sullivan, January 2005.
2. The Indexable Web is More than 11.5 Billion Pages, A. Gulli and A. Signorini, 2005.
3. Google Product Descriptions.
4. Accessibility of Information on the Web, S. Lawrence and C. Giles, Nature, 400.
5. The Deep Web: Surfacing Hidden Value, Michael K. Bergman, 2001.

