
1 February 17, 2011

2

3 “There is no practical obstacle whatever now to the creation of an efficient index to all human knowledge, ideas and achievements, to the creation, that is, of a complete planetary memory for all mankind. And not simply an index; the direct reproduction of the thing itself can be summoned to any properly prepared spot. … This in itself is a fact of tremendous significance. It foreshadows a real intellectual unification of our race. The whole human memory can be, and probably in a short time will be, made accessible to every individual.” H. G. Wells (1937)

4
• One of the facilities or services provided by certain of the computers on the Internet
• A logical network of web pages that need not be on physically connected computers

5
http://www.ksg.harvard.edu/
http://www.president.harvard.edu/
http://www.news.harvard.edu/gazette/…
http://www.harvard.edu
http://www.brighamandwomens.org/PressReleases/…
http://www.harvard.edu

6 Your computer sends a request for “www.president.harvard.edu” over the Internet to Harvard’s computer and receives HTML code in return. URL = Uniform Resource Locator
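The request half of this exchange can be sketched in a few lines of Python. This is only an illustration, not how any particular browser works: it splits the URL into its parts with the standard library and builds the text of a bare HTTP/1.1 GET request without sending it. The helper name `build_get_request` and the choice of HTTP/1.1 are assumptions made for the example.

```python
from urllib.parse import urlsplit


def build_get_request(url: str) -> str:
    """Return the text of a minimal HTTP/1.1 GET request for `url` (nothing is sent)."""
    parts = urlsplit(url)
    path = parts.path or "/"  # an empty path means the site's root page
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {parts.hostname}\r\n"  # the computer that should answer
        "Connection: close\r\n"
        "\r\n"  # a blank line ends the request headers
    )


request_text = build_get_request("http://www.president.harvard.edu/")
print(request_text)
```

The server's reply to such a request would carry the HTML code that the browser then renders.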

7 We know where you are!

8 February 25, 2010

9 … search companies log your searches …

10 February 22, 2010

11

12
• Finding pages referring to the search terms
• Deciding which pages are the most “relevant”

13
1. Build an index ahead of time
   Eddington: URL, URL, …
   Edison: URL, URL, …
   Edmonton: URL, URL, …
2. When queried, look up in the index
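The two steps above can be sketched as a small inverted index in Python. This is a toy under stated assumptions: the page texts and `example.com` URLs are made up, words are split on whitespace only, and the function names `build_index` and `lookup` are hypothetical.

```python
from collections import defaultdict


def build_index(pages: dict[str, str]) -> dict[str, list[str]]:
    """Step 1: map every word to the sorted list of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return {word: sorted(urls) for word, urls in index.items()}


def lookup(index: dict[str, list[str]], term: str) -> list[str]:
    """Step 2: answer a query with a single dictionary lookup."""
    return index.get(term.lower(), [])


# Made-up pages, for illustration only.
pages = {
    "http://example.com/a": "Eddington measured starlight",
    "http://example.com/b": "Edison and Eddington",
    "http://example.com/c": "Edmonton weather",
}
index = build_index(pages)
print(lookup(index, "Eddington"))
```

All the expensive work happens in `build_index`, before any query arrives; answering a query never touches the page texts again.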

14
• Google “crawls” the entire Web, following links and loading the pages they point to
• Every time it retrieves a page, it
  ◦ indexes everything on the page
  ◦ maybe keeps a “cached” copy of the page
• A complete crawl probably takes a week or two
• Opt-out
• Caching and copyrights?
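The link-following loop can be sketched as a breadth-first traversal. This is a simplified illustration, not a description of Google's crawler: the "web" here is a hypothetical in-memory dictionary (`TOY_WEB`) rather than real pages fetched over the network, and the function name `crawl` is an assumption.

```python
from collections import deque

# Hypothetical toy web: each page maps to the pages it links to.
TOY_WEB = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example", "d.example"],
    "d.example": [],
    "e.example": ["a.example"],  # no page links here, so the crawl never finds it
}


def crawl(seed, fetch_links):
    """Visit every page reachable from `seed`, each exactly once."""
    seen = {seed}
    queue = deque([seed])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)  # a real crawler would index (and maybe cache) the page here
        for link in fetch_links(page):
            if link not in seen:  # never load the same page twice
                seen.add(link)
                queue.append(link)
    return order


visited = crawl("a.example", lambda page: TOY_WEB.get(page, []))
print(visited)
```

Note that `e.example` is never visited: a crawler that only follows links cannot discover a page that nothing links to.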

15
• Primary storage: Silicon memory chips
• Up to a gigabit or more
• Random-access: same time for any datum

16

17
• Seek delay
• Rotational latency

18
• Primary: approaching 1 ns = 10⁻⁹ sec
• Secondary: seek time 5 ms = 5·10⁻³ sec
• Secondary is (5·10⁻³)/10⁻⁹ = 5 million times slower
• Imagine a bookshelf is primary memory and getting a book takes 10 sec
• Getting a book from secondary storage would take more than a year and a half
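The arithmetic behind the bookshelf analogy can be checked directly. The sketch below works in whole nanoseconds to avoid floating-point rounding; the variable names and the 365.25-day year are assumptions made for the example.

```python
# Access times from the slide, expressed in nanoseconds.
PRIMARY_ACCESS_NS = 1            # about 1 ns
SECONDARY_SEEK_NS = 5_000_000    # 5 ms

# How many times slower one disk seek is than one memory access.
slowdown = SECONDARY_SEEK_NS // PRIMARY_ACCESS_NS

# Scale the analogy: if a primary access is a 10-second trip to the bookshelf...
BOOKSHELF_SECONDS = 10
scaled_seconds = BOOKSHELF_SECONDS * slowdown

# ...then a secondary access takes this many years (assuming 365.25-day years).
SECONDS_PER_YEAR = 365.25 * 24 * 3600
scaled_years = scaled_seconds / SECONDS_PER_YEAR

print(slowdown, scaled_seconds, round(scaled_years, 2))
```

The result is 50 million seconds, which is a little under 1.6 years.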

19

20
• Works only if
  ◦ items are in order
  ◦ same amount of time to access any item
• Then it takes at most lg n steps to find an item in a table of length n.
• E.g. n = 1 billion => lg n ≈ 30 steps
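The two conditions and the step count describe a halving search over a sorted table, which can be sketched as follows. The code is an illustrative binary search with a probe counter, not taken from the slides; it uses Python's `range` as a stand-in for a sorted billion-entry table, because a `range` supports constant-time indexing without storing its items.

```python
import math


def binary_search(items, target):
    """Return (index or None, number of probes) for a sorted, indexable sequence."""
    lo, hi = 0, len(items) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2  # probe the middle of the remaining span
        if items[mid] == target:
            return mid, steps
        if items[mid] < target:
            lo = mid + 1  # discard the lower half
        else:
            hi = mid - 1  # discard the upper half
    return None, steps


n = 1_000_000_000
table = range(n)  # sorted and randomly accessible, like the table on the slide
index, steps = binary_search(table, n - 1)
bound = math.ceil(math.log2(n))  # lg n rounded up
print(index, steps, bound)
```

Each probe halves the span still to be searched, which is why both conditions matter: without order, a probe says nothing about where the item lies, and without uniform access time, the probes would not all cost the same.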

21
The Lexicon (Primary Memory): Eddington, Edison, Edmonton
The Lists of Pages (Secondary Memory):
   Eddington: URL, URL, …
   Edison: URL, URL, …
   Edmonton: URL, URL, …

22
• Many, many tricks to compress both the index and the lists of URLs
• Notes show how a lexicon with 25 million entries might fit in 16 GB of primary storage
• The lists of URLs might be vastly greater but OK as long as it takes only one disk access to get back a lot of URLs
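The split between an in-memory lexicon and on-disk lists can be sketched as a two-level lookup. This is a minimal illustration under loud assumptions: an `io.BytesIO` buffer stands in for the disk, the URLs are made up, no compression is applied, and the layout (term, offset, length) is one plausible choice rather than the one described in the course notes.

```python
import bisect
import io

# Made-up lists of pages, for illustration only.
postings = {
    "Eddington": ["http://example.com/a", "http://example.com/b"],
    "Edison": ["http://example.com/c"],
    "Edmonton": ["http://example.com/d", "http://example.com/e"],
}

# "Secondary memory": one byte stream holding every list back to back.
disk = io.BytesIO()
# "Primary memory": the sorted lexicon, with where each list starts and how long it is.
terms, offsets, lengths = [], [], []
for term in sorted(postings):
    blob = "\n".join(postings[term]).encode("utf-8")
    terms.append(term)
    offsets.append(disk.tell())
    lengths.append(len(blob))
    disk.write(blob)


def lookup(term):
    """Binary-search the lexicon in memory, then fetch the whole list in one read."""
    i = bisect.bisect_left(terms, term)
    if i == len(terms) or terms[i] != term:
        return []  # term is not in the lexicon; no disk access needed
    disk.seek(offsets[i])  # the single "disk access"
    return disk.read(lengths[i]).decode("utf-8").split("\n")


print(lookup("Edison"))
```

Because each list is stored contiguously, one seek followed by one sequential read returns all of a term's URLs, however many there are.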

23
• Hugely important commercially
• Page rank is really a new kind of capital
• People try to “spoof” ranking algorithms
• Search engineers try to detect and discount spoofing
• Endless game of cat and mouse …

24 Probably wrong. Also easy to spoof.

25 www.holdthisspear.co.uk Thedailddoozy.com about.com

26

27
• Circular?
• Not really. Can calculate a consistent meaning of “importance” where every page’s importance is the sum of the importance of the pages pointing to it
• Like scholarly citations of scholarly papers
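One way to see that the definition is not circular is to compute it by repeated substitution until the numbers stop changing. The sketch below is a simplified variant, not the slide's exact formula: it assumes each page splits its importance equally among its outgoing links, mixes in a damping factor of 0.85 so the iteration settles, and requires every page to have at least one outgoing link. The four-page graph `LINKS` is made up.

```python
def importance(links, damping=0.85, iterations=100):
    """Iterate 'importance = sum of shares received from linking pages' to a fixed point."""
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal importance
    for _ in range(iterations):
        # Assumed baseline: every page keeps a small fixed amount of importance.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            # Assumed rule: a page divides its importance equally among its links.
            share = damping * rank[page] / len(outgoing)
            for target in outgoing:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: page -> pages it points to.
LINKS = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
rank = importance(LINKS)
print(sorted(rank, key=rank.get, reverse=True))
```

Page C ends up most important because three pages point to it, and A comes second because it is the only page that C, the most important page, points to.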

28

29
• Web surfing metric
• If you wander the web at random, how likely are you to wind up at a given page?
• Page A is ranked higher than page B if you are more likely to wind up at A during a completely random meandering through the web
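The random-wandering idea can be simulated directly. This is a toy sketch under stated assumptions: the four-page graph is made up, the surfer follows a random outgoing link most of the time but jumps to a random page with probability 0.15 (an assumed value that keeps the walk from getting stuck), and the function name `random_surf` is hypothetical.

```python
import random


def random_surf(links, steps, seed=0, jump_probability=0.15):
    """Wander the link graph at random; return the fraction of steps spent on each page."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same walk
    pages = sorted(links)
    visits = {p: 0 for p in pages}
    page = pages[0]
    for _ in range(steps):
        visits[page] += 1
        if not links[page] or rng.random() < jump_probability:
            page = rng.choice(pages)  # dead end or boredom: jump anywhere
        else:
            page = rng.choice(links[page])  # follow a random outgoing link
    return {p: count / steps for p, count in visits.items()}


# Hypothetical link graph: page -> pages it points to.
LINKS = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
freq = random_surf(LINKS, steps=100_000)
print(sorted(freq, key=freq.get, reverse=True))
```

Pages that many walks pass through collect the most visits, so the visit fractions serve as a ranking; page D, which nothing links to, is reached only by random jumps.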

30
• Mission: “to organize the world's information and make it universally accessible and useful.”
• Brin: “The perfect search engine would understand exactly what you mean and give back exactly what you want”

31

32

