Chapter 2: How Search Engines Work. Chapter Objectives Describe the PageRank formula for calculating a webpage’s popularity. Determine how a search engine.


1 Chapter 2: How Search Engines Work

2 Chapter Objectives
– Describe the PageRank formula for calculating a webpage’s popularity.
– Determine how a search engine would calculate the relevance of a webpage to a keyword.
– Describe the kinds of websites that were rewarded and penalized by the Google Panda and Google Penguin updates.

3 Yahoo Lists

4 Larry Page
American computer scientist and internet entrepreneur who co-founded Google Inc. with Sergey Brin. Page is the CEO of Google's parent company, Alphabet Inc. After stepping aside as CEO in August 2001 in favour of Eric Schmidt, Page re-assumed the role in April 2011. He announced his intention to step aside a second time in July 2015 to become CEO of Alphabet, under which Google's assets would be reorganized. Under Page, Alphabet is seeking to deliver major advancements in a variety of industries. [3]
Page is the inventor of PageRank, Google's best-known search ranking algorithm.
Google makes up almost 70% of search engine market share.

5 Search Engine Parts
From Google’s white paper
– Indexer-Barrels-Sorter portion is key
– PageRank is no longer used, but this structure is still relatively accurate
– Black-hat search engine optimization attempts to artificially inflate a page’s ranking

6 Crawling
Crawling = browsing the World Wide Web, typically for web indexing
Find new and updated web content
– URL Server tracks pages
– Crawler explores all links to find new pages (no need to submit a site; it happens automatically)
URL Server must prioritize crawling
– Crawlers are fast, but with limits (usually once/week)
– Frequently updated content (news sites) will be crawled more often
– Can be problematic

7 Caching
HTML code of each webpage is sent to the repository
– Google has a cached copy of the entire World Wide Web
– Cache = temporary storage (held in Google's storage, so if a website is down, Google still has a snapshot of what is, or was, there)

8 Indexing
Recodes each web page as a “hit list”
– A “hit” is a word occurrence (not to be confused with a web hit, when someone views a web page)
– Each page is indexed as a series of words
  docID: 2058795
    wordID: 21548   nhits: 5   hit1 hit2 hit3 hit4 hit5
    wordID: 18975   nhits: 5   hit1 hit2 hit3 hit4 hit5
    wordID: 87916   nhits: 3   hit1 hit2 hit3
    ...
    wordID: 48985   nhits: 1   hit1
  Contents of a single hit: Cap: 0, font: 3, position: 173
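The hit-list layout above can be sketched in a few lines of Python. This is only an illustration of the idea, not Google's actual encoding: the `Hit` class, the `build_forward_entry` function, the regular-expression tokenizer, and the way wordIDs are handed out are all assumptions made for the example. Only the field names (cap, font, position) and the docID/wordID nesting come from the slide.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    cap: int       # 1 if the word starts with a capital letter, else 0
    font: int      # relative font size (hypothetical scale)
    position: int  # word offset within the page


def build_forward_entry(doc_id, text, lexicon, font=3):
    """Index one page as {wordID: [Hit, ...]}.

    `lexicon` maps word -> wordID and is extended as new words appear
    (an assumed scheme; real wordIDs would come from a shared lexicon).
    """
    hits = defaultdict(list)
    for position, word in enumerate(re.findall(r"[A-Za-z']+", text)):
        word_id = lexicon.setdefault(word.lower(), len(lexicon))
        hits[word_id].append(
            Hit(cap=int(word[0].isupper()), font=font, position=position)
        )
    return {"docID": doc_id, "words": dict(hits)}


lexicon = {}
entry = build_forward_entry(
    2058795, "Metamorphosis is change. The metamorphosis begins.", lexicon
)
word_id = lexicon["metamorphosis"]
print(word_id, len(entry["words"][word_id]), entry["words"][word_id])
```

Here "nhits" is simply the length of each list, so it does not need to be stored separately in this sketch.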

9 Storing Hit Lists
Partially sorts hits
– docID sent to barrel corresponding to wordID
– Some duplication of docIDs
– Prepares docIDs for re-sorting by wordID

10 Sorting
Hit lists sorted by docID are not searchable
– Must sort by wordID
– Search engine results must find all docIDs that use the searched-for word
  wordID: 21548
    docID: 2058795    nhits: 5   hit1 hit2 hit3 hit4 hit5
    docID: 4856187    nhits: 2   hit1 hit2
    docID: 4894872    nhits: 6   hit1 hit2 hit3 hit4 hit5 hit6
    ...
    docID: 12487561   nhits: 4   hit1 hit2 hit3 hit4
  wordID: 18975
    docID: 2058795    nhits: 5   hit1 hit2 hit3 hit4 hit5
    ...
    docID: 14879531   nhits: 3   hit1 hit2 hit3
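Re-sorting from docID order to wordID order amounts to inverting a nested mapping. A minimal sketch, assuming the index fits in memory as plain dictionaries (the function name `invert` and the placeholder hit strings are illustrative, not taken from the white paper):

```python
from collections import defaultdict


def invert(forward_index):
    """Turn {docID: {wordID: hits}} into {wordID: {docID: hits}}."""
    inverted = defaultdict(dict)
    for doc_id, words in forward_index.items():
        for word_id, hits in words.items():
            inverted[word_id][doc_id] = hits
    # Sort by wordID, then by docID, to mirror the sorted layout on the slide.
    return {w: dict(sorted(docs.items())) for w, docs in sorted(inverted.items())}


forward = {
    4856187: {21548: ["hit1", "hit2"]},
    2058795: {21548: ["hit1", "hit2", "hit3", "hit4", "hit5"],
              18975: ["hit1", "hit2", "hit3", "hit4", "hit5"]},
}
inverted = invert(forward)
print(list(inverted[21548]))  # every docID that uses wordID 21548
```

After inversion, finding every page that uses a word is a single dictionary lookup instead of a scan over all pages.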

11 Analyzing Links
Links used for multiple purposes
– Crawling
– Creating list of webpages (docIDs)
– Calculating relevance
– Calculating PageRank (no longer used; many link metrics still used)

12 Searching on Google
Searcher types “metamorphosis” into Google
– All docIDs containing wordID 21548 found
– Relevance score for each docID calculated
– PageRank of each webpage (docID) found
– Relevance and PageRank combined to determine final rankings
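The four steps above can be sketched as one function. The slides do not say how relevance and PageRank are combined, so the combining rule is left as a parameter; the multiplication used in the example, the toy PageRank values, and the "10 points per hit" relevance function are all hypothetical placeholders.

```python
def search(word_id, inverted_index, relevance_of, pagerank, combine):
    """Rank every docID that contains `word_id`.

    inverted_index: {wordID: {docID: hits}}
    relevance_of:   function mapping a page's hits to a relevance score
    pagerank:       {docID: PageRank value}
    combine:        function (relevance, pagerank) -> final score (assumed)
    """
    results = []
    for doc_id, hits in inverted_index.get(word_id, {}).items():
        score = combine(relevance_of(hits), pagerank.get(doc_id, 0.0))
        results.append((doc_id, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Toy data: hits are stored as a simple count here.
inverted_index = {21548: {2058795: 5, 4856187: 2}}
pagerank = {2058795: 0.2, 4856187: 0.9}           # made-up values
ranked = search(
    21548,
    inverted_index,
    relevance_of=lambda n_hits: 10 * n_hits,      # placeholder relevance
    pagerank=pagerank,
    combine=lambda rel, pr: rel * pr,             # placeholder combination
)
print(ranked)
```

With these made-up numbers the page with fewer hits ranks first because of its higher PageRank, which is the kind of trade-off the final step is meant to capture.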

13 Calculating Relevance
  Hit Type                  Type Weight
  URL                       100
  Anchor Text               90
  Title Tag                 100
  Plain text, large font    60
  Plain text, medium font   30
  Plain text, small font    10
Note: When looking just at relevance, some sites with little useful content can earn good rankings if set up properly.

14 Calculating Relevance – http://en.wikipedia.org/wiki/Metamorphosis
  Hit Type                  Type Weight   No. of Hits
  URL                       100           1
  Anchor Text               90            52
  Title Tag                 100           1
  Plain text, large font    60            1
  Plain text, medium font   30            7
  Plain text, small font    10            37
100*1 + 90*52 + 100*1 + 60*1 + 30*7 + 10*37 = 5520

15 Calculating Relevance – http://en.wikipedia.org/wiki/The_Metamorphosis
  Hit Type                  Type Weight   No. of Hits
  URL                       100           1
  Anchor Text               90            36
  Title Tag                 100           1
  Plain text, large font    60            1
  Plain text, medium font   30            2
  Plain text, small font    10            25
100*1 + 90*36 + 100*1 + 60*1 + 30*2 + 10*25 = 3810
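The two relevance totals can be checked with a short script. The type weights and hit counts are taken from the slides; the dictionary keys and the function name `relevance` are just labels chosen for this sketch.

```python
# Type weights from the "Calculating Relevance" slide.
TYPE_WEIGHTS = {
    "url": 100,
    "anchor": 90,
    "title": 100,
    "large": 60,
    "medium": 30,
    "small": 10,
}


def relevance(hit_counts):
    """Sum of (type weight x number of hits) over all hit types."""
    return sum(TYPE_WEIGHTS[hit_type] * n for hit_type, n in hit_counts.items())


metamorphosis = {"url": 1, "anchor": 52, "title": 1,
                 "large": 1, "medium": 7, "small": 37}
the_metamorphosis = {"url": 1, "anchor": 36, "title": 1,
                     "large": 1, "medium": 2, "small": 25}

print(relevance(metamorphosis))      # 5520
print(relevance(the_metamorphosis))  # 3810
```

Most of the gap between the two pages comes from anchor text: 16 extra anchor hits at weight 90 account for 1440 of the 1710-point difference.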

16 Count-Weights
To inflate its score, a webmaster could repeat “metamorphosis” 100 times at the bottom of the page (in white font to make it invisible to users), a practice known as keyword stuffing.
Count-weights prevent high scores from repeated use
  Count    Hit 1   Hit 2   Hit 3   Hit 4   Hit 5   Hit 6   Hit 7   Hit 8   Hit 9+
  Weight   1       1       .9      .7      .45     .2      .05     .01     0
Count-Weight Adjusted Relevance Score
  Metamorphosis       820
  The Metamorphosis   751
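A sketch of how the adjusted scores might be computed. It assumes the count-weights are 1, 1, .9, .7, .45, .2, .05, .01 for hits 1 through 8 and 0 from the ninth hit on, and that they are applied separately within each hit type. Both assumptions are inferred rather than stated on the slide, but together they reproduce the slide's figures of 820 and 751.

```python
TYPE_WEIGHTS = {"url": 100, "anchor": 90, "title": 100,
                "large": 60, "medium": 30, "small": 10}

# Weight of the 1st, 2nd, ... 8th hit of a given type; later hits count 0.
COUNT_WEIGHTS = [1, 1, 0.9, 0.7, 0.45, 0.2, 0.05, 0.01]


def effective_hits(n_hits):
    """Diminishing-returns hit count: slicing past the end simply stops at 8."""
    return sum(COUNT_WEIGHTS[:n_hits])


def adjusted_relevance(hit_counts):
    """Type weight x count-weighted hits, summed over hit types."""
    return sum(TYPE_WEIGHTS[t] * effective_hits(n) for t, n in hit_counts.items())


metamorphosis = {"url": 1, "anchor": 52, "title": 1,
                 "large": 1, "medium": 7, "small": 37}
the_metamorphosis = {"url": 1, "anchor": 36, "title": 1,
                     "large": 1, "medium": 2, "small": 25}

print(round(adjusted_relevance(metamorphosis)))      # 820
print(round(adjusted_relevance(the_metamorphosis)))  # 751
```

Under this scheme no hit type can contribute more than 4.31 effective hits, so 100 stuffed repetitions in small font add at most 43.1 points.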

17 Multi-Word Searches
butterfly metamorphosis
– “butterfly”
– “metamorphosis”
– “butterfly metamorphosis”
Much easier to earn good rankings for multiple-word searches

18 Perform a Google Search
Examine top 3 organic results
– Analyze usage of the words you searched in each webpage (relevance)
– Analyze PageRank of each webpage using http://ahrefs.com or http://www.opensiteexplorer.org
– https://serps.com/tools/rank-checker/
– Determine what actions the #3 ranked site should take to become ranked #1

