1 How to Improve Your Google Ranking: Myths and Reality
Ao-Jan Su †, Y. Charlie Hu ‡, Aleksandar Kuzmanovic †, Cheng-Kok Koh ‡
† Northwestern University, ‡ Purdue University

2 Motivation
● Internet search engines (e.g. Google) drive users to highly ranked pages
● Search engines' ranking results greatly influence how people acquire knowledge from the Internet [Pan ‘07]
● It is desirable to understand how a search engine ranks web pages
● Search engines' ranking algorithms are proprietary
■ Publicly available information is very limited and outdated

3 Current Approaches
● Guesswork by webmasters
■ Trial and error
■ Inefficient
● Based on the experience of search engine optimization (SEO) experts
Lack of systematic studies leads to folklore

4 Various Ranking Feature Opinions
● SEO experts
● Survey of Internet users
● Individual Internet marketing expert

5 Goals & Challenges
● Goals
■ Systematically approximate a search engine's ranking results
■ Identify the importance of ranking factors
● Reverse-engineering a search engine's ranking algorithm can be very complicated
■ Numerous ranking factors
− Google claims to have over 200 ranking factors
■ Sophisticated ranking functions

6 Our Approach
● Build our own ranking system to approximate search engines' ranking results
■ Learning models: linear programming, SVM
■ Recursive partitioning algorithm: capture non-equational behavior of ranking functions
■ New ranking system: generate our own ranking results and compare them to Google's

7 System Architecture
● Components of our ranking system
■ Crawler
■ Ranking Engine
Can we approximate Google's ranking results (top 10 pages) by using our own ranking system?

8 Ranking Features

9 Learning Models
● Linear programming model
■ Minimize the distance between our ranking system's results and Google's
■ Minimize the objective function
● Support vector machine (SVM) learning models
■ General technique for learning-to-rank problems
■ Supports linear and polynomial kernels
Annotations on the objective function:
● Weight: highly ranked pages are more important
● Ranking difference between the 2 pages
● Decision function: out of order => penalty
● Sum up the penalties
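
The objective function itself is not preserved in this transcript, only its annotations. As a rough illustration of how those four annotations could combine, here is a minimal Python sketch. The function names, the `1 / rank` weight, and the use of the positional gap as the "ranking difference" are assumptions made for this example, not the authors' formula.

```python
from typing import List, Sequence


def score(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear ranking score: dot product of feature weights and page features."""
    return sum(w * f for w, f in zip(weights, features))


def ranking_penalty(weights: Sequence[float], pages: List[Sequence[float]]) -> float:
    """Sum of penalties over page pairs that our scores put out of order.

    `pages` holds one feature vector per page, listed in Google's order
    (index 0 is Google's rank 1).
    """
    scores = [score(weights, page) for page in pages]
    total = 0.0
    for i in range(len(pages)):
        for j in range(i + 1, len(pages)):
            # Decision function: Google ranks page i above page j, so a
            # lower score for page i means the pair is out of order.
            if scores[i] < scores[j]:
                rank_weight = 1.0 / (i + 1)  # assumption: top pages weigh more
                rank_difference = j - i      # assumption: gap in Google's ranks
                total += rank_weight * rank_difference
    return total
```

With three pages and two features, `ranking_penalty([1.0, 0.0], [[3.0, 1.0], [2.0, 0.0], [1.0, 2.0]])` is 0 because the first feature alone reproduces Google's order, whereas weights `[0.0, 1.0]` misorder two pairs and incur a positive penalty. A learner would search for the weights that minimize this sum over the training keywords.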

10 Recursive Partitioning Algorithm
● Multiple layers of indices
● Non-equational ranking algorithm
Annotations on the algorithm:
● Loop while we still need to partition the set of |S| pages
● Partition the |S| pages into a top half and a bottom half
● Return the top half of the |S| pages and continue the recursion
● The algorithm ends when we have found the top X pages
● Train or apply ranking models to the set of |S| pages
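
The annotations above can be sketched as a short loop. This is a simplified Python illustration of the apply phase only. The name `recursive_partition`, the representation of each round's model as a scoring function, and the rule of never cutting below X pages are assumptions for this example. The slide's pseudocode is not preserved in the transcript.

```python
from typing import Callable, List, Sequence, TypeVar

Page = TypeVar("Page")


def recursive_partition(
    pages: Sequence[Page],
    round_models: List[Callable[[Page], float]],
    top_x: int,
) -> List[Page]:
    """Repeatedly keep the top half of the candidates until top_x remain.

    round_models[r] scores a page in round r; if there are more rounds than
    models, the last model is reused (an assumption of this sketch).
    """
    candidates = list(pages)
    round_index = 0
    while len(candidates) > top_x:
        model = round_models[min(round_index, len(round_models) - 1)]
        # Apply this round's ranking model to the current set S.
        candidates.sort(key=model, reverse=True)
        # Keep the top half, but never fewer than the top_x pages we want.
        keep = max(top_x, len(candidates) // 2)
        candidates = candidates[:keep]
        round_index += 1
    # Order the surviving pages with the model of the final round.
    final_model = round_models[min(round_index, len(round_models) - 1)]
    candidates.sort(key=final_model, reverse=True)
    return candidates[:top_x]
```

Starting from 100 crawled pages and asking for the top 10, this loop shrinks the candidate set to 50, 25, 12, and then 10 pages. Using a separate model per round lets each round fit a narrower slice of the ranking, which is one way to read the slide's point about non-equational behavior.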

11 Experimental Evaluation
● Evaluate different ranking models
■ Which model has better prediction accuracy?
● Evaluate the effectiveness of the recursive partitioning algorithm
■ Can the recursive partitioning algorithm improve prediction accuracy?
● Evaluate the relative weights of ranking features
■ Which ranking feature is more important?

12 Experimental Setup
● Crawl the top 100 pages for each of 60 random keywords
● Randomly select 15 keywords as the training set and use the remaining 45 keywords as the testing set
● Evaluate the accuracy of our ranking system by predicting Google's top 10 pages for each keyword in the testing set
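
A random 15/45 keyword split of this kind can be sketched in a few lines of Python. The helper name `split_keywords`, the placeholder keyword strings, and the fixed seed are assumptions for illustration. The slides do not say how the random selection was implemented.

```python
import random
from typing import List, Sequence, Tuple


def split_keywords(
    keywords: Sequence[str], n_train: int, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Shuffle the keywords and split them into training and testing sets."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(keywords)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder keywords; the real query keywords are listed on a backup slide.
keywords = [f"keyword-{i}" for i in range(60)]
train_set, test_set = split_keywords(keywords, n_train=15)
```

Splitting by keyword rather than by page keeps every page of a test keyword unseen during training, so the reported accuracy reflects prediction on new queries.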

13 Comparisons of Ranking Models
● Our customized linear learning model performs better than the SVM-linear model
● The polynomial model performs better than both linear models, at the cost of:
(1) A significant increase in learning time
(2) No human-readable equations
● For 78% of the explored keywords, our ranking system successfully predicts 7 or more pages within the top 10 pages
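
The "7 or more pages within the top 10" figure can be read as an overlap count between the predicted and the actual top 10. The Python sketch below assumes that reading: order within the top 10 is ignored. The function names and the dictionary layout are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def top_k_hits(predicted: Sequence[str], actual: Sequence[str], k: int = 10) -> int:
    """Number of pages that appear in both the predicted and actual top k."""
    return len(set(predicted[:k]) & set(actual[:k]))


def fraction_meeting_threshold(
    results: Dict[str, Tuple[List[str], List[str]]],
    min_hits: int = 7,
    k: int = 10,
) -> float:
    """Fraction of keywords whose prediction has at least min_hits hits.

    `results` maps each keyword to (our ranked URLs, Google's ranked URLs).
    """
    if not results:
        return 0.0
    good = sum(
        1
        for predicted, actual in results.values()
        if top_k_hits(predicted, actual, k) >= min_hits
    )
    return good / len(results)
```

Under this reading, the slide's result corresponds to `fraction_meeting_threshold(results)` returning about 0.78 over the test keywords.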

14 The Power of Recursive Partitioning
● The recursive partitioning algorithm improves the accuracy of the ranking system in every round
● 3 rounds of recursive partitioning successfully “smooth out” the non-linearity of Google's ranking algorithm and achieve high prediction accuracy

15 Weights in Different Rounds in a Linear Model
● In different rounds, the learning model produces different sets of weights
● Page rank score, keyword in title, and keyword in hostname are the top 3 ranking features
● A keyword in the meta-description tag matters, but a keyword in the meta-keyword tag does not

16 Case Studies
● Can we improve our ranking system's accuracy by isolating a subset of ranking features?
■ Example: remove the age factor by focusing on “young” pages
● Can we use our ranking system to detect biases in search engines' ranking algorithms?
■ Example: blogs
● Can we validate or disprove new ranking features?
■ Example: HTML syntax errors

17 Isolating Subsets of Ranking Features
● We crawl web pages that are at most 24 hours old to remove the ranking features of age and page rank
● Our ranking system's hit rate improves to 80% for 92% of evaluated keywords
● When the ranking features are more specific, our ranking system performs better

18 Negative Bias Toward Blogs
● We classify web pages into categories (e.g. blogs, news and music) and add a new ranking feature (hypothesis) to our ranking system
● The accuracy of our ranking system improves, and the weight of the new ranking feature (blog) is negative

19 HTML Syntax Errors Do Not Matter
● We add a new ranking feature (hypothesis) for the number of HTML syntax errors in each web page
● The performance of the new ranking model is very close to that of the original one, so the new ranking feature does not make an impact
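
The blog and HTML-error case studies follow the same pattern: add a candidate feature, retrain, then compare accuracy and inspect the learned weight. The decision step might be sketched in Python as below. The function name, the per-keyword hit counts as input, and the `tolerance` value are assumptions for this example. The slides give no numeric criterion for "very close".

```python
from statistics import mean
from typing import Sequence


def assess_hypothesis(
    baseline_hits: Sequence[int],
    extended_hits: Sequence[int],
    feature_weight: float,
    tolerance: float = 0.5,
) -> str:
    """Classify a candidate ranking feature from before/after accuracy.

    baseline_hits and extended_hits are per-keyword top-10 hit counts for
    the model without and with the candidate feature. feature_weight is
    the weight the extended model learned for that feature.
    """
    gain = mean(extended_hits) - mean(baseline_hits)
    if gain <= tolerance:
        # Accuracy is essentially unchanged: the feature explains nothing new.
        return "no measurable impact"
    # Accuracy improved, so the sign of the weight tells the direction.
    return "negative bias" if feature_weight < 0 else "positive factor"
```

In these terms, the blog feature would fall into the "negative bias" case (better accuracy, negative weight), and the HTML syntax error count into "no measurable impact".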

20 Conclusions
● In this work, we show that it is possible to systematically approximate Google's ranking results with high accuracy
■ Using a linear learning model combined with a recursive partitioning scheme
● We reveal the relative importance of ranking features in Google's ranking function
● We illustrate that our system can validate or disprove ranking features and detect ranking bias

21 Thank you!

22 Backup Slides

23 Linear Programming Model

24 Query Keywords

