Adversarial Information System Tanay Tandon Web Enhanced Information Management April 5th, 2011.

1 Adversarial Information System Tanay Tandon Web Enhanced Information Management April 5th, 2011

2 Agenda: What is Adversarial Information Retrieval? What is the goal of Adversarial Information Retrieval? Issues with first-generation engines. Resolution. Future improvements.

3 What is Adversarial IR? Gathering, indexing, retrieving, and ranking information when a subset of that information has been manipulated maliciously, typically for financial gain.

4 What is the Goal of Adversarial IR? Detect bad or inauthentic sites. Improve the precision of search engines by eliminating such sites from the results.

5 Issues with First Generation Engines. What is term frequency? It is the number of times a term occurs in a document. First-generation engines relied heavily on term frequency to determine how highly a page ranked for a query. A spammer could therefore raise a page's ranking simply by repeating the same word over and over again.
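To make the weakness concrete, here is a minimal sketch of a term-frequency score in Python. The function name `term_frequency`, the whitespace tokenizer, and the two sample pages are assumptions made for illustration; the slides do not specify how any particular engine computed the score.

```python
from collections import Counter


def term_frequency(text: str, term: str) -> float:
    """Fraction of whitespace-separated tokens in `text` equal to `term`.

    Hypothetical scoring function: real engines used more elaborate
    tokenization and weighting.
    """
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return Counter(tokens)[term.lower()] / len(tokens)


# An ordinary page and a keyword-stuffed page (both invented examples).
honest = "cheap flights to paris and hotel deals for the summer"
stuffed = "flights flights flights cheap flights flights book flights"

honest_score = term_frequency(honest, "flights")    # 1 of 10 tokens
stuffed_score = term_frequency(stuffed, "flights")  # 6 of 8 tokens
print(honest_score, stuffed_score)
```

The stuffed page scores far higher for "flights" even though it carries no useful content, which is exactly the manipulation described on this slide.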

6 Search Engine Spamming: link spam, link-bombing, spam blogs, comment spam, keyword spam, malicious tagging.

7 Google Bombing

8 TrustRank. Observation: good pages tend to link to other good pages. Algorithm: -- Select a small subset of pages (the seed set) and let a human classify them as good or spam. -- Propagate the goodness of the good seed pages to the pages they link to.

9 Propagation. Trust function T: -- T(p) returns the probability that page p is a good page. Initial values: -- T(p) = 1 if p was found to be a good page. -- T(p) = 0 if p was found to be a spam page. Iterations: -- Propagate trust by following out-links. -- Run only a fixed number of iterations, M.
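The propagation step can be sketched in Python as follows. This is an illustrative reading of the slide, not the presenter's implementation: the graph representation (a dict of out-link lists), the function name `propagate_trust`, the choice to give unclassified pages an initial trust of 0, and the choice to never raise the trust of a known spam page are all assumptions.

```python
def propagate_trust(out_links, good_seeds, spam_seeds, max_iters):
    """Return T(p) for every page after at most `max_iters` rounds.

    out_links maps a page to the pages it links to. Pages within
    `max_iters` link hops of a good seed receive full trust (1.0).
    """
    pages = set(out_links) | set(good_seeds) | set(spam_seeds)
    for targets in out_links.values():
        pages.update(targets)

    # Initial values: 1 for good seeds, 0 for everything else (assumption).
    trust = {p: 0.0 for p in pages}
    for p in good_seeds:
        trust[p] = 1.0

    frontier = set(good_seeds)
    for _ in range(max_iters):
        next_frontier = set()
        for p in frontier:
            for q in out_links.get(p, ()):
                # Known spam pages never gain trust (assumption).
                if trust[q] == 0.0 and q not in spam_seeds:
                    trust[q] = 1.0
                    next_frontier.add(q)
        frontier = next_frontier
    return trust


# Toy graph: a -> b -> c -> d, and a spam page s that links to a.
toy_links = {"a": ["b"], "b": ["c"], "c": ["d"], "s": ["a"]}
toy_trust = propagate_trust(toy_links, good_seeds={"a"}, spam_seeds={"s"}, max_iters=2)
print(toy_trust)
```

With M = 2, pages b and c inherit trust from the seed a, while d is three hops away and stays at 0. The spam page s gains nothing by linking to a, because trust flows only along out-links.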

10 Propagation Issues and Resolution. Problems with propagation: -- Pages reachable from good seeds might not be good. -- The further away we are from the good seed pages, the less certain we are that a page is good. Solution: -- Reduce trust as we move further away from the good seed pages (trust attenuation).
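One simple way to attenuate trust is to multiply it by a constant factor for every link hop away from the nearest good seed. The sketch below assumes that scheme; the decay factor `beta`, the function name `attenuated_trust`, and the graph format are hypothetical choices, and the slides do not say which attenuation rule was intended.

```python
from collections import deque


def attenuated_trust(out_links, good_seeds, beta=0.5, max_iters=3):
    """Assign trust beta**d to a page d hops from its nearest good seed.

    Pages further than `max_iters` hops away are left out of the result
    and can be treated as having trust 0.
    """
    trust = {seed: 1.0 for seed in good_seeds}
    queue = deque((seed, 0) for seed in good_seeds)
    while queue:
        page, dist = queue.popleft()
        if dist == max_iters:
            continue
        for target in out_links.get(page, ()):
            # Breadth-first order means the first visit is the shortest path.
            if target not in trust:
                trust[target] = beta ** (dist + 1)
                queue.append((target, dist + 1))
    return trust


# Chain a -> b -> c -> d -> e with a as the only good seed.
chain_links = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["e"]}
chain_trust = attenuated_trust(chain_links, good_seeds={"a"}, beta=0.5, max_iters=3)
print(chain_trust)
```

Each hop halves the trust, so a page that is only distantly connected to a good seed contributes little, and page e, which lies beyond the iteration limit, receives none.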

11 Hubs and Authorities. A hub is good if it points to good authorities. An authority is good if good hubs point to it. The weights given to pages must also keep track of their authenticity.
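The mutual definition of hubs and authorities is usually computed by iterating two updates until the scores settle, as in the HITS algorithm. The following is a minimal sketch under that assumption; the function name `hits`, the fixed iteration count, and the toy graph are illustrative, and the sketch does not include the authenticity weighting the slide calls for.

```python
import math


def hits(out_links, iterations=20):
    """Return (hub, authority) score dicts for a small link graph."""
    pages = set(out_links)
    for targets in out_links.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages linking in.
        auth = {p: 0.0 for p in pages}
        for p, targets in out_links.items():
            for q in targets:
                auth[q] += hub[p]
        # Hub score: sum of the authority scores of pages linked to.
        hub = {p: sum(auth[q] for q in out_links.get(p, ())) for p in pages}
        # Normalize both vectors so the scores stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


# h1 links to x and y, h2 links only to x.
hub_scores, auth_scores = hits({"h1": ["x", "y"], "h2": ["x"]})
print(hub_scores, auth_scores)
```

In this toy graph x ends up as the stronger authority because both hubs point to it, and h1 ends up as the stronger hub because it points to both authorities.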

12 Further Improvements. Seed weighting: instead of assigning equal weights to each seed, assign a weight proportional to its quality/importance. Seed filtering: filter out low-quality pages that may exist in topic directories. Finer topics: use the lower layers of topic directories.
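Seed weighting can be sketched as a normalization step that turns per-seed quality scores into initial trust values. The function name `weighted_seed_trust` and the quality numbers are hypothetical; how quality or importance would actually be measured is not stated in the slides.

```python
def weighted_seed_trust(seed_quality):
    """Turn quality scores into initial trust values that sum to 1."""
    total = sum(seed_quality.values())
    if total <= 0:
        raise ValueError("at least one seed needs a positive quality score")
    return {page: quality / total for page, quality in seed_quality.items()}


# Two seeds, one judged three times as important as the other (invented scores).
initial_trust = weighted_seed_trust({"a": 3.0, "b": 1.0})
print(initial_trust)
```

With equal weighting both seeds would start at 0.5; here the higher-quality seed starts with three quarters of the total trust to distribute.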

