
Undue Influence: Eliminating the Impact of Link Plagiarism on Web Search Rankings Baoning Wu and Brian D. Davison Lehigh University Symposium on Applied.




1 Undue Influence: Eliminating the Impact of Link Plagiarism on Web Search Rankings Baoning Wu and Brian D. Davison Lehigh University Symposium on Applied Computing 2006

2 Motivation Link-based ranking algorithms are important to today's popular search engines (e.g., HITS for Teoma). Link farms degrade the performance of these algorithms.

3 HITS algorithm Each page has two scores: the authority score a reflects how good the page is for a query, and the hub score h reflects how likely the page is to point to good authority pages. With E the adjacency matrix of the link graph ($E_{ij} = 1$ if page i links to page j), the scores satisfy $a = E^{T} h$ and $h = E a$.
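The two update rules can be iterated as a power method. The pure-Python sketch below is illustrative only: the unit-length normalization after each step, the fixed iteration count, and the function names are assumptions, not details from the slides. Edges are (source, target) pairs, so each pair is a 1 in E.

```python
import math


def _normalize(scores):
    """Scale a score vector to unit Euclidean length (all-zero vectors are left alone)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return scores
    return {n: v / norm for n, v in scores.items()}


def hits(nodes, edges, iterations=50):
    """Power iteration for HITS; `edges` are (src, dst) pairs, i.e. E[src][dst] = 1."""
    auth = {n: 1.0 for n in nodes}
    hub = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # a = E^T h: a page's authority is the sum of hub scores of pages linking to it
        new_auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]
        auth = _normalize(new_auth)
        # h = E a: a page's hub score is the sum of authority scores of pages it links to
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += auth[dst]
        hub = _normalize(new_hub)
    return auth, hub
```

On a toy graph where p1 and p2 both point to a and only p1 points to b, a ends with the higher authority score and p1 with the higher hub score. This mutual reinforcement is exactly what a link farm exploits when many of its pages point to the same targets.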

4 Example: for query “weather”
http://www.tripadvisor.com/
http://www.virtualtourist.com/
http://www.abed.com/memoryfoam.html
http://www.abed.com/furniture.html
http://www.rental-car.us/
http://www.accommodation-specials.com/
http://www.lasikeyesurgery.com/
http://www.lasikeyesurgery.com/lasik-surgery.asp
http://mortgage-rate-refinancing.com/
http://mortgage-rate-refinancing.com/mortgage-calculator.html

5 Factors that degrade HITS
- Mutually reinforcing relationships
- Duplicate pages
- Link farms
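The B&H HITS baseline in the experiments presumably refers to Bharat and Henzinger's improved HITS, whose edge weighting targets mutually reinforcing relationships between hosts. Below is a minimal sketch of that weighting, assuming URLs as node names and dropping same-host links (an assumption carried over from Kleinberg's original HITS). It is not the exact baseline used in these experiments, and the function names are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def host(url):
    """Host name of a URL, e.g. 'www.avis.com'."""
    return urlsplit(url).hostname


def bh_edge_weights(edges):
    """Bharat-Henzinger style edge weights against mutually reinforcing hosts.

    Authority weight: if k pages on one host link to the same page, each of
    those k edges gets 1/k. Hub weight: if one page links to l pages on the
    same host, each of those l edges gets 1/l.
    """
    # Assumption: links within a single host are ignored entirely.
    edges = {(s, d) for s, d in edges if host(s) != host(d)}
    from_host_to_page = Counter((host(s), d) for s, d in edges)
    from_page_to_host = Counter((s, host(d)) for s, d in edges)
    auth_w = {(s, d): 1.0 / from_host_to_page[(host(s), d)] for s, d in edges}
    hub_w = {(s, d): 1.0 / from_page_to_host[(s, host(d))] for s, d in edges}
    return auth_w, hub_w
```

Three pages on one host pointing at the same target now contribute one authority vote in total instead of three. Note that this only neutralizes reinforcement within a single host; a link farm spread over many hosts is untouched, which is the gap the complete-link approach aims at.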

6 Complete hyperlink Definition: a link together with its anchor text, treated as a single unit. Duplication of a complete link is a much stronger sign of copying on the Web than duplication of a link target alone.
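A minimal sketch of extracting complete links from a page with Python's standard `html.parser`. Collapsing whitespace and lowercasing the anchor text are assumptions made here so that trivially reformatted copies still match; the class and function names are hypothetical. Two pages pointing to the same target share a complete link only if their anchor texts also agree.

```python
from html.parser import HTMLParser


class CompleteLinkExtractor(HTMLParser):
    """Collect (href, anchor text) pairs; each pair is one complete link."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only text inside an <a href=...> element is part of the anchor.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Assumption: normalize anchor text by collapsing whitespace and lowercasing.
            anchor = " ".join("".join(self._text).split()).lower()
            self.links.append((self._href, anchor))
            self._href = None


def complete_links(html):
    """Return the set of complete links (target, anchor text) found in `html`."""
    parser = CompleteLinkExtractor()
    parser.feed(html)
    parser.close()
    return set(parser.links)
```

For example, `<a href="http://x.com/">Cheap Cars</a>` and `<a href="http://x.com/">cheap   cars</a>` yield the same complete link, while `<a href="http://x.com/">Home</a>` does not, even though all three share the target.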

7 Document - Complete link Matrix
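A sketch of building the binary document-complete link matrix, assuming each document's complete links are already available as a set of (target URL, anchor text) tuples; `build_matrix` is a hypothetical name. Rows are documents, columns are distinct complete links, and an entry is 1 when the document contains that complete link.

```python
def build_matrix(doc_links):
    """Binary document x complete-link matrix.

    doc_links maps each document to the set of its complete links,
    here (target URL, anchor text) tuples. Returns row labels,
    column labels and the 0/1 matrix as a list of lists.
    """
    docs = sorted(doc_links)
    links = sorted(set().union(*doc_links.values()))
    column = {link: j for j, link in enumerate(links)}
    matrix = [[0] * len(links) for _ in docs]
    for i, doc in enumerate(docs):
        for link in doc_links[doc]:
            matrix[i][column[link]] = 1
    return docs, links, matrix
```

A dense list-of-lists is only practical for small examples; for a real crawl the same matrix would be stored sparsely, for instance as the `doc_links` dictionary itself.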

8 Bipartite Graph A graph whose nodes fall into two disjoint sets X and Y, with every edge starting at an element of X and ending at an element of Y.

9 Link farms Link farms are usually densely connected through multiple overlapping small bipartite cores. Task: detect densely connected bipartite components in the “document - complete link” matrix.
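In matrix terms, a bipartite core is a set of documents X and a set of complete links Y such that every document in X contains every complete link in Y. The helpers below (hypothetical names) check that condition and compute the complete links a group of documents has in common. Complete links can be any hashable value, such as (target, anchor) tuples.

```python
def is_bipartite_core(doc_links, docs, links):
    """True if every document in `docs` contains every complete link in `links`,
    i.e. (docs, links) is a complete bipartite subgraph of the
    document-complete link graph."""
    return all(set(links) <= doc_links[d] for d in docs)


def shared_links(doc_links, docs):
    """Largest set of complete links that all of `docs` have in common."""
    docs = list(docs)
    if not docs:
        return set()
    common = set(doc_links[docs[0]])
    for d in docs[1:]:
        common &= doc_links[d]
    return common
```

If `shared_links` returns at least l complete links for a group of k documents, that group and those links form a k x l bipartite core.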

10 Algorithm for finding bipartite components

11 Result: k=2 and l=2
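The transcript keeps only the titles of the algorithm and result slides, so the authors' exact procedure is not recoverable here. As a stand-in, the sketch below uses a simple iterative pruning heuristic, which is not necessarily what the authors did. It repeatedly discards complete links found in fewer than k documents and documents left with fewer than l complete links, until nothing changes. The defaults k=2 and l=2 echo the result slide. The surviving documents are only candidates for densely connected components, not guaranteed complete bipartite cores; `prune` is a hypothetical name.

```python
from collections import Counter


def prune(doc_links, k=2, l=2):
    """Iteratively drop complete links contained in fewer than k surviving
    documents, and documents left with fewer than l complete links."""
    current = {d: set(links) for d, links in doc_links.items()}  # copy; input untouched
    changed = True
    while changed:
        changed = False
        # How many surviving documents contain each complete link.
        counts = Counter(link for links in current.values() for link in links)
        for d in list(current):
            kept = {link for link in current[d] if counts[link] >= k}
            if len(kept) < l:
                del current[d]
                changed = True
            elif kept != current[d]:
                current[d] = kept
                changed = True
    return current
```

When the loop stops, every remaining document has at least l complete links, each of which appears in at least k remaining documents.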

12 Adjustment: document-document matrix
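The content of the document-document adjustment is not in the transcript. One plausible reading, assumed here, is a matrix whose entry for two documents counts the complete links they share, which is the off-diagonal part of B B^T for the binary document-complete link matrix B. The sketch stores only nonzero entries in a dictionary; `doc_doc_matrix` is a hypothetical name.

```python
from itertools import combinations


def doc_doc_matrix(doc_links):
    """Symmetric {(d1, d2): n} where n > 0 is the number of complete links
    that documents d1 and d2 share (off-diagonal entries of B B^T)."""
    docs = sorted(doc_links)
    shared = {}
    for d1, d2 in combinations(docs, 2):
        n = len(doc_links[d1] & doc_links[d2])
        if n:
            shared[(d1, d2)] = shared[(d2, d1)] = n
    return shared
```

Checking every pair is quadratic in the number of documents, which is acceptable for a query's base set but would call for an inverted index from complete links to documents on a larger collection.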

13 Final matrix

14 Weighted adjacency matrix
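The transcript does not say how the adjacency matrix is weighted, so the scheme below is hypothetical. A complete link that occurs in n > 1 flagged (link-farm) documents gets weight 1/n in each of them, so the group casts roughly one vote, while all other links keep weight 1. The flagged set would come from the detection steps above; `weighted_adjacency` is an illustrative name.

```python
from collections import Counter


def weighted_adjacency(doc_links, flagged):
    """Return {(document, target): weight} for a weighted adjacency matrix.

    Hypothetical scheme: a complete link that occurs in n > 1 flagged
    documents gets weight 1/n in each of them; every other link keeps weight 1.
    """
    dup = Counter(link for d in flagged for link in doc_links[d])
    weights = {}
    for doc, links in doc_links.items():
        for target, anchor in links:
            n = dup[(target, anchor)]
            w = 1.0 / n if doc in flagged and n > 1 else 1.0
            # If one document links to a target with several anchors,
            # keep the largest weight (another assumption).
            weights[(doc, target)] = max(weights.get((doc, target), 0.0), w)
    return weights
```

To run HITS on the weighted matrix, each update multiplies the contributing score by the edge weight, for example adding `w * hub[src]` instead of `hub[src]` when computing authorities.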

15 Experiment: HITS result of “rental car”
http://www.discountcars.net/
http://www.motel-discounts.com/
http://www.stlouishoteldeals.com/
http://www.richmondhoteldeals.com/
http://www.jacksonvillehoteldeals.com/
http://www.jacksonhoteldeals.com/
http://www.keywesthoteldeals.com/
http://www.austinhoteldeals.com/
http://www.gatlinburghoteldeals.com/
http://www.ashevillehoteldeals.com/

16 Experiment: B&H HITS result of “rental car”
http://www.rentadeal.com/
http://www.allaboutstlouis.com/
http://www.allaboutboston.com/
https://travel2.securesites.com/about_travelguides/addlisting.html
http://www.allaboutsanfranciscoca.com/
http://www.allaboutwashingtondc.com/
http://www.allaboutalbuquerque.com/
http://www.allabout-losangeles.com/
http://www.allabout-denver.com/
http://www.allabout-chicago.com/

17 Experiment: CL-HITS result of “rental car”
http://www.hertz.com/
http://www.avis.com/
http://www.nationalcar.com/
http://www.thrifty.com/
http://www.dollar.com/
http://www.alamo.com/
http://www.budget.com/
http://www.enterprise.com/
http://www.budgetrentacar.com/
http://www.europcar.com/

18 Experiment: B&H HITS result of “translation online”
http://www.no-gambling.com/
http://www.teleorg.org/
http://ong.altervista.org/
http://bx.b0x.com/
http://video-poker.batcave.net/
http://www.websamba.com/marketing-campaigns
http://online-casino.o-f.com/
http://caribbean-poker.webxis.com/
http://roulette.zomi.net/
http://teleservices.netfirms.com/

19 Experiment: CL-HITS result of “translation online”
http://www.freetranslation.com/
http://www.systransoft.com/
http://babelfish.altavista.com/
http://www.yourdictionary.com/
http://dictionaries.travlang.com/
http://www.google.com/
http://www.foreignword.com/
http://www.babylon.com/
http://www.worldlingo.com/products_services/worldlingo_translator.html
http://www.allwords.com/

20 Duplicate example: BH-HITS result of “maps”
http://www.maps.com/
http://www.mapsworldwide.com/
http://www.cartographic.com/
http://www.amaps.com/
http://www.cdmaps.com/
http://www.ewpnet.com/maps.htm
http://mapsguidesandmore.com/
http://www.njdiningguide.com/maps.html
http://www.stanfords.co.uk/
http://www.delorme.com/

21 Duplicate example: CL-HITS result of “maps”
http://www.maps.com/
http://maps.yahoo.com/
http://www.delorme.com/
http://tiger.census.gov/
http://www.davidrumsey.com/
http://memory.loc.gov/ammem/gmdhtml/gmdhome.html
http://www.esri.com/
http://www.maptech.com/
http://www.streetmap.co.uk/
http://www.libs.uga.edu/darchive/hargrett/maps/maps.html

22 User evaluation
Category             HITS    BHITS   CL-HITS   CL-POP
Quite relevant       12.9%   24.5%   48.4%     46.3%
Relevant             10.7%   18.3%   28.8%     26.2%
Not sure              6.6%   10.5%    6.7%      6.4%
Irrelevant           26.8%   14.8%   11.3%     12.7%
Totally irrelevant   42.8%   31.9%    4.6%      8.1%

23 Discussion Using links alone, without their anchor text, precision at 10 is 66.4%, much lower than when using complete links. Random anchor texts.
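Precision at 10 is the fraction of the top ten results judged relevant. The sketch assumes that the “quite relevant” and “relevant” labels from the user evaluation count as relevant; the slides do not state which labels were counted. `precision_at_k` and `RELEVANT` are illustrative names.

```python
# Assumption: only these two labels from the user study count as relevant.
RELEVANT = {"quite relevant", "relevant"}


def precision_at_k(judgments, k=10):
    """Fraction of the top-k ranked results judged relevant.

    `judgments` lists one label per result, in rank order; missing
    results (fewer than k) count as not relevant.
    """
    relevant_count = sum(1 for label in judgments[:k] if label in RELEVANT)
    return relevant_count / k
```

Under that assumption, the CL-HITS column of the user evaluation would give 48.4% + 28.8% = 77.2%, compared with the 66.4% reported for links alone; this is an inference from the table, not a figure stated on the slides.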

24 Questions? baw4@cse.lehigh.edu davison@cse.lehigh.edu

