Data Set Ranking in a Semantic Search Engine to Resist Link Spam
By: Soheila DehghanZadeh
WTLab Research Group Weekly Seminars, August 2011
http://wtlab.um.ac.ir



2 Outline
Introduction
– Semantic Search
– Spam
– Problem!
Web of Data vs. Web of Documents
Query Engine vs. Search Engine
Related Works
– Sindice
– SWSE
Proposed Ranking Method
Experimental Results
Evaluation…

3 Introduction
Why semantic search?
– Proliferation of the Web of Data
– Agent-based applications
– Information integration from heterogeneous sources
Spammers are appearing to take advantage of the Semantic Web's growing reputation.
Spammers' techniques: content spam, link spam

4 What is the Problem?
The ranking methods currently implemented in semantic search engines are vulnerable to link spam.
We aim to address these vulnerabilities.

5 Web of Data vs. Web of Documents
The Web of Documents lacks:
– Link types
– Information provenance
Each phase of a semantic search engine should use algorithms adapted to the new data model.
There is no fully integrated open-source semantic search engine, so we must combine separate open-source tools and implement the remaining phases ourselves to obtain a complete semantic search engine.

6 Link farm concept
[Figure: link farm graph with nodes X, N, Z, W, Y and T]

7 PageRank
[Figure: node J with nodes i1, i2, i3; label L(i1)]
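The two slides above (link farm, PageRank) can be connected with a small illustration. What follows is a minimal sketch of textbook PageRank by power iteration, not the ranking code of DING, SWSE, or the proposed method. It assumes the usual formulation in which each page i shares its rank equally among its L(i) out-links, which is presumably what the L(i1) label on the slide denotes. The function name, the damping value 0.85, and the toy graphs are assumptions made for illustration; the farm node names X, N, Z, W, Y and the target T are borrowed from the link farm slide.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each node to the list of nodes it links to.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the same teleportation share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for src in nodes:
            targets = links.get(src, [])
            if targets:
                # Split the source's rank equally among its L(src) out-links.
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node: spread its rank over all nodes.
                share = damping * rank[src] / n
                for t in nodes:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical "honest" graph: a small cycle, plus a target T nobody links to.
honest = {"a": ["b"], "b": ["c"], "c": ["a"], "T": ["a"]}

# Same graph after a link farm is added: farm nodes link to T, T links back.
farm_nodes = ["X", "N", "Z", "W", "Y"]
with_farm = {"a": ["b"], "b": ["c"], "c": ["a"], "T": list(farm_nodes)}
for node in farm_nodes:
    with_farm[node] = ["T"]

rank_before = pagerank(honest)
rank_after = pagerank(with_farm)
print("T before farm:", rank_before["T"])
print("T after farm: ", rank_after["T"])
```

In this toy setting the target's score rises sharply once the farm is in place, even though no honest node links to it, which is the vulnerability the talk is concerned with.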

8 Motivation Author copyright.

9 Related Work: Sindice

10 Related Work: DING

11 Related Work: SWSE

12 Proposed architecture Author copyright.

13 Proposed Ranking Method-1 Author copyright.

14 Similar entities graph Author copyright.

15 Proposed Ranking Method-2 Author copyright.

16 Proposed Ranking Method-3 Author copyright.

17 Experimental Results
Spam injection scenarios:
– T1: Content spam
– T2: Simple link farm
– T3: Link farm with a new predicate
– T4: Large link farm with new predicates

18 T1: DING vs. SWSE vs. TRank

Domain                    Rank
bio2rdf.org               2101.379
129.128.185.122           1194.286
dbpedia.org               839.3002
www.drugbank.ca           214.1324
en.wikipedia.org          167.5204
www.rxlist.com            139.2081
www.w3.org                38.48582
www4.wiwiss.fu-berlin.de  30.37194
www.uniprot.org           22.80327
www.fake.org              0.003301

Domain                    Rank
www.w3.org                0.166664
www4.wiwiss.fu-berlin.de  0.026189
dbpedia.org               0.01611
www.uniprot.org           0.01611
bio2rdf.org               0.01611
129.128.185.122           0.01611
www.drugbank.ca           0.01611
en.wikipedia.org          0.01611
www.rxlist.com            0.01611
fake.org                  0.002372

Domain                    Rank
bio2rdf.org               777.7737
129.128.185.122           444.0772
dbpedia.org               313.927
www.drugbank.ca           82.33484
en.wikipedia.org          64.83432
www.rxlist.com            53.54001
www.w3.org                14.86574
www4.wiwiss.fu-berlin.de  11.09228
www.uniprot.org           9.309607
fake.org                  0.014838

19 T2: DING vs. SWSE vs. TRank

Domain                    Rank
bio2rdf.org               2098.214
129.128.185.122           1192.487
dbpedia.org               828.412
www.drugbank.ca           213.8099
en.wikipedia.org          167.2681
www.rxlist.com            138.9984
www.w3.org                38.43269
www4.wiwiss.fu-berlin.de  30.34906
www.uniprot.org           22.76892
www.fakex.org             0.751035
www.faken.org             0.741516
www.fakew.org             0.364824
www.fakez.org             0.128398
www.fakey.org             0.012898
fake.org                  0.00323

Domain                    Rank
fake.org                  0.297017
www.fakex.org             0.115944
www.faken.org             0.093344
www.fakey.org             0.088278
www.fakew.org             0.068561
www.fakez.org             0.062708
www4.wiwiss.fu-berlin.de  0.041312
www.rxlist.com            0.038824
en.wikipedia.org          0.032412
www.drugbank.ca           0.028589
129.128.185.122           0.012213
bio2rdf.org               0.010067
www.uniprot.org           0.009621
dbpedia.org               0.008556
www.w3.org                0.000876

Domain                    Rank
bio2rdf.org               534.4987
129.128.185.122           305.1813
dbpedia.org               213.3416
www.drugbank.ca           56.59444
en.wikipedia.org          44.56518
www.rxlist.com            36.79457
www.w3.org                10.21828
www4.wiwiss.fu-berlin.de  7.625585
www.uniprot.org           6.397596
www.faken.org             0.237435
www.fakex.org             0.147879
www.fakew.org             0.053366
www.fakez.org             0.028163
www.fakey.org             0.021987
fake.org                  0.010184

20 T3: DING vs. SWSE vs. TRank

Domain                    Rank
bio2rdf.org               3.69E+06
129.128.185.122           2097198
dbpedia.org               1473832
www.drugbank.ca           376018.4
en.wikipedia.org          294166.7
www.rxlist.com            244450.7
www.w3.org                66640.34
www4.wiwiss.fu-berlin.de  53334.26
www.uniprot.org           4.00E+04
www.d2.org                16714.27
www.d3.org                12918.41
www.d4.org                6289.253
www.d0.org                5553.691
www.d1.org                2367.403
fake.org                  5.776906

Domain                    Rank
www.w3.org                0.114581
www.d4.org                0.062247
www.d0.org                0.062247
www.d2.org                0.062247
www.d3.org                0.062247
www.d1.org                0.0514
www4.wiwiss.fu-berlin.de  0.018005
fake.org                  0.016304
dbpedia.org               0.011075
www.uniprot.org           0.011075
bio2rdf.org               0.011075
129.128.185.122           0.011075
www.drugbank.ca           0.011075
en.wikipedia.org          0.011075
www.rxlist.com            0.011075

Domain                    Rank
bio2rdf.org               400.7644
129.128.185.122           305.6677
dbpedia.org               216.1277
www.drugbank.ca           56.69572
en.wikipedia.org          44.64441
www.rxlist.com            36.85125
www.w3.org                10.23479
www4.wiwiss.fu-berlin.de  7.628382
www.uniprot.org           6.406328
www.d3.org                5.753264
www.d4.org                4.503335
www.d0.org                4.215745
www.d2.org                1.462207
www.d1.org                0.557099
fake.org                  0.010202

21 T4: DING vs. SWSE vs. TRank

Domain                    Rank
www.d6.org                2.28E+84
www.d1.org                1.33E+84
www.d2.org                1.06E+84
www.d0.org                6.39E+83
www.d3.org                4.93E+83
www.d5.org                2.86E+83
www.d7.org                2.70E+83
www.d8.org                2.24E+83
www.d4.org                1.54E+83
www.d9.org                7.73E+82
bio2rdf.org               6.88E+81
129.128.185.122           3.91E+81
dbpedia.org               2.75E+81
www.drugbank.ca           7.03E+80
en.wikipedia.org          5.50E+80
www.rxlist.com            4.56E+80
www.w3.org                1.25E+80
www4.wiwiss.fu-berlin.de  9.98E+79
www.uniprot.org           7.43E+79
fake.org                  1.08E+76

Domain                    Rank
www.w3.org                0.0873
www.d2.org                0.051435
www.d4.org                0.051435
www.d8.org                0.051435
www.d6.org                0.04712
www.d5.org                0.047016
www.d3.org                0.046009
www.d1.org                0.045234
www.d7.org                0.042673
www.d0.org                0.039809
www.d9.org                0.035567
www4.wiwiss.fu-berlin.de  0.013718
fake.org                  0.012422
dbpedia.org               0.008438
www.uniprot.org           0.008438
bio2rdf.org               0.008438
129.128.185.122           0.008438
www.drugbank.ca           0.008438
en.wikipedia.org          0.008438
www.rxlist.com            0.008438

Domain                    Rank
bio2rdf.org               31.92569
129.128.185.122           19.21594
dbpedia.org               14.48065
www.drugbank.ca           4.871831
www4.wiwiss.fu-berlin.de  4.487979
en.wikipedia.org          4.032452
www.rxlist.com            3.171212
www.w3.org                1.332953
www.uniprot.org           0.80188
www.d4.org                0.27296
www.d8.org                0.210585
www.d2.org                0.199939
www.d6.org                0.190853
www.d7.org                0.1856
www.d5.org                0.163599
www.d3.org                0.153463
www.d0.org                0.136611
www.d1.org                0.113658
www.d9.org                0.101992
fake.org                  0.007167

22 DING Remove semi-Spam

Domain                    Rank
www.d0.org                2.19E+08
www.d3.org                1.47E+08
www.d5.org                1.28E+08
www.d9.org                9.35E+07
www.d8.org                5.13E+07
www.d4.org                3.37E+07
www.d1.org                3.23E+07
www.d7.org                2.39E+07
www.d2.org                9182851
www.d6.org                3713589
bio2rdf.org               2814786
129.128.185.122           1599737
dbpedia.org               1124235
www.drugbank.ca           286825.8
en.wikipedia.org          224389.5
www.rxlist.com            186466.3
www.w3.org                50833.08
www4.wiwiss.fu-berlin.de  40683.23
www.uniprot.org           30543.9
fake.org                  4.406611

23 Evaluation
Spearman correlation. Coming soon!
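The slide names Spearman correlation as the planned evaluation measure but does not say which rankings will be compared. As a hedged illustration only, the sketch below computes Spearman's rho between two score lists over the same items, for example a ranking before and after spam injection. The function names and the score values are hypothetical and are not taken from the experiments above.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    return cov / (sd_x * sd_y)


# Hypothetical scores for the same five domains under two conditions.
scores_clean = [5.0, 4.0, 3.0, 2.0, 1.0]
scores_spammed = [5.0, 4.0, 3.0, 1.0, 2.0]  # the last two domains swap places

rho = spearman(scores_clean, scores_spammed)
print("Spearman rho:", rho)
```

A rho close to 1 would indicate that the ordering is nearly unchanged between the two conditions, and a lower value indicates that the ranking was disturbed.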

24 ?

