A Ranking Algorithm for Semantic search engine – spam and fake detection case study By: Soheila Dehghanzadeh. Web technology lab weekly seminars.

Agenda:
Web spam definition
A brief overview of search engines
Search engine phases:
– Crawling
– Indexing
– Index lookup
– Ranking lookup results
My proposed ranking algorithm

Web spam and fake: On the web of data, anyone can say anything about anything. Low-quality data should not appear among the top search results.

A Search Engine:

Web of data vs. web of documents. Web of documents: no link type and no trustworthiness (just popularity). Web of data: should consider link type and link context (for provenance and proof of trust).

Crawling & Indexing phase… LDSpider is used to crawl linked data. A hexastore is used to fully index the crawled data. Special thanks to Panagiotis Karras for providing the hexastore implementation in Python.

Index lookup results for extension… Some results may not include the keyword, yet they have high quality and relevance. Result expansion hides the locality effect: some sites are referred to many times, but in a specific context the lookup results from other, professional sites are more interesting. Pipeline: web of data → crawler → raw RDF → indexing → index lookup → result ranking.

Hexastore: the index structure that we use in our search engine. Each RDF element type deserves to have a dedicated index structure built around it. Every possible ordering of the importance or precedence of the three elements in the indexing scheme is materialized. Each index structure in a hexastore centers around one RDF element and defines a prioritization between the other two elements.
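To illustrate the six materialized orderings, here is a minimal Python sketch. It is not the hexastore implementation used in this work: the class name TinyHexastore, its methods, and the ex: identifiers are hypothetical, and the nested dictionaries and sets are an assumption made for brevity.

```python
from collections import defaultdict
from itertools import permutations


class TinyHexastore:
    """Toy six-way index over (subject, predicate, object) triples."""

    def __init__(self):
        # One nested dict per ordering: spo, sop, pso, pos, osp, ops.
        self.indexes = {
            "".join(order): defaultdict(lambda: defaultdict(set))
            for order in permutations("spo")
        }

    def add(self, s, p, o):
        """Insert one triple into all six orderings."""
        triple = {"s": s, "p": p, "o": o}
        for order, index in self.indexes.items():
            first, second, third = (triple[k] for k in order)
            index[first][second].add(third)

    def lookup(self, order, first, second=None):
        """Return what is stored under `first` (and `second`) in one ordering."""
        level = self.indexes[order].get(first, {})
        if second is None:
            return {k: set(v) for k, v in level.items()}
        return set(level.get(second, set()))


# Hypothetical example data.
store = TinyHexastore()
store.add("ex:soheila", "ex:studiedIn", "ex:FUM")
store.add("ex:soheila", "ex:workedAt", "ex:GAS")

objects = store.lookup("spo", "ex:soheila", "ex:studiedIn")
by_object = store.lookup("ops", "ex:GAS")
```

Because every ordering is materialized, a query that fixes any one or two of the three elements can be answered from an index whose leading key is already bound, at the cost of storing each triple six times.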

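Sample spo indexing in a hexastore: the subject S(i) points to its properties P(i,1), P(i,2), …, P(i,Ni), and each property points to its own object list:
– P(i,1): O(i1,1), O(i1,2), …, O(i1,ki1)
– P(i,2): O(i2,1), O(i2,2), …, O(i2,ki2)
– P(i,Ni): O(iNi,1), O(iNi,2), …, O(iNi,kiNi)
Space complexity: spo + sp + pso + so + pos + po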

My idea!
Import the base result set into Jena and extend it with ontology reasoning rules, so that extra resources and relations are added through inference.
– The added relations: an added relation has no context, so its trustworthiness is an aggregation function over the (x, y, rule) relations.
– The added resources: resources are added only through the sameAs predicate.
– Resources are ranked according to their relevance to the query terms (using ontobroker, PageRank, ObjectRank, TripleRank, HITS, …).
– Query types: keyword query, structured query, ontology-based query (using an interface to get the query; ontobroker).
– Relations (properties) are ranked according to their contexts (provenance), using relation-ranking methods such as SemRank, or by looking at the PageRank of the context.
Note that resources are ranked first and relations second. However, whether the user is looking for relations or for resources depends on the query.

Lookup on quads for keyword (Soheila). Figure: the matching quads Q1 to Q4 and their SA links, labelled with contexts SA(UM), SA(FB), SA(NIGC), SA(LI), and SA(FK), together with the relations Dancewith(FK), Meet(CNN), and Buy(Spam).

Result set expansion methods: Step 1: use the sameAs predicate on the found quads to extend the result set Q1,…,Qr. Index lookup:
– (Q1(S), sameAs, ?, ?) … (Qr(S), sameAs, ?, ?)
– (?, sameAs, Q1(S), ?) … (?, sameAs, Qr(S), ?)
In our case: Q1,…,Q4 → Q1,…,Q4, FBURI, LinkedInURI, isportURI.
Then apply PageRank (PR) to the extended graph, in which each sameAs link is replaced with the PR weight of its context (to learn the trustworthiness of each context).
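Step 1 can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation behind these slides: the function names, the ex: and ctx: identifiers, and the context weights are all hypothetical, and the PageRank is a plain power iteration in which each sameAs edge carries the weight of its context.

```python
SAME_AS = "owl:sameAs"


def expand_with_same_as(quads, base):
    """One expansion step: follow sameAs quads out of and into the base set."""
    expanded = set(base)
    for s, p, o, c in quads:
        if p != SAME_AS:
            continue
        if s in base:  # pattern (Qi(S), sameAs, ?, ?)
            expanded.add(o)
        if o in base:  # pattern (?, sameAs, Qi(S), ?)
            expanded.add(s)
    return expanded


def weighted_pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank; `edges` maps (source, target) to a weight."""
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    out_weight = {node: 0.0 for node in nodes}
    for (src, _dst), weight in edges.items():
        out_weight[src] += weight
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread evenly.
        dangling = sum(rank[node] for node in nodes if out_weight[node] == 0.0)
        new_rank = {node: (1.0 - damping + damping * dangling) / n for node in nodes}
        for (src, dst), weight in edges.items():
            if out_weight[src] > 0.0:
                new_rank[dst] += damping * rank[src] * weight / out_weight[src]
        rank = new_rank
    return rank


# Hypothetical quads (subject, predicate, object, context) and context weights.
quads = [
    ("ex:Q1", SAME_AS, "ex:FBURI", "ctx:FB"),
    ("ex:Q1", SAME_AS, "ex:SpamURI", "ctx:Spam"),
    ("ex:LinkedInURI", SAME_AS, "ex:Q2", "ctx:LI"),
    ("ex:Q3", "ex:worksAt", "ex:NIOC", "ctx:UM"),
]
context_weight = {"ctx:FB": 0.8, "ctx:Spam": 0.1, "ctx:LI": 0.6}

expanded = expand_with_same_as(quads, {"ex:Q1", "ex:Q2", "ex:Q3"})
edges = {(s, o): context_weight[c] for s, p, o, c in quads if p == SAME_AS}
ranks = weighted_pagerank(edges)
```

In this toy graph a resource reached through a low-weight context receives a smaller share of rank than one reached through a high-weight context, which is the effect the weighting is meant to produce.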

Result set expansion methods:
Step 2: look up all properties of Q1(S),…,Qr(S):
– (Q1(S), ?, ?, ?) and (?, ?, Q1(S), ?)
– …
– (Qr(S), ?, ?, ?) and (?, ?, Qr(S), ?)
Step 3: add inferred relations using the domain ontology (the context is composed of the ontology plus the inference process).
Step 4: rank Q1,…,Qr according to their TpageRank (computed online from the graph of step 1), and rank relations according to their context PageRank (which is computed offline by Google).
Note: contexts whose PR is lower than a threshold won't be mentioned; they may be spam or fake sites.
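The threshold rule in the note can be sketched as a filter followed by a sort. The function name, the example quads, the context scores, and the threshold value below are all hypothetical; in the approach described here the context scores would come from a PageRank computed offline.

```python
def rank_relations(quads, context_pr, threshold):
    """Drop quads from low-scoring contexts, then order the rest by context score."""
    kept = [q for q in quads if context_pr.get(q[3], 0.0) >= threshold]
    return sorted(kept, key=lambda q: context_pr[q[3]], reverse=True)


# Hypothetical relations (subject, predicate, object, context).
relations = [
    ("ex:Soheila", "ex:studiedIn", "ex:FUM", "ctx:UM"),
    ("ex:Soheila", "ex:buy", "ex:Cheese", "ctx:Spam"),
    ("ex:Soheila", "ex:meet", "ex:Scott", "ctx:CNN"),
]
context_pr = {"ctx:UM": 0.3, "ctx:Spam": 0.01, "ctx:CNN": 0.5}

ranked = rank_relations(relations, context_pr, threshold=0.05)
ranked_contexts = [q[3] for q in ranked]
```

Contexts that are missing from the score table are treated as scoring zero, so they are filtered out as well; that default is a design choice of this sketch.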

Structured query on quad indexes
Single pivot:
– (s,?,?,?), (?,p,?,?), (?,?,o,?), (?,?,?,c)
Double pivot:
– (s,p,?,?), (s,?,o,?), (s,?,?,c), (?,p,o,?), (?,p,?,c), (?,?,o,c)
Triple pivot:
– (s,p,o,?), (s,p,?,c), (s,?,o,c), (?,p,o,c)
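The pivot patterns above can be enumerated and evaluated with a few lines of Python. This is only a sketch: the function names and example quads are hypothetical, None plays the role of "?", and a linear scan stands in for the index lookup a real engine would use.

```python
from itertools import combinations

POSITIONS = ("s", "p", "o", "c")


def pivot_patterns(bound):
    """All ways to bind `bound` of the four quad positions."""
    return [set(combo) for combo in combinations(POSITIONS, bound)]


def match(quads, s=None, p=None, o=None, c=None):
    """Return the quads that agree with every bound (non-None) position."""
    pattern = (s, p, o, c)
    return [
        quad for quad in quads
        if all(want is None or want == got for want, got in zip(pattern, quad))
    ]


# Hypothetical quads (subject, predicate, object, context).
quads = [
    ("ex:Soheila", "ex:studiedIn", "ex:FUM", "ctx:FUM"),
    ("ex:Soheila", "ex:workedAt", "ex:GAS", "ctx:GAS"),
    ("ex:Sally", "ex:playedIn", "ex:NIOCTeam", "ctx:NIOC"),
]

single = match(quads, s="ex:Soheila")             # (s,?,?,?)
double = match(quads, s="ex:Soheila", c="ctx:GAS")  # (s,?,?,c)
```

Binding one, two, or three of four positions gives 4, 6, and 4 patterns respectively, which matches the lists on the slide.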

Step 1: if the specified parts are URIs, the search engine performs a direct lookup. Otherwise, if the user has specified a keyword for a part, a keyword search is done first, and then a lookup is performed for each result URI.
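A small Python sketch of this branching, for the subject position only. Everything here is an assumption made for illustration: the function names, the rule that a term counts as a URI when it has an http or https scheme, and the keyword search implemented as a case-insensitive substring match on the object value.

```python
from urllib.parse import urlparse


def is_uri(term):
    """Treat a term as a URI if it has an http(s) scheme (an assumption)."""
    return urlparse(term).scheme in ("http", "https")


def keyword_to_uris(quads, keyword):
    """Subjects of quads whose object value contains the keyword."""
    needle = keyword.lower()
    return sorted({s for s, p, o, c in quads if needle in o.lower()})


def lookup_subject(quads, term):
    """Direct lookup for a URI; keyword search first, then one lookup per URI."""
    uris = [term] if is_uri(term) else keyword_to_uris(quads, term)
    return [quad for quad in quads if quad[0] in uris]


# Hypothetical quads (subject, predicate, object, context).
quads = [
    ("http://example.org/u1", "ex:name", "Soheila Dehghanzadeh", "ctx:FUM"),
    ("http://example.org/u1", "ex:studiedIn", "FUM", "ctx:FUM"),
    ("http://example.org/u2", "ex:name", "Sally", "ctx:NIOC"),
]

by_uri = lookup_subject(quads, "http://example.org/u2")
by_keyword = lookup_subject(quads, "soheila")
```

The keyword path returns every quad of each matching resource, not only the quad that contained the keyword.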

Lookup on quads for ontological queries

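Figure: quad graph in which the context is given in parentheses. Nodes: Soheila (FUM), FUM, GAS, NIOC Team, Sally (NIOC), DehghanZadeh (GAS), Kahani (FUM). Relations: Studied in (FUM), Worked at (GAS), Played in (NIOC), owl:sameAs (NIOC), owl:sameAs (FUM), Supervisor (FUM).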

Related work on ranking the web of data… ObjectRank, Ding, Sindice, TF-IDF, EntityRank, SemRank, ReConRank, ontobroker…

Proof of trust: Jena's inference explanation will be presented as the proof of trust.

Evaluation…
– Compare spam ranks
– Compare query time
– Compare index size

Any questions?

The best things in life are free. Thanks for your attention.