©2003, Philippe Cudré-Mauroux, EPFL-I&C-IIF, Distributed Information Systems Lab The Chatty Web: Emergent Semantics Through Gossiping WWW2003 Karl Aberer,

Presentation transcript:

The Chatty Web: Emergent Semantics Through Gossiping
WWW2003
Karl Aberer, Philippe Cudré-Mauroux, Manfred Hauswirth
Distributed Information Systems Laboratory (LSIR)
Swiss Federal Institute of Technology, Lausanne (EPFL)

Outline
Problem statement / motivations
Model
Intrinsic criteria
Extrinsic criteria
Case study
Conclusions

The Problem (1)
[Figure: network of peers with a query posted at EPFL. Sites shown: SwissProt site at Geneva, EMBLChange site at Cambridge, a lab at MIT, a lab in Trondheim. Attribute labels shown: organism, species.]
Schemas listed in the figure:
  – EMBLChange peers: species, …
  – SwissProt peers: authors, titles, organism, …
  – Other peers: authors, …

The Problem (2)
How to obtain semantic interoperability among heterogeneous data sources without relying on pre-existing, global semantic models?

Motivations
Querying the semantic web
  – Large number of data / meta-data sources
  – Heterogeneous and decentralized by nature
Federating loosely-coupled specialized databases
  – Pre-existing schemas
  – No query propagation
Introducing meta-data support in modern P2P applications
  – Complete decentralization and self-organization
  – Imposing a global schema makes no sense

Outline of the solution
[Figure: the same network of peers (SwissProt site at Geneva, EMBLChange site at Cambridge, a lab at MIT, a lab in Trondheim, query posted at EPFL), now connected by local translation links.]
Schemas listed in the figure:
  – EMBLChange peers: species, …
  – SwissProt peers: authors, titles, organism, …
  – Other peers: authors, …
Translation links shown in the figure:
  – organism → authors
  – organism → species
  – species → organism
Local translations enabling global agreements

Query Forwarding
To whom shall we send the queries?
  – To peers likely to send us a response
Simplistic solutions
  – Local neighboring (same schema): low recall
  – Query flooding (entire network): low precision, high network load
Semantic Gossiping
  – Query forwarding by selecting the right peers
  – Query-dependent PHBs (Per-Hop Behaviors)
  – Analysis of the query and of its transformed versions
      Intrinsic measures (syntactic distances)
      Extrinsic measures (semantic distances)

The Model
Network of peers
Identical (Gnutella model) / related schemas
Queries
Translations between some schemas

On Translations

The Syntactic Similarity
Measures the degree of similarity between the original and the transformed queries
Based on the number of attributes lost during translations
Takes into account the importance of the attributes and the selectivity of the predicates
Two different values: projection / selection
Can be computed iteratively, i.e. at every transformation step
Results kept per attribute in a feature vector
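
The iterative, per-attribute bookkeeping described on this slide can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 0/1 feature values and the weight-normalized score are assumptions made for illustration, and the sketch covers only attribute preservation (the projection side), not predicate selectivity.

```python
from typing import Dict, Optional

# Hypothetical representation: a translation maps attribute names of the
# source schema to attribute names of the target schema.
Mapping = Dict[str, str]

def translate_step(state: Dict[str, Optional[str]], mapping: Mapping) -> Dict[str, Optional[str]]:
    """Follow each surviving attribute through one translation; lost ones become None."""
    return {orig: (mapping.get(name) if name is not None else None)
            for orig, name in state.items()}

def feature_vector(state: Dict[str, Optional[str]]) -> Dict[str, float]:
    """ASSUMPTION: 1.0 if the original attribute is still represented, 0.0 if it was lost."""
    return {orig: 1.0 if name is not None else 0.0 for orig, name in state.items()}

def weighted_similarity(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """ASSUMPTION: similarity is the weight-normalized share of preserved attributes."""
    return sum(weights[a] * features[a] for a in features) / sum(weights.values())

# Two translation steps applied to a query over 'title' and 'organism'.
weights = {"title": 2.0, "organism": 1.0}
state: Dict[str, Optional[str]] = {"title": "title", "organism": "organism"}
for mapping in [{"title": "name", "organism": "species"}, {"species": "organism"}]:
    state = translate_step(state, mapping)  # recomputed after every step

features = feature_vector(state)
score = weighted_similarity(features, weights)
print(features, round(score, 3))
```

In this toy run the second translation has no counterpart for `name`, so `title` is lost and only the lower-weighted `organism` contributes to the score.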

The Semantic Similarity
The syntactic similarity always assumes that the translations are semantically correct
  – This is usually not the case in reality
We define extrinsic measures for the semantic quality of the translations
  – Query cycle analysis
  – Results analysis
Semantics as an agreement among peers

Cycle Analysis
Query cycle detection
Based on the original query it forwarded and on the returning query it received, p_1 may assess the correctness of the translation T_{1→2}

Cycle Analysis (2)
What happened to an attribute A_i present in the original query?
  – T_{1→n}(A_i) = A_i: positive cycle
  – T_{1→n}(A_i) = A_j, with j ≠ i: negative cycle
  – T_{1→n}(A_i) = ⊥: no semantic information
Probabilistic analysis based on the positive and negative feedback we receive
  – ε_s: probability of p_1's translation being incorrect
  – ε_f: probability of another translation being incorrect
  – δ: probability of two errors being compensated somehow
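
The three outcomes for an attribute travelling around a cycle can be sketched in Python as follows. This is only an illustration under assumptions: translations are modelled as plain dictionaries, and the helper names and the sample schemas (`taxon`, `name`, and so on) are hypothetical, not taken from the talk.

```python
from typing import Dict, List, Optional

def compose(attr: str, mappings: List[Dict[str, str]]) -> Optional[str]:
    """Push one attribute through the chain of translations that forms the cycle."""
    current = attr
    for mapping in mappings:
        if current not in mapping:
            return None  # attribute dropped somewhere along the cycle
        current = mapping[current]
    return current

def classify_cycle(attr: str, mappings: List[Dict[str, str]]) -> str:
    """Return 'positive', 'negative' or 'none' for the attribute's round trip."""
    result = compose(attr, mappings)
    if result is None:
        return "none"      # no semantic information
    if result == attr:
        return "positive"  # attribute came back unchanged
    return "negative"      # attribute came back as a different attribute

# Hypothetical cycle p1 -> p2 -> p3 -> p1.
cycle = [
    {"organism": "species", "title": "name"},
    {"species": "taxon", "name": "author"},
    {"taxon": "organism", "author": "authors"},
]
feedback = {attr: classify_cycle(attr, cycle) for attr in ["organism", "title", "year"]}
print(feedback)
```

Each label is one piece of positive or negative feedback; the probabilistic analysis on the slide then weighs such feedback, since a positive cycle may still hide compensating errors.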

Cycle Analysis (3)
Combining the different probabilities, we obtain the likelihood l() of receiving a series of feedbacks, some positive, others negative
No assumption on error probabilities
The probability of p_1's translation being correct (we could take density functions into account here) is [formula missing from the transcript]
For every outgoing link, peers keep a corresponding semantic feature vector
Similar method for result analysis (association rule mining)

To Gossip or not
We compute similarity measures S_i based on
  – the four feature vectors
  – the weights W (relative importance) of the attributes
We forward the query using a translation link if [condition missing from the transcript], where the S_min,i are query-defined thresholds for the different similarity values
⇒ Queries are only forwarded to peers likely to understand them both syntactically and semantically
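
The forwarding condition itself did not survive in the transcript. A plausible reading, given that the thresholds are defined per similarity value, is that every S_i must reach its threshold S_min,i. The Python sketch below encodes that reading as an explicit assumption; the four similarity names are hypothetical labels for the four feature vectors.

```python
from typing import Dict

def should_forward(similarities: Dict[str, float], thresholds: Dict[str, float]) -> bool:
    """ASSUMPTION: forward only if every similarity value reaches its query-defined threshold."""
    return all(similarities[name] >= minimum for name, minimum in thresholds.items())

# Hypothetical names for the four similarity values of one translation link.
thresholds = {"projection": 0.5, "selection": 0.5, "cycles": 0.5, "results": 0.5}
good_link = {"projection": 1.0, "selection": 0.8, "cycles": 0.9, "results": 0.6}
weak_link = {"projection": 1.0, "selection": 0.8, "cycles": 0.3, "results": 0.6}

print(should_forward(good_link, thresholds))  # all four values pass
print(should_forward(weak_link, thresholds))  # the cycle-based value is too low
```

Because the thresholds travel with the query, the same link may be used for one query and skipped for another.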

Case study (1)
First tests performed inside the lab
  – 7 nodes (semantic domains)
  – 21 translation links (some erroneous)
Test-bed uses
  – Java-based query processor and translator (based on IPSI XQuery processor)
  – P2P application (JXTA) for creating and exporting schema + data
Initial results only

Case study (2)
Query = FOR $project IN "project_A.xml"/* RETURN $project/title
Analysis for A (thresholds set to 0.5):
Syntactic analysis: C has no representation for the title
  – C will never receive the query
Semantic analysis:
  – Translation link A to B: 0.79
  – Translation link A to C: 1 (cf. syntactic analysis)
  – Translation link A to D: 0.26
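
As a worked example, the numbers on this slide can be run through a simple threshold check in Python. The semantic values and the 0.5 threshold come from the slide. The syntactic values are assumptions: the slide only states that C cannot represent the title, which is modelled here as 0.0, while B and D are assumed to preserve it fully.

```python
THRESHOLD = 0.5  # from the slide: thresholds set to 0.5

# Semantic values per translation link from A, as reported on the slide.
semantic = {"B": 0.79, "C": 1.0, "D": 0.26}

# ASSUMPTION: syntactic values are not given on the slide. C has no
# representation for the title (modelled as 0.0); B and D are assumed to keep it.
syntactic = {"B": 1.0, "C": 0.0, "D": 1.0}

targets = [peer for peer in sorted(semantic)
           if syntactic[peer] >= THRESHOLD and semantic[peer] >= THRESHOLD]
print(targets)
```

Under these assumptions only the link to B passes both checks: C is excluded on syntactic grounds despite its semantic value of 1, and D falls below the semantic threshold.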

Conclusions
Semantic interoperability in a bottom-up, decentralized manner
Global agreement from purely local interactions
Relies on local translations between the different schemas
Semantics viewed as an agreement
Two-dimensional analysis:
  – intrinsic (syntax)
  – extrinsic (semantics)
Results used to determine whether or not it is useful to forward a query to a certain group

The Chatty Web: Emergent Semantics Through Gossiping
WWW2003
lsirwww.epfl.ch