1 ESTEEM: Trust-aware P2P data integration Carola Aiello,Tiziana Catarci, Diego Milano, Monica Scannapieco Dipartimento di Informatica e Sistemistica Università


1 ESTEEM: Trust-aware P2P data integration
Carola Aiello, Tiziana Catarci, Diego Milano, Monica Scannapieco
Dipartimento di Informatica e Sistemistica, Università di Roma La Sapienza

2 Outline
Previous projects
ESTEEM objectives
Research issues and directions of the unit:
  Data quality: quality-aware query processing
  Privacy: privacy-aware record matching
  Trust: trust model for data sources

3 DaQuinCIS project (2003)
MIUR – COFIN/PRIN
Main focus: data quality in cooperative information systems (CISs)
Data quality problems:
  Record matching
  Quality-driven query processing

4 Motivations
A real example: an e-Government project to integrate data about Italian companies
[Figure: a client sends the query "Company XYZ?" to the data integration layer, which sits on top of three sources: Chambers of Commerce, Social Insurance Agency, Accident Insurance Agency]

5 [Figure]
Sources: Chambers of Commerce, Social Insurance Agency, Accident Insurance Agency
Attributes: Id, Name, Type of activity, Address, City

6 The Three Real Records
ID | Type of Activity | City | Name | Address
CNCBTB765SDV | Retail of bovine and ovine meats | Novi Ligure | Meat production of Bartoletti Benito | National Street dei Giovi
0111232223 | Grocer's shop, beverages | Pizzolo Formigaro | Bartoletti Benito Meat production | 9, Rome Street
CNCBTR765LDV | Butcher | Ovada | Meat production in Piemonte of Bartoletti Benito | 4, Mazzini Square
Which is the actual company XYZ to be returned to the client? One of the three? Which one? A merge of the three?

7 Objectives of the Research
Given a set of distributed and heterogeneous data sources that are affected by data quality problems:
1. Improve the quality of each data source
   Record matching across sources
2. Provide unified and transparent access to the data sources
   Data integration and quality-driven query processing

8 Improving quality of addresses in Italian PA (2004)
Collaboration agreement between AIPA (now CNIPA) and ISTAT, April 2002 - July 2004
Proposal of standard formats for acquiring and exchanging addresses
Proposal for redesigning the address update flows
Methodology for measuring address quality
Experimental measurement of address quality in three national archives:
  Agenzia delle Entrate
  Camere di Commercio
  INPS

9 Data Quality and Data Privacy (Current)
Joint activity with Purdue University, Indiana, USA
Publishing elementary data may violate privacy requirements, even when the data are anonymized
  Anonymization removes principal identifiers such as SSN, Name+Surname+DOB, etc.
Privacy-aware record matching
  Only the result of the intersection (A ∩ B) across data sets is shared, and nothing else (neither A − (A ∩ B) nor B − (A ∩ B))

10 ESTEEM Objectives
Study of trust and data quality issues in P2P systems
Specification of P2P data integration systems with trust requirements
Definition of quality- and trust-aware query processing algorithms

11 P2P Systems
P2P systems: loosely coupled, dynamic, open
Data sharing in such systems:
  No centralized global schema
  Peer mappings are built dynamically
  New peers can make new data schemas available

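12 Data Quality
EmployeeS1
EmployeeID | Name | Surname | Salary | Email
arpa78 | John | Smith | 2600 | smith@abc.it
eugi98 | Edward | Monroe | 1500 | monroe@abc.it
ghjk09 | Anthony | White | 1250 | white@abc.it
dref43 | Marianne | Collins | 1150 | collins@abc.it
EmployeeS2
EmployeeID | Name | Surname | Salary | Email
arpa78 | John | Smith | 2000 | smith@abc.it
eugi98 | Edward | Monroe | 1500 | monroe@abc.it
ghjk09 | Anthony | Wite | 1250 | white@abc.it
treg23 | Marianne | Collins | 1150 | collins@abc.it
Attribute conflict: the Salary of arpa78 (2600 in EmployeeS1, 2000 in EmployeeS2)
Key conflict: Marianne Collins (dref43 in EmployeeS1, treg23 in EmployeeS2)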
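The two kinds of conflict on this slide can be detected mechanically. The sketch below is only an illustration: the dictionary layout, the function names, and the reading of the first table as EmployeeS1 and the second as EmployeeS2 are assumptions, and the key-conflict check is deliberately naive (it requires all non-key attributes to be identical).

```python
# Rows copied from the slide; the data layout is an assumption of this sketch.
FIELDS = ("Name", "Surname", "Salary", "Email")

employee_s1 = {
    "arpa78": ("John", "Smith", 2600, "smith@abc.it"),
    "eugi98": ("Edward", "Monroe", 1500, "monroe@abc.it"),
    "ghjk09": ("Anthony", "White", 1250, "white@abc.it"),
    "dref43": ("Marianne", "Collins", 1150, "collins@abc.it"),
}
employee_s2 = {
    "arpa78": ("John", "Smith", 2000, "smith@abc.it"),
    "eugi98": ("Edward", "Monroe", 1500, "monroe@abc.it"),
    "ghjk09": ("Anthony", "Wite", 1250, "white@abc.it"),
    "treg23": ("Marianne", "Collins", 1150, "collins@abc.it"),
}


def attribute_conflicts(s1, s2):
    """Return (key, field, value_in_s1, value_in_s2) for shared keys whose values differ."""
    conflicts = []
    for key in sorted(s1.keys() & s2.keys()):
        for field, v1, v2 in zip(FIELDS, s1[key], s2[key]):
            if v1 != v2:
                conflicts.append((key, field, v1, v2))
    return conflicts


def key_conflicts(s1, s2):
    """Return (key_in_s1, key_in_s2) pairs whose attributes are identical but whose keys differ."""
    only_s1 = {k: v for k, v in s1.items() if k not in s2}
    only_s2 = {k: v for k, v in s2.items() if k not in s1}
    return [
        (k1, k2)
        for k1, v1 in sorted(only_s1.items())
        for k2, v2 in sorted(only_s2.items())
        if v1 == v2
    ]


print(attribute_conflicts(employee_s1, employee_s2))
# [('arpa78', 'Salary', 2600, 2000), ('ghjk09', 'Surname', 'White', 'Wite')]
print(key_conflicts(employee_s1, employee_s2))
# [('dref43', 'treg23')]
```

Note that the comparison also flags the White/Wite surname mismatch, which the slide does not label explicitly.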

13 Quality-aware query processing - 1
Key conflicts require the application of record matching techniques
Attribute conflicts are solved at query time by conflict resolution techniques
The resolution of such conflicts in P2P systems is an open issue:
  Definition of a quality-aware semantics for query answering in P2P systems
  Need to develop techniques for solving such conflicts according to the defined semantics
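To make the two steps concrete, here is a minimal sketch, not the project's actual technique: record matching is approximated with string similarity from Python's standard `difflib`, and an attribute conflict is resolved by keeping the value from the source with the higher quality score. The threshold, the quality scores, and all function names are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity in [0, 1] between two strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_records(left, right, threshold=0.85):
    """Pair each left record with its most similar right record, if above the threshold.

    `left` and `right` map record keys to a descriptive string (here: full name).
    The threshold value is an assumption chosen for this example.
    """
    pairs = []
    for lkey, lname in left.items():
        best_key, best_score = None, threshold
        for rkey, rname in right.items():
            score = similarity(lname, rname)
            if score >= best_score:
                best_key, best_score = rkey, score
        if best_key is not None:
            pairs.append((lkey, best_key))
    return pairs


def resolve(values, quality):
    """Conflict resolution at query time: keep the value from the highest-quality source."""
    best_source = max(values, key=lambda source: quality[source])
    return values[best_source]


s1_names = {"dref43": "Marianne Collins", "ghjk09": "Anthony White"}
s2_names = {"treg23": "Marianne Collins", "ghjk09": "Anthony Wite"}
print(match_records(s1_names, s2_names))
# [('dref43', 'treg23'), ('ghjk09', 'ghjk09')]

# Hypothetical quality scores for the two sources.
print(resolve({"S1": 2600, "S2": 2000}, {"S1": 0.9, "S2": 0.6}))
# 2600
```

Other resolution strategies (most recent value, average, majority vote) would fit the same `resolve` signature.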

14 Quality-aware query processing - 2
Query language supporting the specification of conflict resolution strategies
Important in P2P systems: search space pruning on the basis of the quality characterization of sources
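One way to picture pruning by quality is a query that is forwarded only to peers whose declared quality reaches a minimum level. The sketch below assumes a static neighbour map, a single numeric quality score per peer, and a hypothetical function name; none of these come from the slides.

```python
from collections import deque


def peers_to_query(start, neighbours, quality, min_quality):
    """Breadth-first visit of the peer graph, skipping peers below `min_quality`.

    A pruned peer is not contacted and the query is not forwarded through it.
    """
    selected = []
    seen = {start}
    queue = deque([start])
    while queue:
        peer = queue.popleft()
        selected.append(peer)
        for nxt in neighbours.get(peer, []):
            if nxt not in seen and quality.get(nxt, 0.0) >= min_quality:
                seen.add(nxt)
                queue.append(nxt)
    return selected


# Hypothetical peer network and quality scores.
neighbours = {"P1": ["P2", "P3"], "P2": ["P4"], "P3": ["P5"]}
quality = {"P2": 0.9, "P3": 0.4, "P4": 0.8, "P5": 0.95}

print(peers_to_query("P1", neighbours, quality, min_quality=0.7))
# ['P1', 'P2', 'P4']
```

In this example P5 is never reached, although its own quality is high, because the only path to it goes through the pruned peer P3. Whether that trade-off is acceptable depends on the query semantics adopted.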

15 Privacy
How to protect privacy when sharing data?
With the sources S1 and S2 issuing the queries Q1 and Q2 respectively, at the end of the interaction:
  S1 must learn the result of Q1 and nothing else
  S2 must learn the result of Q2 and nothing else
[Figure: S1 and S2 exchange Query Q1, Query Q2, Result Q1, Result Q2]

16 Privacy-aware Record Matching - 1
[Figure: sets A and B with their intersection A ∩ B]
Secure set intersection: (i) exact matching; (ii) not on records; (iii) expensive
Private data sharing: (i) exact matching; (ii) schema-unaware
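The idea of revealing only A ∩ B, and the exact-matching limitation noted above, can be illustrated with a deliberately simplified sketch. It is not a secure set intersection protocol: both parties are assumed to share a secret key in advance and to exchange HMAC-SHA256 digests of their identifiers, and anyone holding the key could brute-force low-entropy values. The key and function names are hypothetical.

```python
import hashlib
import hmac


def blind(values, shared_key):
    """Map the HMAC-SHA256 digest of each normalized value back to the original value."""
    return {
        hmac.new(shared_key, v.strip().lower().encode("utf-8"), hashlib.sha256).hexdigest(): v
        for v in values
    }


def private_intersection(own_values, other_digests, shared_key):
    """Return the own values whose digest also appears among the other party's digests."""
    own = blind(own_values, shared_key)
    return sorted(v for digest, v in own.items() if digest in other_digests)


key = b"hypothetical-shared-secret"
ids_a = ["CNCBTB765SDV", "0111232223"]
ids_b = ["cncbtb765sdv ", "CNCBTR765LDV"]

# Party B sends only digests; party A learns which of its own identifiers are shared.
digests_from_b = set(blind(ids_b, key))
print(private_intersection(ids_a, digests_from_b, key))
# ['CNCBTB765SDV']
```

Digest comparison tolerates only the normalization applied before hashing (case and surrounding whitespace here). Identifiers that differ by a character or two, such as CNCBTB765SDV and CNCBTR765LDV, never match, which is why approximate record matching does not fit this scheme directly.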

17 Privacy-aware Query Processing - 2
Algorithms that enable privacy-aware record matching in P2P settings
The third-party problem
First proposals: El Abbadi, ICDE 2006, but with exact matching

18 Trust
Trust is typically associated with a source as a whole
Need for a finer-level characterization
E.g.: the Ministero delle Finanze is trustworthy with respect to Codici Fiscali (tax codes)

19 Trust model for data sources - 1
Previous proposals: trust associated with the whole organization (peer)
Our proposal:
[Formula: terms "# of D-exchanges of Org k" and "# of complaints sent by Org i"]
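Only the two ingredients of the slide's formula are legible: the number of D-exchanges of an organization and the number of complaints sent about it. As a purely illustrative assumption, and not the authors' formula, a per-data-type trust score could be one minus the ratio of complaints to exchanges. The function name and the clamping are hypothetical choices.

```python
def trust_score(complaints, exchanges):
    """Assumed score in [0, 1]: 1 - complaints / exchanges for one (organization, data type) pair.

    This formula is an assumption made for illustration; the slide's own
    formula is not recoverable from the transcript.
    Returns None when there have been no exchanges, i.e. no evidence yet.
    """
    if exchanges <= 0:
        return None
    ratio = min(complaints / exchanges, 1.0)  # clamp in case complaints exceed exchanges
    return 1.0 - ratio


# Hypothetical counts for two data types exchanged by the same organization.
print(trust_score(complaints=25, exchanges=100))  # 0.75
print(trust_score(complaints=0, exchanges=40))    # 1.0
print(trust_score(complaints=3, exchanges=0))     # None
```

Keeping one score per (organization, data type) pair, rather than one per organization, is what gives the finer-grained characterization called for on the previous slide.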

20 Trust model for data sources - 2
Drawback: centralized
Need for:
  A decentralized model
  A more flexible model (e.g. trust associated with views)

21 Trust model for data sources - 3
More general trust characterization, based on the evaluation of a peer's assertions on some metadata:
  Data quality-aware: trust computed on the basis of the declared quality of the provided data
  Privacy-aware: trust computed on the basis of the declared privacy level
Different roles for providers and consumers: e.g. a provider can decide not to release data if a requester is not privacy-trusted (or to adopt a specific technique)
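The provider-side decision in the last point can be sketched as a small policy function. The numeric trust scale, both thresholds, and the choice of anonymization as the "specific technique" are assumptions made for illustration only.

```python
def release_decision(requester_privacy_trust, full_threshold=0.8, protected_threshold=0.5):
    """Decide how a provider answers a requester, given the requester's privacy trust in [0, 1].

    Thresholds are hypothetical. "release_anonymized" stands in for the slide's
    "specific technique" that a provider may adopt instead of refusing.
    """
    if requester_privacy_trust >= full_threshold:
        return "release"
    if requester_privacy_trust >= protected_threshold:
        return "release_anonymized"
    return "deny"


print(release_decision(0.9))  # release
print(release_decision(0.6))  # release_anonymized
print(release_decision(0.2))  # deny
```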

