Published by Ian Nelson. Modified over 2 years ago.

1
Uncertainty in Data Integration
Ai Jing

2
Outline
- Data Integration with Uncertainty
- Overview of Workshop on Management of Uncertain Data
- Uncertainty in Deep Web

4
Data Integration with Uncertainty
- Motivation and overview
- Definition of probabilistic mappings
- Query answering w.r.t. p-mappings
- Complexity of query answering
- Contributions

6
Traditional Data Integration Systems
Q:
    SELECT P.title AS title, P.year AS year, A.name AS author
    FROM Author A, Paper P, AuthoredBy AB
    WHERE A.aid = AB.aid AND P.pid = AB.pid
[Figure labels: Q, Q1, Q2, Q3, Q4, Q5]

7
Uncertainty Can Occur at Three Levels in Data Integration Applications
- I. Data level
- II. Mapping level
- III. Query level
Focus of the paper: probabilistic schema mappings

8
Example Probabilistic Mappings
T(name, email, mailing-addr, home-addr, office-addr)
S(pname, email-addr, current-addr, permanent-addr)
Three candidate mappings between S and T (the correspondence arrows are not recoverable from the slide):
- m1: 0.5
- m2: 0.4
- m3: 0.1
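A minimal sketch of how a probabilistic mapping like the one above could be held in code, assuming it is a set of candidate one-to-one mappings whose probabilities sum to 1. Only the probabilities 0.5, 0.4, and 0.1 come from the slide. The attribute correspondences, the attribute names `email` and `email-addr`, and the names `Mapping`, `P_MAPPING`, and `validate_p_mapping` are illustrative assumptions.

```python
from dataclasses import dataclass
from math import isclose


@dataclass(frozen=True)
class Mapping:
    """One candidate mapping: pairs of (source attribute, target attribute)."""
    name: str
    correspondences: tuple
    prob: float


def validate_p_mapping(mappings):
    """Check that every mapping is one-to-one and the probabilities sum to 1."""
    for m in mappings:
        sources = [s for s, _ in m.correspondences]
        targets = [t for _, t in m.correspondences]
        if len(set(sources)) != len(sources) or len(set(targets)) != len(targets):
            raise ValueError(f"{m.name} is not one-to-one")
        if not 0.0 <= m.prob <= 1.0:
            raise ValueError(f"{m.name} has probability outside [0, 1]")
    if not isclose(sum(m.prob for m in mappings), 1.0):
        raise ValueError("probabilities do not sum to 1")
    return True


# ASSUMPTION: the correspondences below are hypothetical; the slide only
# preserves the probabilities of m1, m2, and m3.
P_MAPPING = [
    Mapping("m1", (("pname", "name"), ("email-addr", "email"),
                   ("current-addr", "mailing-addr"),
                   ("permanent-addr", "home-addr")), 0.5),
    Mapping("m2", (("pname", "name"), ("email-addr", "email"),
                   ("permanent-addr", "mailing-addr"),
                   ("current-addr", "home-addr")), 0.4),
    Mapping("m3", (("pname", "name"), ("email-addr", "mailing-addr"),
                   ("current-addr", "home-addr")), 0.1),
]

validate_p_mapping(P_MAPPING)
```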

9
Top-k Query Answering w.r.t. Probabilistic Mappings
Query on the mediated schema:
    Q: SELECT mailing-addr FROM T
Queries on the source:
    Q1: SELECT current-addr FROM S
    Q2: SELECT permanent-addr FROM S
    Q3: SELECT email-addr FROM S
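The following sketch illustrates one way the top-k idea could work when a single mapping applies to the whole table: run each source query, credit every answer with the probability of the mapping that produced it, and rank by the total. It assumes Q1, Q2, and Q3 correspond to mappings with probabilities 0.5, 0.4, and 0.1 respectively. The source rows, `SOURCE`, `REWRITINGS`, and `by_table_top_k` are hypothetical names and data, not taken from the slides.

```python
from collections import defaultdict

# ASSUMPTION: a made-up instance of S(pname, email-addr, current-addr, permanent-addr).
SOURCE = [
    {"pname": "Alice", "email-addr": "alice@example.org",
     "current-addr": "Mountain View", "permanent-addr": "Sunnyvale"},
    {"pname": "Bob", "email-addr": "bob@example.org",
     "current-addr": "Sunnyvale", "permanent-addr": "Sunnyvale"},
]

# Source attribute each rewriting projects for T.mailing-addr, with the
# assumed probability of the mapping behind it.
REWRITINGS = [
    ("current-addr", 0.5),    # Q1
    ("permanent-addr", 0.4),  # Q2
    ("email-addr", 0.1),      # Q3
]


def by_table_top_k(source, rewritings, k):
    """Rank answers by the summed probability of the mappings that yield them."""
    scores = defaultdict(float)
    for attr, prob in rewritings:
        answers = {row[attr] for row in source}  # set semantics per rewriting
        for answer in answers:
            scores[answer] += prob
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


top2 = by_table_top_k(SOURCE, REWRITINGS, k=2)
```

With this data, "Sunnyvale" is returned by both Q1 and Q2, so it outranks "Mountain View", which only Q1 returns.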

11
Definition of probabilistic mappings
Schema mapping:
- S=(pname, email-addr, home-addr, office-addr)
- T=(name, mailing-addr)
- one-to-one schema matching
- have exact knowledge of the mapping
Probabilistic mapping:
- S=(pname, email-addr, home-addr, office-addr)
- T=(name, mailing-addr)

12
By-Table Semantics
[Figure not recovered: target instance DT for a mapping m with probability 0.5]

13
By-Tuple Semantics
[Figure not recovered: target instance DT, Pr(…) = 0.05, …]

15
By-Table Query Answering

16
By-Tuple Query Answering

18
Complexity of query answering

19
More on By-Tuple Query Answering
- The high complexity comes from computing probabilities: the number of mapping sequences is exponential in the size of the input data
  - with n tuples and m mappings there are m^n mapping sequences
- There are two subsets of queries that can be answered in PTIME by query rewriting, for example:
    SELECT mailing-addr FROM T
    SELECT mailing-addr FROM T, V WHERE T.mailing-addr = V.hightech
- In general, query answering cannot be done by query rewriting
[Figure label: One of Dt]
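To make the m^n blow-up concrete, here is a small sketch that enumerates every mapping sequence for the projection query `SELECT mailing-addr FROM T`, assuming each tuple picks its mapping independently so that a sequence's probability is the product of its mappings' probabilities. It also shows a per-tuple shortcut that avoids the enumeration for this one query shape. The shortcut is an illustration of why such a query can be cheap, not the rewriting algorithm from the paper. All data and function names are hypothetical.

```python
from itertools import product
from math import prod

# ASSUMPTION: made-up source rows and (attribute, probability) rewritings.
SOURCE = [
    {"email-addr": "alice@example.org",
     "current-addr": "Mountain View", "permanent-addr": "Sunnyvale"},
    {"email-addr": "bob@example.org",
     "current-addr": "Sunnyvale", "permanent-addr": "Sunnyvale"},
]
REWRITINGS = [("current-addr", 0.5), ("permanent-addr", 0.4), ("email-addr", 0.1)]


def by_tuple_brute_force(source, rewritings, answer):
    """Sum the probability of every mapping sequence that produces `answer`.

    Returns (probability, number of sequences enumerated), where the number
    of sequences is m**n for m mappings and n tuples.
    """
    total, count = 0.0, 0
    for sequence in product(rewritings, repeat=len(source)):
        count += 1
        seq_prob = prod(p for _, p in sequence)
        produced = {row[attr] for row, (attr, _) in zip(source, sequence)}
        if answer in produced:
            total += seq_prob
    return total, count


def by_tuple_per_tuple(source, rewritings, answer):
    """Same probability for a projection, using independence across tuples."""
    miss = 1.0
    for row in source:
        hit = sum(p for attr, p in rewritings if row[attr] == answer)
        miss *= 1.0 - hit
    return 1.0 - miss


brute, sequences = by_tuple_brute_force(SOURCE, REWRITINGS, "Sunnyvale")
fast = by_tuple_per_tuple(SOURCE, REWRITINGS, "Sunnyvale")
```

The brute-force loop visits 3^2 = 9 sequences for two tuples; the per-tuple version does work linear in the number of tuples and agrees with it on this query.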

20
Extensions to More Expressive Mappings
The complexity results for query answering carry over to three extensions to more expressive mappings:
- Complex mappings
- GLAV mappings
- Conditional mappings

22
Contributions
- Definition of probabilistic mappings
- Semantics: by-table vs. by-tuple
- Complexity of query answering

24
Overview of MUD 2007
Categories: Theory, Application
Papers:
- A New Language and Architecture to Obtain Fuzzy Global Dependencies
- About the Processing of Division Queries Addressed to Possibilistic Databases
- Making Aggregation Work in Uncertain and Probabilistic Databases
- Materialized Views in Probabilistic Databases
- Flexible matching of Ear Biometrics
- Consistent Joins Under Primary Key Constraints

25
A New Language and Architecture to Obtain Fuzzy Global Dependencies
- SQL does not satisfy the minimum requirements to be a true DM (data mining) language
- A new language: dmFSQL (data mining Fuzzy Structured Query Language)
- Fuzzy database
- Data mining

26
About the Processing of Division Queries Addressed to Possibilistic Databases
- The authors devised a data model that is a strong representation system for operations in possibilistic databases
- A possibilistic database D can be interpreted as a weighted disjunctive set of regular databases
- Division queries

27
Making Aggregation Work in Uncertain and Probabilistic Databases
- Trio is a prototype database management system for storing and querying data with uncertainty and lineage
- Trio's query language: TriQL
- Trio data model and query semantics
- Aggregation functions in the Trio system for uncertain and probabilistic data

28
Materialized Views in Probabilistic Databases
- Materialized views for probabilistic databases may not define a unique probability distribution
- View representation
- Answer queries on large probabilistic data sets more efficiently with materialized views

29
Flexible matching of Ear Biometrics
- Research area: image recognition (or identification)
- Scenario: identifying found bodies in a large-scale disaster
- Challenge: fast and cheap identification when no DNA databases or fingerprint databases are at hand

30
Consistent Joins Under Primary Key Constraints
- Inconsistent database, primary key
- Will the natural join of the repaired relations always be nonempty, no matter which tuples are selected?
- Game theory, winning strategy

32
No perfect data
- Noise
- Dirty
- Redundancy
- …
No perfect solution
- Web data extraction
- Interface integration
- …

33
Uncertainty in Deep Web Data Integration (1)
- Robust
- Evaluable

34
Uncertainty in Deep Web Data Integration (2)
- Tuning
- Feedback
- Evaluable

35
Uncertainty in Jobtong (1)
Data level

36
Uncertainty in Jobtong (2)
Query level
How can we give every result a probability to show its importance?

37
Uncertainty in Jobtong (3)
The automatic maintenance of configuration files
    2   title td[2]/a/span   company td[3]/a/span
    2   title td[2]/a        company td[3]/a
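One way to read the two path sets above is as alternative extraction rules for the same fields, where a rule can stop matching when a page's markup changes. The sketch below assumes that reading: it tries each rule set in order against a result row and reports which one still extracts every field. The row markup, the rule ordering, and the names `CANDIDATE_RULES`, `extract`, and `extract_with_fallback` are hypothetical, and this is not a description of how Jobtong itself maintains its configuration.

```python
import xml.etree.ElementTree as ET

# The two path sets from the slide, treated as ordered candidate rules.
CANDIDATE_RULES = [
    {"title": "td[2]/a/span", "company": "td[3]/a/span"},
    {"title": "td[2]/a", "company": "td[3]/a"},
]


def extract(row, rules):
    """Apply one rule set to a <tr> element; None if any field is missing or empty."""
    record = {}
    for field, path in rules.items():
        node = row.find(path)
        text = (node.text or "").strip() if node is not None else ""
        if not text:
            return None
        record[field] = text
    return record


def extract_with_fallback(row_markup, candidates=CANDIDATE_RULES):
    """Return (index of the first rule set that works, extracted record)."""
    row = ET.fromstring(row_markup)
    for index, rules in enumerate(candidates):
        record = extract(row, rules)
        if record is not None:
            return index, record
    return None, None


# ASSUMPTION: made-up rows showing the same listing with and without <span>.
ROW_WITH_SPAN = ("<tr><td>1</td><td><a><span>Engineer</span></a></td>"
                 "<td><a><span>Acme</span></a></td></tr>")
ROW_WITHOUT_SPAN = ("<tr><td>1</td><td><a>Engineer</a></td>"
                    "<td><a>Acme</a></td></tr>")

first = extract_with_fallback(ROW_WITH_SPAN)
second = extract_with_fallback(ROW_WITHOUT_SPAN)
```

A maintenance job could record the returned index and rewrite the configuration file whenever the first rule set stops matching.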

38
Q&A Thank you!
