Data Integration: Aggregate Query Answering under Uncertain Schema Mappings, by Avigdor Gal, Maria Vanina Martinez, Gerardo I. Simari, and VS Subrahmanian; presented by Stephen Lynn

1 Data Integration: Aggregate Query Answering under Uncertain Schema Mappings
Avigdor Gal, Maria Vanina Martinez, Gerardo I. Simari, VS Subrahmanian
Presented by Stephen Lynn

2 Overview
- Aggregate queries
- Probabilistic schema mappings
- Goals/objectives
- Aggregate processing (3 proposals)
- By-table algorithm
- By-tuple algorithm
- Evaluation
- Analysis

3 Aggregate Queries
- COUNT, MIN, MAX, SUM, AVG

ID   Price   Quantity
1    2.30    2
2    3.20    4
3    7.34    1
4    8.29    20
5    3.32    3

- Simple PTIME algorithms compute all of them
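
As a minimal sketch of the "simple PTIME algorithms" on this slide (Python; the function name `aggregates` is illustrative, not from the paper), all five aggregates over one column of the example table can be computed in a single linear pass:

```python
from decimal import Decimal

# Slide 3's table as (ID, Price, Quantity); Decimal keeps SUM and AVG exact.
ROWS = [
    (1, Decimal("2.30"), 2),
    (2, Decimal("3.20"), 4),
    (3, Decimal("7.34"), 1),
    (4, Decimal("8.29"), 20),
    (5, Decimal("3.32"), 3),
]


def aggregates(values):
    """COUNT, MIN, MAX, SUM and AVG of a column in one O(n) pass."""
    count, total, low, high = 0, 0, None, None
    for v in values:
        count += 1
        total += v
        low = v if low is None or v < low else low
        high = v if high is None or v > high else high
    avg = total / count if count else None  # AVG is undefined on an empty column
    return {"COUNT": count, "MIN": low, "MAX": high, "SUM": total, "AVG": avg}


price_stats = aggregates(price for _, price, _ in ROWS)
print(price_stats["SUM"], price_stats["AVG"])  # 24.45 4.89
```

Applied to the Quantity column instead, the same pass gives COUNT 5, MIN 1, MAX 20, SUM 30, and AVG 6.0.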

4 Probabilistic Schema Mappings

5 By-Table vs. By-Tuple
- By-tuple: consider all possible mappings for each tuple, so different tuples may use different mappings
- By-table: a single mapping applies to the entire table
- Example p-mapping:
  - P(date → postedDate) = 0.7
  - P(date → reducedDate) = 0.3
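
To make the two semantics concrete, here is a small Python sketch. It assumes the query attribute date is read from source column postedDate with probability 0.7 or from reducedDate with probability 0.3, and encodes each mapping as that column name; the function names are hypothetical. Over n tuples, by-table semantics yields one possible world per mapping, while by-tuple semantics yields one world per mapping sequence (one mapping per tuple):

```python
from fractions import Fraction
from itertools import product
from math import prod

# Slide 5's p-mapping, encoded (an assumption) as the source column that the
# query attribute "date" is read from, paired with that mapping's probability.
P_MAPPING = [("postedDate", Fraction(7, 10)), ("reducedDate", Fraction(3, 10))]


def by_table_worlds(p_mapping, n):
    """By-table: one mapping is applied to all n tuples, giving |M| worlds."""
    return [((column,) * n, p) for column, p in p_mapping]


def by_tuple_worlds(p_mapping, n):
    """By-tuple: every tuple picks its own mapping, giving |M|**n worlds.

    Each world's probability is the product of its per-tuple mapping probabilities.
    """
    worlds = []
    for choice in product(p_mapping, repeat=n):
        columns = tuple(column for column, _ in choice)
        worlds.append((columns, prod(p for _, p in choice)))
    return worlds


table_worlds = by_table_worlds(P_MAPPING, 3)
tuple_worlds = by_tuple_worlds(P_MAPPING, 3)
print(len(table_worlds), len(tuple_worlds))  # 2 8
```

With n = 3, the world in which every tuple uses postedDate has probability 0.7 under by-table semantics but 0.7³ = 0.343 under by-tuple semantics.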

6 Goals/Objectives
- Analyze the impact of probabilistic schema mappings on aggregate queries
- Develop algorithms for answering aggregate queries
- Analyze their time complexity
- Evaluate them empirically

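7 Aggregation Methods
- Range: the smallest and largest possible answers
- Distribution: every possible answer together with its probability
- Expected value: the probability-weighted average of the possible answers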

8 Method Relationships
- Distribution
  - Most time-consuming to compute
  - Carries the most information
- Range
  - Can be computed directly from the distribution
- Expected value
  - Can be computed directly from the distribution
- Range and expected value can also be computed by more efficient methods than building the full distribution first
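
A minimal sketch of how range and expected value follow from a distribution, assuming the distribution is given as a mapping from each possible answer to its probability (Python; the function names and the example distribution are hypothetical). Exact fractions avoid rounding in the probabilities:

```python
from fractions import Fraction


def answer_range(distribution):
    """Range answer: smallest and largest answers with nonzero probability."""
    support = [value for value, p in distribution.items() if p > 0]
    return min(support), max(support)


def expected_value(distribution):
    """Expected-value answer: probability-weighted sum of the possible answers."""
    return sum(value * p for value, p in distribution.items())


# Hypothetical answer distribution for some aggregate query.
dist = {2: Fraction(3, 10), 3: Fraction(7, 10)}
print(answer_range(dist), expected_value(dist))  # (2, 3) 27/10
```

Both functions are linear in the size of the distribution, so the expensive step is building the distribution, not deriving range or expected value from it.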

9 By-Table Algorithm
- All aggregates are PTIME computable under by-table semantics
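
One way to read "all PTIME computable" for by-table semantics: because a single mapping applies to the whole table, the aggregate can be evaluated once per mapping as an ordinary query, with each answer weighted by its mapping's probability. The Python sketch below assumes hypothetical rows, a hypothetical query (COUNT of tuples whose date is on or after 2008-01-01), and slide 5's p-mapping encoded as the source column read for date; it is an illustration, not the paper's code:

```python
from datetime import date
from fractions import Fraction

# Hypothetical source rows carrying both candidate date columns.
ROWS = [
    {"postedDate": date(2007, 12, 20), "reducedDate": date(2008, 1, 10)},
    {"postedDate": date(2008, 3, 9), "reducedDate": date(2008, 3, 20)},
    {"postedDate": date(2007, 12, 1), "reducedDate": date(2008, 1, 15)},
]
# Slide 5's p-mapping, encoded (an assumption) as the source column read for "date".
P_MAPPING = [("postedDate", Fraction(7, 10)), ("reducedDate", Fraction(3, 10))]


def count_since(rows, column, cutoff):
    """Certain query: COUNT(*) WHERE date >= cutoff, with date read from column."""
    return sum(1 for row in rows if row[column] >= cutoff)


def by_table(rows, p_mapping, query):
    """Run the certain query once per mapping and merge equal answers.

    Cost: |M| evaluations of a PTIME query."""
    distribution = {}
    for column, p in p_mapping:
        answer = query(rows, column)
        distribution[answer] = distribution.get(answer, 0) + p
    return distribution


dist = by_table(ROWS, P_MAPPING, lambda rows, col: count_since(rows, col, date(2008, 1, 1)))
print(dist)  # {1: Fraction(7, 10), 3: Fraction(3, 10)}
```

Here postedDate yields a count of 1 and reducedDate a count of 3, so the answer is 1 with probability 0.7 and 3 with probability 0.3. The loop makes one pass over the table per mapping, so the cost stays polynomial as long as the number of mappings does.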

10 By-Tuple Algorithm (COUNT)
- O(n * m)
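
A Python sketch of by-tuple COUNT, assuming each tuple chooses its mapping independently and using hypothetical rows and a hypothetical date-cutoff query. In this sketch the per-tuple pass costs O(n * m) for n tuples and m mappings and already yields the expected value and the range; materializing the full distribution adds an O(n²) dynamic program. It illustrates the semantics and is not necessarily the paper's algorithm:

```python
from datetime import date
from fractions import Fraction

# Hypothetical rows and query; "date" is read from postedDate or reducedDate.
ROWS = [
    {"postedDate": date(2007, 12, 20), "reducedDate": date(2008, 1, 10)},
    {"postedDate": date(2008, 3, 9), "reducedDate": date(2008, 3, 20)},
    {"postedDate": date(2007, 12, 1), "reducedDate": date(2008, 1, 15)},
]
P_MAPPING = [("postedDate", Fraction(7, 10)), ("reducedDate", Fraction(3, 10))]
CUTOFF = date(2008, 1, 1)


def by_tuple_count(rows, p_mapping, predicate):
    """By-tuple COUNT(*) WHERE predicate -> (distribution, expected value, range)."""
    # O(n * m): probability that each tuple is counted, i.e. the total
    # probability of the mappings under which it satisfies the predicate.
    probs = [sum((p for column, p in p_mapping if predicate(row, column)), Fraction(0))
             for row in rows]
    expected = sum(probs, Fraction(0))  # linearity of expectation
    low = sum(1 for q in probs if q == 1)  # tuples counted under every mapping
    high = sum(1 for q in probs if q > 0)  # tuples counted under some mapping
    # O(n^2): tuples choose mappings independently, so COUNT is a sum of
    # independent Bernoulli variables (a Poisson-binomial distribution).
    dist = [Fraction(1)]  # dist[k] = P(COUNT == k) over the tuples seen so far
    for q in probs:
        nxt = [Fraction(0)] * (len(dist) + 1)
        for k, pk in enumerate(dist):
            nxt[k] += pk * (1 - q)  # this tuple is not counted
            nxt[k + 1] += pk * q  # this tuple is counted
        dist = nxt
    distribution = {k: pk for k, pk in enumerate(dist) if pk}
    return distribution, expected, (low, high)


distribution, expected, value_range = by_tuple_count(
    ROWS, P_MAPPING, lambda row, column: row[column] >= CUTOFF
)
print(expected, value_range)  # 8/5 (1, 3)
```

On these rows the count is 1, 2, or 3 with probabilities 0.49, 0.42, and 0.09, giving expected value 1.6 and range [1, 3].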

11 Example: By-Tuple (COUNT)
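
As a worked check of the by-tuple semantics, a brute-force Python sketch can enumerate every mapping sequence (one mapping per tuple), weight each sequence by the product of its mapping probabilities, and evaluate COUNT in each. The rows, the cutoff, and the column encoding of slide 5's p-mapping are hypothetical. Enumeration takes m^n evaluations, so it is only a reference for tiny inputs, not the O(n * m) algorithm from the previous slide:

```python
from datetime import date
from fractions import Fraction
from itertools import product

# Hypothetical rows; each mapping is encoded (an assumption) as the source
# column that the query attribute "date" is read from, with its probability.
ROWS = [
    {"postedDate": date(2007, 12, 20), "reducedDate": date(2008, 1, 10)},
    {"postedDate": date(2008, 3, 9), "reducedDate": date(2008, 3, 20)},
    {"postedDate": date(2007, 12, 1), "reducedDate": date(2008, 1, 15)},
]
P_MAPPING = [("postedDate", Fraction(7, 10)), ("reducedDate", Fraction(3, 10))]
CUTOFF = date(2008, 1, 1)


def by_tuple_count_bruteforce(rows, p_mapping, cutoff):
    """Distribution of COUNT(*) WHERE date >= cutoff over all m**n mapping sequences."""
    distribution = {}
    for sequence in product(p_mapping, repeat=len(rows)):
        prob = Fraction(1)
        count = 0
        for row, (column, p) in zip(rows, sequence):
            prob *= p  # mappings are chosen independently per tuple
            if row[column] >= cutoff:
                count += 1
        distribution[count] = distribution.get(count, Fraction(0)) + prob
    return distribution


for count, prob in sorted(by_tuple_count_bruteforce(ROWS, P_MAPPING, CUTOFF).items()):
    print(count, prob)
```

For these three rows, COUNT is 1, 2, or 3 with probabilities 0.49, 0.42, and 0.09: the second row is always counted, and each of the other two is counted only when it uses reducedDate.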

12 Time Complexity

13 Evaluation
- Empirical evaluation
  - Real-world dataset (eBay)
  - Synthetic dataset
- Time complexity evaluation
  - Varying the number of tuples
  - Varying the number of attribute mappings

14 Evaluation Results

15 Evaluation Results

16 Evaluation Results

17 Analysis
- Strengths
  - Shows the effect of probabilistic schemas on aggregate queries
  - Clean PTIME algorithms
- Weaknesses
  - Evaluation results were predictable
  - By-table results were biased by database optimizations
- Future work
  - Improve the algorithms
  - Extend to sub-queries
  - Develop heuristics

