R*: An Overview of the Architecture


1 R*: An Overview of the Architecture
R. Williams et al., IBM Almaden Research Center

2 Outline Environment and Data Definitions Object Naming
Distributed Catalogs Transaction Management and Commit Protocols Query Preparation Query Execution SQL Additions and Changes

3 Environment and Data Definitions
CICS as the underlying communication model. Data distribution: Dispersed; Replicated; Partitioned (Horizontal, Vertical); Snapshot

4 Figure 1 from paper

5 Figure 21.4 from CS 432 text

6 Object Naming System Wide Names (SWN):
BIRTH_SITE
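A minimal sketch of how a System Wide Name could be modelled. The slide lists only BIRTH_SITE; the four-part form `USER@USER_SITE.OBJECT_NAME@BIRTH_SITE` used here is taken from the R* naming literature and is an assumption relative to this deck. The class and function names are hypothetical, and the parser assumes site and user names contain no `.` or `@`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemWideName:
    """Assumed four-part SWN: USER@USER_SITE.OBJECT_NAME@BIRTH_SITE."""
    user: str
    user_site: str
    object_name: str
    birth_site: str  # site where the object was created; never changes

    def __str__(self) -> str:
        return f"{self.user}@{self.user_site}.{self.object_name}@{self.birth_site}"


def parse_swn(text: str) -> SystemWideName:
    """Split an SWN string into its parts (assumes no '.' or '@' inside a part)."""
    creator, obj = text.split(".", 1)
    user, user_site = creator.split("@", 1)
    object_name, birth_site = obj.split("@", 1)
    return SystemWideName(user, user_site, object_name, birth_site)
```

Because the birth site is part of the name, a site can always locate the catalog entry for an object by asking the birth site, even after the object has migrated.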

7 Distributed Catalogs Local site maintains objects in its database
Catalog entry may be cached Entries are versioned SWN Type Format Access path Object ref (view) Statistics

8 Transaction Management and Commit Protocol
Transaction number: SITE.SEQ_NUM (or SITE.TIME) Two phase commit (2PC)
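The two ideas on this slide can be sketched as follows: a transaction number formed as SITE.SEQ_NUM, and a coordinator that runs two-phase commit over a set of participants. This is a simplified in-memory illustration, not the R* implementation; all class and function names are hypothetical, and logging, timeouts, and failure recovery are deliberately left out.

```python
from itertools import count


class TransactionIds:
    """Generates transaction numbers of the form SITE.SEQ_NUM."""

    def __init__(self, site: str):
        self.site = site
        self._seq = count(1)

    def next_id(self) -> str:
        return f"{self.site}.{next(self._seq)}"


class Participant:
    """A site taking part in a transaction (hypothetical stand-in)."""

    def __init__(self, name: str, can_commit: bool = True):
        self.name = name
        self.can_commit = can_commit
        self.state = "ACTIVE"

    def prepare(self, txn_id: str) -> bool:
        # Phase 1: vote yes (and enter the prepared state) or vote no.
        self.state = "PREPARED" if self.can_commit else "ABORTED"
        return self.can_commit

    def commit(self, txn_id: str) -> None:
        self.state = "COMMITTED"

    def abort(self, txn_id: str) -> None:
        self.state = "ABORTED"


def two_phase_commit(txn_id: str, participants: list) -> str:
    """Coordinator: commit only if every participant votes yes."""
    votes = [p.prepare(txn_id) for p in participants]  # phase 1
    if all(votes):
        for p in participants:                          # phase 2: commit
            p.commit(txn_id)
        return "COMMITTED"
    for p in participants:                              # phase 2: abort
        p.abort(txn_id)
    return "ABORTED"
```

Prefixing the sequence number with the site name makes transaction numbers unique across the system without any coordination between sites.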

9 Query Preparation Name resolution Authorization check
Distributed compilation Global plan generation/optimization Local access path selection Local optimization Local view materialization

10 Figure 2 from paper

11 Cost Model 3 weighted components: I/O CPU Message # of messages sent
# of bytes sent
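As a rough illustration, the cost model above can be read as a weighted sum in which the message component has two terms (number of messages and number of bytes). The function and field names below are hypothetical, and the weights in the usage note are made-up values for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Weights:
    cpu: float   # cost per instruction executed
    io: float    # cost per I/O operation
    msg: float   # fixed cost per message sent
    byte: float  # cost per byte sent


def plan_cost(w: Weights, instructions: float, ios: float,
              messages: float, bytes_sent: float) -> float:
    """Weighted sum of the CPU, I/O, and message components of a plan."""
    return (w.cpu * instructions
            + w.io * ios
            + w.msg * messages
            + w.byte * bytes_sent)
```

For example, with `Weights(cpu=1, io=10, msg=100, byte=0.5)`, a plan that executes 1000 instructions, performs 20 I/Os, and sends 3 messages totalling 400 bytes costs 1000 + 200 + 300 + 200 = 1700 units. The optimizer would compare such totals across candidate plans.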

12 Query Execution Synchronous vs asynchronous execution
Distributed concurrency control Deadlock detection and resolution Crash recovery

13 Figure 3 from paper

14 SQL Additions and Changes
DEFINE SYNONYM DISTRIBUTE TABLE HORIZONTALLY VERTICALLY REPLICATED DEFINE SNAPSHOT REFRESH SNAPSHOT MIGRATE TABLE

15 R* Optimizer Validation and Performance Evaluation for Distributed Queries
Lothar F. Mackert, Guy M. Lohman, IBM Almaden Research Center

16 Outline Distributed Compilation/Optimization Instrumentation
Experiments and Results

17 Distributed Compilation/Optimization
Issues: Join site Transfer methods: ship whole fetch matches Cost model
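To make the trade-off between the two transfer methods concrete, here is a back-of-the-envelope sketch of the message traffic each one generates. The accounting is an assumption made for illustration: "ship whole" sends one request plus the inner table packed into fixed-size messages, while "fetch matches" sends one request and receives one reply per outer tuple (assuming each reply fits in a single message). Function and parameter names are hypothetical.

```python
import math


def ship_whole(inner_tuples: int, tuple_bytes: int, payload_bytes: int):
    """Ship the entire inner table to the join site.

    Returns (messages, bytes): one request plus enough messages to
    carry the whole table.
    """
    total_bytes = inner_tuples * tuple_bytes
    messages = 1 + math.ceil(total_bytes / payload_bytes)
    return messages, total_bytes


def fetch_matches(outer_tuples: int, matches_per_outer: int,
                  tuple_bytes: int, key_bytes: int):
    """Fetch only matching inner tuples, one round trip per outer tuple.

    Returns (messages, bytes): a request carrying the join value and a
    reply carrying the matches, for every outer tuple.
    """
    messages = 2 * outer_tuples
    total_bytes = outer_tuples * (key_bytes + matches_per_outer * tuple_bytes)
    return messages, total_bytes
```

Under these assumptions, "ship whole" tends to win on message count, while "fetch matches" only pays off when few inner tuples match and the outer table is small.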

18 Weights Estimation CPU: inverse of MIPS
I/O: avg seek, latency, transfer time MSG: # of instruction per msg BYTE: effective transmission speed of network
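The weight definitions above can be sketched as a small function that expresses every weight in milliseconds. The parameter names are hypothetical, and converting the per-message instruction count into time by multiplying with the CPU weight is an assumption made here for consistency of units, not something the slide states.

```python
def estimate_weights(mips: float, seek_ms: float, latency_ms: float,
                     transfer_ms: float, instr_per_msg: float,
                     bytes_per_sec: float) -> dict:
    """Derive cost-model weights (in milliseconds) from hardware figures."""
    cpu = 1.0 / (mips * 1e6) * 1e3           # ms per instruction: inverse of MIPS
    io = seek_ms + latency_ms + transfer_ms  # ms per I/O
    msg = instr_per_msg * cpu                # ms of CPU spent per message (assumed)
    byte = 1e3 / bytes_per_sec               # ms per byte at the effective speed
    return {"cpu": cpu, "io": io, "msg": msg, "byte": byte}
```

With illustrative inputs of a 1 MIPS processor, 16 ms seek, 8 ms latency, 1 ms transfer, 20,000 instructions per message, and an effective 1 MB/s network, one I/O costs 25 ms and one message costs about 20 ms, which is why message counts can dominate for small transfers.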

19 Figure 2 from paper

20 Instrumentation Distributed EXPLAIN Distributed COLLECT COUNTERS
Force optimizer

21 Experiment I Transfer method Merge-scan join of 2 tables:
500 tuples in each table Project both tables – 50% 100 different values for join attribute Join result: 2477 tuples

22 Figure 4 from paper

23 Figure 3 from paper

24 Experiment II Distributed vs local join Join of 2 tables:
1000 tuples in each table Project both tables – 50% 3000 different values for join attribute

25 Figure 5 from paper

26 Figure 6 from paper

27 Experiment III Relative importance of cost components

28 Figure 7, 8, 9, 10 from paper

29 Experiment IV Optimizer evaluation
Accurate estimates of # of msgs and bytes sent (<2% difference) Better estimates when tables are more distributed

30 Experiment V Alternative distributed join methods:
Dynamically created indexes, Semijoins, Bloomjoins. 2 tables: 1000 tuples for outer; inner varies from 100 to 6000 tuples
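The Bloomjoin idea can be sketched as follows: the outer site hashes its join-column values into a bit vector, ships only that vector to the inner site, and the inner site returns just the tuples whose join value hits a set bit. This is a simplified single-process illustration, not the R* implementation; the function names, the use of SHA-256 as the hash, and the default filter size are all assumptions.

```python
import hashlib


def _bit_position(value, num_bits: int) -> int:
    """Map a join value to one bit position (single hash function)."""
    digest = hashlib.sha256(str(value).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_bits


def build_filter(values, num_bits: int = 1024) -> bytearray:
    """Build the bit vector at the outer site (one byte per bit, for clarity)."""
    bits = bytearray(num_bits)
    for v in values:
        bits[_bit_position(v, num_bits)] = 1
    return bits


def bloomjoin(outer, inner, key: str, num_bits: int = 1024):
    """Join two lists of dict rows on `key`, filtering inner rows first.

    Returns (joined_pairs, shipped_count), where shipped_count is the
    number of inner rows that passed the filter and would cross the network.
    """
    bits = build_filter((row[key] for row in outer), num_bits)
    # At the inner site: keep rows that *might* match (false positives possible).
    shipped = [row for row in inner if bits[_bit_position(row[key], num_bits)]]
    # Back at the outer site: exact join, which discards any false positives.
    by_key = {}
    for row in shipped:
        by_key.setdefault(row[key], []).append(row)
    joined = [(o, i) for o in outer for i in by_key.get(o[key], [])]
    return joined, len(shipped)
```

A semijoin follows the same pattern but ships the exact set of outer join values instead of a bit vector, which removes false positives at the price of a larger message.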

31 Figure 11, 12 from paper

32 Other Experiments Clustered index: Bloomjoins < Semijoins < R*
50% Projection: Site 1: Bloomjoins < Semijoins < R* Site 2: Bloomjoins < R* << Semijoins Wider join column: Bloomjoins < R* << Semijoins

