© Copyright 2000 M. Rodriguez-Martinez, All Rights Reserved MOCHA : A Self-Extensible Database Middleware System for Distributed Data Sources Manuel Rodriguez-Martinez.


1 MOCHA: A Self-Extensible Database Middleware System for Distributed Data Sources
Manuel Rodriguez-Martinez, Nick Roussopoulos
© Copyright 2000 M. Rodriguez-Martinez, All Rights Reserved

2 SIGMOD 2000 M. Rodriguez-Martinez – N. Roussopoulos
Motivation
Data sources are distributed and heterogeneous: fact of life...
[Diagram labels: Client, Internet, Oracle 8i, Informix, XML Data, Text Data]

3 Client-Server Connectivity
2-tier architecture means FAT clients
Not a good idea
[Diagram labels: Client, Internet, Oracle 8i, Informix, XML Data, Text Data]

4 Middleware Integration Service
Middleware is a 3-tier connectivity solution – thin clients
[Diagram labels: Client, Internet, Integration Server, Catalog, Translator, Oracle 8i, Informix, XML Data, Text Data]

5 Problem 1: Code Deployment
User-defined types and functions
– Polygon
– Composite(): image aggregation
Porting and manual installation of code
– Operating system
– Hardware platform
Expensive software maintenance
– Updates
– Version management
Security
– Software certification

6 Problem 1: Code Deployment
Not scalable – expensive system growth
[Diagram labels: Client, Internet, Integration Server, Catalog, Translator, Oracle 8i, Informix, XML Data, Text Data]

7 Problem 2: Query Processing
Operator placement options
– Limited by site-dependent software
    Composite(): got to have it before using it!
Most processing at Integration Server
– Powerful data servers are under-utilized
    I/O nodes
– Excessive data movement over the network
    Network bottleneck
    Unfeasible in WANs, Internet

8 Problem 2: Query Processing
Not scalable – inefficient evaluation of queries
[Diagram labels: Client, Internet, Integration Server, Catalog, Translator, Oracle 8i, Informix, XML Data, Text Data; data volume label: 100MB]

9 MOCHA Solution: Ship Code!
Select location, Composite(image)
From Rasters
Where week BETWEEN t1 and t2
Group By location
[Diagram labels: Client, QPC, Code Repository, Catalog, Internet, DAP, Oracle, Informix; sites: Virginia, Maryland, Virginia, Texas]

10 MOCHA Solution: Filter Data!
Select location, Composite(image)
From Rasters
Where week BETWEEN t1 and t2
Group By location
[Diagram labels: Client, QPC, Code Repository, Catalog, Internet, DAP, Oracle, Informix; sites: Virginia, Maryland, Virginia, Texas]
[Data volume labels: 200MB tuples, 100MB tuples; results 200KB, results 150KB; results 350KB]
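As a rough worked example of the figures on this slide, the sketch below compares the volume crossing the network under the two strategies. It assumes decimal units (1 MB = 1000 KB), and it assumes that the 200 MB and 100 MB labels are the raw tuple volumes at the two data sources while 200 KB and 150 KB are the per-DAP results. The variable and function names are illustrative and are not part of MOCHA.

```python
# Worked arithmetic for the slide's figures (sizes in KB, assuming 1 MB = 1000 KB).
RAW_TUPLES_KB = [200_000, 100_000]   # assumed: 200 MB and 100 MB of tuples at the two sources
DAP_RESULTS_KB = [200, 150]          # assumed: per-DAP results after running Composite() on site


def total_kb(volumes):
    """Total volume that has to cross the network."""
    return sum(volumes)


# Data shipping: raw tuples travel to the integration server.
data_shipping_kb = total_kb(RAW_TUPLES_KB)
# Code shipping: the operator runs at the DAPs and only results travel.
code_shipping_kb = total_kb(DAP_RESULTS_KB)

reduction = 1 - code_shipping_kb / data_shipping_kb
print(f"data shipping: {data_shipping_kb} KB")
print(f"code shipping: {code_shipping_kb} KB")
print(f"network volume reduced by {reduction:.2%}")
```

Under these assumptions the combined 350 KB of results matches the total shown on the slide, against 300 MB of raw tuples.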

11 MOCHA Goals
Automatic deployment of code (self-extensible)
– QPC ships compiled Java classes
    User-defined types and functions
– XML for their metadata (easy exchange)
Data processing at data source sites
– Utilize powerful machines
    On-site data distillation
Processing based on data movement reduction
– Filter data at the data sources
– Expand data near the clients

12 The MOCHA Architecture
[Diagram labels: Client, QPC, DAP, Informix, Oracle, Code Repository, Catalog, Multi-threaded Distributed Objects, Coordination Thread, Execution Thread]

13 QPC: The Integration Server
QPC controls and coordinates query execution
[Diagram labels: Client API, Query Parser, Catalog Manager, Query Optimizer, Execution Engine, Code Loader, SQL & XML Proc. Interface, DAP Access API, XML Catalog, Code Repository, DAP]

14 DAP: The Facilitator of Data
DAP provides QPC with remote access to the data
[Diagram labels: Data Source, DAP Access API, Control Module, Execution Engine, Code Loader, SQL & XML Proc. Interface, Data Source Access Layer (JDBC, I/O API, DOM, JNI)]
[Data volume labels: 100MB tuples; results 150KB]

15 Road Map
Introduction
Problem Definition
MOCHA Architecture
Query Processing
Experiments
Summary

16 Processing The Queries
Issue 1: Placement and deployment of operators
– Which operators go to QPC, and which go to the DAPs?
Issue 2: How to determine this placement?
– Dynamic programming [SAC+79], [ML86]
– But search space is enormous
    Placement of UDF, joins, execution sites …
    Plenty of bad plans
⇒ In MOCHA: Query optimization based on heuristics
– Network usually is the critical factor: optimize for it first
– CPU and I/O are cheaper: optimize for them later
– Quickly converge to a good plan

17 Operator Placement
Data-Reducing Operators
– Filter the data
– Aggregates, predicates, projections, semi-joins
    Composite(), Overlaps(), AvgEnergy()
Push to the DAPs
– Code Shipping policy (unique to MOCHA)
– Only send back distilled results
+ Less data movement
Cost:
– Computation cost
– Transfer of filtered results

18 Operator Placement
Data-Inflating Operators
– Expand the data
– Projections, image processing, some joins …
    DoubleResolution(), RotateSolid()
Pull to the QPC
– Data Shipping policy [FJK96]
– Only send back raw arguments
+ Less data movement
Cost:
– Computation cost
– Transfer of raw argument values

19 Placement Metric: VRF
Volume Reduction Factor: given an operator $O$ and a relation $R$,
$\mathrm{VRF}(O) = \dfrac{VDT}{VDA}$
– VDT: volume of data transmitted after applying $O$ to $R$
– VDA: volume of data originally present in $R$
$O$ is Data-Reducing when $\mathrm{VRF}(O) < 1$, e.g. Composite()
$O$ is Data-Inflating when $\mathrm{VRF}(O) \geq 1$, e.g. DoubleRes()
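The definition above can be sketched in a few lines. This is a minimal illustration, assuming the placement rule from the preceding slides (data-reducing operators are pushed to the DAP, everything else runs at the QPC). The function names and the byte volumes are hypothetical and are not taken from the MOCHA implementation.

```python
def vrf(vdt_bytes: float, vda_bytes: float) -> float:
    """Volume Reduction Factor: data transmitted after applying the operator,
    divided by the data originally present in the relation."""
    if vda_bytes <= 0:
        raise ValueError("VDA must be positive")
    return vdt_bytes / vda_bytes


def placement(vrf_value: float) -> str:
    """Data-reducing operators (VRF < 1) go to the DAP; the rest run at the QPC."""
    return "DAP" if vrf_value < 1 else "QPC"


# Hypothetical volumes, chosen only to illustrate the two cases.
composite_vrf = vrf(vdt_bytes=350e3, vda_bytes=300e6)    # an aggregate shrinks the data
double_res_vrf = vrf(vdt_bytes=400e6, vda_bytes=100e6)   # doubling resolution inflates it

print("Composite():", placement(composite_vrf))
print("DoubleRes():", placement(double_res_vrf))
```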

20 Goal: Plans with small CVRF
Cumulative Volume Reduction Factor: given a plan $P$ to solve query $Q$ over relations $R_1, \dots, R_n$,
$\mathrm{CVRF}(P) = \dfrac{CVDT}{CVDA}$
– CVDT: volume of data transmitted by applying all operators in $P$ to $R_1, \dots, R_n$
– CVDA: volume of data originally present in $R_1, \dots, R_n$
Optimizer searches for plans that move a minimal amount of data.
Search space: $\mathrm{CVRF}(\mathrm{Plan}) \in [0, 1]$
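A small sketch of how CVRF could be used to rank candidate plans follows. It assumes each plan is summarized as one (original volume, transmitted volume) pair per relation. The plan names and byte counts are hypothetical, and the real optimizer's search procedure is not shown here.

```python
def cvrf(plan):
    """Cumulative Volume Reduction Factor for a plan.

    plan: list of (vda_bytes, vdt_bytes) pairs, one per relation R1..Rn,
    where vda is the data originally present and vdt the data transmitted.
    """
    cvda = sum(vda for vda, _ in plan)
    cvdt = sum(vdt for _, vdt in plan)
    return cvdt / cvda


# Two hypothetical plans over the same two relations (200 MB and 100 MB).
plans = {
    "all operators at QPC": [(200e6, 200e6), (100e6, 100e6)],  # raw tuples shipped
    "aggregates at DAPs": [(200e6, 200e3), (100e6, 150e3)],    # only results shipped
}

# Pick the plan that moves the least data relative to what is stored.
best = min(plans, key=lambda name: cvrf(plans[name]))
for name, plan in plans.items():
    print(f"{name}: CVRF = {cvrf(plan):.4f}")
print("chosen plan:", best)
```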

21 Performance Evaluation
Goals of this study:
– Measure how good code shipping can be
– Validate heuristics being proposed
    VRF
    CVRF
– Guide implementation of the optimizer
Configured MOCHA with plans that place operators based on heuristics.

22 Experimental Environment
Sequoia 2000 Benchmark
– Scientific data: points, polygons, satellite images
– Distributed applications
Software and Hardware:
– JDK 1.2
– QPC: Sun Ultra 60, Solaris 2.6
– DAPs: Sun Ultra 1, Sun Ultra 5, Solaris 2.6
– Data Sources
    2 Informix IUS 9.12 servers
– 10 Mbps Ethernet

23 Reducing vs. Inflating
[Chart: Running time (secs) by query class (Q1, Q2, Q3), QPC vs. DAP]
Query classes
– Composite of all images
– Clipping and sub-setting
– Double resolution of images
Performance gains
– Composites
    99% data reduction
    4:1 better performance
– Clipping and expansion
    80% data reduction
    3:1 better performance
Validates heuristics

24 VRF vs Selectivity
Select graph identifiers based on number of vertices and arc length
Selectivity [HS93] and cardinality [HKWY97] are not enough for distributed predicate placement
Need to also consider size of arguments for predicates!
Consider 50% selectivity
– DAP CVRF = 0.01
– QPC CVRF = 1
[Chart: Running time (secs) vs. selectivity, QPC vs. DAP]
VRF is a better metric
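To see why selectivity alone can mislead, the sketch below computes CVRF for a single predicate whose arguments are much larger than its result. The tuple sizes (400-byte graph argument, 8-byte identifier) are hypothetical, picked only so that the numbers reproduce the slide's 0.01 and 1 at 50% selectivity. The function name is illustrative.

```python
def predicate_cvrf(selectivity: float, arg_bytes: float,
                   result_bytes: float, run_at_dap: bool) -> float:
    """CVRF per stored tuple for one predicate.

    arg_bytes:    size of the predicate's argument stored at the source (assumed)
    result_bytes: size of what is returned for a qualifying tuple (assumed)
    """
    if run_at_dap:
        # Predicate runs at the source: only qualifying results cross the network.
        transmitted = selectivity * result_bytes
    else:
        # Predicate runs at the QPC: every argument must be shipped first.
        transmitted = arg_bytes
    return transmitted / arg_bytes


# Same 50% selectivity, very different data movement.
dap = predicate_cvrf(0.5, arg_bytes=400, result_bytes=8, run_at_dap=True)
qpc = predicate_cvrf(0.5, arg_bytes=400, result_bytes=8, run_at_dap=False)
print(f"DAP CVRF = {dap:.2f}, QPC CVRF = {qpc:.2f}")
```

The selectivity is identical in both calls; what changes the outcome is the size of the arguments relative to the size of the results.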

25 Implementation Status
Operational system
– SIGMOD 2000 Demo
Experimental deployment of MOCHA
– NASA Earth Scientists (ESIP Federation)
– Goddard Space Flight Center
– NCSA Land Cover Visualization Tool

26 Summary and Conclusions
Proposed a new middleware architecture: MOCHA
– Automatic code deployment (self-extensible)
    Shipping Java classes
– Query processing based on data movement reduction
Proposed VRF metric for placement of functions
– Better than selectivity and result cardinality
Future work
– Deployment of MOCHA for NASA ESIP Federation
– Full implementation of MOCHA Optimizer
More info:
– http://mocha.umiacs.umd.edu/

27 Problem 2: Query Processing
Not scalable – inefficient evaluation of queries
[Diagram labels: Client, Internet, Integration Server, Catalog, Translator, Oracle 8i, Informix, XML Data, Text Data; data volume labels: 100MB, 200MB]

