
1 Black-box Determination of Cost Models’ Parameters for Federated Stream-Processing Systems
Michael Daum, Frank Lauterwald, Philipp Baumgärtel, Niko Pollner, Klaus Meyer-Wegener
Chair for Computer Science 6 (Data Management), Friedrich-Alexander-University of Erlangen-Nuremberg
IDEAS 2011, 2011-09-23

2 Agenda
- Problem Statement
- Calibration of Cost Models
- Function Approximation
- Estimating the Costs of Single Operators
- Evaluation
- Summary
- Perspective: Cost Estimation for Federated DSMS

3 Problem Statement
- DSAM: heterogeneous distributed data stream processing
- Automatic cost-based query distribution
- Problem: hardware- and DSMS-specific cost models are needed

4 Things we know a priori
- Operator graph
  - Topology
- Stream characteristics
  - Data rates
  - Selectivity
  - Distribution of certain values
- For some operators: cost model → calibration of cost models

5 Things we do not know a priori
- Hardware- and DSMS-specific parameters of cost models
- System costs
- For some operators: cost model → function approximation

6 Calibration of Cost Models - Parameter Estimation
- Cost model consists of
  - Stream- and operator-dependent parameters
  - Constant values
  - Hardware-/system-/implementation-dependent values
- Test queries and input streams
  - Different values for the stream- and operator-dependent parameters
- Cost measurements
- Least squares
- Outlier detection (e.g. RANSAC)
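The calibration step on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes a hypothetical linear cost model `cost = c0 + c1 * rate`, where `rate` is the stream-dependent parameter varied by the test queries and `c0`, `c1` are the system-dependent values to be estimated. The function names, the residual threshold and the synthetic measurements are all assumptions, and the RANSAC loop is deliberately simplified.

```python
import numpy as np

def fit_least_squares(x, y):
    """Fit y ~ c0 + c1 * x by ordinary least squares."""
    A = np.column_stack([np.ones_like(x), x])
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs  # array([c0, c1])

def ransac_fit(x, y, n_iter=200, threshold=1.0, seed=0):
    """Simplified RANSAC: fit random minimal samples, keep the largest consensus set."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(x), dtype=bool)
    for _ in range(n_iter):
        idx = rng.choice(len(x), size=2, replace=False)
        if x[idx[0]] == x[idx[1]]:
            continue  # a vertical line cannot be a cost model
        c0, c1 = fit_least_squares(x[idx], y[idx])
        inliers = np.abs(y - (c0 + c1 * x)) < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Final least-squares fit on the consensus set only
    return fit_least_squares(x[best_inliers], y[best_inliers]), best_inliers

# Synthetic measurements (assumed values): cost grows linearly with the input rate
rates = np.linspace(100.0, 1000.0, 20)
costs = 5.0 + 0.02 * rates
costs[[3, 11]] += 15.0  # two outliers, e.g. measurement peaks

(c0, c1), inliers = ransac_fit(rates, costs)
```

Fitting only on the consensus set keeps isolated peaks in the measurements from distorting the estimated parameters.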

7 Function Approximation – Nonparametric Models
- No appropriate cost model
  - Operator without existing cost model
  - Existing cost models could not be fitted to a specific system
- Solution: function approximation
  - Radial Basis Function Network (RBFN)
  - Function approximation instead of interpolation
    - Fewer centers than input points
    - Moore-Penrose pseudoinverse → least squares solution
- Improving the function approximation
  - Iterative approach
    1. Naive function approximation
    2. Improving areas of interest (e.g. discontinuities, high gradient)
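A radial basis function network fitted with the pseudoinverse might look like the sketch below. It assumes Gaussian basis functions, evenly spaced centers and a fixed width, none of which are stated in the talk. The measurement data are synthetic and include a step, in the spirit of the "steps" observed in the evaluation. Only the naive first approximation is shown; the iterative refinement of areas of interest is not.

```python
import numpy as np

def rbf_design_matrix(x, centers, width):
    """Gaussian basis functions: one row per input point, one column per center."""
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2.0 * width ** 2))

def fit_rbfn(x, y, centers, width):
    """Least-squares output weights via the Moore-Penrose pseudoinverse."""
    phi = rbf_design_matrix(x, centers, width)
    return np.linalg.pinv(phi) @ y

def predict_rbfn(x, centers, width, weights):
    return rbf_design_matrix(x, centers, width) @ weights

# Synthetic cost measurements (assumed): a linear trend with a step at x = 5
x = np.linspace(0.0, 10.0, 50)
y = np.where(x < 5.0, 1.0, 3.0) + 0.1 * x

# Fewer centers (12) than input points (50): approximation, not interpolation
centers = np.linspace(0.0, 10.0, 12)
width = 1.0
weights = fit_rbfn(x, y, centers, width)
y_hat = predict_rbfn(x, centers, width, weights)
rmse = float(np.sqrt(np.mean((y - y_hat) ** 2)))
```

With as many centers as input points the network would interpolate the measurements, noise included. Using fewer centers makes the system overdetermined, so the pseudoinverse yields the least-squares weights.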

8 Estimating the Costs of Single Operators
- Assumptions
  - Only the system costs can be measured
  - The costs of a single operator are independent of other operators → additivity
- System costs depend linearly on the number of operators
  - Parallel instances of the same operator
- Latency
  - Parallel operators → latency does not depend on the number of operators
  - Operators have to be connected in series
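Under the additivity assumption, the cost of a single operator can be read off as the slope of a line fitted to system-wide measurements taken with different numbers of parallel instances. The sketch below uses made-up CPU measurements; the numbers and variable names are assumptions for illustration only.

```python
import numpy as np

# Hypothetical measurements: system-wide CPU usage (percent) while running
# n parallel instances of the same operator
n_operators = np.array([10, 20, 40, 60, 80, 100], dtype=float)
cpu_usage = np.array([3.1, 4.0, 6.1, 8.0, 9.9, 12.1])

# Straight-line fit: cpu_usage ~ intercept + slope * n_operators
slope, intercept = np.polyfit(n_operators, cpu_usage, deg=1)

cost_per_operator = slope     # CPU cost attributed to one operator instance
base_system_cost = intercept  # cost of the system with no operators deployed
```

The intercept absorbs the base load of the system, which is why the system costs alone are enough to isolate the per-operator share.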

9 Evaluation
- Coral8
- Test setting
  - Synthetic input streams with constant properties (rate, attribute value distribution)
  - Every test query runs for two minutes
  - The test data collected in the first minute is discarded
- Measured values
  - Latency
  - Memory consumption (resident set size)
  - CPU usage
- Coral8 status stream
  - Input and output rate
  - Query latency
  - Application memory

10 Coral8 Measurements
- Filter operator
  - Application memory
  - CPU usage
- Unexpected behavior: steps and peaks

11 Costs of Single Operators
- CPU usage depends linearly on the number of operators
- Slope equals the costs of a single operator
[Plot; x-axis: Operators]

12 Model Calibration and RBFN
- Application memory of the aggregate operator
- Left side: calibrated cost model
  - Linear cost model
- Right side: function approximation
  - Adapts to the steps

13 Cost Estimation for Operator Graphs
- Operator graph consisting of 100 parallel filter operators
- Cost estimation using function approximation

14 Summary
- Cost estimation for black-box systems without cost estimators
  - Calibration of a cost model
    - Default cost model
    - System-specific cost model
  - Function approximation
- Calibration of a cost model for unknown systems
  - Behavior conforming to cost model is required
  - Nonconforming behavior can be detected (automatically) after some measurements
- Evaluation
  - CPU usage and memory consumption can be estimated
  - Latency: queuing theory

15 Application: Cost Estimation for Federated DSMS
- Cost formulas as metadata
  - Cost formulas containing constants, variables and parameters
- Cost estimation
  - Hardware-dependent and system-dependent parameters loaded from metadata catalog
  - Operator-dependent variables by a metadata provider
  - Stream-dependent variables by a monitoring component or an estimator
  - Interpreter to calculate costs
- Advantages
  - Both default and system-specific cost formulas possible
  - Cost models interchangeable at runtime
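One way to picture "cost formulas as metadata" is a formula stored as a string and evaluated by a small interpreter once parameters and variables are bound. The sketch below is an assumption-laden illustration: the catalog layout, the key `("coral8", "filter", "cpu")`, the formula and all numeric values are hypothetical, and the talk does not describe the actual formula language.

```python
import ast
import operator

_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}

def evaluate_cost_formula(formula, bindings):
    """Evaluate an arithmetic formula; names are looked up in `bindings`."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            return bindings[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        # Anything else (calls, attributes, ...) is rejected
        raise ValueError(f"unsupported expression element: {ast.dump(node)}")
    return _eval(ast.parse(formula, mode="eval"))

# Hypothetical metadata catalog: formula plus calibrated system parameters
catalog = {
    ("coral8", "filter", "cpu"): {
        "formula": "c_base + c_tuple * rate * selectivity",
        "parameters": {"c_base": 0.5, "c_tuple": 0.002},
    },
}

entry = catalog[("coral8", "filter", "cpu")]
# Variables as they might come from a metadata provider or a monitoring component
variables = {"rate": 1000.0, "selectivity": 0.25}
cost = evaluate_cost_formula(entry["formula"], {**entry["parameters"], **variables})
```

Because the formula is plain data, replacing the catalog entry swaps the cost model without touching the interpreter, which matches the "interchangeable at runtime" advantage listed on the slide.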

16 Any questions…?

17 Generating Test Data and Test Queries
- Identifying parameters
- Cost model based
  - Identifying query- or stream-dependent parameters
  - Generating a set of test data for the parameters
  - Mapping the parameters to the query language and stream properties
- Operator or query language based
  - No existing cost model
  - Function approximation
  - Identifying important parameters based on the query language and possible stream properties
  - Generating a set of test data
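The cost-model-based path could be sketched as a parameter grid that is expanded into test cases and then mapped to queries. Everything concrete here is an assumption: the parameter names, the grid values, the uniform attribute distribution used to turn a selectivity into a threshold, and the generic SQL-like query template, which is a placeholder and not the query language of any particular DSMS.

```python
from itertools import product

# Hypothetical parameters of a filter cost model and the values to test
parameter_grid = {
    "rate": [100, 500, 1000],        # tuples per second of the input stream
    "selectivity": [0.1, 0.5, 0.9],  # fraction of tuples passing the filter
}

def generate_test_cases(grid):
    """Cartesian product of all parameter values, one dict per test case."""
    names = sorted(grid)
    return [dict(zip(names, values)) for values in product(*(grid[n] for n in names))]

def to_filter_query(case, value_range=1000):
    """Map a selectivity to a threshold, assuming values uniform on [0, value_range)."""
    threshold = round(case["selectivity"] * value_range)
    return f"SELECT * FROM input_stream WHERE value < {threshold}"

test_cases = generate_test_cases(parameter_grid)
queries = [to_filter_query(case) for case in test_cases]
```

The `rate` parameter is not part of the query text; it would be applied when generating the synthetic input stream for each test case.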

18 Problem statement
[Figure: Global Query Graph with operators Op1 to Op6, input streams Stream1 and Stream2 and output Out, distributed over Node 1, Node 2 and Node 3 (Distributed Query Processing). Streams are annotated with "Data Rate, Density, Statistics"; "???" marks the inner streams.]
- Relevant metadata about inner streams unknown
Michael Daum, Frank Lauterwald, Philipp Baumgärtel, Klaus Meyer-Wegener, SSDBM 2010

19 Propagation of Densities
- Propagation of input streams' statistics
  - Propagation of statistics for inner streams between operators
  - Propagation of statistics for output streams
- Statistical objective: attribute value distribution (density)
- Analytic operator model
  - Accurate formulas
- Numerical operator model
  - Discrete mappings
  - Training of mapping relation
[Figure: an operator between input stream and output stream; an operator model (analytic or numerical) maps the data rate, density and statistics of the input to those of the output.]
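As a rough illustration of propagating statistics through one operator, the sketch below pushes a data rate and a histogram-based density through a filter with the predicate `value < threshold`. It is an assumed toy model, not the operator models from the SSDBM 2010 work: the histogram representation, the uniform-within-bin assumption and the function name are all hypothetical.

```python
import numpy as np

def propagate_filter(rate_in, bin_edges, density_in, threshold):
    """Return output rate, output density and selectivity for `value < threshold`."""
    widths = np.diff(bin_edges)
    # Fraction of each bin that satisfies the predicate (values uniform within a bin)
    passed = np.clip((threshold - bin_edges[:-1]) / widths, 0.0, 1.0)
    mass_out = density_in * widths * passed
    selectivity = float(mass_out.sum())
    if selectivity > 0:
        density_out = mass_out / (selectivity * widths)  # renormalize to integrate to 1
    else:
        density_out = np.zeros_like(density_in)
    return rate_in * selectivity, density_out, selectivity

# Input stream: 1000 tuples/s, attribute uniformly distributed on [0, 100)
bin_edges = np.array([0.0, 25.0, 50.0, 75.0, 100.0])
density_in = np.full(4, 0.01)

rate_out, density_out, selectivity = propagate_filter(1000.0, bin_edges, density_in, 50.0)
```

The output rate and density can then serve as the input statistics of the next operator in the graph, which is how statistics for inner streams would be obtained.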

