
1 Cost and Accuracy Sensitive Dynamic Workflow Composition over Grid Environments
David Chiu and Gagan Agrawal
Data Grid Research Group, Dept. of Computer Science and Engineering, The Ohio State University

2 Outline of Presentation (Grid 2008, Tsukuba, Japan)
- Background and Motivation
- System Overview
- Workflow Enumeration
- Experimental Results
- Conclusion

3 Heterogeneous Data Gathering (Background)
- Geospatial datasets are collected by a variety of instruments
- Datasets are mostly stored in their original low-level format across networks
- More measuring devices are being deployed
- Measuring instruments are becoming more advanced

7 The Current Availability of Tools (Motivation)
Tools which provide access and manipulation methods to users are already available:
- NOAA, GLERL, USACE, ...
This is an abstraction!

8 Complexity of Querying: So, what goes on behind the scenes?
Queries are planned and scheduled by the user. Example query: “return water level of (x,y) on mm/dd/yy at hh:mm”
- Service Identification Phase: “what available tools can I use to get ‘water level’?”
- Data Identification Phase: “what available data can I use as input to these services?”
- Execution Phase: “In what order must I obtain data and execute services?”

9 Example Workflow Plan 1: Using water level gauge stations
“return water level of (x,y) on mm/dd/yy at hh:mm”
1. Identify the K closest gauge stations to (x,y)
2. Obtain water level data for mm/dd/yyyy
3. Extract and interpolate the readings for hh:mm within the n-minute intervals
4. Interpolate the readings of the K gauge stations at (x,y)

10 Example Workflow Plan 2: Using water surface model
“return water level of (x,y) on mm/dd/yy at hh:mm”
1. Use the surface model to predict the water level at (x,y)
2. Obtain water level data for mm/dd/yyyy and extract the readings for (x, y, time)

11 Pitfalls
- The user deals with multiple interfaces for data retrieval and services
  - Domain specific; cryptic to naive users
- The user deals with data movement
  - Costly, time consuming
  - Prone to mistakes
- The user may not be aware of other possibilities
  - Newer instruments produce more precise results
  - Other ways of obtaining the same information at different accuracy levels

12 Queries as Workflows
We want a system that can automate the query planning process and that can also meet user constraints!
- Construction and enumeration of all possibilities!
- Which plan is the best with respect to time? With respect to accuracy?

13 Workflow Definition
Workflows are also equivalent to DAGs.

14 System Overview

16 Need for Domain Level Semantics
Assume the following service retrieves a satellite image pertaining to (x,y) with resolution respective to r:
get_sat_image(double x, double y, double r)
Questions to ask the system:
- How to autonomously deduce that this service can be used?
- How to determine what information is needed?
- Did the user provide enough information to invoke this service?
From the perspective of the service parameters: get_sat_image needsInput longitude, latitude, and grid_size, and outputsTo satellite image.
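The three questions above can be sketched as lookups over a small index. This is a minimal illustration, not the authors' ontology schema: the dictionaries `NEEDS_INPUT` and `OUTPUTS_TO` and both helper functions are hypothetical names, and only `get_sat_image` and its three concepts come from the slide.

```python
# Hypothetical, hand-written slice of a domain ontology (not the authors' schema).
# needsInput: service -> domain concepts it requires as parameters.
NEEDS_INPUT = {"get_sat_image": ["longitude", "latitude", "grid_size"]}
# outputsTo: service -> domain concept it derives.
OUTPUTS_TO = {"get_sat_image": "satellite image"}


def services_deriving(concept):
    """Services whose output is the requested domain concept."""
    return [svc for svc, out in OUTPUTS_TO.items() if out == concept]


def missing_inputs(service, provided):
    """Concepts the service needs that the user did not supply."""
    return [c for c in NEEDS_INPUT[service] if c not in provided]


# A query that gives coordinates but no grid size.
query = {"longitude": 482593.0, "latitude": 4628522.0}
candidates = services_deriving("satellite image")
gaps = missing_inputs("get_sat_image", query)
print(candidates, gaps)
```

An empty result from `missing_inputs` would mean the user supplied enough information to invoke the service directly.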

17 System Ontology: Applying Domain Information
- Domain concepts are at the core of the ontology, e.g., in the geospatial domain: coordinates, water level, terrain
- Services and data sets known to derive a certain concept are also indexed
- In addition, services may require parameters pertaining to specific concepts

18 System Overview

19 Query Parser: Overall Goal
We use Stanford NLP to process natural language.
Goal: from the user query, extract
- the target domain concept
- auxiliary concepts instantiated with given values
“return satellite image of (xx,yy) with gridsize of 3m”
Domain Concept | Value
satellite image | (query target)
longitude | xx
latitude | yy
gridsize | 3m

20 Concept Substantiation
“return satellite image of (xx,yy) with gridsize of 3m”
Starting from the direct object:
- Merge modifiers
- Form the query's target concept
In addition, we implemented a domain-level parser to deal with domain-specific patterns such as (x,y), hh:mm:ss, etc.
Finally, the value of 3m is substantiated.
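A domain-level pattern parser of the kind mentioned above could be approximated with regular expressions. The sketch below is an assumption about how such patterns might be recognized, not the authors' parser: the pattern names and `extract_domain_values` are hypothetical, and mapping x to longitude and y to latitude is inferred from the service calls shown later in the deck.

```python
import re

# Hypothetical patterns for domain-specific tokens.
COORD = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)")
DATE = re.compile(r"\b(\d{2}/\d{2}/\d{4})\b")
TIME = re.compile(r"\b(\d{2}:\d{2}(?::\d{2})?)\b")


def extract_domain_values(query):
    """Map recognized patterns in the query text to domain concepts."""
    found = {}
    match = COORD.search(query)
    if match:
        found["longitude"] = float(match.group(1))
        found["latitude"] = float(match.group(2))
    match = DATE.search(query)
    if match:
        found["date"] = match.group(1)
    match = TIME.search(query)
    if match:
        found["time"] = match.group(1)
    return found


values = extract_domain_values(
    "return water level at (482593, 4628522) on 01/30/2008 at 00:06"
)
print(values)
```

Keeping these patterns outside the general natural-language parse lets tokens such as `(x,y)` be handled without relying on a grammar that was not designed for them.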

21 System Overview

22 Workflow Enumeration
Enumeration is essentially a depth-first traversal of the ontology with intermediate pruning.
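The depth-first traversal can be sketched over a toy ontology. Everything here is illustrative: the `DERIVED_BY` and `NEEDS` tables, the `S:`/`D:` prefixes, and the data set name are assumptions, the ontology is assumed to be acyclic, and the only pruning shown is dropping a service when one of its inputs cannot be derived.

```python
from itertools import product

# Hypothetical ontology slice: concept -> sources known to derive it.
# "S:" marks a service, "D:" marks a data set.
DERIVED_BY = {
    "water level": ["S:getWL", "S:getWLfromModel"],
    "station list": ["S:getKNearestStations"],
    "gauge catalog": ["D:gauges.dat"],
}
# Service -> concepts it requires as input.
NEEDS = {
    "S:getWL": ["coordinates", "station list"],
    "S:getKNearestStations": ["coordinates", "gauge catalog"],
    "S:getWLfromModel": ["coordinates"],
}


def enumerate_workflows(concept, given):
    """Depth-first enumeration of every workflow that derives `concept`."""
    if concept in given:  # the query already supplies this value
        return [("value", concept)]
    plans = []
    for source in DERIVED_BY.get(concept, []):
        if source.startswith("D:"):
            plans.append(("data", source))
            continue
        # One list of alternatives per required input.
        options = [enumerate_workflows(c, given) for c in NEEDS[source]]
        if all(options):  # prune: some input has no derivation
            for combo in product(*options):
                plans.append(("service", source, combo))
    return plans


plans = enumerate_workflows("water level", {"coordinates"})
for plan in plans:
    print(plan)
```

With coordinates supplied, this yields one gauge-station plan and one model-based plan, mirroring the two example plans from earlier in the talk.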

23 System Overview

24 Workflow Planner
Input:
Domain Concept | Value
satellite image | (query target)
longitude | xx
latitude | yy
gridsize | 3m
Workflow construction outputs an enumeration of workflow candidates for execution.

25 System Overview

26 Workflow Cost
Which candidate to execute?
[Figure: the workflow system enumerates workflow candidates]

27 Estimating Time Costs
To exemplify, start with the execution and transmission time of service $S_1$:
$$T(S_1,\ldots) = t_x(S_1) + t_{net}(S_1) + \ldots$$
Terms: $t_x$ is the execution time of a service; $t_{net}$ is the network transmission time.

28 Estimating Time Costs
To exemplify: $S_2$ and $S_3$ can be fetched concurrently.
$$T(S_1,\ldots) = t_x(S_1) + t_{net}(S_1) + \max\big(T(S_2,\ldots),\; T(S_3,\ldots)\big)$$
Terms: $t_x$ is the execution time of a service; $t_{net}$ is the network transmission time.

29 Estimating Time Costs
To exemplify: recursively account for $S_2$.
$$T(S_1,\ldots) = t_x(S_1) + t_{net}(S_1) + \max\Big( \big(t_x(S_2) + t_{net}(S_2) + \max(T(D_1),\; T(S_4,\ldots),\; T(D_2))\big),\; T(S_3,\ldots) \Big)$$
Terms: $t_x$ is the execution time of a service; $t_{net}$ is the network transmission time.

30 Estimating Time Costs
To exemplify: data elements only account for transmission time.
$$T(S_1,\ldots) = t_x(S_1) + t_{net}(S_1) + \max\Big( \big(t_x(S_2) + t_{net}(S_2) + \max(t_{net}(D_1),\; T(S_4,\ldots),\; t_{net}(D_2))\big),\; T(S_3,\ldots) \Big)$$
Terms: $t_x$ is the execution time of a service; $t_{net}$ is the network transmission time.

31 Estimating Time Costs
To exemplify: reduce $S_4$.
$$T(S_1,\ldots) = t_x(S_1) + t_{net}(S_1) + \max\Big( \big(t_x(S_2) + t_{net}(S_2) + \max(t_{net}(D_1),\; (t_x(S_4) + t_{net}(S_4) + \max(T(D_3))),\; t_{net}(D_2))\big),\; T(S_3,\ldots) \Big)$$
Terms: $t_x$ is the execution time of a service; $t_{net}$ is the network transmission time.

32 Estimating Time Costs
To exemplify: reduce $D_3$.
$$T(S_1,\ldots) = t_x(S_1) + t_{net}(S_1) + \max\Big( \big(t_x(S_2) + t_{net}(S_2) + \max(t_{net}(D_1),\; (t_x(S_4) + t_{net}(S_4) + \max(t_{net}(D_3))),\; t_{net}(D_2))\big),\; T(S_3,\ldots) \Big)$$
Terms: $t_x$ is the execution time of a service; $t_{net}$ is the network transmission time.

33 Estimating Time Costs
To exemplify: reduce $S_3$.
$$T(S_1,\ldots) = t_x(S_1) + t_{net}(S_1) + \max\Big( \big(t_x(S_2) + t_{net}(S_2) + \max(t_{net}(D_1),\; (t_x(S_4) + t_{net}(S_4) + \max(t_{net}(D_3))),\; t_{net}(D_2))\big),\; \big(t_x(S_3) + t_{net}(S_3) + t_{net}(D_3)\big) \Big)$$
Terms: $t_x$ is the execution time of a service; $t_{net}$ is the network transmission time.

34 Estimating Time Costs
The recursive sum is clear:
$$T(S_1,\ldots) = t_x(S_1) + t_{net}(S_1) + \max\Big( \big(t_x(S_2) + t_{net}(S_2) + \max(t_{net}(D_1),\; (t_x(S_4) + t_{net}(S_4) + \max(t_{net}(D_3))),\; t_{net}(D_2))\big),\; \big(t_x(S_3) + t_{net}(S_3) + t_{net}(D_3)\big) \Big)$$
Terms: $t_x$ is the execution time of a service; $t_{net}$ is the network transmission time.
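The recursion can be written directly as a short function. This is a sketch under stated assumptions: the `Data` and `Service` classes and `time_cost` are hypothetical names, the workflow shape follows the S1..S4 / D1..D3 example, and every numeric cost is made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Data:
    name: str
    t_net: float  # data elements only account for transmission time


@dataclass
class Service:
    name: str
    t_x: float    # execution time
    t_net: float  # time to transmit the service's output
    inputs: list = field(default_factory=list)


def time_cost(node):
    """T(node): own cost plus the slowest input, since inputs are fetched concurrently."""
    if isinstance(node, Data):
        return node.t_net
    slowest = max((time_cost(child) for child in node.inputs), default=0.0)
    return node.t_x + node.t_net + slowest


# Made-up costs on the example workflow.
d1, d2, d3 = Data("D1", 0.5), Data("D2", 0.75), Data("D3", 1.0)
s4 = Service("S4", t_x=2.0, t_net=0.5, inputs=[d3])
s3 = Service("S3", t_x=1.0, t_net=0.25, inputs=[d3])
s2 = Service("S2", t_x=1.5, t_net=0.25, inputs=[d1, s4, d2])
s1 = Service("S1", t_x=3.0, t_net=0.5, inputs=[s2, s3])

print(time_cost(s1))  # 3.0 + 0.5 + max(5.25, 2.25) = 8.75
```

Taking the maximum over inputs is what encodes the concurrency assumption; summing them instead would model strictly sequential fetching.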

35 How Do We Predict the Terms?
- Service execution time ($t_x$)
  - A model for each service is trained beforehand with inputs of various sizes
  - Modeled using multi-linear regression, minimizing the mean squared error (MSE)
- Data output size ($d_{size}$)
  - Known for files. Again, models are trained for services
- Network transmission time ($t_{net}$)
  - Bandwidth between nodes is typically known
  - Trivially, $t_{net} = d_{size} / \text{bandwidth}$

36 Estimating Error Costs
The recursive sum is similar for error propagation. The errors attributed to services and data are implemented by domain scientists.

37 Domain Specific Errors
Recall Workflow Plan 1 for water level extraction?
- Find the nearest K water gauges and interpolate at (x,y)
To interpolate: [equation not preserved in transcript]
- where $Z_i$ are the water level readings at station $i$
- $d_i$ are the distances from station $i$ to the queried point
The error associated with this method is computed as: [equation not preserved in transcript]
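The interpolation formula is missing from this transcript. One common scheme that uses exactly these quantities (readings $Z_i$ weighted by distances $d_i$) is inverse-distance weighting; the sketch below assumes that scheme and may differ from what the authors used. The function name and the `power` parameter are hypothetical, and the error formula is not reproduced because it cannot be recovered.

```python
def idw_interpolate(readings, distances, power=1.0):
    """Inverse-distance-weighted estimate at the queried point.

    readings[i] is Z_i, the water level at station i; distances[i] is d_i,
    the distance from station i to the queried point. ASSUMPTION: weights
    are 1 / d_i**power, which may not match the original slide.
    """
    if not readings or len(readings) != len(distances):
        raise ValueError("need one distance per reading")
    for z, d in zip(readings, distances):
        if d == 0:  # the queried point sits on a station
            return z
    weights = [1.0 / d ** power for d in distances]
    weighted_sum = sum(w * z for w, z in zip(weights, readings))
    return weighted_sum / sum(weights)


# Two equidistant stations: the estimate is the plain average.
print(idw_interpolate([10.0, 20.0], [1.0, 1.0]))
```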

38 Cost Modeling
http://monkey.cs.kent.edu/~dchiu/GeoServices/samplingDEM.php?wsdl
<model type="errorModel" output="error" input="input.SIZE, rate.VALUE" equation="(1.0 / rate.VALUE) * input.SIZE" />
<model type="execTimeModel" output="time" input="input.SIZE, rate.VALUE" equation="1.48092163412359E-7 * input.SIZE + 41.14298326806695 * rate.VALUE + -8.071895317928808" />
<model type="outputModel" output="datasize" input="input.SIZE, rate.VALUE" equation="input.SIZE * rate.VALUE" />
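The three model equations can be transcribed into plain functions to see how they behave. The coefficients are copied from the slide; the function names, the example input size, and the bandwidth value are assumptions, and the slide does not state the units. The transmission-time helper uses the $d_{size}$ / bandwidth rule stated earlier in the talk.

```python
def error_model(input_size, rate):
    """errorModel: (1.0 / rate.VALUE) * input.SIZE"""
    return (1.0 / rate) * input_size


def exec_time_model(input_size, rate):
    """execTimeModel: the fitted linear model from the slide."""
    return (1.48092163412359e-7 * input_size
            + 41.14298326806695 * rate
            + -8.071895317928808)


def output_size_model(input_size, rate):
    """outputModel: input.SIZE * rate.VALUE"""
    return input_size * rate


def transmission_time(d_size, bandwidth):
    """t_net = d_size / bandwidth (both in consistent, assumed units)."""
    return d_size / bandwidth


# Hypothetical invocation: input of size 1e8 sampled at rate 0.5.
size, rate = 100_000_000, 0.5
predicted_time = exec_time_model(size, rate)
predicted_output = output_size_model(size, rate)
predicted_net = transmission_time(predicted_output, bandwidth=10_000_000)
print(predicted_time, predicted_output, predicted_net)
```

Because the execution-time model is a fitted line with a negative intercept, it can return negative values for small inputs, so a caller may want to clamp the prediction at zero.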

39 Back to the Water Level Example
“return water level at (482593, 4628522) on 01/30/2008 at 00:06”
Workflow Plan 1: K-Interpolation
[t_total=3.5001 t_x=1 t_d=0 o=47889 e=0.004] SRVC.getWL(
  X=482593
  Y=4628522
  StnID= [t_total=2.5 t_x=0.5 t_d=0 o=0 e=0.004] SRVC.getKNearestStations(
    Longitude=482593
    Latitude=4628522
    ListOfStations= [t_total=2 t_x=2 t_d=0 o=47889 e=0] SRVC.GetGSListGreatLakes()
    RadiusKM=100
    K=3 )
  time=00:06
  date=01/30/2008 )
Workflow Execution Time = 3.251
Workflow Error = 0.004
Workflow Plan 2: Model Extraction
[t_total=2 t_x=2 t_d=0 o=47889 e=2.4997] SRVC.getWLfromModel(
  X=482593
  Y=4628522
  time=00:06
  date=01/30/2008 )
Workflow Execution Time = 1.674
Workflow Error = 2.4997

40 Pruning Intermediate Workflows: The Apriori Approach
- Presumably, both E() and T() are nondecreasing functions
  - That is, as a workflow gains depth, these values either remain constant or increase
- Since workflows are built bottom-up, candidates that do not meet user constraints can be pruned immediately
- Space efficient, faster convergence
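The pruning idea can be sketched for the time bound alone; an error bound would follow the same pattern. This is an illustration, not the authors' algorithm: the tuple layout, `bounded_time`, and all costs are hypothetical, with only the service names borrowed from the water level example.

```python
def bounded_time(node, limit):
    """Return T(node), or None as soon as any sub-workflow exceeds `limit`.

    Nodes are ("data", name, t_net) or ("service", name, t_x, t_net, inputs).
    Because T() is nondecreasing with depth, a sub-workflow that already
    breaks the bound can never be part of an acceptable workflow.
    """
    if node[0] == "data":
        t_net = node[2]
        return t_net if t_net <= limit else None
    _, _name, t_x, t_net, inputs = node
    slowest = 0.0
    for child in inputs:
        child_time = bounded_time(child, limit)
        if child_time is None:
            return None  # prune immediately; parents are not evaluated
        slowest = max(slowest, child_time)
    total = t_x + t_net + slowest
    return total if total <= limit else None


# Made-up costs for two candidate plans.
plan_a = ("service", "getWL", 1.0, 0.5, [
    ("service", "getKNearestStations", 0.5, 0.0, [("data", "gauges", 2.0)]),
])
plan_b = ("service", "getWLfromModel", 2.0, 0.0, [])

survivors = [p[1] for p in (plan_a, plan_b) if bounded_time(p, 3.0) is not None]
print(survivors)
```

With a limit of 3.0, the gauge-station plan is rejected once its total reaches 4.0, while the model-based plan survives.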

41 Experimental Results: Cost Model Overhead vs. Pruning

42 Dynamic Accuracy Suggestion: What If Neither Workflow Meets QoS?
If the system determines that a workflow cannot meet the user's time constraints:
- Check whether the workflow exposes an accuracy parameter, e.g., the sampling rate
- Suggest a new value for the parameter: binary-search for the best possible value by repeatedly invoking the model
[Figure: binary search over the sampling rate range 0.0 to 1.0; each step asks "Time? Error?" and then moves toward "sample more" or "sample less"]
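The binary search over the sampling rate might look like the sketch below. It assumes predicted time increases with the sampling rate, which holds for the linear execution-time model shown on the cost modeling slide; `suggest_rate`, the tolerance, and the fixed input size of 1e8 are hypothetical choices.

```python
def suggest_rate(predict_time, max_time, lo=0.0, hi=1.0, tol=1e-4):
    """Largest sampling rate in [lo, hi] whose predicted time meets max_time.

    ASSUMPTION: predict_time is nondecreasing in the rate. Returns None when
    even the lowest rate is too slow.
    """
    if predict_time(hi) <= max_time:
        return hi  # full sampling already meets the constraint
    if predict_time(lo) > max_time:
        return None
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if predict_time(mid) <= max_time:
            lo = mid  # still within budget: sample more
        else:
            hi = mid  # over budget: sample less
    return lo


def model(rate, input_size=100_000_000):
    """Execution-time model from the cost modeling slide, at a fixed input size."""
    return (1.48092163412359e-7 * input_size
            + 41.14298326806695 * rate
            - 8.071895317928808)


rate = suggest_rate(model, max_time=20.0)
print(rate, model(rate))
```

Returning the lower end of the final interval guarantees the suggested rate never exceeds the time budget, at the cost of being up to `tol` below the true optimum.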

43 Experimental Results: Data Intensive Queries
* Accuracy parameter is the sampling rate of datasets

44 Experimental Results: Actual Time vs. QoS Time - Query 1
* Accuracy parameter is the sampling rate of datasets

45 Experimental Results: Actual Time vs. QoS Time - Query 2
* Accuracy parameter is the sampling rate of datasets

46 Experimental Results: Actual Time vs. Bandwidth - Query 1
* Accuracy parameter is the sampling rate of datasets

47 Experimental Results: Actual Time vs. Bandwidth - Query 2
* Accuracy parameter is the sampling rate of datasets

48 Conclusion
Our system offers
- Cost propagation models for time and error
- Novel Apriori dynamic workflow composition algorithm
- Accuracy parameter adjustment for graceful adaptation to QoS constraints
Questions - Comments? Contact emails:
- David Chiu (chiud@cse.ohio-state.edu)
- Gagan Agrawal (agrawal@cse.ohio-state.edu)

