Setting Temporal Constraints in Scientific Workflows
Xiao Liu, Jinjun Chen, Yun Yang
CS3: Centre for Complex Software Systems and Services, Swinburne University of Technology, Melbourne, Australia
{xliu, jchen,

2 Content
Introduction
 Temporal Verification
 Temporal QoS Framework
Setting Temporal Constraints in Scientific Workflows
 Problem Statement
 A probabilistic strategy
 Evaluation
Conclusion

3 Introduction: Temporal Verification
Scientific workflow verification covers structure, performance, resource, authorisation, cost and time.
In reality, complex scientific and business processes are normally time constrained. Hence:
 Time constraints are often set when these processes are modelled as scientific workflow specifications.
 Temporal consistency states, i.e. the tendency to drift from consistency towards inconsistency (temporal violations), need to be verified and handled proactively.
Temporal verification checks the temporal consistency states so as to identify and handle temporal violations.

4 Temporal QoS Framework
 Constraint Setting: setting temporal constraints according to temporal QoS specifications.
 Checkpoint Selection: selecting necessary and sufficient checkpoints at which to conduct temporal verification.
 Temporal Verification: verifying the consistency states at the selected checkpoints.
 Temporal Adjustment: handling the different types of temporal violations.

5 Content
Introduction
 Temporal Verification
 Temporal QoS Framework
Setting Temporal Constraints in Scientific Workflows
 Problem Statement
 A probabilistic strategy
 Evaluation
Conclusion

6 Problem Statement
Most current work adopts only a few overall, user-specified temporal constraints without considering system performance.
 A few overall constraints: not applicable for local verification and control.
 User-specified constraints: frequent temporal violations and huge exception-handling costs.

7 Two Basic Requirements
Temporal constraints should facilitate both overall coarse-grained control and local fine-grained control.
 Coarse-grained constraints are those assigned to the entire workflow or to workflow segments.
 Fine-grained constraints are those assigned to individual activities.
Temporal constraints should be well balanced between user requirements and system performance.

8 Probabilistic Strategy: Assumptions
Two assumptions on activity durations:
 Assumption 1: The distribution of activity durations can be obtained from workflow system logs. Without loss of generality, we assume all activity durations follow the normal distribution model, denoted N(µ, σ²) (a small fitting sketch follows below).
 Assumption 2: The activity durations are independent of each other.
 Where the assumptions do not hold: apply normal transformation and correlation analysis, or else set the offending activities aside at first and add their durations back afterwards.
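As a small illustration of Assumption 1, the sketch below fits N(µ, σ²) to the logged durations of one activity. It is a hypothetical example: the helper name, the log values and the cleaning steps are our own assumptions, not part of the authors' system.

```python
import statistics

def fit_activity_duration(durations):
    """Fit N(mu, sigma^2) to the logged durations (in seconds) of one activity.

    Hypothetical helper: real workflow logs would first be cleaned and, if
    clearly non-normal, transformed before fitting (see Assumption 1).
    """
    mu = statistics.mean(durations)
    sigma = statistics.stdev(durations)  # sample standard deviation
    return mu, sigma

# Durations of one activity taken from made-up system logs
logged = [118.2, 121.5, 119.8, 124.1, 120.3, 122.7]
mu, sigma = fit_activity_duration(logged)
print(f"a_i ~ N({mu:.1f}, {sigma:.1f}^2)")
```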

9 Probabilistic Strategy: Definitions
 Weighted Joint Normal Distribution
 Specification of Activity Durations
 Probability-based Temporal Consistency

10 Weighted Joint Normal Distribution
The motivation for the weighted joint normal distribution is to estimate the overall completion time of the entire workflow by aggregating the durations of all individual activities. However, these durations are not in a simple linear relationship.
Our strategy is to model each activity duration as a random variable and aggregate them according to the four basic control-flow structures, i.e. sequence, iteration, parallelism and choice. Since most workflow process models can be built by composing these four building blocks, we can likewise obtain the weighted joint distribution of most workflow processes.
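As a rough illustration of this aggregation, the sketch below combines independent normal activity durations for the four basic structures. It is our own simplification, not the authors' exact formulas: a sequence adds means and variances, an iteration repeats its body an expected number of times, a parallel block is approximated by its slowest branch, and a choice is a probability-weighted mixture of its branches.

```python
from dataclasses import dataclass

@dataclass
class Dur:
    """A normally distributed duration N(mu, var); purely illustrative."""
    mu: float   # mean duration
    var: float  # variance

def sequence(parts):
    # Independent activities in sequence: means and variances add.
    return Dur(sum(p.mu for p in parts), sum(p.var for p in parts))

def iteration(body, expected_rounds):
    # Treat an iteration as its body repeated an expected number of times.
    return Dur(expected_rounds * body.mu, expected_rounds * body.var)

def parallelism(branches):
    # Rough approximation: the block is dominated by its slowest branch
    # (here, the branch with the largest mean duration).
    return max(branches, key=lambda b: b.mu)

def choice(branches, probs):
    # Probability-weighted mixture of the branches; the variance follows the
    # law of total variance.
    mu = sum(p * b.mu for b, p in zip(branches, probs))
    var = sum(p * (b.var + b.mu ** 2) for b, p in zip(branches, probs)) - mu ** 2
    return Dur(mu, var)

# Example: two activities in sequence, followed by a two-branch choice
a1, a2 = Dur(1200, 90 ** 2), Dur(1500, 110 ** 2)
xor = choice([Dur(800, 60 ** 2), Dur(950, 80 ** 2)], [0.7, 0.3])
print(sequence([a1, a2, xor]))
```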

11 Specification of Activity Durations
Maximum Duration, Mean Duration, Minimum Duration.
The 3σ rule states that any sample drawn from a normal distribution falls into the range [µ-3σ, µ+3σ] with probability 99.73%. In our strategy, activity durations are specified as follows:
 Maximum Duration D(a_i) = µ + 3σ
 Mean Duration M(a_i) = µ
 Minimum Duration d(a_i) = µ - 3σ
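In code, this specification is a one-liner per bound (a trivial sketch; µ and σ would come from the fitted model of each activity):

```python
def duration_spec(mu, sigma):
    # Maximum, mean and minimum durations of an activity under the 3-sigma rule.
    return {"D": mu + 3 * sigma, "M": mu, "d": mu - 3 * sigma}

print(duration_spec(120.0, 2.0))  # {'D': 126.0, 'M': 120.0, 'd': 114.0}
```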

12 Probability-based Temporal Consistency
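The body of this slide is not spelled out above, but judging from the negotiation figures later in the talk (e.g. 81% for a 6400 s constraint against WS ~ N(6210, 218²)), the probability consistency of a candidate constraint U appears to be the probability, under the weighted joint normal distribution, that the workflow finishes within U. The sketch below reflects that reading and is an assumption on our part, not the authors' formal definition.

```python
from scipy.stats import norm

def consistency_probability(upper_bound, mu, sigma):
    """P(workflow duration <= upper_bound) when the aggregated duration ~ N(mu, sigma^2)."""
    return norm.cdf((upper_bound - mu) / sigma)

# Reproduces the 81% quoted later for U = 6400 s against WS ~ N(6210, 218^2)
print(f"{consistency_probability(6400, 6210, 218):.0%}")
```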

13 Probabilistic Strategy: Overview

14 Example: Setting Coarse-grained Constraints (the negotiation process)
User: "I want the process to be completed in 48 hours."
System: "Let me check the probability."

15 Example: Setting Coarse-grained Constraints (adjust the constraint)
System: "Sir, it's 70%, do you agree?"
User: "That's not good, how about 52 hours?"

16 Example: Setting Coarse-grained Constraints (adjust the probability)
System: "Then it increases to 85%."
User: "Err… how long will it take if I want 90%?"

17 Example: Setting Coarse-grained Constraints (negotiation result)
System: "It will take us 54 hours."
User: "OK, that's the deal! Let's do it!"

18 Example: Setting Coarse-grained Constraints
System: "OK! But, sir, I need to remind you that this is only a guarantee in the statistical sense. If we cannot make it, please blame the guy who came up with the strategy!"
(Sorry, statistically, no predictions can be 100% sure!)

19 Example: Setting Fine-grained Constraints
Setting fine-grained constraints for individual activities:
 Assume the probability agreed in the previous step is θ%, which corresponds to a normal percentile of λ. The fine-grained constraint for each individual activity is then (µ_i + λσ_i).
 For example, if the coarse-grained temporal constraint has 90% consistency, corresponding to a normal percentile of 1.28, then the fine-grained constraint for an activity a_i with duration distribution N(µ_i, σ_i²) is (µ_i + 1.28σ_i).
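A minimal sketch of this propagation step, assuming per-activity normal models are already fitted; the activity names and parameters below are made up, and the percentile comes from SciPy's inverse normal CDF:

```python
from scipy.stats import norm

def fine_grained_constraints(activities, theta):
    """Propagate an agreed consistency level theta (e.g. 0.90) to per-activity
    constraints mu_i + lambda * sigma_i, where lambda is the normal percentile."""
    lam = norm.ppf(theta)
    return {name: mu + lam * sigma for name, (mu, sigma) in activities.items()}

# Hypothetical activities with fitted N(mu_i, sigma_i^2) durations, in seconds
acts = {"a1": (1200, 90), "a2": (1500, 110), "a3": (800, 60)}
print(round(norm.ppf(0.90), 2))            # 1.28, the percentile quoted on the slide
print(fine_grained_constraints(acts, 0.90))
```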

20 Evaluation: System Environment
Overview of the SwinDeW-G environment (figure).

21 Step 1: Weighted Joint Distribution

22 Step 2: Coarse-grained Constraint
Negotiation for the coarse-grained constraint, with the aggregated workflow duration WS ~ N(6210, 218²):
 6300 s: 66%
 6360 s: 75%
 6390 s: 79%
 6400 s: 81%
Agreed setting: U(WS) = 6400 s, i.e. λ = 0.87.
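The figures on this slide can be checked with a few lines (a sketch using SciPy's normal CDF; the slide's exact rounding is not stated, so small discrepancies in the last digit are possible):

```python
from scipy.stats import norm

mu, sigma = 6210, 218  # aggregated workflow duration WS ~ N(6210, 218^2)
for u in (6300, 6360, 6390, 6400):
    print(u, f"{norm.cdf((u - mu) / sigma):.1%}")
# 66.0%, 75.4%, 79.6%, 80.8% -- close to the 66%, 75%, 79% and 81% on the slide

print(round((6400 - mu) / sigma, 2))  # lambda ~ 0.87 for the agreed 6400 s constraint
```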

23 Step 3: Fine-grained Constraint

24 Content
Introduction
 Temporal Verification
 Temporal QoS Framework
Setting Temporal Constraints in Scientific Workflows
 Problem Statement
 A probabilistic strategy
 Evaluation
Conclusion

25 Conclusion
 Temporal verification is important in scientific workflows.
 Setting temporal constraints is a prerequisite task for temporal verification.
 Two basic requirements: balancing user requirements with system performance, and supporting both coarse-grained and fine-grained temporal constraints.
 A probabilistic setting strategy: aggregation to set coarse-grained constraints, propagation to set fine-grained constraints.
 The evaluation shows the strategy to be effective.

26 The End
Thanks for your patience and attention!