
1 Dr. Xiao Liu, Sessional Lecturer, Research Fellow, Centre of SUCCESS, Swinburne University of Technology, Melbourne, Australia. Overview: Cloud Computing and Workflow Research in NGSP Group

2 Xiao Liu, Cloud Computing and Workflow Research in NGSP Group, Sunday, August 23, 2015. Outline: SUCCESS Centre and NGSP Group • Background: Cloud Computing and Workflow • Research Topics • Performance Management in Scientific Workflows • Data Management in Scientific Cloud Workflows • Security and Privacy Protection in the Cloud • Data Reliability Assurance in the Cloud • SwinDeW-C Cloud Workflow System • Future Work and Conclusions

3 The Centre of SUCCESS SUCCESS: Swinburne University Centre for Computing and Engineering Software Systems • SUCCESS is the No. 1 software engineering centre in Australia • SUCCESS is one of the 7 Tier 1 centres at Swinburne University of Technology (Times World Ranking: 351-400) • The ambition of the Centre is to become the top centre for software research in the Southern Hemisphere within the next five years, achieving world-renowned software innovation and engineering with a balanced theoretical, applied, industry and education impact across the Centre

4 SUCCESS Research Focus Areas • Knowledge and Data Intensive Systems • Nature of Software • Next Generation Software Platforms • SE Education and IBL/RBL • Software Analysis and Testing • Software R&D Group http://www.swinburne.edu.au/ict/success/research-expertise/

5 NGSP (Small) Group Overview This group conducts research into cloud computing and workflow technologies for complex software systems and services. Members: Leader: Prof Yun Yang (PC member for ICSE 07/08, FSE 09, ICSE 10/11/12) Researchers: A/Prof Jinjun Chen (UTS), Dr Xiao Liu (Postdoc), Dr Dong Yuan (Postdoc), Gaofeng Zhang, Wenhao Li, Dahai Cao, Xuyun Zhang, Chang Liu, Jofry Hadi Sutanto Others: Prof John Grundy, Prof Chengfei Liu Visitors: Prof Lee Osterweil, Prof Lori Clarke, Prof Ivan Stojmenovic, Prof Paola Inverardi, Prof Amit Sheth, Prof Wil van der Aalst, Prof Hai Zhuge

6 R&D Projects – Grants Primary projects: • (Cloud) workflow technology: ARC LP0990393 (Y Yang, R Kotagiri, J Chen, C Liu) • Cloud computing: ARC DP110101340 (Y Yang, J Chen, J Grundy) Secondary project: • Management control systems for effective information sharing and security in government organisations: ARC LP110100228 (S Cugenasen, Y Yang)

7 R&D Projects – Overview SwinDeW workflow family including SwinDeW-C • Architectures / Models (D Cao) • Scheduling / Data and service management (D Yuan, X Liu) • Verification / Exception handling (X Liu) Cloud computing: • Data management (D Yuan, X Liu, W Li) • Privacy and Security (G Zhang, X Zhang, C Liu)

8 Some Recent ERA A* Ranked Publications J. Chen and Y. Yang, Temporal Dependency based Checkpoint Selection for Dynamic Verification of Temporal Constraints in Scientific Workflow Systems. ACM Transactions on Software Engineering and Methodology, 20(3), 2011. X. Liu, Y. Yang, Y. Jiang and J. Chen, Preventing Temporal Violations in Scientific Workflows: Where and How. IEEE Transactions on Software Engineering, 37(6):805-825, Nov./Dec. 2011. D. Yuan, Y. Yang, X. Liu and J. Chen, On-demand Minimum Cost Benchmarking for Intermediate Datasets Storage in Scientific Cloud Workflow Systems. Journal of Parallel and Distributed Computing, 71:316-332, 2011. J. Chen and Y. Yang, Localising Temporal Constraints in Scientific Workflows. Journal of Computer and System Sciences, Elsevier, 76(6):464-474, Sept. 2010. G. Zhang, Y. Yang and J. Chen, A Historical Probability based Noise Generation Strategy for Privacy Protection in Cloud Computing. Journal of Computer and System Sciences, Elsevier, published online, Dec. 2011.

9 Outline: SUCCESS Centre and NGSP Group • Background: Cloud Computing and Workflow • Research Topics • Performance Management in Scientific Workflows • Data Management in Scientific Cloud Workflows • Security and Privacy Protection in the Cloud • Data Reliability Assurance in the Cloud • SwinDeW-C Cloud Workflow System • Future Work and Conclusions

10 Background: Cloud Computing What is cloud computing? R. Buyya: "A Cloud is a type of parallel and distributed system consisting of a collection of inter-connected and virtualised computers that are dynamically provisioned and presented as one or more unified computing resources based on service-level agreements established through negotiation between the service provider and consumers." I. Foster: "Cloud computing is a large-scale distributed computing paradigm that is driven by economies of scale, in which a pool of abstracted, virtualised, dynamically-scalable, managed computing power, storage, platforms, and services are delivered on demand to external customers over the Internet." UC Berkeley: Cloud computing is utility computing plus SaaS.

11 Why Cloud Computing Data explosion • TB (10^12), PB (10^15), exabyte (EB, 10^18), zettabyte (ZB, 10^21), yottabyte (YB, 10^24) • The total amount of global data in 2010: 1.2 ZB • Google processed about 24 PB of data every day in 2009 • Every day: Facebook 10 TB, Twitter 7 TB, YouTube 4.5 TB Moore's law vs. the speed of data explosion Buzzwords: data storage, data processing, parallel, distributed, virtualisation, commodity machines, energy consumption, data centres, utility computing, software (everything) as a service

12 Benefits of Clouds No upfront infrastructure investment • No procuring hardware, setup, hosting, power, etc. On-demand access • Lease what you need, when you need it Efficient resource allocation • Globally shared infrastructure Flexible pricing • Based on usage, QoS, supply and demand, loyalty, … Application acceleration • Parallelism for large-scale data analysis High availability, scalability and energy efficiency Supports creation of third-party services and seamless offerings • Builds on the infrastructure and follows a business model similar to the cloud's

13 Success Stories Google Animoto: 750,000 sign-ups in three days, 25,000 accesses in one hour, 10 times the usual capacity required, running on Amazon NY Times: articles from 1851 to 1980 processed in 24 hours at a cost of only US$240 Facebook, Salesforce CRM, IBM Research Compute Cloud …

14 Cloud Computing Classification Cloud Services • IaaS: infrastructure as a service, e.g. Amazon S3, EC2 • PaaS: platform as a service, e.g. Google App Engine • SaaS: software as a service, e.g. Salesforce.com Cloud Types • Public/Internet clouds • Private/enterprise clouds • Hybrid/mixed clouds

15 Example (PaaS): Hadoop Project The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using a simple programming model. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Hadoop provides a reliable shared storage and analysis system • Storage provided by HDFS: a distributed file system that provides high-throughput access to application data • Analysis provided by MapReduce: a software framework for distributed processing of large data sets on compute clusters Hadoop powers Yahoo! search Hadoop: The Definitive Guide (by Tom White) http://hadoop.apache.org/
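The MapReduce model behind Hadoop can be illustrated with a minimal, pure-Python sketch (this is an illustration of the programming model only, not the Hadoop API): a word-count job with explicit map, shuffle and reduce phases.

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit (word, 1) pairs from each input record.
    for doc in documents:
        for word in doc.split():
            yield word.lower(), 1

def shuffle(pairs):
    # Shuffle: group intermediate values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the values for each key.
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["the cloud stores data", "the cloud computes data"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts["the"], counts["cloud"], counts["data"])  # 2 2 2
```

In Hadoop, the map and reduce functions run in parallel across the cluster and the shuffle is performed by the framework; the logic per phase is exactly as sketched.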

16 Cloud in Australia Gartner estimated the global demand for cloud computing in 2009 at $46 billion, rising to $150 billion by 2013. The Australian Government's business operations cost around $4.3 billion p.a. in ICT. Australian Government ICT Sustainability Plan 2010-2015: an energy-efficient technology direction for the Australian Government Data Centre Strategy. The Department of Finance and Deregulation estimated that costs of $1 billion could be avoided by developing a data centre strategy for the next 15 years. Australian Taxation Office (ATO), Department of Immigration and Citizenship (DIAC), and Australian Maritime Safety Authority (AMSA): proof-of-concept initiatives. The Australian Academy of Technological Sciences and Engineering (ATSE): opportunities and challenges for government, universities and business. Westpac, Telstra, MYOB, Commonwealth Bank, Australia and New Zealand Banking Group and SAP: initiatives to support the migration and running of their business applications in the cloud.

17 Cloud in China The national Twelfth Five-Year Plan http://www.chinacloud.cn/ http://www.china-cloud.com/ http://www.cloudcomputing-china.cn/

18 Background: Workflow Workflow: the automation of a business process, in whole or part, during which documents, information or tasks are passed from one participant to another for action, according to a set of procedural rules. A workflow management system is a system that provides procedural automation of a business process by managing the sequence of work activities and by managing the required resources (people, data and applications) associated with the various activity steps. -- [Workflow Management Coalition]

19 Why Workflow Originated from office automation Business process management, business agility Business process analysis and re-design Separation of the workflow management system from software applications • Just like the separation of the database management system from software applications Software component reuse, Web services • Programming by scripting the composition of software components

20 Workflow Applications Office automation: review and approval processes Business process management systems, ERP systems Machine shops, job shops and flow shops Flight booking, insurance claims, tax refunds… Scientific workflows IBM WebSphere Workflow Microsoft Windows Workflow Foundation • http://wm.microsoft.com/ms/msdn/netframework/introwf.wmv

21 Workflow Reference Model

22 Example: Pulsar Searching Workflow Astrophysics: pulsar searching Pulsars: the collapsed cores of stars that were once more massive than 6-10 times the mass of the Sun (http://astronomy.swin.edu.au/cosmos/P/Pulsar) Parkes Radio Telescope (http://www.parkes.atnf.csiro.au/) The Swinburne Astrophysics group (http://astronomy.swinburne.edu.au/) has been conducting pulsar searching surveys (http://astronomy.swin.edu.au/pulsar/) based on the observation data from the Parkes Radio Telescope. A typical scientific workflow which involves a large number of data- and computation-intensive activities. For a single searching process, the average data volume (not including the raw stream data from the telescope) is over 4 terabytes and the average execution time is about 23 hours on the Swinburne high-performance supercomputing facility (http://astronomy.swinburne.edu.au/supercomputing/). Left: image of the Crab Nebula taken with the Palomar telescope. Right: a close-up of the Crab Pulsar from the Hubble Space Telescope. Credit: Jeff Hester and Paul Scowen (Arizona State University) and NASA

23 Pulsar Searching Workflow (figure courtesy of Dr. Willem van Straten)

24 Outline: SUCCESS Centre and NGSP Group • Cloud Computing and Workflow • Research Topics • Performance Management in Scientific Workflows • Data Management in Scientific Cloud Workflows • Security and Privacy Protection in the Cloud • Data Reliability Assurance in the Cloud • SwinDeW-C Cloud Workflow System • Future Work and Conclusions

25 Research Topics: Performance Management in Scientific Workflows Dr. Xiao Liu xliu@swin.edu.au http://www.ict.swin.edu.au/personal/xliu/

26 Workflow QoS QoS dimensions • time, cost, fidelity, reliability, security … QoS of cloud services Workflow QoS • the overall QoS for a collection of cloud services • not simply the sum of the individual services' QoS!

27 Temporal QoS System performance • Response time • Throughput Temporal constraints • Global constraints: deadlines • Local constraints: milestones, individual activity durations Satisfactory temporal QoS • High performance: fast response, high throughput • On-time completion: low temporal violation rate

28 Problem Analysis Setting temporal constraints • Coarse-grained and fine-grained temporal constraints • Prerequisite: effective forecasting of activity durations Monitoring temporal consistency state • Monitor workflow execution state • Detect potential temporal violations Temporal violation handling • Where to conduct violation handling • What strategies to use

29 Ultimate Goal Achieving on-time completion Measurements: • Temporal correctness • Cost effectiveness

30 Temporal Consistency Model Temporal correctness: workflow execution towards the satisfaction of temporal constraints. The temporal consistency model defines the system's running state at a specific workflow activity point (i.e. a temporal checkpoint) against specific temporal constraints. Basic elements: the real workflow running time (up to and including the activity point), the estimated running time for the uncompleted workflow (after the checkpoint), and the temporal constraints.
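The three basic elements above imply a simple check at each checkpoint; a minimal sketch (the function name and this deterministic form are illustrative, not the probabilistic model from the papers): compare the real elapsed time plus the estimated remaining time against the constraint.

```python
def temporal_consistency(real_elapsed, estimated_remaining, constraint):
    """At a temporal checkpoint, return True if the workflow is still
    expected to satisfy its temporal constraint (e.g. a deadline):
    real running time so far + estimated remaining time <= constraint."""
    return real_elapsed + estimated_remaining <= constraint

# At a checkpoint: 10 hours completed, ~12 hours estimated remaining,
# against a 23-hour deadline.
print(temporal_consistency(10, 12, 23))  # True
print(temporal_consistency(10, 14, 23))  # False
```

The probability-based model on the following slides refines this binary check by treating activity durations as normally distributed random variables.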

31 Probability Based Temporal Consistency Model Time attributes for workflow activity a_i • Maximum activity duration: D(a_i) • Mean activity duration: M(a_i) • Minimum activity duration: d(a_i) • Runtime activity duration: R(a_i) 3-sigma rule for the normal distribution: 99.73% of values fall within (μ-3σ, μ+3σ) • R(a_i) ~ N(μ, σ^2) • D(a_i) = μ+3σ, M(a_i) = μ, d(a_i) = μ-3σ
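Under the normal-distribution assumption R(a_i) ~ N(μ, σ^2), the three static time attributes follow directly from the 3-sigma rule; a small sketch (function name and sample numbers are illustrative):

```python
def activity_time_attributes(mu, sigma):
    """Derive the maximum, mean and minimum activity durations from the
    3-sigma rule: a normal variate falls within (mu - 3*sigma, mu + 3*sigma)
    with probability 99.73%."""
    D = mu + 3 * sigma   # maximum activity duration D(a_i)
    M = mu               # mean activity duration M(a_i)
    d = mu - 3 * sigma   # minimum activity duration d(a_i)
    return D, M, d

# An activity with mean duration 30 time units and std. deviation 2.
D, M, d = activity_time_attributes(mu=30.0, sigma=2.0)
print(D, M, d)  # 36.0 30.0 24.0
```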

32 Probability Based Temporal Consistency Model Types of temporal constraints • Upper bound temporal constraint, U(W) • Lower bound temporal constraint, L(W) • Fixed-time temporal constraint, F(W) Relationships • Upper bound and lower bound constraints are symmetric • A fixed-time constraint is a special case of an upper bound constraint Choice • Upper bound/lower bound constraints at workflow build-time • Fixed-time constraints at workflow runtime

33 Probability Based Temporal Consistency Model

34 Probability Based Temporal Consistency Model

35 Temporal Framework

36 Temporal Framework Component 1: Temporal Constraint Setting • Forecasting workflow activity durations • Setting coarse-grained temporal constraints • Setting fine-grained temporal constraints Component 2: Temporal Consistency Monitoring • Temporal checkpoint selection • Temporal verification Component 3: Temporal Violation Handling • Temporal violation handling point selection • Temporal violation handling

37 Component 1: Temporal Constraint Setting

38 Forecasting Activity Durations Statistical time-series pattern based forecasting strategies Selected publications: • X. Liu, Z. Ni, D. Yuan, Y. Jiang, Z. Wu, J. Chen, Y. Yang, A Novel Statistical Time-Series Pattern based Interval Forecasting Strategy for Activity Durations in Workflow Systems, Journal of Systems and Software (JSS), 84(3):354-376, March 2011. • X. Liu, J. Chen, K. Liu and Y. Yang, Forecasting Duration Intervals of Scientific Workflow Activities based on Time-Series Patterns, Proc. of 4th IEEE International Conference on e-Science (e-Science08), pages 23-30, Indianapolis, USA, Dec. 2008.

39 Setting Temporal Constraints Probability based temporal consistency model Time analysis based on stochastic Petri nets Selected publications: • X. Liu, Z. Ni, J. Chen, Y. Yang, A Probabilistic Strategy for Temporal Constraint Management in Scientific Workflow Systems, Concurrency and Computation: Practice and Experience (CCPE), Wiley, 23(16):1893-1919, Nov. 2011. • X. Liu, J. Chen and Y. Yang, A Probabilistic Strategy for Setting Temporal Constraints in Scientific Workflows, Proc. 6th International Conference on Business Process Management (BPM2008), Lecture Notes in Computer Science, Vol. 5240, pages 180-195, Milan, Italy, Sept. 2008.

40 Component 2: Temporal Consistency Monitoring

41 Temporal Consistency Monitoring Minimum (probability) time redundancy based checkpoint selection strategy Temporal dependency based checkpoint selection strategy Selected publications: • X. Liu, Y. Yang, Y. Jiang and J. Chen, Preventing Temporal Violations in Scientific Workflows: Where and How. IEEE Transactions on Software Engineering, 37(6):805-825, Nov./Dec. 2011. • J. Chen and Y. Yang, Temporal Dependency based Checkpoint Selection for Dynamic Verification of Temporal Constraints in Scientific Workflow Systems. ACM Transactions on Software Engineering and Methodology, 20(3), 2011.

42 Component 3: Temporal Violation Handling

43 Violation Handling Violation handling point selection (Probability) time deficit allocation Workflow local rescheduling strategies – ACO, GA, PSO Selected publications: • X. Liu, Z. Ni, Z. Wu, D. Yuan, J. Chen and Y. Yang, A Novel General Framework for Automatic and Cost-Effective Handling of Recoverable Temporal Violations in Scientific Workflow Systems, Journal of Systems and Software, 84(3):492-509, 2011. • X. Liu, Y. Yang, Y. Jiang and J. Chen, Do We Need to Handle Every Temporal Violation in Scientific Workflow Systems?, submitted to ACM Transactions on Software Engineering and Methodology.

44 Experiment Results on Temporal Violation Rates

45 Cost Analysis

46 Yearly Cost and Time Reduction Yearly cost reduction for the pulsar searching workflow Yearly time reduction for the pulsar searching workflow

47 Research Topics: Data Management in Scientific Cloud Workflows Dr. Dong Yuan, Dr. Xiao Liu dyuan@swin.edu.au, xliu@swin.edu.au http://www.ict.swin.edu.au/personal/dyuan/

48 Data Management in Cloud Computing Scientific applications in cloud computing • Computation- and data-intensive applications • Massive computation and storage resources • Pay-as-you-go model Computation and storage trade-off • Some datasets should be stored (storage cost) • Some datasets can be regenerated (computation cost) Data placement

49 Data Dependency Graph (DDG) A classification of the application data • Original data and generated data Data provenance • A kind of metadata that records how data are generated

50 Attributes of a Dataset in DDG A dataset d_i in DDG has the attributes: • x_i ($) denotes the generation cost of dataset d_i from its direct predecessors. • y_i ($/t) denotes the cost of storing dataset d_i in the system per time unit. • f_i (Boolean) is a flag which denotes whether dataset d_i is stored or deleted in the system. • v_i (Hz) denotes the usage frequency, which indicates how often d_i is used.

51 Attributes of a Dataset in DDG • provSet_i denotes the set of stored provenances that are needed when regenerating dataset d_i. • CostR_i ($/t) is d_i's cost rate, i.e. the average cost per time unit of d_i in the system. Cost = Computation + Storage • Computation: total cost of computation resources • Storage: total cost of storage resources

52 Cost Model of Datasets Storage in the Cloud Total cost rate for storing the datasets in a DDG • S is the storage strategy of the DDG This cost model also represents the trade-off between computation and storage in the cloud • For a DDG with n datasets, there are 2^n different storage strategies
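The 2^n strategy space can be made concrete with a brute-force sketch for a small linear DDG. This is a simplified model under stated assumptions (deleted datasets are regenerated from the nearest stored predecessor on every use; all names and numbers are illustrative), not the algorithm from the papers:

```python
from itertools import product

def total_cost_rate(x, y, v, stored):
    """Total cost rate of a linear DDG under one storage strategy.
    x[i]: generation cost of d_i from its direct predecessor,
    y[i]: storage cost per time unit, v[i]: usage frequency,
    stored: tuple of booleans (the strategy S)."""
    n = len(x)
    tcr = 0.0
    for i in range(n):
        if stored[i]:
            tcr += y[i]                      # pay storage cost rate
        else:
            # Regenerate d_i from the nearest stored predecessor,
            # paying the generation costs along the way, once per use.
            gen = 0.0
            for k in range(i, -1, -1):
                gen += x[k]
                if k > 0 and stored[k - 1]:
                    break
            tcr += v[i] * gen
    return tcr

def minimum_cost_benchmark(x, y, v):
    # Enumerate all 2^n strategies (feasible only for small n).
    n = len(x)
    return min((total_cost_rate(x, y, v, s), s)
               for s in product([False, True], repeat=n))

x = [10.0, 4.0, 6.0]    # generation costs ($)
y = [5.0, 8.0, 2.0]     # storage cost rates ($/t)
v = [1.0, 0.5, 0.25]    # usage frequencies
best_cost, best_strategy = minimum_cost_benchmark(x, y, v)
print(best_cost, best_strategy)  # 9.0 (True, False, True)
```

Brute force is exponential in n, which is exactly why the polynomial-time CTT-SP algorithm on the next slides matters.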

53 Minimum Cost Benchmark What is the minimum cost benchmark? • The minimum cost for storing and regenerating datasets in the cloud • The best trade-off between computation and storage in the cloud • We need to find the Minimum Cost Storage Strategy (MCSS) for the application datasets Significance of the minimum cost benchmark • Due to the pay-as-you-go model, cost-effectiveness is very important to users deploying their applications in the cloud • The minimum cost benchmark lets users evaluate the cost-effectiveness of their storage strategies

54 Static On-Demand Minimum Cost Benchmarking Static benchmarking is provided as an on-demand service for users • Whenever a benchmarking request arrives, the corresponding algorithms are triggered to calculate the minimum cost benchmark, a one-off computation. • This approach is suitable when benchmarking is only requested occasionally. CTT-SP algorithm • A novel algorithm designed to find the MCSS of a DDG with polynomial time complexity • CTT-SP: Cost Transitive Tournament Shortest Path

55 Linear CTT-SP Algorithm CTT-SP algorithm for a linear DDG Essence of the algorithm: • Construct a Cost Transitive Tournament (CTT) based on the DDG • In the CTT, every path from the start to the end represents a storage strategy of the DDG • Paths map one-to-one onto storage strategies

56 Linear CTT-SP Algorithm • Set weights on the edges of the CTT • We denote the weight of the edge from d_i to d_j as w(d_i, d_j), defined as "the sum of the cost rates of d_j and the datasets between d_i and d_j, supposing that only d_i and d_j are stored and all datasets between d_i and d_j are deleted". • Formally: w(d_i, d_j) = y_j + Σ_{i<k<j} v_k · (Σ_{i<h≤k} x_h) • The length of each path equals the TCR (Total Cost Rate) of the corresponding storage strategy.

57 Linear CTT-SP Algorithm • Find the shortest path from d_s to d_e in the CTT • The MCSS S_min stores exactly the datasets that the shortest path P_min traverses • The minimum cost benchmark is the length of P_min
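The linear case can be sketched end to end in a few lines. This is a simplified reconstruction under the edge-weight definition on the previous slide (virtual start d_s = -1 and end d_e = n carry no cost; since all edges point forward, the shortest path is found by a simple dynamic program rather than a general shortest-path algorithm):

```python
def linear_ctt_sp(x, y, v):
    """Simplified linear CTT-SP sketch for a linear DDG.
    The weight of edge (i, j) assumes d_i and d_j are stored and every
    dataset between them is deleted and regenerated from d_i per use."""
    n = len(x)

    def weight(i, j):
        w = y[j] if j < n else 0.0          # storage cost rate of d_j
        for k in range(i + 1, j):           # deleted datasets in between
            gen = sum(x[i + 1:k + 1])       # regeneration cost from d_i
            w += v[k] * gen
        return w

    # Shortest path from the virtual start (-1) to the virtual end (n):
    # edges only go forward, so one left-to-right pass suffices.
    INF = float("inf")
    dist, prev = {-1: 0.0}, {}
    for j in list(range(n)) + [n]:
        dist[j] = INF
        for i in range(-1, j):
            d = dist[i] + weight(i, j)
            if d < dist[j]:
                dist[j], prev[j] = d, i
    # The MCSS stores exactly the datasets the shortest path traverses.
    stored, node = [], n
    while node != -1:
        node = prev[node]
        if node != -1:
            stored.append(node)
    return dist[n], sorted(stored)

x = [10.0, 4.0, 6.0]    # generation costs ($)
y = [5.0, 8.0, 2.0]     # storage cost rates ($/t)
v = [1.0, 0.5, 0.25]    # usage frequencies
benchmark, mcss = linear_ctt_sp(x, y, v)
print(benchmark, mcss)  # 9.0 [0, 2]
```

On this toy input the shortest path has length 9.0 and stores d_0 and d_2, matching an exhaustive search over all 2^3 strategies, while the tournament itself has only O(n^2) edges.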

58 General CTT-SP Algorithm Take the simple DDG below as an example (with a block). For a general DDG, we select one branch from the first dataset to the last dataset as the main branch (e.g. {d_1, d_2, d_5, d_6, d_7, d_8}) to construct the CTT. We denote the remaining datasets (e.g. {d_3, d_4}) as sub-branches.

59 General CTT-SP Algorithm The general CTT-SP algorithm is a recursive algorithm • For the sub-branches, the MCSS differs depending on which predecessors and successors are stored, so it cannot be calculated up front. • The general CTT-SP algorithm is therefore called recursively on the sub-branches, dynamically adding their cost rates to the edges in the CTT of the main branch.

60 Dynamic On-the-fly Minimum Cost Benchmarking The benchmarking service is delivered on the fly to instantly respond to benchmarking requests • By saving and utilising pre-calculated results, whenever the application cost changes in the cloud, we can dynamically calculate the new minimum cost and keep the benchmark updated. • This approach is suitable when benchmarking is requested more frequently at runtime. Partitioned Solution Space (PSS) • The PSS saves all the possible MCSSs of a DDG segment. • For a DDG segment, given particular stored predecessors and successors, we can quickly locate the corresponding MCSS in the PSS.

61 PSS for a DDG_LS (Linear DDG Segment) A DDG_LS has different MCSSs depending on the storage statuses of its preceding and succeeding datasets. CTT for a DDG_LS • Different selections of the start and end datasets (d_s and d_e) may lead to different MCSSs for the segment.

62 PSS for a DDG_LS Partition of the solution space Let S_{i,j} and S_{i',j'} be two MCSSs in the solution space with SCR_{i,j} < SCR_{i',j'}. The border between S_{i,j} and S_{i',j'} in the solution space is where, for particular X and V, the TCRs of storing the DDG_LS with S_{i,j} and with S_{i',j'} are equal. Hence the border of S_{i,j} and S_{i',j'} in the solution space is a straight line.

63 PSS for a DDG_LS Under a further simplifying assumption, the border equation reduces to an even simpler linear form. The figure below demonstrates the partition of the solution space.

64 PSS for a DDG_LS We can calculate the partition lines of all the potential MCSSs in the solution space, which together form the PSS. With the PSS, given any X and V, we can quickly locate the corresponding MCSS for the DDG_LS.

65 Dynamic On-the-fly Minimum Cost Benchmarking PSS based benchmarking approach (key ideas) • Merge the PSSs of the DDG_LSs to derive the PSS of the whole DDG, from which the minimum cost benchmark can be obtained. • Save all the PSSs calculated along the way in a hierarchy. • Whenever the application cost changes, quickly derive the new minimum cost benchmark from the saved PSSs. • Hence, the minimum cost benchmark is kept dynamically updated, so that benchmarking requests can be responded to instantly, on the fly.

66 Saving PSSs We save all the PSSs of a DDG in a hierarchy • The level number indicates the number of DDG_LSs merged in the PSS at that level. • A link between two PSSs at levels i and i+1 in the hierarchy means that the DDG segment of the PSS at level i+1 contains the DDG segment of the PSS at level i.

67 Cost-Effective Storage Strategies Cost rate based storage strategy • The strategy directly compares the generation cost rate and the storage cost rate of every dataset to decide its storage status. • The strategy guarantees that all datasets stored in the system are necessary. • The strategy dynamically checks whether regenerated datasets need to be stored, and if so, adjusts the storage strategy accordingly. • This strategy is highly efficient with fairly reasonable cost-effectiveness.
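The per-dataset comparison at the heart of this strategy can be sketched in one line; this simplified version assumes the generation cost rate of d_i is its generation cost x_i times its usage frequency v_i (the runtime re-check of regenerated datasets mentioned above is omitted):

```python
def cost_rate_based_strategy(x, y, v):
    """Decide each dataset's storage status locally: store d_i iff its
    storage cost rate y_i does not exceed its generation cost rate
    x_i * v_i (regeneration cost paid on every use).  Simplified
    per-dataset sketch of the cost rate based storage strategy."""
    return [y[i] <= x[i] * v[i] for i in range(len(x))]

x = [10.0, 4.0, 6.0]    # generation costs ($)
y = [5.0, 8.0, 2.0]     # storage cost rates ($/t)
v = [1.0, 0.5, 0.25]    # usage frequencies
print(cost_rate_based_strategy(x, y, v))  # [True, False, False]
```

Because each decision is local, the strategy runs in O(n) time, which is why it is highly efficient even though it cannot see the cheaper regeneration paths that storing a neighbour would enable.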

68 Cost-Effective Storage Strategies Local-optimisation based storage strategy • The strategy divides a DDG with a large number of application datasets into small linear segments (DDG_LSs). • The strategy utilises the linear CTT-SP algorithm to find the MCSS of every segment, thereby achieving local optimisation. • This strategy is highly cost-effective with very reasonable runtime efficiency.

69 Pulsar Searching Application Case Study Analysing one piece of the observation data generates six datasets. We directly utilise the on-demand benchmarking approach • The MCSS is to store d2, d4, d6 and delete d1, d3, d5. • The minimum cost benchmark is $0.51 per day.

70 PSS Merging Process There are two phases in the execution: 1) files preparation and 2) seeking candidates. Two DDG_LSs are generated correspondingly.

71 Pulsar Searching Application Case Study

72 Pulsar Searching Application Case Study

Datasets: Extracted beam | De-dispersion files | Accelerated de-dispersion files | Seek results | Pulsar candidates | XML files

1) Store no datasets: Deleted | Deleted | Deleted | Deleted | Deleted | Deleted
2) Store all datasets: Stored | Stored | Stored | Stored | Stored | Stored
3) Generation cost based strategy: Deleted | Stored | Deleted | Stored
4) Usage based strategy: Deleted | Stored | Deleted
5) Cost rate based strategy: Deleted | Stored (deleted initially) | Deleted | Stored | Deleted | Stored
6) Local-optimisation based strategy: Deleted | Stored | Deleted | Stored | Deleted | Stored
7) Minimum cost benchmark: Deleted | Stored | Deleted | Stored | Deleted | Stored

73 Data Placement
Compute near big data! In scientific cloud workflows, large amounts of application data need to be stored in distributed data centres. A data manager must intelligently select the data centres where these data will reside, considering:
 The dependencies between datasets
 The movement of large datasets
 Some data have fixed locations

74 A matrix based k-means clustering strategy
Build-time: group the existing datasets into k data centres based on data dependencies.
 Step 1: Set up and cluster the dependency matrix
 Step 2: Partition and distribute the datasets
Runtime: dynamically cluster newly generated datasets to the most appropriate data centres based on dependencies.
 Step 1: Pre-allocate data using the clustering algorithm
 Step 2: Adjust data placement among data centres when new workflows are deployed or some data centres become overloaded
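A toy version of the build-time phase can be sketched as k-means over the rows of a dataset-dependency matrix (one row per dataset, one cluster per data centre). This is a didactic sketch with a deterministic farthest-first initialisation, not the published strategy; the dependency weights are made up.

```python
# Toy k-means over rows of a dataset-dependency matrix (illustrative).
# matrix[i][j] is a hypothetical dependency weight between datasets i
# and j; each resulting cluster would be placed in one data centre.

def kmeans_rows(matrix, k, iters=20):
    def d2(a, b):                     # squared Euclidean distance
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # deterministic farthest-first initialisation
    centroids = [list(matrix[0])]
    while len(centroids) < k:
        far = max(range(len(matrix)),
                  key=lambda i: min(d2(matrix[i], c) for c in centroids))
        centroids.append(list(matrix[far]))

    labels = [0] * len(matrix)
    for _ in range(iters):
        for i, row in enumerate(matrix):          # assignment step
            labels[i] = min(range(k), key=lambda c: d2(row, centroids[c]))
        for c in range(k):                        # update step
            members = [matrix[i] for i in range(len(matrix)) if labels[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

if __name__ == '__main__':
    # two tightly coupled groups of datasets: {0, 1, 2} and {3, 4}
    dep = [[0, 9, 9, 0, 0],
           [9, 0, 9, 0, 0],
           [9, 9, 0, 0, 0],
           [0, 0, 0, 0, 9],
           [0, 0, 0, 9, 0]]
    print(kmeans_rows(dep, 2))   # the two groups land in different clusters
```

Clustering strongly dependent datasets into the same data centre is what keeps the movement of large datasets (the second build-time concern) to a minimum.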


76 Research Topics: Security and Privacy Protection in the Cloud
Gaofeng Zhang (gzhang@swin.edu.au)

77 Background
Data Security vs. Data Privacy
Privacy in cloud computing:
 Massive data are stored and computed in the open cloud environment
 Customers cannot control what happens inside the cloud
The severity of privacy risks in cloud computing
One specific privacy risk in cloud computing:
 Indirectly private information (collective information)
 Exposed through normal service processes and functions (not through disruption)
Our approach: noise obfuscation for privacy protection

78 Privacy Protection in the Cloud
Roles in the view of privacy in a regular IT system:
 Privacy owner, privacy user and privacy thief
Keep the data safe between the privacy owner and the privacy user!

79 Privacy Protection in the Cloud
Microsoft's view of the cloud ecosystem; "Powerful, Green and Smart Cloud" (IBM)

80 Privacy Protection in the Cloud
Roles in the view of privacy in the Cloud:
 Privacy owner, privacy user and privacy thief
Virtualisation breaks the guarantee of "keeping data safe between the privacy owner and the privacy user"!

81 Noise Obfuscation (1)
Background:
 Massive data are stored and computed in open cloud environments.
 Customers cannot control what happens inside the cloud.
Main idea: "dilute" real private information with noise information
 Noise information, not a noise signal!

82 Noise Obfuscation (2)
A motivating example:
 A customer who often travels to one Australian city, say Sydney, regularly checks the weather report for it from a cloud weather service before departure. The frequent service requests for the Sydney weather report can reveal the private fact that the customer usually goes to Sydney. But if a system helps the customer inject other requests, such as for Perth or Darwin, into the Sydney queue, the service provider cannot distinguish the real requests from the noise, as it just sees a similar style of service request. All these requests are answered normally, and the customer's location privacy is no longer revealed. In such cases, privacy can generally be protected by noise obfuscation.
From 'data' privacy to 'process' privacy!

83 Research Topics
Noise Generation
 Historical probability based noise generation strategy
 Time-series pattern based noise generation strategy
 Association probability based noise generation strategy
 ……
Noise Utilisation
 Trust model and injection strategy for noise obfuscation
 ……
Noise Cooperation Mechanism
 Privacy protection framework under noise obfuscation
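One simple flavour of historical-probability-based noise generation can be sketched as follows: inject the currently least-frequent candidate requests, so that the request distribution the provider observes flattens toward uniform. The heuristic, the function name, and the example counts are all hypothetical; the published strategies are more elaborate.

```python
from collections import Counter

# Sketch of noise generation for request obfuscation (illustrative).
# Noise requests are drawn from the currently least-frequent candidates,
# nudging the distribution the service provider observes toward uniform.

def noise_requests(history, candidates, n_noise):
    """history: Counter of all requests observed so far (real + noise);
    candidates: possible noise values, e.g. other city names."""
    counts = Counter(history)
    picks = []
    for _ in range(n_noise):
        least = min(candidates, key=lambda c: counts[c])
        picks.append(least)
        counts[least] += 1      # the noise request itself is now observed
    return picks

if __name__ == '__main__':
    seen = Counter({'Sydney': 10, 'Perth': 2, 'Darwin': 1})
    print(noise_requests(seen, ['Sydney', 'Perth', 'Darwin'], 3))
    # ['Darwin', 'Perth', 'Darwin'] -- 'Sydney' dominates less clearly
```

In the weather-report example, the injected 'Perth' and 'Darwin' requests would be sent to the service alongside the real 'Sydney' ones, so the provider sees only a similar style of request.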

84 Research Topics: Cost-Effective Data Reliability Assurance in the Cloud
Wenhao Li (wli@swin.edu.au)

85 Background
The growth of Cloud data:
 It is estimated that by 2015 the data stored in the Cloud will reach 0.8 ZB, while even more data are stored or processed temporarily along the way. (IDC)
 The size of Cloud applications is also expanding.
Challenge:
 How to reduce the data storage cost of Cloud storage services without sacrificing data reliability assurance.

86 Research Issues
 Data reliability modelling in the Cloud
 Replication-based cost-effective data reliability management approaches
 Data loss detection and data recovery

87 Replication-based Approaches
Incremental replication strategy: CIR (Cost-effective Incremental Replication)
 Replicas are generated following an incremental pattern: a new replica is created only when the current replicas cannot provide sufficient data reliability assurance to meet the user's requirement.
Data reliability management based on proactive replica checking: PRCR (Proactive Replica Checking for Reliability)
 Depending on its data reliability requirement, each file has no more than two replicas stored in the Cloud.
 A replica checking process is proactively conducted to detect data loss and recover lost replicas.
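The incremental idea behind CIR can be illustrated with a minimal reliability model: if each replica independently survives the storage period with probability p, then k replicas give reliability 1 - (1 - p)^k, and replicas are added one at a time only until the user's requirement is met. This independence model and the numbers below are illustrative assumptions, not the published CIR model.

```python
# Minimal sketch of incremental replication (in the spirit of CIR).
# Assumes (hypothetically) that each replica survives the storage period
# independently with probability p, so k replicas give 1 - (1 - p)**k.

def replicas_needed(p, required):
    """Smallest replica count whose combined reliability meets `required`."""
    k = 1
    while 1 - (1 - p) ** k < required:
        k += 1
    return k

if __name__ == '__main__':
    print(replicas_needed(0.9, 0.999))   # 3 replicas
    print(replicas_needed(0.5, 0.95))    # 5 replicas
```

Under PRCR the replica count would additionally be capped at two, with proactive replica checking compensating for the lower redundancy.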

88 CIR can reduce up to 2/3 of the current Cloud storage cost, especially for data with short storage durations and low data reliability requirements. PRCR can reduce 1/3 to 2/3 of the current Cloud storage cost, especially when the amount of data is large.

89 Research Topics: Cloud Workflow System Design and Development
Dahai Cao (dcao@swin.edu.au)

90 SwinCloud – Cloud Computing Testbed

91 Prototype: SwinDeW-C Cloud Workflow System

92 New Progress
 Successfully deployed on the Amazon Cloud
 Eucalyptus as the cloud infrastructure platform

93 Call for Papers and Workshops
2012 International Conference on Cloud and Green Computing, Nov. 1-3, 2012, Xiangtan, Hunan, China
 http://kpnm.hnust.cn/confs/cgc2012/
Important Dates:
 Workshop Proposals: ongoing, as received
 Submission Deadline: June 30, 2012
 Author Notification: July 30, 2012
 Final Manuscript Due: August 10, 2012
 Registration Due: August 18, 2012

94 End - Q&A Thanks for your attention!

