
Economics of Computations and Job-Specific Service Level Agreements Bin Li. and Dr. Lee Gillam. Department of Computing, FEPS.



2 Outline Part I
- Service Level Agreement
  - SLA definition
  - SLA type
    - Non-negotiable: AWS, GAE examples
    - Negotiable: job-specific, task-oriented
- Simple service brokerage use case
  - Aim: to build a job-specific comparison service for the computational market
- SLA frameworks
  - Proposed SLA structure: based on the WS-Agreement standard
- Service level characteristics
  - Availability, performance, autonomic, security
- Potentials in the computational market
  - Motivations and literature

3 Service Level Agreements (SLAs)
The level of service is formally defined between the service provider and the service consumer.
- Legal service contract: rights and liabilities.
  - Provider: reputation
  - Consumer: basis for trust
- What will the services deliver? How are the services used? Which provider should be chosen?
- Legal agreement document:
  - Services description
  - Requirements
  - Charges
  - Legal issues (rights and liabilities)
  - Penalty / compensation

4 SLAs cont.: SLA Type
- Non-negotiable: pre-defined, abstract and obscure, and in favor of the provider. General provider liabilities are documented to satisfy the most common consumer requirements, leaving the consumer the fewest rights. Common in Cloud services. Penalties involved: usage credits, or the consumer stops using the service.
- Negotiable: a legal document, long and boring, with many legal terms, and difficult to understand.

5 Google Apps (SLAs)
(Standard Edition agreement) … ANY USE (of the Apps service) THEREOF SHALL BE AT CUSTOMER'S OWN RISK. GOOGLE AND ITS LICENSORS MAKE NO WARRANTY OF ANY KIND … NON-INFRINGEMENT. GOOGLE ASSUMES NO RESPONSIBILITY FOR THE PROPER USE OF THE SERVICE. … GOOGLE MAKES NO REPRESENTATION THAT GOOGLE (OR ANY THIRD PARTY) WILL ISSUE UPDATES OR ENHANCEMENTS TO THE SERVICE. GOOGLE DOES NOT WARRANT THAT THE FUNCTIONS CONTAINED IN THE SERVICE WILL BE UNINTERRUPTED OR ERROR FREE.
(SLA) During the Term of the applicable Google Apps Agreement, the Google Apps Covered Services web interface will be operational and available to Customer at least 99.9% of the time in any calendar month (the "Google Apps SLA"). If Google does not meet the Google Apps SLA, and if Customer meets its obligations under this Google Apps SLA, Customer will be eligible to receive the Service Credits (not money back, but 3 to 15 days of additional service)... Customer must notify Google within thirty days from the time Customer becomes eligible to receive a Service Credit. Failure to comply with this requirement will forfeit Customer’s right to receive a Service Credit.

6 Amazon Web Services (SLAs)
(EC2) AWS will use commercially reasonable efforts to make Amazon EC2 available with an Annual Uptime Percentage (defined below) of at least 99.95% during the Service Year. In the event Amazon EC2 does not meet the Annual Uptime Percentage commitment, you will be eligible to receive a Service Credit.
(S3) AWS will use commercially reasonable efforts to make Amazon S3 available with a Monthly Uptime Percentage (defined below) of at least 99.9% during any monthly billing cycle (the “Service Commitment”). In the event Amazon S3 does not meet the Service Commitment, you will be eligible to receive a Service Credit (10% to 25% of your monthly bill).
“The test for commercially reasonable efforts is less stringent than that imposed by the ‘best efforts’ clauses contained in some agreements.” -- http://definitions.uslegal.com/c/commercially-reasonable-efforts/
To receive a Service Credit, you must submit a request (i) include your account number …; (ii) include … the dates and times of each incident of Region Unavailable that you claim to have experienced including instance ids of the instances that were running and affected during the time of each incident; (iii) include your server request logs …; (iv) … within thirty (30) business days …
99.95% availability ≈ 0.18 days/year of downtime ≈ 4.38 hours/year of downtime.
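The downtime figures on this slide come from simple arithmetic on the uptime percentage. A minimal Python sketch, assuming a 365-day year and a 30-day month (the SLAs define their own measurement periods and unavailability conditions); `allowed_downtime_hours` is a hypothetical helper name:

```python
def allowed_downtime_hours(availability_pct: float, period_days: float) -> float:
    """Maximum downtime in hours permitted by an uptime percentage over a period of days."""
    return (1.0 - availability_pct / 100.0) * period_days * 24.0


# EC2: annual commitment, assuming a 365-day year.
ec2_hours_per_year = allowed_downtime_hours(99.95, 365)
# S3: monthly commitment, assuming a 30-day billing month.
s3_minutes_per_month = allowed_downtime_hours(99.9, 30) * 60

print(f"EC2 99.95%: {ec2_hours_per_year / 24:.2f} days = {ec2_hours_per_year:.2f} hours per year")
print(f"S3 99.9%: {s3_minutes_per_month:.1f} minutes per month")
```

Under these assumptions, 99.95% allows about 0.18 days (4.38 hours) of downtime per year, and 99.9% about 43.2 minutes per month.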

7 SLAs cont.: Job-specific SLA
SLA type
- Negotiable: for end-users with critical data or application requirements, representing more flexible user requirements; job/application-specific and task-oriented. Handled manually, this is inefficient.
- Dynamic SLAs: job-/application-specific SLAs
  - Can be applied to both types of SLA
  - Describe the services for a particular submitted task
  - Server management: SLAs are created automatically and dynamically (autonomically) as user demand changes; a per-job SLA, unlike ITIL (a general, continual SLA)
- The concept and practice of SLAs bring the notion of risk management into the computational market
  - System performance monitoring: system availability, forecasting
  - Ensure QoS: act as a contract between providers and users, negotiated with brokers
  - Clarify the business nature and the parties’ obligations

8 Simple Service Brokerage Use Case

9 Use case cont.

10

11 Objective
Aim: provide the same kind of comparison service for compute resources (Cloud services).
- Goods: job-specific compute services
- Retailers: resource providers (Amazon, Rackspace, Microsoft, Google)
- Invoice: SLA
- Other factors: availability (risk or availability confidence), insurance, price, penalty, etc.
- Key: machine readable (automation, autonomic operation and efficiency)

12 SLA Frameworks & WS-Agreement Structure
[Figure: structure; XML example]
Two frameworks:
- Web Services Agreement (WS-Agreement, OGF): GRAAP working group; part of Service-Oriented Architecture (SOA); XML syntax; machine readable.
- Web Service Level Agreement (WSLA): IBM; Cloud Computing Use Case group.
WS-Agreement:
- SDTs (Service Description Terms): identify the work to be done: the required platform; the software involved; the set of expected arguments; input/output resources; etc.
- GTs (Guarantee Terms): provide assurance between provider and consumer on quality of service (QoS): the price of the service; the insurance price; the probability of failure; the penalty for failure; the starting time; the probability of completion; etc.
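To make the SDT/GT split concrete, here is a rough Python sketch that emits a job-specific SLA offer as XML with the standard library's `xml.etree.ElementTree`. It loosely borrows the WS-Agreement namespace and the element names `AgreementOffer`, `Terms`, `All`, `ServiceDescriptionTerm`, `GuaranteeTerm` and `ServiceLevelObjective`, but the layout is simplified and has not been validated against the schema; the helper names, child element names and all job and guarantee values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the OGF WS-Agreement schema. The layout below is a simplified sketch
# (unqualified attributes, plain-text objectives, hypothetical child element names)
# and has NOT been validated against the WS-Agreement XML schema.
WSAG = "http://schemas.ggf.org/graap/2007/03/ws-agreement"
ET.register_namespace("wsag", WSAG)


def q(tag: str) -> str:
    """Qualify a tag name with the WS-Agreement namespace."""
    return f"{{{WSAG}}}{tag}"


def job_sla_offer(job: dict, guarantees: dict) -> ET.Element:
    """Build a job-specific SLA offer: one SDT describing the job, one GT per guarantee."""
    offer = ET.Element(q("AgreementOffer"))
    ET.SubElement(offer, q("Name")).text = job["name"]
    all_terms = ET.SubElement(ET.SubElement(offer, q("Terms")), q("All"))
    # SDT: the work to be done (platform, software, arguments, input/output resources).
    sdt = ET.SubElement(all_terms, q("ServiceDescriptionTerm"),
                        Name="JobDescription", ServiceName=job["name"])
    for key in ("platform", "software", "arguments", "input", "output"):
        ET.SubElement(sdt, key).text = str(job[key])
    # GTs: one guarantee term per QoS or business value (price, penalty, ...).
    for name, value in guarantees.items():
        gt = ET.SubElement(all_terms, q("GuaranteeTerm"), Name=name)
        ET.SubElement(gt, q("ServiceLevelObjective")).text = str(value)
    return offer


# Hypothetical job and guarantee values, for illustration only.
offer = job_sla_offer(
    {"name": "var-mc-job", "platform": "Linux/x86_64", "software": "VaR Monte Carlo",
     "arguments": "--simulations 640000", "input": "portfolio.csv", "output": "var.txt"},
    {"Price": "0.51 USD", "InsurancePrice": "0.05 USD", "ProbabilityOfFailure": "0.01",
     "PenaltyForFailure": "0.51 USD", "StartTime": "2010-01-01T09:00:00Z",
     "ProbabilityOfCompletion": "0.99"},
)
print(ET.tostring(offer, encoding="unicode"))
```

Because the output is plain XML, a broker could compare offers from several providers term by term, which is the machine-readability point made on the Objective slide.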

13 Cloud Service Levels (Characteristics)
- Availability: how often the service can be accessed over a time horizon
  - Number of “nines”: S3 monthly 99.9% availability = an outage of 43.2 minutes/month
  - Who should define unavailability? EC2: “Unavailable” means that all of your running instances have no external connectivity during a five minute period and you are unable to launch replacement instances.
  - Job-specific: future resource availability
- Reliability: how well consumers trust the provider
  - Related to availability, but slightly different: the consumer's opinion
  - Combining cloud offerings: great power and flexibility, but less reliability
- Confidence level: how confident is the provider in its own availability “nines”?
  - Job-specific: probability of (job) completion
- Performance:
  - Throughput: how quickly the service responds
  - Load balancing: how overload is avoided
  - Elasticity: the ability to grow indefinitely, within limitations
  - Linearity: system performance as the workload increases
  - Agility: how quickly it responds when scaling up or down
  - Data durability: the likelihood of data loss; etc.
- Autonomic: monitoring, automation and dynamic behavior; machine readable
- Security: privacy, data encryption, legal issues

14
- S3: raising availability from 99.9% to 99.99% reduces permitted downtime from 43.2 minutes/month to 4.32 minutes/month.
- 3 providers, each 99% available and independent: an application implemented across all 3 (needing all of them) has availability 99%^3 ≈ 97%.
- 1 provider, 3 zones, each 99%: running the application in more zones adds “nines” to its availability (assuming the zones fail independently):
  - 1 zone: 99%
  - 2 zones: 99% + 99% × 1% = 99.99%
  - 3 zones: 99.9999%
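The availability arithmetic on this slide can be checked with a short Python sketch. It assumes, as the slide does, that providers and zones fail independently; `serial_availability` and `redundant_availability` are illustrative names:

```python
from math import prod


def serial_availability(avails):
    """All components must be up (application spread across providers): A = prod(A_i)."""
    return prod(avails)


def redundant_availability(avails):
    """At least one replica must be up (application replicated in zones): A = 1 - prod(1 - A_i)."""
    return 1.0 - prod(1.0 - a for a in avails)


print(f"3 providers in series: {serial_availability([0.99] * 3):.6f}")
for zones in (1, 2, 3):
    print(f"{zones} zone(s): {redundant_availability([0.99] * zones):.6f}")
```

The serial case gives about 97%, while each extra independent 99% zone adds two nines; correlated failures across zones would make the real figures worse.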

15 Potential computational market: Grid, Utility, Cloud… Computing

16 Potential computational market
Grid, Utility, Cloud… Computing; Computational Market
- The biggest structural change in IT since the 1960s.
- TechMarketView: by 2012, 15% of the UK software market will be delivered via the Cloud (22% are applications).

17 Potential computational market
[Diagram: Grid, Utility, Cloud… Computing; Computational Market; economics issues: Service Level Agreements (absent: pricing, liability, etc.) and Risk Assessment]

18 Potential computational market
[Diagram: Grid, Utility, Cloud… Computing; Computational Market; economics issues: Service Level Agreements (absent: pricing, liability, etc.) and Risk Assessment; Resource Monitoring and time series analysis. Analogy with the Financial Market: Financial Derivatives, derivatives risk analysis, Financial Risk Management measures; resource PoF (probability of failure) vs. firms' PoD (probability of default)]

19 Background and literature:
Financial Grids:
- Macleod G., Donachy P., Harmer T.J., Perrot R.H., Conlon B., Press J., Lungu F., “Implied Volatility Grid: Grid Based Integration to Provide On Demand Financial Risk Analysis”, Belfast e-Science Centre, Queen’s University of Belfast, 2005.
- Donachy P., Stødle D., “Risk Grid - Grid Based Integration of Real-Time Value-at-Risk (VaR) Services”, EPSRC UK e-Science All Hands Meeting, 2003.
- Germano G., Engel M., “City@home: Monte Carlo derivative pricing distributed on networked computers”, Grid Technology for Financial Modelling and Simulation, 2006.
- Schumacher J., Jaekel U., Zimmermann F., “Grid Services for Derivatives Pricing”, Grid Technology for Financial Modelling and Simulation, 2006.
Computational economics:
- Gray, J. (2003). Distributed Computing Economics. Microsoft Research Technical Report MSR-TR-2003-24 (also presented at the Microsoft VC Summit 2004, Silicon Valley, April 2004).
- Chetty, M. and Buyya, R. (2002). Weaving electrical and computational grids: How analogous are they? Computing in Science and Engineering, to appear, May/June 2002.
- Kenyon, C. and Cheliotis, G. (2002). Architecture requirements for commercializing grid resources. In 11th IEEE International Symposium on High Performance Distributed Computing (HPDC'02).
- Kenyon, C. and Cheliotis, G. (2003). Grid Resource Commercialization: Economic Engineering and Delivery Scenarios. Grid Resource Management: State of the Art and Research Issues.
- Voss, K., Djemame, K., Gourlay, I. and Padgett, J. (2007). AssessGrid, Economic Issues Underlying Risk Awareness in Grids. LNCS, Springer Berlin / Heidelberg.
- Birkenheuer, G., Hovestadt, M., Voss, K., Kao, O., Djemame, K., Gourlay, I., Padgett, J. (2006). Introducing Risk Management into the Grid. Proc. 2nd IEEE Intl. Conf. on e-Science and Grid Computing, Amsterdam, The Netherlands.

20 Comparison
| | Financial Market | Computational Market |
|---|---|---|
| Resources | Equities, commodities, currencies...; financial derivatives | Computers, workstations, network speed, clusters…; computational power |
| Capacity characteristic | Storable (stock) / non-storable (futures, forwards) | Non-storable |
| Analysis | Time series of underlying price changes | Time series of resource usage |
| Time horizon | Holding period (hourly, daily, weekly, yearly) | Hourly, daily, weekly, yearly |
| Portfolio | Many resources (assets) | Many computer resources |
| Confidence | Confidence level / percentile | Confidence of resource availability |
| Result | The expected worst loss | Optimize the resource use |
| Risk | Market losses | Resource portfolio probability of failure |
| Default | Company probability of default | Resource probability of failure |

21 Summary Part I
- Service Level Agreement
  - SLA definition
  - SLA type
    - Non-negotiable: AWS, GAE examples
    - Negotiable: job-specific, task-oriented
- Simple service brokerage use case
  - Aim: to build a job-specific comparison service for the computational market
- SLA frameworks
  - Proposed SLA structure: based on the WS-Agreement standard
- Service level characteristics
  - Availability, performance, autonomic, security
- Potentials in the computational market
  - Motivations and literature

22 Outline Part II
- Analogy: financial market
  - Financial market vs. computational market
  - Financial risk management, portfolio theory
    - Value-at-Risk (option-free portfolio)
    - Credit risk: CDS, CDO
    - Default probability (Moody’s KMV): asset market value and volatility; distance to default; probability of default
- Constructing job-specific SLAs
  - Building probability of failure
  - Building probability of completion
  - Building job-specific charges
  - Managing multiple job-specific SLAs (providers)
- Conclusion and future work

23 Thank you for your attention. Questions?

24 Computational Economics and Job-Specific Service Level Agreements. Bin Li and Dr. Lee Gillam. Department of Computing, FEPS

25 Objective
Aim: provide the same kind of comparison service for compute resources (Cloud services).
- Goods: job-specific compute services
- Retailers: resource providers (Amazon, Rackspace, Microsoft, Google)
- Invoice: SLA
- Other factors: availability (risk or availability confidence), insurance, price, penalty, etc.
- Key: machine readable (automation, autonomic operation and efficiency)

26 SLA Frameworks & WS-Agreement Structure
[Figure: structure; XML example]
Two frameworks:
- Web Services Agreement (WS-Agreement, OGF): GRAAP working group; part of Service-Oriented Architecture (SOA); XML syntax; machine readable.
- Web Service Level Agreement (WSLA): IBM; Cloud Computing Use Case group.
- SDTs (Service Description Terms): identify the work to be done: the required platform; the software involved; the set of expected arguments; input/output resources; etc.
- GTs (Guarantee Terms): provide assurance between provider and requester on quality of service (QoS): the price of the service; the insurance price; the probability of failure; the penalty for failure; the starting time; the probability of completion; etc.

27 Cloud Service Levels (Characteristics)
- Availability: how often the service can be accessed over a time horizon
  - Number of “nines”: 99.95% availability = an outage of 4.38 hours/year
  - Who should define unavailability? EC2: “Unavailable” means that all of your running instances have no external connectivity during a five minute period and you are unable to launch replacement instances.
  - Job-specific: future resource availability
- Reliability: how well consumers trust the provider
  - Related to availability, but slightly different: the consumer's opinion
  - Combining cloud offerings: great power and flexibility, but less reliability
- Confidence level: how confident is the provider in its own availability “nines”?
  - Job-specific: probability of (job) completion
- Performance:
  - Throughput: how quickly the service responds
  - Load balancing: how overload is avoided
  - Elasticity: the ability to grow indefinitely, within limitations
  - Linearity: system performance as the workload increases
  - Agility: how quickly it responds when scaling up or down
  - Data durability: the likelihood of data loss; etc.
- Autonomic: monitoring, automation and dynamic behavior; machine readable
- Security: privacy, data encryption, legal issues

28 Outline Part II
- Analogy: financial market
  - Financial market vs. computational market
  - Financial risk management, portfolio diversification
    - Value-at-Risk (option-free portfolio)
    - Credit risk: CDS, CDO
    - Credit rating: default probability (Moody’s KMV): asset market value and volatility; distance to default; probability of default
- Constructing job-specific SLAs
  - Building probability of failure
  - Building probability of completion
  - Building job-specific pricing
  - Managing multiple job-specific SLAs (providers)
- Conclusion

29 Potential computational market
[Diagram: Grid, Utility, Cloud… Computing; Computational Market; economics issues: Service Level Agreements (absent: pricing, liability, etc.) and Risk Assessment; Resource Monitoring and time series analysis. Analogy with the Financial Market: Financial Derivatives, derivatives risk analysis, Financial Risk Management measures; resource PoF (probability of failure) vs. firms' PoD (probability of default)]

30 Grid for Financial Risk Analysis
- Risk fact: risk is an integral part of the real world in general, and of the financial world in particular.
- Market: Grid infrastructures at Bank of America and HSBC: 3000 to 6000 processors.
- Computational services market: customers are willing to pay for the use of computer systems instead of purchasing and maintaining hardware and software. Grid / Cloud: HP, Amazon, Sun, IBM, etc.
- Financial risk management: monetary based (losses or profits). Risk can only be reduced (mitigated), never eliminated.
- Fundamental risk management theory: the portfolio (diversification), which ensures that a market event has a reduced impact on the whole portfolio.
  - Depends on the correlation or covariance of each asset's return with those of the other assets.
  - Diversified portfolio: the standard deviation of each asset; the correlation among assets.
- Useful analysis measurements (models): mean-variance; correlation; the sensitivities (the Greeks); Value-at-Risk.
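The diversification point (portfolio risk depends on each asset's standard deviation and the correlations among assets) is the mean-variance formula σ_p = sqrt(Σ_i Σ_j w_i w_j σ_i σ_j ρ_ij). A small Python sketch with hypothetical weights and volatilities; `portfolio_std` is an illustrative name:

```python
from math import sqrt


def portfolio_std(weights, stds, corr):
    """Portfolio standard deviation: sqrt(sum_i sum_j w_i w_j s_i s_j rho_ij)."""
    n = len(weights)
    variance = sum(
        weights[i] * weights[j] * stds[i] * stds[j] * corr[i][j]
        for i in range(n)
        for j in range(n)
    )
    return sqrt(max(variance, 0.0))  # guard against tiny negative round-off


# Two hypothetical assets, 50/50 weights, 20% volatility each.
for rho in (1.0, 0.0, -1.0):
    corr = [[1.0, rho], [rho, 1.0]]
    print(f"rho={rho:+.0f}: portfolio std = {portfolio_std([0.5, 0.5], [0.2, 0.2], corr):.4f}")
```

With perfectly correlated assets there is no diversification benefit (20%); uncorrelated assets give about 14.1%, and perfectly negatively correlated assets cancel out.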

31 Value-at-Risk (VaR)
As defined by Philippe Jorion, Value at Risk “summarizes the worst maximum potential loss in value of a portfolio of financial instruments over a certain target horizon with a given level of confidence”.
Three components: confidence level (quantile), holding period (time horizon), and monetary base.
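The three VaR components map directly onto function parameters: `confidence` (quantile), `horizon_days` (holding period) and `value` (monetary base). Below is a minimal Python sketch of two standard estimators, historical simulation and the variance-covariance (normal) method; the function names, the quantile convention and the sample returns are our own choices, not taken from the slides:

```python
from statistics import NormalDist, mean, stdev


def historical_var(returns, value, confidence=0.99):
    """Historical-simulation VaR: the loss at the (1 - confidence) empirical quantile.

    Uses the k-th smallest return (0-based) with k = floor((1 - confidence) * n),
    one of several common quantile conventions."""
    ordered = sorted(returns)
    k = int((1.0 - confidence) * len(ordered) + 1e-9)  # epsilon guards float round-off
    return -ordered[k] * value


def parametric_var(returns, value, confidence=0.99, horizon_days=1):
    """Variance-covariance VaR, assuming normal daily returns and square-root-of-time scaling."""
    mu, sigma = mean(returns), stdev(returns)
    z = NormalDist().inv_cdf(1.0 - confidence)  # about -2.326 at 99% confidence
    return -(mu * horizon_days + z * sigma * horizon_days ** 0.5) * value


# Hypothetical daily returns, for illustration only.
sample = [0.004, -0.012, 0.007, -0.003, 0.010, -0.021, 0.002, 0.005, -0.008, 0.001]
print(historical_var(sample, 1_000_000, 0.9), parametric_var(sample, 1_000_000, 0.99, 10))
```

Both functions report VaR as a positive loss in the same currency as `value`; the normal assumption and square-root-of-time scaling are simplifications that fat-tailed data can violate.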

32 Value-at-Risk (VaR)

33 Value-at-Risk (VaR) Methods Comparison: Monte Carlo simulation using Condor DAG
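A Monte Carlo VaR calculation splits naturally into independent batches, which is what makes it suitable for a Condor DAG. The sketch below only imitates that structure in one process: each `simulate_chunk` call stands in for one grid job, and the final merge stands in for the DAG's last node. It assumes a single instrument following geometric Brownian motion, a modeling choice of ours rather than one stated on the slides, and compares the result with the closed-form quantile under the same assumption; all parameters are hypothetical.

```python
import random
from math import exp, sqrt
from statistics import NormalDist


def simulate_chunk(seed, n, s0, mu, sigma, horizon):
    """One batch of simulated P&L values at the horizon (one grid job in a DAG)."""
    rng = random.Random(seed)  # a distinct seed per batch
    drift = (mu - 0.5 * sigma ** 2) * horizon
    vol = sigma * sqrt(horizon)
    return [s0 * exp(drift + vol * rng.gauss(0.0, 1.0)) - s0 for _ in range(n)]


def monte_carlo_var(n_total, n_chunks, s0, mu, sigma, horizon, confidence=0.99):
    """Merge all batches (the DAG's final step) and read the loss quantile."""
    pnl = []
    for chunk in range(n_chunks):
        pnl.extend(simulate_chunk(chunk, n_total // n_chunks, s0, mu, sigma, horizon))
    pnl.sort()
    return -pnl[int((1.0 - confidence) * len(pnl) + 1e-9)]


def analytic_var(s0, mu, sigma, horizon, confidence=0.99):
    """Closed-form VaR under the same lognormal assumption, for comparison."""
    z = NormalDist().inv_cdf(1.0 - confidence)
    return s0 - s0 * exp((mu - 0.5 * sigma ** 2) * horizon + sigma * sqrt(horizon) * z)


# Hypothetical instrument: price 100, 5% drift, 20% volatility, 10 trading days.
mc = monte_carlo_var(200_000, 8, 100.0, 0.05, 0.2, 10 / 252)
exact = analytic_var(100.0, 0.05, 0.2, 10 / 252)
print(f"10-day 99% VaR: Monte Carlo {mc:.3f}, closed form {exact:.3f}")
```

For a single lognormal instrument the closed form makes simulation unnecessary; Monte Carlo earns its cost (and the grid speedup) for portfolios or payoffs without such a formula.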

34 VaR Monte Carlo Simulation Evaluation
[Charts: single financial instrument, MSC speedup; option-free financial portfolio, MSC speedup]

35 Credit Risk
- The risk that a reference entity or obligor fails to meet its repayments in due time. Repayment: principal and debts.
- Credit risk = firm default risk.
- There are two main determinants of credit risk:
  - Loss Given Default (LGD);
  - Probability of Default (PD), i.e., the probability that the debtor does not pay (related to the distance to default), estimated with accounting-based models or market-based models (Moody’s KMV).
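The two determinants are usually combined in the textbook expected-loss formula EL = PD × LGD × EAD, where EAD (exposure at default) is a third factor not listed on the slide. A minimal Python sketch with hypothetical inputs:

```python
def expected_loss(pd: float, lgd: float, ead: float) -> float:
    """Expected credit loss = PD * LGD * EAD.

    pd: probability of default over the horizon; lgd: fraction of the exposure lost
    if default occurs; ead: exposure at default (a third factor added here)."""
    if not (0.0 <= pd <= 1.0 and 0.0 <= lgd <= 1.0):
        raise ValueError("pd and lgd must lie in [0, 1]")
    return pd * lgd * ead


# Hypothetical obligor: 2% PD, 45% LGD, exposure of 1,000,000.
print(f"{expected_loss(0.02, 0.45, 1_000_000):,.0f}")
```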

36 Moody’s KMV-Merton

37 The Market Value of Assets
- The market value of the business, reflecting the equity market’s expectations of future cash flows
- Not directly observable: implied from the market value of equity and the book liabilities using option pricing theory
- Reflects deterioration and improvement before book assets or earnings do
- Market values are dynamic and forward looking; they are the source of the model’s predictive power

38 The Default Point
- A measure of the liabilities due in the event the firm is in distress
- Non-cash and long-term obligations put less financial stress on the firm
- Firms often increase leverage as they deteriorate
- The default point captures the point where the typical firm defaults

39 Asset Volatility
- A measure of business risk: the uncertainty around the market value of the business
- Reflects the degree of difficulty in forecasting future cash flows
- Quantifies business risk: larger firms in the same industry tend to have lower volatility
- Computed by “de-levering” equity volatility
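“De-levering” can be illustrated with the Merton model, in which equity is a call option on the firm's assets. A common textbook approach solves the two Merton equations for asset value and asset volatility by fixed-point iteration. Moody's KMV uses its own proprietary procedure, so the Python sketch below (hypothetical firm data, illustrative function names) only shows the idea:

```python
from math import erf, exp, log, sqrt


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def delever(equity, equity_vol, debt, r, T, tol=1e-12, max_iter=1000):
    """Solve the Merton equations for asset value V and asset volatility sV:
        E      = V N(d1) - D exp(-r T) N(d2)
        sE * E = N(d1) * sV * V
    by simple fixed-point iteration, starting from V = E + D."""
    V = equity + debt
    sV = equity_vol * equity / V
    for _ in range(max_iter):
        d1 = (log(V / debt) + (r + 0.5 * sV ** 2) * T) / (sV * sqrt(T))
        d2 = d1 - sV * sqrt(T)
        V_new = (equity + debt * exp(-r * T) * norm_cdf(d2)) / norm_cdf(d1)
        sV_new = equity_vol * equity / (norm_cdf(d1) * V_new)
        if abs(V_new - V) <= tol * V and abs(sV_new - sV) <= tol:
            return V_new, sV_new
        V, sV = V_new, sV_new
    raise RuntimeError("fixed-point iteration did not converge")


# Hypothetical firm: equity 60, equity volatility 40%, debt 50 due in 1 year, r = 5%.
V, sV = delever(60.0, 0.40, 50.0, 0.05, 1.0)
print(f"asset value ~ {V:.2f}, asset volatility ~ {sV:.2%}")
```

Asset volatility comes out well below equity volatility because leverage amplifies asset moves in the equity price; de-levering reverses that amplification.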

40 Moody’s KMV-Merton Cont.

41

42 Moody’s KMV
[Figure: possible asset value path from the asset value today over a 1-year horizon; distribution of asset value at the horizon; default point; distance-to-default; asset volatility; EDF (expected default frequency)]
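The quantities in the figure can be sketched in Python as follows. Two assumptions are ours: the default point uses the commonly cited convention of short-term debt plus half of long-term debt, and the PD is the theoretical Merton value N(−DD). Moody's KMV instead maps distance-to-default to an empirical EDF, so these numbers are illustrative only, and the firm data are hypothetical:

```python
from math import erf, log, sqrt


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def default_point(short_term_debt: float, long_term_debt: float) -> float:
    """Assumed KMV-style default point: short-term debt plus half of long-term debt."""
    return short_term_debt + 0.5 * long_term_debt


def distance_to_default(V, sigma_V, dpt, mu=0.0, T=1.0):
    """Merton-style DD: (ln(V / DPT) + (mu - sigma_V^2 / 2) T) / (sigma_V sqrt(T))."""
    return (log(V / dpt) + (mu - 0.5 * sigma_V ** 2) * T) / (sigma_V * sqrt(T))


def simple_distance_to_default(V, sigma_V, dpt):
    """Simplified form: (market value of assets - default point) / (market value * asset volatility)."""
    return (V - dpt) / (V * sigma_V)


def merton_pd(dd: float) -> float:
    """Model PD under normally distributed asset returns: N(-DD)."""
    return norm_cdf(-dd)


# Hypothetical firm: assets 100, asset volatility 25%, short-term debt 30, long-term debt 40.
dpt = default_point(30.0, 40.0)
dd = simple_distance_to_default(100.0, 0.25, dpt)
print(f"default point = {dpt}, DD = {dd:.2f}, model PD = {merton_pd(dd):.4%}")
```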

43 Some results of company PD

44 The Bridge
[Diagram: risk analysis of complex financial products and markets; service-based Financial Grids; computational economics of compute resources; risk-balanced portfolio. Link labels: develop, possible formulation, provide, construct]

45 The Bridge
Grid-based financial risk analysis applications (Financial Grids):
- place great demands on available resources;
- assume availability at any given time.
Aim:
- the ability to predict the predictability (risks of resource availability, based on the risks in the historical use portfolio).
Major impetus for the work:
- Uncertainty: availability of computation resources
- Predicting future resource availability: computation resource monitoring

46 Building probability of failure
Closest work: Voss et al.: a risk-aware Grid architecture. Voss, K., Djemame, K., Gourlay, I. and Padgett, J., “AssessGrid, Economic Issues Underlying Risk Awareness in Grids”, LNCS, Springer Berlin / Heidelberg, 2007.
Specific financial analysis for creating a computation economy over queuing-based systems, treating the computation economy as a commodity market. Due considerations:
1. For trading and hedging of risk: options, futures and structured products.
2. Collecting data: historical computation resource use, to predict future resource use for such classes of applications.
3. Construction of portfolios of computer resources (an extension of financial models (CDOs) offers potential for a future market in computation economics), diversifying the risk (resource probability of failure) within the overall portfolio.

47 Predict Future Resource Availability
[Charts: CPU usage (real time, year data); CPU usage (changes, year data); CPU usage (changes, Monte Carlo simulated, normal)]
Grid resource historical usage analysis:
- Data source: UK’s National Grid Service (NGS)
- Monitoring system: Ganglia
- Grid middleware: Globus
- Data dimensions: 37 system metrics in XML, including network bandwidth use, temperature and CPU use
- Minimum capture interval: 15 seconds
- Measurements: distribution analysis; skewness and kurtosis analysis
- Prediction: simulation under a normal distribution assumption; simulation under a Laplace distribution assumption
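The measurement and prediction steps listed on this slide (distribution, skewness and kurtosis analysis, then simulation under normal or Laplace assumptions) can be sketched in plain Python. The CPU series here is a synthetic random walk, not NGS data, and the function names are hypothetical; the Laplace scale is chosen so that both simulations match the sample standard deviation:

```python
import random
from statistics import fmean, pstdev


def changes(series):
    """First differences of a usage time series (e.g. CPU use sampled every 15 seconds)."""
    return [b - a for a, b in zip(series, series[1:])]


def _central_moment(xs, k):
    m = fmean(xs)
    return fmean([(x - m) ** k for x in xs])


def skewness(xs):
    return _central_moment(xs, 3) / _central_moment(xs, 2) ** 1.5


def excess_kurtosis(xs):
    """0 for a normal distribution, 3 for a Laplace distribution."""
    return _central_moment(xs, 4) / _central_moment(xs, 2) ** 2 - 3.0


def simulate_changes(xs, n, dist="normal", seed=0):
    """Draw n future changes with the sample mean and standard deviation of xs."""
    rng = random.Random(seed)
    mu, sd = fmean(xs), pstdev(xs)
    if dist == "normal":
        return [rng.gauss(mu, sd) for _ in range(n)]
    if dist == "laplace":
        b = sd / 2 ** 0.5  # Laplace(0, b) has variance 2 b^2, so this matches sd
        # The difference of two independent exponentials with mean b is Laplace(0, b).
        return [mu + rng.expovariate(1 / b) - rng.expovariate(1 / b) for _ in range(n)]
    raise ValueError(f"unknown distribution: {dist}")


# Hypothetical CPU-use series (a random walk), standing in for monitoring data.
rng = random.Random(42)
cpu = [50.0]
for _ in range(5000):
    cpu.append(cpu[-1] + rng.gauss(0.0, 1.0))
d = changes(cpu)
print(f"skewness={skewness(d):.3f}, excess kurtosis={excess_kurtosis(d):.3f}")
laplace_draws = simulate_changes(d, 20000, "laplace")
normal_draws = simulate_changes(d, 20000, "normal")
```

A Laplace assumption puts more weight in the tails (excess kurtosis 3 versus 0), which matters when predicting rare, large jumps in resource use.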

48

49 Building job-specific charges
Price comparison service:
- Aim: a computation resource price benchmark.
- Amazon Web Services: a successful Cloud business model; computation resource costs in a real market.

50 Price benchmark
AWS Linux instances, monthly charge (prices obtained in Dec 2009). Large and Small: moderate I/O base instance price; VaR (Small): 434Mb Ubuntu image costs.
| Item | Large | Small | VaR (Small) |
|---|---|---|---|
| Amazon EC2 VM per instance-hour (or partial hour) | $0.44 | $0.11 | $0.11 |
| EC2 bandwidth in | $0.10 | $0.10 | $0.01 |
| EC2 bandwidth out | $0.17 | $0.17 | $0.01 |
| S3 outbound data transfer (per month) | $0.17 | $0.17 | $0.01 |
| Other | $0.30 | $0.30 | $0.30 |
| Taxes | 15% | 15% | 15% |
| Total cost (incl. VAT) | $1.36 | $0.98 | $0.51 |
Some real reliability figures: of 64 instances in 10 experiments, only 7 experiments completed (1 failing node in the other 3).
VaR (640,000 simulations):
| | AWS | Condor | Eucalyptus |
|---|---|---|---|
| Overall submission (seconds) | 90 | 95 | 228 |
| Cost ($) | 0.51 | 0.48 (90/95 × 0.51) | 0.20 (90/228 × 0.51) |
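The totals on this slide can be reproduced with a few lines of Python. The assignment of each charge to a column follows our reading of the slide's flattened table, which is consistent with its totals; the cost scaling by the ratio of submission times is the slide's own formula, while the names `charges`, `total_with_tax` and `time_scaled_cost` are illustrative:

```python
TAX_RATE = 0.15  # "TAXES 15%" on the slide

# Monthly charges in USD (Dec 2009), in the order:
# instance, bandwidth in, bandwidth out, S3 outbound, other.
charges = {
    "Large": [0.44, 0.10, 0.17, 0.17, 0.30],
    "Small": [0.11, 0.10, 0.17, 0.17, 0.30],
    "VaR (Small)": [0.11, 0.01, 0.01, 0.01, 0.30],
}


def total_with_tax(items, tax_rate=TAX_RATE):
    """Sum of the line items plus tax."""
    return sum(items) * (1.0 + tax_rate)


def time_scaled_cost(reference_cost, reference_seconds, seconds):
    """The slide's benchmark: reference cost multiplied by reference_time / other_time."""
    return reference_cost * reference_seconds / seconds


for name, items in charges.items():
    print(f"{name}: ${total_with_tax(items):.2f}")
print(f"Condor: ${time_scaled_cost(0.51, 90, 95):.2f}, "
      f"Eucalyptus: ${time_scaled_cost(0.51, 90, 228):.2f}")
```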

51 Building probability of completion Foster’s Hypothesis

52 [Chart: 106 s, 234 s, 76 s]

53 Performance
Is a Cloud better than a supercomputer?
- Grid/HPC: shorter application runtimes with narrower distributions.
- Cloud: longer application runtimes with wider distributions, but ready and relatively easy to use.
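One simple way to turn observed runtime distributions into a probability of completion is the empirical distribution function: the share of past runs that finished by a given deadline. The Python sketch below uses hypothetical runtime samples, chosen only to contrast a narrow (Grid/HPC-like) distribution with a wide (Cloud-like) one; it is not the method behind the slide's figures:

```python
from bisect import bisect_right


def completion_probability(runtimes, deadline):
    """Empirical probability that a job finishes within `deadline` seconds,
    estimated from previously observed runtimes of the same job on one resource."""
    ordered = sorted(runtimes)
    return bisect_right(ordered, deadline) / len(ordered)


# Hypothetical runtime samples (seconds).
hpc_runs = [100, 102, 105, 106, 110]
cloud_runs = [80, 120, 200, 260, 330]
for deadline in (110, 200):
    print(deadline, completion_probability(hpc_runs, deadline),
          completion_probability(cloud_runs, deadline))
```

A wider runtime distribution lowers the probability of completion at any tight deadline, which is the quantity a job-specific SLA would need to guarantee.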

54 Managing multiple SLAs
Financial CDO: a future commercialized computational market will have multiple providers (SLAs).
CDO components:
- Collateralized Debt Obligations (CDOs): a structured transaction
- Generic CDO: Special Purpose Vehicle (SPV); underlying assets; collateral management; tranche management
- Risk-identified chunks: tranches, ranked in the order in which they are secured to be paid (e.g., AAA, AA, BBB, BB and equity)
- Premium: basis points for each tranche
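The tranche ordering can be illustrated with a loss waterfall: losses hit the equity tranche first and reach the senior tranches only when everything below them is exhausted. A minimal Python sketch with a hypothetical capital structure; `allocate_losses` is an illustrative name:

```python
def allocate_losses(total_loss, tranches):
    """Allocate a portfolio loss bottom-up through a CDO capital structure.

    tranches: (name, size) pairs ordered from most senior to most junior.
    The most junior tranche (equity) absorbs losses first; senior tranches are hit last."""
    remaining = total_loss
    allocation = {}
    for name, size in reversed(tranches):
        hit = min(size, remaining)
        allocation[name] = hit
        remaining -= hit
    return allocation


# Hypothetical structure on a notional of 100.
structure = [("AAA", 70), ("AA", 10), ("BBB", 8), ("BB", 7), ("equity", 5)]
print(allocate_losses(12, structure))
```

In the resource analogy, the losses would be failed work and the senior tranches the most expensive, best-protected submissions.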

55 Constructing a Resource CDO
Processes:
- sort the resources in the system into different classes according to historical information;
- set different basis-point premiums to guarantee different levels of performance;
- the top class of resources should carry the highest premium, to ensure the greatest availability and performance.
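The three construction steps on this slide can be sketched as a simple classifier: sort resources by historical probability of failure, assign each to the most senior class it qualifies for, and attach a premium in basis points that decreases with seniority. The PoF values, class limits and premiums are all hypothetical, as is the function name:

```python
def build_resource_tranches(pof_by_resource, classes):
    """Sort resources by historical probability of failure (PoF) and place each in the
    most senior class whose PoF ceiling it meets.

    classes: (name, max_pof, premium_bps) tuples, ordered from most senior (lowest PoF,
    highest premium) to most junior; give the last class max_pof = 1.0 so every
    resource is placed."""
    tranches = {name: {"premium_bps": bps, "resources": []} for name, _, bps in classes}
    for resource, pof in sorted(pof_by_resource.items(), key=lambda item: item[1]):
        for name, max_pof, _ in classes:
            if pof <= max_pof:
                tranches[name]["resources"].append(resource)
                break
    return tranches


# Hypothetical PoF estimates (e.g. from monitoring history) and hypothetical class limits.
pof = {"node-a": 0.001, "node-b": 0.02, "node-c": 0.008, "node-d": 0.15}
classes = [("senior", 0.005, 300), ("mezzanine", 0.05, 150), ("equity", 1.0, 50)]
for name, tranche in build_resource_tranches(pof, classes).items():
    print(name, tranche["premium_bps"], tranche["resources"])
```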

56 Managing multiple SLAs (Autonomic SLAs)
- Dynamically alter themselves as the resource status changes.
- Strongly connected to the resource CDO, and therefore to the monitoring system.
- Also consider the situation where a job in a tranche fails: the more expensive, lower-risk submission is always guaranteed completion, protecting the processes in the more senior tranches and protecting the brokers.
- Multiple providers? Future Grid and Cloud computing will benefit.

57 Conclusion
Analogy:
- Financial price changes – computation resource usage changes
- Financial risk management – risk assessment in the computation market
- Financial derivatives – service level agreements
- Firm probability of default – machine probability of failure
Building a computation economy:
- Key: binding autonomic SLAs with risk analysis
- Aim: computation price comparison
- Measuring risk: predict the predictability (future resource availability)
- Risk mitigation: resource CDOs
Initial steps: predict future resource availability (probability of failure); build the probability of completion; build a job-specific service price benchmark; construct a resource CDO.

58 Future Work
Objectives and contributions:
- To produce a methodology for calculating and evaluating resource portfolio risk of failure.
  Done: VaR and Black-Scholes option model implementations, sensitivities, correlation and moments analysis.
  Further work: research into expected shortfall and related financial models; provide an algorithm for calculating resource portfolio risk of failure.
- To construct an algorithm that creates resource tranches (a resource CDO) on the fly.
  Done: general portfolio selection techniques, historical data collection from NGS, and simulation of future resource availability under normal and Laplace assumptions.
  Further work: understand the risk analysis of more complicated financial derivatives, obtain long-term Grid historical data from NGS, and finally create an algorithm for constructing a resource CDO.
- Automatic creation of SLAs.
  Further work: WS-Agreement standards in XML practice; an application for creating SLAs automatically.
- To adapt the resource portfolio risk of failure and the resource CDO to create autonomic SLAs, combining all of the objectives above.
- Extend our previous analysis as the Cloud expands.

59 Further references
- Li, B., Gillam, L., and O'Loughlin, J. (2010). Towards Application-Specific Service Level Agreements: Experiments in Clouds and Grids. In Antonopoulos and Gillam (Eds.), Cloud Computing: Principles, Systems and Applications. Springer-Verlag.
- Li, B. and Gillam, L. (2009). Grid Service Level Agreements using Financial Risk Analysis Techniques. In Antonopoulos, Exarchakos, Li and Liotta (Eds.), Handbook of Research on P2P and Grid Systems for Service-Oriented Computing: Models, Methodologies and Applications. IGI Global.
http://binlialfie.appspot.com/publications.html
Thank you for your attention. Questions?

