Costs Institutional reluctance towards Cloud –“Cloud is expensive; we can do it cheaper” (or better/faster/more securely…) –Do we trust staff? –How much do we trust staff? Is there a padlock on the stationery cupboard? Do you have an institutional credit card in your pocket? (What’s your spending limit?) Is there a BYOD policy? –How do we monitor what they are doing? –How do we stop them doing it?
Costs What does Cloud actually cost? –It depends. Figure: cost of transferring 1TB of data out of a range of cloud providers (transfer cost only; it does not include the cost of running instances). Reservations, spot instances and other exotic pricing mechanisms only make comparisons more difficult. Cost analysis of cloud computing for research, Curtis+Cartwright & Surrey, report available at the EPSRC website
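Egress is typically priced in volume tiers, so comparing providers means applying each provider's tier schedule to the same volume. A minimal Python sketch, assuming a simple cumulative tier schedule; the tier boundaries and prices below are hypothetical placeholders, not any provider's actual rates, and the cost of running instances would be added separately.

```python
from typing import Optional, Sequence, Tuple

# A tier is (cumulative upper bound in GB, price per GB).
# An upper bound of None means "all remaining volume" and should be the last tier.
Tier = Tuple[Optional[float], float]


def egress_cost(gb: float, tiers: Sequence[Tier]) -> float:
    """Cost of transferring `gb` gigabytes out, charged tier by tier."""
    cost = 0.0
    lower = 0.0
    for upper, price_per_gb in tiers:
        if gb <= lower:
            break  # all volume has already been charged
        top = gb if upper is None else min(gb, upper)
        cost += (top - lower) * price_per_gb
        if upper is None:
            break
        lower = upper
    return cost


# Hypothetical schedule: first 1 GB free, up to 10 TB at 0.12/GB, beyond at 0.09/GB.
HYPOTHETICAL_TIERS = [(1.0, 0.0), (10240.0, 0.12), (None, 0.09)]
print(round(egress_cost(1024, HYPOTHETICAL_TIERS), 2))  # cost of moving 1 TB out
```

Passing each provider's own schedule to the same function keeps the comparison like-for-like, although reserved and spot pricing would need separate treatment.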
Costs What are institutional costs? –We still don’t know! –Hidden goodwill (academics, researchers, students), but what if that effort were geared more to the science instead? –Depreciation (balance sheet) –Depreciation (technology) –Utilization, alongside or against the desire to switch off –Opportunity cost (of research not done)? How do we encourage researchers to use Cloud rather than populate broom cupboards? –And research agencies / peer reviewers to consider flexibility over and above cost? Cost analysis of cloud computing for research, Curtis+Cartwright & Surrey, report available at the EPSRC website
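One way to make some of these institutional costs visible is to spread the purchase price over the hardware's useful life and divide the annual total by the core-hours actually used. A minimal sketch, assuming straight-line depreciation; the parameter names, including the optional allowance for otherwise hidden staff effort, are illustrative and not taken from the cited report.

```python
HOURS_PER_YEAR = 24 * 365


def annual_cost(capex: float, lifetime_years: float, annual_opex: float,
                hidden_staff_cost: float = 0.0) -> float:
    """Straight-line depreciation plus running costs plus any hidden staff effort."""
    return capex / lifetime_years + annual_opex + hidden_staff_cost


def cost_per_used_core_hour(capex: float, lifetime_years: float, annual_opex: float,
                            cores: int, utilization: float,
                            hidden_staff_cost: float = 0.0) -> float:
    """In-house cost per core-hour actually used; idle hours are still paid for."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    used_core_hours = cores * HOURS_PER_YEAR * utilization
    return annual_cost(capex, lifetime_years, annual_opex, hidden_staff_cost) / used_core_hours


# Illustrative figures only: halving utilization doubles the effective cost.
full = cost_per_used_core_hour(100_000, 4, 15_000, cores=256, utilization=1.0)
half = cost_per_used_core_hour(100_000, 4, 15_000, cores=256, utilization=0.5)
print(f"{full:.4f} vs {half:.4f} per used core-hour")
```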
Costs Researchers don’t care about such things? –Exposing costs can encourage efficiency but might inhibit activity. Utilization is a key factor in determining value –Which users, how much; how much waste? If scale frees researchers from infrastructural constraints, they may design larger, more demanding simulations and experiments –It is possible that increased use of cloud computing will lead to more and better science, but with an associated increase in cost. Various recommendations –Costs differ –but performance differences can help to explain costs
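Utilization also sets where owning and renting break even: a fixed in-house annual cost only pays off if enough of the capacity is used, whereas a cloud bill grows with use. A minimal sketch, assuming the cloud is charged purely per core-hour used and the in-house cost is fixed regardless of use; the prices in the example are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365


def break_even_utilization(in_house_annual_cost: float, cores: int,
                           cloud_price_per_core_hour: float) -> float:
    """Utilization above which owning `cores` is cheaper than renting the same core-hours.

    A result above 1.0 means renting is cheaper even at full utilization.
    """
    full_cloud_cost = cores * HOURS_PER_YEAR * cloud_price_per_core_hour
    return in_house_annual_cost / full_cloud_cost


def cheaper_option(in_house_annual_cost: float, cores: int,
                   cloud_price_per_core_hour: float, utilization: float) -> str:
    """Compare a given utilization against the break-even point."""
    threshold = break_even_utilization(in_house_annual_cost, cores, cloud_price_per_core_hour)
    return "in-house" if utilization > threshold else "cloud"


# Hypothetical: 40,000 per year for 256 cores versus 0.05 per cloud core-hour.
print(f"break-even utilization: {break_even_utilization(40_000, 256, 0.05):.2f}")
```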
Benchmarks An alternative question: is it value for money? –Tendency to throw HPC loads at the Cloud (e.g. the Magellan report) Figure: MPPTEST benchmark, MPI bandwidth on current EC2 clusters (January 2012) compared to Walker’s results from 2008. Note that EC2 performance shows improvement above 2.0E+08. (PG = Placement Group)
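Bandwidth figures of this kind come from ping-pong exchanges: a message of n bytes travels to a partner and back, and half the round-trip time is taken as the one-way time. A minimal sketch of that conversion from raw timings (not MPPTEST's own code); taking the median per message size is our choice, and the sample timings are made up for illustration.

```python
from statistics import median
from typing import Dict, List


def one_way_bandwidth(message_bytes: int, round_trip_seconds: float) -> float:
    """Bytes per second, taking half the ping-pong round trip as the one-way time."""
    return message_bytes / (round_trip_seconds / 2.0)


def bandwidth_curve(timings: Dict[int, List[float]]) -> Dict[int, float]:
    """Median-based bandwidth per message size; the median damps occasional outliers."""
    return {size: one_way_bandwidth(size, median(rtts))
            for size, rtts in sorted(timings.items())}


# Made-up round-trip times (seconds) for two message sizes.
sample = {1_000: [0.0011, 0.0010, 0.0012], 1_000_000: [0.021, 0.020, 0.022]}
for size, bw in bandwidth_curve(sample).items():
    print(f"{size:>9} B: {bw:.3g} B/s")
```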
Benchmarks An alternative question: is it value for money? –“Utility supercomputing”: not yet equivalent, but if you don’t have handy access to a supercomputer… –Cost of queue? Figure: MPPTEST benchmark, MPI latency on EC2 clusters (January 2012) compared to Walker’s results from 2008. Note that EC2 latency has also reduced, towards 50 μs.
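Latency and bandwidth can be summarised together with the common latency-bandwidth (Hockney) model, T(n) = α + n/β, where α is the zero-byte latency and β the asymptotic bandwidth. A minimal sketch fitting that model to (message size, one-way time) pairs by ordinary least squares; this is a standard modelling choice, not necessarily how the figures above were produced, and the data below are synthetic.

```python
from typing import Sequence, Tuple


def fit_hockney(sizes: Sequence[float], times: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of T(n) = alpha + n / beta; returns (alpha in s, beta in bytes/s)."""
    if len(sizes) != len(times) or len(sizes) < 2:
        raise ValueError("need at least two (size, time) pairs")
    k = len(sizes)
    mean_n = sum(sizes) / k
    mean_t = sum(times) / k
    sxx = sum((n - mean_n) ** 2 for n in sizes)
    if sxx == 0:
        raise ValueError("message sizes must not all be equal")
    sxy = sum((n - mean_n) * (t - mean_t) for n, t in zip(sizes, times))
    slope = sxy / sxx               # seconds per byte, i.e. 1 / beta
    alpha = mean_t - slope * mean_n  # intercept: time for a zero-byte message
    return alpha, 1.0 / slope


# Synthetic check: 50 microsecond latency and 1e8 bytes/s bandwidth.
sizes = [0, 1_000, 100_000, 1_000_000]
times = [50e-6 + n / 1e8 for n in sizes]
alpha, beta = fit_hockney(sizes, times)
print(f"latency ~ {alpha * 1e6:.1f} us, bandwidth ~ {beta:.3g} B/s")
```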
Benchmarks We view benchmarks as a means to an end. –Is it “good enough”? (It can be better than nothing.) –Performance of applications is predicated on performance of the underlying (virtualized) resources –We know what the label on the box says, but what kind of present do we get? –Quality of Service (QoS) for Service Level Agreements (SLAs) –SLAs for Brokers (provider agreements may remain unchanged) –Various practical matters will not be addressed here.
Benchmarks We view benchmarks as a means to an end. –Tests on: AWS (several regions); Rackspace (UK, US); IBM SmartCloud (several data centres); a private Cloud (OpenStack) at Surrey. Many different machine types and 2 Linux distributions; more than 4000 runs, typically 10 repetitions each. –Using: Bonnie++ and IOzone for disk; LINPACK for CPU flops; STREAM for memory bandwidth; iPerf for network bandwidth; MPPTEST for MPI (see previous); a bzip2 application benchmark. –We want “simplicity so the results are understandable”, following Gray. We don’t try to optimize performance in any way: what is the present like when we simply take it “out of the box” and inspect it? Gray, J. (ed.) (1993), “The Benchmark Handbook for Database and Transaction Processing Systems”. Morgan Kaufmann.
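The out-of-the-box approach amounts to a simple harness: run each benchmark a fixed number of times without tuning and keep every timing, so that variability is reported rather than averaged away. A minimal sketch, assuming each benchmark can be wrapped as a Python callable (for example a hypothetical `lambda: subprocess.run([...], check=True)` around one of the tools above); the default of 10 repetitions mirrors the slide.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_runs(benchmark: Callable[[], object], runs: int = 10) -> List[float]:
    """Wall-clock seconds for each of `runs` untuned executions."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        benchmark()
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings: List[float]) -> Dict[str, float]:
    """Report spread as well as centre: min, median, max and coefficient of variation."""
    mean = statistics.mean(timings)
    stdev = statistics.stdev(timings) if len(timings) > 1 else 0.0
    return {
        "min": min(timings),
        "median": statistics.median(timings),
        "max": max(timings),
        "cv": stdev / mean if mean > 0 else 0.0,
    }


print(summarize(time_runs(lambda: sum(range(100_000)))))
```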
Benchmarks We get lots of files of numbers. Figure: CloudHarmony STREAM (copy) results with test dates between April 2010 and March 2011 [accessed 30 January 2012]. What we saw… An artefact of the problem size used?
Instance | RAM (MB) | Virtual CPUs (#) | Instance storage (GB) | OS, architecture (bits) | Price (£ per hour)
IBM (Copper) | 4096 | 2 (Intel-based) | 60 | RHEL6, 64 | 0.206
Rackspace (5) | 4096 | Not stated (AMD-based) | 160 | Ubuntu 10.04, 64 | 0.16
OpenStack (m1.medium) | 4096 | 2 | 10 | Ubuntu 10.04, 64 | --
AWS (m1.large) | 7680 (7.5GB) | 4 | 850 | Ubuntu 10.04 / RHEL6, 64 | 0.34
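When comparing STREAM Copy figures it helps to recall how they are computed and how large the arrays must be. STREAM counts Copy (c[i] = a[i]) as moving two 8-byte doubles per element, and its guidance, as we understand it, is that each array should be at least four times the combined last-level cache size and no fewer than a million elements, so that caches do not inflate the result. A minimal Python sketch of both calculations; on a VM the guest may not see the host's true cache size, which is one way an undersized problem could slip through. The cache size in the example is hypothetical.

```python
BYTES_PER_DOUBLE = 8


def stream_copy_bandwidth(n_elements: int, seconds: float) -> float:
    """Bytes/s as STREAM counts Copy: one read and one write of a double per element."""
    return 2 * BYTES_PER_DOUBLE * n_elements / seconds


def min_stream_elements(last_level_cache_bytes: int) -> int:
    """Smallest array length meeting the 4x-cache and one-million-element guidance."""
    return max(4 * last_level_cache_bytes // BYTES_PER_DOUBLE, 1_000_000)


def problem_size_ok(n_elements: int, last_level_cache_bytes: int) -> bool:
    """True if an array of `n_elements` doubles is large enough to defeat the caches."""
    return n_elements >= min_stream_elements(last_level_cache_bytes)


# Hypothetical 12 MiB last-level cache and an array of 2 million elements.
cache = 12 * 1024 * 1024
print(min_stream_elements(cache), problem_size_ok(2_000_000, cache))
```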
Benchmarks What leads to variance in the (private) provider? –We created “noisy neighbours” (STREAM in parallel VMs) –Various behaviours
Benchmarks Performance variability over time? Figure: three AWS instances (m1.small, ami-6936fb00, US-East). Method: tar the root file system, compress the tarball with bzip2, delete the .bz2 file and repeat.
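The tar-then-bzip2 loop is easy to reproduce. A minimal Python sketch using the standard `tarfile`, `bz2` and `shutil` modules in place of the command-line tools; the function names are our own, and it archives a directory of your choosing, since archiving the real root file system needs appropriate privileges and a work directory outside the archived tree.

```python
import bz2
import shutil
import tarfile
import tempfile
import time
from pathlib import Path
from typing import List


def tar_bzip2_once(source: Path, workdir: Path) -> float:
    """Tar `source`, bzip2-compress the tarball, then delete both; return seconds taken."""
    tar_path = workdir / "snapshot.tar"
    bz2_path = workdir / "snapshot.tar.bz2"
    start = time.perf_counter()
    with tarfile.open(str(tar_path), "w") as tar:
        tar.add(str(source), arcname=".")
    with open(tar_path, "rb") as raw, bz2.open(str(bz2_path), "wb") as packed:
        shutil.copyfileobj(raw, packed)
    elapsed = time.perf_counter() - start
    # Deletion is excluded from the timing; it resets state for the next iteration.
    tar_path.unlink()
    bz2_path.unlink()
    return elapsed


def tar_bzip2_series(source: Path, workdir: Path, iterations: int) -> List[float]:
    """Repeat the cycle so drift in performance over time becomes visible."""
    return [tar_bzip2_once(source, workdir) for _ in range(iterations)]


def demo(iterations: int = 3) -> List[float]:
    """Run the loop on a small throwaway directory instead of the root file system."""
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as work:
        (Path(src) / "sample.txt").write_text("benchmark " * 10_000)
        return tar_bzip2_series(Path(src), Path(work), iterations)


print(demo())
```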
Benchmarks Network –connectivity within the provider (typically within a region); –connectivity in relation to any co-located instances, as might be relevant to HPC-type activities (MPPTEST); –connectivity amongst providers and regions, such that data might be migrated to benefit from cost differences –We’re about to try dealing with a 5TB 25TB dataset
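Whether migrating data between providers is worthwhile depends on how long the move takes and whether the savings outweigh the one-off transfer cost. A minimal sketch, assuming decimal terabytes, a sustained link rate, and an `efficiency` factor for protocol overhead; the savings and cost figures are placeholders for the caller to supply.

```python
def transfer_hours(size_tb: float, link_gbit_per_s: float, efficiency: float = 1.0) -> float:
    """Hours to move `size_tb` decimal terabytes at a given fraction of the line rate."""
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_gbit_per_s * 1e9 * efficiency)
    return seconds / 3600


def migration_pays_off(monthly_saving: float, months: int, transfer_cost: float) -> bool:
    """Migrate only if cumulative savings at the destination exceed the transfer cost."""
    return monthly_saving * months > transfer_cost


for size in (5, 25):
    print(f"{size} TB at 1 Gbit/s: {transfer_hours(size, 1.0):.1f} h")
```

At a sustained 1 Gbit/s, 5 TB takes about 11 hours and 25 TB about 56 hours, before any protocol overhead.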
Benchmarks Built with d3; http://vis.stanford.edu/papers/d3 –Dynamic visualization: generate multiple charts per page by selecting subsets of available values. –Scaled to best result per benchmark. –Annotation per boxplot. –Download values per boxplot as simple CSV. –Available very soon! (But not on IE8 and other early browsers)
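The data preparation behind such charts can be sketched independently of d3: scale every result to the best result for that benchmark, then emit one five-number summary per instance type as CSV. A minimal Python sketch; the function names and CSV columns are our own, and treating lower-is-better metrics (such as latency) by inverting the ratio is an assumption about how "best" should be scaled.

```python
import csv
import io
import statistics
from typing import Dict, List


def scale_to_best(samples: Dict[str, List[float]],
                  higher_is_better: bool = True) -> Dict[str, List[float]]:
    """Express each result relative to the single best result for the benchmark (best = 1.0)."""
    values = [v for runs in samples.values() for v in runs]
    best = max(values) if higher_is_better else min(values)
    if higher_is_better:
        return {name: [v / best for v in runs] for name, runs in samples.items()}
    return {name: [best / v for v in runs] for name, runs in samples.items()}


def boxplot_csv(scaled: Dict[str, List[float]]) -> str:
    """One CSV row per instance type: min, quartiles and max of its scaled results."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["instance", "min", "q1", "median", "q3", "max"])
    for name, runs in sorted(scaled.items()):
        q1, med, q3 = statistics.quantiles(runs, n=4, method="inclusive")
        writer.writerow([name, min(runs), q1, med, q3, max(runs)])
    return out.getvalue()


print(boxplot_csv(scale_to_best({"a": [10.0, 20.0], "b": [5.0, 5.0]})))
```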
Benchmarks and Brokers Means to an end? –Cost is fixed, but performance is variable, and we need workload estimations –AWS CloudWatch allows users to set alarms for various metrics such as CPUUtilization (as a percentage), DiskReadBytes, DiskWriteBytes, NetworkIn and NetworkOut, amongst others. Benchmarks are closely related to this set of metrics. But unless an AutoScaling policy has been created, alarms will only be sent by email. –QoS parameters in SLAs Introduce Cloud Brokers: quality as differentiator. WS-Agreement has a notion of QoS parameters in SLAs; the principal example is a Key Performance Indicator (KPI) Target (wsag:KPITarget) as a Service Level Objective (wsag:ServiceLevelObjective), relating to Response Time (wsag:KPIName). Examples elsewhere use Availability and a threshold (e.g. gte 98.5, to indicate greater than or equal to 98.5%). We have to distinguish, inter alia, capability versus throughput, and the probability of, and penalty for, failure. Lee Gillam, Bin Li, John O'Loughlin (2012) "Adding Cloud Performance to Service Level Agreements". 2nd International Conference on Cloud Computing and Services Science, CLOSER 2012. Accepted.
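A broker could use benchmark history to estimate how often a provider would miss a KPI target such as availability "gte 98.5", and hence the expected penalty. A minimal sketch; the threshold keywords mirror the example above, but the parsing, the function names and the treatment of each benchmark sample as one independent trial are our assumptions, not part of the WS-Agreement specification or the cited paper.

```python
import operator
from typing import Callable, Dict, Sequence, Tuple

# Comparison keywords in the style of "gte 98.5" (hypothetical mini-syntax).
_OPERATORS: Dict[str, Callable[[float, float], bool]] = {
    "gte": operator.ge,
    "gt": operator.gt,
    "lte": operator.le,
    "lt": operator.lt,
}


def parse_threshold(expr: str) -> Tuple[Callable[[float, float], bool], float]:
    """Turn e.g. 'gte 98.5' into (operator.ge, 98.5)."""
    keyword, value = expr.split()
    return _OPERATORS[keyword], float(value)


def violation_probability(samples: Sequence[float], expr: str) -> float:
    """Fraction of benchmark samples that would breach the threshold."""
    meets, limit = parse_threshold(expr)
    failures = sum(1 for s in samples if not meets(s, limit))
    return failures / len(samples)


def expected_penalty(samples: Sequence[float], expr: str, penalty: float) -> float:
    """Empirical breach probability multiplied by the agreed penalty per breach."""
    return violation_probability(samples, expr) * penalty


availability = [99.0, 98.0, 99.5, 97.0]  # made-up measurements, in percent
print(violation_probability(availability, "gte 98.5"))
```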
Costs, Benchmarks and Brokers Figure: OGF Agreement Monitoring (source: WSAG4J, Agreement monitoring). Lee Gillam, Bin Li, John O'Loughlin (2012) "Adding Cloud Performance to Service Level Agreements". 2nd International Conference on Cloud Computing and Services Science, CLOSER 2012. Accepted.
Financial CDO Brokers An inspiration: Collateralized Debt Obligations (CDOs) Underlying assets: CDS, whose spread indicates the level of risk ($50+ trillion in CDS protection in 2007… then…) Potential for default; default correlations are important; lower-order tranches take losses first. For us, the price of the underlying may be fixed but performance is variable, so we model “bang for buck” instead. But we want to handle the situation where the “buck goes bang”. The further notion of recovery rates is interesting here. Li, B., Gillam, L., and O'Loughlin, J. (2010) Towards Application-Specific Service Level Agreements: Experiments in Clouds and Grids, In Antonopoulos and Gillam (Eds.), Cloud Computing: Principles, Systems and Applications. Springer-Verlag. Li, B., and Gillam, L. (2009), Towards Job-specific Service Level Agreements in the Cloud, Cloud-based Services and Applications, in 5th IEEE e-Science International Conference, Oxford, UK.
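The tranche mechanics can be sketched concretely: losses on the underlying pool, net of recoveries, are absorbed by tranches in order of seniority, each between its attachment and detachment points. A minimal sketch of that standard calculation; the tranche boundaries and recovery rate are illustrative, and reading "exposure" as an underperforming cloud resource is only our suggestion of how the analogy might map, not the cited papers' model.

```python
from typing import Dict, Sequence, Tuple


def portfolio_loss(exposures: Sequence[float], defaulted: Sequence[bool],
                   recovery_rate: float) -> float:
    """Fractional pool loss: defaulted exposure times (1 - recovery rate), over total exposure."""
    total = sum(exposures)
    lost = sum(e for e, d in zip(exposures, defaulted) if d)
    return lost * (1.0 - recovery_rate) / total


def tranche_losses(loss: float, tranches: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Fraction of each tranche wiped out; (attach, detach) are pool-loss fractions."""
    result = {}
    for name, (attach, detach) in tranches.items():
        absorbed = min(max(loss - attach, 0.0), detach - attach)
        result[name] = absorbed / (detach - attach)
    return result


# Illustrative structure: lower-order (equity) tranche takes losses first.
TRANCHES = {"equity": (0.0, 0.03), "mezzanine": (0.03, 0.07), "senior": (0.07, 1.0)}

# Ten equal exposures, one default, 40% recovery -> 6% pool loss.
loss = portfolio_loss([1.0] * 10, [True] + [False] * 9, recovery_rate=0.4)
print(tranche_losses(loss, TRANCHES))
```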
The Surrey Private Cloud Figure (layers): Data Centre; Hardware + Operating Systems; Systems Administration (“meatware”); Network; Services (Web, etc.); Users (academic, administrative, students); requests/responses for new ‘stuff’.
The Surrey Private Cloud OpenStack has been quite well tested by these benchmarks –We know it’s not a highly capable infrastructure, but at least it’s capable and we know how to size it usefully. –Nova, Swift and Glance are stable, but registering new images in the system has high latency, and the Keystone identity service is not yet worked out –63 MSc students over 3 years on a Cloud Computing module (AWS, Google App Engine, MapReduce, OpenStack), now running happily. Gillam, L., Li, B. and O’Loughlin, J. (2012). Teaching Clouds: Lessons Taught and Lessons Learnt. Forthcoming in Cloud Computing for Teaching and Learning: Strategies for Design and Implementation, IGI.
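Sizing a private cloud like this largely comes down to how many instances of a flavour each compute host can accept. A minimal sketch of that arithmetic in the spirit of Nova's RAM and CPU overcommit settings; the defaults used here (cpu_allocation_ratio 16.0, ram_allocation_ratio 1.5, 512 MB reserved host memory) are, to our understanding, Nova defaults of that era and should be checked against the release in use, and the host specification is hypothetical.

```python
import math


def instances_per_host(host_ram_mb: int, host_cores: int,
                       flavor_ram_mb: int, flavor_vcpus: int,
                       ram_allocation_ratio: float = 1.5,
                       cpu_allocation_ratio: float = 16.0,
                       reserved_host_memory_mb: int = 512) -> int:
    """Instances of one flavour a host can accept, limited by overcommitted RAM or cores."""
    usable_ram = host_ram_mb * ram_allocation_ratio - reserved_host_memory_mb
    usable_vcpus = host_cores * cpu_allocation_ratio
    by_ram = math.floor(usable_ram / flavor_ram_mb)
    by_cpu = math.floor(usable_vcpus / flavor_vcpus)
    return max(0, min(by_ram, by_cpu))


def hosts_needed(total_instances: int, per_host: int) -> int:
    """Compute hosts required to run `total_instances` concurrently."""
    if per_host <= 0:
        raise ValueError("flavour does not fit on this host")
    return math.ceil(total_instances / per_host)


# Hypothetical 32 GB, 8-core host and the m1.medium flavour above (4096 MB, 2 vCPUs).
print(instances_per_host(32768, 8, 4096, 2))
print(instances_per_host(32768, 8, 4096, 2, ram_allocation_ratio=1.0))
```

With no RAM overcommit the same host holds 7 rather than 11 such instances, which is usually the more relevant figure for memory-bound workloads like STREAM.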
And just one shameless plug An Open Access journal –Accepting publications and special issue proposals. –Launch imminent…. –Two special issues – Cloud Security and Intelligent & Autonomic Clouds at advanced stages.
Contact: L.Gillam@surrey.ac.uk Qs? The work presented has been supported in part by the EPSRC and JISC (EP/I034408/1) and by KTP 1739, and is in part collaborative with Curtis+Cartwright and CDO2 Ltd. We also gratefully acknowledge Amazon Web Services (AWS) for their research grant awards.