Infrastructure-as-a-Service Cloud Computing for Science April 2010 Salishan Conference Kate Keahey Nimbus project lead Argonne National Laboratory Computation Institute, University of Chicago
Cloud Computing for Science
- Environment control
- Resource control
Workspaces
- Dynamically provisioned environments
  - Environment control
  - Resource control
- Implementations
  - Via leasing hardware platforms: reimaging, configuration management, dynamic accounts…
  - Via virtualization: VM deployment
- Isolation
The Nimbus Workspace Service
[Diagram: the VWS service provisioning workspaces across a pool of nodes]
- The workspace service publishes information about each workspace
- Users can find out information about their workspace (e.g., which IP the workspace was bound to)
- Users can interact directly with their workspaces the same way they would with a physical machine
The Nimbus Toolkit: http://workspace.globus.org
Nimbus: Cloud Computing for Science
- Allow providers to build clouds
  - Workspace Service: a service providing EC2-like functionality
  - WSRF and WS (EC2) interfaces
- Allow users to use cloud computing
  - Do whatever it takes to enable scientists to use IaaS
  - Context Broker: turnkey virtual clusters
  - Also: protocol adapters, account managers, and scaling tools
- Allow developers to experiment with Nimbus
  - For research or usability/performance improvements
  - Open source, extensible software
  - Community extensions and contributions: UVIC (monitoring), IU (EBS, research), Technical University of Vienna (privacy, research)
- Nimbus:
Clouds for Science: a Personal Perspective
[Timeline graphic: "A Case for Grid Computing on VMs"; In-Vigo, VIOLIN, DVEs, dynamic accounts; policy-driven negotiation; Xen released; EC2 released; first Nimbus release; Science Clouds available; first STAR production run on EC2; Nimbus Context Broker release; Experimental Clouds for Science; OOI starts (2010)]
STAR Experiment
- STAR: a nuclear physics experiment at Brookhaven National Laboratory
- Studies fundamental properties of nuclear matter
- Problems:
  - Complexity
  - Consistency
  - Availability
Work by Jerome Lauret, Levente Hajdu, Lidia Didenko (BNL), Doug Olson (LBNL)
STAR Virtual Clusters
- Virtual resources
  - A virtual OSG STAR cluster: OSG headnode (gridmapfiles, host certificates, NFS, Torque); worker nodes: SL4 + STAR
  - One-click virtual cluster deployment via Nimbus Context Broker
- From Science Clouds to EC2 runs
- Running production codes since 2007
- The Quark Matter run: producing just-in-time results for a conference
Priceless?
- Compute costs: $5,
  - 300+ nodes over ~10 days
  - Instances: 32-bit, 1.7 GB memory
    - EC2 default: 1 EC2 CPU unit
    - High-CPU Medium instances: 5 EC2 CPU units (2 cores)
  - ~36,000 compute hours total
- Data transfer costs: $
  - Small I/O needs: moved <1 TB of data over the duration
- Storage costs: $4.69
  - Images only; all data transferred at run-time
- Producing the result before the deadline… $5,771.37
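As a rough sanity check on the slide's arithmetic: the per-hour rate below is an illustrative assumption, not a figure from the talk, but at 2010-era High-CPU Medium pricing of roughly $0.17/hour, ~36,000 compute hours lands in the same range as the quoted ~$5.7k total:

```python
# Back-of-the-envelope check of the STAR EC2 run cost.
# ASSUMPTION: the $0.17/hour on-demand rate for a High-CPU Medium
# instance is illustrative, not taken from the slides.

def compute_cost(total_hours: float, rate_per_hour: float) -> float:
    """Total on-demand cost for a batch of instance-hours."""
    return total_hours * rate_per_hour

total_hours = 36_000   # "~36,000 compute hours total" (from the slide)
rate = 0.17            # assumed $/hour

print(f"${compute_cost(total_hours, rate):,.2f}")  # ≈ $6,120.00, same order as the slide's total
```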
Cloud Bursting
- ALICE: elastically extend a grid
- Elastically extend a cluster
- React to emergency
- OOI: provide a highly available service
Hadoop in the Science Clouds
- Papers:
  - "CloudBLAST: Combining MapReduce and Virtualization on Distributed Resources for Bioinformatics Applications," A. Matsunaga, M. Tsugawa, and J. Fortes, eScience 2008
  - "Sky Computing," K. Keahey, A. Matsunaga, M. Tsugawa, and J. Fortes, IEEE Internet Computing, September 2009
[Diagram: Hadoop cloud spanning U of Florida, U of Chicago, and Purdue]
Genomics: Dramatic Growth in the Need for Processing
[Chart: sequencing output in GB for 454 and next-gen Solexa platforms]
From Folker Meyer, The M5 Platform:
- Moving from big science (at centers) to many players
- Democratization of sequencing
- We currently have more data than we can handle
Ocean Observatories Initiative CI: linking the marine infrastructure to science and users
Benefits and Concerns Now
- Benefits
  - Environment per user (or group)
  - On-demand access
  - We don't want to run datacenters!
  - Capital expense -> operational expense
  - Growth and cost management
- Concerns
  - Performance: cloud computing offerings are good, but they do not cater to scientific needs
  - Price and stability
  - Privacy
Performance (Hardware)
From Walker, ;login: 2008
- Challenges
  - Big I/O degradation
  - Small CPU degradation
  - Ethernet vs. InfiniBand: no OS-bypass drivers, no InfiniBand
- New development
  - OS-bypass drivers
Performance (Configuration)
From Santos et al., Xen Summit 2007
- Trade-off: CPU vs. I/O
- VMM configuration
  - Sharing between VMs
  - VMM latency
  - Performance instability
- Multi-core opportunity
- A price/performance trade-off
Ultimately a change in mindset: performance matters!
From Performance… to Price-Performance
- Instance definitions for science
  - I/O-oriented, well-described
- Co-location of instances
- To stack or not to stack
- Price point
  - CPU: on-demand, reserved, spot pricing
  - Data: high/low availability?
- Data access performance and availability
- Pricing
  - Finer- and coarser-grained
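The price-performance framing above can be made concrete with a small comparison: rank instance types by cost per compute unit rather than by raw hourly price. All prices and compute-unit ratings below are illustrative assumptions loosely modeled on 2010-era EC2 instance types, not figures from the talk:

```python
# Compare instance types by cost per compute-unit-hour rather than
# by sticker price. All numbers are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class InstanceType:
    name: str
    price_per_hour: float   # assumed on-demand $/hour
    compute_units: float    # EC2-style compute-unit rating

def cost_per_unit_hour(it: InstanceType) -> float:
    """Price normalized by delivered compute."""
    return it.price_per_hour / it.compute_units

default = InstanceType("default (1 compute unit)", 0.085, 1.0)
high_cpu = InstanceType("High-CPU Medium (5 compute units)", 0.17, 5.0)

for it in (default, high_cpu):
    print(f"{it.name}: ${cost_per_unit_hour(it):.3f} per unit-hour")
```

Despite the higher hourly price, the High-CPU type is cheaper per unit of compute delivered, which is exactly the kind of trade-off an I/O- or CPU-oriented "instance definition for science" would need to expose.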
Availability, Utilization, and Cost/Price
- Most of science today is done in batch
- The cost of on-demand
  - Overprovisioning or request failure?
- Clouds + HTC = a marriage made in heaven?
- Spot pricing
[Chart: example of WestGrid utilization, courtesy of Rob Simmonds]
Data in the Cloud
- Storage clouds and SANs
  - AWS Simple Storage Service (S3)
  - AWS Elastic Block Store (EBS)
- Challenges
  - Bandwidth performance and sharing
  - Sharing data between users
  - Sharing storage between instances
  - Availability
- Data privacy (the really hard issue)
  - Descher et al., "Retaining Data Control in Infrastructure Clouds," ARES (the International Dependability Conference), 2009
Cloud Markets
- Is computing fungible? Can it be fungible?
  - Diverse paradigms: IaaS, PaaS, SaaS, and other *aaS…
  - Interoperability
  - Comparison basis
- What if my IaaS provider doubles their prices tomorrow?
IaaS Cloud Interoperability
- Cloud standards
  - OCCI (OGF), OVF (DMTF), and many more…
  - Cloud-standards.org
- Cloud abstractions
  - Deltacloud, jclouds, libcloud, and many more…
- Appliances, not images
  - rBuilder, Bcfg2, CohesiveFT, Puppet, and many more…
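Abstraction libraries like Deltacloud, jclouds, and libcloud all follow the same basic idea: user code is written against a provider-neutral driver interface, so switching IaaS vendors means swapping the driver, not rewriting deployment logic. A minimal pure-Python sketch of that pattern follows; the class and method names are illustrative, not any real library's API:

```python
# Minimal sketch of the provider-abstraction pattern used by cloud
# abstraction libraries. All names here are illustrative only.
from abc import ABC, abstractmethod

class NodeDriver(ABC):
    """Provider-neutral interface: user code depends only on this."""
    @abstractmethod
    def create_node(self, image: str) -> str: ...
    @abstractmethod
    def list_nodes(self) -> list[str]: ...

class FakeEC2Driver(NodeDriver):
    """Stand-in for one provider's driver implementation."""
    def __init__(self) -> None:
        self._nodes: list[str] = []
    def create_node(self, image: str) -> str:
        node_id = f"i-{len(self._nodes):04d}"   # EC2-style instance id
        self._nodes.append(node_id)
        return node_id
    def list_nodes(self) -> list[str]:
        return list(self._nodes)

def deploy_cluster(driver: NodeDriver, image: str, count: int) -> list[str]:
    """Provider-agnostic deployment logic: works with any driver."""
    return [driver.create_node(image) for _ in range(count)]

nodes = deploy_cluster(FakeEC2Driver(), "star-sl4-worker", 3)
print(nodes)  # ['i-0000', 'i-0001', 'i-0002']
```

Standards like OCCI and OVF attack the same lock-in problem from the other direction, by normalizing the wire protocol and image format rather than the client library.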
Can you give us a better deal?
- In terms of…
  - Performance, deployment, data access, price
- Based on relevant scientific benchmarks
- Comprehensive, current, and public
[Chart: The Bitsource, CloudServers vs. EC2, cost by instance]
Science Cloud Ecosystem
- Goal: time to science in clouds -> zero
- Scientific appliances
- New tools
  - Turnkey clusters, cloud bursting, etc.
  - Open source important
  - Change in the mindset
- Education and adaptation
  - Profiles
  - New paradigm requires new approaches
  - Teach old dogs some new tricks
The Magellan Project
Joint project: ALCF and NERSC; funded by DOE/ASCR ARRA
[System diagram]
- 504 compute nodes: dual quad-core 2.66 GHz Nehalem, 24 GB RAM, 500 GB disk, QDR InfiniBand link
- Totals: 4,032 cores, 40 TF peak, 12 TB RAM, 250 TB disk
- QDR InfiniBand switch; aggregation switch/router
- Login nodes (4); gateway nodes (~20)
- File servers (8) for /home, 160 TB
- ESnet 10 Gb/s; ANI 100 Gb/s
Apply at:
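The per-node specs and the system totals on the slide are consistent with each other, as a quick check from the per-node figures shows:

```python
# Sanity-check the Magellan totals from the per-node specs on the slide.
nodes = 504
cores_per_node = 2 * 4          # dual quad-core Nehalem
ram_gb_per_node = 24
disk_gb_per_node = 500

total_cores = nodes * cores_per_node              # 4032, as quoted
total_ram_tb = nodes * ram_gb_per_node / 1000     # ~12.1 TB, quoted as 12 TB
total_disk_tb = nodes * disk_gb_per_node / 1000   # 252 TB, quoted as 250 TB
print(total_cores, round(total_ram_tb, 1), total_disk_tb)  # 4032 12.1 252.0
```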
The FutureGrid Project
- NSF-funded experimental testbed (incl. clouds)
- ~6,000 total cores connected by private network
Apply at:
Parting Thoughts
- Opinions split into extremes:
  - "Cloud computing is done!"
  - "Infrastructure for toy problems"
- The truth is in the middle
  - We know that IaaS represents a viable paradigm for a set of scientific applications…
  - …it will take more work to enlarge that set
  - Experimentation and open source are a catalyst
- Challenge: let's make it work for science!