
1 Clouds, Interoperation and PRAGMA. Philip M. Papadopoulos, Ph.D., University of California, San Diego / San Diego Supercomputer Center / Calit2

2 Remember the Grid Promise? "The Grid is an emerging infrastructure that will fundamentally change the way we think about - and use - computing. The word Grid is used by analogy with the electric power grid, which provides pervasive access to electricity and has had a dramatic impact on human capabilities and society." From the preface of the first edition of The Grid: Blueprint for a New Computing Infrastructure, Foster and Kesselman, Aug 1998.

3 Some Things that Happened on the Way to Cloud Computing: Web Version 1.0 (1995); first cluster on the Top 500 (June 1998); Dot Com Bust (2000); clusters > 50% of the Top 500 (June 2004); Web Version 2.0 (2004); Cloud Computing (EC2 beta, 2006); clusters > 80% of the Top 500 (Nov. 2008).

4 Gartner Emerging Tech 2005

5 Gartner Emerging Tech 2008

6 Gartner Emerging Tech 2010

7 What is fundamentally different about Cloud Computing vs. Grid Computing? Cloud computing: you adapt the infrastructure to your application, which should be less time consuming. Grid computing: you adapt your application to the infrastructure, which is generally more time consuming. Cloud computing has a financial model that seems to work; the grid never had a financial model. The grid "barter" economy was valid only for provider-to-provider trade; pure consumers had no bargaining power.

8 IaaS - Of Most Interest to PRAGMA. Run (virtual) computers to solve your problem, using your software. Providers include Sun, 3Tera, IBM, Amazon EC2, Rackspace, and GoGrid.

9 Cloud Hype “Others do all the hard work for you” “You never have to manage hardware again” “It’s always more efficient to outsource” “You can have a cluster in 8 clicks of the mouse” “It’s infinitely scalable” …

10 Amazon Web Services. Amazon EC2 was the catalytic event in 2006 that really started cloud computing. Web services access for compute (EC2), storage (S3, EBS), messaging (SQS), monitoring (CloudWatch), plus 20 (!) more services. "I thought this was supposed to be simple."

11 Basic EC2. Amazon Machine Images (AMIs) live in Amazon cloud storage: S3 (Simple Storage Service) and EBS (Elastic Block Store). AMIs are copied from S3 and booted in the Elastic Compute Cloud (EC2) to create a "running instance". When the instance is shut down, all changes are lost, unless you save it as a new AMI.
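As a concrete illustration of that copy-and-boot cycle, a minimal sketch using the classic EC2 API command-line tools; the AMI ID, instance ID, key pair, and instance type are placeholders, not values from the talk:

$ ec2-run-instances ami-12345678 -n 1 -t m1.large -k my-keypair    # boot one instance of an existing AMI
$ ec2-describe-instances                                           # watch it come up, find its public DNS name
$ ec2-terminate-instances i-12345678                               # shut it down; instance-store changes are discarded here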

12 Basic EC2 (continued). An AMI is copied from S3 to EC2 for booting. You can boot multiple copies of an AMI as a "group", but this is not a cluster: all running instances are independent. Cluster instances are about $2/hr (8 cores), roughly $17K/year. If you make changes to your AMI while it is running and want them saved, you must repack it into a new AMI, or use Elastic Block Store (EBS) on a per-instance basis.
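A minimal sketch of the per-instance EBS alternative mentioned above, again with the classic EC2 API tools (the size, availability zone, and IDs are placeholders):

$ ec2-create-volume --size 20 --availability-zone us-east-1a       # create a 20 GB volume in the instance's zone
$ ec2-attach-volume vol-12345678 -i i-12345678 -d /dev/sdf         # attach it; data on the volume survives instance shutdown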

13 Some Challenges in EC2: (1) defining the contents of your virtual machine (software stack); (2) preparing, packing, and uploading the image; (3) understanding limitations and the execution model; (4) debugging when something goes wrong; (5) remembering to turn off your VM: the smallest 64-bit VM is ~$250/month running 24x7.
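As a back-of-envelope check of that last figure (the hourly rate below is derived from the slide's monthly number, not quoted from the talk):

$250/month ÷ (24 h/day × ~30.4 days/month) ≈ $0.34 per billed hour of uptime.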

14 One Problem: too many choices

15 Reality for Scientific Applications. The complete software stack is critical to proper operation: libraries, compiler/interpreter versions, file system locations, kernel. This is the fundamental reason that the Grid is hard: my cluster is not the same environment as your cluster. Electrons are universal; software packages are not.

16 People and Science are Distributed. PRAGMA: the Pacific Rim Applications and Grid Middleware Assembly. Scientists are from different countries and the data is distributed; cyberinfrastructure enables the collaboration. When scientists are using the same software on the same data, the infrastructure is no longer in the way. It needs to be their software (not my software).

17 PRAGMA's Distributed Infrastructure (Grids/Clouds): 26 institutions in 17 countries/regions, 23 compute sites, 10 VM sites. Sites include UZH (Switzerland); NECTEC and KU (Thailand); UoHyd (India); MIMOS and USM (Malaysia); HKU (Hong Kong); ASGC and NCHC (Taiwan); HCMUT, HUT, IOIT-Hanoi, and IOIT-HCM (Vietnam); AIST, OsakaU, and UTsukuba (Japan); MU (Australia); KISTI and KMU (Korea); JLU, CNIC, and LZU (China); SDSC and IndianaU (USA); UChile (Chile); CeNAT-ITCR (Costa Rica); BESTGrid (New Zealand); ASTI (Philippines); UValle (Colombia).

18 Our Goals: enable specialized applications to run easily on distributed resources; investigate virtualization as a practical mechanism across multiple VM infrastructures (Xen, KVM, OpenNebula, Rocks, WebOS, EC2); use GeoGrid applications as a driver of the process.

19 GeoGrid Applications as Driver I am not part of GeoGrid, but PRAGMA members are!

20 Deploy Three Different Software Stacks on the PRAGMA Cloud. QuiQuake: simulates the ground motion map when an earthquake occurs; invoked when a big earthquake occurs. HotSpot: finds high-temperature areas from satellite data; runs on a daily basis (when ASTER data arrives from NASA). WMS server: provides satellite images via the WMS protocol; runs on a daily basis, but the number of requests is not stable. Source: Dr. Yoshio Tanaka, AIST, Japan.

21 Example of the current configuration: nodes are fixed to each application (WMS Server, QuiQuake, HotSpot). The allocation should be more adaptive and elastic according to the requirements. Source: Dr. Yoshio Tanaka, AIST, Japan.

22 1st step: adaptive resource allocation in a single system. When a big earthquake occurs or WMS requests increase, change the nodes assigned to each application (WMS Server, QuiQuake, HotSpot) according to the situation and requirements. Source: Dr. Yoshio Tanaka, AIST, Japan.

23 2nd step: extend to distributed environments, spanning satellite data sources (TDRS, Terra/ASTER, ALOS/PALSAR via NASA, ERSDAC, and JAXA) and sites at AIST, UCSD, OCC, and NCHC. Source: Dr. Yoshio Tanaka, AIST, Japan.

24 What are the Essential Steps? (1) AIST/GeoGrid creates their VM image; (2) the image is made available in "centralized" storage; (3) PRAGMA sites copy the GeoGrid images to local clouds, which means assigning IP addresses and handling cases such as a KVM image landing on a Xen site; (4) the modified images are booted; (5) the GeoGrid infrastructure is now ready to use.

25 Basic Operation. A VM image is authored locally and uploaded into a VM-image repository (Gfarm, from U. Tsukuba). At each local site the image is copied from the repository, the local copy is modified (automatically) to run on the specific infrastructure, and the local copy is booted. For running in EC2, the modify, bundle, and upload methods were automated in Rocks, applied after the local copy reaches UCSD.

26 VM Deployment Phase I - Manual (http://goc.pragma-grid.net/mediawiki-1.16.2/index.php/Bloss%2BGeoGrid). The GeoGrid + Bloss image is fetched from a website onto the VM development/hosting server, hand-edited, and then booted in a vm-container behind the Rocks frontend:
# rocks add host vm container=…
# rocks set host interface subnet …
# rocks set host interface ip …
# rocks list host interface …
# rocks list host vm … showdisks=yes
# cd /state/partition1/xen/disks
# wget http://www.apgrid.org/frontend...
# gunzip geobloss.hda.gz
# lomount -diskimage geobloss.hda -partition1 /media
# vi /media/boot/grub/grub.conf …
# vi /media/etc/sysconfig/network-scripts/ifc… …
# vi /media/etc/sysconfig/network …
# vi /media/etc/resolv.conf …
# vi /etc/hosts …
# vi /etc/auto.home …
# vi /media/root/.ssh/authorized_keys …
# umount /media
# rocks set host boot action=os …
# rocks start host vm geobloss…

27 What we learned in the manual approach. AIST, UCSD, and NCHC met in Taiwan for 1.5 days to test this in Feb 2011. It was much faster than a Grid deployment of the same infrastructure. It is not too difficult to modify a Xen image and run it under KVM. Nearly all of the steps could be automated. We need a better method than "put the image on a website" for sharing.

28 Centralized VM Image Repository: VM images (QuiQuake, GeoGrid + Bloss, Nyouga, Fmotif) are deposited and shared through Gfarm (a metadata server plus file servers, accessed by Gfarm clients), with the available images indexed in vmdb.txt.

29 Gfarm using Native tools
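The screenshot from this slide is not reproduced here; as an illustration, a minimal sketch of how images might be registered and retrieved with Gfarm's native command-line tools, assuming hypothetical repository paths and file names:

$ gfreg quiquake.img.gz /vm-images/AIST/quiquake.img.gz        # register a locally built image into the shared repository
$ gfls -l /vm-images/AIST                                      # browse what is available
$ gfexport /vm-images/AIST/quiquake.img.gz > quiquake.img.gz   # copy an image back out at a deployment site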

30 VM Deployment Phase II - Automated (http://goc.pragma-grid.net/mediawiki-1.16.2/index.php/VM_deployment_script). A single command on the frontend, e.g. "$ vm-deploy quiquake vm-container-0-2", pulls the requested image (Quiquake, Nyouga, Fmotif, …) out of the Gfarm cloud through a Gfarm client, using vmdb.txt entries such as "quiquake,xen-kvm,AIST/quiquake.img.gz,…" and "fmotif,kvm,NCHC/fmotif.hda.gz,…", and deploys it onto the chosen vm-container of the VM hosting server.
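The slide shows only the command and the vmdb.txt entries; as an illustration, a minimal sketch of what a vm-deploy-style wrapper could do, where the vmdb.txt field layout, staging paths, Gfarm path form, and final boot step are assumptions rather than the actual script:

#!/bin/bash
# vm-deploy <image-name> <vm-container>  -- hypothetical sketch, not the real script
set -e
IMAGE="$1"; CONTAINER="$2"
DISKDIR=/state/partition1/xen/disks                    # staging area, as in the manual phase
ENTRY=$(grep -i "^${IMAGE}," vmdb.txt)                 # look up: name, hypervisor conversion, Gfarm path, ...
GFARM_PATH=$(echo "$ENTRY" | cut -d, -f3)
gfexport "/${GFARM_PATH}" > "${DISKDIR}/${IMAGE}.img.gz"   # copy the compressed image out of Gfarm
gunzip -f "${DISKDIR}/${IMAGE}.img.gz"
# ...site-specific fix-up (network config, Xen-to-KVM tweaks) would happen here...
rocks start host vm "${IMAGE}"                         # boot the guest on $CONTAINER (guest/host naming assumed)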

31 Putting it all together: store VM images in Gfarm, run vm-deploy scripts at PRAGMA sites, copy VM images on demand from Gfarm, modify/start VM instances at the PRAGMA sites, and manage jobs with Condor. Images deployed from the GFARM Grid File System (Japan) include AIST HotSpot + Condor, AIST QuiQuake + Condor, AIST GeoGrid + Bloss, AIST Web Map Service + Condor, UCSD Autodock + Condor, and NCHC Fmotif. Participating sites are SDSC (USA, Rocks/Xen), NCHC (Taiwan, OpenNebula/KVM), LZU (China, Rocks/KVM), AIST (Japan, OpenNebula/Xen), IU (USA, Rocks/Xen), and Osaka (Japan, Rocks/Xen), each running Gfarm clients or servers and Condor slaves that report to a Condor master.

32 Moving more quickly with the PRAGMA Cloud. PRAGMA 21 (Oct 2011), 4 sites: AIST, NCHC, UCSD, and EC2 (North America). SC'11 (Nov 2011), new sites: Osaka University, Lanzhou University, Indiana University, CNIC, and EC2 (Asia Pacific).

33 Condor Pool + EC2 Web Interface: 4 different private clusters and 1 EC2 data center, controlled from the Condor Manager at AIST, Japan.

34 PRAGMA Compute Cloud: cloud sites integrated in the GeoGrid execution pool include UoHyd (India), MIMOS (Malaysia), NCHC (Taiwan), AIST and OsakaU (Japan), SDSC and IndianaU (USA), CNIC, LZU, and JLU (China), and ASTI (Philippines).

35 Roles of Each Site (PRAGMA + GeoGrid). AIST: application driver with a naturally distributed computing/people setup. NCHC: authoring of VMs in a familiar web environment, with significant diversity of VM infrastructure. UCSD: lower-level details of automating VM "fix-up" and rebundling for EC2. We are all founding members of PRAGMA.

36 NCHC WebOS/Cloud Authoring Portal: users start with a well-defined base image and then add their software.

37 Getting things working in EC2: a short background on Rocks clusters, mechanisms for using Rocks to create an EC2-compatible image, and adapting the methodology to support non-Rocks-defined images.

38 Rocks (http://www.rocksclusters.org): technology transfer of commodity clustering to application scientists. Rocks is a cluster/system configuration on a CD: clustering software (PBS, SGE, Ganglia, Condor, …) and highly programmatic software configuration management; put CDs in raw hardware, drink coffee, have cluster. Extensible using "Rolls". Large user community: over 1 PFlop of known clusters and an active user/support list of 2000+ users. Active development: 2 software releases per year, core development at SDSC (NSF award #OCI-0721623), and other developers (UCSD, Univ. of Tromso, external Rolls). Supports Red Hat Linux, Scientific Linux, CentOS, and Solaris. Can build real, virtual, and hybrid combinations (2 to 1000s of nodes).

39 Key Rocks Concepts. Define components of clusters as logical appliances (compute, web, management, login, DB, PFS metadata, PFS data, …): share common configuration among appliances; the full cluster software and configuration is a graph decomposition; Rolls are the building blocks, i.e. reusable components (package + config + subgraph). Use the installer's text format (Red Hat Anaconda, Solaris Jumpstart) to describe an appliance configuration, and walk the Rocks graph to compile this definition. Heterogeneous hardware (real and virtual) is handled with no additional effort.

40 Triton Resource (http://tritonresource.sdsc.edu): a mid-sized cluster resource including computing, database, storage, virtual cluster, login, and management appliances, connected to the Campus Research Network and UCSD research labs. Large Memory PSDAF: 256 GB and 512 GB nodes (32 cores), 8 TB total, 128 GB/sec, ~9 TF, x28. Shared Resource Cluster: 16 GB/node, 4-8 TB total, 256 GB/sec, ~20 TF, x256. Large Scale Storage (delivery by mid May): 2 PB (384 TB today), ~60 GB/sec (7 GB/s today), ~2600 disks (384 disks now).

41 What’s in YOUR cluster?

42 How Rocks Treats Virtual Hardware: it's just another piece of hardware; if Red Hat supports it, so does Rocks. This allows a mixture of real and virtual hardware in the same cluster, because Rocks supports heterogeneous hardware clusters, and all of the software configuration mechanics are re-used (e.g., a compute appliance is a compute appliance, regardless of "hardware"). Virtual hardware must meet minimum specs: 1 GB memory, 36 GB disk space (not strict; EC2 images are 10 GB), private-network Ethernet, plus a public network on the frontend.

43 Extended Condor Pool (very similar to AIST GeoGrid): a Rocks frontend and nodes 0..n on the cluster private network (e.g. 10.1.x.n) run identical system images, with a Condor collector/scheduler on the frontend accepting job submissions; the Condor pool contains both local and cloud resources (Cloud 0, Cloud 1).
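To make the "one pool, local plus cloud" picture concrete, a minimal sketch of the kind of condor_config.local a cloud-hosted execute node might carry so that it reports to the pool's central manager; the host names and security settings are illustrative assumptions, not the AIST or UCSD configuration:

# condor_config.local on a cloud-hosted execute node (illustrative values only)
CONDOR_HOST = condor-manager.example.org        # the pool's collector/negotiator
DAEMON_LIST = MASTER, STARTD                    # run only what an execute node needs
ALLOW_WRITE = $(CONDOR_HOST), *.example.org     # real deployments would use stronger authentication
UPDATE_COLLECTOR_WITH_TCP = True                # helps when the node sits behind NAT/firewalls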

44 Complete Recipe (local hardware to the Amazon EC2 cloud): on the Rocks frontend, kickstart a guest VM in a VM container with ec2_enable=true, producing a "compiled" VM image on disk storage; optionally test and rebuild the image; then bundle it as an S3 image, register the image as an EC2 AMI, and boot the AMI as an Amazon instance.

45 At the Command Line (provided by the Rocks EC2 and Xen Rolls):
1. rocks set host boot action=install compute-0-0
2. rocks set host attr compute-0-0 ec2_enable true
3. rocks start host vm compute-0-0 (after reboot, inspect, then shut down)
4. rocks create ec2 bundle compute-0-0
5. rocks upload ec2 bundle compute-0-0
6. ec2-register /image.manifest.xml
7. ec2-run-instances

46 Modified to support non-Rocks images for the PRAGMA experiment: "vm-deploy nyouga2 vm-container-0-20" copies the image out of Gfarm onto local hardware (a VM container behind the Rocks frontend); Makeec2.sh turns it into a "modified" VM image on disk storage; the image is then bundled as an S3 image, registered as an EC2 AMI, and booted as an Amazon instance in the EC2 cloud.
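The slide names the steps but not the commands; a minimal sketch of how the bundle-and-upload half might look with Amazon's stock AMI tools, assuming hypothetical file names, bucket, and credentials (the actual Makeec2.sh and Rocks automation are not shown in the talk):

$ ec2-bundle-image -i nyouga2.img -k pk.pem -c cert.pem -u 123456789012 -d /tmp/bundle -r x86_64
$ ec2-upload-bundle -b my-pragma-bucket -m /tmp/bundle/nyouga2.img.manifest.xml -a "$AWS_ACCESS_KEY" -s "$AWS_SECRET_KEY"
$ ec2-register my-pragma-bucket/nyouga2.img.manifest.xml      # returns an AMI ID
$ ec2-run-instances ami-12345678 -k my-keypair                # boot the registered AMI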

47 Observations: this is much faster than our Grid deployments. Integration of private and commercial clouds is at the proof-of-principle stage. We haven't scratched the surface of what happens when one expands into an external cloud. Networking among instances in different clouds has pitfalls (firewalls, addressing, etc.). Users can focus on the creation of their software stack.

48 Heterogeneous Clouds

49 More Information Online

50 Revisit Cloud Hype. "Others do some of the hard work for you." "You still have to manage hardware." "It's sometimes more efficient to outsource." "You can have a cluster in 8 clicks of the mouse, but it may not have your software." "It's infinitely scalable." Location of data is important. Interoperability across cloud infrastructures is possible.

51 Thank You! ppapadopoulos@ucsd.edu

