Cloud Versus In-house Cluster: Evaluating Amazon Cluster Compute Instances for Running MPI Applications
Yan Zhai, Mingliang Liu, Jidong Zhai, Xiaosong Ma, Wenguang Chen
Tsinghua University, NCSU & ORNL
HPC in cloud?
Is the cloud viable for HPC applications? Yes, mostly for loosely-coupled codes
Has the cloud won over the majority of HPC users? No
For tightly-coupled codes, performance is still the major concern
Lower performance -> higher cost
Amazon EC2 CCI
Emergence of high-performance cloud offerings like Amazon EC2 CCI (Cluster Compute Instances)
High-end computation hardware
Exclusive resource usage
Upgraded interconnect (10GbE network)
Has CCI changed the cloud HPC landscape?
Our work
Several months of evaluating EC2 CCI
Comprehensive performance and cost evaluations
Focused on tightly-coupled MPI programs
Micro-benchmarks, macro-benchmarks, and real-world applications
Exploring I/O configurability issues
Outline Background & Motivation Evaluation and observations Will HPC cloud save you money? Application performance results Wish list to cloud service providers Conclusion
Will HPC cloud save you money?
Cost: the driving factor for moving to the cloud
Cloud vs. in-house cluster: pay-as-you-go vs. fixed hardware investment
A workload-dependent decision:
Relative performance of individual applications
Mixture of applications
Expected utilization level of the in-house cluster
Runtime performance
Cloud and 16-node in-house cluster configuration:

                  Cloud                    Local
  CPU             Xeon X5570 (8 cores)     Xeon X5670 (12 cores)
  Memory          23 GB                    48 GB
  Network         10GbE                    QDR InfiniBand
  FS              NFS                      NFS
  OS              Amazon Linux AMI         RHEL 5.5
  Virtualization  Para-virtualization      None
Selected applications GRAPES [1] (weather simulation) CPU- and memory-intensive Moderate communication MPI-Blast [2] (biological sequence matching) Large input Relatively little communication POP [3] (ocean modeling) Communication-intensive Large number of small messages
GRAPES results [figure: execution time (s) vs. number of processes]
MPI-Blast results [figure: execution time (s) vs. number of processes]
POP results [figure: execution time (s) vs. number of processes]
Performance summary
The cloud offers performance close to the in-house cluster, for some applications
Communication is still a severe concern for communication-heavy apps
Major problem: high latency
Similar observations from benchmarking results [4]:
NPB class C and D
Intel MPI Benchmarks
STREAM memory benchmark
Coming back to cost
Per-job cost comparison for an application A:

  p × n × t_cloud   vs.   (C_local × t_local) / T_eff

where
  p:       cost of cloud per instance, $1.6/(hour·instance)
  n:       number of instances used per job
  t_cloud: time to finish one job of A in the cloud
  C_local: cost to buy and deploy the local cluster
  t_local: time to finish one job of A locally
  T_eff:   effective time used to run applications, T_eff = U × T_life
  U:       utilization rate of the local cluster
  T_life:  time period before the local cluster becomes out of date

The right side is the cost of one job of application A on the local side; if the right side is larger, the cloud is more cost-effective.
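The comparison above can be sketched numerically. The instance count, per-job times, cluster cost, and utilization below are hypothetical placeholders, not measurements from the evaluation:

```python
P_CLOUD = 1.6  # $ per instance-hour, from the slide

def cloud_cost_per_job(n_instances, t_cloud_hours, p=P_CLOUD):
    # Left side of the comparison: pay-as-you-go cost for one job.
    return p * n_instances * t_cloud_hours

def local_cost_per_job(c_local, t_local_hours, t_eff_hours):
    # Right side: amortize the full cluster cost over its effective hours.
    return c_local * t_local_hours / t_eff_hours

# Hypothetical numbers: 16 instances, 2 h per cloud job, 1.5 h per local
# job, a $200,000 cluster at 25% utilization over a 3-year life span.
cloud = cloud_cost_per_job(16, 2.0)
local = local_cost_per_job(200_000, 1.5, 0.25 * 3 * 365 * 24)
# The cloud wins when its per-job cost is below the local per-job cost.
```

With these placeholder numbers the local side comes out slightly cheaper per job; the point of the slide is that the answer flips as the assumed utilization drops.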
Parameters used in local cluster

  Expense item                          Amount
  Dell 5670 servers (incl. service)     $6,508/node
  InfiniBand NIC                        $612/node
  InfiniBand switch                     $6,891
  SAN with NFS server and RAID5         $36,753
  Hosting (energy included)             $15,251/rack/year
  Assumed life span                     3 years
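Totaling this table for the 16-node cluster from the configuration slide gives one plausible value for C_local; assuming one rack, one switch, one SAN, and the 3-year life span:

```python
NODES = 16
YEARS = 3

c_local = (
    NODES * (6508 + 612)   # servers + InfiniBand NICs, per node
    + 6891                 # InfiniBand switch
    + 36753                # SAN with NFS server and RAID5
    + YEARS * 15251        # hosting (energy included), per rack per year
)
print(c_local)  # 203317
```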
Utilization rate threshold for applications [figure: utilization rate threshold (%) by application]
If you use the local cluster more than about 25% of the time per year to run GRAPES, you'd better stay local.
Further considerations in cost
The calculation is biased toward the local cluster:
Assumes 24x7 availability over 3 years (no failures, maintenance, holidays, ...)
Labor cost not counted
The cloud provides continuous hardware upgrades
Yesterday, Amazon announced new CCI instances and lowered the price for the current configuration: $1.60 -> $1.30
Heavy HPC users may get further cloud discounts (reserved instances on AWS)
Reduced pricing effect [figure: utilization rate threshold (%)]
Reserved instance discount
Under a certain utilization rate, the cloud time required to produce the same amount of jobs as the local cluster over its life span (3 × 365 × 24 hours)
Reserved instance discount effect [figure: utilization rate threshold (%)]
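The cloud-hours quantity on this slide can be sketched directly; the utilization rate and per-job times below are placeholders, not figures from the talk:

```python
LIFE_HOURS = 3 * 365 * 24  # local cluster life span in hours

def cloud_hours_needed(utilization, t_cloud, t_local, life_hours=LIFE_HOURS):
    """Cloud time needed to produce the same number of jobs the local
    cluster completes at the given utilization over its life span."""
    jobs_local = utilization * life_hours / t_local
    return jobs_local * t_cloud

# Placeholder example: 25% utilization, 2 h per cloud job, 1.5 h per
# local job.
hours = cloud_hours_needed(utilization=0.25, t_cloud=2.0, t_local=1.5)
print(round(hours))  # 8760
```

A large committed hour count like this is what makes AWS reserved-instance discounts relevant to the comparison.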
Cost summary
Rough steps to evaluate cost-effectiveness:
Estimate the local utilization rate
Do a short-term run to acquire per-job times
Calculate the threshold utilization rate
If the estimated utilization rate > calculated threshold, local is more cost-effective
Else, the cloud is more cost-effective
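These steps can be sketched as a small decision helper. The threshold follows from solving the per-job cost comparison for the utilization rate; all concrete inputs below are placeholders, not measured values:

```python
HOURS_PER_YEAR = 365 * 24

def threshold_utilization(c_local, t_local, n_instances, t_cloud,
                          p_cloud=1.6, life_years=3):
    """Utilization rate at which local and cloud per-job costs break even:
    U* = (C_local * t_local) / (T_life * p * n * t_cloud)."""
    t_life = life_years * HOURS_PER_YEAR
    return (c_local * t_local) / (t_life * p_cloud * n_instances * t_cloud)

def stay_local(estimated_utilization, **kwargs):
    # Local is more cost-effective above the break-even utilization.
    return estimated_utilization > threshold_utilization(**kwargs)

# Placeholder job timings (hours) and cluster cost.
u_star = threshold_utilization(c_local=203317, t_local=1.5,
                               n_instances=16, t_cloud=2.0)
```

For example, `stay_local(0.5, c_local=203317, t_local=1.5, n_instances=16, t_cloud=2.0)` returns True: at 50% utilization the local cluster wins under these assumptions.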
Our wish list to cloud service providers
Improved network latency
Pre-configured OS images: libraries optimized for the specific cloud platform
More flexible charging: the current model is designed for commercial servers; fine-grained accounting for clusters would allow large-scale development and testing
Larger system scale: the current upper limit is dozens of nodes
Outline Background & Motivation Evaluation and observations Will HPC cloud save you money? Application performance results Wish list to cloud service providers Conclusion
Conclusion
Amazon EC2 CCI is becoming a competitive choice for HPC, even when running tightly-coupled simulations
May deliver performance similar to in-house clusters, except for codes with heavy communication
Flexibility and elasticity are valuable:
Users may try out different resource types
No up-front hardware investment
Per-user, per-application system software
M. Liu et al., "One Optimized I/O Configuration per HPC Application: Leveraging the Configurability of Cloud", APSys 2011
Acknowledgment Research sponsored by Intel Collaborators: Bob Kuhn, Scott Macmillan, Nan Qiao
References
[1] D. Chen, J. Xue, X. Yang, H. Zhang, X. Shen, J. Hu, Y. Wang, L. Ji, and J. Chen. New generation of multi-scale NWP system (GRAPES): general scientific design. Chinese Science Bulletin, 53(22):3433–3445.
[2] A. Darling, L. Carey, and W. Feng. The design, implementation, and evaluation of mpiBLAST. In Proceedings of the ClusterWorld Conference and Expo, in conjunction with the 4th International Conference on Linux Clusters: The HPC Revolution.
[3] LANL. Parallel Ocean Program (POP).
[4] Tsinghua University. Technical report.
Thanks!