Enabling On-Demand Scientific Workflows on a Federated Cloud
Steven C. Timm, Gabriele Garzoglio, Seo-Young Noh, Haeng-Jin Jang
KISTI Project Evaluation, 23 October 2014
Report of CRADA FRA / KISTI-C14014

Outline
History of Fermilab-KISTI Collaboration
Staffing
Goals of Collaboration
Virtual Provisioning and Infrastructure
– Large-scale demo of federated cloud
– Provisioning algorithms
– Coordinated workflows
Interoperability and Federation
– Automated image conversion script
– Interoperability with Google Compute Engine and Microsoft Azure
– Object storage investigation
– X.509 authorization and authentication
Output of Project
Conclusions

History of Fermilab-KISTI Collaboration
KISTI was made available as an Open Science Grid resource for Fermilab/CDF use in 2009, with assistance from Fermilab personnel.
KISTI personnel Seo-Young Noh and Hyunwoo Kim worked with us at Fermilab in the summer of 2011.
– First virtualized MPI work completed and first vcluster prototype
Cooperative Research and Development Agreement, first year: March–September 2013
Cooperative Research and Development Agreement, second year: March–October 2014

Staffing
G. Garzoglio: Principal Investigator
S. Timm: Project Lead
H. Kim: Programmer
J. Boyd, G. Bernabeu, N. Sharma, N. Peregonow: Operations Group
T. Levshina, K. Herner: User Support
P. Mhashilkar: GlideinWMS Support
H. Wu, S. Palur, X. Yang: IIT Graduate Students
A. Balsini: INFN Visiting Student
K. Shallcross: Contractor, Instant Technology
KISTI Coordination: Seo-Young Noh

FermiCloud Project Development
Phase 1 ( ): "Build and Deploy the Infrastructure"
– Hardware purchase, OpenNebula
Phase 2 ( ): "Deploy Management Services, Extend Infrastructure and Research Capabilities"
– X.509, accounting, virtualized fabric
Phase 3 ( ): "Establish Production Services and Evolve System Capabilities in Response to User Needs"
– High availability, live migration
[Timeline figure: phase start/end dates missing from the transcript.]

Fermilab/KISTI Joint Project Goals
Phase 4, Year 1 of CRADA (2013): "Expand the service capabilities to serve more of our user communities"
– Cloud bursting, AWS, idle VM, interoperability
Phase 5, Year 2 of CRADA (2014): "Run experimental workflows and make them transparent to users"
– AWS caching, 1000-VM scale, Ceph
Phase 6, Year 3 of CRADA (2015): "Expand Cloud Federation to more sites, stakeholders, data"
– S3 caching, complex service workflows, cost-aware provisioning

1000-VM Workflow Demonstration
The goal for this year was demonstrating scale at the 1000-virtual-machine level; previously we had run up to 100 VMs on FermiCloud and AWS.
To get to 1000 VMs on FermiCloud we needed:
– A faster and more reliable OpenNebula
– Network address space for that many virtual machines
– A good provisioning system to deliver and initialize the VM image
To get to 1000 VMs on Amazon we needed:
– Better Squid caching (this was the limiting factor previously)
– A faster way to make changes to the stock image
For both, we needed a unified batch submission system (see the GlideinWMS diagram and the submission sketch below).

FermiCloud GlideinWMS: Grid Bursting
[Architecture diagram: users submit jobs with jobsub_submit, the unified submit tool for grid and cloud, into a job queue; the GlideinWMS frontend and pilot factory launch pilots on FermiGrid/OSG, FermiCloud, KISTI GCloud, and Amazon AWS, and the jobs run inside the pilots.]
(Slide: Gabriele Garzoglio, "On-demand services for the scientific program at Fermilab," 3/27/2014.)
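A hedged sketch of the cloud-submission mechanism underneath this picture: HTCondor's grid universe can start EC2-style virtual machines directly, and GlideinWMS drives cloud pilots through that same mechanism. This is a minimal illustration assuming HTCondor's Python bindings with the Submit class; the endpoint, AMI, and credential paths are placeholders, and in practice jobsub_submit hides these details from users.

```python
# Hedged sketch: submit one EC2 grid-universe job via the HTCondor Python
# bindings. All values below are placeholders, not FermiCloud configuration.
import htcondor

sub = htcondor.Submit({
    "universe": "grid",
    # grid_resource format for EC2-style clouds: "ec2 <service URL>"
    "grid_resource": "ec2 https://ec2.us-east-1.amazonaws.com/",
    "executable": "worker-pilot",                     # label for the VM job
    "ec2_ami_id": "ami-00000001",                     # hypothetical worker image
    "ec2_instance_type": "m3.large",
    # HTCondor reads the credentials from files at these paths.
    "ec2_access_key_id": "/home/user/.ec2/access_key",
    "ec2_secret_access_key": "/home/user/.ec2/secret_key",
    "log": "pilot.$(Cluster).log",
})

schedd = htcondor.Schedd()
with schedd.transaction() as txn:
    sub.queue(txn, count=1)   # raise count to ramp up a pilot pool
```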

FermiCloud 1000-VM Test
OpenNebula 4.8 "econe-server" with X.509 authentication.
Routable private network: VMs can reach anywhere on-site at Fermilab but not off-site.
140 Dell PowerEdge 1950 servers (8 cores, 16 GB RAM each), formerly part of the CDF farms (vintage 2007).
BlueArc NFS as the image datastore.
– The qcow2 image is copied to each node and run from local disk.
OpenNebula's own CLI could launch 1000 VMs easily in about 30 minutes; a sketch of driving the EC2-compatible interface follows.
Issues (all worked around temporarily and reported to the OpenNebula developers):
– HTCondor's use of the OpenNebula API creates one ssh keypair per VM, and the total number of allowed keypairs is 300.
– Database growth sometimes causes the DescribeInstances call to time out; this can be worked around by aggressive pruning of the database.
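Because OpenNebula's econe-server speaks the EC2 Query API, a stock EC2 client can drive it. A minimal sketch with boto 2.x follows; the endpoint host, port, credentials, and image ID are placeholders (FermiCloud's setup derived the access keys from X.509 credentials).

```python
# Minimal sketch: launch a batch of VMs against an econe-server endpoint
# using boto's EC2 client. Host, port, keys, and AMI are hypothetical.
from boto.ec2.connection import EC2Connection
from boto.ec2.regioninfo import RegionInfo

region = RegionInfo(name="fermicloud", endpoint="cloud-head.example.gov")
conn = EC2Connection(
    aws_access_key_id="ACCESS_KEY",        # econe-server credential
    aws_secret_access_key="SECRET_KEY",
    region=region,
    port=4567,                             # assumed econe-server port
    is_secure=True,
    path="/",
)

# Launch 50 workers from one image; repeating this call in chunks is how a
# 1000-VM ramp-up would be scripted.
reservation = conn.run_instances("ami-00000001", min_count=50, max_count=50,
                                 instance_type="m1.small")
for inst in reservation.instances:
    print(inst.id, inst.state)
```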

FermiCloud to AWS Conversion Tool
[Diagram slide: workflow of the automated image conversion tool; details missing from the transcript. A sketch of the underlying steps follows.]
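The actual FermiCloud conversion tool is not reproduced here; the following is a minimal sketch of the two steps such a tool must perform, assuming it converts the OpenNebula qcow2 image to raw and stages it in S3 for AWS's VM import path. File names and the bucket are illustrative.

```python
# Hedged sketch of an image-conversion pipeline, not the actual tool:
# qcow2 -> raw with qemu-img, then stage the raw image in S3, from where
# AWS's VM import tooling (the 2014-era ec2-import-* CLI) can pick it up.
import subprocess
import boto

SRC = "worker-node.qcow2"    # hypothetical FermiCloud image
DST = "worker-node.raw"

# AWS VM import accepts raw/VHD/VMDK, not qcow2, so convert first.
subprocess.check_call(["qemu-img", "convert", "-f", "qcow2", "-O", "raw",
                       SRC, DST])

# Upload the raw image to a staging bucket (hypothetical name).
s3 = boto.connect_s3()
bucket = s3.get_bucket("fermicloud-image-staging")
key = bucket.new_key("images/" + DST)
key.set_contents_from_filename(DST)
```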

SHOAL Squid Discovery System
[Diagram slide: architecture of the Shoal squid discovery service; details missing from the transcript. A sketch of the client-side pattern follows.]
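As an illustration of the pattern, a minimal client-side sketch: ask a shoal server for the nearest advertised squids and point the proxy environment at the best one. The server URL is a placeholder, and the JSON field names are assumptions based on the shoal project's interface.

```python
# Hedged sketch of what a shoal client does: query the server's /nearest
# endpoint for ranked squid proxies, then export the top one for use by
# CVMFS/Frontier clients. URL and field names are assumptions.
import os
import requests

SHOAL_SERVER = "http://shoal.example.org/nearest"   # hypothetical server

resp = requests.get(SHOAL_SERVER, timeout=10)
resp.raise_for_status()
data = resp.json()

# The server returns ranked squid records, either as a dict keyed by rank
# or as a list; handle both defensively.
records = list(data.values()) if isinstance(data, dict) else list(data)
best = records[0]
proxy = "http://%s:%s" % (best["hostname"], best.get("squid_port", 3128))
os.environ["http_proxy"] = proxy
print("Using squid proxy:", proxy)
```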

Results: NOvA Near Detector Cosmic Ray Simulation
Simulation jobs ran at up to 1000 simultaneously on each of AWS and FermiCloud.
Results were sent back to Fermilab dCache servers in the FTS "drop box."
On AWS we used the m3.large instance type, which can run 2 jobs at once.
Approximate cost for 6 hours of AWS computing at that scale: $[amount missing from the transcript].

Provisioning Algorithm Details
Three papers on provisioning algorithms this year from student Hao Wu:
– "Modeling the Virtual Machine Overhead under FermiCloud": published in the proceedings of CCGrid 2014, May 2014
– "A Reference Model for Virtual Machine Launching Overhead": referee's comments addressed, submitted to IEEE Transactions on Cloud Computing
– "Overhead-Aware-Best-Fit (OABF) Resource Allocation Algorithm for Minimizing VM Launching Overhead": accepted to the MTAGS workshop at SC2014
Cost-aware provisioning: the goal is to launch each virtual machine on the platform where it takes the least time and money (see the toy sketch below).
Instrumentation improvements made to the vcluster software.
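The cost-aware idea can be made concrete with a toy sketch: among candidate platforms, choose the one that minimizes a weighted sum of predicted launch overhead and price. The weights, the numbers, and the flat per-platform overhead below are illustrative stand-ins, not the reference model from the papers.

```python
# Toy sketch of overhead-aware, cost-aware platform selection. The launch
# times and prices are made-up examples, not measured FermiCloud data.
def pick_platform(platforms, weight_time=1.0, weight_cost=1.0):
    """platforms: dicts with predicted 'launch_seconds' and
    'dollars_per_hour' for the requested VM flavor."""
    def score(p):
        return (weight_time * p["launch_seconds"]
                + weight_cost * p["dollars_per_hour"])
    return min(platforms, key=score)

candidates = [
    {"name": "FermiCloud",   "launch_seconds": 120, "dollars_per_hour": 0.00},
    {"name": "AWS m3.large", "launch_seconds": 90,  "dollars_per_hour": 0.14},
    {"name": "KISTI GCloud", "launch_seconds": 150, "dollars_per_hour": 0.00},
]
print(pick_platform(candidates)["name"])
```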

Object Storage Evaluation
Goal: find a replacement for the SAN-based shared file system, which is expensive, not scalable, and difficult to maintain.
We originally intended to look at several object stores, but focused on Ceph:
– CERN reported favorable experience with Ceph as a backend for OpenStack Cinder and Glance (block and object stores).
Focus on the RBD (RADOS block device) component of Ceph.
Successfully ran virtual machines in OpenNebula using Ceph RADOS block devices (a sketch using the Ceph Python bindings follows).
– This preserves live-migration capability.
Can export a full image to local disk with "rbd export," OpenStack's preferred method.
Tested reliability and stability against failure and reboot.
Tested multiple iozone runs and rbd exports from virtual machines and from bare-metal machines.
Conclusion: a promising technology; final implementation deferred until Scientific Linux 7 is more widely available.
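For flavor, a minimal sketch using Ceph's Python bindings (python-rados and python-rbd) to enumerate the RBD images that back VM disks; the pool name and configuration path are site-specific assumptions. The export step mentioned above is the `rbd export <pool>/<image> <file>` CLI.

```python
# Hedged sketch: list the RBD images in a pool and print their sizes,
# analogous to what an OpenNebula Ceph datastore manages. Pool name and
# conffile path are assumptions.
import rados
import rbd

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
try:
    ioctx = cluster.open_ioctx("one")     # hypothetical image pool
    try:
        for image_name in rbd.RBD().list(ioctx):
            img = rbd.Image(ioctx, image_name)
            try:
                print(image_name, img.size(), "bytes")
            finally:
                img.close()
    finally:
        ioctx.close()
finally:
    cluster.shutdown()
```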

Google Compute Engine / Microsoft Azure
HTCondor already supported Google Compute Engine, but the API has since completely changed and the work has to be redone.
– We collected information so the HTCondor developers can fix the API support.
– We also identified a bug in the documentation of the current Google REST API.
For Microsoft Azure, we performed basic operations from the web GUI, including learning how to upload and download an image.
– Identified the basic APIs needed to launch a virtual machine and forwarded them to the HTCondor developers for eventual inclusion.
Neither Google nor Azure supports any open access standard used by any other cloud.
– It is unlikely there will ever be a single API that can contact all clouds.
– This makes federation tools like HTCondor and GlideinWMS, which can contact all of them, even more promising.
Google's OAUTH2-based authorization is a promising alternative for REST API authentication that avoids long-lived tokens (see the sketch below).
Both clouds are willing to give sufficient cost breaks to the government to make them worth our time.
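A hedged sketch of the token-based pattern discussed above: obtain an OAuth2 bearer token and call the Google Compute Engine REST API. It assumes the oauth2client library and application-default credentials; the project and zone are placeholders, and the URL follows the v1 API as of this writing.

```python
# Hedged sketch: list GCE instances via the REST API with an OAuth2 bearer
# token. Project and zone are hypothetical placeholders.
import requests
from oauth2client.client import GoogleCredentials

credentials = GoogleCredentials.get_application_default()
token = credentials.get_access_token().access_token

PROJECT, ZONE = "my-project", "us-central1-a"
url = ("https://www.googleapis.com/compute/v1/projects/%s/zones/%s/instances"
       % (PROJECT, ZONE))

resp = requests.get(url, headers={"Authorization": "Bearer " + token})
resp.raise_for_status()
for inst in resp.json().get("items", []):
    print(inst["name"], inst["status"])
```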

X.509 Authentication/Authorization Study
The original X.509 authentication code in OpenNebula was written by Fermilab.
It is a key technology in the EGI Federated Cloud project as well as in FermiCloud.
But X.509 authentication on a ReST API is unusual: most US cloud sites use (or will soon use) token-based federation and authorization systems such as OAUTH2 or OpenID.
Amazon will deprecate its X.509-based SOAP API at the end of this year.
The OpenStack community is headed for token-based ReST APIs.
We have developed a candidate X.509 authorization system for OpenNebula that can be used from the command line, the browser, and the web API (a client-side sketch follows).
– Should we deploy it or go another direction?
– Federation with KISTI's cloud is an important requirement in this decision.
The results of our review were accepted to the Cloud Federation Management Workshop, London, December 2014.
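A minimal client-side sketch of the X.509-on-ReST pattern under review: present the grid proxy as the TLS client credential on an ordinary HTTPS request. The endpoint and file paths are placeholders, not the actual FermiCloud service.

```python
# Hedged sketch: X.509 client authentication on a ReST call. A grid proxy
# file serves as both certificate and key; the CA directory is the usual
# grid-security location (requests accepts a c_rehash-processed directory).
import requests

ENDPOINT = "https://cloud.example.gov:8443/api/instances"   # hypothetical

resp = requests.get(
    ENDPOINT,
    cert=("/tmp/x509up_u500", "/tmp/x509up_u500"),  # proxy as cert and key
    verify="/etc/grid-security/certificates",
)
print(resp.status_code)
```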

Output: Papers
Journal papers submitted to IEEE Transactions on Cloud Computing (referee comments received and processed):
– "A Reference Model for Virtual Machine Launching Overhead" (H. Wu et al.)
– "Understanding the Performance and Potential of Cloud Computing for Scientific Applications" (I. Sadooghi et al.)
Conference papers submitted and accepted:
– "X.509 Authentication/Authorization in FermiCloud" (Cloud Federation Workshop, London, Dec. 2014)
– "Overhead-Aware-Best-Fit (OABF) Resource Allocation Algorithm for Minimizing VM Launching Overhead" (MTAGS workshop at Supercomputing 2014, Nov. 2014)

Output, Part II
Talks:
– 7 talks total: 6 at peer-reviewed conferences plus an invited talk
– S. Timm, Computing Techniques, CERN, May 2014
Software modules:
– VM conversion tool
– Shoal/Squid configuration and installation scripts
– vcluster enhancements for intelligent VM launching
Documentation:
– How to use Google Cloud and Microsoft Azure
2 more conference papers in preparation:
– Squid/Shoal client plus running on AWS and FermiCloud -> CHEP
– Ceph comparison with FusionFS -> Cluster

Joint Project Effort

Person         Adjusted FTE Effort
G. Bernabeu    0.76
G. Garzoglio   0.45
H. Kim         3.44
N. Peregonow   0.22
S. Timm        2.42
TOTAL          7.29

Expense          Money ($US)
3 IIT Students   33540
INFN Student*    6000
Consultant       32800
AWS Computing*   1579
Travel*          5233
Temp housing     1088
Indirect cost    16770
DOE Fee          3000
TOTAL            100000

Future Focus
Workflows and interoperability:
– GlideinWMS and HTCondor support for Google, Microsoft, and OpenStack Nova
– Grid servers on the cloud
– Investigation of using S3 as a temporary input/output cache
– Policy-based provisioning
– Investigate running CERN LHC workflows on the federated cloud
Infrastructure as a service / facilities:
– Auto-scale data movement and proxy services based on demand
– Object storage investigation
– Unified installation/configuration/monitoring for bare-metal and virtual machines
– Hooks to enable automated launch of special worker nodes and interactive login machines
– Users request particular platforms on demand

CRADA Project Summary 2014
We have successfully scaled up scientific workflows to the 1000-virtual-machine scale.
The larger cloud remains accessible from the production system.
Four real Fermilab experiments have run real workflows on the cloud in the past year.
Good progress was made towards new commercial clouds.
Students' work is in production and benefitting stakeholders.
The relationship with KISTI remains strong:
– Contributing to joint software development of vcluster
– Cross-training on grid and cloud issues

Acknowledgements
None of this work could have been accomplished without:
– The excellent support from other departments of the Fermilab Computing Sector, including Computing Facilities, Site Networking, and Logistics
– The excellent collaboration with the open source communities, especially HTCondor and OpenNebula
– The GlideinWMS project, for enhanced cloud provisioning
– The excellent collaboration and contributions from KISTI
– Illinois Institute of Technology students and professors Ioan Raicu and Shangping Ren
– INFN students