CANHEIT | On the EDGE | June 15-18, 2008 | University of Calgary Presenter: Dave Schulz Research Computing Services University of Calgary.

Presentation transcript:

University of Calgary EcoGrid
Presenter: Dave Schulz, Research Computing Services, University of Calgary IT

Example Job Types
General purpose: arbitrary Linux apps
Rendering video and still images
CHARMM
Matlab
Maple
Parameter sweeps
etc.

Rendered Pictures
Examples
–Note: High bandwidth, even on campus

What is EcoGrid?
Cycle scavenging system: uses otherwise idle CPU cycles to perform useful work
Most lab computers are powered on but idle for most of the night.

Consider:
–Assumptions:
Idle from 6pm to 6am on weekdays: 12h/day
Idle all weekend: 48h/week
2000 EcoGrid nodes
–Calculation:
Idle time = 12h * 5 + 24h * 2 = 108 hours/week
108h/week = 6480 CPU minutes per week
2000 nodes * 6480 minutes/week = 12,960,000 CPU minutes/week!
Or [figure missing] CPU minutes/year
In contrast, the WestGrid Matrix cluster (128 nodes) running at 100% for one year would only have [figure missing] CPU minutes.
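The arithmetic on this slide can be reproduced with a short script. This is a sketch, not the presenter's own calculation: the yearly and Matrix figures did not survive in the transcript, so the 52-week year, the 365-day year, and the one-CPU-per-node treatment of Matrix are assumptions made here for illustration.

```python
# Idle-cycle estimate using the slide's stated inputs.
HOURS_IDLE_PER_WEEKNIGHT = 12   # 6pm to 6am (from the slide)
WEEKNIGHTS = 5
HOURS_IDLE_PER_WEEKEND_DAY = 24
WEEKEND_DAYS = 2
ECOGRID_NODES = 2000            # from the slide
MATRIX_NODES = 128              # from the slide

WEEKS_PER_YEAR = 52                 # assumption, not stated on the slide
MINUTES_PER_YEAR = 365 * 24 * 60    # assumption: non-leap year, one CPU per node


def idle_minutes_per_week() -> int:
    """Idle minutes a single lab machine contributes per week."""
    hours = (HOURS_IDLE_PER_WEEKNIGHT * WEEKNIGHTS
             + HOURS_IDLE_PER_WEEKEND_DAY * WEEKEND_DAYS)
    return hours * 60


ecogrid_week = ECOGRID_NODES * idle_minutes_per_week()
ecogrid_year = ecogrid_week * WEEKS_PER_YEAR
matrix_year = MATRIX_NODES * MINUTES_PER_YEAR

print(f"Per node:      {idle_minutes_per_week():,} CPU minutes/week")
print(f"EcoGrid:       {ecogrid_week:,} CPU minutes/week")
print(f"EcoGrid:       {ecogrid_year:,} CPU minutes/year")
print(f"Matrix (100%): {matrix_year:,} CPU minutes/year")
```

Under these assumptions the scavenged pool offers roughly ten times the CPU minutes of the dedicated 128-node cluster; counting more than one CPU per Matrix node would shrink that ratio accordingly.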

Goals (by July ‘09)
[number missing] nodes
Enough demand to consume 100% of the cluster
Full web-based reporting and statistics
Other clusters connected
–Origin
–Terminus
–Matrix
–Lattice

Benefits
Huge untapped computing resource
Compute cycles available to the campus without the need to purchase more equipment
Cluster will always have some fairly current hardware
Efficiently using power already wasted by idle computers
Little, if any, impact on lab users

Drawbacks
More network utilization
Possible strain on the heat capacity of lab environmental systems
Somewhat increased electrical power draw
–Lab power systems should be able to supply this power, but may not under normal lab conditions

How is it done?
Using Condor and Innotek VirtualBox (Windows platforms)
Next build will use QEMU
–Checkpointing the machine
–Jobs survive nightly reboots
Using Condor natively (Mac/Linux platforms)

What is Condor?
Developed by the University of Wisconsin-Madison
Runs on many common operating systems, but the jobs must be designed for that operating system
Windows is supported, but there is little demand for HPC applications
Condor project started in 1988

Why Condor?
Users retain full control over their computers
Provides job checkpointing, migration, and restart (with certain restrictions)
DAGMan: Directed Acyclic Graph job Manager
–Takes care of job dependencies
–Even allows portions of jobs to be run on completely dissimilar clusters
–Very easy to express job dependencies
Very resilient to network problems
–Jobs finish and wait until the network is restored to complete.
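The idea behind dependency-driven scheduling can be illustrated with Python's standard `graphlib` module (Python 3.9+). This is only a sketch of the concept, not DAGMan's input format or API; the job names are hypothetical. Each job lists the jobs it depends on, and the sorter releases a batch of jobs as soon as everything they depend on has finished.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each key depends on every job in its set.
dependencies = {
    "render_frames": {"prepare_scene"},
    "write_report": {"render_frames"},
    "encode_video": {"render_frames"},
    "publish": {"encode_video", "write_report"},
}


def ready_batches(deps):
    """Return lists of jobs that could run concurrently, in dependency order."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose prerequisites are done
        batches.append(ready)
        sorter.done(*ready)                 # pretend every job in the batch finished
    return batches


for step, batch in enumerate(ready_batches(dependencies), start=1):
    print(step, batch)
```

Jobs within one batch have no dependencies on each other, which is what lets a scheduler send them to different machines or clusters.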

What is QEMU?
Processor emulator with extensions to quickly run code built for the host processor
Open source
Runs Linux guest on Windows™
Virtually undetectable to the Windows user
–Runs as a service: only visible in Task Manager as a running task.

Image Size
Small node filesystem image (~20 MB) which kickstarts a full system upon first bootup.
Reinstalls can be triggered from the head node, so software updates and fixes can be pulled in RPM form at the next reboot.

Networking
Central manager is unable to initiate direct TCP/IP connections to the nodes, so something else is required.
Options
–VPN
–IP Tunneling
–Connection Brokering

Networking Cont’d
We have chosen to use GCB (Generic Connection Brokering), which is a part of the Condor distribution.
The compute nodes establish and maintain a connection to the GCB at startup.
When the Central Manager needs to open a connection to the node, it contacts it via the GCB machine.
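The brokering pattern described here can be modelled in a few lines. The sketch below is a toy in-memory simulation, not the GCB protocol or any Condor API; the class names, node name, and message are all hypothetical. It shows only the direction of the connections: the node dials out to the broker at startup, and the manager reaches the node through that existing link.

```python
class Broker:
    """Toy stand-in for a connection broker (hypothetical, not GCB itself)."""

    def __init__(self):
        # node name -> callable representing the node's open outbound connection
        self._links = {}

    def register(self, node_name, deliver):
        """Called by a node at startup, over a connection the node initiated."""
        self._links[node_name] = deliver

    def relay(self, node_name, message):
        """Called by the central manager, which cannot reach the node directly."""
        deliver = self._links.get(node_name)
        if deliver is None:
            raise LookupError(f"{node_name} has no open connection to the broker")
        return deliver(message)


class ComputeNode:
    def __init__(self, name, broker):
        self.name = name
        self.inbox = []
        broker.register(name, self._receive)  # outbound registration at startup

    def _receive(self, message):
        self.inbox.append(message)
        return f"{self.name} ack: {message}"


broker = Broker()
node = ComputeNode("lab-pc-01", broker)
reply = broker.relay("lab-pc-01", "start job 42")
print(reply)
```

Because every connection originates from the node side, nothing inbound has to be opened toward the lab machines in this model.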

Networking Cont’d
Vulture (a type of Condor) is the central manager. It coordinates all of the machines and the jobs they run.
EcoGrid is the submit machine, the one where users log in and submit their jobs.

Scalability
GCB nodes can be created as network load requires.

Where do we plan on using this?
Lab computers (via VirtualBox/QEMU)
DTP desktop computers (via VirtualBox/QEMU)
Linux labs (natively)
Other clusters (via the Globus interface)
Will provide one common interface to many clusters

Ideal Workload
Serial jobs (possibly 2-processor, depending on the available hosts)
Jobs that can be broken into smaller jobs
Parameter sweeps
Self-compiled code (to take advantage of checkpointing and restart)
COMING SOON: Matlab jobs!!
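A parameter sweep fits this workload because it splits naturally into many independent serial jobs. The sketch below shows one way to enumerate them; the parameter names and values are hypothetical, and it does not generate Condor submit files.

```python
from itertools import product


def sweep_jobs(grid):
    """Expand a parameter grid into one settings dict per independent job."""
    names = sorted(grid)  # fixed order so the job list is reproducible
    combos = product(*(grid[name] for name in names))
    return [dict(zip(names, values)) for values in combos]


# Hypothetical sweep: 3 temperatures x 2 pressures = 6 serial jobs.
grid = {"temperature": [280, 300, 320], "pressure": [1.0, 2.0]}
jobs = sweep_jobs(grid)

for job_id, params in enumerate(jobs):
    print(job_id, params)
```

Each dict can be handed to a separate single-processor run, so no job has to wait on any other.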

Timeline / Currently
Currently: 80 IT Labs machines in the Elbow Room
Hoping to roll out a number of Linux labs which have Matlab installed before the end of summer

Going Forward
Phase II: Expansion of project to non-UCIT labs

Web Portal Demo

Team Members
Stephen Cartwright
Robert Fridman
Eric Merth
David Schulz
Carol Sin

Information Resources
Condor Website:
QEMU Website: bellard.org/qemu
VirtualBox Website:
Local Website: hpc.ucalgary.ca/EcoGrid