Open Science Grid: Clemson Campus Grid. Sebastien Goasguen, School of Computing, Clemson University, Clemson, SC. April 2009.

Outline: Campus Grid principles and motivation; a user experience and other examples; architecture.

Grid: a collection of resources that can be shared among users. Resources can be computing systems, storage systems, instruments... most of the focus is still on computing grids. Grid services help monitor, access, and make effective use of the grid.

Campus Grid: a collection of campus computing resources shared among campus users. It can be centralized (IT operated) or decentralized (IT plus departmental resources), and it spans both HPC and HTC resources. It is an evolution of the Research Computing groups that exist on some campuses.

Why a Grid? Don't duplicate efforts: faculty don't really want to be managing clusters. Users always need more, first on campus, then in the nation. Enable partnerships and generate external funding: building a grid is a spark for collaborative work and for a partnership between IT and faculty, and CI is in a lot of proposals now that faculty can't do alone.

Campus Compute Resources. HPC (High Performance Computing): Topsail/Emerald (UNC), Sam/HenryN/POWER5 (NCSU), Duke Shared Cluster Resource (Duke). HTC (High Throughput Computing): Tarheel Grid, NCSU Condor pool, Duke departmental pools.

Why HTC? Because if you don't have HPC resources, you can build an HTC resource with little investment: you already have the machines in your instructional labs. Even research can happen on Windows, via Cygwin, coLinux, or a VM setup.

Clemson Campus Condor Pool, back in 2007: machines in 50 different locations on campus, ~1,700 job slots, >1.8M hours served in 6 months.

Clemson (circa 2007): 1,085 Windows machines and 2 Linux machines (a central manager and an OSG gatekeeper), with Condor reporting 1,563 slots. 845 machines are maintained by CCIT and 241 come from other campus departments, spread across more than 50 locations, from 1 to 112 machines per location (student housing, labs, library, coffee shop). Mary Beth Kurz, the first Condor user at Clemson: in March, 215,000 hours and ~110,000 jobs; in April, 110,000 hours and ~44,000 jobs.

The world before Condor: 1,800 input files, 3 alternative genetic algorithm designs, 50 replicates desired. Estimated running time on a 3.2 GHz machine with 1 GB RAM: 241 days. (Slides from Dr. Kurz)

First submit file attempt, Monday noon-ish. Used the documentation and examples at the Wisconsin Condor site and created:

Universe = vanilla
Executable = main.exe
log = re.log
output = out.$(Process).out
arguments = 1 llllll-0
Queue

Forgot to specify Windows and Intel, and also to transfer the output back (thanks David Atkinson). Got a single submit file to run 2 specific input files by mid-afternoon Tuesday. (Slides from Dr. Kurz)

Tuesday 6 pm: submitted 1800 jobs in a cluster:

Universe = vanilla
Executable = MainCondor.exe
requirements = Arch=="INTEL" && OpSYS=="WINNT51"
should_transfer_files = YES
transfer_input_files = InputData/input$(Process).ft
whenToTransferOutput = ON_EXIT
log = run_1/re_1.log
output = run_1/re_1.stdout
error = run_1/re_1.err
transfer_output_remaps = "1.out = run_1/opt1-output$(Process).out"
arguments = 1 input$(Process)
queue 1800

Not all of them ran at a time at first, but that eventually got resolved. (Slides from Dr. Kurz)
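
For readers following along, a minimal sketch of how a submit description like the one above is typically used, assuming it is saved under the hypothetical name run_1.sub:

condor_submit run_1.sub   # queue the jobs described by the submit file
condor_q                  # watch them move from idle to running
condor_status             # see which slots in the pool are claimed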

Wednesday afternoon: love notes. (Slides from Dr. Kurz)

Since Mary Beth... much more research.

Bioengineering Research: Replica Exchange Molecular Dynamics simulations provide atomic-level detail about implant biocompatibility. The body's response to implanted materials is mediated by a layer of proteins that adsorbs almost immediately to the crystalline polylactide surface of the implant. Chris O'Brien, Center for Advanced Engineering Fibers and Films.

Atomistic Modeling: molecular dynamics simulations to predict energetic impacts inside a nuclear fusion reactor. Model ~2,800 atoms; simulate 20,000 time steps per impact; damage accumulates after each impact; simulate 12,000 independent impacts to improve statistics. Steve Stuart, Chemistry Department.

Visualization - Blender: Research Experience for Undergraduates at CAEFF. Render high-definition frames for a movie using Blender, an open source 3D content creation suite. Used PowerPoint slides from a workshop to get up and running. Brian Gianforcano, Rochester Institute of Technology.

Anthrax: use AutoDock to run molecular-level simulations of the effects of anthrax toxin receptor inhibitors. May be useful in treating cancer; may be useful in treating anthrax intoxication. Mike Rogers, Children's Hospital Boston.

Computational Economics: three e-mails and then up and running. Data envelopment analysis: linear programming methods to estimate measures of production efficiency in companies. Paul Wilson, Department of Economics.

How to find users? You already know them: the biggest users are in engineering and science, doing Monte Carlo (chemistry, economics...), parameter sweeps, rendering (arts), and data mining (bioinformatics). Find a campus champion who will go door to door (yes, a traveling-salesman type of person). Mailings to faculty, training events...

Clemson's pool: originally mostly Windows, in 100+ locations on campus. Now 6,000 Linux slots as well, working towards an 11,500-slot setup, ~120 TFlops. Maintained by central IT; the CS department tests new configs and other departments adopt the central IT images. BOINC backfill to maximize utilization. Connected to OSG via an OSG CE. (The slide also showed a condor_status summary table, with Total/Owner/Claimed/Unclaimed/Matched/Preempting/Backfill slot counts for INTEL/LINUX, INTEL/WINNT, SUN4u/SOLARIS, and X86_64/LINUX; the counts are not in the transcript.)
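
That summary table is the kind of output condor_status prints; a couple of commands that produce it, with nothing Clemson-specific assumed:

condor_status -total                              # slot totals broken down by Arch/OpSys
condor_status -constraint 'OpSys == "WINNT51"'    # list only the Windows XP slots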

Clemson's pool history.

Started with a simple pool.

Then added an OSG CE.

Then added an HPC cluster.

Then added BOINC: multi-tier job queues to fill the pool, local users first, then OSG, then BOINC (see the policy sketch below).
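
One way such an ordering can be expressed is through the startd policy on each execute node. This is a minimal sketch, not Clemson's actual configuration; identifying local jobs by the domain in the job's User attribute is an assumption made for illustration only:

# Sketch: prefer local campus jobs over OSG (grid) jobs when both are waiting.
# BOINC work is not matched as Condor jobs at all; it runs only through the
# backfill mechanism shown on the BOINC backfill slide below, so it never
# competes with a matched Condor job.
START = TRUE
RANK = 2 * ifThenElse(regexp("clemson.edu$", TARGET.User), 1, 0) + ifThenElse(TARGET.x509userproxysubject =!= UNDEFINED, 1, 0)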

Clemson's pool, BOINC backfill: put Clemson in World Community Grid and reached #1 on WCG in the world, contributing ~4 years per day when no local jobs are running.

# Turn on backfill functionality, and use BOINC
ENABLE_BACKFILL = TRUE
BACKFILL_SYSTEM = BOINC
BOINC_Executable = C:\PROGRA~1\BOINC\boinc.exe
BOINC_Universe = vanilla
BOINC_Arguments = --dir $(BOINC_HOME) --attach_project cbf9dNOTAREALKEYGETYOUROWN035b4b2

Clemson's pool, BOINC backfill: reached #1 on WCG in the world, contributing ~4 years per day when no local jobs are running. (Usage graph: backfill time shows up as lots of pink.)

OSG VO through BOINC: the LIGO VO had very few jobs to grab.

Summary of main steps: deploy Condor on the Windows labs (define startup policies and, if you want, a power usage policy; a sketch of a startup policy follows). Deploy Condor as backfill on HPC resources. Set up an OSG gateway to backfill the campus grid, at lower priority than campus users. Set up BOINC to backfill the Windows labs (OSG jobs don't like Windows too well... this may change with VMs).
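
As an illustration of the "startup policies" step, a minimal sketch of a lab-machine policy; the thresholds here are assumptions, not the values Clemson used:

# Run jobs only when the console has been idle for a while and the machine is
# not busy; suspend quickly when a student comes back, resume once idle again.
MINUTE   = 60
START    = KeyboardIdle > 15 * $(MINUTE) && LoadAvg < 0.3
SUSPEND  = KeyboardIdle < $(MINUTE)
CONTINUE = KeyboardIdle > 5 * $(MINUTE)
PREEMPT  = FALSE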

Staffing: a senior Unix admin (manages the central manager and the OSG CE), a junior Windows admin (manages the lab machines), and a grad student or junior staff member (tester). Estimated $35k to build the Condor pool; since then, fairly low maintenance, ~0.5 FTE (including OSG connectivity).

Clemson's Grid, Fall 2009 (hopefully...).

Usual Questions. Security: "I don't want outside folks to run on our machines!" This is actually a policy issue; OSG users are well identified and can be blocked if compromised. There is IP-based security (only on-campus folks can submit) and submit-host security (only folks with access to a submit machine can submit); a configuration sketch follows. Why BOINC? It is an NSF-sponsored project, very successful at running embarrassingly parallel apps, it always has jobs to do, and it makes a humanitarian/philanthropy statement.
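
A hedged sketch of the host-based restriction mentioned above; the pattern is a placeholder, not Clemson's actual security configuration:

# Only hosts in the campus domain may join the pool or submit jobs to it.
# (Condor versions of that era also spell this HOSTALLOW_WRITE.)
ALLOW_WRITE = *.clemson.edu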

Usual Questions. Power: doesn't this use more power? People are looking into wake-on-LAN setups where machines are woken up when work is ready. Running on Windows may actually be more power efficient than on HPC systems (slower, but not so slow, and it might cost less power...). Why give to other grid users? Because when you need more than what your campus can afford, I will let you run on my stuff...

Other Campus Grids. CI-TEAM is an NSF award for outreach to campuses, helping them build their cyberinfrastructure and make use of it as well as of the national OSG infrastructure: "Embedded Immersive Engagement for Cyberinfrastructure, EIE-4CI".

Other large campus pools: Purdue, ~14,000 slots (led by the US-CMS Tier-2); GLOW in Wisconsin (also US-CMS leadership); FermiGrid (multiple experiments as stakeholders). RIT and Albany have created 1,000+ slot pools after CI Days in Albany in December 2007.

Purdue is now condorizing the whole campus, and soon the whole state. Their CI efforts are bringing them a lot of external funding, and they provide great service to the local and national scientific communities.

Campus Grid "levels": small grids (department size), university-wide (instructional labs), centralized resources (IT), flocked resources (a flocking sketch follows). There is a trend towards regional "grids" (NWICG, NYSGRID, NJEDGE, SURAGRID, LONI...) that leverage the OSG framework to access more resources and share their own resources.
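
For the "flocked resources" level, a minimal sketch of Condor flocking between two pools; both host names are hypothetical:

# On a submit machine in pool A: forward jobs that find no local match to pool B.
FLOCK_TO = cm.poolB.example.edu
# On pool B's central manager: accept flocked jobs from pool A's submit machine.
FLOCK_FROM = submit.poolA.example.edu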

Conclusions: resources can be integrated into a cohesive unit, a.k.a. a "grid". You have the local knowledge to do it, you have local users who need it, you can persuade your administration that this is good, and others have done it with great results.

END