Workflows: from Development to Production. Thursday morning, 10:00 am. Greg Thain, University of Wisconsin - Madison.



2012 OSG User School
Overview
Your DAG runs (once)! Now what?
- Need to make it run everywhere
- Need to make it run every time
- Need to make it run unattended

Brief note about scientific research
If others can’t reproduce your work, it isn’t real research!
- Work hard to make this happen.
- Only 11% of published cancer research was found to be reproducible!

The expanding onion
- Laptop (1 machine): You control everything!
- Local cluster (1000 cores): You can ask an admin nicely
- Campus (5000 cores): You better have something for the admin
- OSG (50,000 cores): You don’t even know the admins

Making it run everywhere
What does an OSG machine have?
- Assume the worst: nothing
Bring as much as possible with you:
- Won’t that slow me down?
Bring:
- Executable
- Environment
- Random numbers
- Tools
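A minimal sketch of "bring it with you", assuming the tools are shipped as a tarball with a bin/ directory inside. The names tools.tar.gz and my_analysis are hypothetical and do not come from the slides; passing a seed as a job argument is one possible reading of "Random numbers".

```shell
#!/bin/sh
# Hypothetical job wrapper: assumes nothing useful is installed on the worker node.

unpack_tools() {
    # $1: tarball transferred with the job; expected to contain a bin/ directory
    tarball=$1
    workdir=$(pwd)/tools
    mkdir -p "$workdir" || return 1
    tar xzf "$tarball" -C "$workdir" || return 1
    # Put the shipped tools ahead of whatever the site happens to provide
    PATH="$workdir/bin:$PATH"
    export PATH
}

# Typical use inside the job (illustrative names):
#   unpack_tools tools.tar.gz || exit 1
#   ./my_analysis --seed "$1"    # seed supplied as a job argument
```

Unpacking costs a few seconds per job, which is usually small next to the run time and buys independence from the site's software.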

Bringing it with you: Matlab
What’s the problem with Matlab?
- Licenses
What’s the solution?
- “Compiling”

How to bring Matlab along
1) Purchase and install the Matlab “compiler”
2) Run the compiler as follows:
    $ mcc -m -R -singleCompThread -R -nodisplay -R -nojvm -nocache foo.m
3) This creates run_foo.sh (et al.)
4) Create a tarball of the runtime:
    $ cd /usr/local/mathworks-R2009bSP1 ; cd ..
    $ tar cvzf ~/m.tgz mathworks-R2009bSP1

More Matlab
5) Edit run_foo.sh, adding:
    tar xzf m.tgz
    mkdir cache
    chmod 0777 cache
    export MCR_CACHE_ROOT=`pwd`/cache
Make a submit file:
    universe = vanilla
    executable = run_foo.sh
    arguments = ./mathworks-R2009bSP1
    should_transfer_files = yes
    when_to_transfer_output = on_exit
    transfer_input_files = m.tgz, foo
    queue

Final notes on Matlab
- Cache the runtime for extra credit
- Other interpreters are similar (R, Python, etc.)

But I can’t make it work everywhere
Tell HTCondor what the job needs, using:
- Request_memory
- Request_disk
- Request_cpus
GLIDEIN whitelist/blacklist if a site is somehow bad.
- But talk to the GOC (Grid Operations Center) first
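A sketch of a submit file that states its resource needs, written from the shell. The file name sized_job.sub and all the numbers are placeholders chosen for illustration; run_foo.sh is reused from the Matlab slides only as a stand-in executable.

```shell
# Write a hypothetical submit file that declares what the job needs.
# The values are placeholders; measure a test job before picking real ones.
cat > sized_job.sub <<'EOF'
universe       = vanilla
executable     = run_foo.sh
request_cpus   = 1
# request_memory is in MiB when no unit is given
request_memory = 2048
# request_disk is in KiB when no unit is given
request_disk   = 4000000
queue
EOF
# Submit with: condor_submit sized_job.sub
```

Jobs then match only slots that can satisfy these requests, so asking for far more than needed shrinks the pool of usable machines.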

Improved storage
Listen to Derek’s methods:
- Sandbox
- Caching
- Prestaging
- SE horsepower

Making it work every time
What could possibly go wrong?
- Eviction
- File corruption
- Performance surprises
- Network
- Paging
- Disk
- …
- Maybe even a bug in your code

Performance Surprises
One bad node can ruin your whole day
“Black Hole” machines
- Avoidance tricks and their autoclustering costs
Using PERIODIC_HOLD / RELEASE
- To avoid ill-performing jobs/machines
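One way the PERIODIC_HOLD / RELEASE idea could look in a submit file, as a sketch only. The file name guarded_job.sub and the thresholds (2 hours, 10 minutes, 5 starts) are assumptions for illustration, not values from the talk.

```shell
# Write a hypothetical submit file that holds slow runs and retries them later.
cat > guarded_job.sub <<'EOF'
universe   = vanilla
executable = run_foo.sh
# Hold a job that has been running (JobStatus == 2) for more than 2 hours,
# assuming a healthy run finishes well inside that window.
periodic_hold    = (JobStatus == 2) && (time() - EnteredCurrentStatus > 7200)
# Release a held job after 10 minutes so it can match again,
# but stop releasing once it has started 5 times.
periodic_release = (time() - EnteredCurrentStatus > 600) && (NumJobStarts < 5)
queue
EOF
# Submit with: condor_submit guarded_job.sub
```

Note that this release expression applies to any held job that meets the conditions, whatever put it on hold, so a real policy may want to be narrower.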

File Corruption
If you don’t check, it will happen…
- ETL
Running sha1 yourself on both sides
DAG PRE and POST scripts
- “Trust, but verify”
Example here
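A minimal sketch of checking a file's SHA-1 on the receiving side, of the kind a DAG POST script could run. The function names sha1_of and verify_sha1 are hypothetical; the sketch assumes either sha1sum or shasum is available on the machine.

```shell
#!/bin/sh
# sha1_of FILE: print the SHA-1 hex digest of FILE.
sha1_of() {
    if command -v sha1sum >/dev/null 2>&1; then
        sha1sum "$1" | cut -d' ' -f1
    else
        shasum -a 1 "$1" | cut -d' ' -f1
    fi
}

# verify_sha1 FILE EXPECTED: succeed only if FILE exists and hashes to EXPECTED.
verify_sha1() {
    [ -f "$1" ] || { echo "missing: $1" >&2; return 1; }
    actual=$(sha1_of "$1")
    if [ "$actual" != "$2" ]; then
        echo "checksum mismatch for $1: got $actual, expected $2" >&2
        return 1
    fi
    return 0
}
```

The job would compute the digest on the execute side and send it back next to the output; the POST script then calls verify_sha1 on the submit side and exits non-zero on a mismatch, which marks the DAG node as failed.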

What to do if a check fails
- Understand something about the failure
- Use DAG RETRY
- Let the rescue DAG continue…
- Workflow specific
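A sketch of a DAG file that combines a POST check with RETRY, written from the shell. The node name analyze, the files analyze.sub and verify.sh, and the retry count are hypothetical.

```shell
# Write a hypothetical DAG in which a failed check causes the node to be retried.
cat > checked.dag <<'EOF'
# Node and file names are illustrative.
JOB    analyze analyze.sub
# The POST script's exit status decides whether the node succeeded;
# $RETURN is replaced by the job's exit code.
SCRIPT POST analyze verify.sh $RETURN
# Re-run the node up to 3 times before the DAG gives up on it.
RETRY  analyze 3
EOF
# Submit with: condor_submit_dag checked.dag
# If the DAG still fails, fix the cause and run the same command again;
# DAGMan should then resume from the rescue DAG rather than start over.
```

RETRY suits transient failures such as a bad node or a corrupted transfer; a failure that repeats every time needs a person to look at it, which is where the rescue DAG comes in.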

Running unattended
This is the ultimate goal!
Need to automate:
- Data collection
- Data cleansing
- Submission (condor cron)
- Analysis and verification
- LaTeX and paper submission
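A sketch of the "condor cron" idea using HTCondor's crontab-style submit commands. The file name nightly.sub, the script collect_and_submit.sh, and the 02:30 schedule are assumptions for illustration.

```shell
# Write a hypothetical submit file that re-runs a driver script every night.
cat > nightly.sub <<'EOF'
universe       = vanilla
executable     = collect_and_submit.sh
# Run at 02:30 every day (crontab-style fields)
cron_minute    = 30
cron_hour      = 2
# Keep the job in the queue after each run so it fires again the next day
on_exit_remove = false
queue
EOF
# Submit once with: condor_submit nightly.sub
```

The driver script would be the place to collect and clean new data and then submit the DAG, so that no step depends on someone being at the keyboard.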

If this were a test…
- 20 points for finishing at all
- 10 points for the right answer
- 1 point for every error condition checked

Questions?
Questions? Comments?
- Feel free to ask me questions later: