Building a Real Workflow Thursday morning, 9:00 am Greg Thain University of Wisconsin - Madison.

Slides:



Advertisements
Similar presentations
Buffers & Spoolers J L Martin Think about it… All I/O is relatively slow. For most of us, input by typing is painfully slow. From the CPUs point.
Advertisements

Workflows: from Development to Automated Production Thursday morning, 10:00 am Lauren Michael Research Computing Facilitator University of Wisconsin -
Lecture 11: Operating System Services. What is an Operating System? An operating system is an event driven program which acts as an interface between.
Intermediate HTCondor: More Workflows Monday pm Greg Thain Center For High Throughput Computing University of Wisconsin-Madison.
Intermediate Condor: DAGMan Monday, 1:15pm Alain Roy OSG Software Coordinator University of Wisconsin-Madison.
Low-Cost Data Deduplication for Virtual Machine Backup in Cloud Storage Wei Zhang, Tao Yang, Gautham Narayanasamy University of California at Santa Barbara.
External Sorting CS634 Lecture 10, Mar 5, 2014 Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
1 Advanced Database Technology February 12, 2004 DATA STORAGE (Lecture based on [GUW ], [Sanders03, ], and [MaheshwariZeh03, ])
Using Secondary Storage Effectively In most studies of algorithms, one assumes the "RAM model“: –The data is in main memory, –Access to any item of data.
Homework 2 In the docs folder of your Berkeley DB, have a careful look at documentation on how to configure BDB in main memory. In the docs folder of your.
1 External Sorting for Query Processing Yanlei Diao UMass Amherst Feb 27, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
The Difficulties of Distributed Data Douglas Thain Condor Project University of Wisconsin
Intermediate HTCondor: Workflows Monday pm Greg Thain Center For High Throughput Computing University of Wisconsin-Madison.
Google Distributed System and Hadoop Lakshmi Thyagarajan.
Advanced Topics: MapReduce ECE 454 Computer Systems Programming Topics: Reductions Implemented in Distributed Frameworks Distributed Key-Value Stores Hadoop.
BLOCK DIAGRAM OF COMPUTER
CS 346 – Chapter 8 Main memory –Addressing –Swapping –Allocation and fragmentation –Paging –Segmentation Commitment –Please finish chapter 8.
HTCondor workflows at Utility Supercomputing Scale: How? Ian D. Alderman Cycle Computing.
Building a Real Workflow Thursday morning, 9:00 am Lauren Michael Research Computing Facilitator University of Wisconsin - Madison.
 Introduction to Operating System Introduction to Operating System  Types Of An Operating System Types Of An Operating System  Single User Single User.
ISG We build general capability Introduction to Olympus Shawn T. Brown, PhD ISG MISSION 2.0 Lead Director of Public Health Applications Pittsburgh Supercomputing.
Bigben Pittsburgh Supercomputing Center J. Ray Scott
D0 Farms 1 D0 Run II Farms M. Diesburg, B.Alcorn, J.Bakken, T.Dawson, D.Fagan, J.Fromm, K.Genser, L.Giacchetti, D.Holmgren, T.Jones, T.Levshina, L.Lueking,
Workflows: from Development to Production Thursday morning, 10:00 am Lauren Michael Research Computing Facilitator University of Wisconsin - Madison.
Workflows: from development to Production Thursday morning, 10:00 am Greg Thain University of Wisconsin - Madison.
Dealing with real resources Wednesday Afternoon, 3:00 pm Derek Weitzel OSG Campus Grids University of Nebraska.
Using the BYU SP-2. Our System Interactive nodes (2) –used for login, compilation & testing –marylou10.et.byu.edu I/O and scheduling nodes (7) –used for.
Turning science problems into HTC jobs Wednesday, July 29, 2011 Zach Miller Condor Team University of Wisconsin-Madison.
Building a Real Workflow Thursday morning, 9:00 am Lauren Michael Research Computing Facilitator University of Wisconsin - Madison.
1 CS/COE0447 Computer Organization & Assembly Language CHAPTER 4 Assessing and Understanding Performance.
Intermediate Condor: Workflows Rob Quick Open Science Grid Indiana University.
Intermediate HTCondor: More Workflows Monday pm Greg Thain Center For High Throughput Computing University of Wisconsin-Madison.
U N I V E R S I T Y O F S O U T H F L O R I D A Hadoop Alternative The Hadoop Alternative Larry Moore 1, Zach Fadika 2, Dr. Madhusudhan Govindaraju 2 1.
SECTION 5: PERFORMANCE CHRIS ZINGRAF. OVERVIEW: This section measures the performance of MapReduce on two computations, Grep and Sort. These programs.
Intermediate Condor: Workflows Monday, 1:15pm Alain Roy OSG Software Coordinator University of Wisconsin-Madison.
O PERATING S YSTEM. What is an Operating System? An operating system is an event driven program which acts as an interface between a user of a computer,
Lecture Topics: 11/24 Sharing Pages Demand Paging (and alternative) Page Replacement –optimal algorithm –implementable algorithms.
Peter F. Couvares Computer Sciences Department University of Wisconsin-Madison Condor DAGMan: Managing Job.
ISG We build general capability Introduction to Olympus Shawn T. Brown, PhD ISG MISSION 2.0 Lead Director of Public Health Applications Pittsburgh Supercomputing.
A year & a summer of June – August
Database Management Systems, R. Ramakrishnan and J. Gehrke1 External Sorting Chapter 11.
Condor Project Computer Sciences Department University of Wisconsin-Madison Running Interpreted Jobs.
Computer Performance. Hard Drive - HDD Stores your files, programs, and information. If it gets full, you can’t save any more. Measured in bytes (KB,
Building the International Data Placement Lab Greg Thain Center for High Throughput Computing.
Five todos when moving an application to distributed HTC.
CSE 351 Caches. Before we start… A lot of people confused lea and mov on the midterm Totally understandable, but it’s important to make the distinction.
Getting the Most out of HTC with Workflows Friday Christina Koch Research Computing Facilitator University of Wisconsin.
Turning science problems into HTC jobs Tuesday, Dec 7 th 2pm Zach Miller Condor Team University of Wisconsin-Madison.
Memory Management Damian Gordon.
Christina Koch Research Computing Facilitators
Getting the Most out of Scientific Computing Resources
Intermediate HTCondor: More Workflows Monday pm
Condor DAGMan: Managing Job Dependencies with Condor
Operations Support Manager - Open Science Grid
Getting the Most out of Scientific Computing Resources
Memory Management.
Intermediate HTCondor: Workflows Monday pm
Examples Example: UW-Madison CHTC Example: Global CMS Pool
CSC 322 Operating Systems Concepts Lecture - 12: by
Advanced QlikView Performance Tuning Techniques
Thursday AM, Lecture 2 Lauren Michael CHTC, UW-Madison
BIMSB Bioinformatics Coordination
Troubleshooting Your Jobs
Haiyan Meng and Douglas Thain
What’s Different About Overlay Systems?
CS246 Search Engine Scale.
CS246: Search-Engine Scale
Overview of Workflows: Why Use Them?
PU. Setting up parallel universe in your pool and when (not
Thursday AM, Lecture 1 Lauren Michael
Presentation transcript:

Building a Real Workflow Thursday morning, 9:00 am Greg Thain University of Wisconsin - Madison

2012 OSG User School Overview From script…  Pragmatics  Estimation To production 2

2012 OSG User School From schematics… 3

2012 OSG User School … to the real world 4

2012 OSG User School It starts with a script.. #!/bin/sh # Comment while read n do md5sum $n done > output 5

2012 OSG User School Scriptify as much as possible What is the minimal number of manual steps? Even 1 might be too many. Zero is perfect! 6

2012 OSG User School First run locally: To measure usage Did you get all the inputs? Did it run correctly?  Are you sure? Run once remotely  NOT ON SUBMIT MACHINE!  Might be surprised Once working, run a couple of times If big variance, should you take the…  Average? Median? Worst case? 7

2012 OSG User School Estimations: Orders of Magnitude Don’t sweat the small stuff  AMD == Intel == 2.4 Ghz == 3.6 Ghz But pay attention to the big stuff: What makes this hard?  From nanoseconds to terabytes Powers of Ten, by the Eames 8

2012 OSG User School Resources Jobs Need CPU  Wall Clock vs. CPU cycles Disk  Working (execute side)  Total (submit side)  Bandwidth  File transfer queue time Network bandwidth  Usually for file transfer only 9

2012 OSG User School User Log shows all 005 ( ) 06/07 14:12:55 Job terminated. (1) Normal termination (return value 0) Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage Usr 0 00:00:00, Sys 0 00:00:00 - Total RemoteUsage Usr 0 00:00:00, Sys 0 00:00:00 - Total Local Usage 5 - Run Bytes Sent By Job Run Bytes Received By Job 5 - Total Bytes Sent By Job Total Bytes Received By Job Partitionable Resources : Usage Request Cpus : 1 Disk (KB) : Memory (MB) :

2012 OSG User School High Throughput What Isn’t High Throughput?  Quick starting jobs  Very short job runtimes  Micro optimizations What is?  Constant job pressure  Many jobs  Long total workflow times 11

2012 OSG User School Rules of Thumb for OSG Jobs CPU (single-threaded)  Use between 5 minutes and 2 hours wall  Upper limit somewhat soft Disk  Keep scratch working space < 20 Gb  Minimize OSG_APP_DATA usage  Submit disk usage depends on machine  Intermediate needs vs Sinks/Sources 12

2012 OSG User School Rules of Thumb (cont.) Network  Primarily file I/O  Fill in FILE SIZE / RUN TIME HERE Use squid caching where appropriate 13

2012 OSG User School How to apply rules of Thumb Not always clear cut Need to keep rules of thumb in mind May need to join many short processes  Easy with shell wrapper  But careful with error conditions Or break up one big one… 14

2012 OSG User School Golden Rules for DAGs Beware of the shish kabob! Use a single user log Use PRE and POST script RETRY is your friend DAGs of DAGs are good 15

2012 OSG User School Do you have the full DAG Are there manual steps (esp. at beginning and end) that could be automated? 16

2012 OSG User School Start with “Conceptual DAG” 17 A B1 D B2 B3 C1 C2 C3

2012 OSG User School Map to Actual workflow based on resource usage 18 A B1 D B2 B3 C1 C2 C3

2012 OSG User School Two transforms Merging nodes Splitting nodes 19

2012 OSG User School Merging is easy Scripting  Avoids transfer of intermediate files  Careful with error conditions  Debugging can be a bit tricky 20

2012 OSG User School Breaking up is hard to do… Ideally into parallel jobs not always possible Often need to checkpoint Standard universe can help User-defined checkpointing Checkpoint images can be hard to manage 21

2012 OSG User School Putting it all together Should have one functional dag With appropriate run times 22

2012 OSG User School Questions? Questions? Comments?  Feel free to ask me questions later: Upcoming sessions 23