Introduction to Makeflow, Li Yu, University of Notre Dame


1 Introduction to Makeflow
Li Yu
University of Notre Dame

2 Overview
• Distributed systems are hard to use!
• An abstraction is a regular structure that can be efficiently scaled up to large problem sizes.
• Today: Makeflow and Work Queue
  ◦ Makeflow is a workflow engine for executing large, complex workflows on clusters, grids, and clouds.
  ◦ Work Queue is a master/worker framework.
  ◦ Together they are compact, portable, data oriented, good at running lots of small jobs, and use a familiar syntax.

3 General Workflow
[Diagram: a workflow graph in which data files (D1, D2, ... D18) pass through functions F1 to F5 to produce the final output.]

4 Makeflow
• Makeflow is a workflow engine for executing large, complex workflows on clusters, grids, and clouds.
• It can express any Directed Acyclic Graph (DAG).
• It is good at running lots of small jobs.
• Data is treated as a first-class citizen.
• Its syntax is similar to traditional UNIX Make.
• It is fault-tolerant.

5 Application: Data Mining
• Betweenness Centrality
  ◦ Vertices that occur on many shortest paths between other vertices have higher betweenness than those that do not.
  ◦ Application: social network analysis.
  ◦ Complexity: O(n^3), where n is the number of vertices.
[Figure: example graph with the vertex of highest betweenness marked.]
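To make the definition concrete, here is a minimal sketch of betweenness centrality for a small unweighted, undirected graph. It uses Brandes' algorithm, which is an assumption on our part: the slides do not say which algorithm the "algr" jobs run, and Brandes' method costs O(n·m) rather than the O(n^3) quoted above. The function name and the input format are illustrative only.

```python
from collections import deque


def betweenness(graph):
    """Betweenness centrality of every vertex in {vertex: [neighbors]}.

    Assumes an unweighted, undirected graph (each edge listed both ways).
    """
    score = {v: 0.0 for v in graph}
    for s in graph:
        # Breadth-first search from s, counting shortest paths (sigma)
        # and remembering each vertex's predecessors on those paths.
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        preds = {v: [] for v in graph}
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest vertices, accumulating dependencies.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every unordered pair was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}


# A star: every shortest path between two leaves goes through the hub.
star = {"hub": ["x", "y", "z"], "x": ["hub"], "y": ["hub"], "z": ["hub"]}
print(betweenness(star))
```

Each source vertex is processed independently of the others, which is what makes the computation easy to split into many small jobs.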

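6 The Workflow
[Diagram: the input table is processed by many parallel "algr" jobs, producing Output1, Output2, ... OutputN, which an "Add" step merges into the final output.]

Input:
    Vertex      Neighbors
    V1          V2, V5…
    V2          V10, V13
    …           …
    V5500000    V1000, …

Final Output:
    Vertex      Credits
    V1          23
    V2          2355
    …           …
    V5.5M       46923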

7 Size of the Problem
• About 5.5 million vertices
• About 20 million edges
• Each job computes 50 vertices (110K jobs)

Input data format (Vertex, Neighbors; e.g. V1: V2, V5…): raw 250 MB, gzipped 93 MB
Output data format (Vertex, Credits; e.g. V1: 23): raw 30 MB, gzipped 13 MB
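The job count follows directly from the numbers above: 5.5 million vertices at 50 vertices per job. A small sketch of that partitioning, assuming vertices are numbered 1 to n and each job takes a contiguous range (the slides do not say how vertices are assigned to jobs):

```python
def job_ranges(n_vertices, per_job):
    """Yield (first, last) 1-based vertex indices, one pair per job."""
    for first in range(1, n_vertices + 1, per_job):
        yield first, min(first + per_job - 1, n_vertices)


ranges = list(job_ranges(5_500_000, 50))
print(len(ranges))            # number of jobs
print(ranges[0], ranges[-1])  # first and last vertex range
```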

8 The Result
• Resources used:
  ◦ 300 Condor CPU cores
  ◦ 250 SGE CPU cores
• Runtime:
  ◦ 2000 CPU days -> 4 days
  ◦ 500X speedup!
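The speedup figure can be checked with a few lines of arithmetic. The sketch below treats the 2000 CPU days as the sequential running time, which is our reading of the slide; the efficiency figure is derived here and does not appear in the original.

```python
cpu_days_sequential = 2000   # total work, read as single-core running time
wall_days_parallel = 4       # elapsed time on the combined pool
cores = 300 + 250            # Condor cores plus SGE cores

speedup = cpu_days_sequential / wall_days_parallel
efficiency = speedup / cores

print(f"speedup: {speedup:.0f}x on {cores} cores")
print(f"parallel efficiency: {efficiency:.0%}")
```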

9 Application - Biocompute
• Sequence Search and Alignment by Hashing Algorithm (SSAHA)
• Short Read Mapping Package (SHRiMP)
• Genome alignment:

      CGGAAATAATTATTAAGCAA
         |||||||||
      GTCAAATAATTACTGGATCG

• Single nucleotide polymorphism (SNP) discovery

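10 The Workflow
[Diagram: a "Split" step divides the Query into reads; parallel "Align" jobs each take one read plus the Reference and produce Matches1, Matches2, ... MatchesN; a "Combine" step merges them into All Matches.]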

11 Sizes of some real workloads
• Anopheles gambiae: 273 million bases
  ◦ 2.5 million reads consisting of 1.5 billion bases were aligned using SSAHA
• Sorghum bicolor: 738.5 million bases
  ◦ 11.5 million sequences consisting of 11 billion bases were aligned using SSAHA
• 7 million query reads of Oryza rufipogon were aligned to the genome of Oryza sativa using SHRiMP

12 Performance
[Figure only; no text on this slide.]

13 Makeflow Example

    part1 part2 part3: input.data split.py
        ./split.py input.data

    out1: part1 mysim.exe
        ./mysim.exe part1 >out1

    out2: part2 mysim.exe
        ./mysim.exe part2 >out2

    out3: part3 mysim.exe
        ./mysim.exe part3 >out3

    result: out1 out2 out3 join.py
        ./join.py out1 out2 out3 > result
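For more than a handful of parts, rules like these are usually generated rather than typed. The following is a rough sketch of such a generator; the function name is made up for illustration, and it assumes that split.py produces exactly part1 to partN and that command lines are indented with a tab, as in Make.

```python
def split_sim_join(n_parts):
    """Return Makeflow text for a split / simulate / join workflow."""
    parts = [f"part{i}" for i in range(1, n_parts + 1)]
    outs = [f"out{i}" for i in range(1, n_parts + 1)]

    # One rule that splits the input into all parts.
    rules = [f"{' '.join(parts)}: input.data split.py\n\t./split.py input.data\n"]
    # One independent simulation rule per part.
    for part, out in zip(parts, outs):
        rules.append(f"{out}: {part} mysim.exe\n\t./mysim.exe {part} >{out}\n")
    # One rule that joins every output into the final result.
    joined = " ".join(outs)
    rules.append(f"result: {joined} join.py\n\t./join.py {joined} > result\n")
    return "\n".join(rules)


print(split_sim_join(3))
```

Calling `split_sim_join(3)` reproduces the five rules on this slide; a larger argument widens the middle layer of the DAG without changing its shape.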

14 Makeflow Syntax
• A Makeflow script consists of a set of rules.
• Each rule specifies:
  ◦ a set of target files to create;
  ◦ a set of source files needed to create them;
  ◦ a command that generates the target files from the source files.

    out1: part1 mysim.exe
        ./mysim.exe part1 >out1

  Here out1 is the target file, part1 and mysim.exe are the source files, and the indented line is the command.

15 No Phony Rules
• A correct rule:

    out1: part1 mysim.exe
        ./mysim.exe part1 >out1

• An incorrect rule (the source files are not listed):

    out1:
        ./mysim.exe part1 >out1

• Another incorrect rule (the target "clean" is not a file that the command creates):

    clean:
        rm -rf *.o
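One way to see why both incorrect rules fail is to imagine the command running on a remote machine that receives only the declared source files. The sketch below imitates that on the local machine with a scratch directory; it illustrates the idea and is not how Makeflow itself checks rules. The name `check_rule` is hypothetical, `cat` stands in for `./mysim.exe`, and a POSIX shell is assumed.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def check_rule(targets, sources, command, srcdir):
    """Run one rule in a scratch directory holding only its declared sources.

    Returns a list of problems; an empty list means the rule behaved.
    """
    problems = []
    with tempfile.TemporaryDirectory() as scratch:
        # Ship only what the rule declares, as a remote worker would receive.
        for name in sources:
            shutil.copy(Path(srcdir) / name, scratch)
        done = subprocess.run(command, shell=True, cwd=scratch,
                              stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL)
        if done.returncode != 0:
            problems.append(f"command failed with exit status {done.returncode}")
        # Every declared target must exist as a file afterwards.
        for target in targets:
            if not (Path(scratch) / target).exists():
                problems.append(f"target {target} was not created")
    return problems


with tempfile.TemporaryDirectory() as src:
    Path(src, "part1").write_text("data\n")
    print(check_rule(["out1"], ["part1"], "cat part1 >out1", src))  # declared source
    print(check_rule(["out1"], [], "cat part1 >out1", src))         # undeclared source
    print(check_rule(["clean"], [], "rm -rf *.o", src))             # phony target
```

The first call reports no problems; the second fails because part1 was never shipped; the third succeeds as a command but leaves no file named clean behind.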

16

    part1 part2 part3: input.data split.py
        ./split.py input.data

    out3: part3 mysim.exe
        ./mysim.exe part3 >out3

    result: out1 out2 out3 join.py
        ./join.py out1 out2 out3 > result

17 A Real Example: Image Processing
1. Download (from the Internet)
2. Convert
3. Combine into Movie

18 Image Processing - Makeflow Script

    # This is an example of Makeflow.
    CURL=/usr/bin/curl
    CONVERT=/afs/nd.edu/user37/ccl/software/external/imagemagick/bin/convert
    URL=http://www.cse.nd.edu/~ccl/images/a.jpg

    a.montage.gif: a.jpg a.90.jpg a.180.jpg a.270.jpg a.360.jpg
        LOCAL $CONVERT -delay 10 -loop 0 a.jpg a.90.jpg a.180.jpg a.270.jpg a.360.jpg a.270.jpg a.180.jpg a.90.jpg a.montage.gif

    a.90.jpg: a.jpg
        $CONVERT -swirl 90 a.jpg a.90.jpg

    a.180.jpg: a.jpg
        $CONVERT -swirl 180 a.jpg a.180.jpg

    a.270.jpg: a.jpg
        $CONVERT -swirl 270 a.jpg a.270.jpg

    a.360.jpg: a.jpg
        $CONVERT -swirl 360 a.jpg a.360.jpg

    a.jpg:
        LOCAL $CURL -o a.jpg $URL

Callout: comments start with ‘#’.

19 Image Processing - Makeflow Script
[Same script as slide 18, here with URL=http://www.cse.nd.edu/~ccl/images/capitol.jpg.]
Callout: $CONVERT stands for /afs/nd.edu/user37/ccl/software/external/imagemagick/bin/convert

20 Image Processing - Makeflow Script
[Same script as slide 18, here with URL=http://www.cse.nd.edu/~ccl/images/capitol.jpg.]
Callout: LOCAL forces this job to run on the controlling machine.


23 Get the example.makeflow script

    % mkdir /tmp/makeflow
    % cd /tmp/makeflow
    % cp ~lyu2/Public/example.makeflow .
    % cat example.makeflow
    # This is an example of Makeflow.
    CURL=/usr/bin/curl
    CONVERT=/usr/bin/convert
    URL=http://www.cse.nd.edu/~ccl/images/a.jpg

    a.montage.gif: a.jpg a.90.jpg a.180.jpg a.270.jpg a.360.jpg
        LOCAL $CONVERT -delay 10 -loop 0 a.jpg a.90.jpg a.180.jpg a.270.jpg a.360.jpg a.270.jpg a.180.jpg a.90.jpg a.montage.gif
    ……

24 Set up the cctools environment (in csh)
• Set the PATH to use cctools:

    % setenv PATH ~ccl/software/cctools/bin:$PATH

• If the PATH is set correctly:

    % makeflow -h
    Use: makeflow [options]
    Where options are:
     -c    Clean up: remove logfile and all targets.
    ……

• If the PATH is NOT set correctly:

    % makeflow -h
    makeflow: Command not found.

25 Run the Makeflow Script
• Just use the local machine:

    % makeflow example.makeflow

• Output:

    makeflow: checking for duplicate targets...
    makeflow: DAG created.
    makeflow: checking rules for consistency...
    makeflow: Width of DAG: 4
    ………………
    makeflow: nothing left to do.

• Now we can check whether the target file a.montage.gif was created successfully:

    % display a.montage.gif

26 Re-run a Makeflow Script
• If you run it a second time, nothing happens, because all of the target files have already been created:

    % makeflow example.makeflow
    makeflow: nothing left to do

• Use the -c option to clean everything up before trying it again:

    % makeflow -c example.makeflow
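The re-run behavior can be modeled with a very small sketch: a rule still needs to run only if one of its target files is missing. This is a simplified model based on the explanation above, not Makeflow's actual bookkeeping (the -c help text mentions a logfile, so the real tool evidently tracks more state). The rule dictionaries and the function name are illustrative.

```python
import tempfile
from pathlib import Path


def pending_rules(rules, workdir="."):
    """Return the rules that still have at least one missing target file."""
    return [
        rule for rule in rules
        if not all((Path(workdir) / t).exists() for t in rule["targets"])
    ]


rules = [
    {"targets": ["a.jpg"], "command": "$CURL -o a.jpg $URL"},
    {"targets": ["a.90.jpg"], "command": "$CONVERT -swirl 90 a.jpg a.90.jpg"},
]

with tempfile.TemporaryDirectory() as work:
    print(len(pending_rules(rules, work)))   # nothing built yet
    Path(work, "a.jpg").touch()              # pretend the download finished
    print(len(pending_rules(rules, work)))   # only the swirl rule remains
    Path(work, "a.90.jpg").touch()
    print(len(pending_rules(rules, work)))   # nothing left to do
```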

27 Run the Makeflow Script with a Distributed System
• Use a distributed system with the ‘-T’ option:
  ◦ ‘-T condor’: uses the Condor batch system

        % makeflow -T condor example.makeflow

  ◦ Take advantage of the Condor MatchMaker:

        BATCH_OPTIONS=Requirements=(Memory>1024)\n Arch= x86_64

  ◦ ‘-T sge’: uses the Sun Grid Engine

        % makeflow -T sge example.makeflow

  ◦ ‘-T wq’: uses the Work Queue framework

        % makeflow -T wq example.makeflow
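When Makeflow is driven from a script, the command line can be assembled from the options shown so far. The helper below is a hypothetical sketch that only builds the argument list; it uses just the -T and -c options that appear in these slides and accepts only the three batch types listed above.

```python
BATCH_TYPES = {"condor", "sge", "wq"}  # the values of -T shown on this slide


def makeflow_command(script, batch_type=None, clean=False):
    """Build the argument list for a makeflow invocation."""
    cmd = ["makeflow"]
    if clean:
        cmd.append("-c")
    if batch_type is not None:
        if batch_type not in BATCH_TYPES:
            raise ValueError(f"unknown batch type: {batch_type}")
        cmd += ["-T", batch_type]
    cmd.append(script)
    return cmd


print(" ".join(makeflow_command("example.makeflow")))
print(" ".join(makeflow_command("example.makeflow", batch_type="condor")))
print(" ".join(makeflow_command("example.makeflow", clean=True)))
```

The resulting list can be passed to `subprocess.run` on a machine where cctools is on the PATH.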

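28 Makeflow with Work Queue
Start workers on local machines, clusters, via campus grid, etc.
[Diagram: Makeflow reads the DAG and hands work to each worker. For one task it puts the application ("App") and the input file to the worker, tells it to execute "App < Input > Output", and then gets the output file back.]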

29 Ways of starting workers
• Start one worker on your local machine:

    work_queue_worker hostname port

• Start some Condor workers:

    condor_submit_workers hostname port number-of-workers

• Start some SGE workers:

    sge_submit_workers hostname port number-of-workers

30 Make Your Own Cloud
[Diagram: "makeflow -T wq example.makeflow" drawing workers from Condor, SGE, and a cloud. Capacity labels: 1100 cores; unlimited; 4000 cores (but you can only have 250).]

31 Set up the Condor environment
• Set the PATH to use Condor:

    % setenv PATH ~condor/software/bin:$PATH

• If the PATH is set correctly:

    % condor_q
    -- Submitter: cclsubmit00.cse.nd.edu : : cclsubmit00.cse.nd.edu
     ID      OWNER      SUBMITTED      RUN_TIME ST PRI SIZE CMD
    0 jobs; 0 idle, 0 running, 0 held

• If the PATH is NOT set correctly:

    % condor_q
    condor_q: Command not found.

32 Re-run the Makeflow with Work Queue
• Go to the experiment directory and clean things up:

    % cd /tmp/makeflow
    % makeflow -c example.makeflow

• Run the example with Work Queue:

    % condor_submit_workers `hostname` 9123 5
    % makeflow -T wq example.makeflow
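The same sequence can be written down as data, which is handy when the worker count or host name changes between runs. This is a sketch only: the function name is made up, the commands and their argument order are copied from this slide, and it assumes (as the slide implies) that Makeflow listens on the port the workers are pointed at.

```python
import socket


def work_queue_session(script, port=9123, workers=5, host=None):
    """Return the commands from this slide as argument lists, in order."""
    host = host or socket.gethostname()  # plays the role of `hostname`
    return [
        ["makeflow", "-c", script],                               # clean up
        ["condor_submit_workers", host, str(port), str(workers)], # start workers
        ["makeflow", "-T", "wq", script],                         # run with Work Queue
    ]


for cmd in work_queue_session("example.makeflow"):
    print(" ".join(cmd))
```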

33 Google “Makeflow”

34 Contact us
• Li Yu: lyu2@nd.edu
• Peter Bui: pbui@nd.edu
• Prof. Douglas Thain: dthain@nd.edu
• Cooperative Computing Lab: http://www.cse.nd.edu/~ccl

