Doug Haigh, SAS Institute Inc.


1 Divide and Conquer: Writing Parallel SAS® Code to Speed Up Your SAS Program
Doug Haigh, SAS Institute Inc.
Copyright © 2010, SAS Institute Inc. All rights reserved.

2 Introduction
Have you ever wanted to
  Text and drive at the same time?
  Watch the big game and read a book at the same time?
  Be on vacation at the beach and get work done at the office?
Humans are not good at doing two things at the same time, but your SAS code can be.

3 Introduction
Parallel code is code in which two or more streams of execution run at nearly the same time.
(Diagram: streams of execution along a Time axis)

4 Introduction
Parallel SAS code requires SAS/CONNECT
  One CONNECT client to many CONNECT servers
Parallel SAS code using SAS/CONNECT is created by
  SAS Data Integration Studio
  SAS Enterprise Miner
  SAS Enterprise Guide / PROC SCAPROC (SCAPROC = SAS Code Analyzer)
#SASGF15

5 Background
SIGNON / SIGNOFF
  Establish/terminate a connection to a CONNECT server on the
    Same machine (SASCMD SIGNON)
    Remote machine (Spawner SIGNON)
    SAS Grid machine (grid-enabled SIGNON, SAS Grid Manager)
RSUBMIT / ENDRSUBMIT
  Sends SAS code to the CONNECT server for processing
  May or may not wait for the code to complete

6 Simple SIGNON
  OPTIONS SASCMD="!SASCMD";                               /* same machine */
  %let mySess=mySpawnerHost.myDomain.com 1234;            /* remote machine (spawner) */
  %let rc=%sysfunc(grdsvc_enable(mySess,server=SASApp));  /* SAS Grid machine */
  SIGNON mySess;
  RSUBMIT mySess;
    data _NULL_;rc=sleep(5,1);run;
  ENDRSUBMIT;
  SIGNOFF mySess;

7 Simple SIGNON
(Timeline diagram: SIGNON, RSUBMIT, and SIGNOFF between the CONNECT client and the CONNECT server(s) over time)

8 Multiple SIGNONs Synchronous
  SIGNON mySess1;
  SIGNON mySess2;
  RSUBMIT mySess1;
    data _NULL_;rc=sleep(5,1);run;
  ENDRSUBMIT;
  RSUBMIT mySess2;
    data _NULL_;rc=sleep(5,1);run;
  ENDRSUBMIT;
  SIGNOFF mySess1;
  SIGNOFF mySess2;
Code runs on two CONNECT servers but is not parallel.
  SIGNON to mySess2 waits for SIGNON to mySess1
  RSUBMIT to mySess1 waits for SIGNON to mySess2
  RSUBMIT to mySess2 waits for RSUBMIT to mySess1
  SIGNOFF to mySess2 waits for SIGNOFF to mySess1

9 Multiple SIGNONs Synchronous
(Timeline diagram: RSUBMIT, SIGNOFF)

10 Multiple SIGNONs Asynchronous
  SIGNON mySess1 SIGNONWAIT=NO;
  SIGNON mySess2 SIGNONWAIT=NO;
  RSUBMIT mySess1 WAIT=NO;
    data _NULL_;rc=sleep(5,1);run;
  ENDRSUBMIT;
  RSUBMIT mySess2 WAIT=NO;
    data _NULL_;rc=sleep(5,1);run;
  ENDRSUBMIT;
  SIGNOFF _ALL_;
Better, but still not ideal.
  Everything blocks when the RSUBMIT to mySess1 is encountered, because RSUBMIT cannot do anything until its SIGNON completes.
  The RSUBMIT to mySess2 cannot start until the SIGNON to mySess1 has completed, even if the SIGNON to mySess2 is already done.

11 Multiple SIGNONs Asynchronous
(Timeline diagram: RSUBMIT, SIGNOFF)

12 Multiple SIGNONs Asynchronous
(Timeline diagram: RSUBMIT, SIGNOFF)
Worst case: the code for mySess2 has to wait for the signon to mySess1 even though mySess2 is ready.

13 Reusing a Session
  SIGNON mySess1 SWAIT=NO;
  SIGNON mySess2 SWAIT=NO;
  RSUBMIT mySess1 WAIT=NO;
    data _NULL_;rc=sleep(10,1);run;
  ENDRSUBMIT;
  RSUBMIT mySess2 WAIT=NO;
    data _NULL_;rc=sleep(5,1);run;
  ENDRSUBMIT;
  WAITFOR _ALL_ mySess1;
  RSUBMIT mySess1 WAIT=NO;
    /* next unit of work */
  ENDRSUBMIT;
  SIGNOFF _ALL_;
Added WAITFOR to wait for _ALL_ of the listed sessions to finish executing.
Better parallelism, but it still suffers from the length of time mySess1 takes.
It would be better if the code
  Had used mySess2 instead of mySess1
  Had only waited for mySess2 to finish

14 Reusing a Session
(Timeline diagram: SIGNON, RSUBMIT, SIGNOFF)

15 Reusing a Session
(Timeline diagram: SIGNON, RSUBMIT, SIGNOFF)
Worst case: the RSUBMIT is coded to go to mySess1 even though it could have been processed earlier on mySess2.

16 Reusing an Available Session
  SIGNON mySess1 SWAIT=NO;
  SIGNON mySess2 SWAIT=NO;
  RSUBMIT mySess1 WAIT=NO CMACVAR=myVar1;
    data _NULL_;rc=sleep(10,1);run;
  ENDRSUBMIT;
  RSUBMIT mySess2 WAIT=NO CMACVAR=myVar2;
    data _NULL_;rc=sleep(5,1);run;
  ENDRSUBMIT;
  WAITFOR _ANY_ mySess1 mySess2;
  %determineAvailableSession(2);
  RSUBMIT mySess&openSess WAIT=NO;
    /* next unit of work */
  ENDRSUBMIT;
  SIGNOFF _ALL_;
Added CMACVAR to name the macro variable that is updated with RSUBMIT progress.
Added determineAvailableSession to determine which session completed.
Macro variable values
  0: SIGNON or RSUBMIT completed
  1: SIGNON or RSUBMIT failed
  2: Already signed on (SIGNON), or RSUBMIT in progress
  3: SIGNON in progress
Much better parallelism
  WAITFOR _ANY_ waits for the first of the listed sessions to complete
  determineAvailableSession tells which session to use next

17 Reusing an Available Session
  %macro determineAvailableSession(numSessions);
    %global openSess;
    %do sess=1 %to &numSessions;
      %if (&&myVar&sess eq 0) %then %do;
        %let openSess=&sess;
        %let sess=&numSessions;   /* force the loop to end */
      %end;
    %end;
  %mend determineAvailableSession;
Loops through all of the session macro variables looking for one that is complete.

18 Reusing an Available Session
(Timeline diagram: SIGNON, RSUBMIT, SIGNOFF)
Better parallelism, but it still suffers from the length of time the SIGNONs take.

19 Reusing an Available Session
(Timeline diagram: SIGNON, RSUBMIT, SIGNOFF)
Worst case: the SIGNON to mySess2 finishes before the SIGNON to mySess1, so the initial code, which is directed at a specific host, has to wait.
It would be better if the code
  Had used mySess2 instead of mySess1
  Had only waited for mySess2 to finish

20 Reusing the Best Available Session
  SIGNON mySess1 SWAIT=NO CMACVAR=mySignonVar1;
  …
  SIGNON mySessN SWAIT=NO CMACVAR=mySignonVarN;
  %waitForAvailableSession(N);
  RSUBMIT mySess&openSess WAIT=NO CMACVAR=myVar&openSess;
    data _NULL_;rc=sleep(10,1);run;
  ENDRSUBMIT;
  %waitForAvailableSession(N);
  RSUBMIT mySess&openSess WAIT=NO CMACVAR=myVar&openSess;
    data _NULL_;rc=sleep(1,1);run;
  ENDRSUBMIT;
  SIGNOFF _ALL_;
Best use of parallelism
  Code uses the first available session in all cases
  PROC SCAPROC generates code like this
Challenge
  How to do one-time initialization

21 Reusing the Best Available Session
  %macro waitForAvailableSession(numSessions);
    %global openSess;
    %let sessFound=0;
    %do %while (&sessFound eq 0);
      %do sess=1 %to &numSessions;
        %if (&&mySignonVar&sess eq 0) %then
          %if (&&myVar&sess eq 0) %then %do;
            %let openSess=&sess;
            %let sess=&numSessions;   /* force the inner loop to end */
            %let sessFound=1;
          %end;
      %end;
      /* nothing available yet: pause one second, then poll again */
      %if (&sessFound eq 0) %then %let rc=%sysfunc(sleep(1,1));
    %end;
  %mend waitForAvailableSession;
Make sure the macro variables are initialized before this macro is called:
  mySignonVarX initialized to 3
  myVarX initialized to 0
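The initialization that waitForAvailableSession depends on could be sketched roughly as follows. The macro name initSessionVars is hypothetical and does not come from the presentation. Only the variable names mySignonVarX and myVarX and the starting values 3 and 0 are taken from the slide.

```sas
/* Hypothetical helper (not from the presentation): give every session's
   status variables the starting values the slide calls for.           */
%macro initSessionVars(numSessions);
  %local sess;
  %do sess=1 %to &numSessions;
    %global mySignonVar&sess myVar&sess;
    %let mySignonVar&sess=3;   /* 3 = SIGNON in progress */
    %let myVar&sess=0;         /* 0 = no RSUBMIT outstanding */
  %end;
%mend initSessionVars;

/* Example call for four sessions, made before the SIGNON statements */
%initSessionVars(4);
```

Calling it once before the first SIGNON means the polling loop never references a status variable that does not exist yet.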

22 Reusing the Best Available Session
(Timeline diagram: SIGNON, RSUBMIT, SIGNOFF)
Best parallelism

23 How about a macro to do all of this…
  Perform SIGNONs as needed
    SASCMD or Grid
  Retry SIGNONs if one fails
  Manage RSUBMITs to available hosts
  Retry RSUBMITs if one fails
  Display progress of RSUBMITs
  SIGNOFF hosts when no more work exists
  … user when done

24 %Distribute
Determine the code that needs to be executed once when SIGNON completes
  LIBNAME, FILENAME
Create code that can run at each iteration
  Base iteration differences on the macro variables provided
Set up the %Distribute parameters
Run

Predefined global macro variables
  Rem_Host       Name of this host
  Rem_iHost      Index of this host out of all hosts
  Rem_Seed       Random seed, rerandomized for each chunk
  Rem_NIterAll   Total number of iterations
  Rem_NIter      Number of iterations in a chunk
  Rem_JobIters   Number of iterations in this chunk *
  Rem_JobID      Chunk number *
  GlobalNSub     Number of iterations already submitted *
  * = Available only in TaskRSub

25 %Distribute
Signing on...
  GridDistribute: Maximum number of nodes is 4
  GridDistribute: Signing on to Host #1
  GridDistribute: Signing on to Host #2
  GridDistribute: Signing on to Host #3
  GridDistribute: Signing on to Host #4
  Stat: [0:00:00] ???? (0/0)
  GridDistribute: Host #1 is host2.mydomain.com
  GridDistribute: Host #2 is host4.mydomain.com
  GridDistribute: Host #3 is host1.mydomain.com
  GridDistribute: Host #4 is host3.mydomain.com
Processing...
  Stat: [0:00:02] !!!. (0/0)
  Stat: [0:00:02] .... (8000/0)
  Stat: [0:00:05] !!!! (8000/8000)
  <similar lines deleted>
  Stat: [0:00:14] ...! (100000/94000)

26 Summary
Writing parallel SAS code can significantly speed up processing
Some SAS products will do it for you
See the paper for discussion of additional considerations
  Information movement
  Data movement
  Output management
  RSUBMIT and the SAS Macro Facility

27 Questions?

28 Session ID #1935

30 Additional Considerations
Information movement
  %SYSLPUT / %SYSRPUT for macro variables
    %SYSLPUT remVar=&localVar /REMOTE=mySess1;
    RSUBMIT mySess1;
      %SYSRPUT localVar=&remVar;
    ENDRSUBMIT;

31 Additional Considerations
Data movement
  Shared file system / RDBMS
  PROC UPLOAD / PROC DOWNLOAD
  RLS (Remote Library Services)
Output movement
  Log and List files
    LOG=, LIST=
    PROC PRINTTO
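As a rough illustration of two of the options listed here, the sketch below uses PROC DOWNLOAD to copy a data set back to the client and the LOG= option of RSUBMIT to route the remote log to a file. It assumes mySess1 is already signed on. The data set name work.results and the file name mySess1.log are hypothetical and do not come from the presentation.

```sas
/* Assumes a prior SIGNON mySess1; data set and file names are hypothetical */
RSUBMIT mySess1 WAIT=NO LOG="mySess1.log";
  /* Work done on the CONNECT server */
  data work.results;
    x=1;
  run;
  /* Copy the server's data set into the client's WORK library */
  proc download data=work.results out=work.results;
  run;
ENDRSUBMIT;

/* Let the asynchronous RSUBMIT finish before using work.results locally */
WAITFOR _ALL_ mySess1;
```

With a shared file system or an RDBMS, the download step would be unnecessary because both sides can read the same storage.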

32 Additional Considerations
RSUBMIT and the SAS Macro Facility
    RSUBMIT mySess1;
      %SYSRPUT localVar=&remVar;
    ENDRSUBMIT;
  The %SYSRPUT statement needs to be quoted
    %NRSTR(%%)SYSRPUT localVar=&remVar;
  or wrapped in a macro
    %MACRO updateVar;
      %SYSRPUT localVar=&remVar;
    %MEND;
    %updateVar;

