Controlling Input and Output

1 Controlling Input and Output
3 Controlling Input and Output: Outputting Multiple Observations, Writing to Multiple SAS Data Sets, Selecting Variables and Observations

2 Outputting Multiple Observations
3 Outputting Multiple Observations

3 A Forecasting Application
The growth rate of each of six departments is stored in the Increase variable in the data set orion.growth. If each department grows at its predicted rate for the next two years, how many employees will be in each department at the end of each year?
libname orion "/folders/myfolders/sasdata/prg2";
proc print data=orion.growth;
run;

4 The output SAS data set, forecast, should contain 12 observations
Two observations are written for each of the six observations read, one for each forecast year.

5 The end of a DATA step iteration
data forecast;
   set orion.growth;
   total_employees=total_employees*(1+increase);
run;
This step has no OUTPUT statement, so each iteration ends with an implicit OUTPUT followed by an implicit RETURN.
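The implicit OUTPUT can be seen in a small run. The sketch below assumes a stand-in data set, work.growth, built from the three orion.growth rows that appear in the execution slides; the name work.growth and the use of DATALINES are illustrative choices, not part of the course data.

```sas
/* Hypothetical stand-in for orion.growth (three rows from the slides) */
data work.growth;
   length Department $ 40;
   input Department $ Total_Employees Increase;
   datalines;
Administration 34 0.25
Engineering 9 0.30
IS 25 0.10
;
run;

/* No OUTPUT statement: SAS writes one observation per iteration
   through the implicit OUTPUT at the end of the step */
data work.forecast;
   set work.growth;
   Total_Employees=Total_Employees*(1+Increase);
run;

proc print data=work.forecast;
run;
```

With three observations read, work.forecast should also contain three observations, one per department.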

6 Explicit Output
The explicit OUTPUT statement writes the contents of the program data vector (PDV) to the data set or data sets being created. The presence of an explicit OUTPUT statement overrides implicit output.
data forecast;
   set orion.growth;
   Year=1;
   Total_Employees=Total_Employees*(1+Increase);
   output;
   Year=2;
   Total_Employees=Total_Employees*(1+Increase);
   output;
run;
There is no implicit OUTPUT at the end of this step.

7 Compilation and Execution of the Data Step

8 Compilation
data forecast;
   set orion.growth;
   Year=1;
   Total_Employees=Total_Employees*(1+Increase);
   output;
   Year=2;
   Total_Employees=Total_Employees*(1+Increase);
   output;
run;
The SET statement adds the variables of orion.growth to the PDV (program data vector).
PDV
Variable         Type  Length
Department       $     40
Total_Employees  N     8
Increase         N     8

9 Compilation
The program is scanned to see what other variables are needed. In this case Year is the only other variable.
PDV
Variable         Type  Length
Department       $     40
Total_Employees  N     8
Increase         N     8
Year             N     8

10 Compilation
At the end of compilation, the descriptor portion of the output data set is written.
work.forecast, descriptor portion
Variable         Type  Length
Department       $     40
Total_Employees  N     8
Increase         N     8
Year             N     8

11 Execution: Explicit Output
orion.growth
Department      Total_Employees  Increase
Administration  34               0.25
Engineering     9                0.30
IS              25               0.10
data forecast;
   set orion.growth;
   Year=1;
   Total_Employees=Total_Employees*(1+Increase);
   output;
   Year=2;
   Total_Employees=Total_Employees*(1+Increase);
   output;
run;
Initialize the PDV.
PDV
Department      Total_Employees  Increase  Year
(blank)         .                .         .

12 Execution: Explicit Output
Read the first observation from the input data set.
PDV
Department      Total_Employees  Increase  Year
Administration  34               0.25      .

13 Execution: Explicit Output
Assign Year=1.
PDV
Department      Total_Employees  Increase  Year
Administration  34               0.25      1

14 Execution: Explicit Output
Compute Total_Employees: 34 * (1 + 0.25) = 42.5.
PDV
Department      Total_Employees  Increase  Year
Administration  42.5             0.25      1

15 Execution: Explicit Output
Output the current observation.
PDV
Department      Total_Employees  Increase  Year
Administration  42.5             0.25      1

16 Execution: Explicit Output
Assign Year=2.
PDV
Department      Total_Employees  Increase  Year
Administration  42.5             0.25      2

17 Execution: Explicit Output
Compute Total_Employees: 42.5 * (1 + 0.25) = 53.125.
PDV
Department      Total_Employees  Increase  Year
Administration  53.125           0.25      2

18 Execution: Explicit Output
Output the current observation.
PDV
Department      Total_Employees  Increase  Year
Administration  53.125           0.25      2

19 Execution: Explicit Output
No implicit output occurs at the end of the iteration because the step contains an explicit OUTPUT statement.
PDV
Department      Total_Employees  Increase  Year
Administration  53.125           0.25      2

20 The forecast data set contains two observations after the first iteration of the DATA step.
work.forecast
Department      Total_Employees  Increase  Year
Administration  42.500           0.25      1
Administration  53.125           0.25      2

21 Prior to the second iteration of the DATA step, some variables in the program data vector will be reinitialized.
PDV at the end of the first iteration
Department      Total_Employees  Increase  Year
Administration  53.125           0.25      2

22 Execution: Explicit Output
orion.growth
Department      Total_Employees  Increase
Administration  34               0.25
Engineering     9                0.30
IS              25               0.10
Reinitialize the PDV. Year is set to missing; the variables read by the SET statement keep their values until the next observation is read.
PDV
Department      Total_Employees  Increase  Year
Administration  53.125           0.25      .

23 Execution: Explicit Output
Read the next observation from the input data set orion.growth.
PDV
Department      Total_Employees  Increase  Year
Engineering     9                0.30      .

24 Execution: Explicit Output
Assign Year=1.
PDV
Department      Total_Employees  Increase  Year
Engineering     9                0.30      1

25 Execution: Explicit Output
Continue in the same way until the end of the input data set (EOF) is reached.
PDV
Department      Total_Employees  Increase  Year
Engineering     9                0.30      1
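The PDV walk-through above can be reproduced in the SAS log. This is a minimal sketch that assumes a stand-in data set, work.growth, holding the three rows shown in the slides; the PUTLOG statements are added only for tracing and are not part of the course program.

```sas
/* Hypothetical stand-in for orion.growth (three rows from the slides) */
data work.growth;
   length Department $ 40;
   input Department $ Total_Employees Increase;
   datalines;
Administration 34 0.25
Engineering 9 0.30
IS 25 0.10
;
run;

data work.forecast;
   set work.growth;
   Year=1;
   Total_Employees=Total_Employees*(1+Increase);
   /* Show the PDV values just before the first explicit OUTPUT */
   putlog 'Before first OUTPUT:  ' Department= Total_Employees= Increase= Year=;
   output;
   Year=2;
   Total_Employees=Total_Employees*(1+Increase);
   /* Show the PDV values just before the second explicit OUTPUT */
   putlog 'Before second OUTPUT: ' Department= Total_Employees= Increase= Year=;
   output;
run;
```

For Administration the log should show Total_Employees=42.5 with Year=1 and Total_Employees=53.125 with Year=2, matching the slides, and work.forecast should hold two observations per department.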

26 Modify the DATA step to write only one observation per department, showing the number of employees after two years.
data forecast;
   set orion.growth;
   Year=1;
   Total_Employees=Total_Employees*(1+Increase);
   Year=2;
   Total_Employees=Total_Employees*(1+Increase);
   output;
run;
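Because the growth compounds, the two-year total can also be computed in a single assignment. The sketch below is an alternative formulation, not the course solution; work.growth is a hypothetical stand-in built from the three rows shown in the slides.

```sas
/* Hypothetical stand-in for orion.growth (three rows from the slides) */
data work.growth;
   length Department $ 40;
   input Department $ Total_Employees Increase;
   datalines;
Administration 34 0.25
Engineering 9 0.30
IS 25 0.10
;
run;

/* One observation per department, written by the implicit OUTPUT */
data work.forecast_year2;
   set work.growth;
   Year=2;
   /* Two years of compound growth: 34*(1.25**2)=53.125 for Administration */
   Total_Employees=Total_Employees*(1+Increase)**2;
run;

proc print data=work.forecast_year2;
run;
```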

27 Writing to Multiple SAS Data Sets

28 Use the orion.employee_addresses data set as input to create three new data sets: usa, australia, and other. The usa data set will contain observations with a Country value of US. The australia data set will contain observations with a Country value of AU. Observations with any other Country value will be written to the other data set.

29 The Input Data Set
libname orion "/folders/myfolders/sasdata/prg2";
proc print data=orion.employee_addresses;
   var employee_name city country;
run;

30 Creating Multiple DATA Sets
Multiple data sets can be created in a DATA step by listing the names of the output data sets in the DATA statement. You can direct output to a specific data set or data sets by listing the data set names in the OUTPUT statement. An OUTPUT statement without arguments writes to every SAS data set listed in the DATA statement.
DATA <SAS-data-set-1 … SAS-data-set-n>;
OUTPUT <SAS-data-set-1 … SAS-data-set-n>;
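The difference between an OUTPUT statement with and without a data set name can be seen in a step with no input at all. This is a minimal sketch; the data set names work.both and work.second_only and the variable Value are made up for illustration.

```sas
data work.both work.second_only;
   Value=1;
   output;                    /* no argument: written to both data sets */
   Value=2;
   output work.second_only;   /* written only to work.second_only */
run;

proc print data=work.both;
run;

proc print data=work.second_only;
run;
```

The step executes once because it reads no input, so work.both receives one observation and work.second_only receives two.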

31 Creating Multiple SAS Data Sets
libname orion "/folders/myfolders/sasdata/prg2";
data usa australia other;
   set orion.employee_addresses;
   if Country='AU' then output australia;
   else if Country='US' then output usa;
   else output other;
run;
Check the SAS log.

32 Efficient Conditional Processing
data usa australia other;
   set orion.employee_addresses;
   if Country='US' then output usa;
   else if Country='AU' then output australia;
   else output other;
run;
It is more efficient to check values in order of decreasing frequency.
NOTE: There were 424 observations read from the data set ORION.EMPLOYEE_ADDRESSES.
NOTE: The data set WORK.USA has 311 observations and variables.
NOTE: The data set WORK.AUSTRALIA has 105 observations and variables.
NOTE: The data set WORK.OTHER has 8 observations and variables.
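To choose the order of the conditions, the frequency of each Country value can be checked before the DATA step is written. A minimal sketch, assuming the orion library path used in these slides is available on your system:

```sas
libname orion "/folders/myfolders/sasdata/prg2";

/* ORDER=FREQ lists the most frequent Country values first */
proc freq data=orion.employee_addresses order=freq;
   tables Country / nocum;
run;
```

The values at the top of the table are the ones to test first in the IF-THEN/ELSE chain.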

33 Displaying Multiple SAS Data Sets
The PRINT procedure can print only one data set per step, so a separate PROC PRINT step is required for each data set.
title 'Employees in the United States';
proc print data=usa;
run;
title 'Employees in Australia';
proc print data=australia;
run;
title 'Non US and AU Employees';
proc print data=other;
run;
title;

34 Using a SELECT Group
An alternate form of conditional execution uses a SELECT group. The select-expression specifies any SAS expression that evaluates to a single value. Often a variable name is used as the select-expression.
SELECT <(select-expression)>;
   WHEN-1 (value-1 <…,value-n>) statement;
   <…WHEN-n (value-1 <…,value-n>) statement;>
   <OTHERWISE statement;>
END;
Use at least one WHEN statement in a SELECT group.

35 Using a SELECT Group
The previous task could be rewritten using a SELECT group. SELECT processes the WHEN statements from top to bottom, so it is more efficient to check the values in order of decreasing frequency.
data usa australia other;
   set orion.employee_addresses;
   select (Country);
      when ('US') output usa;
      when ('AU') output australia;
      otherwise output other;
   end;
run;

36 The OTHERWISE Statement
The OTHERWISE statement is optional, but omitting it will result in an error when all WHEN conditions are false. Use OTHERWISE followed by a null statement to prevent SAS from issuing an error message.
SELECT <(select-expression)>;
   WHEN-1 (value-1 <…,value-n>) statement;
   <…WHEN-n (value-1 <…,value-n>) statement;>
   <OTHERWISE statement;>
END;
Null OTHERWISE statement:
   otherwise;

37 An OTHERWISE statement is needed
data usa australia;
   set orion.employee_addresses;
   select (Country);
      when ('US') output usa;
      when ('AU') output australia;
   end;
run;
There are values besides 'US' and 'AU', so there is a chance that none of the WHEN conditions will be true. In this case an OTHERWISE statement is required.

38 An OTHERWISE statement is required
data usa australia;
   set orion.employee_addresses;
   select (Country);
      when ('US') output usa;
      when ('AU') output australia;
      otherwise;
   end;
run;
The null OTHERWISE statement handles the values other than 'US' and 'AU' without writing those observations to any data set.

39 Test for Multiple Values in a WHEN Statement
data usa australia other;
   set orion.employee_addresses;
   select (Country);
      when ('US','us') output usa;
      when ('AU','au') output australia;
      otherwise output other;
   end;
run;

40 Using Functions in a Select Expression
data usa australia other;
   set orion.employee_addresses;
   select (upcase(Country));
      when ('US') output usa;
      when ('AU') output australia;
      otherwise output other;
   end;
run;

41 Omitting the Select Expression
The select-expression can be omitted in a SELECT group. Each WHEN expression then evaluates to true or false. If true, the associated statement(s) are executed. If false, SAS proceeds to the next WHEN statement. If all WHEN expressions are false, then the statement(s) following the OTHERWISE statement will execute.
SELECT;
   WHEN (expression-1) statement;
   <…WHEN (expression-n) statement;>
   <OTHERWISE statement;>
END;
If more than one WHEN expression is true, only the first WHEN statement is used; once a WHEN expression is true, no other WHEN expressions are evaluated.
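The first-match rule can be checked with a SELECT group in which two WHEN expressions are true at once. This is a small sketch; the variable x and the log messages are made up for illustration.

```sas
data _null_;
   x=5;
   select;
      when (x>0) putlog 'first WHEN executed';   /* true, so this one runs */
      when (x>3) putlog 'second WHEN executed';  /* also true, but never evaluated */
      otherwise putlog 'no WHEN was true';
   end;
run;
```

Only the message from the first WHEN statement is written to the log.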

42 Omitting the Select Expression
data usa australia other;
   set orion.employee_addresses;
   select;
      when (country='US') output usa;
      when (country='AU') output australia;
      otherwise output other;
   end;
run;

43 Using DO-END in a SELECT Group
data usa australia other;
   set orion.employee_addresses;
   select (country);
      when ('US') do;
         Benefits=1;
         output usa;
      end;
      when ('AU') do;
         Benefits=2;
         output australia;
      end;
      otherwise do;
         Benefits=0;
         output other;
      end;
   end;
run;
The observations are routed to the same data sets as in the previous examples; each data set also contains the new Benefits variable.

44 Selecting Variables and Observations
3 Selecting Variables and Observations

45 Create three data sets that are subsets of orion.employee_addresses
The subsets are based on the value of the Country variable. Name the data sets usa, australia, and other, and write different variables to each output data set.

46 Controlling Variable Output (Review)
By default, the SAS System writes all variables from the input data set to every output data set. In the DATA step, the DROP and KEEP statements can be used to control which variables are written to output data sets.

47 The DROP and KEEP Statements (Review)
Input SAS data set or raw data file -> PDV -> DROP and KEEP statements -> Output SAS data set

48 Display Information About the Variables
libname orion "/folders/myfolders/sasdata/prg2";
proc contents data=orion.employee_addresses;
run;

49 The DROP Statement
Use a DROP statement to drop the variable Street_ID.
data usa australia other;
   drop Street_ID;
   set orion.employee_addresses;
   if Country='US' then output usa;
   else if Country='AU' then output australia;
   else output other;
run;

50 Check the SAS Log
Street_ID was dropped from every output data set. Each output data set has eight variables.

51 Controlling Variable Output
What if you want to drop Street_ID and Country from usa, drop Street_ID, Country, and State from australia, and keep all variables in other?

52 The DROP and KEEP statements affect every data set named in the DATA statement, so they cannot drop different variables from different output data sets. Use the DROP= or KEEP= data set option for more flexibility.

53 The DROP= Data Set Option
The DROP= data set option can be used to exclude variables from an output data set. The specified variables are not written to the output data set; however, all variables are available for processing.
General form of the DROP= data set option:
SAS-data-set(DROP=variable-1 <variable-2 …variable-n>)
This slide refers to output data sets. Input data sets are covered later.

54 Using the DROP= Data Set Option
data usa(drop=Street_ID Country)
     australia(drop=Street_ID State Country)
     other;
   set orion.employee_addresses;
   if Country='US' then output usa;
   else if Country='AU' then output australia;
   else output other;
run;

55 The KEEP= Data Set Option
The KEEP= data set option can be used to specify which variables to write to an output data set. Only the specified variables are written to the output data set; however, all variables are available for processing.
General form of the KEEP= data set option:
SAS-data-set(KEEP=variable-1 <variable-2 …variable-n>)
This slide refers to output data sets. Input data sets are covered later.

56 Using the DROP= and KEEP= Options
data usa(keep=Employee_Name City State)
     australia(drop=Street_ID State)
     other;
   set orion.employee_addresses;
   if Country='US' then output usa;
   else if Country='AU' then output australia;
   else output other;
run;
If DROP and KEEP are both present, DROP takes effect first.

57 The data set orion.employee_addresses contains nine variables
How many variables will be in the usa, australia, and other data sets?
data usa(keep=Employee_Name City State Country)
     australia(drop=Street_ID State Country)
     other;
   set orion.employee_addresses;
   if Country='US' then output usa;
   else if Country='AU' then output australia;
   else output other;
run;
Answer: usa has 4 variables (the four kept), australia has 6 (nine minus the three dropped), and other has all 9.
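The variable counts can be confirmed after the step runs by querying the DICTIONARY.TABLES view. A minimal sketch, assuming the orion library path from these slides is available and that one-level data set names resolve to the WORK library:

```sas
libname orion "/folders/myfolders/sasdata/prg2";

data usa(keep=Employee_Name City State Country)
     australia(drop=Street_ID State Country)
     other;
   set orion.employee_addresses;
   if Country='US' then output usa;
   else if Country='AU' then output australia;
   else output other;
run;

/* NVAR holds the number of variables in each data set */
proc sql;
   select memname, nvar
      from dictionary.tables
      where libname='WORK' and memname in ('USA','AUSTRALIA','OTHER');
quit;
```

PROC CONTENTS on each data set would give the same information along with the variable names.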

58 Controlling Variable Input
When the KEEP= data set option is associated with an input data set, only the variables specified are read into the PDV and therefore are available for processing.
data usa;
   set orion.employee_addresses
       (keep=Employee_ID Employee_Name City State Postal_Code);
   <additional SAS statements>
run;
Only the specified variables are in the PDV:
Employee_ID  Employee_Name  City  State  Postal_Code

59 Controlling Variable Input
When the DROP= data set option is associated with an input data set, the specified variables are not read into the PDV, and therefore are not available for processing.
data usa;
   set orion.employee_addresses
       (drop=Street_ID Street_Number Street_Name Country);
   <additional SAS statements>
run;
The dropped variables are not in the PDV:
Employee_ID  Employee_Name  City  State  Postal_Code

60 Affects contents of PDV
Input SAS data set -> DROP= and KEEP= on an input data set (affects contents of PDV) -> PDV
Raw data file -> PDV
PDV -> DROP and KEEP statements -> DROP= and KEEP= on an output data set -> Output SAS data set
If a DROP or KEEP statement is used at the same time as a data set option, the statement is applied first.

61 What happens?
data usa australia(drop=State) other;
   set orion.employee_addresses
       (drop=Country Street_ID Employee_ID);
   if Country='US' then output usa;
   else if Country='AU' then output australia;
   else output other;
run;

62 Every observation is written to other
Country is dropped on input, so it is not available for processing. SAS creates a Country variable when it compiles the statement that begins with if Country=. This variable is initialized to missing and is never assigned a value. As a result, Country will never be US or AU, and all observations will be written to the data set other.
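One possible way to avoid this problem is to leave Country out of the input DROP= list so that it is read into the PDV, and to drop it on output instead. The sketch below assumes the orion library path from these slides; which output data sets should omit Country is an assumption made for illustration.

```sas
libname orion "/folders/myfolders/sasdata/prg2";

/* Country is read into the PDV, so the IF conditions can test it.
   It is removed only when the observations are written. */
data usa(drop=Country)
     australia(drop=State Country)
     other;
   set orion.employee_addresses(drop=Street_ID Employee_ID);
   if Country='US' then output usa;
   else if Country='AU' then output australia;
   else output other;
run;
```

Variables that are never needed for processing, such as Street_ID and Employee_ID here, can still be dropped on input.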

63 Controlling Which Observations Are Read
By default, SAS processes every observation in a SAS data set, from the first observation to the last. The FIRSTOBS= and OBS= data set options can be used to control which observations are processed. FIRSTOBS= and OBS= are used with input data sets. You cannot use either option with output data sets. The FIRSTOBS= and OBS= data set options override the corresponding SAS system options.

64 The OBS= Data Set Option
The OBS= data set option specifies an ending point for processing an input data set. This option specifies the number of the last observation to process, not how many observations should be processed.
General form of the OBS= data set option:
SAS-data-set(OBS=n)
The OBS= data set option overrides the OBS= system option for the individual data set. To guarantee that SAS processes all observations from a data set, you can use the following syntax:
SAS-data-set(OBS=MAX)

65 The OBS= Data Set Option
data australia;
   set orion.employee_addresses (obs=100);
   if Country='AU' then output;
run;

66 The FIRSTOBS= Data Set Option
The FIRSTOBS= data set option specifies a starting point for processing an input data set. This option specifies the number of the first observation to process. FIRSTOBS= and OBS= are often used together to define a range of observations to be processed.
General form of the FIRSTOBS= data set option:
SAS-data-set(FIRSTOBS=n)
n specifies a positive integer that is less than or equal to the number of observations in the data set. The FIRSTOBS= data set option overrides the FIRSTOBS= system option for the individual data set. To guarantee that SAS processes all observations from the start of a data set, you can use the following syntax:
SAS-data-set(FIRSTOBS=1)

67 Using OBS= and FIRSTOBS= Data Set Options
proc contents data=orion.employee_addresses;
run;
data australia;
   set orion.employee_addresses (firstobs=50 obs=100);
   if Country='AU' then output;
run;
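Because OBS= names the last observation rather than a count, FIRSTOBS=50 with OBS=100 reads observations 50 through 100, which is 51 observations. A small sketch with a generated data set shows this; the names work.numbers, work.subset, i, and Count are made up for illustration.

```sas
/* Generate 200 observations with i = 1, 2, ..., 200 */
data work.numbers;
   do i=1 to 200;
      output;
   end;
run;

/* Reads observations 50 through 100 inclusive */
data work.subset;
   set work.numbers(firstobs=50 obs=100);
run;

/* Write the number of observations in work.subset to the log */
data _null_;
   if 0 then set work.subset nobs=Count;
   putlog Count=;
   stop;
run;
```

The log should show Count=51.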

68 Using OBS= and FIRSTOBS= in a PROC Step
The FIRSTOBS= and OBS= data set options can also be used in SAS procedures. The PROC PRINT step below begins processing at observation 10 and ends after observation 15.
proc print data=orion.employee_addresses (firstobs=10 obs=15);
   var Employee_Name City State Country;
run;

69 Adding a WHERE Statement
When FIRSTOBS= or OBS= and WHERE are used together, the subsetting WHERE is applied first, and FIRSTOBS= and OBS= are then applied to the resulting observations. The following step includes a WHERE statement and an OBS= option.
proc print data=orion.employee_addresses (obs=10);
   where Country='AU';
   var Employee_Name City Country;
run;
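The order of application can be checked with a generated data set in which the answer is easy to predict. In this sketch the data set work.numbers and the variable i are made up for illustration.

```sas
/* Generate 20 observations with i = 1, 2, ..., 20 */
data work.numbers;
   do i=1 to 20;
      output;
   end;
run;

/* WHERE is applied first, then OBS=3 limits the qualifying observations */
proc print data=work.numbers(obs=3);
   where i>10;
run;
```

The report shows the observations with i equal to 11, 12, and 13. If OBS=3 were applied before the WHERE statement, no observations would qualify.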

70 An Aside -- proc surveyselect
proc surveyselect data=orion.employee_addresses
                  method=srs n=10
                  out=simplerandomsample seed=12345;
run;
title "Simple random sample";
proc print data=simplerandomsample;
   var employee_id country;
run;

71 proc surveyselect with a WHERE statement
proc surveyselect data=orion.employee_addresses
                  method=srs n=10
                  out=simplerandomsample seed=12345;
   where country="US";
run;
title "Simple random sample of US employees";
proc print data=simplerandomsample;
   var employee_id country;
run;

72 Bootstrap sampling
data tmp;
   input x y @@;
   datalines;
;
run;
proc print data=tmp;
run;

73 A single bootstrap sample
proc surveyselect data=tmp method=urs n=7 outhits
                  out=tmp1(drop=numberhits) seed= ;
run;
proc print data=tmp1;
run;

74 Multiple bootstrap samples
proc surveyselect data=tmp method=urs n=7 outhits
                  out=tmp1(drop=numberhits) seed= reps=3;
run;
proc print data=tmp1;
run;

