Chapter 11 Reading SAS Data


1 Chapter 11 Reading SAS Data
Objectives:
- Create a new SAS data set from existing SAS data sets
- Use BY groups to process observations
- Read observations by observation number (direct access)
- Stop processing when necessary
- Explicitly write observations to output
- Understand how SAS data sets are read

2 Reading a Single SAS data set
The simplest syntax for reading a SAS data set is:
DATA sas-data-set;
  SET sas-data-set;
This chapter discusses a variety of techniques for reading a SAS data set.
General syntax:
DATA sas-data-set <(drop= keep= )>;
  SET sas-data-set <(drop= keep= firstobs= obs= ) END= POINT= >;
  BY variables;
  SAS statements;
RUN;
NOTE: DROP=, KEEP=, FIRSTOBS= and OBS= in the SET statement were discussed previously.

3 Controlling Variable Input using KEEP=, DROP=
Flow of variables through the DATA step:
Input SAS data set or raw data file
  -> DROP= and KEEP= data set options in the "read" statement
  -> PDV
  -> DROP and KEEP statements; DROP= and KEEP= data set options associated with an output data set
  -> Output SAS data set

4 Example of using DROP=, KEEP= in both Data Statement and SET statement.
SAS applies data set options to input data sets before it evaluates programming statements or applies data set options to output data sets.
In the following example, mylib.military is the input SAS data set and army is the output SAS data set.
(drop=City State Country) applies to the INPUT data set, mylib.military. These variables are not read from mylib.military.
(keep=Code Airport) applies to the OUTPUT data set, army. Only these variables are written to army.
data army (keep=Code Airport);
  set mylib.military (drop=City State Country);
  if Type eq 'Army' then output;
run;

5 Example of selecting observations in SET statement
data army (keep=Code Airport);
  set mylib.military (drop=City State Country firstobs=5 obs=15);
  if Type eq 'Army' then output;
run;
NOTE: Only observations 5 through 15 are read from the mylib.military data set. Variables City, State and Country are not read from mylib.military. Only variables Code and Airport are written to the army data set.

6 Finding the first and last observations in a group when reading a SAS data set: using a BY group with the SET statement
Data pilotn (drop = state);
  Set mylib.pilots (drop=id city homephone);
  BY Jobcode;
Run;
NOTE: The BY Jobcode statement requires the input data set to be sorted by Jobcode first (PROC SORT; BY Jobcode;).
When a BY Jobcode statement is used, SAS creates two temporary variables:
FIRST.jobcode: has the value 1 when the first observation in the BY group is read, and 0 otherwise.
LAST.jobcode: has the value 1 when the last observation in the BY group is read, and 0 otherwise.

7 An example resulting from BY Jobcode
Suppose Jobcode takes three distinct values: FAT, MAT and PLT.
DATA Pilotn;
  SET Mylib.pilot;
  BY Jobcode;
creates First.Jobcode and Last.Jobcode for each observation.
NOTE: First.Jobcode and Last.Jobcode are not stored as variables in the SAS data set. They are temporary variables, but we can use them in the program.
[Table: columns Jobcode, Salary, First.Jobcode, Last.Jobcode; rows grouped as FAT, MAT, PLT]

8 Why do we need to know the first and last observation for each group of a variable?
Example: A company wants to compute the running total of payroll for each department and include it as a new variable in the SAS data set. The total must be reset at the first observation of each department, and it is complete at the last one.
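As a minimal sketch of the payroll example, the DATA step below resets a running total at the start of each BY group and accumulates it with a sum statement. The data set payroll and the variables Dept, Name, Salary, DeptTotal and LastInDept are hypothetical and are not part of the course data.

```sas
/* Hypothetical payroll data, made up for illustration only */
data payroll;
  input Dept $ Name $ Salary;
  datalines;
SALES Ann 42000
ADMIN Bob 38000
SALES Cy 45000
ADMIN Dee 40000
;
run;

/* BY-group processing needs the data sorted by the BY variable */
proc sort data=payroll out=payroll_s;
  by Dept;
run;

data running;
  set payroll_s;
  by Dept;
  if first.Dept then DeptTotal = 0;  /* reset at the first observation of each department */
  DeptTotal + Salary;                /* sum statement: DeptTotal is retained across iterations */
  LastInDept = last.Dept;            /* copy the temporary flag so it can be printed */
run;

proc print data=running noobs;
run;
```

On the observation where LastInDept is 1, DeptTotal holds the complete total for that department (78000 for ADMIN and 87000 for SALES with the made-up values above).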

9 Exercise 1
Open the following program: c11_1 (see the next three slides).
1. Run the first DATA step up to proc print; run; and check the results. (This step uses a LENGTH statement and a SELECT-WHEN group to define the agegp variable.)
2. Run the next part of the program (from the PROC SORT statement to the run; statement). Go to the SAS LOG and check the values of FIRST.AGEGP and LAST.AGEGP written by the PUT statement.
3. Run the remaining statements, from Data diab3 to the last statement. Check the results and make sure you understand how this SAS program produces them.

10 C11_1 program
Data diabetes;
  set mylib.diabetes;
  length agegp $ 10;
  select;
    when (age < 40) agegp='YOUNG';
    when (40 <= age < 50) agegp='MIDDLE';
    when (age >= 50) agegp='SENIOR';
    otherwise agegp='UNDEFINED';
  end;
run;
proc print;
run;

11 /* Use SET, BY together to create two system variables first.var, last.var */
proc sort data = diabetes out=diab_s;
  by agegp sex;
run;
Data diab2 (drop= weight height age);
  set diab_s (keep=Sex agegp age height weight pulse);
  by agegp sex;
  bmi=weight/height**2 * 703;
  put agegp= sex= pulse= bmi= first.agegp= last.agegp= first.sex= last.sex=;
run;
/* The PUT statement above writes the values of the variables to the SAS LOG. */
/* Go to the SAS LOG to see how FIRST.AGEGP, LAST.AGEGP, FIRST.SEX and LAST.SEX are defined. */

12 Using first.agegp and last.agegp to compute the average BMI of each age group
/* The following IF-THEN DO statements use first.agegp and last.agegp to control computing the average BMI of each age group. */
Data diab3 (drop= weight height age);
  set diab_s (keep=Sex agegp age height weight pulse);
  by agegp sex;
  bmi=weight/height**2 * 703;
  if first.agegp=1 then do;   /* start of an age group: reset the accumulators */
    sumbmi=0;
    ncase=0;
  end;
  sumbmi+bmi;
  ncase+1;
  if last.agegp then do;      /* end of an age group: compute and write the average */
    avgbmi=sumbmi/ncase;
    output;
  end;
run;
proc print label;
  var agegp ncase avgbmi;
  label agegp='Age Grouping'
        ncase = 'Num of Cases'
        avgbmi = 'Average BMI';
run;

13 OUTPUT statement
The OUTPUT; statement forces SAS to write an observation to the SAS data set before it reaches the RUN; statement at the end of the DATA step.
By default, SAS writes an observation when processing reaches the RUN; statement. This is called 'IMPLICIT' output.
In many situations we need to write observations earlier, either to write more observations or to write observations different from the one that would be written implicitly. For this we need the EXPLICIT OUTPUT statement:
Output;
NOTE: Once an OUTPUT statement appears in a DATA step, SAS no longer performs the implicit output. Observations are written only when an OUTPUT statement executes.
In Exercise 1, we want to output the average BMI for each age group, which is computed inside the DO group. Therefore, we place OUTPUT; inside that DO group to create one observation per age group.

14 Additional Exercise
Delete the OUTPUT statement and run the program again to check the results. Note that Average BMI now has many missing values and Age Grouping is reported for every individual observation. However, we only want the average BMI for each age group. The OUTPUT statement lets us accomplish that.

15 Reading Observations Using Direct Access: POINT = option in SET statement
In general, observations are read sequentially. Using the POINT= option in the SET statement, you can read specific observations directly.
Simplest general syntax:
Data output-sas-dataset;
  Var_name = n;
  SET sas-data-set POINT= var_name;
  OUTPUT;
  STOP;
RUN;

16 Some cautions when using POINT = var_name in the SET statement
NOTE 1: var_name is a temporary variable that holds the observation number to be read. It is not written to the output data set.
NOTE 2: The STOP statement ends the DATA step after the observations specified by the temporary variable have been read. It is needed because direct access never reaches the end-of-file marker, so without STOP the DATA step would keep iterating indefinitely.
NOTE 3: By default, SAS automatically writes the observation to the SAS data set when it reaches the end of the DATA step. This is IMPLICIT output. When POINT= is used together with STOP, execution ends at the STOP statement and never reaches the end of the DATA step, so the IMPLICIT output is not executed. An EXPLICIT OUTPUT statement placed before STOP is required to write each observation to the SAS data set.
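The three notes can be seen together in a short sketch that reads every third observation. It assumes a made-up data set named sample, and it uses the NOBS= option of the SET statement (a real option not covered on these slides) so that the loop knows how many observations are available. The names sample, every3rd, obsnum and total are hypothetical.

```sas
/* Hypothetical input: 10 observations with x = 1, 2, ..., 10 */
data sample;
  do x = 1 to 10;
    output;
  end;
run;

data every3rd;
  /* total is filled in from NOBS= at compile time, so it can be used before SET executes */
  do obsnum = 1 to total by 3;
    set sample point=obsnum nobs=total;  /* direct access to observation number obsnum */
    output;                              /* explicit output: the end of the step is never reached */
  end;
  stop;                                  /* without STOP the step would iterate indefinitely */
run;

proc print data=every3rd;
run;
```

Both obsnum and total are temporary variables, so every3rd contains only x.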

17 Exercise 2 Open the following program c11_2 (see the next slide)
1. Run the program and check the results.
2. Change point = obsnum in the SET statement to POINT = 5, run the program and observe the results.
NOTE: In POINT = var_name, var_name cannot be a numeric constant such as 5.
3. Restore point = obsnum, comment out the OUTPUT; statement, run the program and see the results.
NOTE: There is no observation in the data set. Why?

18 C11_2 program
proc print data = mylib.admit;
run;
Data admitn0;
  obsnum=5;
  SET mylib.admit point = obsnum;
  output;
  stop;
Run;
Proc print data=admitn0;
run;

19 Example of reading the odd-numbered observations
Data admitn;
  Obsnum = _N_*2 - 1;               /* 1, 3, 5, ... on successive iterations */
  If obsnum > 21 then stop;         /* check before SET so no out-of-range read is attempted */
  SET mylib.admit point = obsnum;
  Output;
Run;
Proc print;
run;

20 Detecting the END of a Data Set using END = option in SET statement
Identifying the end of a data set makes it possible to create variables or reports only after all observations have been read. For example, one may want to write to the output data set only the final observation, which holds the sum of payroll.
General syntax:
DATA SAS-dataset;
  SET sas-data-set END= Var_name;
NOTE: Var_name is a temporary variable that serves as an end-of-file indicator. Var_name = 0 until the SET statement reads the last observation of the SAS data set. When the last observation is read, Var_name = 1. This variable is not added to the SAS data set.
DO NOT use both POINT= and END= in the same SET statement.
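A minimal sketch of the payroll idea, under the assumption of a small made-up data set: LAST.Jobcode writes one total per Jobcode subgroup, and the END= variable triggers a grand total in the log once the last observation has been read. The names payroll, jobtotals, JobPay, GrandPay and lastobs are hypothetical.

```sas
/* Hypothetical payroll data, already in Jobcode order */
data payroll;
  input Jobcode $ Salary;
  datalines;
FAT 30000
MAT 25000
PLT 60000
PLT 65000
;
run;

data jobtotals (keep=Jobcode JobPay);
  set payroll end=lastobs;            /* lastobs = 1 only on the last observation */
  by Jobcode;
  if first.Jobcode then JobPay = 0;   /* reset the subgroup total */
  JobPay + Salary;                    /* subgroup running total */
  GrandPay + Salary;                  /* total over the whole data set */
  if last.Jobcode then output;        /* one observation per Jobcode */
  if lastobs then put 'Grand total payroll: ' GrandPay;
run;

proc print data=jobtotals noobs;
run;
```

With the made-up values above, jobtotals has three observations and the log line reports 180000.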

21 Exercise 3 Open the following program c11_3
1. Add a PUT statement before the IF statement to see the values of lastcase: PUT ID = age = lastcase =;
2. Run the program and go to the SAS LOG to observe the values of ID, age and lastcase.
3. Comment out the OUTPUT; statement, run the program and observe the results. Make sure you understand the difference between running with and without the OUTPUT statement.

22 C11_3 program
data diabe5;
  set mylib.diabetes (keep= id Sex age height weight) end=lastcase;
  bmi=weight/height**2 * 703;
  sumbmi+bmi;
  ncase+1;
  if lastcase then do;
    avgbmi=sumbmi/ncase;
    output;
  end;
run;
/* Use lastcase as the indicator and only output the results of the data processing at the last data case. */
proc print label;
  var ncase avgbmi;
  label ncase = 'Num of Cases'
        avgbmi = 'Average BMI';
run;

23 Understanding How SAS Data Sets Are Read
Previously, we discussed how SAS handles reading an external raw data set. The major differences are:
When reading an external raw data set, the PDV is initialized to missing before each new observation is read.
When reading a SAS data set, the variables read from the data set are initialized to missing only once, before the first observation is read. Their values are retained afterwards until the next observation overwrites them. New variables created in the DATA step are still reset to missing at the start of each iteration.
The following slides demonstrate how this process is executed when reading SAS data sets.
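One way to watch this behavior is to write the PDV to the log at the top of every iteration, before the SET statement runs. The sketch below uses a made-up three-observation data set; the names one, demo, a and b are hypothetical.

```sas
/* Hypothetical input data set with one variable */
data one;
  input a;
  datalines;
1
2
3
;
run;

data demo;
  /* Executed before SET, so it shows what is left in the PDV from the previous iteration */
  put 'Top of iteration: ' _n_= a= b=;
  set one;       /* a comes from the SAS data set and is retained */
  b = a * 10;    /* b is created in the DATA step and is reset each iteration */
run;
```

In the log, a should be missing only on the first iteration and afterwards show the value read on the previous iteration, while b should be missing at the top of every iteration.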

24 A Forecasting Application
The growth rate of each division of an airline is forecast in mylib.growth. If each of the five divisions grows at its respective rate for the next three years, what will be the approximate size of each division at the end of each of the three years?
[Partial listing of mylib.growth: columns Division, NumEmps, Increase; rows shown: APTOPS, FINACE, FLTOPS]

25 A Forecasting Application
The output SAS data set, forecast, should contain 15 observations.
[Partial listing of forecast: columns Division, Increase, Year, NewTotal; rows shown: APTOPS (3 rows), FINACE (2 rows)]

26 Automatic Output
By default, every DATA step contains an automatic OUTPUT statement at the end of each iteration. This automatic OUTPUT statement tells the SAS System to write observations to the data set or data sets that are created.
data forecast;
  set mylib.growth;
  <additional SAS statements>;
run;
At the run; statement:
1. Automatic output (to the forecast SAS data set)
2. Automatic return

27 The OUTPUT Statement
The explicit OUTPUT statement writes the current contents of the PDV to a SAS data set. Placing an explicit OUTPUT statement in a DATA step overrides the automatic output, and SAS adds an observation to a data set only when an explicit OUTPUT statement is executed.
OUTPUT <SAS-data-set-1 …SAS-data-set-n>;
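The optional data set names in the OUTPUT statement make it possible to route observations to different data sets in one pass. This is a minimal sketch with made-up data: the data set military and the Type values 'Army' and 'Naval' are assumptions for illustration, not the contents of mylib.military.

```sas
/* Hypothetical stand-in for the military data */
data military;
  input Code $ Type $;
  datalines;
AAA Army
BBB Naval
CCC Army
;
run;

/* Every data set named in an OUTPUT statement must also be named in the DATA statement */
data army navy;
  set military;
  if Type eq 'Army' then output army;        /* written only to army */
  else if Type eq 'Naval' then output navy;  /* written only to navy */
run;

proc print data=army noobs;
run;
proc print data=navy noobs;
run;
```

Because explicit OUTPUT statements are present, there is no automatic output, so an observation with any other Type value would be written to neither data set.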

28 A Forecasting Application
data forecast;
  drop NumEmps;
  set mylib.growth;
  Year=1;
  NewTotal=NumEmps*(1+Increase);
  output;
  Year=2;
  NewTotal=NewTotal*(1+Increase);
  output;
  Year=3;
  NewTotal=NewTotal*(1+Increase);
  output;
run;

29 Explicit Output: Compilation
data forecast;
  drop NumEmps;
  set mylib.growth;
  Year=1;
  NewTotal=NumEmps*(1+Increase);
  output;
  Year=2;
  NewTotal=NewTotal*(1+Increase);
  output;
  Year=3;
  NewTotal=NewTotal*(1+Increase);
  output;
run;
PDV: Division $ 6 | NumEmps N 8 | Increase N 8

30 Explicit Output: Compilation
Year is added to the PDV.
PDV: Division $ 6 | NumEmps N 8 | Increase N 8 | Year N 8

31 Explicit Output: Compilation
NewTotal is added to the PDV.
PDV: Division $ 6 | NumEmps N 8 | Increase N 8 | Year N 8 | NewTotal N 8

32 Explicit Output: Compilation
NumEmps is flagged D (drop).
PDV: Division $ 6 | NumEmps N 8 (D) | Increase N 8 | Year N 8 | NewTotal N 8

33 Explicit Output: Execution
data forecast;
  drop NumEmps;
  set mylib.growth;
  Year=1;
  NewTotal=NumEmps*(1+Increase);
  output;
  Year=2;
  NewTotal=NewTotal*(1+Increase);
  output;
  Year=3;
  NewTotal=NewTotal*(1+Increase);
  output;
run;
Initialize PDV to missing.
PDV: Division=(blank)  NumEmps(D)=.  Increase=.  Year=.  NewTotal=.

34 Explicit Output: Execution
The first observation of mylib.growth is read.
PDV: Division=APTOPS  NumEmps(D)=205  Increase=0.075  Year=.  NewTotal=.

35 Explicit Output: Execution
PDV: Division=APTOPS  NumEmps(D)=205  Increase=0.075  Year=1  NewTotal=.

36 Explicit Output: Execution
NewTotal = 205*(1+0.075)
PDV: Division=APTOPS  NumEmps(D)=205  Increase=0.075  Year=1  NewTotal=220.38

37 Explicit Output: Execution
Write first observation to forecast.
PDV: Division=APTOPS  NumEmps(D)=205  Increase=0.075  Year=1  NewTotal=220.38

38 Explicit Output: Execution
PDV: Division=APTOPS  NumEmps(D)=205  Increase=0.075  Year=2  NewTotal=220.38

39 Explicit Output: Execution
NewTotal = 220.38*(1+0.075)
PDV: Division=APTOPS  NumEmps(D)=205  Increase=0.075  Year=2  NewTotal=236.90

40 Explicit Output: Execution
Write second observation to forecast.
PDV: Division=APTOPS  NumEmps(D)=205  Increase=0.075  Year=2  NewTotal=236.90

41 Explicit Output: Execution
PDV: Division=APTOPS  NumEmps(D)=205  Increase=0.075  Year=3  NewTotal=236.90

42 Explicit Output: Execution
NewTotal = 236.90*(1+0.075)
PDV: Division=APTOPS  NumEmps(D)=205  Increase=0.075  Year=3  NewTotal=254.67

43 Explicit Output: Execution
Write third observation to forecast.
PDV: Division=APTOPS  NumEmps(D)=205  Increase=0.075  Year=3  NewTotal=254.67

44 Explicit Output: Execution
No automatic output.
PDV: Division=APTOPS  NumEmps(D)=205  Increase=0.075  Year=3  NewTotal=254.67

45 Explicit Output: Execution
Automatic return.
PDV: Division=APTOPS  NumEmps(D)=205  Increase=0.075  Year=3  NewTotal=254.67

46 Explicit Output: Execution
Contents of the FORECAST data set after the first execution of the DATA step:
Division  Increase  Year  NewTotal
APTOPS    0.075     1     220.38
APTOPS    0.075     2     236.90
APTOPS    0.075     3     254.67

47 Explicit Output: Execution
(Same DATA step as on the previous slides.)
Reinitialize PDV to missing. Only Year and NewTotal, the variables created in the DATA step, are reset. The variables read from mylib.growth keep their values.
PDV: Division=APTOPS  NumEmps(D)=205  Increase=0.075  Year=.  NewTotal=.

48 Explicit Output: Execution
Continue executing the DATA step until all observations from mylib.growth are read.
PDV: Division=FINACE  NumEmps(D)=198  Increase=0.040  Year=.  NewTotal=.

49 Explicitly Controlling Output
Partial Log
NOTE: There were 5 observations read from the data set MYLIB.GROWTH.
NOTE: The data set WORK.FORECAST has 15 observations and 4 variables.

50 Explicitly Controlling Output
proc print data=forecast noobs;
  format NewTotal 6.;
run;
[Partial PROC PRINT output: columns Division, Increase, Year, NewTotal; rows shown: APTOPS (3 rows), FINACE (2 rows)]

